
API Gateway: Gateway types

Last Updated: May 27, 2026

AI Gateway offers two editions: Dedicated Instance and Serverless. Compare their capabilities, quotas, and capacity specifications to choose the right edition for your workload.

Edition comparison

  • Serverless: Scales automatically and elastically, with no resources to manage. Pay-as-you-go billing enables fast, low-cost integration.

  • Dedicated Instance: Independently deployed with advanced capabilities including plugin extensibility, hardware acceleration, and WAF integration. Delivers a higher SLA for enterprises that require greater stability and security.

| Category | Feature | Serverless | Dedicated Instance |
|---|---|---|---|
| Model proxy | Text-to-text | Supported | Supported |
| | Multimodal | Supported | Supported |
| | Built-in policies | Supported | Supported |
| MCP Server | MCP proxy | Supported | Supported |
| | HTTP to MCP | Supported | Supported |
| Agent proxy | Model Studio | Supported | Supported |
| | Dify | Supported | Supported |
| | Custom | Supported | Supported |
| Plugins | System plugins | Supported | Supported |
| | Plugin marketplace | Not supported | Supported |
| | Custom plugins | Not supported | Supported |
| Specifications | Capacity specifications | Automatic scaling | Multiple capacity specifications, differentiated by queries per second (QPS) and client connections |
| Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
| | QAT hardware compression and decompression | Not supported | Supported |
| Security | WAF integration | Not supported | Supported |
| Observability | Monitoring and alerting | Business metrics only. Note: The underlying system is fully managed, so system-level monitoring is not required. | Business metrics, system resources, and custom configurations |
| Endpoints | Fixed EIP | Not supported; uses a shared endpoint | Supported; uses dedicated endpoints |
| | Inbound bandwidth | Shared bandwidth, up to 400 Mbps per instance | Dedicated bandwidth; default limit of 4 Gbps per instance, dynamically adjustable |
| O&M | Configuration changes | Scales automatically; no manual configuration management required | Configurations can be changed as needed, but an instance cannot be downgraded to the Serverless edition |
| Stability guarantee | SLA | 99.9% | 99.99% |
| | Dependent middleware | Shared, logically isolated | Dedicated, physically isolated |
| | Version updates | Automatic | Manual |
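The gap between the two SLA figures in the comparison table is easier to judge as permitted downtime. The following is a minimal sketch, assuming a 30-day month and a plain availability percentage; the function name `allowed_downtime_minutes` is hypothetical, and the provider's own SLA terms define the real measurement window, exclusions, and compensation.

```python
# Sketch: translate an availability SLA into the downtime it permits.
# Assumption: a 30-day month. The provider's SLA document defines the
# actual measurement window, exclusions, and compensation terms.


def allowed_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted over `period_days` at `sla_percent` availability."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - sla_percent / 100.0)


# SLA values taken from the edition comparison table.
for edition, sla in (("Serverless", 99.9), ("Dedicated Instance", 99.99)):
    print(f"{edition}: {sla}% -> about {allowed_downtime_minutes(sla):.1f} min per 30 days")
```

Under this assumption, moving from 99.9% to 99.99% reduces the permitted downtime from about 43 minutes to about 4 minutes per 30-day period.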

Capacity specifications

Dedicated instances offer multiple instance types differentiated by QPS and client connection capacity.

| Instance type | QPS | Client connections |
|---|---|---|
| aigw.small.x1 | 1,500 | 20,000 |
| aigw.small.x2 | 3,000 | 40,000 |
| aigw.small.x4 | 6,000 | 80,000 |
| aigw.medium.x1 | 12,000 | 160,000 |
| aigw.medium.x2 | 24,000 | 320,000 |
| aigw.medium.x3 | 36,000 | 480,000 |
| aigw.large.x1 | 48,000 | 640,000 |
| aigw.large.x2 | 96,000 | 1,280,000 |
| aigw.large.x3 | 144,000 | 1,920,000 |
| aigw.large.x4 | 192,000 | 2,560,000 |
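One way to use the capacity specifications table is to pick the smallest instance type whose QPS and client-connection limits both cover an expected peak. The sketch below is a hypothetical planning helper, not an AI Gateway API: `smallest_fitting_type` and its `headroom` multiplier are illustrative assumptions, and real sizing may depend on workload characteristics that this table does not capture.

```python
from typing import List, NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    qps: int
    client_connections: int


# Capacity figures copied from the capacity specifications table.
INSTANCE_TYPES: List[InstanceType] = [
    InstanceType("aigw.small.x1", 1_500, 20_000),
    InstanceType("aigw.small.x2", 3_000, 40_000),
    InstanceType("aigw.small.x4", 6_000, 80_000),
    InstanceType("aigw.medium.x1", 12_000, 160_000),
    InstanceType("aigw.medium.x2", 24_000, 320_000),
    InstanceType("aigw.medium.x3", 36_000, 480_000),
    InstanceType("aigw.large.x1", 48_000, 640_000),
    InstanceType("aigw.large.x2", 96_000, 1_280_000),
    InstanceType("aigw.large.x3", 144_000, 1_920_000),
    InstanceType("aigw.large.x4", 192_000, 2_560_000),
]


def smallest_fitting_type(
    peak_qps: float, peak_connections: float, headroom: float = 1.0
) -> Optional[InstanceType]:
    """Return the smallest instance type whose QPS and client-connection
    capacity both cover the peak values multiplied by `headroom`.

    Returns None when even the largest type is too small."""
    if headroom < 1.0:
        raise ValueError("headroom must be >= 1.0")
    needed_qps = peak_qps * headroom
    needed_connections = peak_connections * headroom
    # Walk from smallest to largest capacity and stop at the first fit.
    for itype in sorted(INSTANCE_TYPES, key=lambda t: (t.qps, t.client_connections)):
        if itype.qps >= needed_qps and itype.client_connections >= needed_connections:
            return itype
    return None


print(smallest_fitting_type(10_000, 100_000))                # aigw.medium.x1
print(smallest_fitting_type(10_000, 100_000, headroom=1.3))  # aigw.medium.x2
print(smallest_fitting_type(200_000, 1_000_000))             # None: exceeds aigw.large.x4
```

Both limits are checked because a workload with many long-lived client connections can outgrow a type on connections before it reaches the QPS limit.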

Quotas

Global quotas

| Quota item | Quota |
|---|---|
| Instances per region | 100 |

Instance quotas

| Quota dimension | Serverless | Small | Medium | Large |
|---|---|---|---|---|
| Number of MCP Servers | 100 | 500 | 1,000 | 2,000 |
| Total active routes (including Model API and Agent API routes) | 2,500 | 5,000 | 7,500 | 10,000 |
| Number of published domain names | 100 | 200 | 500 | 1,000 |
| Number of associated services | 200 | 800 | 2,000 | 4,000 |
| Number of service nodes | 400 | 1,600 | 4,000 | 8,000 |
| Number of Kubernetes service sources | 3 | 3 | 5 | 5 |
| Number of installed plugins | N/A | 10 | 20 | 30 |
| Number of uploaded custom plugins | N/A | 20 | 50 | 80 |
| Number of authorized consumers | 500 | 2,000 | 6,000 | 10,000 |
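Before choosing an edition, a planned deployment can be checked against the instance quotas. The following is a minimal sketch under several assumptions: the dimension keys (such as `active_routes`) are hypothetical names for the table rows, the Small, Medium, and Large columns are assumed to correspond to the aigw.small.*, aigw.medium.*, and aigw.large.* instance-type families, and an "N/A" quota is treated as "not available for this tier", so any planned use of it is flagged.

```python
from typing import Dict, Optional, Tuple

TIERS = ("serverless", "small", "medium", "large")

# Per-instance quotas from the instance quotas table, in TIERS order.
# None marks a quota shown as "N/A" (assumed: not available for that tier).
INSTANCE_QUOTAS: Dict[str, Tuple[Optional[int], ...]] = {
    "mcp_servers": (100, 500, 1000, 2000),
    "active_routes": (2500, 5000, 7500, 10000),
    "published_domain_names": (100, 200, 500, 1000),
    "associated_services": (200, 800, 2000, 4000),
    "service_nodes": (400, 1600, 4000, 8000),
    "kubernetes_service_sources": (3, 3, 5, 5),
    "installed_plugins": (None, 10, 20, 30),
    "uploaded_custom_plugins": (None, 20, 50, 80),
    "authorized_consumers": (500, 2000, 6000, 10000),
}


def check_plan(tier: str, planned: Dict[str, int]) -> Dict[str, str]:
    """Compare planned counts with the tier's quotas.

    Returns a mapping from quota name to a short problem description;
    an empty dict means every planned value fits."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    idx = TIERS.index(tier)
    problems: Dict[str, str] = {}
    for name, count in planned.items():
        if name not in INSTANCE_QUOTAS:
            raise KeyError(f"unknown quota dimension: {name!r}")
        limit = INSTANCE_QUOTAS[name][idx]
        if limit is None:
            if count > 0:
                problems[name] = f"quota is N/A for the {tier} tier"
        elif count > limit:
            problems[name] = f"planned {count} exceeds limit {limit}"
    return problems


example_plan = {"active_routes": 3000, "authorized_consumers": 400, "installed_plugins": 5}
print(check_plan("serverless", example_plan))  # flags active_routes and installed_plugins
print(check_plan("small", example_plan))       # {}
```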