AI Gateway is available in two editions: Dedicated Instance and Serverless. This topic describes the features, parameters, quotas, and limits of each edition to help you choose a suitable edition and instance type.
Edition comparison
Serverless: This fully managed edition supports automatic elastic scaling, so you do not need to manage underlying resources. It is billed based on the number of calls, which allows for quick integration and a low-cost start.
Dedicated Instance: This edition provides dedicated, independently deployed resource instances. It supports advanced features such as plugin extensions, hardware acceleration, and WAF integration. It offers stronger isolation and a higher service-level agreement (SLA) than the Serverless edition. This edition is suited to enterprises that require high stability, security, and scalability.
Category | Feature | Serverless | Dedicated Instance |
Model proxy | Text-to-text | Supported | Supported |
Model proxy | Multimodal | Supported | Supported |
Model proxy | Built-in policies | Supported | Supported |
MCP Server | MCP proxy | Supported | Supported |
MCP Server | HTTP to MCP | Supported | Supported |
Agent proxy | Model Studio | Supported | Supported |
Agent proxy | Dify | Supported | Supported |
Agent proxy | Custom | Supported | Supported |
Plugins | System plugins | Supported | Supported |
Plugins | Plugin marketplace | Not supported | Supported |
Plugins | Custom plugins | Not supported | Supported |
Specifications | Capacity specifications | Automatic scaling | Capacity specifications vary by queries per second (QPS) and the number of client connections. |
Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
Hardware acceleration | QAT hardware compression and decompression | Not supported | Supported |
Security | WAF integration | Not supported | Supported |
Observability | Monitoring and alerting | Business metrics only. Note: The Serverless edition hosts the underlying system for you, so you do not need to manage system-level O&M. | |
Endpoints | Fixed EIP | Shared endpoints with non-fixed elastic IP addresses (EIPs). | Dedicated endpoints with fixed EIPs. |
Endpoints | Inbound bandwidth | Bandwidth is shared across multiple instances. A single gateway instance is limited to 400 Mbps. | Dedicated bandwidth. A single gateway instance has a default limit of 4 Gbps, which can be adjusted dynamically. |
O&M | Configuration changes | The Serverless edition scales performance automatically. You do not need to manage service configurations. | Configurations can be changed as needed. You cannot downgrade to the Serverless edition. |
Stability guarantee | SLA | 99.9% | 99.99% |
Stability guarantee | Dependent middleware | Shared and logically isolated | Dedicated and physically isolated |
Stability guarantee | Version updates | Automatic | Manual |
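The comparison above can be read as a simple decision rule: any requirement that only the Dedicated Instance edition meets rules out Serverless. The following Python sketch illustrates that rule. The feature identifiers and the `choose_edition` function are hypothetical names chosen for this example and are not part of any AI Gateway API; the thresholds are taken from the table.

```python
# Features that the comparison table lists as "Not supported" (or non-fixed)
# on Serverless. The identifier strings are hypothetical labels for this sketch.
DEDICATED_ONLY_FEATURES = {
    "plugin_marketplace",
    "custom_plugins",
    "tls_hardware_acceleration",
    "qat_compression",
    "waf_integration",
    "fixed_eip",
}

# Values from the comparison table.
SERVERLESS_MAX_INBOUND_MBPS = 400   # per-instance inbound bandwidth limit
SERVERLESS_SLA_PERCENT = 99.9       # Dedicated Instance offers 99.99


def choose_edition(required_features, inbound_mbps=0, min_sla_percent=99.9):
    """Return (edition, reasons); reasons explain why Serverless was ruled out."""
    reasons = []

    needed = sorted(set(required_features) & DEDICATED_ONLY_FEATURES)
    if needed:
        reasons.append("requires Dedicated-only features: " + ", ".join(needed))

    if inbound_mbps > SERVERLESS_MAX_INBOUND_MBPS:
        reasons.append(
            f"inbound bandwidth {inbound_mbps} Mbps exceeds the "
            f"{SERVERLESS_MAX_INBOUND_MBPS} Mbps Serverless limit"
        )

    if min_sla_percent > SERVERLESS_SLA_PERCENT:
        reasons.append(
            f"required SLA {min_sla_percent}% is above the "
            f"{SERVERLESS_SLA_PERCENT}% Serverless SLA"
        )

    edition = "Dedicated Instance" if reasons else "Serverless"
    return edition, reasons


if __name__ == "__main__":
    print(choose_edition(["mcp_proxy"]))
    print(choose_edition(["waf_integration", "fixed_eip"], inbound_mbps=800))
```

The sketch defaults to Serverless when nothing rules it out, which matches its positioning as the quick, low-cost starting point. Cost, compliance, and upgrade-path considerations are outside what this rule models.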
Capacity specifications
The Dedicated Instance edition offers several instance types. They differ in the queries per second (QPS) and the number of client connections that they support.
The following table lists the parameters for different gateway instance types.
Instance type | QPS | Client connections |
aigw.small.x1 | 1500 | 20000 |
aigw.small.x2 | 3000 | 40000 |
aigw.small.x4 | 6000 | 80000 |
aigw.medium.x1 | 12000 | 160000 |
aigw.medium.x2 | 24000 | 320000 |
aigw.medium.x3 | 36000 | 480000 |
aigw.large.x1 | 48000 | 640000 |
aigw.large.x2 | 96000 | 1280000 |
aigw.large.x3 | 144000 | 1920000 |
aigw.large.x4 | 192000 | 2560000 |
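To size an instance from the table above, one approach is to pick the smallest instance type whose QPS and client connection figures both cover the expected peak load. The following Python sketch shows this under two assumptions: the table values are treated as upper bounds per instance, and the optional `headroom` margin is a convention of this example rather than a documented recommendation. The function name `pick_instance_type` is hypothetical.

```python
# (instance type, QPS, client connections), copied from the table above
# and ordered from smallest to largest.
INSTANCE_TYPES = [
    ("aigw.small.x1", 1500, 20000),
    ("aigw.small.x2", 3000, 40000),
    ("aigw.small.x4", 6000, 80000),
    ("aigw.medium.x1", 12000, 160000),
    ("aigw.medium.x2", 24000, 320000),
    ("aigw.medium.x3", 36000, 480000),
    ("aigw.large.x1", 48000, 640000),
    ("aigw.large.x2", 96000, 1280000),
    ("aigw.large.x3", 144000, 1920000),
    ("aigw.large.x4", 192000, 2560000),
]


def pick_instance_type(peak_qps, peak_connections, headroom=0.0):
    """Return the smallest instance type that covers both dimensions.

    headroom is a fractional safety margin (0.3 means 30% above peak).
    Returns None when even the largest type is insufficient.
    """
    need_qps = peak_qps * (1 + headroom)
    need_connections = peak_connections * (1 + headroom)
    for name, qps, connections in INSTANCE_TYPES:
        if qps >= need_qps and connections >= need_connections:
            return name
    return None


if __name__ == "__main__":
    # Connections, not QPS, drive this choice: 80000 < 100000.
    print(pick_instance_type(5000, 100000))
```

Both dimensions must be satisfied at once, so a workload with modest QPS but many long-lived connections can still require a larger instance type.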
Quota description
Quota dimension | Serverless: default quota | Serverless: maximum quota | Dedicated Instance: default quota | Dedicated Instance: maximum quota |
Number of instances in the same region | 50 | 100 | 100 | 500 |
Total number of Model APIs per instance | 50 | 100 | 100 | 500 |
Total number of routes per instance | 100 | 200 | small: 200; medium & large: 500 | small: 1000; medium & large: 2000 |
Total number of MCP Servers per instance | 50 | 100 | small: 100; medium & large: 200 | small: 500; medium & large: 1000 |
Total number of Tools per MCP Server | 50 | 100 | 100 | 1000 |
Total number of Agent APIs per instance | 50 | 100 | 100 | 500 |
Number of consumers | 20 | 50 | small: 50; medium & large: 200 | small: 100; medium & large: 500 |
Number of associated domain names per instance | 20 | 50 | small: 50; medium & large: 200 | small: 100; medium & large: 500 |
Number of associated services per instance | 50 | 100 | small: 200; medium & large: 500 | small: 1000; medium & large: 2000 |
Number of plugins installed on a single instance | N/A | N/A | small: 5; medium & large: 10 | small: 10; medium & large: 20 |
Number of uploaded custom plugins | N/A | N/A | 20 | 50 |
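When planning a deployment, it can help to compare intended resource counts against these quotas. The Python sketch below encodes three rows of the quota table and classifies a planned count as within the default quota, above the default but within the maximum, or beyond the maximum. The dictionary keys, the tier names, and the `check_quota` function are hypothetical names for this example; how a quota is raised from its default toward its maximum is not described here, so the sketch only reports the range.

```python
# (default quota, maximum quota) per tier, copied from the quota table.
# "serverless" is the Serverless edition; "small", "medium", and "large"
# are Dedicated Instance size families. None means the quota is N/A.
QUOTAS = {
    "routes_per_instance": {
        "serverless": (100, 200),
        "small": (200, 1000),
        "medium": (500, 2000),
        "large": (500, 2000),
    },
    "mcp_servers_per_instance": {
        "serverless": (50, 100),
        "small": (100, 500),
        "medium": (200, 1000),
        "large": (200, 1000),
    },
    "installed_plugins_per_instance": {
        "serverless": None,
        "small": (5, 10),
        "medium": (10, 20),
        "large": (10, 20),
    },
}


def check_quota(dimension, tier, planned):
    """Classify a planned resource count against the table's quotas."""
    limits = QUOTAS[dimension][tier]
    if limits is None:
        return "not applicable"
    default, maximum = limits
    if planned <= default:
        return "within default"
    if planned <= maximum:
        return "above default, within maximum"
    return "exceeds maximum"


if __name__ == "__main__":
    print(check_quota("routes_per_instance", "serverless", 150))
    print(check_quota("mcp_servers_per_instance", "large", 1001))
```

A result of "exceeds maximum" on Serverless or on a small instance type suggests checking whether a medium or large Dedicated Instance type covers the same count.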