AI Gateway offers two editions: Dedicated Instance and Serverless. Compare their capabilities, quotas, and capacity specifications to choose the right edition for your workload.
Edition comparison
- Serverless: Auto-scales elastically with no resource management. Pay-as-you-go billing enables fast integration at low cost.
- Dedicated Instance: Independently deployed with advanced capabilities including plugin extensibility, hardware acceleration, and WAF integration. Delivers a higher SLA for enterprises that require greater stability and security.
| Category | Feature | Serverless | Dedicated Instance |
| --- | --- | --- | --- |
| Model proxy | Text-to-text | Supported | Supported |
| | Multimodal | Supported | Supported |
| | Built-in policies | Supported | Supported |
| MCP Server | MCP proxy | Supported | Supported |
| | HTTP to MCP | Supported | Supported |
| Agent proxy | Model Studio | Supported | Supported |
| | Dify | Supported | Supported |
| | Custom | Supported | Supported |
| Plugins | System plugin | Supported | Supported |
| | Plugin marketplace | Not supported | Supported |
| | Custom plugins | Not supported | Supported |
| Specifications | Capacity specifications | Automatic scaling | Different capacity specifications are available based on queries per second (QPS) and client connections. |
| Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
| | QAT hardware compression and decompression | Not supported | Supported |
| Security | WAF integration | Not supported | Supported |
| Observability | Monitoring and alerting | Business metrics only. Note: The underlying system is fully managed, so system-level monitoring is not required. | |
| Endpoints | Fixed EIP | Not supported. Uses a shared endpoint. | Supported. Uses dedicated endpoints. |
| | Inbound bandwidth | Shared bandwidth, up to 400 Mbps per instance. | Dedicated bandwidth. Default limit: 4 Gbps per instance, dynamically adjustable. |
| O&M | Configuration changes | Scales automatically. No manual configuration management required. | Configurations can be changed as needed. Cannot be downgraded to the Serverless edition. |
| Stability guarantee | SLA | 99.9% | 99.99% |
| | Dependent middleware | Shared, logically isolated | Dedicated, physically isolated |
| | Version updates | Automatic | Manual |
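The comparison above can be turned into a simple decision rule. The following is a minimal sketch, not an official tool: the feature keys (such as `waf_integration`) and the function `choose_edition` are hypothetical names invented for illustration, and the thresholds are copied from the table.

```python
# Hypothetical feature keys for rows the table lists as
# "Not supported" on Serverless and "Supported" on Dedicated Instance.
DEDICATED_ONLY_FEATURES = {
    "plugin_marketplace",
    "custom_plugins",
    "tls_hardware_acceleration",
    "qat_compression",
    "waf_integration",
    "fixed_eip",
}

SERVERLESS_MAX_INBOUND_MBPS = 400  # shared bandwidth ceiling per instance
SERVERLESS_SLA = 99.9              # percent


def choose_edition(required_features, inbound_mbps=0, required_sla=99.9):
    """Return (edition, reasons); reasons lists what rules out Serverless."""
    # Features supported by both editions never influence the choice.
    reasons = sorted(set(required_features) & DEDICATED_ONLY_FEATURES)
    if inbound_mbps > SERVERLESS_MAX_INBOUND_MBPS:
        reasons.append("inbound_bandwidth")
    if required_sla > SERVERLESS_SLA:
        reasons.append("sla")
    edition = "Dedicated Instance" if reasons else "Serverless"
    return edition, reasons


if __name__ == "__main__":
    print(choose_edition({"mcp_proxy"}))
    print(choose_edition({"waf_integration"}, inbound_mbps=1000))
```

Because a Dedicated Instance cannot be downgraded to the Serverless edition, a rule like this is most useful before the first purchase.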
Capacity specifications
Dedicated instances offer multiple instance types differentiated by QPS and client connection capacity.
| Instance type | QPS | Client connections |
| --- | --- | --- |
| aigw.small.x1 | 1500 | 20000 |
| aigw.small.x2 | 3000 | 40000 |
| aigw.small.x4 | 6000 | 80000 |
| aigw.medium.x1 | 12000 | 160000 |
| aigw.medium.x2 | 24000 | 320000 |
| aigw.medium.x3 | 36000 | 480000 |
| aigw.large.x1 | 48000 | 640000 |
| aigw.large.x2 | 96000 | 1280000 |
| aigw.large.x3 | 144000 | 1920000 |
| aigw.large.x4 | 192000 | 2560000 |
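To size a Dedicated Instance, one approach is to pick the smallest instance type whose QPS and client connection capacity both cover the expected peak. The sketch below assumes that approach; the function `pick_instance_type` and its `headroom` parameter are hypothetical and not part of any AI Gateway API, while the capacity figures come from the table above.

```python
# (instance type, QPS, client connections), ordered from smallest to largest.
INSTANCE_TYPES = [
    ("aigw.small.x1", 1500, 20000),
    ("aigw.small.x2", 3000, 40000),
    ("aigw.small.x4", 6000, 80000),
    ("aigw.medium.x1", 12000, 160000),
    ("aigw.medium.x2", 24000, 320000),
    ("aigw.medium.x3", 36000, 480000),
    ("aigw.large.x1", 48000, 640000),
    ("aigw.large.x2", 96000, 1280000),
    ("aigw.large.x3", 144000, 1920000),
    ("aigw.large.x4", 192000, 2560000),
]


def pick_instance_type(peak_qps, peak_connections, headroom=0.0):
    """Return the smallest type covering both peaks, or None if none fits.

    headroom is an assumed safety margin, e.g. 0.3 reserves 30% extra.
    """
    need_qps = peak_qps * (1 + headroom)
    need_conns = peak_connections * (1 + headroom)
    for name, qps, conns in INSTANCE_TYPES:
        if qps >= need_qps and conns >= need_conns:
            return name
    return None


if __name__ == "__main__":
    # 5,000 QPS fits aigw.small.x4, but 100,000 connections does not,
    # so the connection count drives the choice here.
    print(pick_instance_type(5000, 100000))  # aigw.medium.x1
```

Both dimensions must be checked: in the table, every type provides the same ratio of connections to QPS, so a workload with many long-lived connections and low QPS may need a larger type than its request rate alone suggests.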
Quotas
Global quotas
| Quota item | Quota |
| --- | --- |
| Instances per region | 100 |
Instance quotas
| Quota dimension | Serverless | Small | Medium | Large |
| --- | --- | --- | --- | --- |
| Number of MCP Servers | 100 | 500 | 1000 | 2000 |
| Total active routes (including Model API and Agent API routes) | 2500 | 5000 | 7500 | 10000 |
| Number of published domain names | 100 | 200 | 500 | 1000 |
| Number of associated services | 200 | 800 | 2000 | 4000 |
| Number of service nodes | 400 | 1600 | 4000 | 8000 |
| Number of Kubernetes service sources | 3 | 3 | 5 | 5 |
| Number of installed plugins | N/A | 10 | 20 | 30 |
| Number of uploaded custom plugins | N/A | 20 | 50 | 80 |
| Number of authorized consumers | 500 | 2000 | 6000 | 10000 |
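A planned configuration can be checked against these instance quotas before deployment. The following is a rough sketch under stated assumptions: the dictionary keys and the function `quota_violations` are hypothetical names, and the Small, Medium, and Large columns are assumed to correspond to the aigw.small, aigw.medium, and aigw.large instance type families.

```python
# Column order matches the instance quota table.
TIERS = ("serverless", "small", "medium", "large")

# None stands for "N/A" in the table (no numeric limit is listed).
QUOTAS = {
    "mcp_servers": (100, 500, 1000, 2000),
    "active_routes": (2500, 5000, 7500, 10000),
    "published_domains": (100, 200, 500, 1000),
    "associated_services": (200, 800, 2000, 4000),
    "service_nodes": (400, 1600, 4000, 8000),
    "k8s_service_sources": (3, 3, 5, 5),
    "installed_plugins": (None, 10, 20, 30),
    "custom_plugins": (None, 20, 50, 80),
    "authorized_consumers": (500, 2000, 6000, 10000),
}


def quota_violations(tier, planned):
    """Return {item: (planned, limit)} for every item over its quota."""
    col = TIERS.index(tier)  # raises ValueError for an unknown tier
    over = {}
    for item, count in planned.items():
        limit = QUOTAS[item][col]
        # N/A entries have no numeric limit to compare against; skip them.
        if limit is not None and count > limit:
            over[item] = (count, limit)
    return over


if __name__ == "__main__":
    plan = {"mcp_servers": 150, "active_routes": 2000}
    print(quota_violations("serverless", plan))  # {'mcp_servers': (150, 100)}
    print(quota_violations("small", plan))       # {}
```

An empty result means the plan fits the listed quotas for that tier. Items marked N/A for Serverless are skipped rather than judged, since the table gives no number for them.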