AI Gateway offers two versions: Serverless and Dedicated Instance. This topic describes the capabilities, parameters, quotas, and limits of each version to help you choose the right version and specifications.
Version Comparison
- Serverless: Supports automatic elastic scaling. You do not need to manage underlying resources because the service is fully managed. Billing is based on the number of calls, which helps you integrate quickly and start at a low cost.
- Dedicated Instance: Provides a dedicated resource instance. It supports advanced features such as extensions, hardware acceleration, and WAF integration. It delivers high security and a higher Service-Level Agreement (SLA). This version meets enterprise requirements for stability, security, and scalability.
| Category | Feature | Serverless | Dedicated Instance |
| --- | --- | --- | --- |
| Model Proxy | Text-to-text | Supported | Supported |
| | Multi-modal | Supported | Supported |
| | Built-in policies | Supported | Supported |
| MCP Server | MCP proxy | Supported | Supported |
| | HTTP to MCP | Supported | Supported |
| Agent | Model Studio | Supported | Supported |
| | Dify | Supported | Supported |
| | Custom | Supported | Supported |
| Extensions | System extensions | Supported | Supported |
| | Extension marketplace | Not supported | Supported |
| | Custom extensions | Not supported | Supported |
| Specifications | Capacity specifications | Automatic elastic scaling | Multiple capacity specifications, based on queries per second (QPS) and client connections |
| Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
| | QAT hardware compression and decompression | Not supported | Supported |
| Security | WAF integration | Not supported | Supported |
| Observability | Monitoring and alerting | Business metrics only. Note: The Serverless version is fully managed. Alibaba Cloud manages the underlying system, so you do not handle system-level O&M. | |
| Access Point | Fixed EIP | Non-fixed EIP. Uses shared endpoints. | Supports fixed EIP and dedicated endpoints. |
| | Inbound bandwidth | Shared bandwidth across multiple instances. Maximum bandwidth per gateway instance is 400 Mbps. | Dedicated bandwidth. Default maximum bandwidth per gateway instance is 4 Gbps. Supports dynamic adjustment. |
| O&M | Change configuration | Serverless design. Performance scales automatically, so you do not configure specifications. | Change configuration as needed. You cannot downgrade to the Serverless version. |
| Stability assurance | SLA | 99.9% | 99.99% |
| | Dependent middleware | Shared. Logically isolated. | Dedicated. Physically isolated. |
| | Version updates | Automatic | Manual |
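To put the two SLA figures in perspective, the following minimal sketch converts an availability percentage into a downtime budget. It assumes a 30-day window and a simple proportional calculation. The function name `allowed_downtime_minutes` is hypothetical, and the actual measurement period and exclusions are defined by the applicable Alibaba Cloud SLA terms, not by this example.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget in minutes for an availability target.

    Assumption: availability is measured over a fixed window of `days` days.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


# SLA values taken from the comparison table above.
for version, sla in (("Serverless", 99.9), ("Dedicated Instance", 99.99)):
    budget = allowed_downtime_minutes(sla)
    print(f"{version}: about {budget:.2f} minutes per 30 days")
```

Under these assumptions, 99.9% corresponds to roughly 43.2 minutes of downtime per 30 days, and 99.99% to roughly 4.32 minutes.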
Capacity Specifications
AI Gateway dedicated instances are available in various capacity specifications. These specifications differ in performance metrics such as queries per second (QPS) and the number of client connections.
The following table lists the parameters for each gateway instance specification.
| Instance type | QPS | Client connections |
| --- | --- | --- |
| aigw.small.x1 | 1500 | 20000 |
| aigw.small.x2 | 3000 | 40000 |
| aigw.small.x4 | 6000 | 80000 |
| aigw.medium.x1 | 12000 | 160000 |
| aigw.medium.x2 | 24000 | 320000 |
| aigw.medium.x3 | 36000 | 480000 |
| aigw.large.x1 | 48000 | 640000 |
| aigw.large.x2 | 96000 | 1280000 |
| aigw.large.x3 | 144000 | 1920000 |
| aigw.large.x4 | 192000 | 2560000 |
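When sizing a dedicated instance, one possible approach is to pick the smallest specification whose QPS and client connection limits both cover your expected peak. The sketch below encodes the table above and does that lookup. The helper `smallest_spec` and its `headroom` parameter are hypothetical and are not part of any AI Gateway API or SDK. The safety margin you apply is your own choice.

```python
from typing import NamedTuple, Optional


class Spec(NamedTuple):
    name: str
    qps: int
    connections: int


# Values copied from the capacity specification table, smallest first.
SPECS = [
    Spec("aigw.small.x1", 1500, 20000),
    Spec("aigw.small.x2", 3000, 40000),
    Spec("aigw.small.x4", 6000, 80000),
    Spec("aigw.medium.x1", 12000, 160000),
    Spec("aigw.medium.x2", 24000, 320000),
    Spec("aigw.medium.x3", 36000, 480000),
    Spec("aigw.large.x1", 48000, 640000),
    Spec("aigw.large.x2", 96000, 1280000),
    Spec("aigw.large.x3", 144000, 1920000),
    Spec("aigw.large.x4", 192000, 2560000),
]


def smallest_spec(peak_qps: float, peak_connections: float,
                  headroom: float = 1.0) -> Optional[Spec]:
    """Return the smallest spec covering both peaks, or None if none fits.

    `headroom` is a hypothetical safety multiplier (1.3 means 30% margin).
    """
    for spec in SPECS:
        if (spec.qps >= peak_qps * headroom
                and spec.connections >= peak_connections * headroom):
            return spec
    return None


# Example: 5,000 QPS and 50,000 connections at peak, no extra margin.
print(smallest_spec(5000, 50000))
```

A `None` result means the stated peak exceeds the largest listed specification, in which case the workload would need to be split or discussed with Alibaba Cloud.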
Quota Information
Global Quotas
| Quota item | Quota |
| --- | --- |
| Instances per region | 100 |
Instance Quotas
| Quota dimension | Serverless | Small | Medium | Large |
| --- | --- | --- | --- | --- |
| MCP Server count | 100 | 500 | 1000 | 2000 |
| Total online routes (including Model API routes and Agent API routes) | 2500 | 5000 | 7500 | 10000 |
| Published domain names | 100 | 200 | 500 | 1000 |
| Associated services | 200 | 800 | 2000 | 4000 |
| Service nodes | 400 | 1600 | 4000 | 8000 |
| Kubernetes service sources | 3 | 3 | 5 | 5 |
| Installed extensions | NA | 10 | 20 | 30 |
| Uploaded custom extensions | NA | 20 | 50 | 80 |
| Consumer authorization count | 500 | 2000 | 6000 | 10000 |
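To check a planned configuration against these instance quotas, the table can be encoded as a small lookup. The following is a minimal sketch only: the key names and the `exceeded_quotas` helper are hypothetical, and it assumes that "NA" means the item is not available on Serverless, which matches the comparison table's "Not supported" entries for the extension marketplace and custom extensions.

```python
from typing import Dict, List, Optional, Tuple

# Column order of the instance quota table.
TIERS = ("Serverless", "Small", "Medium", "Large")

# Values copied from the table; None stands for "NA".
QUOTAS: Dict[str, Tuple[Optional[int], ...]] = {
    "mcp_servers": (100, 500, 1000, 2000),
    "online_routes": (2500, 5000, 7500, 10000),
    "published_domains": (100, 200, 500, 1000),
    "associated_services": (200, 800, 2000, 4000),
    "service_nodes": (400, 1600, 4000, 8000),
    "kubernetes_service_sources": (3, 3, 5, 5),
    "installed_extensions": (None, 10, 20, 30),
    "uploaded_custom_extensions": (None, 20, 50, 80),
    "consumer_authorizations": (500, 2000, 6000, 10000),
}


def exceeded_quotas(tier: str, usage: Dict[str, int]) -> List[str]:
    """Return the quota items whose planned usage exceeds the tier's limit."""
    column = TIERS.index(tier)
    exceeded = []
    for item, count in usage.items():
        limit = QUOTAS[item][column]
        if limit is None:
            # Assumption: "NA" means the item cannot be used on this tier.
            if count > 0:
                exceeded.append(item)
        elif count > limit:
            exceeded.append(item)
    return exceeded


plan = {"mcp_servers": 120, "online_routes": 2000, "installed_extensions": 1}
print(exceeded_quotas("Serverless", plan))
print(exceeded_quotas("Small", plan))
```

With this plan, the Serverless check flags the MCP Server count and the installed extension, while the Small tier accommodates all three items.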