API Gateway: Gateway Types

Last Updated: Mar 24, 2026

AI Gateway offers two versions: Serverless and Dedicated Instance. This topic describes the capabilities, parameters, quotas, and limits of each version to help you choose the right version and specifications.

Version Comparison

  • Serverless: Supports automatic elastic scaling. You do not need to manage underlying resources because the service is fully managed. Billing is based on the number of calls, which helps you integrate quickly and start at a low cost.

  • Dedicated Instance: Provides a dedicated resource instance. It supports advanced features such as the extension marketplace, custom extensions, hardware acceleration, and WAF integration. It offers stronger isolation and a higher service-level agreement (SLA) of 99.99%, compared with 99.9% for Serverless. This version meets enterprise requirements for stability, security, and scalability.

| Category | Feature | Serverless | Dedicated Instance |
| --- | --- | --- | --- |
| Model Proxy | Text-to-text | Supported | Supported |
| Model Proxy | Multi-modal | Supported | Supported |
| Model Proxy | Built-in policies | Supported | Supported |
| MCP Server | MCP proxy | Supported | Supported |
| MCP Server | HTTP to MCP | Supported | Supported |
| Agent | Model Studio | Supported | Supported |
| Agent | Dify | Supported | Supported |
| Agent | Custom | Supported | Supported |
| Extensions | System extensions | Supported | Supported |
| Extensions | Extension marketplace | Not supported | Supported |
| Extensions | Custom extensions | Not supported | Supported |
| Specifications | Capacity specifications | Automatic elastic scaling | Multiple capacity specifications, based on queries per second (QPS) and client connections |
| Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
| Hardware acceleration | QAT hardware compression and decompression | Not supported | Supported |
| Security | WAF integration | Not supported | Supported |
| Observability | Monitoring and alerting | Business metrics only. Note: Because the Serverless version is fully managed, Alibaba Cloud operates the underlying system and you do not handle system-level O&M. | Business metrics, system resources, and custom configurations |
| Access Point | Fixed EIP | Non-fixed EIP. Uses shared endpoints. | Supports fixed EIP and dedicated endpoints. |
| Access Point | Inbound bandwidth | Shared bandwidth across multiple instances. Maximum bandwidth per gateway instance is 400 Mbps. | Dedicated bandwidth. Default maximum bandwidth per gateway instance is 4 Gbps. Supports dynamic adjustment. |
| O&M | Change configuration | Not required. Performance scales automatically, and there are no specifications to configure. | Change configuration as needed. You cannot downgrade to the Serverless version. |
| Stability assurance | SLA | 99.9% | 99.99% |
| Stability assurance | Dependent middleware | Shared. Logically isolated. | Dedicated. Physically isolated. |
| Stability assurance | Version updates | Automatic | Manual |
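
The comparison above can be turned into a simple eligibility check. The following is a minimal sketch, not an official tool: it encodes a subset of the table as a Python dictionary and returns the versions that satisfy a set of required features and a minimum SLA. The feature keys (such as `waf_integration`) and the function name `eligible_versions` are hypothetical names chosen for this illustration.

```python
# Support flags copied from the comparison table.
# The feature keys are hypothetical identifiers used only in this sketch.
FEATURE_SUPPORT = {
    "mcp_proxy":                 {"Serverless": True,  "Dedicated Instance": True},
    "system_extensions":         {"Serverless": True,  "Dedicated Instance": True},
    "extension_marketplace":     {"Serverless": False, "Dedicated Instance": True},
    "custom_extensions":         {"Serverless": False, "Dedicated Instance": True},
    "tls_hardware_acceleration": {"Serverless": False, "Dedicated Instance": True},
    "qat_compression":           {"Serverless": False, "Dedicated Instance": True},
    "waf_integration":           {"Serverless": False, "Dedicated Instance": True},
    "fixed_eip":                 {"Serverless": False, "Dedicated Instance": True},
}

# SLA percentages from the "Stability assurance" rows.
SLA_PERCENT = {"Serverless": 99.9, "Dedicated Instance": 99.99}


def eligible_versions(required_features, min_sla=0.0):
    """Return the versions that support every required feature and meet min_sla."""
    result = []
    for version in ("Serverless", "Dedicated Instance"):
        if SLA_PERCENT[version] < min_sla:
            continue
        if all(FEATURE_SUPPORT[feature][version] for feature in required_features):
            result.append(version)
    return result


if __name__ == "__main__":
    print(eligible_versions(["mcp_proxy"]))
    print(eligible_versions(["mcp_proxy", "waf_integration"]))
```

If no feature or SLA requirement rules out Serverless, both versions are returned, and the choice then depends on cost and operational preferences that this sketch does not model.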

Capacity Specifications

AI Gateway dedicated instances are available in various capacity specifications. These specifications differ in performance metrics such as queries per second (QPS) and the number of client connections.

The following table lists the parameters for each gateway instance specification.

| Instance type | QPS | Client connections |
| --- | --- | --- |
| aigw.small.x1 | 1500 | 20000 |
| aigw.small.x2 | 3000 | 40000 |
| aigw.small.x4 | 6000 | 80000 |
| aigw.medium.x1 | 12000 | 160000 |
| aigw.medium.x2 | 24000 | 320000 |
| aigw.medium.x3 | 36000 | 480000 |
| aigw.large.x1 | 48000 | 640000 |
| aigw.large.x2 | 96000 | 1280000 |
| aigw.large.x3 | 144000 | 1920000 |
| aigw.large.x4 | 192000 | 2560000 |
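
To illustrate how these figures might be used for sizing, the following minimal sketch picks the smallest specification whose QPS and client-connection limits both cover an expected peak load. The values are copied from the table above. The function name `smallest_spec` is hypothetical, and the sketch assumes that the listed figures can be treated as hard upper bounds, which you should confirm against your own load tests.

```python
# (instance type, QPS, client connections), in ascending order of capacity.
SPECS = [
    ("aigw.small.x1", 1500, 20000),
    ("aigw.small.x2", 3000, 40000),
    ("aigw.small.x4", 6000, 80000),
    ("aigw.medium.x1", 12000, 160000),
    ("aigw.medium.x2", 24000, 320000),
    ("aigw.medium.x3", 36000, 480000),
    ("aigw.large.x1", 48000, 640000),
    ("aigw.large.x2", 96000, 1280000),
    ("aigw.large.x3", 144000, 1920000),
    ("aigw.large.x4", 192000, 2560000),
]


def smallest_spec(peak_qps, peak_connections):
    """Return the first (smallest) spec that covers both limits, or None."""
    for name, qps, connections in SPECS:
        if qps >= peak_qps and connections >= peak_connections:
            return name
    return None  # The load exceeds the largest listed specification.


if __name__ == "__main__":
    # 5000 QPS fits aigw.small.x4, but 100000 connections exceeds its 80000 limit.
    print(smallest_spec(5000, 100000))
```

Because both limits must hold at once, a connection-heavy workload can require a larger specification than its QPS alone would suggest.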

Quota Information

Global Quotas

| Quota item | Quota |
| --- | --- |
| Instances per region | 100 |

Instance Quotas

| Quota dimension | Serverless | Small | Medium | Large |
| --- | --- | --- | --- | --- |
| MCP Server count | 100 | 500 | 1000 | 2000 |
| Total online routes (including Model API routes and Agent API routes) | 2500 | 5000 | 7500 | 10000 |
| Published domain names | 100 | 200 | 500 | 1000 |
| Associated services | 200 | 800 | 2000 | 4000 |
| Service nodes | 400 | 1600 | 4000 | 8000 |
| Kubernetes service sources | 3 | 3 | 5 | 5 |
| Installed extensions | N/A | 10 | 20 | 30 |
| Uploaded custom extensions | N/A | 20 | 50 | 80 |
| Consumer authorization count | 500 | 2000 | 6000 | 10000 |
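
As an illustration of how the instance quotas could be checked before deployment, the following minimal sketch compares a planned configuration against the quota of a given tier and reports every item that exceeds it. The quota values are copied from the table above. The dictionary keys and the function name `quota_violations` are hypothetical, and an unavailable quota (NA in the table) is represented as `None`.

```python
TIERS = ("Serverless", "Small", "Medium", "Large")

# Quotas per tier, in the order of TIERS. None means the item is not available.
# The keys are hypothetical identifiers used only in this sketch.
INSTANCE_QUOTAS = {
    "mcp_servers":                (100, 500, 1000, 2000),
    "online_routes":              (2500, 5000, 7500, 10000),
    "published_domains":          (100, 200, 500, 1000),
    "associated_services":        (200, 800, 2000, 4000),
    "service_nodes":              (400, 1600, 4000, 8000),
    "kubernetes_service_sources": (3, 3, 5, 5),
    "installed_extensions":       (None, 10, 20, 30),
    "uploaded_custom_extensions": (None, 20, 50, 80),
    "consumer_authorizations":    (500, 2000, 6000, 10000),
}


def quota_violations(tier, planned):
    """Map each over-quota item to a (planned count, quota) pair."""
    index = TIERS.index(tier)
    violations = {}
    for item, count in planned.items():
        limit = INSTANCE_QUOTAS[item][index]
        if limit is None:
            # The item is not available on this tier, so any use is a violation.
            if count > 0:
                violations[item] = (count, None)
        elif count > limit:
            violations[item] = (count, limit)
    return violations


if __name__ == "__main__":
    plan = {"mcp_servers": 120, "online_routes": 2000, "installed_extensions": 2}
    print(quota_violations("Serverless", plan))
    print(quota_violations("Small", plan))
```

An empty result means the planned configuration fits within the listed quotas for that tier.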