API Gateway: Gateway types

Last Updated: Dec 16, 2025

AI Gateway is available in two editions: Dedicated Instance and Serverless. This topic describes the features, parameters, quotas, and limits of each edition to help you choose a suitable edition and instance type.

Edition comparison

  • Serverless: This fully managed edition supports automatic elastic scaling, so you do not need to manage underlying resources. It is billed based on the number of calls, which allows for quick integration and a low-cost start.

  • Dedicated Instance: This edition provides dedicated, independently deployed resource instances. It supports advanced features such as plugin extensions, hardware acceleration, and WAF integration, and it offers a higher Service-Level Agreement (SLA) than Serverless (99.99% versus 99.9%). This edition suits enterprises that require high stability, security, and scalability.

| Category | Feature | Serverless | Dedicated Instance |
|---|---|---|---|
| Model proxy | Text-to-text | Supported | Supported |
| Model proxy | Multimodal | Supported | Supported |
| Model proxy | Built-in policies | Supported | Supported |
| MCP Server | MCP proxy | Supported | Supported |
| MCP Server | HTTP to MCP | Supported | Supported |
| Agent proxy | Model Studio | Supported | Supported |
| Agent proxy | Dify | Supported | Supported |
| Agent proxy | Custom | Supported | Supported |
| Plugins | System plugins | Supported | Supported |
| Plugins | Plugin marketplace | Not supported | Supported |
| Plugins | Custom plugins | Not supported | Supported |
| Specifications | Capacity specifications | Automatic scaling | Different capacity specifications are available based on queries per second (QPS) and the number of client connections. |
| Hardware acceleration | TLS hardware acceleration | Not supported | Supported |
| Hardware acceleration | QAT hardware compression and decompression | Not supported | Supported |
| Security | WAF integration | Not supported | Supported |
| Observability | Monitoring and alerting | Business metrics only. Note: The Serverless edition hosts the underlying system for you, so you do not need to manage system-level O&M. | Business metrics, system resources, and custom configurations |
| Endpoints | Fixed EIP | Uses shared endpoints with non-fixed elastic IP addresses (EIPs). | Supports dedicated endpoints with fixed EIPs. |
| Endpoints | Inbound bandwidth | Shared bandwidth across multiple instances. A single gateway instance has a limit of 400 Mbps. | Dedicated bandwidth. A single gateway instance has a default bandwidth limit of 4 Gbps, which can be dynamically adjusted. |
| O&M | Configuration changes | The Serverless edition is designed for automatic performance scaling. You do not need to manage service configurations. | Configurations can be changed as needed. You cannot downgrade to the Serverless edition. |
| Stability guarantee | SLA | 99.9% | 99.99% |
| Stability guarantee | Dependent middleware | Shared and logically isolated | Dedicated and physically isolated |
| Stability guarantee | Version updates | Automatic | Manual |
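The comparison above can be turned into a simple decision aid. The following is a minimal sketch, not an official tool: it assumes that any requirement the table marks as unavailable on Serverless (or that exceeds the Serverless 400 Mbps bandwidth limit or 99.9% SLA) calls for a Dedicated Instance. The feature keys, the `Requirements` class, and the function names are hypothetical labels introduced for illustration only.

```python
from dataclasses import dataclass

# Values transcribed from the edition comparison table.
SERVERLESS_MAX_INBOUND_MBPS = 400
SERVERLESS_SLA_PERCENT = 99.9

# Hypothetical keys for capabilities the table lists as unavailable on Serverless.
DEDICATED_ONLY_FEATURES = {
    "plugin_marketplace",
    "custom_plugins",
    "tls_hardware_acceleration",
    "qat_compression",
    "waf_integration",
    "fixed_eip",
    "system_resource_monitoring",
}


@dataclass
class Requirements:
    features: set[str]           # capabilities the workload needs
    inbound_mbps: float = 0      # expected peak inbound bandwidth
    min_sla_percent: float = 0   # minimum acceptable SLA


def reasons_for_dedicated(req: Requirements) -> list[str]:
    """Return the reasons (if any) why Serverless does not meet the requirements."""
    reasons = sorted(f"needs {f}" for f in req.features & DEDICATED_ONLY_FEATURES)
    if req.inbound_mbps > SERVERLESS_MAX_INBOUND_MBPS:
        reasons.append("inbound bandwidth above 400 Mbps")
    if req.min_sla_percent > SERVERLESS_SLA_PERCENT:
        reasons.append("SLA above 99.9%")
    return reasons


def choose_edition(req: Requirements) -> str:
    """Pick Serverless unless at least one requirement rules it out."""
    return "Dedicated Instance" if reasons_for_dedicated(req) else "Serverless"


if __name__ == "__main__":
    req = Requirements(features={"mcp_proxy", "waf_integration"}, inbound_mbps=250)
    print(choose_edition(req), reasons_for_dedicated(req))
```

Returning the list of reasons alongside the edition keeps the decision auditable. Because a Dedicated Instance cannot be downgraded to Serverless, it is worth reviewing those reasons before committing.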

Capacity specifications

Dedicated Instance AI Gateway offers several instance types. Each type has its own capacity, expressed in queries per second (QPS) and the number of client connections.

The following table lists the parameters for different gateway instance types.

| Instance type | QPS | Client connections |
|---|---|---|
| aigw.small.x1 | 1500 | 20000 |
| aigw.small.x2 | 3000 | 40000 |
| aigw.small.x4 | 6000 | 80000 |
| aigw.medium.x1 | 12000 | 160000 |
| aigw.medium.x2 | 24000 | 320000 |
| aigw.medium.x3 | 36000 | 480000 |
| aigw.large.x1 | 48000 | 640000 |
| aigw.large.x2 | 96000 | 1280000 |
| aigw.large.x3 | 144000 | 1920000 |
| aigw.large.x4 | 192000 | 2560000 |
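To size a Dedicated Instance from the table above, you can look for the smallest instance type whose QPS and client connection figures both cover your expected peak. The sketch below assumes the table values are upper bounds for each type. The function name and the `headroom` parameter are hypothetical and are not part of any gateway API.

```python
from typing import Optional

# (instance type, QPS, client connections), transcribed from the table above
# and ordered from smallest to largest.
INSTANCE_TYPES = [
    ("aigw.small.x1", 1500, 20000),
    ("aigw.small.x2", 3000, 40000),
    ("aigw.small.x4", 6000, 80000),
    ("aigw.medium.x1", 12000, 160000),
    ("aigw.medium.x2", 24000, 320000),
    ("aigw.medium.x3", 36000, 480000),
    ("aigw.large.x1", 48000, 640000),
    ("aigw.large.x2", 96000, 1280000),
    ("aigw.large.x3", 144000, 1920000),
    ("aigw.large.x4", 192000, 2560000),
]


def smallest_fitting_type(
    peak_qps: float, peak_connections: float, headroom: float = 0.0
) -> Optional[str]:
    """Return the smallest instance type covering both dimensions, or None.

    headroom is an optional safety margin, for example 0.2 for 20% spare capacity.
    """
    need_qps = peak_qps * (1 + headroom)
    need_conn = peak_connections * (1 + headroom)
    for name, qps, connections in INSTANCE_TYPES:
        if qps >= need_qps and connections >= need_conn:
            return name
    return None  # the load exceeds the largest listed type


if __name__ == "__main__":
    # QPS alone would fit aigw.small.x4, but the connection count does not.
    print(smallest_fitting_type(5000, 100000))
```

Both dimensions are checked together because either one can be the limiting factor: a workload with many long-lived connections but modest QPS may still need a larger type.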

Quota description

| Quota dimension | Serverless instance: default quota | Serverless instance: maximum quota | Dedicated instance: default quota | Dedicated instance: maximum quota |
|---|---|---|---|---|
| Number of instances in the same region | 50 | 100 | 100 | 500 |
| Total number of Model APIs per instance | 50 | 100 | 100 | 500 |
| Total number of routes per instance | 100 | 200 | small: 200; medium & large: 500 | small: 1000; medium & large: 2000 |
| Total number of MCP Servers per instance | 50 | 100 | small: 100; medium & large: 200 | small: 500; medium & large: 1000 |
| Total number of Tools per MCP Server | 50 | 100 | 100 | 1000 |
| Total number of Agent APIs per instance | 50 | 100 | 100 | 500 |
| Number of consumers | 20 | 50 | small: 50; medium & large: 200 | small: 100; medium & large: 500 |
| Number of associated domain names per instance | 20 | 50 | small: 50; medium & large: 200 | small: 100; medium & large: 500 |
| Number of associated services per instance | 50 | 100 | small: 200; medium & large: 500 | small: 1000; medium & large: 2000 |
| Number of plugins installed on a single instance | N/A | N/A | small: 5; medium & large: 10 | small: 10; medium & large: 20 |
| Number of uploaded custom plugins | N/A | N/A | 20 | 50 |
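When planning a deployment, it can help to compare expected resource counts against the quotas listed above. The following sketch encodes the table as data and classifies a planned count against the default and maximum quota. The dimension keys, tier names, and status strings are hypothetical labels chosen for this example. How a quota is raised from the default toward the maximum is not covered here.

```python
from typing import Optional, Tuple

Limit = Optional[Tuple[int, int]]  # (default quota, maximum quota); None means N/A

TIERS = ("serverless", "dedicated-small", "dedicated-medium-large")

# dimension -> limits per tier, in the order of TIERS (values from the quota table)
QUOTAS = {
    "instances_per_region": ((50, 100), (100, 500), (100, 500)),
    "model_apis": ((50, 100), (100, 500), (100, 500)),
    "routes": ((100, 200), (200, 1000), (500, 2000)),
    "mcp_servers": ((50, 100), (100, 500), (200, 1000)),
    "tools_per_mcp_server": ((50, 100), (100, 1000), (100, 1000)),
    "agent_apis": ((50, 100), (100, 500), (100, 500)),
    "consumers": ((20, 50), (50, 100), (200, 500)),
    "domain_names": ((20, 50), (50, 100), (200, 500)),
    "services": ((50, 100), (200, 1000), (500, 2000)),
    "installed_plugins": (None, (5, 10), (10, 20)),
    "uploaded_custom_plugins": (None, (20, 50), (20, 50)),
}


def check_quota(dimension: str, planned: int, tier: str) -> str:
    """Classify a planned resource count against the quota for a tier."""
    limit: Limit = QUOTAS[dimension][TIERS.index(tier)]
    if limit is None:
        return "not available"
    default, maximum = limit
    if planned <= default:
        return "within default quota"
    if planned <= maximum:
        return "above default, within maximum"
    return "exceeds maximum quota"


if __name__ == "__main__":
    for tier in TIERS:
        print(tier, check_quota("routes", 150, tier))
```

For example, 150 routes is above the Serverless default of 100 but still within its maximum of 200, while the same count fits within the default quota of any Dedicated Instance size.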