AI Gateway editions: Serverless vs Dedicated Instance

AI Gateway offers two editions: Dedicated Instance and Serverless. Compare their capabilities, quotas, and capacity specifications to choose the right edition for your workload.

Edition comparison

Serverless: Auto-scales elastically with no resource management. Pay-as-you-go billing enables fast integration at low cost.
Dedicated Instance: Independently deployed with advanced capabilities including plugin extensibility, hardware acceleration, and WAF integration. Delivers a higher SLA for enterprises that require greater stability and security.

Category	Feature	Serverless	Dedicated Instance
Model proxy	Text-to-text	Supported	Supported
Model proxy	Multimodal	Supported	Supported
Model proxy	Built-in policies	Supported	Supported
MCP Server	MCP proxy	Supported	Supported
MCP Server	HTTP to MCP	Supported	Supported
Agent proxy	Model Studio	Supported	Supported
Agent proxy	Dify	Supported	Supported
Agent proxy	Custom	Supported	Supported
Plugins	System plugin	Supported	Supported
Plugins	Plugin marketplace	Not supported	Supported
Plugins	Custom plugins	Not supported	Supported
Specifications	Capacity specifications	Automatic scaling	Different capacity specifications are available based on queries per second (QPS) and client connections.
Hardware acceleration	TLS hardware acceleration	Not supported	Supported
Hardware acceleration	QAT hardware compression and decompression	Not supported	Supported
Security	WAF integration	Not supported	Supported
Observability	Monitoring and alerting	Business metrics only. Note: The underlying system is fully managed, so system-level monitoring is not required.	Business metrics, system resources, and custom configurations
Endpoints	Fixed EIP	Not supported. Uses a shared endpoint.	Supported. Uses dedicated endpoints.
Endpoints	Inbound bandwidth	Shared bandwidth, up to 400 Mbps per instance.	Dedicated bandwidth. Default limit: 4 Gbps per instance, dynamically adjustable.
O&M	Configuration changes	Scales automatically. No manual configuration management required.	Configurations can be changed as needed. Cannot be downgraded to the Serverless edition.
Stability guarantee	SLA	99.9%	99.99%
Stability guarantee	Dependent middleware	Shared, logically isolated	Dedicated, physically isolated
Stability guarantee	Version updates	Automatic	Manual
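
The comparison above can be turned into a quick feature check. The following is a minimal sketch, not an official tool: the feature keys and the function name `edition_required` are hypothetical labels chosen for this example, and the only facts encoded are the "Not supported" cells and the 400 Mbps shared-bandwidth figure from the table.

```python
# Features the comparison table marks "Not supported" on Serverless.
# The key names are hypothetical labels for this sketch, not API identifiers.
DEDICATED_ONLY = {
    "plugin_marketplace",
    "custom_plugins",
    "tls_hardware_acceleration",
    "qat_compression",
    "waf_integration",
    "fixed_eip",
}

# Shared inbound bandwidth ceiling per Serverless instance, from the table.
SERVERLESS_MAX_INBOUND_MBPS = 400


def edition_required(needed_features, inbound_mbps=0):
    """Return (edition, reasons) based only on the comparison table.

    "Serverless" means nothing in the request rules Serverless out;
    it is not a sizing or cost recommendation.
    """
    reasons = sorted(set(needed_features) & DEDICATED_ONLY)
    if inbound_mbps > SERVERLESS_MAX_INBOUND_MBPS:
        reasons.append("inbound bandwidth above 400 Mbps")
    if reasons:
        return ("Dedicated Instance", reasons)
    return ("Serverless", [])
```

For example, `edition_required(["waf_integration", "mcp_proxy"])` reports that WAF integration is what forces the Dedicated Instance edition, while features supported on both editions are ignored.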

Capacity specifications

Dedicated instances offer multiple instance types differentiated by QPS and client connection capacity.

Instance type	QPS	Client connections
aigw.small.x1	1500	20000
aigw.small.x2	3000	40000
aigw.small.x4	6000	80000
aigw.medium.x1	12000	160000
aigw.medium.x2	24000	320000
aigw.medium.x3	36000	480000
aigw.large.x1	48000	640000
aigw.large.x2	96000	1280000
aigw.large.x3	144000	1920000
aigw.large.x4	192000	2560000
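
As an illustration of how the table might be used for sizing, the sketch below picks the smallest listed instance type that covers both a required QPS and a required number of client connections. The function name `pick_instance_type` is hypothetical, the figures are copied from the table above, and the sketch assumes the listed values are the capacity you need to stay within; it leaves any headroom decision to you.

```python
from typing import Optional

# (instance type, QPS, client connections), copied from the table above
# and ordered from smallest to largest.
INSTANCE_TYPES = [
    ("aigw.small.x1", 1500, 20000),
    ("aigw.small.x2", 3000, 40000),
    ("aigw.small.x4", 6000, 80000),
    ("aigw.medium.x1", 12000, 160000),
    ("aigw.medium.x2", 24000, 320000),
    ("aigw.medium.x3", 36000, 480000),
    ("aigw.large.x1", 48000, 640000),
    ("aigw.large.x2", 96000, 1280000),
    ("aigw.large.x3", 144000, 1920000),
    ("aigw.large.x4", 192000, 2560000),
]


def pick_instance_type(required_qps: int, required_connections: int) -> Optional[str]:
    """Return the smallest listed type that covers both requirements.

    Returns None when no listed type is large enough.
    """
    for name, qps, connections in INSTANCE_TYPES:
        if qps >= required_qps and connections >= required_connections:
            return name
    return None
```

A workload needing 5,000 QPS and 100,000 client connections fits the QPS of aigw.small.x4 but not its connection count, so `pick_instance_type(5000, 100000)` returns `aigw.medium.x1`.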

Quotas

Global quotas

Quota item	Quota
Instances per region	100

Instance quotas

Quota dimension	Serverless	Small	Medium	Large
Number of MCP Servers	100	500	1000	2000
Total active routes (including Model API and Agent API routes)	2500	5000	7500	10000
Number of published domain names	100	200	500	1000
Number of associated services	200	800	2000	4000
Number of service nodes	400	1600	4000	8000
Number of Kubernetes service sources	3	3	5	5
Number of installed plugins	N/A	10	20	30
Number of uploaded custom plugins	N/A	20	50	80
Number of authorized consumers	500	2000	6000	10000
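
The instance quotas can also be checked programmatically before a deployment plan is finalized. This is a minimal sketch under stated assumptions: the dictionary keys, the tier names, and the function `quota_violations` are hypothetical and are not part of any AI Gateway API. The numbers are copied from the table above, and `None` stands for the cells listed as N/A, which this sketch flags rather than interpreting as a numeric limit.

```python
# Per-instance quotas copied from the table above.
# None represents an "N/A" cell (no numeric quota listed for that tier).
INSTANCE_QUOTAS = {
    "mcp_servers": {"serverless": 100, "small": 500, "medium": 1000, "large": 2000},
    "active_routes": {"serverless": 2500, "small": 5000, "medium": 7500, "large": 10000},
    "published_domains": {"serverless": 100, "small": 200, "medium": 500, "large": 1000},
    "associated_services": {"serverless": 200, "small": 800, "medium": 2000, "large": 4000},
    "service_nodes": {"serverless": 400, "small": 1600, "medium": 4000, "large": 8000},
    "k8s_service_sources": {"serverless": 3, "small": 3, "medium": 5, "large": 5},
    "installed_plugins": {"serverless": None, "small": 10, "medium": 20, "large": 30},
    "custom_plugins": {"serverless": None, "small": 20, "medium": 50, "large": 80},
    "authorized_consumers": {"serverless": 500, "small": 2000, "medium": 6000, "large": 10000},
}


def quota_violations(tier, usage):
    """List the planned usage items that do not fit the given tier."""
    problems = []
    for item, amount in usage.items():
        limit = INSTANCE_QUOTAS[item][tier]
        if limit is None:
            # Assumption: any non-zero usage of an N/A item is worth flagging.
            if amount > 0:
                problems.append(f"{item}: listed as N/A for {tier}")
        elif amount > limit:
            problems.append(f"{item}: {amount} exceeds the {tier} limit of {limit}")
    return problems
```

For instance, `quota_violations("small", {"mcp_servers": 600, "service_nodes": 1000})` reports only the MCP Server count, because 600 exceeds the Small limit of 500 while 1,000 service nodes fits within 1,600.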