AI Gateway connects AI applications with model services, tools, and agents through a unified gateway that provides protocol conversion, security, traffic governance, and observability.
Overview
As large language models (LLMs) expand AI use cases, application architecture is evolving from microservices and cloud-native to AI-native. This shift creates challenges in integration, stability, security, compliance, and management.
Cloud-native API Gateway provides AI Gateway as the central hub connecting AI applications, model services, tools, and agents. It delivers protocol conversion, security protection, traffic governance, and unified observability for AI-native applications.
AI integration challenges
Unlike traditional applications, AI applications are model-centric — they rely on inference, prompts, tool calling, and memory to serve business needs.
AI application traffic falls into three categories:
-
Accessing model services: AI applications depend on models for inference and planning. Securing and stabilizing this access path is critical.
-
Calling external tools: Tools bridge AI applications and external systems, typically through protocols such as Model Context Protocol (MCP).
-
Being accessed externally: End users or other AI applications access your services, often through protocols such as A2A.
Each scenario presents distinct challenges:
Model access challenges
-
Multiple models: Providers differ in API specifications, authentication, and calling methods. No standard abstraction layer exists for unified integration or parallel calls across providers.
-
Multiple modalities: Multi-modal models lack a unified standard. They vary in transport protocols (SSE, WebSocket, Web Real-Time Communication (WebRTC)), communication modes (synchronous or asynchronous), and request-response structures, which increases integration complexity.
-
Multiple scenarios: Different scenarios require different throttling, fault tolerance, and quality of service. For example, real-time speech-to-text demands low RT, while long-text understanding demands processing stability.
-
High security requirements: Calling external or open-source models risks data breaches. Sensitive data transmission must meet compliance requirements for privacy protection, audit trails, and access control.
-
High stability requirements: Model services have low throttling thresholds and less stable RT and success rates compared to traditional APIs. These fluctuations affect upstream AI application continuity and user experience.
Tool access challenges
Tool access requires balancing efficiency and security.
As available tools grow, sending large tool lists to an LLM increases token consumption and inference costs. Too many candidates also degrade model selection accuracy.
Tools are often linked to core business logic, so improper calls expand the security risk surface. Threats such as malicious MCP poisoning demand robust tool access security.
Agent access challenges
Developers build AI applications in three main ways:
-
High-code development: Use frameworks such as Spring AI Alibaba, ADK, and LangChain. Offers maximum flexibility but requires deeper technical skills.
-
Low-code development: Use platforms such as Alibaba Cloud Model Studio to orchestrate flows with a drag-and-drop interface. Suitable for rapid prototyping and iteration.
-
No-code development: Use tools such as JManus to build AI applications through prompt configuration alone. Suitable for simple scenarios.
Without a unified standard for connecting AI applications, centralized governance and control — as in cloud-native architectures — is difficult to achieve.
AI application behavior depends on the underlying LLM, making output stability uncertain. Without isolation and fault tolerance, a single failure can cascade across dependent business systems.
How AI Gateway addresses each scenario
AI Gateway bridges AI applications, model services, tools, and agents. The following scenarios show how it addresses each challenge.
Model access
A business deploys a fine-tuned model on Platform for AI (PAI), integrates Model Studio as a fallback, and uses an open-source model on Function Compute for image generation. AI Gateway provides a unified model access entry point with traffic governance and authentication at the API layer.
AI Gateway addresses these challenges:
-
Multiple models: AI Gateway routes requests by model name, request ratio, or request features such as headers. It converts provider-specific protocols into OpenAI-compatible interfaces, enabling seamless switching across models.
-
Multiple modalities: AI Gateway proxies multi-modal model calls over HTTP and WebSocket through a unified endpoint. Applications call text-to-text, text-to-image, and speech recognition models consistently. Plugins can enhance security and stability.
-
Multiple scenarios: Create a separate model API per scenario (text generation, image generation, speech recognition). Assign each caller a unique consumer identity for per-consumer observability, throttling, security, and billing.
-
High security requirements: AI Gateway provides layered protection across network, data, and content security.
-
Network security: Integrates SSL certificates, WAF, and IP blacklists and whitelists to defend against malicious traffic at the entry layer.
-
Data security: Authenticates consumers to avoid direct API key exposure. Manages backend model service keys and supports hosting keys in KMS to keep sensitive data off the gateway.
-
Content security: Integrates AI security guardrails for real-time interception of non-compliant content and risky inputs. A data masking plugin removes sensitive information before forwarding requests.
-
-
High stability requirements: AI Gateway improves stability through observability and controllability.
-
Observability: Records source provider, target model, consumer, time to first byte, and token count per request. Marks events such as throttling, interception, and fallback. A built-in dashboard provides end-to-end visualization.
-
Controllability: Provides load balancing, fallback, throttling, and caching. Set per-consumer governance rules such as token limits and concurrency controls. Use monitoring data to optimize policies and adjust resources dynamically.
-
Tool access
A business faces security risks in tool access and needs unified governance. The architecture team selects MCP as the standard protocol and uses AI Gateway to automatically convert existing HTTP APIs into MCP Servers.
AI Gateway ensures tool call accuracy and security:
-
Accuracy:
AI Gateway connects to existing HTTP services and hosts MCP servers. You can dynamically update tool descriptions and create virtual MCP servers that combine tool lists for different scenarios. Providers and consumers can define their own MCP servers independently. Intelligent tool routing filters relevant tools based on request content, reducing token consumption and improving selection accuracy.
-
Security: AI Gateway provides multilayer tool access control — authentication at the MCP server level and fine-grained permissions for individual tools. Access permissions are assigned based on caller identity and tool risk level.
Agent access
As AI applications grow, a business unifies them under AI Gateway for centralized management, using the A2A protocol and Nacos AI Registry for service registration and discovery.
AI Gateway provides a unified proxy for AI applications with stability and flexibility:
-
Stability: AI Gateway connects to multiple Alibaba Cloud platforms — Container Service for Kubernetes (ACK), Function Compute, and Serverless App Engine (SAE). It isolates unhealthy nodes through active and passive health checks, reduces change risks with canary releases, and prevents overload with multi-dimensional throttling.
-
Flexibility: AI Gateway uses service discovery to expose AI applications across computing platforms. REST-to-A2A protocol conversion automatically upgrades existing HTTP applications. Model Studio low-code applications get unified proxy access with secondary authentication mechanisms.
AI Gateway integrates with the Alibaba Cloud observability system. One-click enablement covers the entire call chain — from the application layer through MCP tools to model calls — for end-to-end tracing and fault localization.
Core capabilities of AI Gateway
Unified proxy for models, MCP servers, and agents
AI Gateway provides unified proxy and management for multiple service types:
-
AI services: Proxies model services from Model Studio, OpenAI, Minimax, Anthropic, Amazon Bedrock, Azure, and self-built models (Ollama, vLLM, SGLang). Supports API key configuration and custom DNS for internal addresses.
-
Agent services: Supports agent platforms including Model Studio, Dify, and custom agent workloads. Configure API keys and app IDs for authentication.
-
Container services: Supports services on ACK or ACS clusters. Each AI Gateway instance supports up to three container clusters.
-
Nacos services: Accesses service instances registered in an MSE Nacos registry, for both microservices and MCP Servers.
-
DNS services: Accesses backend services through DNS resolution. Supports dedicated DNS servers for private networks or internal domains.
-
Fixed addresses: Configures backend service addresses as fixed
IP:Portentries. -
SAE: Supports services running on Alibaba Cloud SAE.
-
Function Compute: Integrates with Function Compute directly, bypassing HTTP triggers for improved call efficiency.
-
Compute Nest MCP services: Supports MCP servers hosted by Compute Nest.
AI Gateway supports both active and passive health checks for services:
-
Active health checks: The gateway periodically probes service nodes based on configured detection rules.
-
Passive health checks: The gateway evaluates node health based on actual request performance against configured detection rules.
Load balancing and canary release for models and agents
Load balancing and canary release for models
The model API provides three load balancing modes:
-
Single-model service: Specify a single LLM service. Supports passing through the model name or specifying a model name. When a model name is explicitly specified, the model name in the user request is ignored.
-
Multi-model services (by model name): Configure multiple LLM services with model name matching rules. For example, route deepseek-* requests to the DeepSeek service and qwen-* requests to Model Studio.
-
Multi-model services (by weight): Configure multiple LLM services with request allocation weights per model. Suitable for canary releases of new models.
The model API also supports custom routes to forward requests to different backends based on request features such as headers.
Canary release for agents
The agent API also supports canary releases based on request features such as headers, routing requests to different backends.
Consumer-based authentication, observability, and throttling
AI Gateway supports per-consumer authentication, monitoring, throttling, and metering for fine-grained management.
Consumer authentication
Create consumers and assign request credentials in AI Gateway. Enable authentication for model APIs, MCP servers, and agent APIs as needed.
AI Gateway supports three consumer authentication methods: API key, JWT, and HMAC. For high security requirements, host consumer credentials in KMS.
Consumer observability and metering
AI Gateway provides multi-dimensional observability with per-consumer monitoring. Key metrics:
-
QPS: AI requests and responses per second, split into request QPS, streaming response QPS, and non-streaming response QPS.
-
Request success rate: Success rate at 1-second, 15-second, and 1-minute granularities.
-
Tokens consumed per second: Input, output, and total tokens consumed per second.
-
Average RT: Average response time (ms) at 1-second, 15-second, or 1-minute granularity. Includes non-streaming RT, streaming RT, and time to first byte.
-
Cache hits: Cache hit and miss counts per time period.
-
Throttling statistics: Throttled versus normally processed requests per time period.
-
Token statistics by model: Per-model token consumption over a specified period.
-
Token statistics by consumer: Per-consumer token consumption over a specified period.
-
Risk statistics: Risk requests identified by content security detection, broken down by risk type and consumer.
This data supports per-consumer metering and billing — for example, tracking tokens consumed by a specific consumer calling a specific model over a given period.
Consumer throttling
AI Gateway supports throttling by consumer, model name, and request header. Limits cover requests, concurrency, connections, and tokens per unit of time.
AI security protection
AI Gateway integrates the content security protection feature. Enable it per API to prevent sensitive words, compliance issues, prompt injection, and brute-force attacks during model calls.
AI Gateway supports independent interception policies for different protection dimensions:
-
contentModeration: content compliance detection
-
promptAttack: prompt attack detection
-
sensitiveData: sensitive content detection
-
maliciousFile: malicious file detection
-
waterMark: digital watermarking
-
customLabel: custom agent
Each dimension supports the following interception policies:
-
High: Intercepts all risk levels (low, medium, and high).
-
Medium: Intercepts medium and high risk levels.
-
Low: Intercepts high risk level only.
-
Monitor mode: Records only, no interception.
Hot-swappable policies and plugins
AI Gateway provides built-in policies and plugins and supports custom plugins for specific business needs.
The model API includes five core built-in policies: tool selection, security protection, throttling, caching, and web search. Enable additional policies and plugins as needed.
All policies and plugins support hot-swapping and rolling updates without affecting service traffic.
What to do next
Learn about AI Gateway gateway types and billing.
Create an AI Gateway instance to experience the features of AI Gateway.