API Gateway: What is AI Gateway

Last Updated: Nov 28, 2025

Overview

Artificial intelligence (AI) is a key driver of innovation for modern enterprises. As large language models (LLMs) develop, the use cases for AI expand. Commercial and proprietary models drive business progress across various domains, and enterprise application architecture evolves from microservices and cloud-native architectures to AI-native architectures. This evolution presents numerous challenges for enterprises, including AI integration, system stability, security and compliance, and management complexity.

To address these challenges, Alibaba Cloud's cloud-native API gateway introduces AI Gateway. It acts as a core component that connects enterprise AI applications with model services, tools, and other agents. AI Gateway helps enterprises build and manage AI-native applications by providing capabilities such as protocol conversion, security protection, traffic governance, and unified observability.

Challenges of using AI in enterprise applications

AI applications are widely used in various enterprise scenarios. Compared to traditional applications, they have a distinct architectural feature: they are model-centric, using a model's inference capabilities together with prompts, tool calling, and memory mechanisms to support and respond to specific business needs.

Based on their traffic characteristics, AI applications can be divided into the following three scenarios:

  • AI applications access various model services: The core feature of AI applications is the use of model capabilities for inference and planning. Therefore, ensuring the security and stability of the model access path is critical.

  • AI applications call external tools: Tools act as a bridge between AI applications and external systems. Tool calling is typically achieved through standardized protocols such as MCP.

  • AI applications are accessed externally: This includes access by end users or other AI applications. In this scenario, AI applications often use protocols such as A2A for communication and calls between applications.

Enterprises face numerous engineering and technical challenges when implementing these three scenarios. These include the following:

Challenges in accessing model services: Multiple factors and high requirements

Multiple factors:

  1. Multiple models: Different model providers have inconsistent API operation specifications, authentication mechanisms, and invocation methods. The lack of a standard abstraction layer prevents unified integration, makes it difficult for developers to switch between providers, and hinders parallel multi-model calls.

  2. Multimodal: Unlike text-to-text LLMs, which often adhere to the OpenAI standard, multimodal models lack unified standards for transport protocols (such as Server-Sent Events (SSE), WebSocket, and Web Real-Time Communication (WebRTC)), communication patterns (synchronous/asynchronous), and request-response structures. This diversity of interfaces complicates system integration and operations management.

  3. Multiple scenarios: Different business scenarios have vastly different requirements for model services. For example, real-time speech recognition demands low response time (low RT), while long-text understanding requires processing stability. Each scenario needs customized adaptations for rate limiting policies, fault tolerance mechanisms, and service quality guarantees.

High requirements:

  1. High security requirements: Enterprises face data breach risks when calling model services, especially when using external or open source models. The transmission and processing of sensitive data must comply with strict data regulations, requiring robust security controls such as privacy protection, audit trails, and access control.

  2. High stability requirements: Model services are constrained by underlying computing power resources, often resulting in low API operation rate-limiting thresholds. Their response time (RT) and request success rate fluctuate more significantly than those of traditional API services, leading to lower service availability. This instability poses a direct challenge to the continuity and user experience of upstream AI applications.

Challenges in accessing tools: Precision and security

The primary challenge for AI applications in tool calling is balancing efficiency with security.

As the number of available tools grows, passing the entire tool list to an LLM for selection significantly increases token consumption and inference costs. Moreover, an excessive number of candidate tools can lead to incorrect selections by the model, reducing execution accuracy.

Additionally, tools are often directly linked to core business logic. Improper tool calls can expand the system's attack surface. The emergence of new attack vectors, such as MCP malicious poisoning, demands a more secure design for tool access mechanisms.

Challenges in accessing AI applications: Stability and flexibility

Developers can build AI applications in several ways, which primarily fall into three categories:

  • High-code development: Build applications by writing code using frameworks such as Spring AI Alibaba, ADK, and LangChain. This approach offers the highest flexibility and functional extensibility but demands a higher level of technical expertise from developers.

  • Low-code development: Use platforms such as Model Studio to build applications through visual, drag-and-drop flow orchestration. This method supports quick development and iteration, lowers the barrier to entry, and is ideal for quick validation and prototyping.

  • Zero-code development: Leverage tools such as JManus to build AI applications solely through prompt configuration without any programming, suitable for fast deployment in simple scenarios.

Because these development models vary in implementation and architectural design, there is no unified standard for accessing them. This makes it difficult to achieve the centralized administration and control that is common in cloud-native applications.

An AI application's behavior and performance are highly dependent on the underlying LLM's capabilities, making its output stability uncertain. Without effective isolation and fault tolerance mechanisms, a single point of failure can trigger a chain reaction, causing widespread failures in business systems that rely on the application.

Typical AI Gateway practices for the three scenarios

To address these customer challenges, Alibaba Cloud launched AI Gateway. It acts as a bridge between AI applications and model services, tools, and other agents. The following three scenarios show typical use cases for AI Gateway.

Model access

An enterprise plans to build AI applications to improve operational efficiency and explore new business scenarios. The enterprise uses the Alibaba Cloud platform to deploy a fine-tuned model on PAI and integrates Alibaba Cloud Model Studio as a fallback service. For specific needs, such as image generation, it uses an open source model deployed on Function Compute. To ensure secure and efficient calls from all AI applications to these LLM services, the enterprise deploys AI Gateway. It configures a Model API for each application scenario and integrates control capabilities, such as traffic governance and authentication, at the API layer to provide a unified entry point for model access.

AI Gateway effectively addresses the challenges of multiple factors and high requirements:

  • Multiple models: AI Gateway supports various model routing policies, including rules based on model name, request proportion, or specific request features (such as a Header). The gateway can also unify protocols from different model providers into an OpenAI-compatible interface, allowing AI applications to seamlessly switch between multiple models by integrating with a single standard (see the sketch after this list).

  • Multimodal: AI Gateway supports proxying multimodal model calls over both HTTP and WebSocket protocols, providing a unified endpoint. This enables applications to consistently invoke various models, such as text-to-text, text-to-image, and speech recognition. Administrators can also use the plugin mechanism to enhance the security and stability of multimodal calls.

  • Multiple scenarios: You can create a separate Model API for each model application scenario (such as text generation, image generation, and speech recognition) and assign a unique consumer identity to each caller. This enables per-consumer observability, rate limiting, security protection, and metering, ensuring resource isolation and fine-grained management.

  • High security requirements: AI Gateway provides comprehensive protection across three layers: network security, data security, and content security.

    • Network security: Integrates SSL certificates, WAF protection, and IP blacklists and whitelists to defend against malicious traffic and attacks at the network ingress.

    • Data security: Supports consumer-side identity authentication to prevent direct exposure of API keys. It also implements backend authentication and API key management for backend model services. Keys can be hosted in KMS to prevent sensitive information from being stored locally on the gateway.

    • Content security: Deeply integrates with AI content moderation features to intercept non-compliant content and risky inputs in real time. Combined with a data masking plugin, it can remove sensitive information before forwarding requests to ensure content compliance.

  • High stability requirements: AI Gateway enhances system stability from two perspectives: observability and control.

    • Observability: For every request, the gateway logs the source provider, target model, calling consumer, and key metrics such as first packet latency and token count. It also flags events such as rate limiting, interceptions, and fallbacks, providing end-to-end visualization through a built-in dashboard.

    • Control: Provides load balancing, fallback mechanisms, rate limiting policies, and caching capabilities. You can configure governance rules, such as token count limits and concurrency control, on a per-consumer basis. Administrators can continuously optimize these policies and dynamically adjust resource allocation based on monitoring data to ensure stable system operation.
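The following is a minimal client-side sketch of the unified model access described above. Because the gateway exposes an OpenAI-compatible interface, the standard openai Python SDK can be pointed at it directly, and switching providers reduces to changing the model name. The endpoint URL, consumer credential, and model names below are placeholders, not real values.

```python
# Minimal sketch: calling two different providers through one
# OpenAI-compatible gateway endpoint. URL, key, and model names
# are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.example.com/v1",  # hypothetical gateway endpoint
    api_key="YOUR_CONSUMER_API_KEY",                    # consumer credential issued by the gateway
)

# Switching providers is just a change of model name; the gateway
# routes the request and translates the provider-specific protocol.
for model in ("qwen-max", "deepseek-chat"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize our Q3 report in one sentence."}],
    )
    print(model, "->", resp.choices[0].message.content)
```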

Tool access

After establishing a unified access system for model services, the enterprise identifies that tool access presents several challenges, particularly high security risks that require dedicated management. To address this, the enterprise decides to implement unified control over tool access protocols and entry points. The architecture team selects MCP as the standard protocol for tool access and uses AI Gateway's HTTP-to-MCP conversion capability to automatically transform existing APIs into MCP Servers, supporting fast business iteration and innovation.

AI Gateway ensures the precision and security of tool calls through the following mechanisms:

  • Precision:

    AI Gateway supports both connecting to existing HTTP services and hosting MCP Servers. For existing HTTP services, you can dynamically update tool descriptions within the gateway. The gateway also supports flexible tool orchestration: you can create virtual MCP Servers that assemble custom tool lists on demand, meeting the needs of different business scenarios and letting providers and consumers define their own MCP Servers independently. Additionally, AI Gateway provides an intelligent tool routing feature that filters the candidate tool collection at the gateway based on the request content and returns only the tools that match the current task. This effectively reduces the token consumption required for model inference and improves tool selection accuracy. A client-side sketch of tool discovery follows this list.

  • Security: For tool access control, AI Gateway provides a multilayered security mechanism. In addition to supporting call authentication at the MCP Server level, it also enables fine-grained access permission configuration for individual tools. This enables precise authorization management based on the caller's identity, ensuring that tools with different security levels can be assigned corresponding access privileges based on their risk level.
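As a client-side illustration of the tool discovery flow above, the following sketch uses the open source MCP Python SDK (the mcp package) to connect to an MCP Server exposed over SSE and list its tools. The endpoint URL is a placeholder, not a real gateway address.

```python
# Minimal sketch: discovering tools from an MCP Server behind the
# gateway over SSE, using the open source MCP Python SDK.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

GATEWAY_MCP_URL = "https://your-ai-gateway.example.com/mcp/crm-tools/sse"  # hypothetical

async def main() -> None:
    async with sse_client(GATEWAY_MCP_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # With intelligent tool routing enabled, the gateway may return
            # only the subset of tools relevant to the caller's task.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```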

Agent access

As the number of AI applications grows, the enterprise decides to unify them under AI Gateway to address coordination and management challenges. It adopts the A2A protocol and uses the Nacos AI Registry for service registration and discovery.

AI Gateway can serve as a unified proxy service for AI applications, offering both stability and flexibility.

  • Stability: AI Gateway supports direct connections to various Alibaba Cloud runtime platforms (such as ACK, FC, and SAE). It provides active and passive health check mechanisms to automatically isolate abnormal nodes. By incorporating phased release capabilities, it reduces the risks associated with changes. It also supports multi-dimensional rate limiting policies to prevent application overload and ensure service stability.

  • Flexibility: Through its service discovery feature, AI Gateway uniformly exposes AI applications deployed on different computing platforms. It provides REST-to-A2A protocol conversion, enabling the automatic upgrade of existing HTTP applications to the A2A protocol. For low-code AI applications built on Model Studio, AI Gateway supports unified proxy access and can be extended with a secondary authentication mechanism.

In addition, AI Gateway is deeply integrated with the Alibaba Cloud observability ecosystem. Once an AI application is connected, you can enable end-to-end observability with a single click. This covers the entire call chain from the application layer, through MCP tools, to the final model call, enabling end-to-end tracing and fault diagnosis.

Core capabilities of AI Gateway

Unified proxy for models, MCP Servers, and agents

AI Gateway provides proxy capabilities for models, MCP Servers, and agents, supporting unified access and management for multiple service types, including the following:

  • AI service: Proxies various model services, including those from providers such as Model Studio, OpenAI, Minimax, Anthropic, Amazon Bedrock, and Azure, and is compatible with self-hosted models based on frameworks such as Ollama, vLLM, and SGLang. You can configure an API key for the AI service and specify a custom DNS Server for internal endpoints.

  • Agent service: Supports services from agent application platforms, including Model Studio, Dify, and user-defined agent workloads. You can configure an API key and APP-ID for identity authentication and access control.

  • Container service: Supports services running on Alibaba Cloud ACK or ACS clusters. A single AI Gateway instance can be associated with up to three container clusters.

  • Nacos service: Supports service instances registered in an MSE Nacos registry, suitable for both standard microservices and MCP Servers.

  • DNS service: Supports accessing backend services through DNS resolution. This lets you specify a dedicated DNS Server to resolve private network or internal domain names.

  • Fixed address: Supports configuring backend endpoints as a list of fixed IPs. You can set multiple IP:Port addresses.

  • SAE service: Supports services running on Alibaba Cloud SAE.

  • FC service: Supports Alibaba Cloud Function Compute (FC) service registration. AI Gateway can bypass the HTTP Trigger and integrate directly with the backend service, improving call efficiency.

  • Compute Nest MCP service: Supports MCP Servers hosted by Compute Nest.

AI Gateway lets you configure health checks for services, in both active and passive modes (a minimal sketch of the two modes follows this list).

  • Active health check: The gateway periodically sends health probe requests to service nodes based on user-defined probing rules to determine their availability status.

  • Passive health check: The gateway evaluates a node's health based on its performance during actual request processing, according to user-defined evaluation rules.
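The following sketch illustrates the two modes in Python. The probe path, thresholds, and failure counting are illustrative assumptions, not the gateway's actual defaults or implementation.

```python
# Minimal sketch of the two health check modes described above.
import urllib.request

def active_probe(node_url: str, timeout_s: float = 2.0) -> bool:
    """Active check: periodically send a probe request to the node."""
    try:
        # "/healthz" is an assumed probe path, not a gateway default.
        with urllib.request.urlopen(node_url + "/healthz", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        return False

class PassiveChecker:
    """Passive check: judge health from real request outcomes."""

    def __init__(self, max_consecutive_failures: int = 5) -> None:
        self.failures = 0
        self.max_failures = max_consecutive_failures

    def record(self, success: bool) -> None:
        # Reset on success; count consecutive failures otherwise.
        self.failures = 0 if success else self.failures + 1

    def healthy(self) -> bool:
        return self.failures < self.max_failures
```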

Load balancing and phased release for models and agents

Load balancing and phased release for models

The Model API provides three built-in model load balancing capabilities:

  • Single model service: You can specify a single LLM service. This mode supports either passing through the model name from the request or specifying a fixed model name. When a model name is explicitly specified, any model name passed in the user's request is ignored.

  • Multiple model services (by model name): You can configure one or more LLM services and set model name matching rules for each. For example, you can define a rule to route requests with model names matching deepseek-* to a DeepSeek LLM service, and those matching qwen-* to the Alibaba Cloud Model Studio LLM service (a routing sketch follows below).

  • Multiple model services (by proportion): You can configure one or more LLM services, specifying a model name and a request allocation percentage for each. This is suitable for scenarios such as a phased release of a new model.

The Model API supports custom route configurations. This lets you forward requests to different backend services based on request features, such as a specific Header.
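As a minimal illustration of the model-name matching mode above, the following sketch maps incoming model names to backends with glob patterns; header-based custom routes work analogously on request features instead of model names. The rule table and backend names are illustrative, not gateway configuration syntax.

```python
# Minimal sketch of model-name-based routing, matching the example
# above: deepseek-* to one backend, qwen-* to another.
from fnmatch import fnmatch

ROUTING_RULES = [
    ("deepseek-*", "deepseek-llm-service"),      # illustrative backend names
    ("qwen-*", "model-studio-llm-service"),
]

def route(model_name: str) -> str:
    # First matching pattern wins, as in ordinary route tables.
    for pattern, backend in ROUTING_RULES:
        if fnmatch(model_name, pattern):
            return backend
    raise LookupError(f"no backend configured for model {model_name!r}")

assert route("qwen-max") == "model-studio-llm-service"
assert route("deepseek-r1") == "deepseek-llm-service"
```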

Phased release for agents

Similar to the Model API, the Agent API supports phased release capabilities based on request features. You can route requests to different backend services based on specific features, such as a particular Header.

Per-consumer authentication, observability, rate limiting, and metering

AI Gateway provides independent authentication, monitoring, rate limiting, and metering functions for different business sources to meet fine-grained management needs.

Consumer authentication

You can create multiple consumers in AI Gateway and assign independent request credentials to each. You can enable consumer authentication as needed for each Model API, MCP Server, and Agent API. AI Gateway supports three consumer authentication methods: API key, JWT, and HMAC. For security-sensitive scenarios, you can host consumer credentials in KMS for secure management.

Consumer observability and metering

AI Gateway provides multi-dimensional observability capabilities, supporting monitoring and analysis by consumer and other dimensions. Key metrics include the following:

  • Queries Per Second (QPS): The number of AI requests and responses per second, broken down into AI request QPS, streaming response QPS, and non-streaming response QPS.

  • Request success rate: The success rate of AI requests, with statistics available at second, 15-second, and minute granularities.

  • Tokens consumed/s: The number of tokens consumed per second, divided into input tokens, output tokens, and total tokens.

  • Average request RT: The average response time (in milliseconds) for AI requests over a specified period (by second, 15 seconds, or minute). Breakdowns include non-streaming RT, streaming RT (total time for the streaming response), and streaming first packet RT (first packet latency for the streaming response).

  • Cache hits: The number of cache hits and misses within a specified time period.

  • Rate limiting statistics: The number of rate-limited requests and normally processed requests within a specified time period.

  • Token statistics by model: The token consumption for different models within a specified time period.

  • Token statistics by consumer: The token consumption for different consumers within a specified time period.

  • Risk statistics: Statistics on identified risky requests based on Content Moderation detection results, categorized by risk type, consumer, and other dimensions.

Based on this observability data, AI Gateway supports consumer-based metering and billing. It provides detailed data, such as the number of tokens consumed by a specific consumer when calling a particular model over a defined period. This enables you to implement accurate, per-consumer resource usage metering and billing.
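The following sketch shows the kind of per-consumer, per-model aggregation this enables. The log record fields are assumptions modeled on the metrics listed above, not the gateway's actual log schema.

```python
# Minimal sketch of per-consumer, per-model token metering over
# access-log-style records (field names are assumptions).
from collections import defaultdict

logs = [
    {"consumer": "team-a", "model": "qwen-max", "input_tokens": 120, "output_tokens": 480},
    {"consumer": "team-a", "model": "deepseek-chat", "input_tokens": 80, "output_tokens": 200},
    {"consumer": "team-b", "model": "qwen-max", "input_tokens": 300, "output_tokens": 900},
]

# Aggregate total tokens by (consumer, model) for billing.
usage: dict[tuple[str, str], int] = defaultdict(int)
for record in logs:
    key = (record["consumer"], record["model"])
    usage[key] += record["input_tokens"] + record["output_tokens"]

for (consumer, model), tokens in sorted(usage.items()):
    print(f"{consumer} / {model}: {tokens} tokens")
```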

Consumer rate limiting

AI Gateway supports rate limiting policies based on multiple dimensions, including consumer, model name, and request Header. You can set limits on the number of requests, concurrent connections, and tokens over a unit of time.
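As a minimal illustration of request-count limiting per consumer, the following token-bucket sketch admits or rejects requests by authenticated identity. The limits are illustrative, and the gateway's actual policies also cover concurrent connections and LLM token budgets, which this sketch omits.

```python
# Minimal token-bucket sketch of per-consumer rate limiting.
import time

class TokenBucket:
    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate = rate_per_s        # refill rate (requests per second)
        self.capacity = burst         # maximum burst size
        self.tokens = burst
        self.updated = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per consumer, keyed by the authenticated identity.
buckets = {"team-a": TokenBucket(rate_per_s=10, burst=20)}

def admit(consumer: str) -> bool:
    bucket = buckets.get(consumer)
    return bucket.allow() if bucket else False
```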

Multi-dimensional, multimodal AI security protection

The AI Gateway integrates the Content Moderation protection feature to provide AI security protection. You can enable this feature on a per-API basis to effectively prevent security risks during model invocations, including those related to sensitive words, compliance, prompt injection attacks, and brute-force attacks. This improves the security and stability of your AI applications.

AI Gateway lets you configure independent interception policies for different protection dimensions. The dimensions that can be protected include the following:

  • contentModeration: Content compliance detection

  • promptAttack: Prompt attack detection

  • sensitiveData: Sensitive content detection

  • maliciousFile: Malicious file detection

  • waterMark: Digital watermarking

For each protection dimension, you can configure a corresponding interception policy (see the sketch after this list). The policies include the following:

  • High: Intercepts all requests with a risk level of low, medium, or high.

  • Medium: Intercepts requests with a risk level of medium or high.

  • Low: Intercepts only requests with a risk level of high.

  • Monitor mode: Requests are not intercepted but are logged.
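The mapping between policy levels and intercepted risk levels can be summarized in a few lines. The following sketch illustrates that decision logic; it is not the gateway's implementation.

```python
# Minimal sketch of the interception policies above: each policy
# level defines which risk levels get intercepted.
INTERCEPTED_LEVELS = {
    "High": {"low", "medium", "high"},   # strictest: intercept everything risky
    "Medium": {"medium", "high"},
    "Low": {"high"},                     # most permissive: high risk only
    "Monitor": set(),                    # log only, never intercept
}

def decide(policy: str, risk_level: str) -> str:
    if risk_level in INTERCEPTED_LEVELS[policy]:
        return "intercept"
    return "log" if policy == "Monitor" else "pass"

assert decide("Low", "medium") == "pass"
assert decide("Medium", "medium") == "intercept"
assert decide("Monitor", "high") == "log"
```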

Hot-swappable and hot-updatable policies and extension plugins

AI Gateway provides a rich set of built-in extension policies and plugins, and also supports developing custom plugins to meet specific business requirements.

For example, the Model API comes with five pre-configured core policies: tool selection, security protection, rate limiting, caching, and web search. You can also enable additional policies and plugins as needed.

All policies and plugins support hot-swapping and hot-updating, ensuring that service traffic is not affected during configuration changes.

What to do next

Learn about the gateway types and billing of AI Gateway.

Create a gateway instance to experience the capabilities of AI Gateway.