API gateways are not a new concept, but the rapid rise of DeepSeek has renewed interest in them. This article aims to give a more complete picture of API gateways by covering related concepts, their evolution and classification, their core capabilities, and how DeepSeek integrates with them.
API gateways are core components for managing APIs and play a crucial role in the overall architecture. They act like an intelligent transportation hub, coordinating and managing API requests so that responses remain secure, stable, and efficient. Many hard requirements of large model applications are being met through API gateways, such as:
● Supporting multiple large models in the backend, for both product experience and stability reasons. This has become standard for large model applications, whether conversational or code-related.
● Web search. Because the generation quality of large models varies significantly, the frontend must expose an option for web-connected search.
● Ensuring that output content is safe and compliant by applying controls before content is generated.
● Semantic caching: temporarily storing API responses on a caching server so that when identical or semantically equivalent requests arrive, the responses can be served directly from the cache, reducing the cost of calling the official API.
● Quota and rate limiting for callers: a mechanism that restricts the number of API calls, the traffic volume, or the resource usage of each caller (e.g. users, applications, IP addresses) within a given time frame.
● Backend protective rate limiting: managing traffic to the API to keep it stable and efficient, including load balancing, rate limiting, degradation, and circuit breaking.
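As an illustration of the caller quota item in the list above, here is a minimal sketch of a fixed-window limiter. The class name `CallerQuota`, its parameters, and the injectable clock are assumptions made for this example; they are not part of any gateway product's API.

```python
import time


class CallerQuota:
    """Fixed-window quota: at most `limit` calls per caller per `window_s` seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behaviour can be tested
        self._windows = {}  # caller id -> (window start, calls in window)

    def allow(self, caller_id):
        now = self.clock()
        start, count = self._windows.get(caller_id, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # the previous window expired, start a new one
        if count >= self.limit:
            self._windows[caller_id] = (start, count)
            return False  # quota exhausted for this window
        self._windows[caller_id] = (start, count + 1)
        return True
```

A caller can be identified by user, application, or IP address; the gateway would call `allow(caller_id)` before forwarding each request and reject the request when it returns `False`.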
An API (Application Programming Interface) is a set of specifications and protocols that define how software applications or components communicate and interact with one another. An API can be seen as an intermediary layer that lets developers access and use certain functionality or data without having to understand the underlying implementation details. For example, Alibaba Cloud APIs provide developers with a series of application interfaces to manage cloud resources, data, and services. API classifications include:
[Figure: the entry interface for creating APIs in the Alibaba Cloud Native API Gateway console]
An API gateway (APIG) is middleware that provides API hosting services. It sits between clients and backend services and serves as the sole entry point for client access to those services. All client requests first pass through the API gateway, which then routes them to the backend services. It acts like a gatekeeper, handling identity verification, permission checks, flow control, and other actions to ensure the security and stability of API requests.
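The gatekeeper role described above can be sketched as a small request pipeline: verify identity, check permissions, then route to a backend. Everything in this sketch (the `Request` and `Gateway` types, the prefix-based routing, the status codes returned as tuples) is a simplifying assumption for illustration, not how any particular gateway is implemented.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str
    api_key: str = ""


class Gateway:
    """Toy single entry point: authentication, authorization, then routing."""

    def __init__(self, api_keys, routes):
        self.api_keys = api_keys  # api key -> set of permitted path prefixes
        self.routes = routes      # path prefix -> backend handler callable

    def handle(self, request):
        allowed = self.api_keys.get(request.api_key)
        if allowed is None:
            return 401, "unknown caller"        # identity verification failed
        if not any(request.path.startswith(p) for p in allowed):
            return 403, "caller not permitted"  # permission check failed
        # Route to the backend whose prefix is the longest match.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if request.path.startswith(prefix):
                return 200, self.routes[prefix](request)
        return 404, "no route"
```

For example, a gateway built with `Gateway({"k1": {"/orders"}}, {"/orders": orders_handler})` forwards `/orders/1` for key `k1` and rejects every other key or path before the backend is ever reached.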
API gateways did not develop in isolation; they have evolved alongside software architecture. Software architectures have moved from monolithic to vertical, SOA, microservices, and cloud-native architectures. With the spread of large models, this evolution has continued towards AI-native architectures, and the API gateway has taken a different form at each stage.
A traffic gateway is responsible for managing and optimizing data traffic to improve business scalability and high availability. Nginx, a representative traffic gateway, is popular for its high performance and flexible configuration. The core purpose of a traffic gateway is to solve load balancing across multiple business nodes by distributing client requests to different servers, thereby spreading the load evenly, avoiding single points of failure, and keeping services stable and continuous.
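To make the load balancing idea concrete, the following is a minimal round-robin sketch that skips nodes marked as unhealthy. It is not how Nginx is implemented; the class and method names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through backend nodes in order, skipping nodes marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Look at each node at most once per call so that an all-down
        # pool raises an error instead of looping forever.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes")
```

Skipping unhealthy nodes is what lets the remaining servers absorb traffic when one of them fails, which is the single-point-of-failure protection mentioned above.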
An enterprise service bus (ESB) is an integration solution for enterprises that standardizes and simplifies communication and message transmission between different systems and services. Following service-oriented architecture (SOA) principles, the ESB achieves rapid deployment and efficient operation of services through centralized management of message routing, transformation, and security.
A microservices gateway is responsible for centrally managing the routing rules of microservices, strengthening system security, providing performance monitoring, and simplifying access in order to improve overall system reliability. Microservices gateways can implement load balancing, rate limiting, circuit breaking, and authentication, managing and optimizing interactions among microservices through a unified entry point. This reduces the complexity of communication between clients and microservices and adds a layer of protection for the system. Spring Cloud Gateway is a widely used microservices gateway built on the Spring ecosystem; it integrates easily with Spring Boot projects and is favored by developers for its flexibility, efficiency, and scalability.
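Of the functions listed above, circuit breaking is perhaps the least self-explanatory, so here is a minimal sketch of the pattern. It is a generic illustration, not Spring Cloud Gateway's implementation; the thresholds, names, and the injectable clock are assumptions.

```python
import time


class CircuitBreaker:
    """Reject calls for `reset_after_s` seconds after repeated failures."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open")  # fail fast, skip the backend
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

While the circuit is open, the gateway answers immediately instead of waiting on a failing service, which protects both the caller and the struggling backend.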
A cloud-native gateway is a gateway born from the widespread adoption of Kubernetes (K8s). Because the network inside a K8s cluster is isolated by default, a gateway is needed to forward external requests to services inside the cluster. K8s uses the Ingress and Gateway APIs for unified gateway configuration and provides elastic scaling to help users solve application capacity scheduling. Consequently, users expect gateways to combine the characteristics of traffic gateways (handling massive request volumes) with those of microservices gateways (service discovery and governance), and to scale elastically as well. Envoy and Higress are typical open-source cloud-native gateways.
We believe that the AI gateway is not a new form independent of the cloud-native gateway. It is essentially a cloud-native gateway that has been extended to address new needs in AI scenarios. For example, it provides flexible switching between multiple models with retries, content safety and compliance for large models, semantic caching, balancing across multiple API keys, token quota management and rate limiting, gray release of large model traffic, and cost auditing of calls. In the industry, Higress and Kong have built AI-specific capabilities on top of their cloud-native gateways, while others such as Traefik and Cloudflare have also designed AI gateway products and services. For the core capabilities of AI gateways, please refer to our previous article on the ten essential capabilities that an AI gateway should possess.
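Two of the AI-specific capabilities named above, token quota management and balancing across multiple API keys, can be sketched in a few lines. This is only an illustration under simple assumptions: the class names are hypothetical, token counts are taken from the model's usage report after each call, and key selection is "least tokens used so far".

```python
class TokenBudget:
    """Track how many tokens each consumer may still spend."""

    def __init__(self, budgets):
        self.remaining = dict(budgets)  # consumer -> tokens left

    def admit(self, consumer):
        # Unknown consumers have no budget and are rejected.
        return self.remaining.get(consumer, 0) > 0

    def record(self, consumer, prompt_tokens, completion_tokens):
        # Token cost is only known after the model has responded.
        used = prompt_tokens + completion_tokens
        left = max(0, self.remaining.get(consumer, 0) - used)
        self.remaining[consumer] = left
        return left


class ApiKeyPool:
    """Spread calls across several upstream API keys by recorded usage."""

    def __init__(self, keys):
        self.used = {key: 0 for key in keys}

    def pick(self):
        # Least-used key wins; ties go to the key added first.
        return min(self.used, key=self.used.get)

    def record(self, key, tokens):
        self.used[key] += tokens
```

Unlike a plain call counter, a token budget can only be charged after the response arrives, so the last admitted request may overshoot the budget slightly; the sketch clamps the remaining balance at zero.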
Because API gateways provide many capabilities and involve several roles, we categorize the capabilities by user into three scenarios: development, supply, and consumption. These correspond to the teams that develop APIs, the teams that develop and operate the API platform, and the external callers of the API platform.
API First means defining the API specification before writing code. Rather than coding directly without a defined API, API First emphasizes designing and developing APIs before building applications, treating APIs as core architectural components of the system, and achieving modularity through well-defined interface specifications.
For example, public cloud products offer API access, and WeChat Mini Programs and the DingTalk Open Platform provide APIs for developers. Like a modular LEGO system, standard interfaces allow services to be combined flexibly, which improves system scalability and maintainability and, in turn, the efficiency of the ecosystem.
In development scenarios, API gateways can cover the entire lifecycle around APIs, including design, development, testing, publishing, selling, operations monitoring, security management, and sunset of APIs.
API supply scenarios refer to the process by which API providers (such as enterprises, platforms, or services) expose data or functionality to the outside world through standardized interfaces. The core of this process is creating, managing, and maintaining APIs to ensure their availability, security, and efficiency. Core capabilities include:
API Security: Protect APIs from security threats, ensuring that only authorized users and applications can access them, and ensuring the confidentiality, integrity, and availability of data during transmission and storage. Examples include authentication, authorization management, data encryption and decryption, and anti-attack mechanisms.
Gray Release: A strategy for gradually introducing new API versions or features in a production environment, allowing a portion of users or request traffic to be directed to the new version of the API while keeping the rest on the old version, thereby enabling testing and validation of the new API without impacting overall system stability and user experience.
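One common way to direct a fixed share of traffic to the new version is to hash a stable identifier into a bucket. The sketch below assumes traffic is split by user ID and that the versions are called `v1` and `v2`; both are illustrative choices, and real gateways may split by weight, header, or other rules.

```python
import hashlib


def pick_version(user_id, canary_percent):
    """Send roughly `canary_percent` percent of users to the new version.

    Hashing the user id keeps each user on the same version across requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "v2" if bucket < canary_percent else "v1"
```

Because the bucket depends only on the user ID, raising `canary_percent` from 10 to 20 keeps every user who was already on the new version there and only moves additional users over.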
Caching: Refers to temporarily storing API response results in a caching server so that when identical requests arrive again, the response results can be retrieved directly from the cache without re-accessing the backend server, thus improving API response speed and system performance.
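The caching behavior described above can be sketched as a small time-to-live (TTL) cache keyed by request method and path. The class name, the key choice, and the injectable clock are assumptions for this example; a production gateway would also consider query parameters, headers, and cache invalidation.

```python
import time


class ResponseCache:
    """Serve repeated identical requests from memory until the entry expires."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._store = {}  # (method, path) -> (expires_at, response)

    def get_or_fetch(self, method, path, fetch):
        key = (method, path)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not called
        response = fetch()   # cache miss or expired entry: call the backend
        self._store[key] = (now + self.ttl_s, response)
        return response
```

Here `fetch` stands for whatever function forwards the request to the backend server; it runs only on a miss, which is where the response-time and load savings come from.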
API consumption scenarios refer to the process where callers (such as applications or developers) quickly implement functionalities or obtain data by integrating external APIs. The core is to use the capabilities or data provided by the platform to meet business needs.
The following demonstrations provide three scenarios for reference:
Some large model providers are already integrated into the Alibaba Cloud Native API Gateway, so their models can be accessed directly by selecting a provider and configuring the API-KEY. These include: Alibaba Cloud Bailian, DeepSeek, OpenAI, Azure, Claude, Moonshot AI, Baichuan AI, 01.AI, Zhipu AI, Hunyuan, StepFun, Spark, Doubao (Volcano Engine), MiniMax, and Gemini.
The gateway forwards requests to models through services. Create an AI service as follows:
Example Configuration for Alibaba Cloud Bailian:
In the completed AI API interface, click Debug.
Specify the model as deepseek-r1 to interact with Alibaba Cloud Bailian's DeepSeek.
In this scenario, custom service addresses support the following cases:
In this scenario, for direct integration, refer to the method described in "PAI deployment model access to AI gateway" [4].
The cloud-native API gateway currently supports both built-in model providers and general models, providing a multi-model proxy service with Fallback when a call fails. In such scenarios, users invoke different third-party model services through a single, unified calling method.
Service List: Click Add to add the following multiple services.
Fallback List: Click Add to add the following services.
The configuration shown below will execute according to the following rules:
DeepSeek-*
ep-*
In the completed AI API interface, click Debug.
When specifying the model name as ep-20250219155230-28l6f or DeepSeek-R1-Distill-Qwen-1.5B, responses will be given from Volcano Engine or PAI as per the rules.
When an incorrect name is entered and no corresponding DeepSeek model exists, Fallback is triggered and Alibaba Cloud DeepSeek-R1 is called.
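The routing and Fallback behavior in this scenario can be approximated with a short sketch. It assumes, based on the example model names above, that `ep-*` maps to Volcano Engine and `DeepSeek-*` maps to PAI, and that the fallback target is `deepseek-r1` on Alibaba Cloud Bailian. The service identifiers, the `route` function, and the `call` callback are hypothetical and do not reflect the gateway's actual configuration format.

```python
from fnmatch import fnmatchcase

# Model-name pattern -> service, checked in order (assumed mapping).
ROUTES = [
    ("DeepSeek-*", "pai"),
    ("ep-*", "volcano-engine"),
]
# Service and model used when no rule matches or the matched call fails.
FALLBACK = ("bailian", "deepseek-r1")


def route(model, call):
    """Return (service, response) for a model name.

    `call(service, model)` is a caller-supplied function that performs the
    upstream request and raises an exception on failure.
    """
    for pattern, service in ROUTES:
        if fnmatchcase(model, pattern):
            try:
                return service, call(service, model)
            except Exception:
                break  # call exception: fall through to the fallback service
    service, fallback_model = FALLBACK
    return service, call(service, fallback_model)
```

With this sketch, `ep-20250219155230-28l6f` is served by the Volcano Engine entry, `DeepSeek-R1-Distill-Qwen-1.5B` by the PAI entry, and any unmatched or failing model name ends up at the fallback, all through the same calling method.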
In the future, we will compile the practices of clients across industries who use DeepSeek with API gateways to build internal and external enterprise services, and publish them as articles on this public account. Everyone is welcome to subscribe and follow.
[1] https://www.alibabacloud.com/help/en/vpc/user-guide/create-and-manage-a-vpc
[2] https://www.alibabacloud.com/help/en/vpc/user-guide/use-the-snat-feature-of-an-internet-nat-gateway-to-access-the-internet
[3] https://www.alibabacloud.com/help/en/api-gateway/cloud-native-api-gateway/user-guide/create-gateway
[4] https://www.alibabacloud.com/help/en/api-gateway/cloud-native-api-gateway/use-cases/pai-deployment-model-access-ai-gateway