By SOFA
As a key piece of middleware, the gateway handles traffic governance, routing and forwarding, protocol conversion, and security protection in traditional business scenarios. Different business scenarios have produced different types of gateways, such as traffic gateways, ESBs (Enterprise Service Buses), API gateways, and cloud-native gateways. The essential responsibilities of a gateway have not changed much; each type mainly adapts those responsibilities to a particular business scenario. For example, the API gateway was designed for microservices: it moves the management granularity from coarse-grained traffic or services down to individual REST APIs or interfaces, which enables finer-grained governance. This is the core driving force behind the evolution from the traffic gateway to the API gateway.
In AI scenarios, the business model has changed fundamentally, and what the gateway has to handle has shifted from 'services' to 'models' and 'intelligent agents.' This is more than a simple technology iteration: it reshapes business logic, interaction modes, resource consumption, and risk models.
To support increasingly diverse and complex AI business scenarios (such as model services, intelligent agents, AI applications, and MCP), the API gateway needs to evolve from a general-purpose gateway into a specialized AI gateway, because the core capabilities of a general-purpose gateway can no longer meet the specific demands of these scenarios. The AI gateway therefore extends and strengthens its capability set with core features such as intelligent routing, unified model access, semantic caching, content security, MCP proxying, and model rate limiting.
To meet customer needs for AI business development, the SOFA commercialization team has launched SOFA AI Gateway, also known as SOFA Higress.
SOFA AI Gateway is built on the open-source Higress kernel and is specifically optimized and enhanced for SOFA scenarios. It is an intelligent gateway solution designed for AI workloads.
From the outset, SOFA AI Gateway has been positioned to provide specialized services for three core AI business scenarios:
The following sections will elaborate on the above three aspects in detail.
SOFA AI Gateway uses Higress as its kernel mainly because of its strong open-source community and rich extension mechanisms, and because it fits our longer-term goal of consolidating multiple gateways. We therefore built on Higress and migrated the existing capabilities of our API gateway, data gateway, intercommunication gateway, and other gateways onto it.
Intelligent agents are currently the hottest topic, and many enterprises have begun building agents for their own vertical businesses. To help enterprises build agents better and faster, we have positioned the gateway as the unified ingress and egress point for agent traffic.
SOFA AI Gateway provides key capabilities for intelligent agents:
On the egress side of agent traffic, SOFA AI Gateway mainly provides the following key functions:
Proxying model services through a gateway differs significantly from proxying traditional services. The difference stems from the distinctive traffic characteristics of model services, which mainly include:
Given these characteristics of model traffic, the load balancing strategies that gateways traditionally use (such as round robin, least connections, and random) often perform poorly for model service proxying, and can even backfire. For example, round robin may assign a new request to an instance that is already overloaded and has requests queued, which makes latency worse. A gateway for model services therefore needs smarter routing strategies that make decisions dynamically, based on indicators such as the real-time load of each model instance, its KV Cache status, and its queue depth.
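A minimal sketch of such load-aware selection, in Python. The metric names (`queue_depth`, `kv_cache_usage`, `running_requests`) and the weights are illustrative assumptions, not the fields or formula SOFA AI Gateway actually uses; the point is only that the choice depends on live instance state rather than on request order.

```python
from dataclasses import dataclass


@dataclass
class InstanceMetrics:
    """Hypothetical per-instance metrics scraped from a model server."""
    name: str
    queue_depth: int        # requests waiting to be scheduled
    kv_cache_usage: float   # fraction of KV cache in use, 0.0 to 1.0
    running_requests: int   # requests currently being decoded


def load_score(m: InstanceMetrics,
               w_queue: float = 1.0,
               w_kv: float = 4.0,
               w_running: float = 0.5) -> float:
    """Lower is better. The weights are arbitrary illustration values."""
    return (w_queue * m.queue_depth
            + w_kv * m.kv_cache_usage
            + w_running * m.running_requests)


def pick_instance(instances: list[InstanceMetrics]) -> InstanceMetrics:
    """Pick the instance with the lowest load score."""
    if not instances:
        raise ValueError("no model instances available")
    return min(instances, key=load_score)


instances = [
    InstanceMetrics("a", queue_depth=8, kv_cache_usage=0.9, running_requests=4),
    InstanceMetrics("b", queue_depth=0, kv_cache_usage=0.3, running_requests=2),
    InstanceMetrics("c", queue_depth=2, kv_cache_usage=0.5, running_requests=1),
]
chosen = pick_instance(instances)
print(chosen.name)
```

With these sample numbers, round robin would eventually send a request to `a`, which already has eight requests queued, while the scored selection picks `b`.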
As the unified entry point for models, SOFA AI Gateway handles multi-cluster routing and proxying for models, and provides lifecycle management for model registration and decommissioning along with intelligent routing.
The intelligent routing logic of SOFA AI Gateway differs from both open-source Higress and the inference gateways common in the industry, while combining the advantages of the two. In Higress, intelligent routing is implemented entirely in plugins: all routing logic, including metrics-based routing, is developed and integrated as plugins. Because the logic runs inside the gateway itself, this design usually has lower overhead. In contrast, current industry inference gateways generally follow the Gateway API Inference Extension specification and delegate endpoint selection to independently deployed EPP (Endpoint Picker) services.
To improve delivery efficiency, SOFA AI Gateway neither modified the Higress data plane source code to integrate Gateway API Inference Extension capabilities, nor had business teams write their routing logic directly as gateway plugins. Instead, we developed Higress plugins that speak the ext-proc (external processing) protocol to connect to EPP services on the business side, or that use HTTP to call traditional services, so that custom routing can be added as an extension.
In the future, to better align with industry standards, we also plan to modify the data plane so that it supports the Gateway API Inference Extension natively.
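To make the HTTP variant of this extension point more concrete, here is a sketch of what a business-side routing service might compute when a gateway plugin calls it. The JSON request and response shapes (`model`, `candidates`, `address`, `queue_depth`, `target`) are hypothetical and chosen only for illustration; the article does not specify the actual contract, and an ext-proc based EPP would exchange gRPC messages instead.

```python
import json


def decide_route(request_body: bytes) -> bytes:
    """Choose a backend for one inference request.

    Assumed (hypothetical) request shape:
      {"model": "...",
       "candidates": [{"address": "...", "queue_depth": 0}, ...]}
    Assumed (hypothetical) response shape:
      {"model": "...", "target": "<address of chosen backend>"}
    """
    request = json.loads(request_body)
    candidates = request["candidates"]
    if not candidates:
        raise ValueError("routing request contains no candidates")
    # Custom business logic goes here; this sketch simply prefers
    # the backend with the shortest queue.
    best = min(candidates, key=lambda c: c["queue_depth"])
    response = {"model": request["model"], "target": best["address"]}
    return json.dumps(response).encode("utf-8")


sample_request = json.dumps({
    "model": "demo-model",
    "candidates": [
        {"address": "10.0.0.1:8000", "queue_depth": 5},
        {"address": "10.0.0.2:8000", "queue_depth": 1},
    ],
}).encode("utf-8")
decision = json.loads(decide_route(sample_request))
print(decision["target"])
```

Keeping the decision in a separate service means business teams can change routing rules without rebuilding the gateway, at the cost of one extra network hop per request.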
In our intelligent agent projects, we found that high-quality tools (especially specialized MCPs) and authoritative data are key to what an agent can do. General-purpose large models have significant limitations in professional fields such as finance: their knowledge may be outdated, they lack deep industry understanding, and it is difficult to guarantee the accuracy and compliance of their responses.
The role of specialized tools (MCP) lies in:
Address of the MCP Market: https://mcp.sofa.antdigital.com/mcp/home

While building this, we also ran into new challenges, primarily insufficient accuracy in entity recognition and MCP context explosion.
Unclear Entity Extraction: When users query or operate MCP services in natural language, key inputs such as fund or stock names and codes depend heavily on precise entity recognition. When users use aliases, informal industry jargon, or incomplete names, the entities extracted by the model may not map to real financial entities (such as actual fund names or stock codes). This directly hurts the accuracy of downstream processing and the user experience. We therefore need a 'Slot Extraction' engineering capability that verifies recognition results and maps them to canonical entities, improving both the interaction experience and the recall rate.
MCP Context Explosion: The platform has launched 15 specialized MCPs so far, and the number will keep growing. Connecting too many MCPs significantly inflates the context processed for each request, which puts pressure on model performance and resource consumption. To address this, we need an intelligent MCP routing mechanism that selects only the MCP services relevant to the user's request and avoids loading unnecessary context.
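The filtering idea can be sketched as follows. The registry entries and their descriptions are invented for illustration, and plain keyword overlap is used here only as a simple stand-in for the relevance scoring; a real router would more likely rely on embeddings or a classifier.

```python
import re

# Hypothetical MCP registry: name -> short capability description.
MCP_REGISTRY = {
    "fund-info": "query fund net value, fund manager and fund holdings",
    "stock-quote": "real-time stock price quote and trading volume",
    "macro-data": "macro economic indicators such as gdp and cpi",
}


def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_mcps(query: str, registry: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k MCP names whose descriptions overlap the query."""
    query_tokens = tokenize(query)
    scored = []
    for name, description in registry.items():
        overlap = len(query_tokens & tokenize(description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


selected = select_mcps("What is the stock price of ACME today?", MCP_REGISTRY)
print(selected)
```

Only the tool definitions of the selected MCPs would then be placed in the model's context, so the context size is bounded by `top_k` rather than by the total number of MCPs on the platform.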
Building slot extraction and intelligent MCP routing capabilities will be a key focus for SOFA AI Gateway in the second half of the year.
Thanks to the Higress open-source team; without such a great product, the rapid incubation of SOFA AI Gateway would not have been possible. Special thanks to @Chengtan for providing professional answers during the construction of SOFA AI Gateway.
If you want to learn more about Alibaba Cloud API Gateway (Higress), please click: https://higress.ai/en/