Beyond Nginx Ingress: Higress as the Kubernetes Gateway for the AI Era

This article introduces Higress, a CNCF Sandbox Kubernetes gateway built on Envoy/Istio that replaces retiring Nginx Ingress with zero-downtime reload...

My name is Huxing, and I'm from Alibaba Cloud. Today I'm going to talk about "Beyond Nginx Ingress: Higress as the Kubernetes Gateway for the AI Era."

A quick note first: Higress joined the CNCF Sandbox a couple of days before KubeCon EU, and this was announced in the keynote on the first day. So it's a brand-new CNCF Sandbox project!

The Retirement of Nginx Ingress

Let me first talk about the retirement of Nginx Ingress. This is a very special moment: Kubernetes SIG Network and the Security Response Committee have announced the retirement of Nginx Ingress, which takes effect at the end of this month. After that, there will be no further releases, bug fixes, or security patches.

Yet, by one estimate, around 50% of cloud native environments worldwide still rely on some form of Nginx Ingress infrastructure. This is a crucial issue that needs to be addressed.

Another critical issue is that Nginx Ingress lets users inject configuration snippets through annotations, and those snippets can contain arbitrary Nginx configuration. This is flexible and powerful, but once the project is retired, critical vulnerabilities may go unpatched. That's another thing we need to address.

Challenges for Nginx-based Ingress Controllers

image_1_

At Alibaba Cloud, we used an Nginx-based Ingress before, and we faced several critical challenges. Here are some of them.

The first is traffic jitter for long-lived connections. When the Nginx Ingress configuration is updated, Nginx applies the new configuration by starting new worker processes and retiring the old ones. As a result, long-lived connections such as gRPC or WebSocket are terminated and have to be re-established. This was very painful for us, and we suffered a lot from these update issues.

Another issue is gRPC load balancing. Nginx was designed for short-lived connections, so long-lived connections, especially multiplexed HTTP/2 connections, suffer from uneven load balancing. Although Nginx provides gRPC load balancing support, when clients send multiplexed requests over a single long-lived connection, requests are distributed unevenly, which saturates some backends.
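
To make the imbalance concrete, here is a toy Python simulation that uses made-up client names, pod names, and request counts. It compares balancing per connection, where each long-lived HTTP/2 connection is pinned to one backend when it opens, with balancing per request. It is a simplified model of the effect described above, not how Nginx or Envoy is implemented.

```python
import itertools
from collections import Counter

def balance_per_connection(clients, backends):
    """Each client opens one long-lived HTTP/2 connection, which is pinned to a
    backend when it is established; every request multiplexed on it follows."""
    rr = itertools.cycle(backends)
    pinned = {client: next(rr) for client, _ in clients}
    load = Counter({b: 0 for b in backends})
    for client, n_requests in clients:
        load[pinned[client]] += n_requests
    return dict(load)

def balance_per_request(clients, backends):
    """Each request (HTTP/2 stream) is dispatched independently, round-robin."""
    rr = itertools.cycle(backends)
    load = Counter({b: 0 for b in backends})
    for _, n_requests in clients:
        for _ in range(n_requests):
            load[next(rr)] += 1
    return dict(load)

# One chatty gRPC client, one quiet client, three backend pods (made-up numbers).
clients = [("client-a", 900), ("client-b", 100)]
backends = ["pod-1", "pod-2", "pod-3"]
print(balance_per_connection(clients, backends))  # {'pod-1': 900, 'pod-2': 100, 'pod-3': 0}
print(balance_per_request(clients, backends))     # {'pod-1': 334, 'pod-2': 333, 'pod-3': 333}
```

With per-connection balancing, one pod takes 90% of the load and another sits idle. Per-request balancing spreads the same traffic almost evenly.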

The third is scalability. As the number of configurations grows, reload time grows dramatically. This becomes a real problem when scaling to thousands, or even tens of thousands, of configurations in a single Kubernetes cluster.

The Evolution of Higress

image_2_

Facing these issues, we decided to implement a new Ingress architecture, and this is how Higress came about. Let me briefly go through the major milestones leading up to open-sourcing Higress.

We decided to embrace the Istio and Envoy architecture, so Higress is built on Istio and Envoy. In 2020, we first launched Higress internally and quickly expanded it to multiple services across business units. It proved stable during Alibaba's Double 11 shopping festival, which involves an enormous amount of traffic, showing that Higress is production ready.

In 2021, we launched a cloud service based on Higress for Alibaba Cloud customers. We then combined the traffic gateway with the microservice gateway, making Higress a two-in-one gateway solution that brought a 50% cost reduction compared with the previous approach.

In 2022, we officially open sourced Higress. As more users came on board, we added features such as security, many plugins, and protocol translation, making Higress a three-in-one gateway for traffic, microservices, and security. That is a brief history of the project.

Higress Architecture

image_3_

What does the Higress architecture look like? As I said, Higress is built on Envoy and Istio, but we don't just use these open source components as they are; we have also enhanced them.

On the control plane side, we added features such as dynamic certificate management, which automatically renews your certificates before they expire. We also added custom CRDs, for example for service discovery, which is especially important for microservices: Higress can discover services not only from Kubernetes but also from popular service registries such as Consul, Nacos, and ZooKeeper.
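
As a rough illustration of multi-registry discovery, the sketch below merges endpoint snapshots from several registries into a single routing table. It assumes each registry client has already produced a snapshot; the watchers for Kubernetes, Nacos, Consul, or ZooKeeper are not modeled. The `<service>.<registry>` key format is a hypothetical convention, not necessarily the one Higress uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Endpoint:
    address: str
    port: int

def aggregate(registries):
    """Merge per-registry snapshots into one routing table.

    `registries` maps a registry name (e.g. "kubernetes", "nacos") to a snapshot
    {service_name: [(address, port), ...]}. Keys are qualified as
    "<service>.<registry>", a hypothetical naming scheme that keeps services with
    the same name in different registries from colliding.
    """
    table = {}
    for registry, snapshot in registries.items():
        for service, addrs in snapshot.items():
            table[f"{service}.{registry}"] = sorted(Endpoint(a, p) for a, p in addrs)
    return table

registries = {
    "kubernetes": {"orders": [("10.0.0.12", 8080), ("10.0.0.11", 8080)]},
    "nacos": {"orders": [("192.168.1.5", 20880)], "payments": [("192.168.1.6", 20880)]},
}
table = aggregate(registries)
print(sorted(table))  # ['orders.kubernetes', 'orders.nacos', 'payments.nacos']
```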

To help users migrate from Nginx Ingress to Higress, we also provide dual-stack support for both Nginx Ingress annotations and the new Gateway API.

On the data plane side, we added features such as protocol translation. For example, Higress can convert HTTP requests into calls to Apache Dubbo, an RPC framework popular in China. We also introduced WASM plugins, which I'll describe on the next slide. For features that WASM cannot handle, we added native support for HTTP filters written in Golang. These run natively inside the Envoy proxy and support features such as MCP and other long-lived connection scenarios that WASM plugins cannot handle. We have also added richer, multi-dimensional telemetry data.

Higress Core Features

image_4_

What are the core features of Higress? I'll highlight three.

The first is zero-downtime configuration reload, compared with Nginx Ingress. Unlike Nginx, Envoy updates routing and upstream tables entirely in memory via xDS: no process restarts, no worker cycling, no connection draining. The architecture handles configuration reloads for us without causing any downtime.
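
The principle can be sketched in a few lines of Python, assuming routes are a simple path-to-upstream map. Each update builds a new immutable table and swaps a single reference, so nothing is restarted and work already in progress keeps the snapshot it started with. This only illustrates the in-memory swap idea. It is not Envoy's xDS implementation, and the class and method names are hypothetical.

```python
import threading
from types import MappingProxyType

class RouteTable:
    """Holds an immutable route snapshot (path -> upstream cluster).

    An update builds a new snapshot and swaps a single reference. No process
    restarts and no connections are closed; a request that already holds the
    old snapshot finishes with it, while new lookups see the new routes.
    """

    def __init__(self, routes):
        self._write_lock = threading.Lock()   # serializes writers only
        self._snapshot = MappingProxyType(dict(routes))

    def snapshot(self):
        return self._snapshot                 # readers never take the lock

    def apply_delta(self, upserts=None, removals=()):
        """Apply an incremental change (upserts/removals) on top of the current snapshot."""
        with self._write_lock:
            new = dict(self._snapshot)
            for path in removals:
                new.pop(path, None)
            new.update(upserts or {})
            self._snapshot = MappingProxyType(new)

table = RouteTable({"/api": "svc-v1"})
in_flight = table.snapshot()                  # e.g. a long-running stream's view
table.apply_delta(upserts={"/api": "svc-v2", "/chat": "llm-svc"})
print(in_flight["/api"])                      # svc-v1
print(table.snapshot()["/api"])               # svc-v2
```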

The second is WASM extensibility, which I think is one of the coolest features of Higress. To extend the Envoy proxy, you traditionally have two options. You can add native filters to Envoy, which means recompiling and restarting the gateway proxy. Or you can run an external process that Envoy calls over gRPC, sending a request and getting a response back. We wanted a more flexible and more stable approach, so we added support for WASM plugins.

You can easily write WASM plugins and hot-reload them into the Envoy proxy, which makes it easy to ship new features without restarting the gateway. WASM also brings isolation: code running in the WASM sandbox cannot crash the Envoy proxy process, which protects the gateway from faults in customer code. On top of that, we provide polyglot support for writing WASM plugins: you can use Golang, Rust, C++, or JavaScript.

Another thing to know is that WASM plugins are distributed as OCI-compatible images, so you can use your existing Docker registry to distribute them. Higress supports over 100 plugins out of the box. I'll introduce some of them later, but I can't cover them all.

The third is security. Higress natively integrates a WAF with features such as bot detection, replay protection, and IP restriction. Higress also provides rich authentication strategies, including JWT, API key, HMAC, OAuth, and OIDC, all supported out of the box. We also provide guardrails such as PII detection. These are especially useful for handling AI traffic, which I'll describe later.
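
For readers unfamiliar with HMAC authentication, here is a minimal sketch of the verification a gateway performs. The consumer name, secret, and string-to-sign layout (method, path, date) are hypothetical; Higress's HMAC plugin defines its own headers and canonical format. Including the date in the signature is what lets a gateway also reject stale or replayed requests, which this sketch does not implement.

```python
import hashlib
import hmac

# Hypothetical per-consumer secrets; a gateway would load these from its config.
SECRETS = {"consumer-a": b"example-secret"}

def string_to_sign(method, path, date):
    # Hypothetical canonical form; real HMAC auth schemes define their own.
    return f"{method}\n{path}\n{date}".encode()

def sign(key, method, path, date):
    return hmac.new(key, string_to_sign(method, path, date), hashlib.sha256).hexdigest()

def verify(consumer, signature, method, path, date):
    key = SECRETS.get(consumer)
    if key is None:
        return False
    expected = sign(key, method, path, date)
    # Constant-time comparison does not leak how many characters matched.
    return hmac.compare_digest(expected, signature)

date = "Tue, 24 Mar 2026 10:00:00 GMT"
sig = sign(SECRETS["consumer-a"], "GET", "/v1/orders", date)
print(verify("consumer-a", sig, "GET", "/v1/orders", date))   # True
print(verify("consumer-a", sig, "POST", "/v1/orders", date))  # False (tampered method)
```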

Migrating from Nginx Ingress to Higress

image_5_

How do we migrate from Nginx Ingress to Higress? This is a core issue we want to address. When users migrate from Nginx to Higress, we want the migration to be seamless, without any changes to their existing configuration.

To achieve that, we support the most common Nginx Ingress annotations. You can keep your existing annotations without any changes. Behind the scenes, Higress translates these Nginx annotations into Istio and Envoy proxy configuration.

I won't go through every feature, but we support common scenarios such as canary releases, rewrites, CORS, retries, rate limiting, and security. These annotations are already supported. Some of them, such as rate limiting, cannot easily be translated into Istio configuration, so we implemented them as WASM plugins that provide the same behavior.
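
To give a feel for the annotation translation, the sketch below maps a few real ingress-nginx annotations (`canary`, `canary-weight`, `rewrite-target`, `proxy-next-upstream-tries`) onto a simplified route dictionary. Its field names loosely follow an Istio VirtualService. It is a hypothetical illustration of the idea, not Higress's translation code, and it skips several details noted in the docstring.

```python
PREFIX = "nginx.ingress.kubernetes.io/"

def translate(annotations, stable_host, canary_host=None):
    """Map a few ingress-nginx annotations onto a simplified route dict whose
    field names loosely follow an Istio VirtualService HTTP route.

    Simplifications: in ingress-nginx the canary annotations live on a separate
    canary Ingress; here both backends are passed in directly, and regex capture
    groups in rewrite-target are not handled.
    """
    ann = {k[len(PREFIX):]: v for k, v in annotations.items() if k.startswith(PREFIX)}
    route = {"route": [{"destination": {"host": stable_host}, "weight": 100}]}

    if ann.get("canary") == "true" and canary_host:
        weight = int(ann.get("canary-weight", "0"))
        route["route"] = [
            {"destination": {"host": stable_host}, "weight": 100 - weight},
            {"destination": {"host": canary_host}, "weight": weight},
        ]
    if "rewrite-target" in ann:
        route["rewrite"] = {"uri": ann["rewrite-target"]}
    if "proxy-next-upstream-tries" in ann:
        route["retries"] = {"attempts": int(ann["proxy-next-upstream-tries"])}
    return route

annotations = {
    "nginx.ingress.kubernetes.io/canary": "true",
    "nginx.ingress.kubernetes.io/canary-weight": "20",
    "nginx.ingress.kubernetes.io/rewrite-target": "/",
    "nginx.ingress.kubernetes.io/proxy-next-upstream-tries": "3",
}
print(translate(annotations, "app-stable", "app-canary"))
```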

Bridging Ingress API to Gateway API

image_6_

People also want to embrace the Gateway API. Higress supports Nginx Ingress annotations and the Gateway API at the same time, which lets users tackle the two concerns separately. Users can first migrate from Nginx Ingress to Higress without changing any configuration, and then gradually migrate their old configurations and annotations to the Gateway API. When both configurations exist, the Gateway API configuration takes precedence.
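
The precedence rule can be pictured as a merge keyed by host and path. The keying scheme below is an assumption made for illustration. The sketch only demonstrates the behavior described above: Gateway API entries win on conflict, and Ingress-only routes keep working during migration.

```python
def effective_routes(ingress_routes, gateway_routes):
    """Combine routes from both APIs, keyed by (host, path).

    A key defined by the Gateway API overrides the same key from an Ingress;
    keys defined only by an Ingress pass through unchanged, so routes can be
    migrated one at a time.
    """
    merged = {key: ("ingress", backend) for key, backend in ingress_routes.items()}
    for key, backend in gateway_routes.items():
        merged[key] = ("gateway-api", backend)   # Gateway API takes precedence
    return merged

ingress = {("shop.example.com", "/"): "web-v1", ("shop.example.com", "/api"): "api-v1"}
gateway = {("shop.example.com", "/api"): "api-v2"}
for key, (source, backend) in sorted(effective_routes(ingress, gateway).items()):
    print(key, source, backend)
# ('shop.example.com', '/') ingress web-v1
# ('shop.example.com', '/api') gateway-api api-v2
```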

User Case Study: Sealos

image_7_

Higress is not only used at Alibaba. Here is a case study from Sealos, a cloud provider serving around 200,000 users.

Before using Higress, they suffered from configuration reloads that could take minutes at peak. When they tried to scale past 10,000, domain activation delays kept them from scaling well.

After migrating from Nginx Ingress to Higress, their peak reload time dropped dramatically, from 30 minutes to under 5 seconds, and memory consumption fell to about one tenth of what it was before. Sealos has published a blog post about this migration on its official website.

Higress is also used by many other companies, such as Didi, Qunar, and PayPal. It is a production-ready solution for many companies.

The AI-Native Paradigm Shift in Enterprise Gateway Architecture

image_8_

Let's talk about the paradigm shift from traditional API gateways to AI-native gateways, which is driven by how different the traffic is.

Traditional API gateways serve short-lived HTTP requests and responses. AI traffic, in contrast, involves long-running connections using Server-Sent Events (SSE), streaming, and WebSocket. The payloads also differ: for AI traffic, the gateway must understand the request body to decide which LLM to route to, and much of its work has to be payload-aware. As for the connection lifecycle, AI connections can live for minutes to hours during long-running agent tasks, compared with milliseconds for traditional microservice traffic.
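
Here is a minimal sketch of payload-aware routing, assuming an OpenAI-style JSON body with a `model` field. The model prefixes and upstream names in `MODEL_ROUTES` are hypothetical. The point it illustrates is that the gateway must parse the request body before it can choose a backend, unlike routing on host or path alone.

```python
import json

# Hypothetical mapping from model-name prefix to upstream cluster.
MODEL_ROUTES = {
    "qwen": "dashscope-upstream",
    "gpt": "openai-upstream",
    "deepseek": "deepseek-upstream",
}

def pick_upstream(body: bytes, default: str = "default-upstream") -> str:
    """Choose an upstream from the "model" field of an OpenAI-style JSON body."""
    try:
        payload = json.loads(body)
    except ValueError:                       # invalid JSON or invalid UTF-8
        return default
    model = payload.get("model") if isinstance(payload, dict) else None
    if not isinstance(model, str):
        return default
    for prefix, upstream in MODEL_ROUTES.items():
        if model.startswith(prefix):
            return upstream
    return default

print(pick_upstream(b'{"model": "qwen-max", "messages": [], "stream": true}'))  # dashscope-upstream
print(pick_upstream(b"not json"))                                             # default-upstream
```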

To address these issues, Higress can play two roles. First, Higress can act as an LLM gateway that handles traffic to backend LLMs, providing features such as multi-model routing, multi-model failover, token-based rate limiting, semantic caching, and observability.

Higress can also act as an MCP gateway. AI agents use many tools, and they call them through MCP servers. The core feature here is automatically converting your existing APIs, described by OpenAPI specifications, into MCP servers, without writing any code.

Higress as a LLM Gateway: Core Features

image_9_

When Higress acts as an LLM gateway, there are several core features I'd like to highlight.

The first is model fallback. AI agents call many LLMs, and when the primary model fails, we don't want users to notice. Higress monitors the TTFT (time to first token) of backend models; once it exceeds a limit, Higress automatically falls back to a backup model. This ensures that users get a response rather than an error.
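
The fallback logic can be sketched with Python's `asyncio` by modeling each backend as an async generator of streamed tokens. A backend is abandoned if it errors or if its first token does not arrive within `ttft_limit` seconds. The function and backend names and the timeout value are hypothetical and do not reflect Higress configuration.

```python
import asyncio

class UpstreamError(Exception):
    pass

async def _first_token(stream):
    return await stream.__anext__()

async def chat_with_fallback(prompt, backends, ttft_limit):
    """Try each (name, backend) in order. A backend is abandoned if it errors or
    if its first token does not arrive within `ttft_limit` seconds."""
    failures = []
    for name, backend in backends:
        stream = backend(prompt)  # async generator of tokens (stand-in for SSE)
        try:
            first = await asyncio.wait_for(_first_token(stream), timeout=ttft_limit)
        except asyncio.TimeoutError:
            failures.append((name, "ttft timeout"))
            await stream.aclose()
            continue
        except (UpstreamError, StopAsyncIteration) as exc:
            failures.append((name, type(exc).__name__))
            continue
        rest = [tok async for tok in stream]
        return name, first + "".join(rest)
    raise UpstreamError(f"all backends failed: {failures}")

async def slow_primary(prompt):
    await asyncio.sleep(0.5)   # first token would arrive too late
    yield "late"

async def healthy_fallback(prompt):
    for tok in ["Hello", ", ", "world"]:
        await asyncio.sleep(0.01)
        yield tok

result = asyncio.run(chat_with_fallback(
    "hi", [("primary", slow_primary), ("fallback", healthy_fallback)], ttft_limit=0.1))
print(result)  # ('fallback', 'Hello, world')
```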

image_10_

Next is token-aware rate limiting. Higress can serve different consumers that call the backend models. In a company, these consumers are usually divided into business units, each with its own token quota. When one unit reaches its quota, Higress automatically rate-limits that unit without affecting the others. The key idea is that limits are based on actual token consumption, not on the number of requests.
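
Here is a minimal in-memory sketch of per-consumer token quotas, assuming token counts come from the usage data in each LLM response. The business-unit names and limits are made up, and window reset is omitted. A real gateway would typically keep the counters in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field

@dataclass
class TokenQuota:
    """Per-consumer token budgets for one time window (window reset omitted)."""
    limits: dict                                   # consumer -> tokens per window
    used: dict = field(default_factory=dict)       # consumer -> tokens consumed

    def admit(self, consumer: str) -> bool:
        # Token counts are only known after the response, so admission checks
        # the remaining budget and a single request may overshoot it slightly.
        return self.used.get(consumer, 0) < self.limits.get(consumer, 0)

    def record(self, consumer: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Charge actual consumption, e.g. taken from the response's usage data.
        self.used[consumer] = self.used.get(consumer, 0) + prompt_tokens + completion_tokens

quota = TokenQuota(limits={"bu-search": 1000, "bu-ads": 1000})
quota.record("bu-search", prompt_tokens=300, completion_tokens=750)
print(quota.admit("bu-search"))  # False: 1050 tokens used of 1000
print(quota.admit("bu-ads"))     # True: other units are unaffected
```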

image_11_

The third is semantic caching. We don't want similar questions to be sent to the backend every time. Higress's AI cache plugin extracts the semantic content of a user's query and the corresponding response and stores it in a Redis cache. When a later request is similar to a previous one, the cached result can be returned. This avoids sending every request to the backend LLM and saves cost.
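
The core lookup can be sketched as follows. To keep it self-contained, the embedding is a toy bag-of-words vector and the cache is an in-process list standing in for Redis. A real deployment would use an embedding model and a vector-capable store. The 0.8 similarity threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold          # hypothetical similarity cut-off
        self.entries = []                   # (vector, response); stands in for Redis

    def get(self, query):
        """Return the cached response of the most similar query, if similar enough."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.put("What is the capital of France?", "Paris")
print(cache.get("What is the capital city of France?"))  # Paris (similarity ~0.93)
print(cache.get("How do I bake bread?"))                 # None -> forward to the LLM
```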

image_12_

For observability, Higress provides many metrics, such as TTFT, TPOT (time per output token), reasoning tokens, and cached tokens, which are essential for cost control and debugging. It supports tracing to various backends, including OpenTelemetry, Apache SkyWalking, and Zipkin. It can also dynamically log any fields you choose; for example, you can add session IDs, reasoning content, and tool calls to the logs for later analysis. All of this is implemented in WASM plugins and can be configured dynamically.
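
As a worked example of the two latency metrics, the function below computes TTFT and TPOT from a request's start time and each streamed token's arrival time. It uses one common definition of TPOT: the average gap between output tokens after the first. The timestamps are made up and expressed in milliseconds.

```python
def latency_metrics(request_start, token_times):
    """Return (TTFT, TPOT) from the request start time and each token's arrival time."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    n = len(token_times)
    # TPOT averages the gaps between output tokens after the first one,
    # i.e. (last - first) / (n - 1); it is undefined for a single token.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else None
    return ttft, tpot

# Request sent at t=0 ms; five tokens stream back.
ttft, tpot = latency_metrics(0, [500, 550, 600, 650, 700])
print(ttft, tpot)  # 500 50.0
```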

Higress as a MCP Gateway: Core Features

image_13_

As an MCP gateway, Higress can automatically convert an existing API described by an OpenAPI specification into an MCP server. The openapi-to-mcp tool reads your existing OpenAPI or Swagger specifications and generates fully functional MCP tool configurations, automatically mapping OpenAPI parameter locations to MCP tool arguments.
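
The mapping can be sketched as follows. Each OpenAPI parameter becomes a tool argument that remembers its `position` (path, query, header, cookie, or JSON body), so the gateway knows where to place it when it rebuilds the HTTP request. The output shape is a simplified, hypothetical format, not the exact configuration openapi-to-mcp emits, and `$ref` resolution is omitted.

```python
def operation_to_tool(path, method, op):
    """Convert one OpenAPI 3 operation into a simplified MCP tool description."""
    args = []
    for param in op.get("parameters", []):
        args.append({
            "name": param["name"],
            "type": param.get("schema", {}).get("type", "string"),
            # OpenAPI requires path parameters to be required.
            "required": param.get("required", param["in"] == "path"),
            "position": param["in"],          # path | query | header | cookie
        })
    schema = (op.get("requestBody", {}).get("content", {})
              .get("application/json", {}).get("schema", {}))
    required_props = set(schema.get("required", []))
    for name, prop in schema.get("properties", {}).items():
        args.append({"name": name, "type": prop.get("type", "string"),
                     "required": name in required_props, "position": "body"})
    return {
        "name": op.get("operationId", f"{method}_{path}"),
        "description": op.get("summary", ""),
        "args": args,
        "request": {"method": method.upper(), "url": path},
    }

get_order = {
    "operationId": "getOrder",
    "summary": "Get an order by ID",
    "parameters": [
        {"name": "orderId", "in": "path", "required": True, "schema": {"type": "string"}},
        {"name": "verbose", "in": "query", "schema": {"type": "boolean"}},
    ],
}
tool = operation_to_tool("/orders/{orderId}", "get", get_order)
print([(a["name"], a["position"], a["required"]) for a in tool["args"]])
# [('orderId', 'path', True), ('verbose', 'query', False)]
```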

AI agents don't want to read the very complicated JSON that your API returns. Higress can extract only the information the AI agent needs from the JSON and discard the rest. This is set up with a simple YAML configuration, and Higress performs the conversion for you.
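
Here is a minimal sketch of the extraction step. It assumes fields are selected with dotted paths such as `data.orders.status`, a hypothetical syntax chosen for illustration rather than Higress's YAML schema. A path segment applied to a list is mapped over the list's elements.

```python
def pick(data, paths):
    """Keep only the fields named by dotted paths such as "data.orders.status".
    A path segment applied to a list is mapped over its elements; a missing
    field yields None."""
    def get(node, parts):
        if not parts:
            return node
        if isinstance(node, list):
            return [get(item, parts) for item in node]
        if isinstance(node, dict) and parts[0] in node:
            return get(node[parts[0]], parts[1:])
        return None
    return {path: get(data, path.split(".")) for path in paths}

response = {
    "meta": {"requestId": "abc", "server": "node-7"},
    "data": {"orders": [
        {"id": 1, "status": "shipped", "internal": {"shard": 3, "trace": "x1"}},
        {"id": 2, "status": "pending", "internal": {"shard": 5, "trace": "x2"}},
    ]},
}
print(pick(response, ["data.orders.id", "data.orders.status"]))
# {'data.orders.id': [1, 2], 'data.orders.status': ['shipped', 'pending']}
```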

image_14_

We call this approach payload morphing. It can dramatically reduce the amount of returned data sent to the LLM. It can also reduce LLM hallucinations, because LLMs don't handle complicated JSON well. Deeply nested JSON with many cross-references, which your legacy OpenAPI specs might allow, is not very AI-friendly. Higress can automatically convert complicated JSON into Markdown, which is much easier for AI to read. This can also be configured dynamically.
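
The JSON-to-Markdown step can be sketched in the same spirit: flatten nested objects into dotted keys and render the selected columns as a Markdown table. The helper names and sample data are hypothetical. A real converter would also need to handle lists nested inside records, which this sketch renders as plain strings.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def to_markdown(records, columns):
    """Render a list of flat records as a Markdown table."""
    def cell(value):
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")  # keep the table intact
    lines = ["| " + " | ".join(columns) + " |",
             "| " + " | ".join("---" for _ in columns) + " |"]
    for rec in records:
        lines.append("| " + " | ".join(cell(rec.get(c)) for c in columns) + " |")
    return "\n".join(lines)

orders = [
    {"id": 1, "customer": {"name": "Ana", "tier": "gold"}, "total": 42.5},
    {"id": 2, "customer": {"name": "Bo", "tier": "silver"}, "total": 7},
]
print(to_markdown([flatten(o) for o in orders], ["id", "customer.name", "total"]))
# | id | customer.name | total |
# | --- | --- | --- |
# | 1 | Ana | 42.5 |
# | 2 | Bo | 7 |
```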

Case Study: Ctrip and Ant Group

image_15_

Let me share two more case studies. Ctrip, the Chinese online travel provider also known as Trip.com, built its internal AI gateway entirely on Higress. The gateway is the sole entry point for all internal AI traffic, standardizing disparate LLM interfaces and managing MCP services over SSE and Streamable HTTP. Key outcomes include Bearer Token authentication, cross-business-unit token cost allocation, and automated zero-downtime failover.

Ant Group is best known in China for its Alipay online payment service. Its SOFA AI Gateway is built on the open-source Higress kernel and manages massive enterprise topologies. The gateway focuses on intelligent traffic routing, unified protocol access, and advanced rate limiting.

Higress has also been widely adopted by companies including Kuaishou and DJI; more users are listed on the Higress website.

Higress Joins CNCF Sandbox!

image_16_

On March 25th, the CNCF published a blog post announcing that Higress had officially joined the CNCF Sandbox.

If you're interested in this project, check out the website and the GitHub repository. The repository will soon be transferred to the CNCF. Contributions are more than welcome.

Roadmap

image_17_

Finally, this is the roadmap of this project.

Currently, Higress serves as an Ingress gateway, and we will continue working on support for the Gateway API and the Gateway API Inference Extension, placing Higress at the forefront of standardizing AI workload routing. Initial support is already available but not enabled by default, so you can try it out. We are also working on WASM plugin support for the Gateway API Inference Extension.

Looking ahead, we are pioneering next-generation protocol support, including WebRTC for real-time, bidirectional AI applications such as the OpenAI Realtime API.

Higress is a mature, battle-tested, production-ready project. Don't hesitate to check it out!
