Container Service for Kubernetes: Gateway API overview

Last Updated: Feb 09, 2026

The Gateway API is the official traffic management standard of the Kubernetes community. It supports both Ingress-based north-south traffic management and Service Mesh-based east-west traffic management. This topic describes the core concepts of the Gateway API and its solutions for north-south traffic management.

Core concepts

The Gateway API uses a role-oriented, layered design that decouples infrastructure provisioning, cluster operations, and application routing. It includes the following core resources:

  • GatewayClass (Infrastructure layer): Similar to IngressClass. It defines the gateway controller type, such as ALB or Envoy Gateway, and general configurations.

  • Gateway (Operations layer): Defines a specific gateway instance and its network listener rules, such as the port, protocol, and TLS configurations.

  • HTTPRoute, GRPCRoute, etc. (Application layer): Define specific traffic routing rules, such as path matching, header modification, and traffic weighting, and specify a backend service.

  • Policy (Policy layer): Defines a set of specific configurations or behaviors, such as circuit breaking, rate limiting, and JWT authentication. Policies can be attached to a specified gateway, route, or backend service.
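The layering described above can be illustrated with a minimal set of manifests. This is a sketch based on the upstream Gateway API `gateway.networking.k8s.io/v1` resources, not an ACK-specific configuration. The controller name `example.com/gateway-controller`, the resource names, the hostname, the TLS Secret, and the backend Services `api-v1` and `api-v2` are all hypothetical placeholders. The actual controller name depends on the solution you install. Policy resources are implementation-specific and are therefore omitted here.

```yaml
# Infrastructure layer: declares which controller implements gateways of this class.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class                                # hypothetical name
spec:
  controllerName: example.com/gateway-controller     # placeholder, set by your gateway controller
---
# Operations layer: a gateway instance with its listeners (port, protocol, TLS).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: default
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: demo-example-com-tls                   # hypothetical TLS Secret
---
# Application layer: path matching, header modification, and weighted backends.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
  namespace: default
spec:
  parentRefs:
  - name: example-gateway                            # attaches the route to the Gateway above
  hostnames:
  - "demo.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: x-env
          value: demo
    backendRefs:
    - name: api-v1                                   # hypothetical Service, receives about 90% of traffic
      port: 8080
      weight: 90
    - name: api-v2                                   # hypothetical Service, receives about 10% of traffic
      port: 8080
      weight: 10
```

Each resource maps to one role: a platform provider typically owns the GatewayClass, a cluster operator owns the Gateway, and an application developer owns the HTTPRoute. Keeping them separate lets each role change its own resource without editing the others.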

Gateway API solutions in ACK

In an ACK cluster, you can select a suitable Gateway API solution based on your business needs:

Gateway with Inference Extension

Overview: Gateway with Inference Extension is a component built on the open source Envoy Gateway project. It is optimized for cloud-native and AI inference scenarios. It listens for Gateway API resources to dynamically create and delete gateways, managing north-south traffic for the cluster. This component is unmanaged. It is deployed on the nodes of your cluster and requires self-maintenance. It does not come with a cloud product Service-Level Agreement (SLA) guarantee.

Scenarios: General traffic and AI inference scenarios.

Core advantages:

  • Open source Envoy standard architecture: Stays consistent with the latest upstream version from the Envoy Gateway community. It is built on the Envoy proxy. Its mature ecosystem and high performance help you easily handle large-scale traffic challenges.

  • Rich traffic management features: Supports multiple rate limiting and circuit breaking rules. Supports powerful routing capabilities, including zone-level routing. Supports common features such as fault injection, traffic mirroring, compression, and caching.

  • AI inference extension: Provides model-aware load balancing for large model inference services. Supports various model-based scheduling policies, such as LoRA/KV Cache-aware scheduling, Waiting Request Num, and Model Priority. Supports phased release capabilities for multiple model versions.

  • Flexible extensibility: Natively integrates with Envoy Gateway to support various extension methods like EnvoyFilter, Lua, Wasm, and ExtProc to meet special traffic management needs.

ALB (Application Load Balancer)

Overview: ALB Ingress Controller has supported Gateway API since version v2.17.0. Configure resources such as Gateway and HTTPRoute to route external Layer 7 traffic to workloads (pods) managed by services within the cluster. This manages the cluster's north-south traffic. The ALB Ingress Controller listens for changes to Gateway API resources and converts them in real time into listener rules, routing rules, and server group configurations for the underlying ALB instance.

Scenarios: General traffic scenarios.

Core advantages:

  • Fully managed: Uses an integrated, fully managed design for both the gateway and its components. This significantly reduces O&M complexity and costs.

  • High performance and instant elasticity: Built on Alibaba Cloud's self-developed network virtualization stack. It delivers excellent queries per second (QPS) throughput and concurrent connection processing performance. It also supports automatic scale-out based on traffic loads to handle traffic spikes.

  • Powerful application layer routing: Designed for application layer load scenarios. It is deeply integrated with container service and supports a wide range of advanced routing rules.

  • Hot updates and zero downtime: Supports hot updates for configurations. Dynamically adjust forwarding rules through OpenAPI. Changes take effect in real time without restarting instances, ensuring zero downtime for service traffic.

  • Wide range of supported scenarios: Suitable for scenarios such as persistent connections, high concurrent connections, high QPS, fluctuating traffic, reliance on hot updates and hot upgrades, and active zone-redundancy or active geo-redundancy disaster recovery.

References

Manage general traffic using ALB