Gateway API Architecture for ACK North-South Traffic - Container Service for Kubernetes

The Gateway API is an official Kubernetes community project that defines a standard for traffic management. It covers both north-south traffic that enters the cluster, which Ingress has traditionally handled, and east-west traffic between services in a service mesh. This topic describes the core concepts of the Gateway API and the solutions that ACK provides for managing north-south traffic with it.

Core concepts

The Gateway API uses a role-oriented, layered design that decouples infrastructure provisioning, cluster O&M, and application routing, so that each role manages only the resources of its own layer. The Gateway API includes the following core resources:

GatewayClass (Infrastructure layer): Similar to IngressClass, this resource specifies which gateway controller implements the class, such as ALB or Envoy Gateway, and holds configuration shared by all gateways of that class.
Gateway (O&M layer): Defines a specific gateway instance and its listeners, including ports, protocols, and TLS settings. Each Gateway references a GatewayClass.
HTTPRoute, GRPCRoute, and other route types (Application layer): These resources define traffic routing rules, such as path matching, header modification, and traffic weighting. A route attaches to a Gateway and forwards matching requests to backend services.
Policy (Policy layer): Policy resources define additional behaviors, such as circuit breaking, rate limiting, and JWT authentication, and can be attached to a specific gateway, route, or backend service. Most policy types, including those listed here, are defined by the gateway implementation rather than by the core Gateway API.
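To make the layering concrete, the following is a minimal Python sketch that builds one object per layer as a plain dict and prints them as a JSON `List`, which `kubectl apply -f` accepts in the same way as YAML. All resource and Service names (`example-gateway`, `app-v1`, `app-v2`, and so on) are hypothetical. The controller name is the upstream Envoy Gateway default and may differ for the ACK component. The policy object uses Envoy Gateway's `BackendTrafficPolicy` as one example of an implementation-specific policy; it applies only to the Envoy-based solution, and its field names should be checked against the version installed in your cluster (older Envoy Gateway releases used a singular `targetRef`).

```python
import json

# Hypothetical names used only for this sketch.
NAMESPACE = "default"
GATEWAY_CLASS = "example-gateway-class"
GATEWAY = "example-gateway"
ROUTE = "example-route"


def gateway_class(controller_name: str) -> dict:
    """Infrastructure layer: names the controller that implements this class."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "GatewayClass",
        "metadata": {"name": GATEWAY_CLASS},  # cluster-scoped, so no namespace
        "spec": {"controllerName": controller_name},
    }


def gateway() -> dict:
    """O&M layer: one gateway instance with a single HTTP listener on port 80."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": GATEWAY, "namespace": NAMESPACE},
        "spec": {
            "gatewayClassName": GATEWAY_CLASS,  # link to the infrastructure layer
            "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
        },
    }


def http_route(stable_svc: str, canary_svc: str, canary_percent: int) -> dict:
    """Application layer: match /api and split traffic between two Services by weight."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": ROUTE, "namespace": NAMESPACE},
        "spec": {
            "parentRefs": [{"name": GATEWAY}],  # attach the route to the Gateway
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": "/api"}}],
                "backendRefs": [
                    {"name": stable_svc, "port": 8080, "weight": 100 - canary_percent},
                    {"name": canary_svc, "port": 8080, "weight": canary_percent},
                ],
            }],
        },
    }


def circuit_breaker_policy(max_connections: int) -> dict:
    """Policy layer (Envoy Gateway only): cap upstream connections for the route."""
    return {
        "apiVersion": "gateway.envoyproxy.io/v1alpha1",
        "kind": "BackendTrafficPolicy",
        "metadata": {"name": "example-circuit-breaker", "namespace": NAMESPACE},
        "spec": {
            "targetRefs": [{
                "group": "gateway.networking.k8s.io",
                "kind": "HTTPRoute",
                "name": ROUTE,  # attach the policy to the route
            }],
            "circuitBreaker": {"maxConnections": max_connections},
        },
    }


if __name__ == "__main__":
    items = [
        # Upstream Envoy Gateway default controller name; the ACK component may differ.
        gateway_class("gateway.envoyproxy.io/gatewayclass-controller"),
        gateway(),
        http_route("app-v1", "app-v2", canary_percent=10),
        circuit_breaker_policy(max_connections=1024),
    ]
    # kubectl apply -f accepts a v1 List as well as separate YAML documents.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Each object refers only to the layer directly above it by name (`gatewayClassName`, `parentRefs`, `targetRefs`), which is what lets infrastructure, O&M, and application teams own separate objects. Route weights are relative, so 90 and 10 produce a 90/10 split; expressing them as percentages is only a convention of this sketch.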

Gateway API solutions in ACK

In an ACK cluster, you can choose a Gateway API solution based on your business needs:

Solution	Gateway with Inference Extension	ALB (Application Load Balancer)
Overview	Gateway with Inference Extension is a component built on the open source Envoy Gateway project and optimized for cloud-native and AI inference scenarios. It watches Gateway API resources, dynamically creates and deletes gateways accordingly, and manages the cluster's north-south traffic. The component is not fully managed: it runs on the nodes of your cluster, you are responsible for its O&M, and it is not covered by a cloud product service-level agreement (SLA).	ALB Ingress Controller v2.17.0 and later supports the Gateway API. You configure resources such as Gateway and HTTPRoute to route external application-layer traffic to the pods behind Services in the cluster, which is how the cluster's north-south traffic is managed. The ALB Ingress Controller watches for changes to Gateway API resources and converts them in real time into listener rules, routing rules, and server group configurations on the underlying ALB instance.
Scenarios	General traffic and AI inference scenarios	General traffic scenarios
Core advantages	Standard open source Envoy architecture: Tracks the latest upstream Envoy Gateway release. The data plane is the Envoy proxy, whose mature ecosystem and high performance can handle large-scale traffic. Rich traffic management features: Supports multiple rate limiting and circuit breaking rules, powerful routing capabilities including zone-level routing, and common features such as fault injection, traffic mirroring, compression, and caching. AI inference extension: Provides model-aware load balancing for large model inference services. It supports multiple model-based scheduling policies, such as LoRA/KV Cache-aware scheduling, Waiting Request Num, and Model Priority, as well as phased releases across multiple model versions. Flexible extensibility: Because it is built on Envoy Gateway, it supports multiple extension methods, such as EnvoyFilter, Lua, Wasm, and ExtProc, to meet special traffic management needs.	Fully managed: The gateway and its components use an integrated, fully managed design, which significantly reduces O&M complexity and costs. High performance and instant elasticity: Built on Alibaba Cloud's self-developed network virtualization stack, it delivers high queries per second (QPS) throughput and concurrent connection capacity, and scales automatically with traffic load to absorb spikes. Powerful application layer routing: Designed for application layer workloads, it is deeply integrated with Container Service and supports a wide range of advanced routing rules. Rolling updates and zero downtime: Forwarding rules can be adjusted dynamically through the OpenAPI, and changes take effect in real time without restarting the instance, so service traffic is not interrupted. Broad scenario support: Suitable for persistent connections, large numbers of concurrent connections, high QPS, and traffic with pronounced peaks and troughs. It also suits scenarios that require rolling updates and hot upgrades, active zone-redundancy, and active geo-redundancy disaster recovery.
References	Manage general traffic using Gateway with Inference Extension; Manage LLM traffic using Gateway with Inference Extension	Manage general traffic using ALB
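Both solutions are controllers that reconcile the same kinds of Gateway API objects; which one handles a given Gateway is decided by the `controllerName` of the GatewayClass that the Gateway references. The following simplified Python model (not controller code) sketches that lookup. The Envoy Gateway controller name is the upstream default, and the ALB value is a hypothetical placeholder; check each component's documentation for the actual strings used in ACK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayClass:
    name: str
    controller_name: str  # corresponds to spec.controllerName


@dataclass(frozen=True)
class Gateway:
    name: str
    gateway_class_name: str  # corresponds to spec.gatewayClassName


def owning_controller(gw: Gateway, classes: list[GatewayClass]) -> Optional[str]:
    """Return the controller that should reconcile gw, or None if its class does not exist."""
    by_name = {c.name: c.controller_name for c in classes}
    return by_name.get(gw.gateway_class_name)


def gateways_for(controller_name: str, gateways: list[Gateway],
                 classes: list[GatewayClass]) -> list[str]:
    """Names of the Gateways that a given controller would pick up."""
    return [gw.name for gw in gateways if owning_controller(gw, classes) == controller_name]


ENVOY_GATEWAY = "gateway.envoyproxy.io/gatewayclass-controller"  # upstream default; ACK may differ
ALB = "example.com/alb-controller"  # hypothetical placeholder, not the real ALB controller name

if __name__ == "__main__":
    classes = [GatewayClass("envoy", ENVOY_GATEWAY), GatewayClass("alb", ALB)]
    gateways = [
        Gateway("inference-gw", "envoy"),
        Gateway("web-gw", "alb"),
        Gateway("orphan-gw", "missing-class"),
    ]
    print(gateways_for(ENVOY_GATEWAY, gateways, classes))  # ['inference-gw']
    print(gateways_for(ALB, gateways, classes))            # ['web-gw']
    print(owning_controller(gateways[2], classes))         # None
```

In this model, a Gateway whose GatewayClass does not exist is not reconciled by either controller, and switching a workload between the two solutions amounts to pointing its Gateway at a different GatewayClass while the HTTPRoute objects stay the same (implementation-specific policies still need to be replaced).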