The Gateway with Inference Extension component is an enhanced gateway built on the Kubernetes Gateway API and its Inference Extension specification. It provides intelligent load balancing for large language model (LLM) inference scenarios and also supports standard Kubernetes Layer 4 and Layer 7 routing. This topic describes the Gateway with Inference Extension component, explains how to use it, and provides its change log.
Component information
Built on the Envoy Gateway project, the Gateway with Inference Extension component is compatible with standard Gateway API features and integrates the Gateway API Inference Extension. It primarily provides load balancing and routing for LLM inference services.
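The following is a minimal sketch of what routing through this model might look like: a standard Gateway API HTTPRoute whose backend reference points to an InferencePool, the resource introduced by the Gateway API Inference Extension. It uses the Python Kubernetes client; the namespace, gateway name, pool name, and path are hypothetical placeholders, and the InferencePool API group and version may differ depending on the extension release installed in your cluster.

```python
from kubernetes import client, config

# Load cluster credentials from the local kubeconfig.
config.load_kube_config()
api = client.CustomObjectsApi()

# HTTPRoute that forwards chat-completion traffic to an InferencePool backend
# instead of a plain Service. Names below are illustrative only.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "llm-route", "namespace": "default"},
    "spec": {
        "parentRefs": [{"name": "inference-gateway"}],
        "rules": [
            {
                "matches": [
                    {"path": {"type": "PathPrefix", "value": "/v1/chat/completions"}}
                ],
                "backendRefs": [
                    {
                        # InferencePool comes from the Gateway API Inference Extension
                        # CRDs; the group/version may vary with the installed version.
                        "group": "inference.networking.x-k8s.io",
                        "kind": "InferencePool",
                        "name": "qwen-pool",
                    }
                ],
            }
        ],
    },
}

# Create the route as a Gateway API custom resource.
api.create_namespaced_custom_object(
    group="gateway.networking.k8s.io",
    version="v1",
    namespace="default",
    plural="httproutes",
    body=http_route,
)
```

Pointing the backend reference at an InferencePool rather than an ordinary Service is what allows the gateway to apply inference-aware load balancing across model server endpoints, in line with the intelligent load balancing described above.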
Usage instructions
The Gateway with Inference Extension component depends on the CustomResourceDefinitions (CRDs) provided by the Gateway API component. Before you install the component, make sure that the Gateway API component is installed in your cluster. For more information, see Install components.
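As a quick pre-installation check, a script along the lines of the following can confirm that the Gateway API CRDs are present in the cluster. This is a sketch using the Python Kubernetes client; the CRD names listed follow the upstream Gateway API project, and the exact set may vary with the channel and version that the Gateway API component installs.

```python
from kubernetes import client, config

# Load cluster credentials from the local kubeconfig.
config.load_kube_config()
crd_api = client.ApiextensionsV1Api()

# Core Gateway API CRDs the component relies on (assumed list; adjust to your
# installed Gateway API channel and version).
required = {
    "gatewayclasses.gateway.networking.k8s.io",
    "gateways.gateway.networking.k8s.io",
    "httproutes.gateway.networking.k8s.io",
}

# Compare against the CRDs actually registered in the cluster.
installed = {
    crd.metadata.name
    for crd in crd_api.list_custom_resource_definition().items
}
missing = required - installed

if missing:
    raise SystemExit(
        f"Install the Gateway API component first; missing CRDs: {sorted(missing)}"
    )
print("Gateway API CRDs are present; the component can be installed.")
```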
For more information about how to use the Gateway with Inference Extension component, see Gateway with Inference Extension overview.
Change log
September 2025
| Version number | Change date | Changes | Impact |
| --- | --- | --- | --- |
| v1.4.0-apsara.3 | September 4, 2025 | | Upgrading from an earlier version restarts the gateway pod. Perform the upgrade during off-peak hours. |
May 2025
| Version number | Change date | Changes | Impact |
| --- | --- | --- | --- |
| v1.4.0-aliyun.1 | May 27, 2025 | | Upgrading from an earlier version restarts the gateway pod. Perform the upgrade during off-peak hours. |
April 2025
| Version number | Change date | Changes | Impact |
| --- | --- | --- | --- |
| v1.3.0-aliyun.2 | May 7, 2025 | | Upgrading from an earlier version restarts the gateway pod. Perform the upgrade during off-peak hours. |
March 2025
| Version number | Change date | Changes | Impact |
| --- | --- | --- | --- |
| v1.3.0-aliyun.1 | March 12, 2025 | | This upgrade does not affect your services. |