Gateway with Inference Extension enhances the Kubernetes Gateway API with the Inference Extension specifications. It provides Layer 4 and Layer 7 routing in Kubernetes and delivers intelligent load balancing for large language model (LLM) inference scenarios. This topic describes the usage notes and release notes of Gateway with Inference Extension.
Introduction
Gateway with Inference Extension is built on the Envoy Gateway project. It remains compatible with the Gateway API while integrating the Gateway API inference extensions. The add-on primarily delivers load balancing and routing capabilities for LLM inference services.
Usage notes
The installation and use of Gateway with Inference Extension depend on the custom resource definitions (CRDs) provided by the Gateway API. Before installation, make sure that the Gateway API is installed in the cluster.
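One way to verify the prerequisite is to query the cluster for the core Gateway API CRDs before installing the add-on. The sketch below is an assumption about how you might check; the CRD names shown are the standard ones defined by the upstream Gateway API project, and your cluster may require additional CRDs depending on the add-on version.

```shell
# Check whether the core Gateway API CRDs exist in the cluster.
# If any of these commands report "NotFound", install the Gateway API first.
kubectl get crd gatewayclasses.gateway.networking.k8s.io
kubectl get crd gateways.gateway.networking.k8s.io
kubectl get crd httproutes.gateway.networking.k8s.io
```

If the CRDs are missing, installing the add-on will fail or its resources will not be reconciled, so run this check before proceeding.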
Release notes
May 2025
| Version number | Release date | Description | Impact |
| --- | --- | --- | --- |
| 1.4.0-aliyun.1 | 2025-05-27 | | Gateway pod restarts will occur during updates. We recommend performing these updates during off-peak hours. |
April 2025
| Version number | Release date | Description | Impact |
| --- | --- | --- | --- |
| 1.3.0-aliyun.2 | 2025-04-07 | | Gateway pod restarts will occur during updates. We recommend performing these updates during off-peak hours. |