Gateway with Inference Extension is based on Envoy Gateway. You can adjust Gateway parameters, such as the service type, the number of deployment replicas, and resource requests and limits, by modifying the EnvoyProxy resource configuration. This topic describes how to configure the number of replicas and resource usage at two scopes: for a single Gateway, or for all Gateways in a GatewayClass.
Configuration description
When you use the Gateway with Inference Extension component to manage generative AI inference services, you create GatewayClass and Gateway resources.
You can define the runtime parameters of a Gateway, such as the number of replicas and resource usage, by associating an EnvoyProxy resource. You can associate the resources in two ways:
Fine-grained configuration (for a single Gateway): Directly associate an EnvoyProxy resource with a specific Gateway resource to configure the Gateway independently.
Unified configuration (for an entire GatewayClass): Associate an EnvoyProxy resource with a GatewayClass. This way, all Gateway resources in this GatewayClass that are not independently configured inherit these unified resource parameters.
If both configurations exist, the parameters of the independently configured Gateway resource take precedence.
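The precedence rule can be illustrated with a minimal sketch. The names class-proxy-config, gateway-a, gateway-b, and gateway-b-proxy-config are hypothetical, and both EnvoyProxy resources are assumed to already exist in the default namespace.

```yaml
# Class-wide default: applies to every Gateway in this class
# that does not reference its own EnvoyProxy.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: class-proxy-config        # hypothetical name
    namespace: default
---
# gateway-a has no infrastructure.parametersRef,
# so it inherits the parameters from class-proxy-config.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-a                   # hypothetical name
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
# gateway-b references its own EnvoyProxy,
# which takes precedence over the class-wide one.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-b                   # hypothetical name
  namespace: default
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: gateway-b-proxy-config  # hypothetical name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```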
Specify the configuration for a Gateway
You can configure the number of replicas and resource usage for a Gateway by referencing an EnvoyProxy resource in the infrastructure field of the Gateway resource. The following is an example:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: custom-proxy-config
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
In the preceding example, the infrastructure field of the Gateway resource uses the parametersRef field to reference an EnvoyProxy resource named custom-proxy-config. This configures the number of replicas and resource usage.
A Gateway resource can only reference an EnvoyProxy resource in the same namespace.
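As a sketch of the same-namespace rule, the following places a Gateway and its EnvoyProxy together in a hypothetical namespace named inference. The namespace and the resource names are assumptions for illustration. The parametersRef under infrastructure is a local reference that takes only group, kind, and name, so the EnvoyProxy is looked up in the namespace of the Gateway.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway           # hypothetical name
  namespace: inference              # hypothetical namespace
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:                  # no namespace field: resolved in "inference"
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: inference-proxy-config  # hypothetical name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: inference-proxy-config
  namespace: inference              # must match the namespace of the Gateway
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
```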
Specify the configuration for a GatewayClass
You can also configure the number of replicas and resource usage for all Gateway resources that belong to a GatewayClass by referencing an EnvoyProxy resource in the GatewayClass resource. The following is an example:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: default
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
In the preceding example, the parametersRef field of the GatewayClass resource references an EnvoyProxy resource named custom-proxy-config to configure the number of replicas and resource usage.
Complete configuration and common fields of the EnvoyProxy resource
The following code provides a complete configuration example for an EnvoyProxy resource:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2 # If envoyHpa is also configured, you do not need to configure replicas.
        strategy:
          rollingUpdate:
            maxSurge: 2
            maxUnavailable: 1
        pod:
          affinity: ...
          tolerations: ...
          nodeSelector: ...
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
      envoyService:
        annotations:
          key: value
        labels:
          key: value
        type: LoadBalancer
        loadBalancerClass: ...
        externalTrafficPolicy: Cluster # or Local
      envoyHpa:
        minReplicas: 1
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80
          - type: Resource
            resource:
              name: memory
              target:
                type: Utilization
                averageUtilization: 80
      envoyPDB:
        minAvailable: 1
If you configure both envoyDeployment and envoyHpa, you do not need to configure replicas under envoyDeployment.
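A minimal sketch of the autoscaled case, assuming a hypothetical EnvoyProxy named autoscaled-proxy-config: replicas is left out of envoyDeployment so that the HPA alone decides the replica count. The container keeps its CPU request because utilization-based HPA metrics are computed relative to the requested amount.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: autoscaled-proxy-config     # hypothetical name
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        # replicas is intentionally omitted; envoyHpa manages the replica count.
        container:
          resources:
            requests:
              cpu: 500m             # needed for the CPU utilization target below
              memory: 1Gi
      envoyHpa:
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80
```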
This section lists only some common fields. For the complete definition of the EnvoyProxy resource, see EnvoyProxy.
Field | Type | Required | Description |
envoyDeployment | | No | The workload configuration for the custom Gateway. |
envoyService | | No | The service configuration for the custom Gateway. |
envoyHpa | | No | The Horizontal Pod Autoscaler (HPA) configuration for the custom Gateway. |
envoyPDB | | No | The PodDisruptionBudget (PDB) configuration for the custom Gateway. |
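Because every field in the table is optional, an EnvoyProxy resource can set only the part that needs to change. The following sketch, under the assumption that a cluster-internal Service is wanted, overrides only the service type and leaves the workload settings at their defaults. The name internal-proxy-config is hypothetical, and ClusterIP is used here as an assumed alternative to the LoadBalancer type shown earlier.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: internal-proxy-config       # hypothetical name
spec:
  provider:
    type: Kubernetes
    kubernetes:
      # Only envoyService is set; envoyDeployment, envoyHpa, and envoyPDB
      # are omitted and keep their default behavior.
      envoyService:
        type: ClusterIP             # assumed alternative to LoadBalancer
```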