Custom Gateway configuration - Container Service for Kubernetes

Gateway with Inference Extension is based on Envoy Gateway. You can adjust Gateway parameters, such as the Service type, the number of Deployment replicas, and the resource requests and limits, by modifying the EnvoyProxy resource that is associated with the Gateway. This topic describes how to configure the number of replicas and resource usage for Gateways with different scopes.

Configuration description

When you use the Gateway with Inference Extension component to manage generative AI inference services, you create GatewayClass and Gateway resources.

You can define the runtime parameters of a Gateway, such as the number of replicas and resource usage, by associating an EnvoyProxy resource. You can associate the resources in two ways:

Fine-grained configuration (for a single Gateway): Directly associate an EnvoyProxy resource with a specific Gateway resource to configure the Gateway independently.
Unified configuration (for an entire GatewayClass): Associate an EnvoyProxy resource with a GatewayClass. All Gateway resources in this GatewayClass that are not independently configured then inherit these unified resource parameters.

Important

If both configurations exist for a Gateway, the EnvoyProxy resource that the Gateway references takes precedence over the one that its GatewayClass references.
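The following is a minimal sketch of a setup in which both associations exist. The resource names class-proxy-config, gateway-proxy-config, and eg-custom are hypothetical and chosen only for illustration, and the default namespace is an assumption.

```yaml
# Unified configuration: the GatewayClass references class-proxy-config (hypothetical name).
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: class-proxy-config
    namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: class-proxy-config
  namespace: default
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2   # class-wide value
---
# Fine-grained configuration: this Gateway references gateway-proxy-config (hypothetical name).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-custom
  namespace: default
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: gateway-proxy-config
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: gateway-proxy-config
  namespace: default
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 3   # Gateway-specific value, expected to win for eg-custom
```

Under the precedence rule above, eg-custom should run with three replicas, while other Gateways in the eg GatewayClass that have no reference of their own should use the class-wide value of two.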

Specify the configuration for a Gateway

You can configure the number of replicas and resource usage for a Gateway by referencing an EnvoyProxy resource in the infrastructure field of the Gateway resource. The following is an example:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: custom-proxy-config
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi

In the preceding example, the infrastructure field of the Gateway resource uses the parametersRef field to reference an EnvoyProxy resource named custom-proxy-config. With this configuration, the Gateway runs two replicas, and each Envoy container requests 500m of CPU and 1 GiB of memory, with limits of 1 CPU and 2 GiB of memory.

Important

A Gateway resource can reference only an EnvoyProxy resource that is in the same namespace as the Gateway. The parametersRef field under infrastructure does not accept a namespace.
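To make the namespace constraint explicit, the following sketch places both resources in the same namespace. The namespace name inference is an assumption used only for illustration. Substitute the namespace in which your Gateway is deployed.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: inference            # hypothetical namespace
spec:
  gatewayClassName: eg
  infrastructure:
    parametersRef:                # local reference: group, kind, and name only
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: custom-proxy-config
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: inference            # must match the namespace of the Gateway
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
```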

Specify the configuration for a GatewayClass

You can also configure the number of replicas and resource usage for all Gateway resources that belong to a GatewayClass by referencing an EnvoyProxy resource in the GatewayClass resource. The following is an example:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: default
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi

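In the preceding example, the parametersRef field of the GatewayClass resource references an EnvoyProxy resource named custom-proxy-config in the default namespace to configure the number of replicas and resource usage. Because a GatewayClass is a cluster-scoped resource, the reference must specify the namespace of the EnvoyProxy resource.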
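A Gateway inherits the GatewayClass-level configuration simply by omitting its own reference. The following sketch assumes that the eg GatewayClass from the preceding example exists. The Gateway name eg-inherited is hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-inherited              # hypothetical name
spec:
  gatewayClassName: eg            # the GatewayClass that references custom-proxy-config
  # No infrastructure.parametersRef is set, so the Gateway is expected to use
  # the replicas and resources defined at the GatewayClass level.
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```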

Complete configuration and common fields of the EnvoyProxy resource

The following code provides a complete configuration example for an EnvoyProxy resource:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2  # If envoyHpa is also configured, you do not need to configure replicas.
        strategy:
          rollingUpdate:
            maxSurge: 2
            maxUnavailable: 1
        pod:
          affinity: ...
          tolerations: ...
          nodeSelector: ...
        container:
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
      envoyService:
        annotations:
          key: value
        labels:
          key: value
        type: LoadBalancer
        loadBalancerClass: ...
        externalTrafficPolicy: Cluster # or Local
      envoyHpa:
        minReplicas: 1
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
      envoyPDB:
        minAvailable: 1

Important

If you configure both envoyDeployment and envoyHpa, you do not need to configure replicas under envoyDeployment, because the HPA manages the number of replicas.
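The following sketch shows an EnvoyProxy resource that leaves scaling to the HPA. The resource name autoscaled-proxy-config and the replica bounds are assumptions for illustration. The CPU request is kept because a Utilization target is calculated as a percentage of the container's resource request.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: autoscaled-proxy-config   # hypothetical name
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        # replicas is intentionally omitted because envoyHpa is configured.
        container:
          resources:
            requests:
              cpu: 500m           # basis for the CPU utilization percentage
              memory: 1Gi
      envoyHpa:
        minReplicas: 2            # illustrative bounds, tune for your workload
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
```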

This section lists only some common fields, all of which are set under spec.provider.kubernetes. For the complete definition of the EnvoyProxy resource, see EnvoyProxy.

Field	Type	Required	Description
envoyDeployment	KubernetesDeploymentSpec	No	The workload configuration for the custom Gateway.
envoyService	KubernetesServiceSpec	No	The service configuration for the custom Gateway.
envoyHpa	KubernetesHorizontalPodAutoscalerSpec	No	The Horizontal Pod Autoscaler (HPA) configuration for the custom Gateway.
envoyPDB	KubernetesPodDisruptionBudgetSpec	No	The PodDisruptionBudget (PDB) configuration for the custom Gateway.
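Because none of the fields in the preceding table is required, an EnvoyProxy resource can set only the fields that you want to change. As a sketch, the following resource changes only the Service type. The name internal-proxy-config and the choice of ClusterIP are assumptions for illustration.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: internal-proxy-config     # hypothetical name
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: ClusterIP           # expose the Gateway only inside the cluster
```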