Container Compute Service: Implement global throttling via Gateway with Inference Extension

Last Updated: Sep 26, 2025

Gateway with Inference Extension supports global throttling for clusters to ensure system stability under high concurrency or abnormal traffic conditions. This topic describes how to configure and use the global throttling capabilities of Gateway with Inference Extension.

How it works

Throttling is a mechanism that controls the number of requests sent to a server. It specifies the maximum number of requests a client can make in a given time period, such as 300 requests per minute or 10 requests per second.

When you enable global throttling, Gateway with Inference Extension automatically deploys a centralized throttling service that manages and enforces global throttling policies in real time. The gateway queries this service through its built-in throttling filters (such as the rate limit filter) to obtain the configured thresholds (such as requests or concurrent connections per second), and throttles incoming requests accordingly.
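
To make the idea of a centralized counter concrete, the following is a minimal Python sketch of a fixed-window limiter. It is an illustration only: this topic does not state which algorithm the throttling service uses, the class and method names are hypothetical, and an in-memory dictionary stands in for the Redis storage.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Toy stand-in for a centralized throttling service (hypothetical name)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window start) -> request count. Assumption: a real deployment
        # keeps these counters in shared storage such as Redis.
        self.counters: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Count one request for `key` and report whether it fits the limit."""
        if now is None:
            now = time.time()
        # Align the window to a multiple of its length (for example, the hour).
        window_start = int(now) - int(now) % self.window_seconds
        bucket = (key, window_start)
        self.counters[bucket] = self.counters.get(bucket, 0) + 1
        return self.counters[bucket] <= self.limit


limiter = FixedWindowLimiter(limit=3, window_seconds=3600)
results = [limiter.allow("route", now=1000.0 + i) for i in range(4)]
print(results)  # [True, True, True, False]
```

Because every gateway replica would consult the same counters, the limit applies to the cluster as a whole rather than to each replica separately.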

Prerequisites

Procedure

Step 1: Enable global throttling

The automatically deployed global throttling service requires a Redis service as its global storage. This topic uses a self-managed Redis service as an example. Alternatively, you can use Tair (Redis OSS-compatible) to quickly create a Redis instance, and then update the corresponding settings in the ack-gateway-config ConfigMap in the envoy-gateway-system namespace. For more information, see Envoy Gateway.

  1. Create a file named redis-service.yaml to deploy Redis.

    kind: Namespace
    apiVersion: v1
    metadata:
      name: redis-system
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis
      namespace: redis-system
      labels:
        app: redis
    spec:
      serviceName: "redis"
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - image: registry-cn-hangzhou.ack.aliyuncs.com/dev/redis:6.0.6-for-ack-gateway
              name: redis
              ports:
                - containerPort: 6379
              resources:
                limits:
                  cpu: 1500m
                  memory: 512Mi
                requests:
                  cpu: 200m
                  memory: 256Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
      namespace: redis-system
      labels:
        app: redis
    spec:
      ports:
        - name: redis
          port: 6379
          protocol: TCP
          targetPort: 6379
      selector:
        app: redis
    
  2. Create a file named enable-global-rate-limit.yaml. This ConfigMap configures the gateway to use your Redis instance.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ack-gateway-config
      namespace: envoy-gateway-system
    data:
      ack-gateway.yaml: |
        apiVersion: gateway.envoyproxy.io/v1alpha1
        kind: EnvoyGateway
        rateLimit:
          backend:
            type: Redis
            redis:
              url: redis.redis-system.svc.cluster.local:6379

  3. Deploy the Redis service and enable global throttling.

    kubectl apply -f redis-service.yaml
    kubectl apply -f enable-global-rate-limit.yaml

Step 2: Deploy a sample HTTPRoute

Create an HTTPRoute resource that will be the target for your throttling policy.

  1. Create a file named httproute.yaml.

    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: http-ratelimit
    spec:
      parentRefs:
      - name: eg
      hostnames:
      - ratelimit.example 
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /
        backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
    
  2. Deploy the HTTPRoute resource.

    kubectl apply -f httproute.yaml

  3. Get the public IP address of the gateway.

    export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')

Step 3: Configure and test throttling scenarios

Scenario 1: Throttling by user ID

In this scenario, we will limit requests from a specific user (identified by the header x-user-id: one) to 3 requests per hour.

  1. Create a BackendTrafficPolicy resource in a file named backendtrafficpolicy.yaml.

    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: BackendTrafficPolicy 
    metadata:
      name: policy-httproute
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: http-ratelimit
      rateLimit:
        type: Global
        global:
          rules:
          - clientSelectors:
            - headers:
              - name: x-user-id
                value: one
            limit:
              requests: 3
              unit: Hour

  2. Apply the throttling policy.

    kubectl apply -f backendtrafficpolicy.yaml

  3. Test the throttling by sending four requests with the x-user-id: one header.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:49 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 2
    x-ratelimit-reset: 731

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:50 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 1
    x-ratelimit-reset: 730

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:52 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 728

    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 727
    date: Tue, 27 May 2025 07:47:52 GMT
    transfer-encoding: chunked

    The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).

  4. Test the throttling by sending four requests with the x-user-id: two header.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:50:11 GMT
    content-length: 504
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:50:12 GMT
    content-length: 504
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:50:14 GMT
    content-length: 504
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:50:15 GMT
    content-length: 504

    All four requests returned 200, indicating that the throttling policy did not limit requests with the x-user-id: two header.
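
The matching behavior observed in this scenario can be modeled with a short Python sketch. This is a simplified illustration rather than the gateway's implementation: the function names are hypothetical, header names are assumed to be lowercased already, and the hourly window is ignored so that only the counting logic is shown.

```python
from typing import Dict, List, Optional

LIMIT = 3  # requests per hour, as in the policy of this scenario


def rate_limit_key(headers: Dict[str, str]) -> Optional[str]:
    """Return the counter key for a request, or None if the rule does not match."""
    # The rule selects only requests whose x-user-id header equals "one".
    if headers.get("x-user-id") == "one":
        return "x-user-id=one"
    return None


def simulate(requests: List[Dict[str, str]]) -> List[int]:
    """Return the HTTP status each request would receive within one window."""
    counts: Dict[str, int] = {}
    statuses = []
    for headers in requests:
        key = rate_limit_key(headers)
        if key is None:
            statuses.append(200)  # no rule matched, so the request is not throttled
            continue
        counts[key] = counts.get(key, 0) + 1
        statuses.append(200 if counts[key] <= LIMIT else 429)
    return statuses


print(simulate([{"x-user-id": "one"}] * 4))  # [200, 200, 200, 429]
print(simulate([{"x-user-id": "two"}] * 4))  # [200, 200, 200, 200]
```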

Scenario 2: Throttling all users except administrators

In this scenario, we will limit every unique user to 3 requests per hour, except for the user admin.

  1. Edit the throttling policy.

    kubectl edit BackendTrafficPolicy policy-httproute

    Update the rateLimit section as follows. The type: Distinct selector gives each distinct x-user-id value its own counter, and the invert: true flag excludes the admin user from the policy:

    ...
      rateLimit:
        type: Global
        global:
          rules:
          - clientSelectors:
            - headers:
              - type: Distinct
                name: x-user-id
              - name: x-user-id
                value: admin
                invert: true
            limit:
              requests: 3
              unit: Hour

    After you save and exit the editor, the throttling policy takes effect immediately.

  2. Test the throttling by sending four requests with the x-user-id: one header.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:49 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 2
    x-ratelimit-reset: 731

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:50 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 1
    x-ratelimit-reset: 730

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:47:52 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 728

    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 727
    date: Tue, 27 May 2025 07:47:52 GMT
    transfer-encoding: chunked

    The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).

  3. Test the throttling by sending four requests with the x-user-id: two header.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:53:38 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 2
    x-ratelimit-reset: 382
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:53:39 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 1
    x-ratelimit-reset: 381
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:53:41 GMT
    content-length: 504
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 379
    
    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 378
    date: Tue, 27 May 2025 07:53:41 GMT
    transfer-encoding: chunked

    The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).

  4. Test the throttling by sending four requests with the x-user-id: admin header.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: admin" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:57:44 GMT
    content-length: 506
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:57:45 GMT
    content-length: 506
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:57:46 GMT
    content-length: 506
    
    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 07:57:47 GMT
    content-length: 506

    All four requests returned 200, indicating that the throttling policy did not limit requests with the x-user-id: admin header.
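
The per-user behavior of this scenario can be sketched in Python as follows. The sketch is a simplified model under stated assumptions, not the gateway's implementation: the function names are hypothetical, the two header selectors are treated as conditions that must both hold, a request without the x-user-id header is assumed to match no rule, and the hourly window is ignored.

```python
from collections import Counter
from typing import Dict, List, Optional

LIMIT = 3  # requests per hour for each distinct user


def bucket_for(headers: Dict[str, str]) -> Optional[str]:
    """Return the per-user counter name, or None if the request is exempt."""
    user = headers.get("x-user-id")
    if user is None:
        return None  # assumption: no header means the Distinct selector cannot match
    if user == "admin":
        return None  # the inverted match excludes admin from the rule
    return user  # every other distinct value gets its own counter


def statuses_for(users: List[str]) -> List[int]:
    """Return the HTTP status for each request, given the sender's user ID."""
    counts: Counter = Counter()
    statuses = []
    for user in users:
        bucket = bucket_for({"x-user-id": user})
        if bucket is None:
            statuses.append(200)
            continue
        counts[bucket] += 1
        statuses.append(200 if counts[bucket] <= LIMIT else 429)
    return statuses


# Requests from "one" and "two" are counted separately.
print(statuses_for(["one", "two", "one", "one", "one", "two"]))
# [200, 200, 200, 200, 429, 200]
```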

Scenario 3: Throttling all requests

In this scenario, we will limit all requests to the route, regardless of the client, to a combined 3 requests per hour.

  1. Edit the throttling policy.

    kubectl edit BackendTrafficPolicy policy-httproute

    Update the rateLimit section with the following:

    ...
      rateLimit:
        type: Global
        global:
          rules:
          - limit:
              requests: 3
              unit: Hour

    After you save and exit the editor, the throttling policy takes effect immediately.

  2. Test the throttling by sending four requests.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:53 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 2
    x-ratelimit-reset: 3427

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:55 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 1
    x-ratelimit-reset: 3425

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:56 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 3424

    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 3423
    date: Tue, 27 May 2025 08:02:57 GMT
    transfer-encoding: chunked

    The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429), indicating that the throttling policy has taken effect.
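
The x-ratelimit-* headers in the outputs above can be read programmatically. The following Python sketch assumes the usual meaning of these headers: x-ratelimit-limit carries the current limit followed by a policy description (quota;w=window in seconds), and x-ratelimit-reset is the number of seconds until the current window ends. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional, Tuple


def parse_limit(value: str) -> Tuple[int, List[Dict[str, Optional[int]]]]:
    """Parse a value such as '3, 3;w=3600' into (limit, policies)."""
    parts = [part.strip() for part in value.split(",")]
    limit = int(parts[0])
    policies = []
    for part in parts[1:]:
        fields = [field.strip() for field in part.split(";")]
        params = dict(field.split("=", 1) for field in fields[1:])
        window = int(params["w"]) if "w" in params else None
        policies.append({"quota": int(fields[0]), "window_seconds": window})
    return limit, policies


def reset_time(date_header: str, reset_header: str) -> datetime:
    """Return the moment the window resets, derived from the date and reset headers."""
    return parsedate_to_datetime(date_header) + timedelta(seconds=int(reset_header))


limit, policies = parse_limit("3, 3;w=3600")
print(limit, policies)  # 3 [{'quota': 3, 'window_seconds': 3600}]

# First response of this scenario: 08:02:53 plus 3427 seconds.
reset_at = reset_time("Tue, 27 May 2025 08:02:53 GMT", "3427")
print(reset_at.strftime("%H:%M:%S"))  # 09:00:00
```

In the sample outputs of this topic, adding x-ratelimit-reset to the date header of the first response lands on the top of the next hour, which suggests that the hourly window is aligned to the clock hour rather than to the first request.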

Scenario 4: Throttling by client IP address

In this scenario, we will limit each unique source IP address within an IP range to 3 requests per hour.

Note

The IP range 0.0.0.0/0 is used for demonstration purposes. Modify this setting as needed.

  1. Edit the throttling policy.

    kubectl edit BackendTrafficPolicy policy-httproute

    Update the rateLimit section with the following:

    ...
      rateLimit:
        type: Global
        global:
          rules:
          - clientSelectors:
            - sourceCIDR: 
                value: 0.0.0.0/0
                type: Distinct
            limit:
              requests: 3
              unit: Hour

    After you save and exit the editor, the throttling rule takes effect immediately.

  2. Test the throttling by sending four requests.

    for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done

    Expected output:

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:53 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 2
    x-ratelimit-reset: 3427

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:55 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 1
    x-ratelimit-reset: 3425

    HTTP/1.1 200 OK
    content-type: application/json
    x-content-type-options: nosniff
    date: Tue, 27 May 2025 08:02:56 GMT
    content-length: 473
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 3424

    HTTP/1.1 429 Too Many Requests
    x-envoy-ratelimited: true
    x-ratelimit-limit: 3, 3;w=3600
    x-ratelimit-remaining: 0
    x-ratelimit-reset: 3423
    date: Tue, 27 May 2025 08:02:57 GMT
    transfer-encoding: chunked

    The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429), indicating that the throttling policy for 0.0.0.0/0 has taken effect.
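
The following Python sketch models how a distinct source-IP rule behaves when the CIDR range is narrower than 0.0.0.0/0. It is an illustration under assumptions, not the gateway's implementation: the function name and the sample addresses are hypothetical, and the hourly window is ignored.

```python
import ipaddress
from collections import Counter
from typing import List

LIMIT = 3  # requests per hour for each distinct source IP address


def status_codes(source_ips: List[str], cidr: str = "0.0.0.0/0") -> List[int]:
    """Return the HTTP status for each request, given its source IP address."""
    network = ipaddress.ip_network(cidr)
    counts: Counter = Counter()
    statuses = []
    for ip in source_ips:
        if ipaddress.ip_address(ip) not in network:
            statuses.append(200)  # outside the range, so the rule does not apply
            continue
        counts[ip] += 1  # each distinct address in the range has its own counter
        statuses.append(200 if counts[ip] <= LIMIT else 429)
    return statuses


# With 0.0.0.0/0, every IPv4 client is throttled individually.
print(status_codes(["10.0.0.1"] * 4 + ["10.0.0.2"]))
# [200, 200, 200, 429, 200]

# With a narrower range, clients outside it are not throttled by this rule.
print(status_codes(["192.168.1.1"] * 4, cidr="10.0.0.0/8"))
# [200, 200, 200, 200]
```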

(Optional) Step 4: Clean up resources

  1. Clean up the throttling policy.

    kubectl delete BackendTrafficPolicy policy-httproute

  2. Clean up other resources created in this topic.

    kubectl delete -f httproute.yaml
    kubectl delete -f redis-service.yaml
    kubectl delete -f enable-global-rate-limit.yaml