Gateway with Inference Extension supports global throttling for clusters to ensure system stability under high concurrency or abnormal traffic conditions. This topic describes how to configure and use the global throttling capabilities of Gateway with Inference Extension.
How it works
Throttling is a mechanism that controls the number of requests sent to a server. It specifies the maximum number of requests a client can make in a given time period, such as 300 requests per minute or 10 requests per second.
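The simplest form of this mechanism is a fixed-window counter: count each client's requests in the current time window and reject requests once the count reaches the limit. The following Python sketch is illustrative only (it is not the gateway's implementation) and mirrors the 3-requests-per-hour policies used later in this topic:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` per key."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window index) -> request count

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False  # over the limit for this window
        self.counters[bucket] = count + 1
        return True

# 3 requests per hour, as in the scenarios below
limiter = FixedWindowLimiter(limit=3, window_seconds=3600)
results = [limiter.allow("user-one", now=100.0) for _ in range(4)]
print(results)  # [True, True, True, False]
```

The fourth request in the same window is rejected, which corresponds to the HTTP 429 responses shown in the test output later in this topic.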
When you enable global throttling, Gateway with Inference Extension automatically deploys a centralized throttling service that manages and enforces global throttling policies in real time. The gateway interacts with this service through its built-in throttling filters (such as the rate limit filter), evaluates each incoming request against the preset thresholds (such as requests per second or concurrent connections), and rejects requests that exceed them.
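The reason the counters live in a centralized service rather than in each gateway replica is that every replica must increment the same counter for the limit to hold globally. The sketch below illustrates this idea only; a plain dict stands in for the Redis backend configured later, and the real throttling service issues the equivalent of Redis INCR (with an expiry) rather than this code:

```python
store = {}  # shared counter store (stand-in for Redis)

def incr(store, key: str) -> int:
    """Roughly what Redis INCR does: create-or-increment, return the new value."""
    store[key] = store.get(key, 0) + 1
    return store[key]

def allowed(store, descriptor: str, limit: int) -> bool:
    # Each request increments the shared counter for its descriptor
    # (for example, a header value) and is allowed while under the limit.
    return incr(store, descriptor) <= limit

# Two "gateway replicas" sharing one store: the limit of 3 holds globally,
# not 3 per replica.
decisions = [allowed(store, "x-user-id=one", 3) for _ in range(2)]   # replica A
decisions += [allowed(store, "x-user-id=one", 3) for _ in range(2)]  # replica B
print(decisions)  # [True, True, True, False]
```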
Prerequisites
You have installed Gateway with Inference Extension version 1.4.0 or later.
You have completed the steps in Preparations.
Procedure
Step 1: Enable global throttling
The automatically deployed global throttling service requires a Redis service as global storage. This topic uses a self-managed Redis service as an example. You can also use Tair (Redis OSS-compatible) to quickly create a Redis instance, and then update the corresponding connection settings in the ack-gateway-config ConfigMap in the envoy-gateway-system namespace. For more information, see Envoy Gateway.
Create a file named `redis-service.yaml` to deploy Redis.

```yaml
kind: Namespace
apiVersion: v1
metadata:
  name: redis-system
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: redis-system
  labels:
    app: redis
spec:
  serviceName: "redis"
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - image: registry-cn-hangzhou.ack.aliyuncs.com/dev/redis:6.0.6-for-ack-gateway
        name: redis
        ports:
        - containerPort: 6379
        resources:
          limits:
            cpu: 1500m
            memory: 512Mi
          requests:
            cpu: 200m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: redis-system
  labels:
    app: redis
spec:
  ports:
  - name: redis
    port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    app: redis
```

Create a file named `enable-global-rate-limit.yaml`. This ConfigMap configures the gateway to use your Redis instance.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-gateway-config
  namespace: envoy-gateway-system
data:
  ack-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    rateLimit:
      backend:
        type: Redis
        redis:
          url: redis.redis-system.svc.cluster.local:6379
```

Deploy the Redis service and enable global throttling.

```shell
kubectl apply -f redis-service.yaml
kubectl apply -f enable-global-rate-limit.yaml
```
Step 2: Deploy a sample HTTPRoute
Create an HTTPRoute resource that will be the target for your throttling policy.
Create a file named `httproute.yaml`.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-ratelimit
spec:
  parentRefs:
  - name: eg
  hostnames:
  - ratelimit.example
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000
```

Deploy the HTTPRoute resource.

```shell
kubectl apply -f httproute.yaml
```

Get the public IP address of the gateway.

```shell
export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')
```
Step 3: Configure and test throttling scenarios
Scenario 1: Throttling by user ID
In this scenario, we will limit requests from a specific user (identified by the header x-user-id: one) to 3 requests per hour.
Create a `BackendTrafficPolicy` resource in a file named `backendtrafficpolicy.yaml`.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: policy-httproute
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: http-ratelimit
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: x-user-id
            value: one
        limit:
          requests: 3
          unit: Hour
```

Apply the throttling policy.

```shell
kubectl apply -f backendtrafficpolicy.yaml
```

Test the throttling by sending four requests with the `x-user-id: one` header.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:49 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 731

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:50 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 730

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:52 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 728

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 727
date: Tue, 27 May 2025 07:47:52 GMT
transfer-encoding: chunked
```

The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).
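The `x-ratelimit-*` headers in the output report the throttling state: the active limit and its window, the remaining quota, and the seconds until the counter resets (the `3, 3;w=3600` form appears to follow the IETF RateLimit header fields draft format). A small Python sketch of how a client could read them; the header shapes are assumed to match the output above:

```python
def parse_ratelimit_headers(raw: str) -> dict:
    """Pull the throttling state out of a raw HTTP response header block.

    Assumes header shapes like 'x-ratelimit-limit: 3, 3;w=3600',
    where 'w=3600' is the window length in seconds.
    """
    info = {}
    for line in raw.splitlines():
        if ":" not in line:
            continue  # skip the status line
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "x-ratelimit-limit":
            # "3, 3;w=3600" -> active limit, then the policy with its window
            active, _, policy = value.partition(",")
            info["limit"] = int(active)
            if "w=" in policy:
                info["window_seconds"] = int(policy.split("w=")[1])
        elif name == "x-ratelimit-remaining":
            info["remaining"] = int(value)
        elif name == "x-ratelimit-reset":
            info["reset_seconds"] = int(value)
    return info

sample = """HTTP/1.1 200 OK
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 731"""
print(parse_ratelimit_headers(sample))
# {'limit': 3, 'window_seconds': 3600, 'remaining': 2, 'reset_seconds': 731}
```

A client can use `remaining` and `reset_seconds` to back off before hitting the 429 response.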
Test the throttling by sending four requests with the `x-user-id: two` header.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:11 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:12 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:14 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:15 GMT
content-length: 504
```

All four requests returned `200`, indicating that the throttling policy did not limit requests with the `x-user-id: two` header.
Scenario 2: Throttling all users except administrators
In this scenario, we will limit every unique user to 3 requests per hour, except for the user admin.
Edit the throttling policy.

```shell
kubectl edit BackendTrafficPolicy policy-httproute
```

Update the `rateLimit` section as follows, where the `invert: true` flag excludes the admin user from the policy:

```yaml
...
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
          - name: x-user-id
            value: admin
            invert: true
        limit:
          requests: 3
          unit: Hour
```

After saving and exiting, the throttling policy takes effect immediately.
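The combined selector semantics can be summarized as: `type: Distinct` keeps a separate counter per `x-user-id` value, and the inverted matcher makes the rule apply only when `x-user-id` is not `admin`. A Python sketch of these semantics (illustrative only, not Envoy's implementation):

```python
from collections import defaultdict

counters = defaultdict(int)  # Distinct: one counter per x-user-id value
LIMIT = 3

def allow(headers: dict) -> bool:
    user = headers.get("x-user-id")
    if user == "admin":
        return True  # invert: true -> the rule does not match admin
    counters[user] += 1
    return counters[user] <= LIMIT

one = [allow({"x-user-id": "one"}) for _ in range(4)]
admin = [allow({"x-user-id": "admin"}) for _ in range(4)]
print(one, admin)  # [True, True, True, False] [True, True, True, True]
```

Each non-admin user gets an independent 3-request budget, while admin requests are never counted, matching the test results below.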
Test the throttling by sending four requests with the `x-user-id: one` header.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:49 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 731

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:50 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 730

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:52 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 728

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 727
date: Tue, 27 May 2025 07:47:52 GMT
transfer-encoding: chunked
```

The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).
Test the throttling by sending four requests with the `x-user-id: two` header.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:38 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 382

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:39 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 381

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:41 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 379

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 378
date: Tue, 27 May 2025 07:53:41 GMT
transfer-encoding: chunked
```

The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429).
Test the throttling by sending four requests with the `x-user-id: admin` header.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: admin" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:44 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:45 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:46 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:47 GMT
content-length: 506
```

All four requests returned `200`, indicating that the throttling policy did not limit requests with the `x-user-id: admin` header.
Scenario 3: Throttling all requests
In this scenario, we will limit all requests to 3 requests per hour.
Edit the throttling policy.

```shell
kubectl edit BackendTrafficPolicy policy-httproute
```

Update the `rateLimit` section with the following:

```yaml
...
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 3
          unit: Hour
```

After saving and exiting, the throttling policy takes effect immediately.
Test the throttling by sending four requests.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:53 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 3427

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:55 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 3425

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:56 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3424

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3423
date: Tue, 27 May 2025 08:02:57 GMT
transfer-encoding: chunked
```

The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429), indicating that the throttling policy has taken effect.
Scenario 4: Throttling by client IP address
In this scenario, we will limit each unique source IP address within an IP range to 3 requests per hour.
The IP range 0.0.0.0/0 is used for demonstration purposes. Modify this setting as needed.
Edit the throttling policy.

```shell
kubectl edit BackendTrafficPolicy policy-httproute
```

Update the `rateLimit` section with the following:

```yaml
...
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - sourceCIDR:
            value: 0.0.0.0/0
            type: Distinct
        limit:
          requests: 3
          unit: Hour
```

After saving and exiting, the throttling rule takes effect immediately.
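Conceptually, the selector first checks whether the client's source address falls inside the CIDR, and `type: Distinct` then gives each matching address its own counter. A Python sketch of these semantics, using example TEST-NET addresses (illustrative only; the gateway performs this matching, not your code):

```python
import ipaddress
from collections import defaultdict

CIDR = ipaddress.ip_network("0.0.0.0/0")
LIMIT = 3
per_ip = defaultdict(int)  # Distinct: one counter per source IP

def allow(client_ip: str) -> bool:
    if ipaddress.ip_address(client_ip) not in CIDR:
        return True  # outside the range: the rule does not apply
    per_ip[client_ip] += 1
    return per_ip[client_ip] <= LIMIT

a = [allow("203.0.113.7") for _ in range(4)]   # first client IP
b = [allow("198.51.100.9") for _ in range(4)]  # second client IP, own counter
print(a, b)  # [True, True, True, False] [True, True, True, False]
```

Each source IP gets an independent 3-request budget; in the test below, all four requests come from the same client and therefore share one counter.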
Test the throttling by sending four requests.

```shell
for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done
```

Expected output:

```
HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:53 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 3427

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:55 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 3425

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:56 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3424

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3423
date: Tue, 27 May 2025 08:02:57 GMT
transfer-encoding: chunked
```

The first three requests succeeded (HTTP 200), and the fourth request was rejected (HTTP 429), indicating that the throttling policy for `0.0.0.0/0` has taken effect.
(Optional) Step 4: Clean up resources
Clean up the throttling policy.

```shell
kubectl delete BackendTrafficPolicy policy-httproute
```

Clean up the other resources created in this topic.

```shell
kubectl delete -f httproute.yaml
kubectl delete -f redis-service.yaml
kubectl delete -f enable-global-rate-limit.yaml
```