
Alibaba Cloud Service Mesh: Configure local rate limiting for services

Last Updated: Mar 11, 2026

Local throttling caps the number of requests each pod accepts, protecting services from overload during traffic spikes, resource exhaustion, or denial-of-service attacks. Each Envoy sidecar proxy enforces limits independently using the token bucket algorithm: tokens refill at a fixed interval, and each incoming request consumes one token. When no tokens remain, the proxy rejects the request with HTTP 429 (Too Many Requests).

Because throttling is enforced per pod, the effective cluster-wide limit scales with the number of replicas. For example, a service with 3 replicas and a limit of 10 requests per 60 seconds accepts up to 30 total requests across all instances.
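The per-pod token bucket described above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not Envoy's actual implementation: the hypothetical `TokenBucket` class refills the full quota once per interval, matching the 10-requests-per-60-seconds rule configured later in this document.

```python
import time

class TokenBucket:
    """Minimal sketch of the per-pod token bucket (illustrative only)."""

    def __init__(self, quota, fill_interval_s):
        self.quota = quota
        self.fill_interval_s = fill_interval_s
        self.tokens = quota                 # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now - self.last_refill >= self.fill_interval_s:
            self.tokens = self.quota        # refill at the fixed interval
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1                # each request consumes one token
            return True                     # request accepted (HTTP 200)
        return False                        # no tokens left (HTTP 429)

# One independent bucket per pod: 3 replicas each accept 10 requests
# per window, so the cluster as a whole accepts up to 30.
pods = [TokenBucket(quota=10, fill_interval_s=60) for _ in range(3)]
accepted = sum(bucket.allow() for bucket in pods for _ in range(11))
print(accepted)  # 30
```

Sending 11 requests to each of the 3 simulated pods yields 30 acceptances, matching the cluster-wide arithmetic above.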

  • Local throttling (this document) -- per pod; each instance enforces its own limit independently. Use it when you want simple, low-latency protection without external dependencies.

  • Global throttling -- shared across all instances; a central counter tracks the total. Use it when you need a precise cluster-wide limit regardless of replica count.

Prerequisites

Before you begin, make sure that you have:

  • A Service Mesh (ASM) instance that meets one of these version requirements:

    • Enterprise Edition or Ultimate Edition: version 1.14.3 or later. To upgrade, see Update an ASM instance

    • Standard Edition: version 1.9 or later. Standard Edition supports only the native Istio rate limiting approach. See Enabling Rate Limits using Envoy in the Istio documentation

  • A Kubernetes cluster added to the ASM instance

  • Automatic sidecar proxy injection enabled for the default namespace. See "Enable automatic sidecar proxy injection" in Manage global namespaces

Deploy sample services

Deploy HTTPBin and sleep as sample services, then verify connectivity.

  1. Create an httpbin.yaml file with the following content:

    httpbin.yaml

    ##################################################################################################
    # Sample HTTPBin service
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: httpbin
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      labels:
        app: httpbin
        service: httpbin
    spec:
      ports:
      - name: http
        port: 8000
        targetPort: 80
      selector:
        app: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          serviceAccountName: httpbin
          containers:
          - image: docker.io/kennethreitz/httpbin
            imagePullPolicy: IfNotPresent
            name: httpbin
            ports:
            - containerPort: 80
  2. Deploy the HTTPBin service:

    kubectl apply -f httpbin.yaml -n default
  3. Create a sleep.yaml file with the following content:

    sleep.yaml

    ##################################################################################################
    # Sample sleep service
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sleep
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sleep
      labels:
        app: sleep
        service: sleep
    spec:
      ports:
      - port: 80
        name: http
      selector:
        app: sleep
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sleep
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sleep
      template:
        metadata:
          labels:
            app: sleep
        spec:
          terminationGracePeriodSeconds: 0
          serviceAccountName: sleep
          containers:
          - name: sleep
            image: curlimages/curl
            command: ["/bin/sleep", "infinity"]
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /etc/sleep/tls
              name: secret-volume
          volumes:
          - name: secret-volume
            secret:
              secretName: sleep-secret
              optional: true
    ---
  4. Deploy the sleep service:

    kubectl apply -f sleep.yaml -n default
  5. Open a shell in the sleep pod and send a test request:

    kubectl exec -it deploy/sleep -- sh
    curl -I http://httpbin:8000/headers

    Expected output:

    HTTP/1.1 200 OK
    server: envoy
    date: Tue, 26 Dec 2023 07:23:49 GMT
    content-type: application/json
    content-length: 353
    access-control-allow-origin: *
    access-control-allow-credentials: true
    x-envoy-upstream-service-time: 1

    A 200 OK response confirms connectivity between the two services.

Throttle all requests to a specific port

This scenario limits all requests to port 8000 of the HTTPBin service to 10 requests per 60 seconds per pod.

Create the throttling rule

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > Rate Limiting. Click Create.

  3. On the Create page, configure the following parameters.

    Basic Information About Throttling:
      • Namespace: default
      • Name: httpbin
      • Type of Effective Workload: Applicable Application
      • Relevant Workload: Key: app, Value: httpbin
    List of Throttling Rules:
      • Service Port: 8000 (the HTTP port declared in the HTTPBin Kubernetes Service)
    Throttling Configuration:
      • Time Window for Throttling Detection: 60 seconds
      • Number of Requests Allowed in Time Window: 10
  4. Click OK.

    Scenario 1 - throttling rule configuration

YAML equivalent

The equivalent YAML:

apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMLocalRateLimiter
metadata:
  name: httpbin
  namespace: default
spec:
  workloadSelector:
    labels:
      app: httpbin           # Target pods with the app=httpbin label
  isGateway: false            # Apply to application workloads, not gateways
  configs:
    - match:
        vhost:
          name: '*'           # Match all virtual hosts
          port: 8000          # Match requests to port 8000
          route:
            header_match:
              - name: ':path'
                prefix_match: /    # Match all paths
                invert_match: false
      limit:
        fill_interval:
          seconds: 60         # Time window: 60 seconds
        quota: 10             # Allow up to 10 requests per time window per pod

Verify the throttling rule

  1. Open a shell in the sleep pod:

    kubectl exec -it deploy/sleep -- sh
  2. Send 11 requests -- the first 10 consume all available tokens, and the 11th is rejected:

    for i in $(seq 1 11); do curl -s -o /dev/null -w "Request $i: %{http_code}\n" http://httpbin:8000/headers; done

    Expected output (the first 10 return 200, the 11th returns 429):

    Request 1: 200
    ...
    Request 10: 200
    Request 11: 429

    You can also inspect the full response headers of a rejected request:

    curl -v http://httpbin:8000/headers
    < HTTP/1.1 429 Too Many Requests
    < x-local-rate-limit: true
    < content-length: 18
    < content-type: text/plain

    The 429 Too Many Requests status code and x-local-rate-limit: true header confirm that local throttling is active.
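Clients that receive a 429 can retry after a short delay so that tokens have time to refill. A minimal, hypothetical sketch of such a client-side helper (the `call_with_retry` function and its `send` callback are illustrative, not part of ASM; in practice `send` would wrap an HTTP call to http://httpbin:8000/headers):

```python
import time

def call_with_retry(send, max_attempts=3, backoff_s=1.0):
    """Retry a request while the mesh answers 429 (hypothetical helper).

    `send` is any function returning an HTTP status code.
    """
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        # Back off before retrying so the token bucket can refill.
        time.sleep(backoff_s * (attempt + 1))
    return status

# Simulated responses: throttled twice, then allowed.
responses = iter([429, 429, 200])
print(call_with_retry(lambda: next(responses), backoff_s=0.01))  # 200
```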

Throttle requests to a specific path on a port

This scenario limits only requests to the /headers path on port 8000 of the HTTPBin service. Requests to other paths, such as /get, remain unthrottled.

Create the throttling rule

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > Rate Limiting. Click Create.

  3. On the Create page, configure the following parameters, then click OK.

    Basic Information About Throttling:
      • Namespace: default
      • Name: httpbin
      • Type of Effective Workload: Applicable Application
      • Relevant Workload: Key: app, Value: httpbin
    List of Throttling Rules:
      • Service Port: 8000
    Match Request Attributes:
      • Matched Attributes: Request Path
      • Matching Method: Prefix Match
      • Matched Content: /headers
    Throttling Configuration:
      • Time Window for Throttling Detection: 60 seconds
      • Number of Requests Allowed in Time Window: 10

    Scenario 2 - throttling rule configuration with path matching

Verify the throttling rule

  1. Open a shell in the sleep pod:

    kubectl exec -it deploy/sleep -- sh
  2. Send 11 requests to /headers -- the first 10 succeed, the 11th is rejected:

    for i in $(seq 1 11); do curl -s -o /dev/null -w "Request $i: %{http_code}\n" http://httpbin:8000/headers; done

    Expected output:

    Request 1: 200
    ...
    Request 10: 200
    Request 11: 429
  3. Confirm that requests to other paths are not throttled:

    curl -s -o /dev/null -w "%{http_code}\n" http://httpbin:8000/get

    Expected output:

    200

    The 200 response confirms that throttling applies only to the /headers path.
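The path condition behaves like a plain prefix check against the request's `:path` pseudo-header: only matching requests count against the token bucket. A one-line sketch of the assumed matching logic (the `is_rate_limited_path` function is illustrative):

```python
def is_rate_limited_path(path, prefix="/headers"):
    """Sketch of the Prefix Match rule applied to the :path pseudo-header."""
    return path.startswith(prefix)

print(is_rate_limited_path("/headers"))  # True  -> counts against the limit
print(is_rate_limited_path("/get"))      # False -> bypasses the limit
```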

View local throttling metrics

Local throttling exposes four Envoy counter metrics. Use these to monitor throttling behavior in production.

  • envoy_http_local_rate_limiter_http_local_rate_limit_enabled (Counter): total requests evaluated by the throttling filter.
  • envoy_http_local_rate_limiter_http_local_rate_limit_ok (Counter): requests allowed (tokens available in the bucket).
  • envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited (Counter): requests with no tokens available (not necessarily rejected -- see enforced).
  • envoy_http_local_rate_limiter_http_local_rate_limit_enforced (Counter): requests rejected with HTTP 429.
Note

The rate_limited count may differ from enforced when the enforcement percentage (filter_enforced) is set below 100%. In that case, some token-exhausted requests are tracked but not rejected.
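The split between rate_limited and enforced can be illustrated with a small simulation. This is a sketch under the assumption that enforcement is decided randomly per request, mirroring a percentage-based filter_enforced setting; the `simulate` function is illustrative, not Envoy code.

```python
import random

def simulate(requests, quota, enforced_pct, seed=0):
    """Count the four local rate limit metrics over one window (sketch)."""
    rng = random.Random(seed)
    tokens = quota
    stats = {"enabled": 0, "ok": 0, "rate_limited": 0, "enforced": 0}
    for _ in range(requests):
        stats["enabled"] += 1            # every request is evaluated
        if tokens > 0:
            tokens -= 1
            stats["ok"] += 1             # token available: allowed
        else:
            stats["rate_limited"] += 1   # bucket empty: tracked either way
            if rng.uniform(0, 100) < enforced_pct:
                stats["enforced"] += 1   # actually rejected with HTTP 429
    return stats

# At 100% enforcement, every token-exhausted request is rejected,
# so rate_limited == enforced.
print(simulate(requests=15, quota=10, enforced_pct=100))
# Below 100%, rate_limited exceeds enforced.
print(simulate(requests=15, quota=10, enforced_pct=50))
```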

To collect these metrics with Prometheus:

  1. Configure proxyStatsMatcher on the sidecar proxy. Select Regular Expression Match and set the value to .*http_local_rate_limit.*. Alternatively, click Add Local Throttling Metrics. For details, see proxyStatsMatcher.

  2. Redeploy the HTTPBin Deployment for the updated sidecar configuration to take effect. See "(Optional) Redeploy workloads" in Configure sidecar proxies.

  3. Configure a throttling rule and run request tests as described in Throttle all requests to a specific port or Throttle requests to a specific path on a port.

  4. Query throttling metrics from the HTTPBin sidecar:

    kubectl exec -it deploy/httpbin -c istio-proxy -- curl localhost:15020/stats/prometheus | grep http_local_rate_limit

    Example output:

    envoy_http_local_rate_limiter_http_local_rate_limit_enabled{} 37
    envoy_http_local_rate_limiter_http_local_rate_limit_enforced{} 17
    envoy_http_local_rate_limiter_http_local_rate_limit_ok{} 20
    envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited{} 17
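In this sample output the counters are mutually consistent: every evaluated request was either allowed or token-exhausted, and with enforcement at 100% every token-exhausted request was rejected. A quick sanity check on the numbers:

```python
# Counters from the sample Prometheus output above.
metrics = {"enabled": 37, "ok": 20, "rate_limited": 17, "enforced": 17}

# Every evaluated request is either allowed or token-exhausted.
assert metrics["enabled"] == metrics["ok"] + metrics["rate_limited"]
# At 100% enforcement, every token-exhausted request is rejected.
assert metrics["enforced"] == metrics["rate_limited"]
print("counters are consistent")
```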

What's next

  • Query parameter matching: In ASM 1.19.0 and later, use the limit_overrides field to match requests by query parameters. See ASMLocalRateLimiter field reference.

  • Global throttling: Enforce a shared limit across all pod instances with ASMGlobalRateLimiter.

  • Ingress gateway throttling: Apply local or global throttling at the ingress gateway.

  • Traffic warm-up: Gradually ramp up traffic to new pods to avoid timeouts during scaling. See Use the warm-up feature.

  • Circuit breaking: Protect services from cascading failures with the connectionPool field. See Configure circuit breaking.