
Alibaba Cloud Service Mesh:Configure local throttling in ASM

Last Updated:Jun 05, 2023

Throttling is a mechanism that limits the number of requests sent to a service. Envoy uses the token bucket algorithm to implement local throttling. This topic describes how to configure local throttling for a Service Mesh (ASM) instance.

Prerequisites

  • An ASM instance is created and meets the following requirements:

    • If the ASM instance is of Enterprise Edition or Ultimate Edition, the version of the ASM instance must be 1.14.3 or later. If the version of the ASM instance is earlier than 1.14.3, upgrade the ASM instance. For more information, see Update an ASM instance.

    • If the ASM instance is of Standard Edition, the version of the ASM instance must be 1.9 or later. In addition, you can use only the native rate limiting feature of Istio to implement local throttling for the ASM instance. The reference document varies with the Istio version. For more information about how to configure local throttling for the latest Istio version, see Enabling Rate Limits using Envoy.

  • An ACK managed cluster is created. For more information, see Create an ACK managed cluster.
  • The cluster is added to the ASM instance. For more information, see Add a cluster to an ASM instance.
  • Automatic sidecar injection is enabled for the default namespace in the ACK cluster. For more information, see Enable automatic sidecar proxy injection.

What is throttling?

Concept of throttling

Throttling is a mechanism that limits the number of requests sent to a service. It specifies the maximum number of requests that clients can send to a service in a given period of time, such as 300 requests per minute or 10 requests per second. The aim of throttling is to prevent a service from being overloaded by excessive requests, whether from a specific client IP address or from all clients combined.

For example, if you limit the number of requests sent to a service to 300 per minute, the 301st request within that minute is denied, and the HTTP 429 (Too Many Requests) status code is returned.

Throttling modes

Envoy proxies implement throttling in the following modes.

Global or distributed throttling

  • Global throttling limits the number of requests sent to multiple services. In this mode, all the services in a cluster share the same throttling configuration. Global throttling generally requires an external rate limit service, which is typically backed by a component such as a Redis database.

  • Global throttling is typically used in scenarios where many clients send requests to a smaller number of services and may overwhelm them. In such cases, global throttling can help prevent cascading failures. For example, you can configure global throttling on the ingress gateway to limit the total number of requests sent to an ASM instance, and then configure local throttling to limit the number of requests sent to specific services in the instance.

Local throttling

  • Local throttling is configured on a per-Envoy-process basis. In this context, an Envoy process corresponds to a pod into which an Envoy proxy is injected. The configuration of local throttling is simpler than that of global throttling, and it does not require an additional component. If you configure both local throttling and global throttling for an ASM instance, the local rate limit is applied first; the global rate limit is applied only to requests that pass the local rate limit.

    • Assume that local throttling limits the number of requests from a specific client to 50 per minute, and the global limit is 60 requests per minute. After the client sends more than 50 requests, the excess requests are denied by local throttling even though the global limit has not been reached.

    • Assume that local throttling limits the number of requests from a specific client to 50 per minute, and the global limit is 40 requests per minute. After the client sends more than 40 requests, the excess requests are denied by global throttling even though the local limit has not been reached.

  • If the workload to which the local throttling limit applies has multiple pod replicas, each replica enforces its own limit. Requests may therefore be throttled on one replica but not on another.
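
The interplay between the two modes can be made concrete with a toy Python sketch (an illustration only, not Envoy's implementation; `allowed_in_window` is a hypothetical helper) that counts how many requests pass when the local limit is checked before the global limit within a single time window:

```python
def allowed_in_window(num_requests, local_limit, global_limit):
    """Count how many of num_requests pass both limits within one time window.

    Toy model only: the local rate limit is checked first, then the global
    rate limit, so the effective limit is the smaller of the two.
    """
    local_used = global_used = passed = 0
    for _ in range(num_requests):
        if local_used >= local_limit:      # denied by local throttling
            continue
        local_used += 1
        if global_used >= global_limit:    # denied by global throttling
            continue
        global_used += 1
        passed += 1
    return passed

# Scenario 1: local limit 50/min, global limit 60/min -> the local limit caps traffic.
print(allowed_in_window(100, local_limit=50, global_limit=60))  # 50
# Scenario 2: local limit 50/min, global limit 40/min -> the global limit caps traffic.
print(allowed_in_window(100, local_limit=50, global_limit=40))  # 40
```

These two cases correspond to the two scenarios described in the preceding bullets.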

How local throttling works

Envoy uses the token bucket algorithm to implement local throttling. The token bucket algorithm limits the number of requests sent to services based on the number of tokens in a bucket. Tokens are added to the bucket at a constant rate. When a request arrives, a token is removed from the bucket. When the bucket is empty, requests are denied. Generally, you need to specify the following parameters:

  • The interval at which the bucket is filled

  • The number of tokens added to the bucket each time

By default, Envoy returns the 429 HTTP status code when a request is denied and the x-envoy-ratelimited header is set. You can customize the HTTP status code and response header.
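The token bucket behavior described above can be sketched in a few lines of Python (a simplified illustration, not Envoy's actual implementation; it assumes the bucket capacity equals the per-interval quota):

```python
import time

class TokenBucket:
    """Simplified token bucket (illustration only, not Envoy's implementation)."""

    def __init__(self, quota, fill_interval_s, now=time.monotonic):
        self.quota = quota                  # tokens added per fill; also the capacity here
        self.fill_interval = fill_interval_s
        self.now = now                      # injectable clock for testing
        self.tokens = quota                 # the bucket starts full
        self.last_fill = now()

    def allow(self):
        elapsed = self.now() - self.last_fill
        if elapsed >= self.fill_interval:   # refill in whole intervals, capped at capacity
            intervals = int(elapsed // self.fill_interval)
            self.tokens = min(self.quota, self.tokens + intervals * self.quota)
            self.last_fill += intervals * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True                     # a token was available: the request passes
        return False                        # bucket empty: Envoy would return 429 here

# Demo with a fake clock: a quota of 10 tokens per 60-second interval.
clock = [0.0]
bucket = TokenBucket(quota=10, fill_interval_s=60, now=lambda: clock[0])
results = [bucket.allow() for _ in range(12)]   # first 10 pass, the last 2 are denied
clock[0] = 60.0                                 # one fill interval later
refilled = bucket.allow()                       # tokens restored, the request passes again
```

This corresponds to the fill_interval and quota fields used in the throttling policies later in this topic.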

Take note of the following concepts when you use the throttling feature:

  • http_filter_enabled: indicates the percentage of requests for which the local rate limit is checked but not necessarily enforced.

  • http_filter_enforcing: indicates the percentage of requests on which the local rate limit is actually enforced.

Set both values as percentages. For example, you can set http_filter_enabled to 10% and http_filter_enforcing to 5%. This way, you can test the effect of throttling on a small fraction of traffic before you apply it to all requests.
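
As an illustration of how the two percentages can interact, the following Python sketch classifies each request. This is a toy model based on one reading of the semantics: it assumes, as in Envoy's local rate limit filter, that enforcement applies only to requests for which the check is enabled.

```python
import random

def classify(enabled_pct, enforcing_pct, rng=random.random):
    """Classify one request under the two percentages (toy sketch).

    Assumption: enforcement applies only to requests for which the rate
    limit check is enabled."""
    if rng() * 100 >= enabled_pct:
        return "skipped"        # rate limit not checked at all
    if rng() * 100 >= enforcing_pct:
        return "checked_only"   # counted in statistics, but never denied
    return "enforced"           # may be denied with 429 if the bucket is empty

# With http_filter_enabled at 10% and http_filter_enforcing at 5%, most requests
# skip the check, some are only measured, and a small share can actually be denied.
print(classify(10, 5, rng=lambda: 0.50))  # skipped
print(classify(10, 5, rng=lambda: 0.07))  # checked_only
print(classify(10, 5, rng=lambda: 0.00))  # enforced
```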

Configure local throttling

Step 1: Deploy sample services

  1. Create an httpbin.yaml file that contains the following content:

    ##################################################################################################
    # Example httpbin service 
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: httpbin
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      labels:
        app: httpbin
        service: httpbin
    spec:
      ports:
      - name: http
        port: 8000
        targetPort: 80
      selector:
        app: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          serviceAccountName: httpbin
          containers:
          - image: docker.io/kennethreitz/httpbin
            imagePullPolicy: IfNotPresent
            name: httpbin
            ports:
            - containerPort: 80
  2. Run the following command to create the httpbin service:

    kubectl apply -f httpbin.yaml -n default
  3. Create a sleep.yaml file that contains the following content:

    ##################################################################################################
    # Example sleep service 
    ##################################################################################################
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sleep
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sleep
      labels:
        app: sleep
        service: sleep
    spec:
      ports:
      - port: 80
        name: http
      selector:
        app: sleep
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sleep
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sleep
      template:
        metadata:
          labels:
            app: sleep
        spec:
          terminationGracePeriodSeconds: 0
          serviceAccountName: sleep
          containers:
          - name: sleep
            image: curlimages/curl
            command: ["/bin/sleep", "infinity"]
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /etc/sleep/tls
              name: secret-volume
          volumes:
          - name: secret-volume
            secret:
              secretName: sleep-secret
              optional: true
    ---
  4. Run the following command to create the sleep service:

    kubectl apply -f sleep.yaml -n default
  5. Open a shell in the pod of the sleep service, and run the following command to send continuous requests to the httpbin service:

    while true; do curl http://httpbin:8000/headers; done

Step 2: Define and execute a throttling policy

You can customize a response or use the default response when a request is denied.

  1. Log on to the ASM console.

  2. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  3. On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM instance or click Manage in the Actions column.

  4. On the details page of the ASM instance, choose Traffic Management Center > ASMLocalRateLimiter in the left-side navigation pane.

  5. On the LocalRateLimiter page, click Create. On the Create page, set the Namespace parameter to default, select a template, and copy one of the following code blocks to the YAML editor:

    • Use the default response:

      apiVersion: istio.alibabacloud.com/v1beta1
      kind: ASMLocalRateLimiter
      metadata:
        name: httpbin
        namespace: default
      spec:
        workloadSelector:
          labels:
            app: httpbin
        configs:
          - match:
              vhost:
                name: "*"
                port: 8000
                route:
                  header_match:
                  - name: ":path"
                    prefix_match: "/"
            limit:
               fill_interval:
                  seconds: 60
               quota: 10
    • Customize a response:

      apiVersion: istio.alibabacloud.com/v1beta1
      kind: ASMLocalRateLimiter
      metadata:
        name: httpbin
        namespace: default
      spec:
        workloadSelector:
          labels:
            app: httpbin
        configs:
          - match:
              vhost:
                name: "*"
                port: 8000
                route:
                  header_match:
                  - name: ":path"
                    prefix_match: "/"
            limit:
               fill_interval:
                  seconds: 60
               quota: 10
               custom_response_body: '{"custom": "custom message", "message": "Your request be limited" }'
               response_header_to_add:
                 x-rate-limited: 'TOO_MANY_REQUESTS'
                 x-local-rate-limit: 'enabled'
  6. Run the following command more than 10 times within 60 seconds to exceed the throttling quota:

    curl -v httpbin:8000/headers

    Expected output:

    • Default response:

      *   Trying 192.168.250.89:8000...
      * Connected to httpbin (192.168.250.89) port 8000 (#0)
      > GET /headers HTTP/1.1
      > Host: httpbin:8000
      > User-Agent: curl/7.85.0-DEV
      > Accept: */*
      >
      * Mark bundle as not supporting multiuse
      < HTTP/1.1 429 Too Many Requests
      < x-local-rate-limit: true
      < content-length: 18
      < content-type: text/plain
      < date: Tue, 27 Sep 2022 07:42:08 GMT
      < server: envoy
      < x-envoy-upstream-service-time: 0

      The 429 HTTP status code and the x-local-rate-limit response header are returned.

    • Custom response:

      *   Trying 192.168.250.89:8000...
      * Connected to httpbin (192.168.250.89) port 8000 (#0)
      > GET /headers HTTP/1.1
      > Host: httpbin:8000
      > User-Agent: curl/7.85.0-DEV
      > Accept: */*
      >
      * Mark bundle as not supporting multiuse
      < HTTP/1.1 429 Too Many Requests
      < x-local-rate-limit: enabled
      < x-rate-limited: TOO_MANY_REQUESTS
      < content-length: 67
      < content-type: text/plain
      < date: Tue, 27 Sep 2022 11:45:45 GMT
      < server: envoy
      < x-envoy-upstream-service-time: 0
      <
      * Connection #0 to host httpbin left intact
      {"custom": "custom message", "message": "Your request be limited" }

      The 429 HTTP status code, the custom response headers, and the custom response body are returned.

Throttling metrics

Envoy automatically generates the following throttling metrics:

  • <stat_prefix>.http_local_rate_limit.enabled: total number of requests for which the rate limiter was consulted
  • <stat_prefix>.http_local_rate_limit.ok: total number of requests that were allowed because tokens were available in the token bucket
  • <stat_prefix>.http_local_rate_limit.rate_limited: total number of requests for which no token was available (but that were not necessarily denied)
  • <stat_prefix>.http_local_rate_limit.enforced: total number of requests to which throttling was actually applied (for example, the 429 status code was returned)

The preceding metrics are prefixed with <stat_prefix>.http_local_rate_limit, where stat_prefix indicates the value that you configured in the stat_prefix field, such as http_local_rate_limiter.

View throttling metrics in Prometheus Service

  1. Run the following command to add annotations to spec.template.metadata of the Deployment, which enables the Envoy sidecar proxy to report the local throttling statistics:

    kubectl patch deployment httpbin --type merge -p '{"spec":{"template":{"metadata":{"annotations":{"proxy.istio.io/config":"proxyStatsMatcher:\n  inclusionRegexps:\n  - \".*http_local_rate_limit.*\""}}}}}'
  2. After the pod is automatically restarted, run the following command multiple times to send requests to the httpbin service:

    curl -v httpbin:8000/headers

    The Envoy sidecar then reports throttling metrics similar to the following output (the counts vary with the number of requests sent):

    envoy_http_local_rate_limiter_http_local_rate_limit_enabled{} 37
    envoy_http_local_rate_limiter_http_local_rate_limit_enforced{} 17
    envoy_http_local_rate_limiter_http_local_rate_limit_ok{} 20
    envoy_http_local_rate_limiter_http_local_rate_limit_rate_limited{} 17
  3. View throttling metrics by using a self-managed Prometheus instance or in the ARMS console.

    To view throttling metrics in the ARMS console, perform the following operations:

    1. Log on to the ARMS console. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.

    2. Click the instance that you want to use. In the left-side navigation pane, click Dashboards, and then click the dashboard that you want to view.

    3. In the left-side navigation pane, click the Explore icon to view metrics.

      The following figure provides an example of the throttling metrics.