Alibaba Cloud Service Mesh: Use ASMAdaptiveConcurrency to implement adaptive concurrency control

Last Updated: Mar 10, 2026

Static circuit breaking requires you to estimate and manually set a fixed concurrency threshold for each service. If the estimate is too low, it blocks legitimate traffic. If too high, the service overloads before the circuit breaker activates. The ASMAdaptiveConcurrency CustomResourceDefinition (CRD) removes this guesswork by dynamically adjusting the concurrency limit based on real-time latency measurements, keeping the limit near the service's actual capacity without manual tuning.

When concurrent requests exceed the calculated limit, the sidecar proxy returns HTTP status code 503 with the error message reached concurrency limit.

How it works

The adaptive concurrency controller uses a gradient-based algorithm that periodically measures the baseline latency of a service (MinRTT) and compares it against sampled request latency (SampleRTT) to adjust the concurrency limit.

Gradient calculation

The controller calculates a gradient value from the sampled latencies:

gradient = (minRTT + buffer) / sampleRTT

The buffer absorbs normal latency variance, so the gradient only decreases when sampled latency exceeds the baseline by a meaningful amount:

buffer = minRTT * buffer_pct

The gradient then updates the concurrency limit:

new_limit = gradient * current_limit + headroom

When the service is healthy, sampleRTT stays close to minRTT, the gradient stays at or slightly above 1.0, and the limit remains stable or increases. As the service becomes overloaded, sampleRTT rises above minRTT + buffer, the gradient drops below 1.0, and the limit decreases, rejecting excess requests before the service degrades further.
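The three formulas above can be traced with a short worked example. This is a minimal sketch, not the proxy's implementation: the function name and the sample numbers are illustrative, buffer_pct is taken as a percentage (as in the CRD) and converted to a fraction, and headroom is passed in as a plain argument because this page does not say how it is derived. The clamp to an upper bound mirrors max_concurrency_limit.

```python
def next_concurrency_limit(min_rtt_ms, sample_rtt_ms, buffer_pct,
                           current_limit, headroom, max_limit):
    """Apply one update step of the gradient formulas.

    headroom is an explicit input here (assumption of this sketch); the
    controller computes its own value internally.
    """
    # buffer = minRTT * buffer_pct, with buffer_pct given as a percentage
    buffer_ms = min_rtt_ms * buffer_pct / 100.0
    # gradient = (minRTT + buffer) / sampleRTT
    gradient = (min_rtt_ms + buffer_ms) / sample_rtt_ms
    # new_limit = gradient * current_limit + headroom
    new_limit = gradient * current_limit + headroom
    # Never exceed the configured upper bound.
    return gradient, min(new_limit, max_limit)


# Healthy service: sampled latency equals the baseline, so the limit grows.
healthy = next_concurrency_limit(1000, 1000, 25, 100, 10, 500)
# Overloaded service: sampled latency is 2.5x the baseline, so the limit shrinks.
overloaded = next_concurrency_limit(1000, 2500, 25, 100, 10, 500)
print(healthy)     # (1.25, 135.0)
print(overloaded)  # (0.5, 60.0)
```

With a 25% buffer, the gradient only falls below 1.0 once sampleRTT exceeds 1.25 times minRTT.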

MinRTT recalculation

The controller periodically recalculates MinRTT by temporarily reducing the concurrency limit to the min_concurrency value. During this window, only a small number of requests are forwarded so the service can respond at its true minimum latency.

Important

The concurrency limit drops significantly during MinRTT recalculation, which may cause a spike in 503 responses. To mitigate this, create a destination rule that enables retries for the target service. This allows the sidecar proxy to retry rejected requests on other hosts that are not in a MinRTT recalculation window.

Prerequisites

Before you begin, ensure that you have:

Step 1: Deploy sample applications

This tutorial uses two applications to demonstrate adaptive concurrency control:

| Application | Role | Configuration |
| --- | --- | --- |
| testserver | Target service | Handles up to 500 concurrent requests, each taking 1,000 ms to process. Requests beyond the concurrency limit are queued. |
| gotest | Load generator | Each replica sends 200 concurrent requests to testserver. |

Deploy testserver

  1. Create a file named testserver.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: testserver
      name: testserver
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: testserver
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: testserver
        spec:
          containers:
          - args:
            - -m
            - "500"
            - -t
            - "1000"
            command:
            - /usr/local/bin/limited-concurrency-http-server
            image: registry.cn-hangzhou.aliyuncs.com/acs/asm-limited-concurrency-http-server:v0.1.1-gee0b08f-aliyun
            imagePullPolicy: IfNotPresent
            name: testserver
            ports:
            - containerPort: 8080
              protocol: TCP

    The -m flag sets the maximum concurrent requests (500). The -t flag sets the processing time per request in milliseconds (1,000).

  2. Apply the Deployment:

    kubectl apply -f testserver.yaml

Create the testserver Service

  1. Create a file named testservice.yaml with the following content:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: testserver
      name: testserver
      namespace: default
    spec:
      internalTrafficPolicy: Cluster
      ipFamilies:
        - IPv4
      ipFamilyPolicy: SingleStack
      ports:
        - name: http
          port: 8080
          protocol: TCP
          targetPort: 8080
        - name: metrics
          port: 15020
          protocol: TCP
          targetPort: 15020
      selector:
        app: testserver
      type: ClusterIP

  2. Apply the Service:

    kubectl apply -f testservice.yaml

Deploy gotest

  1. Create a file named gotest.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: gotest
      name: gotest
      namespace: default
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: gotest
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: gotest
        spec:
          containers:
          - args:
            - -c
            - "200"
            - -n
            - "10000"
            - -u
            - testserver:8080
            command:
            - /root/go-stress-testing-linux
            image: xocoder/go-stress-testing-linux:v0.1
            imagePullPolicy: Always
            name: gotest
            resources:
              limits:
                cpu: 500m

    The initial replica count is 0. Scale it up in Step 4 to generate load.

  2. Apply the Deployment:

    kubectl apply -f gotest.yaml

Step 2: Create an ASMAdaptiveConcurrency CRD

  1. Connect kubectl to your ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.

  2. Create a file named adaptiveconcurrency.yaml with the following content:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMAdaptiveConcurrency
    metadata:
      name: sample-adaptive-concurrency
      namespace: default
    spec:
      workload_selector:
        labels:
          app: testserver
      sample_aggregate_percentile:
        value: 60
      concurrency_limit_params:
        max_concurrency_limit: 500
        concurrency_update_interval: 15s
      min_rtt_calc_params:
        interval: 60s
        request_count: 100
        jitter:
          value: 15
        min_concurrency: 50
        buffer:
          value: 25

    This configuration does the following:

    • Target the testserver workload and set the upper bound to 500 concurrent requests.

    • Update the concurrency limit every 15 seconds using the 60th percentile of sampled latencies as the SampleRTT.

    • Recalculate MinRTT every 60 seconds (with up to 15% random jitter) using 100 sampled requests. During MinRTT recalculation, limit concurrency to 50 requests.

    • Treat latency fluctuations within 25% of MinRTT as normal.

  3. Apply the CRD:

    kubectl apply -f adaptiveconcurrency.yaml
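Before applying the file, it can help to sanity-check the spec against the constraints listed in the parameter reference. The following is a minimal sketch with a hypothetical helper, check_spec; it is not part of ASM or kubectl. It checks only the fields marked as required and the documented 0 to 100 range of sample_aggregate_percentile. The spec is written as a Python dict to keep the example free of third-party YAML libraries.

```python
REQUIRED_TOP_LEVEL = (
    "workload_selector",
    "sample_aggregate_percentile",
    "concurrency_limit_params",
    "min_rtt_calc_params",
)


def check_spec(spec):
    """Return a list of problems found in an ASMAdaptiveConcurrency spec dict."""
    problems = [f"missing required field: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in spec]

    if "labels" not in spec.get("workload_selector", {}):
        problems.append("missing required field: workload_selector.labels")

    limits = spec.get("concurrency_limit_params", {})
    if "concurrency_update_interval" not in limits:
        problems.append(
            "missing required field: "
            "concurrency_limit_params.concurrency_update_interval")

    percentile = spec.get("sample_aggregate_percentile", {}).get("value")
    if percentile is not None and not 0 <= percentile <= 100:
        problems.append("sample_aggregate_percentile.value must be in [0, 100]")

    return problems


# The spec section of adaptiveconcurrency.yaml, transcribed by hand.
sample_spec = {
    "workload_selector": {"labels": {"app": "testserver"}},
    "sample_aggregate_percentile": {"value": 60},
    "concurrency_limit_params": {
        "max_concurrency_limit": 500,
        "concurrency_update_interval": "15s",
    },
    "min_rtt_calc_params": {
        "interval": "60s",
        "request_count": 100,
        "jitter": {"value": 15},
        "min_concurrency": 50,
        "buffer": {"value": 25},
    },
}

print(check_spec(sample_spec))  # prints []
```

An empty list means the sketch found nothing wrong; the API server remains the authority on whether the resource is accepted.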

Parameter reference

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| workload_selector | WorkloadSelector | Yes | -- | Selects the target pod by label. |
| labels | map | Yes | -- | Labels to match the target pod. |
| sample_aggregate_percentile | Percent | Yes | -- | Percentile used to aggregate sampled latencies into SampleRTT. Valid values: 0 to 100. |
| concurrency_limit_params | Object | Yes | -- | Concurrency limit settings. |
| max_concurrency_limit | int | No | 1000 | Upper bound for concurrent requests. |
| concurrency_update_interval | duration | Yes | -- | How often the controller updates the concurrency limit. Example: 60s. |
| min_rtt_calc_params | Object | Yes | -- | MinRTT calculation settings. |
| interval | duration | No | -- | How often MinRTT is recalculated. Example: 120s. |
| request_count | int | No | 50 | Number of requests sampled to calculate MinRTT. |
| jitter | Percent | No | 15 | Random jitter added to the MinRTT recalculation interval. For example, if interval is 120s and jitter is 50, the actual interval is random(120, 120 + 120 * 50%) seconds. |
| min_concurrency | int | No | 3 | Concurrency limit during MinRTT recalculation. Also serves as the initial limit when the controller starts. Set this well below the service's actual capacity to get accurate baseline measurements. |
| buffer | Percent | No | 25 | Acceptable latency fluctuation as a percentage of MinRTT. For example, if MinRTT is 100 ms and buffer is 10, latencies up to 110 ms are considered normal. |
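The jitter and buffer examples in the table can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are illustrative, and a uniform distribution is assumed for random(a, b), which the table does not specify.

```python
import random


def jittered_interval_s(interval_s, jitter_pct, rng=random):
    """Pick an actual MinRTT recalculation interval in seconds.

    Follows random(interval, interval + interval * jitter%); the uniform
    distribution is an assumption of this sketch.
    """
    upper = interval_s + interval_s * jitter_pct / 100.0
    return rng.uniform(interval_s, upper)


def normal_latency_ceiling_ms(min_rtt_ms, buffer_pct):
    """Highest latency still treated as normal: MinRTT plus the buffer."""
    return min_rtt_ms + min_rtt_ms * buffer_pct / 100.0


# interval = 120s, jitter = 50: the result falls between 120 and 180 seconds.
print(jittered_interval_s(120, 50))
# MinRTT = 100 ms, buffer = 10: latencies up to 110 ms are normal.
print(normal_latency_ceiling_ms(100, 10))  # 110.0
```

With the values from the sample configuration (interval 60s, jitter 15), the same calculation gives a recalculation interval between 60 and 69 seconds.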

Step 3: Set up monitoring with Prometheus

Export adaptive concurrency metrics to Managed Service for Prometheus to observe the controller's behavior and tune parameters.

  1. Enable Managed Service for Prometheus for your cluster. For more information, see Use Managed Service for Prometheus.

  2. Create a ServiceMonitor to scrape metrics from the testserver sidecar proxy.

    1. Create a file named servicemonitor.yaml with the following content:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: testserver-envoy-metrics
        namespace: default
      spec:
        endpoints:
          - interval: 5s
            path: /stats/prometheus
            port: metrics
        namespaceSelector:
          any: true
        selector:
          matchLabels:
            app: testserver

    2. Connect kubectl to the ACK cluster and apply the ServiceMonitor:

      kubectl apply -f servicemonitor.yaml

  3. Import the Grafana dashboard to visualize the controller metrics. Download the dashboard JSON and import it in the Managed Service for Grafana console. For import instructions, see the ARMS documentation.

    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "description": "monitoring ASM Adaptive Concurrency",
      "editable": true,
      "gnetId": 6693,
      "graphTooltip": 0,
      "id": 3239002,
      "iteration": 1651922323976,
      "links": [],
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
          "hiddenSeries": false,
          "id": 22,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_rq_blocked{service=\"$service\", pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "RqBlocked",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
          "hiddenSeries": false,
          "id": 24,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_burst_queue_size{service=\"$service\", pod=\"$pod\"}",
              "format": "time_series",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "HeadRoom",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 8 },
          "hiddenSeries": false,
          "id": 26,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_concurrency_limit{service=\"$service\",pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "ConcurrencyLimit",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
          "hiddenSeries": false,
          "id": 28,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_gradient{service=\"$service\",pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Gradient",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 16 },
          "hiddenSeries": false,
          "id": 32,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_min_rtt_msecs{service=\"$service\",pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "MinRTT(msec)",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "ms", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 12, "y": 16 },
          "hiddenSeries": false,
          "id": 34,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_sample_rtt_msecs{service=\"$service\",pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "SampleRTT(msec)",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "ms", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "$cluster",
          "fieldConfig": {
            "defaults": { "custom": {} },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 24 },
          "hiddenSeries": false,
          "id": 30,
          "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": { "alertThreshold": true },
          "percentage": false,
          "pluginVersion": "7.4.0-pre",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_min_rtt_calculation_active{service=\"$service\",pod=\"$pod\"}",
              "interval": "",
              "legendFormat": "{{service}}-{{pod}}",
              "queryType": "randomWalk",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "MinRTTCalc",
          "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
          "type": "graph",
          "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
          "yaxes": [
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
            { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
          ],
          "yaxis": { "align": false, "alignLevel": null }
        }
      ],
      "refresh": "5s",
      "schemaVersion": 26,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": [
          {
            "current": { "selected": true, "text": "edas120_1217520382582089", "value": "edas120_1217520382582089" },
            "error": null,
            "hide": 0,
            "includeAll": false,
            "label": null,
            "multi": false,
            "name": "cluster",
            "options": [],
            "query": "prometheus",
            "queryValue": "",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          },
          {
            "allValue": null,
            "current": { "isNone": true, "selected": false, "text": "None", "value": "" },
            "datasource": "$cluster",
            "definition": "label_values(envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_burst_queue_size,service)",
            "error": null,
            "hide": 0,
            "includeAll": false,
            "label": null,
            "multi": false,
            "name": "service",
            "options": [],
            "query": "label_values(envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_burst_queue_size,service)",
            "refresh": 2,
            "regex": "",
            "skipUrlSync": false,
            "sort": 1,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allValue": null,
            "current": { "selected": false, "text": "All", "value": "$__all" },
            "datasource": "$cluster",
            "definition": "label_values(envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_concurrency_limit, pod)",
            "error": null,
            "hide": 0,
            "includeAll": true,
            "label": null,
            "multi": true,
            "name": "pod",
            "options": [],
            "query": "label_values(envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_concurrency_limit, pod)",
            "refresh": 2,
            "regex": "",
            "skipUrlSync": false,
            "sort": 0,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          }
        ]
      },
      "time": { "from": "now-15m", "to": "now" },
      "timepicker": {
        "refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h", "2h", "1d"],
        "time_options": ["5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d"]
      },
      "timezone": "",
      "title": "ASM Adaptive Concurrency",
      "uid": "000000084",
      "version": 3
    }

Step 4: Verify the concurrency controller

Generate load against testserver to confirm that the ASMAdaptiveConcurrency CRD limits concurrency as expected.

  1. Scale the gotest Deployment to 5 replicas. With 200 concurrent requests per replica, this produces 1,000 simultaneous requests, double the 500-request capacity of testserver.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find your cluster and click its name or click Details in the Actions column.

    3. In the left-side navigation pane, choose Workloads > Deployments.

    4. On the Deployments page, set Namespace to default. In the Actions column for the gotest application, choose More > View in YAML.

    5. In the Edit YAML dialog box, set replicas to 5 and click Update.

  2. Open the Grafana dashboard you imported in Step 3. Set Service to testserver and Pod to All.

  3. Check the ConcurrencyLimit panel. The concurrency limit should stabilize below 500, confirming that the controller protects testserver from overload. The RqBlocked panel shows the cumulative count of rejected requests.
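The load figures in this step follow from simple arithmetic, sketched below. The function names are illustrative. The throughput estimate applies Little's law (throughput = concurrency / time per request) to the testserver settings from Step 1; it is an idealized ceiling and ignores network and proxy overhead.

```python
def offered_concurrency(replicas, requests_per_replica):
    """Total simultaneous requests generated by the gotest replicas."""
    return replicas * requests_per_replica


def max_throughput_rps(capacity, processing_time_ms):
    """Idealized requests per second a server can complete at full capacity."""
    return capacity / (processing_time_ms / 1000.0)


offered = offered_concurrency(replicas=5, requests_per_replica=200)
capacity = 500  # testserver -m flag

print(offered)                             # 1000
print(offered / capacity)                  # 2.0
print(max_throughput_rps(capacity, 1000))  # 500.0
```

Because the offered concurrency is twice the capacity, roughly half of the in-flight requests have to be queued or rejected at any moment, which is the excess the RqBlocked panel makes visible.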

Metrics reference

The adaptive concurrency controller exposes the following Prometheus metrics through the sidecar proxy. Use them to monitor controller behavior and tune CRD parameters.

| Metric | Type | Description |
| --- | --- | --- |
| rq_blocked | Counter | Total requests rejected by the concurrency controller. |
| burst_queue_size | Gauge | Current headroom value used in the concurrency limit calculation. |
| concurrency_limit | Gauge | Current concurrency limit enforced by the controller. |
| gradient | Gauge | Current gradient value. |
| min_rtt_msecs | Gauge | Current MinRTT measurement in milliseconds. |
| sample_rtt_msecs | Gauge | Current SampleRTT aggregate in milliseconds. |
| min_rtt_calculation_active | Gauge | Set to 1 when the controller is recalculating MinRTT. |

All metrics use the namespace prefix envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_.
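When writing your own queries or alerts, the full metric name is the prefix followed by the short name from the table. The helper below is a small sketch with illustrative function names; the service and pod labels are the ones used by the dashboard queries in Step 3, and the pod name in the example is hypothetical. Note that the prefix embeds the inbound port (8080), so it would presumably differ for a service listening on another port.

```python
PREFIX = "envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_"

SHORT_NAMES = (
    "rq_blocked",
    "burst_queue_size",
    "concurrency_limit",
    "gradient",
    "min_rtt_msecs",
    "sample_rtt_msecs",
    "min_rtt_calculation_active",
)


def full_metric_name(short_name):
    """Expand a short metric name from the table into its full name."""
    if short_name not in SHORT_NAMES:
        raise ValueError(f"unknown metric: {short_name}")
    return PREFIX + short_name


def promql_selector(short_name, service, pod):
    """Build a PromQL selector in the same shape as the dashboard queries."""
    return f'{full_metric_name(short_name)}{{service="{service}", pod="{pod}"}}'


# "testserver-abc123" is a made-up pod name for illustration.
print(promql_selector("concurrency_limit", "testserver", "testserver-abc123"))
```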

Apply to production services

  • Enable retries: Create a destination rule with retry policies for the target service to handle 503 responses during MinRTT recalculation windows.

  • Tune parameters: Adjust concurrency_update_interval, min_concurrency, and buffer based on the Grafana dashboard observations. A smaller buffer makes the controller more sensitive to latency changes; a larger value allows more fluctuation.

  • Target your own workloads: Replace the workload_selector labels to target your production workloads. Start with conservative settings (min_concurrency well below expected capacity) and iterate based on metrics.