Static circuit breaking requires you to estimate and manually set a fixed concurrency threshold for each service. If the estimate is too low, it blocks legitimate traffic. If too high, the service overloads before the circuit breaker activates. The ASMAdaptiveConcurrency CustomResourceDefinition (CRD) removes this guesswork by dynamically adjusting the concurrency limit based on real-time latency measurements, keeping the limit near the service's actual capacity without manual tuning.
When concurrent requests exceed the calculated limit, the sidecar proxy returns HTTP status code 503 with the error message `reached concurrency limit`.
How it works
The adaptive concurrency controller uses a gradient-based algorithm that periodically measures the baseline latency of a service (MinRTT) and compares it against sampled request latency (SampleRTT) to adjust the concurrency limit.
Gradient calculation
The controller calculates a gradient value from the sampled latencies:
```
gradient = (minRTT + buffer) / sampleRTT
```

The buffer absorbs normal latency variance, so the gradient only decreases when sampled latency exceeds the baseline by a meaningful amount:

```
buffer = minRTT * buffer_pct
```

The gradient then updates the concurrency limit:

```
new_limit = gradient * current_limit + headroom
```

When the service is healthy, sampleRTT stays close to minRTT, the gradient stays near 1.0, and the limit remains stable or increases. As the service becomes overloaded, sampleRTT rises, the gradient drops below 1.0, and the limit decreases, rejecting excess requests before the service degrades further.
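To make these formulas concrete, here is a worked pass with assumed round-trip times (the numbers are illustrative, not measured values):

```
minRTT = 100 ms, buffer_pct = 25%  ->  buffer = 100 * 0.25 = 25 ms
Healthy:    sampleRTT = 110 ms  ->  gradient = (100 + 25) / 110 ≈ 1.14  ->  limit grows
Overloaded: sampleRTT = 250 ms  ->  gradient = (100 + 25) / 250 = 0.50  ->  limit halves
```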
MinRTT recalculation
The controller periodically recalculates MinRTT by temporarily reducing the concurrency limit to the `min_concurrency` value. During this window, only a small number of requests are forwarded so that the service can respond at its true minimum latency.

The concurrency limit drops significantly during MinRTT recalculation, which may cause a spike in 503 responses. To mitigate this, configure a retry policy for the target service, for example, in a virtual service as sketched below. This allows the sidecar proxy to retry rejected requests on other hosts that are not in a MinRTT recalculation window.
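The tutorial does not include the retry configuration itself. The following is a minimal sketch that uses a standard Istio virtual service retry policy; the resource name and the `attempts` and `perTryTimeout` values are illustrative assumptions:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: testserver-retry   # hypothetical name
  namespace: default
spec:
  hosts:
    - testserver
  http:
    - route:
        - destination:
            host: testserver
      retries:
        attempts: 3         # illustrative; tune for your traffic
        perTryTimeout: 2s   # should exceed the service's normal processing time
        retryOn: "503"      # retry requests rejected with HTTP 503
```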
Prerequisites
Before you begin, ensure that you have:
- A Service Mesh (ASM) instance of version 1.12.4.19 or later. For more information, see Create an ASM instance.
- A cluster added to the ASM instance. For more information, see Add a cluster to an ASM instance.
- A kubectl client connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Step 1: Deploy sample applications
This tutorial uses two applications to demonstrate adaptive concurrency control:
| Application | Role | Configuration |
|---|---|---|
| testserver | Target service | Handles up to 500 concurrent requests, each taking 1,000 ms to process. Requests beyond the concurrency limit are queued. |
| gotest | Load generator | Each replica sends 200 concurrent requests to testserver. |
Deploy testserver
Create a file named `testserver.yaml` with the following content:
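This copy of the tutorial does not preserve the original manifest, so the Deployment below is a hedged reconstruction: the image reference is a placeholder, and the container arguments are inferred from the flag descriptions that follow.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testserver
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testserver
  template:
    metadata:
      labels:
        app: testserver
    spec:
      containers:
        - name: testserver
          # Placeholder image; substitute the test server image you use.
          image: <your-registry>/testserver:latest
          # -m caps concurrent requests at 500; -t makes each request take 1,000 ms.
          args: ["-m", "500", "-t", "1000"]
          ports:
            - containerPort: 8080
```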
The `-m` flag sets the maximum concurrent requests (500). The `-t` flag sets the processing time per request in milliseconds (1,000).

Apply the Deployment:
```
kubectl apply -f testserver.yaml
```
Create the testserver Service
Create a file named `testservice.yaml` with the following content:
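The Service manifest is also not preserved here; the sketch below is an assumption that matches the rest of the tutorial: it selects the `app: testserver` pods and names the port `metrics`, which the ServiceMonitor in Step 3 references.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: testserver
  namespace: default
spec:
  selector:
    app: testserver
  ports:
    - name: metrics   # port name referenced by the ServiceMonitor in Step 3
      port: 8080
      targetPort: 8080
```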
Apply the Service:

```
kubectl apply -f testservice.yaml
```
Deploy gotest
Create a file named `gotest.yaml` with the following content. The initial replica count is 0; you scale it up in Step 4 to generate load.
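The load generator manifest is likewise a hedged sketch: the image reference is a placeholder, and the arguments that set the concurrency and target URL are assumptions based on the table in this step.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gotest
  namespace: default
spec:
  replicas: 0   # scaled up in Step 4 to generate load
  selector:
    matchLabels:
      app: gotest
  template:
    metadata:
      labels:
        app: gotest
    spec:
      containers:
        - name: gotest
          # Placeholder image; substitute your load-generator image.
          image: <your-registry>/gotest:latest
          # Assumed flags: 200 concurrent requests against the testserver Service.
          args: ["-c", "200", "-u", "http://testserver:8080"]
```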
Apply the Deployment:
```
kubectl apply -f gotest.yaml
```
Step 2: Create an ASMAdaptiveConcurrency CRD
Connect kubectl to your ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.
Create a file named `adaptiveconcurrency.yaml` with the following content:

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMAdaptiveConcurrency
metadata:
  name: sample-adaptive-concurrency
  namespace: default
spec:
  workload_selector:
    labels:
      app: testserver
  sample_aggregate_percentile:
    value: 60
  concurrency_limit_params:
    max_concurrency_limit: 500
    concurrency_update_interval: 15s
  min_rtt_calc_params:
    interval: 60s
    request_count: 100
    jitter:
      value: 15
    min_concurrency: 50
    buffer:
      value: 25
```

This configuration can be understood as follows:
- Target the `testserver` workload and set the upper bound to 500 concurrent requests.
- Update the concurrency limit every 15 seconds, using the 60th percentile of sampled latencies as the SampleRTT.
- Recalculate MinRTT every 60 seconds (with up to 15% random jitter) using 100 sampled requests. During MinRTT recalculation, limit concurrency to 50 requests.
- Treat latency fluctuations within 25% of MinRTT as normal.
Apply the CRD:
```
kubectl apply -f adaptiveconcurrency.yaml
```
Parameter reference
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `workload_selector` | WorkloadSelector | Yes | -- | Selects the target pods by label. |
| `labels` | map | Yes | -- | Labels to match the target pods. |
| `sample_aggregate_percentile` | Percent | Yes | -- | Percentile used to aggregate sampled latencies into SampleRTT. Valid values: 0 to 100. |
| `concurrency_limit_params` | Object | Yes | -- | Concurrency limit settings. |
| `max_concurrency_limit` | int | No | 1000 | Upper bound for concurrent requests. |
| `concurrency_update_interval` | duration | Yes | -- | How often the controller updates the concurrency limit. Example: `60s`. |
| `min_rtt_calc_params` | Object | Yes | -- | MinRTT calculation settings. |
| `interval` | duration | No | -- | How often MinRTT is recalculated. Example: `120s`. |
| `request_count` | int | No | 50 | Number of requests sampled to calculate MinRTT. |
| `jitter` | Percent | No | 15 | Random jitter added to the MinRTT recalculation interval. For example, if `interval` is 120s and `jitter` is 50, the actual interval is a random value between 120 seconds and 180 seconds (120 + 120 × 50%). |
| `min_concurrency` | int | No | 3 | Concurrency limit during MinRTT recalculation. Also serves as the initial limit when the controller starts. Set this well below the service's actual capacity to get accurate baseline measurements. |
| `buffer` | Percent | No | 25 | Acceptable latency fluctuation as a percentage of MinRTT. For example, if MinRTT is 100 ms and `buffer` is 10, latencies up to 110 ms are considered normal. |
Step 3: Set up monitoring with Prometheus
Export adaptive concurrency metrics to Managed Service for Prometheus to observe the controller's behavior and tune parameters.
Enable Managed Service for Prometheus for your cluster. For more information, see Use Managed Service for Prometheus.
Create a ServiceMonitor to scrape metrics from the testserver sidecar proxy.
Create a file named `servicemonitor.yaml` with the following content:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: testserver-envoy-metrics
  namespace: default
spec:
  endpoints:
    - interval: 5s
      path: /stats/prometheus
      port: metrics
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: testserver
```

Connect kubectl to the ACK cluster and apply the ServiceMonitor:
```
kubectl apply -f servicemonitor.yaml
```
Import the Grafana dashboard to visualize the controller metrics. Download the dashboard JSON and import it in the Managed Service for Grafana console. For import instructions, see the ARMS documentation.
Step 4: Verify the concurrency controller
Generate load against testserver to confirm that the ASMAdaptiveConcurrency CRD limits concurrency as expected.
Scale the gotest Deployment to 5 replicas. With 200 concurrent requests per replica, this produces 1,000 simultaneous requests -- double the testserver's 500-request capacity.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find your cluster and click its name or click Details in the Actions column.
In the left-side navigation pane, choose Workloads > Deployments.
On the Deployments page, set Namespace to default. In the Actions column for the gotest application, choose More > View in YAML.
In the Edit YAML dialog box, set `replicas` to `5`, and then click Update.
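If you prefer the command line, and your kubectl context points to the ACK cluster, the following equivalent command is a sketch of the same scale-up:

```
kubectl scale deployment gotest --replicas=5
```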
Open the Grafana dashboard that you imported in Step 3. Set Service to `testserver` and Pod to `ALL`.

Check the ConcurrencyLimit panel. The concurrency limit should stabilize below 500, confirming that the controller protects testserver from overload. The RqBlocked panel shows the cumulative count of rejected requests.
Metrics reference
The adaptive concurrency controller exposes the following Prometheus metrics through the sidecar proxy. Use them to monitor controller behavior and tune CRD parameters.
| Metric | Type | Description |
|---|---|---|
| `rq_blocked` | Counter | Total requests rejected by the concurrency controller. |
| `burst_queue_size` | Gauge | Current headroom value used in the concurrency limit calculation. |
| `concurrency_limit` | Gauge | Current concurrency limit enforced by the controller. |
| `gradient` | Gauge | Current gradient value. |
| `min_rtt_msecs` | Gauge | Current MinRTT measurement in milliseconds. |
| `sample_rtt_msecs` | Gauge | Current SampleRTT aggregate in milliseconds. |
| `min_rtt_calculation_active` | Gauge | Set to 1 while the controller is recalculating MinRTT, and 0 otherwise. |
All metrics use the name prefix `envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_`, where `0_0_0_0_8080` corresponds to the inbound listener address and port of the testserver workload. For example, the full name of the `gradient` metric is `envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_gradient`.
Apply to production services
- Enable retries: Configure a retry policy for the target service (for example, the virtual service sketch shown earlier) so that 503 responses during MinRTT recalculation windows are retried.
- Tune parameters: Adjust `concurrency_update_interval`, `min_concurrency`, and `buffer` based on your Grafana dashboard observations. A smaller `buffer` makes the controller more sensitive to latency changes; a larger value allows more fluctuation.
- Target your own workloads: Replace the `workload_selector` labels to target your production workloads. Start with conservative settings (`min_concurrency` well below expected capacity) and iterate based on the metrics, for example, starting from the sketch below.
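A conservative starting configuration might look like the following sketch. It uses only fields from the parameter reference above; the name, labels, and values are placeholders to adapt, not recommendations:

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMAdaptiveConcurrency
metadata:
  name: my-service-adaptive-concurrency   # hypothetical name
  namespace: default
spec:
  workload_selector:
    labels:
      app: my-service                     # replace with your workload's labels
  sample_aggregate_percentile:
    value: 60
  concurrency_limit_params:
    max_concurrency_limit: 1000
    concurrency_update_interval: 30s
  min_rtt_calc_params:
    interval: 120s
    request_count: 100
    jitter:
      value: 15
    min_concurrency: 10                   # well below expected capacity
    buffer:
      value: 25
```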