Static circuit breaking requires you to estimate and manually set a fixed concurrency threshold for each service. If the estimate is too low, it blocks legitimate traffic. If too high, the service overloads before the circuit breaker activates. The ASMAdaptiveConcurrency CustomResourceDefinition (CRD) removes this guesswork by dynamically adjusting the concurrency limit based on real-time latency measurements, keeping the limit near the service's actual capacity without manual tuning.
When concurrent requests exceed the calculated limit, the sidecar proxy returns HTTP status code 503 with the error message `reached concurrency limit`.
How it works
The adaptive concurrency controller uses a gradient-based algorithm that periodically measures the baseline latency of a service (MinRTT) and compares it against sampled request latency (SampleRTT) to adjust the concurrency limit.
Gradient calculation
The controller calculates a gradient value from the sampled latencies:
```
gradient = (minRTT + buffer) / sampleRTT
```

The buffer absorbs normal latency variance, so the gradient only decreases when sampled latency exceeds the baseline by a meaningful amount:

```
buffer = minRTT * buffer_pct
```

The gradient then updates the concurrency limit:

```
new_limit = gradient * current_limit + headroom
```

When the service is healthy, sampleRTT stays close to minRTT, the gradient stays near 1.0, and the limit remains stable or increases. As the service becomes overloaded, sampleRTT rises, the gradient drops below 1.0, and the limit decreases, rejecting excess requests before the service degrades further.
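To make these formulas concrete, here is a worked pass with assumed round-trip times (the numbers are illustrative, not measured values):

```
minRTT = 100 ms, buffer_pct = 25%  ->  buffer = 100 * 0.25 = 25 ms
Healthy:    sampleRTT = 110 ms  ->  gradient = (100 + 25) / 110 ≈ 1.14  ->  limit grows
Overloaded: sampleRTT = 250 ms  ->  gradient = (100 + 25) / 250 = 0.50  ->  limit halves
```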
MinRTT recalculation
The controller periodically recalculates MinRTT by temporarily reducing the concurrency limit to the `min_concurrency` value. During this window, only a small number of requests are forwarded so that the service can respond at its true minimum latency.

The concurrency limit drops significantly during MinRTT recalculation, which may cause a spike in 503 responses. To mitigate this, configure a retry policy for the target service, for example, in a virtual service as sketched below. This allows the sidecar proxy to retry rejected requests on other hosts that are not in a MinRTT recalculation window.
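The tutorial does not include the retry configuration itself. The following is a minimal sketch that uses a standard Istio virtual service retry policy; the resource name and the `attempts` and `perTryTimeout` values are illustrative assumptions:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: testserver-retry   # hypothetical name
  namespace: default
spec:
  hosts:
    - testserver
  http:
    - route:
        - destination:
            host: testserver
      retries:
        attempts: 3         # illustrative; tune for your traffic
        perTryTimeout: 2s   # should exceed the service's normal processing time
        retryOn: "503"      # retry requests rejected with HTTP 503
```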
Prerequisites
Before you begin, ensure that you have:
- A Service Mesh (ASM) instance of version 1.12.4.19 or later. For more information, see Create an ASM instance.
- A cluster added to the ASM instance. For more information, see Add a cluster to an ASM instance.
- A kubectl client connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Step 1: Deploy sample applications
This tutorial uses two applications to demonstrate adaptive concurrency control:
| Application | Role | Configuration |
|---|---|---|
| testserver | Target service | Handles up to 500 concurrent requests, each taking 1,000 ms to process. Requests beyond the concurrency limit are queued. |
| gotest | Load generator | Each replica sends 200 concurrent requests to testserver. |
Deploy testserver
Create a file named `testserver.yaml` with the following content:
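This copy of the tutorial does not preserve the original manifest, so the Deployment below is a hedged reconstruction: the image reference is a placeholder, and the container arguments are inferred from the flag descriptions that follow.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testserver
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testserver
  template:
    metadata:
      labels:
        app: testserver
    spec:
      containers:
        - name: testserver
          # Placeholder image; substitute the test server image you use.
          image: <your-registry>/testserver:latest
          # -m caps concurrent requests at 500; -t makes each request take 1,000 ms.
          args: ["-m", "500", "-t", "1000"]
          ports:
            - containerPort: 8080
```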
The `-m` flag sets the maximum concurrent requests (500). The `-t` flag sets the processing time per request in milliseconds (1,000).

Apply the Deployment:
```
kubectl apply -f testserver.yaml
```
Create the testserver Service
Create a file named `testservice.yaml` with the following content:
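The Service manifest is also not preserved here; the sketch below is an assumption that matches the rest of the tutorial: it selects the `app: testserver` pods and names the port `metrics`, which the ServiceMonitor in Step 3 references.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: testserver
  namespace: default
spec:
  selector:
    app: testserver
  ports:
    - name: metrics   # port name referenced by the ServiceMonitor in Step 3
      port: 8080
      targetPort: 8080
```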
Apply the Service:

```
kubectl apply -f testservice.yaml
```
Deploy gotest
Create a file named `gotest.yaml` with the following content. The initial replica count is 0; you scale it up in Step 4 to generate load.
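The load generator manifest is likewise a hedged sketch: the image reference is a placeholder, and the arguments that set the concurrency and target URL are assumptions based on the table in this step.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gotest
  namespace: default
spec:
  replicas: 0   # scaled up in Step 4 to generate load
  selector:
    matchLabels:
      app: gotest
  template:
    metadata:
      labels:
        app: gotest
    spec:
      containers:
        - name: gotest
          # Placeholder image; substitute your load-generator image.
          image: <your-registry>/gotest:latest
          # Assumed flags: 200 concurrent requests against the testserver Service.
          args: ["-c", "200", "-u", "http://testserver:8080"]
```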
Apply the Deployment:
```
kubectl apply -f gotest.yaml
```
Step 2: Create an ASMAdaptiveConcurrency CRD
Connect kubectl to your ASM instance. For more information, see Use kubectl on the control plane to access Istio resources.
Create a file named `adaptiveconcurrency.yaml` with the following content:

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMAdaptiveConcurrency
metadata:
  name: sample-adaptive-concurrency
  namespace: default
spec:
  workload_selector:
    labels:
      app: testserver
  sample_aggregate_percentile:
    value: 60
  concurrency_limit_params:
    max_concurrency_limit: 500
    concurrency_update_interval: 15s
  min_rtt_calc_params:
    interval: 60s
    request_count: 100
    jitter:
      value: 15
    min_concurrency: 50
    buffer:
      value: 25
```

This configuration can be understood as follows:
- Target the `testserver` workload and set the upper bound to 500 concurrent requests.
- Update the concurrency limit every 15 seconds, using the 60th percentile of sampled latencies as the SampleRTT.
- Recalculate MinRTT every 60 seconds (with up to 15% random jitter) using 100 sampled requests. During MinRTT recalculation, limit concurrency to 50 requests.
- Treat latency fluctuations within 25% of MinRTT as normal.
Apply the CRD:
```
kubectl apply -f adaptiveconcurrency.yaml
```
Parameter reference
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `workload_selector` | WorkloadSelector | Yes | -- | Selects the target pods by label. |
| `labels` | map | Yes | -- | Labels to match the target pods. |
| `sample_aggregate_percentile` | Percent | Yes | -- | Percentile used to aggregate sampled latencies into SampleRTT. Valid values: 0 to 100. |
| `concurrency_limit_params` | Object | Yes | -- | Concurrency limit settings. |
| `max_concurrency_limit` | int | No | 1000 | Upper bound for concurrent requests. |
| `concurrency_update_interval` | duration | Yes | -- | How often the controller updates the concurrency limit. Example: `60s`. |
| `min_rtt_calc_params` | Object | Yes | -- | MinRTT calculation settings. |
| `interval` | duration | No | -- | How often MinRTT is recalculated. Example: `120s`. |
| `request_count` | int | No | 50 | Number of requests sampled to calculate MinRTT. |
| `jitter` | Percent | No | 15 | Random jitter added to the MinRTT recalculation interval. For example, if `interval` is 120s and `jitter` is 50, the actual interval is a random value between 120 seconds and 180 seconds (120 + 120 × 50%). |
| `min_concurrency` | int | No | 3 | Concurrency limit during MinRTT recalculation. Also serves as the initial limit when the controller starts. Set this well below the service's actual capacity to get accurate baseline measurements. |
| `buffer` | Percent | No | 25 | Acceptable latency fluctuation as a percentage of MinRTT. For example, if MinRTT is 100 ms and `buffer` is 10, latencies up to 110 ms are considered normal. |
Step 3: Set up monitoring with Prometheus
Export adaptive concurrency metrics to Managed Service for Prometheus to observe the controller's behavior and tune parameters.
Enable Managed Service for Prometheus for your cluster. For more information, see Use Managed Service for Prometheus.
Create a ServiceMonitor to scrape metrics from the testserver sidecar proxy.
Create a file named `servicemonitor.yaml` with the following content:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: testserver-envoy-metrics
  namespace: default
spec:
  endpoints:
    - interval: 5s
      path: /stats/prometheus
      port: metrics
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: testserver
```

Connect kubectl to the ACK cluster and apply the ServiceMonitor:
```
kubectl apply -f servicemonitor.yaml
```
Import the Grafana dashboard to visualize the controller metrics. Download the dashboard JSON and import it in the Managed Service for Grafana console. For import instructions, see the ARMS documentation.
Step 4: Verify the concurrency controller
Generate load against testserver to confirm that the ASMAdaptiveConcurrency CRD limits concurrency as expected.
Scale the gotest Deployment to 5 replicas. With 200 concurrent requests per replica, this produces 1,000 simultaneous requests -- double the testserver's 500-request capacity.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find your cluster and click its name or click Details in the Actions column.
In the left-side navigation pane, choose Workloads > Deployments.
On the Deployments page, set Namespace to default. In the Actions column for the gotest application, choose More > View in YAML.
In the Edit YAML dialog box, set `replicas` to `5`, and then click Update.
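If you prefer the command line, and your kubectl context points to the ACK cluster, the following equivalent command is a sketch of the same scale-up:

```
kubectl scale deployment gotest --replicas=5
```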
Open the Grafana dashboard that you imported in Step 3. Set Service to `testserver` and Pod to `ALL`.

Check the ConcurrencyLimit panel. The concurrency limit should stabilize below 500, confirming that the controller protects testserver from overload. The RqBlocked panel shows the cumulative count of rejected requests.
Metrics reference
The adaptive concurrency controller exposes the following Prometheus metrics through the sidecar proxy. Use them to monitor controller behavior and tune CRD parameters.
| Metric | Type | Description |
|---|---|---|
| `rq_blocked` | Counter | Total requests rejected by the concurrency controller. |
| `burst_queue_size` | Gauge | Current headroom value used in the concurrency limit calculation. |
| `concurrency_limit` | Gauge | Current concurrency limit enforced by the controller. |
| `gradient` | Gauge | Current gradient value. |
| `min_rtt_msecs` | Gauge | Current MinRTT measurement in milliseconds. |
| `sample_rtt_msecs` | Gauge | Current SampleRTT aggregate in milliseconds. |
| `min_rtt_calculation_active` | Gauge | Set to 1 while the controller is recalculating MinRTT, and 0 otherwise. |
All metrics use the name prefix `envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_`, where `0_0_0_0_8080` corresponds to the inbound listener address and port of the testserver workload. For example, the full name of the `gradient` metric is `envoy_http_inbound_0_0_0_0_8080_adaptive_concurrency_gradient_controller_gradient`.
Apply to production services
- Enable retries: Configure a retry policy for the target service (for example, the virtual service sketch shown earlier) so that 503 responses during MinRTT recalculation windows are retried.
- Tune parameters: Adjust `concurrency_update_interval`, `min_concurrency`, and `buffer` based on your Grafana dashboard observations. A smaller `buffer` makes the controller more sensitive to latency changes; a larger value allows more fluctuation.
- Target your own workloads: Replace the `workload_selector` labels to target your production workloads. Start with conservative settings (`min_concurrency` well below expected capacity) and iterate based on the metrics, for example, starting from the sketch below.
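A conservative starting configuration might look like the following sketch. It uses only fields from the parameter reference above; the name, labels, and values are placeholders to adapt, not recommendations:

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMAdaptiveConcurrency
metadata:
  name: my-service-adaptive-concurrency   # hypothetical name
  namespace: default
spec:
  workload_selector:
    labels:
      app: my-service                     # replace with your workload's labels
  sample_aggregate_percentile:
    value: 60
  concurrency_limit_params:
    max_concurrency_limit: 1000
    concurrency_update_interval: 30s
  min_rtt_calc_params:
    interval: 120s
    request_count: 100
    jitter:
      value: 15
    min_concurrency: 10                   # well below expected capacity
    buffer:
      value: 25
```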