
Alibaba Cloud Service Mesh: Configure a connection pool to implement circuit breaking

Last Updated: Mar 10, 2026

When microservices experience overload or partial failures, uncontrolled traffic can cascade across the system and bring down healthy services. Circuit breaking at the network level -- through Service Mesh (ASM) sidecar proxies -- rejects excess traffic before it reaches the backend, without requiring code changes in each service. Traditional approaches such as Resilience4j require embedding circuit breaking logic directly into application code.

Configure the connectionPool field in a DestinationRule to cap concurrent connections and pending requests to a destination service. The sections below cover the parameters, demonstrate behavior across four pod scaling topologies, and explain how to monitor circuit breaking in production.

Prerequisites

Before you begin, make sure that you have:

How connection pool circuit breaking works

Create a DestinationRule with connectionPool settings to enable circuit breaking for a target service. For the full field reference, see Destination Rule.

Three parameters control connection pool behavior:

  • tcp.maxConnections (int32, optional; default: 2^32-1): Maximum number of HTTP/1.1 or TCP connections to a destination host. Enforced on sidecar proxies on both the client and server sides: a single client pod cannot open more than this number of connections, and a single server pod cannot accept more. Effective server-side capacity: min(client pods, server pods) x maxConnections.

  • http.http1MaxPendingRequests (int32, optional; default: 1024): Maximum number of requests queued while waiting for an available connection. Setting this to 0 falls back to the default (1024), so use a value of at least 1.

  • http.http2MaxRequests (int32, optional; default: 1024): Maximum number of active requests to a backend.
The connectionPool field limits connections and queued requests but does not eject unhealthy hosts from the load balancing pool. For host ejection based on error rates, combine connectionPool with outlierDetection in the same DestinationRule. See Destination Rule for outlierDetection fields.
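For reference, a combined rule might look as follows. The outlierDetection thresholds below are illustrative assumptions for this sample, not tuned recommendations:

```yaml
# Illustrative only: combine connection limits with outlier detection
# in a single DestinationRule. Tune the thresholds for your workload.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker-sample-server
spec:
  host: circuit-breaker-sample-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5
    outlierDetection:
      consecutive5xxErrors: 5   # eject a host after 5 consecutive 5xx responses
      interval: 30s             # analysis sweep interval
      baseEjectionTime: 30s     # minimum ejection duration
      maxEjectionPercent: 50    # never eject more than half of the hosts
```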

With a single client and a single server pod, these parameters behave predictably. In production, services typically run multiple pods. The following scenarios show how circuit breaking behaves across four common topologies:

  • One client pod, one destination service pod

  • One client pod, multiple destination service pods

  • Multiple client pods, one destination service pod

  • Multiple client pods, multiple destination service pods

Deploy the sample applications

The sample setup has two components:

  • Server: A Flask application listening on port 9080 at the /hello endpoint. Each request takes 5 seconds to process (simulating a slow backend).

  • Client: A Python script that sends 10 parallel requests per batch. Batches fire at the 0th, 20th, and 40th second of each minute so that multiple client pods send requests simultaneously.

  1. Save the following YAML and run kubectl apply -f <file-name>.yaml to deploy the sample applications.

    Server script

    #!/usr/bin/env python3
    from flask import Flask
    import time
    
    app = Flask(__name__)
    
    @app.route('/hello')
    def get():
        time.sleep(5)
        return 'hello world!'
    
    if __name__ == '__main__':
        app.run(debug=True, host='0.0.0.0', port=9080, threaded=True)

    Client script

    #!/usr/bin/env python3
    import requests
    import time
    import sys
    from datetime import datetime
    import _thread
    
    def timedisplay(t):
      return t.strftime("%H:%M:%S")
    
    def get(url):
      try:
        stime = datetime.now()
        start = time.time()
        response = requests.get(url)
        etime = datetime.now()
        end = time.time()
        elapsed = end-start
        sys.stderr.write("Status: " + str(response.status_code) + ", Start: " + timedisplay(stime) + ", End: " + timedisplay(etime) + ", Elapsed Time: " + str(elapsed)+"\n")
        sys.stderr.flush()
      except Exception as myexception:
        sys.stderr.write("Exception: " + str(myexception)+"\n")
        sys.stderr.flush()
    
    time.sleep(30)
    
    while True:
      sc = int(datetime.now().strftime('%S'))
      time_range = [0, 20, 40]
    
      if sc not in time_range:
        time.sleep(1)
        continue
    
      sys.stderr.write("\n----------Info----------\n")
      sys.stderr.flush()
    
      # Send 10 requests in parallel
      for i in range(10):
        _thread.start_new_thread(get, ("http://circuit-breaker-sample-server:9080/hello", ))
    
      time.sleep(2)

    Deployment YAML

    ##################################################################################################
    #  circuit-breaker-sample-server services
    ##################################################################################################
    apiVersion: v1
    kind: Service
    metadata:
      name: circuit-breaker-sample-server
      labels:
        app: circuit-breaker-sample-server
        service: circuit-breaker-sample-server
    spec:
      ports:
      - port: 9080
        name: http
      selector:
        app: circuit-breaker-sample-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: circuit-breaker-sample-server
      labels:
        app: circuit-breaker-sample-server
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: circuit-breaker-sample-server
          version: v1
      template:
        metadata:
          labels:
            app: circuit-breaker-sample-server
            version: v1
        spec:
          containers:
          - name: circuit-breaker-sample-server
            image: registry.cn-hangzhou.aliyuncs.com/acs/istio-samples:circuit-breaker-sample-server.v1
            imagePullPolicy: Always
            ports:
            - containerPort: 9080
    ---
    ##################################################################################################
    #  circuit-breaker-sample-client services
    ##################################################################################################
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: circuit-breaker-sample-client
      labels:
        app: circuit-breaker-sample-client
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: circuit-breaker-sample-client
          version: v1
      template:
        metadata:
          labels:
            app: circuit-breaker-sample-client
            version: v1
        spec:
          containers:
          - name: circuit-breaker-sample-client
            image: registry.cn-hangzhou.aliyuncs.com/acs/istio-samples:circuit-breaker-sample-client.v1
            imagePullPolicy: Always
  2. Verify that the pods are running. Expected output:

    kubectl get po | grep circuit
    circuit-breaker-sample-client-d4f64d66d-fwrh4   2/2     Running   0             1m22s
    circuit-breaker-sample-server-6d6ddb4b-gcthv    2/2     Running   0             1m22s

Without a DestinationRule, the server handles all 10 concurrent requests and every response returns 200:

----------Info----------
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.016539812088013
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012614488601685
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.015984535217285
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.015599012374878
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012874364852905
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.018714904785156
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.010422468185425
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.012431621551514
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.011001348495483
Status: 200, Start: 02:39:20, End: 02:39:25, Elapsed Time: 5.01432466506958

Create a DestinationRule for circuit breaking

Define a DestinationRule for the destination service to enable circuit breaking. For more information, see Manage destination rules.

The following rule limits TCP connections to 5:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker-sample-server
spec:
  host: circuit-breaker-sample-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5

Scenario 1: One client pod, one destination service pod

  1. Restart the client pod and check its logs. All 10 requests succeed, but only 5 finish in approximately 5 seconds. The rest wait 10 seconds or more because they queue until a connection frees up. With only tcp.maxConnections set, excess requests queue rather than fail: the default pending-request queue (1024) easily absorbs the 5 waiting requests.

    ----------Info----------
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.0167787075042725
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.011920690536499
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.017078161239624
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.018405437469482
    Status: 200, Start: 02:49:40, End: 02:49:45, Elapsed Time: 5.018689393997192
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.018936395645142
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.016417503356934
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.019930601119995
    Status: 200, Start: 02:49:40, End: 02:49:50, Elapsed Time: 10.022735834121704
    Status: 200, Start: 02:49:40, End: 02:49:55, Elapsed Time: 15.02303147315979
  2. For fail-fast circuit breaking, also limit http.http1MaxPendingRequests. Update the DestinationRule. For more information, see Manage destination rules.

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: circuit-breaker-sample-server
    spec:
      host: circuit-breaker-sample-server
      trafficPolicy:
        connectionPool:
          tcp:
            maxConnections: 5
          http:
            http1MaxPendingRequests: 1
  3. Restart the client pod and check its logs. Four requests are immediately rejected (503), five reach the destination, and one is queued (completing in approximately 10 seconds after waiting for a free connection).

    ----------Info----------
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.005339622497558594
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.007254838943481445
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.0044133663177490234
    Status: 503, Start: 02:56:40, End: 02:56:40, Elapsed Time: 0.008964776992797852
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.018309116363525
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.017424821853638
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.019804954528809
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.01643180847168
    Status: 200, Start: 02:56:40, End: 02:56:45, Elapsed Time: 5.025975227355957
    Status: 200, Start: 02:56:40, End: 02:56:50, Elapsed Time: 10.01716136932373
  4. Verify the active connection count from the client's sidecar proxy. Expected output: Five active connections from the client proxy to the destination pod, matching the maxConnections limit.

    kubectl exec $(kubectl get pod --selector app=circuit-breaker-sample-client --output jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl -X POST http://localhost:15000/clusters | grep circuit-breaker-sample-server | grep cx_active
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.124:9080::cx_active::5
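The queuing pattern in step 1 can be modeled roughly. The following is a back-of-the-envelope sketch (not part of the sample), assuming perfect reuse of the 5 pooled connections and a fixed 5-second service time; the actual log deviates slightly (one request took about 15 seconds):

```python
import math

# Rough model: n_requests arrive at once, max_connections are served in
# parallel, each taking service_time seconds; the rest wait in the queue.
def expected_elapsed(n_requests: int, max_connections: int, service_time: float) -> list:
    return [math.ceil((i + 1) / max_connections) * service_time
            for i in range(n_requests)]

print(expected_elapsed(10, 5, 5.0))
# [5.0, 5.0, 5.0, 5.0, 5.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```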

Scenario 2: One client pod, multiple destination service pods

This scenario tests whether the connection limit applies per pod or per service. With one client and three destination pods:

  • Per-pod limit: Each pod allows 5 connections, totaling 15. All 10 requests should succeed in approximately 5 seconds.

  • Per-service limit: Only 5 connections total, regardless of pod count. Throttling behavior matches Scenario 1.

  1. Scale the destination service to three replicas.

    kubectl scale deployment/circuit-breaker-sample-server --replicas=3
  2. Restart the client pod and check its logs. The throttling pattern is identical to Scenario 1. Adding more destination pods does not increase the client's connection limit. The connection limit applies per service, not per pod.

    ----------Info----------
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.011791706085205078
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.0032286643981933594
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.012153387069702148
    Status: 503, Start: 03:06:20, End: 03:06:20, Elapsed Time: 0.011871814727783203
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.012892484664917
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.013102769851685
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.016939163208008
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.014261484146118
    Status: 200, Start: 03:06:20, End: 03:06:25, Elapsed Time: 5.01246190071106
    Status: 200, Start: 03:06:20, End: 03:06:30, Elapsed Time: 10.021712064743042
  3. Verify the active connection distribution. Expected output: The proxy distributes connections across pods -- two per pod, six total rather than five. As mentioned in both Envoy and Istio documentation, a proxy allows some leeway in terms of the number of connections.

    kubectl exec $(kubectl get pod --selector app=circuit-breaker-sample-client --output jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl -X POST http://localhost:15000/clusters | grep circuit-breaker-sample-server | grep cx_active
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.124:9080::cx_active::2
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.158:9080::cx_active::2
    outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.26:9080::cx_active::2
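To sanity-check the distribution programmatically, you can tally the cx_active lines from the admin endpoint. A minimal sketch (the helper below is illustrative, not part of the sample) that parses the output format shown above:

```python
def parse_cx_active(clusters_output: str) -> dict:
    """Map each upstream host to its active connection count.

    Expects lines shaped like:
    outbound|9080||<service>::<ip:port>::cx_active::<n>
    """
    counts = {}
    for line in clusters_output.splitlines():
        parts = line.strip().split("::")
        if len(parts) == 4 and parts[2] == "cx_active":
            counts[parts[1]] = int(parts[3])
    return counts

sample = """\
outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.124:9080::cx_active::2
outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.158:9080::cx_active::2
outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local::172.20.192.26:9080::cx_active::2
"""
counts = parse_cx_active(sample)
print(len(counts), sum(counts.values()))  # 3 6: six connections total, slightly over the limit of 5
```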

Scenario 3: Multiple client pods, one destination service pod

  1. Adjust replicas: scale the server to 1 and the client to 3.

    kubectl scale deployment/circuit-breaker-sample-server --replicas=1
    kubectl scale deployment/circuit-breaker-sample-client --replicas=3
  2. Restart the client pods and check their logs.

    Client logs

    Client 1

    ----------Info----------
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.008828878402709961
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.010806798934936523
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.012855291366577148
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.004465818405151367
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.007823944091796875
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06221342086791992
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06922149658203125
    Status: 503, Start: 03:10:40, End: 03:10:40, Elapsed Time: 0.06859922409057617
    Status: 200, Start: 03:10:40, End: 03:10:45, Elapsed Time: 5.015282392501831
    Status: 200, Start: 03:10:40, End: 03:10:50, Elapsed Time: 9.378434181213379

    Client 2

    ----------Info----------
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.007795810699462891
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.00595545768737793
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.013380765914916992
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.004278898239135742
    Status: 503, Start: 03:11:00, End: 03:11:00, Elapsed Time: 0.010999202728271484
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.015426874160767
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.0184690952301025
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.019806146621704
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.0175628662109375
    Status: 200, Start: 03:11:00, End: 03:11:05, Elapsed Time: 5.031521558761597

    Client 3

    ----------Info----------
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.012019157409667969
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.012546539306640625
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.013760805130004883
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.014089822769165039
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.014792442321777344
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.015463829040527344
    Status: 503, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.01661539077758789
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.02904224395751953
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.03912043571472168
    Status: 200, Start: 03:13:20, End: 03:13:20, Elapsed Time: 0.06436014175415039

    The 503 error rate increases on each client. Each client proxy enforces its own 5-connection limit independently, but the single destination service proxy also enforces a 5-connection limit. Only 5 requests from across all clients can succeed concurrently.

  3. Check the client proxy logs for response flags.

    Client proxy logs

    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"192.168.142.207:9080","downstream_remote_address":"172.20.192.31:44610","duration":"0","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"d9d87600-cd01-421f-8a6f-dc0ee0ac8ccd","requested_server_name":"-","response_code":"503","response_flags":"UO","route_name":"default","start_time":"2023-02-28T03:14:00.095Z","trace_id":"-","upstream_cluster":"outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local","upstream_host":"-","upstream_local_address":"-","upstream_service_time":"-","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}
    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"192.168.142.207:9080","downstream_remote_address":"172.20.192.31:43294","duration":"58","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"931d080a-3413-4e35-91f4-0c906e7ee565","requested_server_name":"-","response_code":"503","response_flags":"URX","route_name":"default","start_time":"2023-02-28T03:12:20.995Z","trace_id":"-","upstream_cluster":"outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local","upstream_host":"172.20.192.84:9080","upstream_local_address":"172.20.192.31:58742","upstream_service_time":"57","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}

    Throttled requests return a 503 with one of two response flags:

      • UO: Upstream overflow (circuit breaking). The client proxy throttles the request locally.

      • URX: Upstream retry or connection limit exceeded. The destination service proxy rejects the request.

    Distinguish the two by examining the duration, upstream_host, and upstream_cluster fields in the access log. UO requests have no upstream host (they are throttled before being sent), while URX requests reached the destination proxy and were rejected there.

  4. Confirm by checking the destination service proxy logs.

    Destination service proxy logs

    {"authority":"circuit-breaker-sample-server:9080","bytes_received":"0","bytes_sent":"81","downstream_local_address":"172.20.192.84:9080","downstream_remote_address":"172.20.192.31:59510","duration":"0","istio_policy_status":"-","method":"GET","path":"/hello","protocol":"HTTP/1.1","request_id":"7684cbb0-8f1c-44bf-b591-40c3deff6b0b","requested_server_name":"outbound_.9080_._.circuit-breaker-sample-server.default.svc.cluster.local","response_code":"503","response_flags":"UO","route_name":"default","start_time":"2023-02-28T03:14:00.095Z","trace_id":"-","upstream_cluster":"inbound|9080||","upstream_host":"-","upstream_local_address":"-","upstream_service_time":"-","upstream_transport_failure_reason":"-","user_agent":"python-requests/2.21.0","x_forwarded_for":"-"}

    The destination service proxy also returns 503 with the UO flag. This confirms that URX entries in the client proxy logs originate from the destination service proxy rejecting excess connections.
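The UO/URX distinction can be automated when triaging logs. A minimal classifier sketch (not part of the sample), assuming the Envoy JSON access-log fields shown above:

```python
import json

def classify(entry: dict) -> str:
    """Label a JSON access-log entry using response_flags and upstream_host."""
    if entry.get("response_code") != "503":
        return "not throttled"
    if entry.get("response_flags") == "UO" and entry.get("upstream_host") == "-":
        return "throttled locally by this proxy (UO)"
    if entry.get("response_flags") == "URX":
        return "rejected by the destination service proxy (URX)"
    return "other 503"

uo_line = '{"response_code": "503", "response_flags": "UO", "upstream_host": "-"}'
urx_line = '{"response_code": "503", "response_flags": "URX", "upstream_host": "172.20.192.84:9080"}'
print(classify(json.loads(uo_line)))   # throttled locally by this proxy (UO)
print(classify(json.loads(urx_line)))  # rejected by the destination service proxy (URX)
```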

Request flow summary:

Each client proxy enforces a 5-connection limit independently. With 3 clients, up to 15 requests can leave the client proxies in parallel. However, the single destination service proxy also enforces a 5-connection limit, so it accepts only 5 and rejects the rest. The rejected requests appear as URX in the client proxy logs.


Scenario 4: Multiple client pods, multiple destination service pods

Scaling the destination service increases the overall success rate because each destination pod's proxy independently allows 5 connections.

  1. Set the server to 2 replicas and the client to 3. With 2 destination pods (each accepting 5), 10 out of 30 total requests (from 3 clients) succeed per batch.

    kubectl scale deployment/circuit-breaker-sample-server --replicas=2
    kubectl scale deployment/circuit-breaker-sample-client --replicas=3
  2. Scale the server to 3 replicas. 15 requests succeed per batch.

    kubectl scale deployment/circuit-breaker-sample-server --replicas=3
  3. Scale the server to 4 replicas. Still only 15 requests succeed. The client proxy limit caps at 5 per client regardless of how many destination pods are available. With 3 clients, the maximum is 3 x 5 = 15 successful concurrent requests.

    kubectl scale deployment/circuit-breaker-sample-server --replicas=4

Client and server constraint summary

  • Client: Each client proxy enforces the limit independently. If maxConnections is 100 and there are N client pods, up to N x 100 requests can be in flight across all clients. The limit applies to the entire destination service, not to individual destination pods: even with 200 destination pods, a single client proxy opens at most 100 connections.

  • Destination service: Each destination pod's proxy enforces the limit independently. With 50 active pods and maxConnections set to 100, each pod accepts up to 100 connections from client proxies before returning 503.
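These two constraints combine into a simple capacity model. The sketch below is an assumption-level approximation (real proxies allow some leeway, as seen in Scenario 2) that reproduces the scenario results:

```python
def max_concurrent(client_pods: int, server_pods: int, max_connections: int) -> int:
    """Upper bound on concurrently successful requests across all clients."""
    client_side = client_pods * max_connections  # each client proxy caps its own outbound connections
    server_side = server_pods * max_connections  # each server proxy caps its own inbound connections
    return min(client_side, server_side)

# With maxConnections=5, matching Scenario 4:
print(max_concurrent(3, 2, 5))  # 10
print(max_concurrent(3, 3, 5))  # 15
print(max_concurrent(3, 4, 5))  # 15: adding server pods no longer helps; clients are the bottleneck
```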

Monitor circuit breaking metrics

When circuit breaking activates, Envoy generates metrics for detecting and diagnosing throttling.

  • envoy_cluster_circuit_breakers_default_cx_open (gauge): 1 if the connection pool circuit breaker is open (active); 0 otherwise.

  • envoy_cluster_circuit_breakers_default_rq_pending_open (gauge): 1 if the pending request queue has exceeded its limit; 0 otherwise.

Enable circuit breaking metrics

  1. Configure proxyStatsMatcher for the sidecar proxy. Select Regular Expression Match and set the value to .*circuit_breaker.*. For more information, see proxyStatsMatcher.

  2. Redeploy the circuit-breaker-sample-server and circuit-breaker-sample-client Deployments. For more information, see Redeploy workloads.

  3. Re-run the circuit breaking test from the preceding scenarios.

  4. Query the metrics from the client proxy. Expected output:

    kubectl exec -it deploy/circuit-breaker-sample-client -c istio-proxy -- curl localhost:15090/stats/prometheus | grep circuit_breaker | grep circuit-breaker-sample-server
    envoy_cluster_circuit_breakers_default_cx_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 1
    envoy_cluster_circuit_breakers_default_cx_pool_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_remaining_cx{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_remaining_cx_pools{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 18446744073709551613
    envoy_cluster_circuit_breakers_default_remaining_pending{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 1
    envoy_cluster_circuit_breakers_default_remaining_retries{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 4294967295
    envoy_cluster_circuit_breakers_default_remaining_rq{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 4294967295
    envoy_cluster_circuit_breakers_default_rq_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_rq_pending_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_default_rq_retry_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_cx_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_cx_pool_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_pending_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
    envoy_cluster_circuit_breakers_high_rq_retry_open{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 0
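For automation, the same Prometheus text output can be scanned for open breakers. A minimal sketch (an illustrative helper, not part of ASM tooling) that reports clusters whose connection-pool breaker is open:

```python
def open_breakers(metrics_text: str) -> list:
    """Return cluster_name label values whose default cx_open gauge is 1."""
    hits = []
    for line in metrics_text.splitlines():
        if not line.startswith("envoy_cluster_circuit_breakers_default_cx_open{"):
            continue
        name = line.split('cluster_name="')[1].split('"')[0]
        value = float(line.rsplit(" ", 1)[1])
        if value == 1:
            hits.append(name)
    return hits

sample = (
    'envoy_cluster_circuit_breakers_default_cx_open'
    '{cluster_name="outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local"} 1'
)
print(open_breakers(sample))  # ['outbound|9080||circuit-breaker-sample-server.default.svc.cluster.local']
```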

Set up alerts for circuit breaking

Use Managed Service for Prometheus to collect circuit breaking metrics and set up alert rules. For component integration details, see Manage components.

If you already use a self-managed Prometheus instance to collect ASM metrics (see Monitor ASM instances by using a self-managed Prometheus instance), skip step 1.
  1. In Managed Service for Prometheus, connect the data plane cluster to the Alibaba Cloud ASM component or upgrade it to the latest version.

  2. Create an alert rule with a custom PromQL statement. For more information, see Use a custom PromQL statement to create an alert rule. Use the following parameters as a reference:

    • Custom PromQL statement: (sum by(cluster_name, pod_name, namespace) (envoy_cluster_circuit_breakers_default_cx_open)) != 0. This checks whether circuit breaking is active in any connection pool and groups by upstream service name, pod, and namespace so that you can pinpoint where throttling occurs.

    • Alert message: Circuit breaking is active. The TCP connection limit has been reached. Namespace: {{$labels.namespace}}, Pod: {{$labels.pod_name}}, Upstream service: {{$labels.cluster_name}}. This identifies the affected pod, its namespace, and the upstream service.