
Alibaba Cloud Service Mesh: ASM Playground: peak EWMA load balancing

Last Updated: Mar 11, 2026

When a backend endpoint degrades (responds slowly or returns errors), the default LEAST_REQUEST algorithm keeps routing traffic to it, because LEAST_REQUEST tracks only outstanding request counts, not latency. This raises the P95 latency of the entire service. A peak exponentially weighted moving average (EWMA) load balancer addresses this by tracking each endpoint's real-time latency and active request count, then reducing the routing weight of any endpoint whose performance drops. As a result, more requests go to healthy, low-latency endpoints.

Use this ASM Playground scenario to deploy a test service with one normal endpoint and one degraded endpoint, then compare P95 latency between the default LEAST_REQUEST algorithm and PEAK_EWMA.

Important

Read ASM Playground overview before you start.

How peak EWMA works

A peak EWMA load balancer implements a latency-aware and active-requests-aware routing algorithm. It continuously measures each endpoint's response time and outstanding request count, then calculates a weight that reflects real-time performance. When an endpoint's latency increases or error rate rises, peak EWMA temporarily reduces that endpoint's weight so fewer requests reach it. When the endpoint recovers, its weight returns to normal.

| Condition | Peak EWMA behavior |
| --- | --- |
| One endpoint has elevated latency | Shifts traffic to lower-latency endpoints |

This gives peak EWMA a clear advantage over LEAST_REQUEST when backend performance degrades unexpectedly. LEAST_REQUEST distributes traffic to endpoints with the fewest outstanding requests but does not factor in response latency. If one endpoint responds slowly, LEAST_REQUEST keeps routing traffic to it at roughly the same rate, which increases overall P95 latency.
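The weighting described above can be sketched in a few lines of Python. This is not the ASM implementation, whose exact formula this page does not give. It assumes the commonly described peak EWMA formulation: the latency estimate jumps to any sample that exceeds it (the "peak"), decays exponentially toward lower samples, and is multiplied by the number of outstanding requests plus one to get a cost. The names `Endpoint`, `decay_seconds`, and `pick` are hypothetical, and production balancers typically compare two randomly chosen endpoints rather than scanning all of them.

```python
import math
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    decay_seconds: float = 10.0  # hypothetical decay window, not an ASM setting
    ewma: float = 0.0            # current latency estimate, in seconds
    last_update: float = 0.0     # time of the last observed response
    outstanding: int = 0         # requests sent but not yet answered

    def observe(self, rtt: float, now: float) -> None:
        """Fold one response time into the latency estimate."""
        if rtt > self.ewma:
            # Peak behavior: react to a slow response immediately.
            self.ewma = rtt
        else:
            # Recover gradually: older samples fade as time passes.
            elapsed = max(now - self.last_update, 0.0)
            w = math.exp(-elapsed / self.decay_seconds)
            self.ewma = self.ewma * w + rtt * (1.0 - w)
        self.last_update = now

    def cost(self) -> float:
        """Lower is better; combines latency and in-flight load."""
        return self.ewma * (self.outstanding + 1)


def pick(endpoints):
    """Choose the endpoint with the lowest cost (simplified selection)."""
    return min(endpoints, key=lambda e: e.cost())


normal = Endpoint("simple-server-normal")
degraded = Endpoint("simple-server-high-latency")
normal.observe(0.05, now=1.0)
degraded.observe(1.5, now=1.0)
print(pick([normal, degraded]).name)  # simple-server-normal
```

Because the cost multiplies latency by in-flight load, a fast endpoint with a long queue can still lose to a slower idle one, which is the "active-requests-aware" half of the algorithm. A LEAST_REQUEST balancer would correspond to using `outstanding` alone as the cost.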

Scenario overview

This scenario deploys a simple-server service with two versions:

| Deployment | Behavior |
| --- | --- |
| simple-server-normal | Responds with normal latency |
| simple-server-high-latency | Responds with occasional high latency |

A test traffic generator (the curl-job pod) sends requests to the simple-server service for 5 minutes. Run the test twice, once with the default LEAST_REQUEST algorithm and once with PEAK_EWMA, then compare the P95 latency on the built-in Grafana dashboard.

The following figure shows the deployment topology and trace path.

Deployment topology and trace path for the peak EWMA scenario

Create the scenario

Create a playground instance with the playground ID ewmaLb. For instructions, see Create a playground instance.

Interaction fields

The ASMPlayground custom resource (CR) exposes two fields for this scenario.

| Field | Type | Description |
| --- | --- | --- |
| spec.scene.ewmaLb.testTrafficStartTimestamp | int64 | Triggers test traffic. Set this to any nonzero value that differs from the previous value. If the field does not exist, add it manually. |
| spec.scene.ewmaLb.enableEwmaForSimpleServer | bool | Enables peak EWMA for the simple-server service. When true, the ASM Playground controller creates a DestinationRule that sets the load balancing algorithm to PEAK_EWMA. When false or absent, the default LEAST_REQUEST algorithm applies. |
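As a non-interactive alternative to editing the CR by hand, the two fields can be set with a JSON merge patch. The following sketch only builds the patch and prints a `kubectl patch` command for it. It assumes the CR is named `default` as in the steps below, and it uses the current Unix time as the trigger value, which is one convenient way to get a nonzero value that differs from the previous one. The helper names `next_trigger_value` and `build_patch` are hypothetical.

```python
import json
import time


def next_trigger_value(previous: int, now: int) -> int:
    """Return a nonzero value that differs from the previous trigger value."""
    candidate = now if now > 0 else 1
    return candidate if candidate != previous else candidate + 1


def build_patch(previous: int, enable_ewma: bool, now=None) -> dict:
    """Build a merge patch for the ewmaLb scene of the ASMPlayground CR."""
    now = int(time.time()) if now is None else now
    return {
        "spec": {
            "scene": {
                "ewmaLb": {
                    "testTrafficStartTimestamp": next_trigger_value(previous, now),
                    "enableEwmaForSimpleServer": enable_ewma,
                }
            }
        }
    }


patch = build_patch(previous=1, enable_ewma=True)
# Print the command instead of running it, so nothing changes by accident.
print("kubectl patch asmplayground default --type merge -p '"
      + json.dumps(patch) + "'")
```

Setting `enable_ewma=False` in the first round and `True` in the second reproduces the two test rounds described on this page.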

Test the scenario

Run two rounds of test traffic, one with the default algorithm and one with peak EWMA, then compare the results on the Grafana dashboard.

Step 1: Send test traffic with the default algorithm

  1. Connect to the playground instance with kubectl, then edit the ASMPlayground CR:

    kubectl edit asmplayground default
  2. Set spec.scene.ewmaLb.testTrafficStartTimestamp to 1 (or any nonzero value not used previously), then save and exit:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
  3. Verify that test traffic is running:

    kubectl get pod

    Expected output:

    NAME                                          READY   STATUS      RESTARTS
    curl-job-npgbd                                2/2     Running     0
    simple-server-high-latency-7968d5978b-cnrqt   2/2     Running     0
    simple-server-normal-66bd9d546-kvn2m          2/2     Running     0

    A pod with the curl-job prefix confirms that test traffic has started. This pod sends requests to the simple-server service for 5 minutes.

Step 2: View the Grafana dashboard

  1. Retrieve the Grafana URL. Grafana is exposed on port 3000 through the ingress gateway:

    kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}:3000'

    Expected output:

    xxx.xxx.xxx.xxx:3000
  2. Open the URL in a browser. Navigate to Dashboards > Istio > Istio Workload Dashboard and apply the following filters:

    | Filter | Value |
    | --- | --- |
    | datasource | Prometheus |
    | Namespace | default |
    | Workload | curl-job |
    | Reporter | source & destination |
    | Inbound Workload Namespace | All |
    | Inbound Workload | All |
    | Destination Service | simple-server.default.svc.cluster.local |

    Grafana filter configuration

  3. Scroll to the Outbound Services section. The P95 latency intermittently exceeds 1 second because the LEAST_REQUEST algorithm continues routing traffic to the simple-server-high-latency endpoint despite its elevated latency.

    P95 latency with the default LEAST_REQUEST algorithm
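A short calculation shows why a minority of slow responses is enough to push P95 above 1 second. The numbers below are hypothetical, not measurements from this scenario: fast responses are assumed to take 0.05 s and slow ones 1.5 s. The sketch uses a nearest-rank percentile, whereas the Grafana panel derives its value from Prometheus histogram buckets, so treat it as an illustration of the threshold effect only.

```python
def p95(latencies):
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


FAST, SLOW = 0.05, 1.5  # hypothetical response times in seconds

# Default algorithm: assume 10 of every 100 requests get a slow response.
default_round = [FAST] * 90 + [SLOW] * 10

# Peak EWMA: assume traffic shifts so only 3 of every 100 are slow.
ewma_round = [FAST] * 97 + [SLOW] * 3

print(p95(default_round))  # 1.5
print(p95(ewma_round))     # 0.05
```

Once slow responses make up more than 5% of requests, P95 reflects the slow latency. Shifting enough traffic away from the degraded endpoint to get below that share is what brings the metric back down.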

Step 3: Enable peak EWMA load balancing

  1. Edit the ASMPlayground CR again:

    kubectl edit asmplayground default
  2. Set spec.scene.ewmaLb.enableEwmaForSimpleServer to true, then save and exit:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
          enableEwmaForSimpleServer: true
    status:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
  3. Confirm the DestinationRule is in place:

    kubectl get destinationrule -o yaml

    Expected output:

    apiVersion: v1
    items:
    - apiVersion: networking.istio.io/v1beta1
      kind: DestinationRule
      metadata:
        creationTimestamp: "2024-07-18T01:33:26Z"
        generation: 1
        labels:
          provider: asm
        name: simple-server
        namespace: default
        resourceVersion: "134230265"
        uid: bafdcd48-a90c-4b68-8517-9dbc99dcb94e
      spec:
        host: simple-server.default.svc.cluster.local
        trafficPolicy:
          loadBalancer:
            simple: PEAK_EWMA
    kind: List
    metadata:
      resourceVersion: ""

    The output shows a DestinationRule named simple-server with the load balancing algorithm set to PEAK_EWMA.
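The same check can be scripted. The sketch below assumes the list was fetched with `kubectl get destinationrule -o json` and parsed into a dictionary. The function name `uses_peak_ewma` is hypothetical, and the sample data is a trimmed copy of the expected output shown above.

```python
import json


def uses_peak_ewma(dr_list: dict, host: str) -> bool:
    """Return True if a DestinationRule for `host` selects PEAK_EWMA."""
    for item in dr_list.get("items", []):
        spec = item.get("spec", {})
        if spec.get("host") != host:
            continue
        load_balancer = spec.get("trafficPolicy", {}).get("loadBalancer", {})
        if load_balancer.get("simple") == "PEAK_EWMA":
            return True
    return False


# Trimmed sample mirroring the expected output in this step.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {
      "kind": "DestinationRule",
      "metadata": {"name": "simple-server", "namespace": "default"},
      "spec": {
        "host": "simple-server.default.svc.cluster.local",
        "trafficPolicy": {"loadBalancer": {"simple": "PEAK_EWMA"}}
      }
    }
  ]
}
""")

print(uses_peak_ewma(sample, "simple-server.default.svc.cluster.local"))  # True
```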

Step 4: Send test traffic again and compare results

Trigger a new round of test traffic by repeating Step 1. Set spec.scene.ewmaLb.testTrafficStartTimestamp to a new value (for example, 2).

Open the Grafana dashboard as described in Step 2 and observe the Outbound Services section. The P95 latency drops significantly. Peak EWMA detects the elevated latency on the simple-server-high-latency endpoint and temporarily reduces its weight, routing more requests to simple-server-normal. This lowers overall latency and improves service reliability.

P95 latency with peak EWMA load balancing enabled