When a backend endpoint degrades -- responding slowly or returning errors -- the default LEAST_REQUEST algorithm continues routing traffic to it because it only tracks outstanding request counts, not latency. This raises overall P95 latency for the entire service. A peak exponentially weighted moving average (EWMA) load balancer solves this by tracking each endpoint's real-time latency and active request count, then reducing the routing weight of any endpoint whose performance drops. More requests go to healthy, low-latency endpoints as a result.
Use this ASM Playground scenario to deploy a test service with one normal endpoint and one degraded endpoint, then compare P95 latency between the default LEAST_REQUEST algorithm and PEAK_EWMA.
Read ASM Playground overview before you start.
How peak EWMA works
A peak EWMA load balancer implements a latency-aware and active-requests-aware routing algorithm. It continuously measures each endpoint's response time and outstanding request count, then calculates a weight that reflects real-time performance. When an endpoint's latency increases or error rate rises, peak EWMA temporarily reduces that endpoint's weight so fewer requests reach it. When the endpoint recovers, its weight returns to normal.
| Condition | Peak EWMA behavior |
|---|---|
| One endpoint has elevated latency | Shifts traffic to lower-latency endpoints |
This gives peak EWMA a clear advantage over LEAST_REQUEST when backend performance degrades unexpectedly. LEAST_REQUEST distributes traffic to endpoints with the fewest outstanding requests but does not factor in response latency. If one endpoint responds slowly, LEAST_REQUEST keeps routing traffic to it at roughly the same rate, which increases overall P95 latency.
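The weighting described above can be sketched in a few lines. This is a minimal illustration modeled on the commonly described peak EWMA approach, not ASM's actual implementation: the Endpoint class, the 10-second decay constant, the cost formula, and the two-choice selection in pick are all assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    ewma_latency: float = 0.0  # smoothed latency estimate, in seconds
    outstanding: int = 0       # requests currently in flight
    last_update: float = 0.0   # time of the last observation, in seconds

    def observe(self, latency: float, now: float, decay: float = 10.0) -> None:
        """Fold one measured response time into the estimate."""
        if latency > self.ewma_latency:
            # "Peak" behavior: a new worst case is adopted immediately.
            self.ewma_latency = latency
        else:
            # Otherwise move toward the sample; old data fades with elapsed time.
            w = math.exp(-(now - self.last_update) / decay)
            self.ewma_latency = self.ewma_latency * w + latency * (1.0 - w)
        self.last_update = now

    def cost(self) -> float:
        """Lower is better: latency estimate scaled by in-flight requests."""
        return self.ewma_latency * (self.outstanding + 1)


def pick(endpoints: list[Endpoint], rng: random.Random) -> Endpoint:
    """Sample two distinct endpoints and keep the cheaper one."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost() <= b.cost() else b


normal = Endpoint("simple-server-normal")
slow = Endpoint("simple-server-high-latency")
normal.observe(0.05, now=1.0)
slow.observe(0.05, now=1.0)
slow.observe(1.5, now=2.0)          # latency spike on one endpoint
print(slow.cost() > normal.cost())  # True
print(pick([normal, slow], random.Random(0)).name)  # simple-server-normal
```

Because a spike replaces the estimate at once while recovery only decays gradually, the degraded endpoint loses traffic quickly and regains it slowly, which matches the temporary weight reduction described in this section.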
Scenario overview
This scenario deploys a simple-server service with two versions:
| Deployment | Behavior |
|---|---|
| simple-server-normal | Responds with normal latency |
| simple-server-high-latency | Responds with occasional high latency |
A test traffic generator (curl-job pod) sends requests to the simple-server service for 5 minutes. Run the test twice -- once with the default LEAST_REQUEST algorithm, once with PEAK_EWMA -- and compare the P95 latency on the built-in Grafana dashboard.
The following figure shows the deployment topology and trace path.
Create the scenario
Create a playground instance with the playground ID ewmaLb. For instructions, see Create a playground instance.
Interaction fields
The ASMPlayground custom resource (CR) exposes two fields for this scenario.
| Field | Type | Description |
|---|---|---|
| testTrafficStartTimestamp | int64 | Triggers test traffic. Set this to any nonzero value that differs from the previous value. If the field does not exist, add it manually. |
| enableEwmaForSimpleServer | bool | Enables peak EWMA for the simple-server service. |
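Both fields live under spec.scene.ewmaLb, so they can also be set without an interactive editor. The sketch below builds a kubectl patch command with a JSON merge patch. This page only documents kubectl edit, so treating a patch as equivalent is an assumption, and build_patch_command is a hypothetical helper name.

```python
import json
from typing import Optional


def build_patch_command(timestamp: int,
                        enable_ewma: Optional[bool] = None,
                        name: str = "default") -> list[str]:
    """Return a kubectl argv that merge-patches the ASMPlayground CR."""
    ewma_lb: dict = {"testTrafficStartTimestamp": timestamp}
    if enable_ewma is not None:
        # Only touch the EWMA switch when the caller asks for it.
        ewma_lb["enableEwmaForSimpleServer"] = enable_ewma
    patch = {"spec": {"scene": {"ewmaLb": ewma_lb}}}
    return ["kubectl", "patch", "asmplayground", name,
            "--type", "merge", "-p", json.dumps(patch)]


# Print the command; pass the list to subprocess.run(..., check=True) to apply it.
print(" ".join(build_patch_command(2, enable_ewma=True)))
```

Each call should use a timestamp value that has not been used before, because the test traffic is triggered only when the value changes.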
Test the scenario
Run two rounds of test traffic -- one with the default algorithm, one with peak EWMA -- and compare the results on the Grafana dashboard.
Step 1: Send test traffic with the default algorithm
Connect to the playground instance with kubectl, then edit the ASMPlayground CR:

    kubectl edit asmplayground default

Set spec.scene.ewmaLb.testTrafficStartTimestamp to 1 (or any nonzero value not used previously), then save and exit:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1

Verify that test traffic is running:

    kubectl get pod

Expected output:

    NAME                                         READY   STATUS    RESTARTS
    curl-job-npgbd                               2/2     Running   0
    simple-server-high-latency-7968d5978b-cnrqt  2/2     Running   0
    simple-server-normal-66bd9d546-kvn2m         2/2     Running   0

A pod with the curl-job prefix confirms that test traffic has started. This pod sends requests to the simple-server service for 5 minutes.
Step 2: View the Grafana dashboard
Retrieve the Grafana URL. Grafana is exposed on port 3000 through the ingress gateway:

    kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}:3000'

Expected output:

    xxx.xxx.xxx.xxx:3000

Open the URL in a browser. Navigate to Dashboards > Istio > Istio Workload Dashboard and apply the following filters:

| Filter | Value |
|---|---|
| datasource | Prometheus |
| Namespace | default |
| Workload | curl-job |
| Reporter | source & destination |
| Inbound Workload Namespace | All |
| Inbound Workload | All |
| Destination Service | simple-server.default.svc.cluster.local |

Scroll to the Outbound Services section. The P95 latency intermittently exceeds 1 second because the LEAST_REQUEST algorithm continues routing traffic to the simple-server-high-latency endpoint despite its elevated latency.
Step 3: Enable peak EWMA load balancing
Edit the ASMPlayground CR again:

    kubectl edit asmplayground default

Set spec.scene.ewmaLb.enableEwmaForSimpleServer to true, then save and exit:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
          enableEwmaForSimpleServer: true
    status:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1

Confirm the DestinationRule is in place:

    kubectl get destinationrule -o yaml

Expected output:

    apiVersion: v1
    items:
    - apiVersion: networking.istio.io/v1beta1
      kind: DestinationRule
      metadata:
        creationTimestamp: "2024-07-18T01:33:26Z"
        generation: 1
        labels:
          provider: asm
        name: simple-server
        namespace: default
        resourceVersion: "134230265"
        uid: bafdcd48-a90c-4b68-8517-9dbc99dcb94e
      spec:
        host: simple-server.default.svc.cluster.local
        trafficPolicy:
          loadBalancer:
            simple: PEAK_EWMA
    kind: List
    metadata:
      resourceVersion: ""

The output shows a DestinationRule named simple-server with the load balancing algorithm set to PEAK_EWMA.
Step 4: Send test traffic again and compare results
Trigger a new round of test traffic by repeating Step 1. Set spec.scene.ewmaLb.testTrafficStartTimestamp to a new value (for example, 2).
Open the Grafana dashboard as described in Step 2 and observe the Outbound Services section. The P95 latency is significantly lower than in the LEAST_REQUEST round. Peak EWMA detects the elevated latency on the simple-server-high-latency endpoint and temporarily reduces its weight, routing more requests to simple-server-normal. This lowers overall latency and improves service reliability.
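To see why shifting traffic away from the degraded endpoint moves P95 so sharply, consider how the percentile responds to the share of slow requests. The sketch below uses invented numbers, not measurements from the playground: 1.5 s for a slow response, 50 ms for a normal one, and slow-request shares of 10% and 2%. The nearest-rank percentile method is also an assumption; Grafana derives P95 from histogram buckets instead.

```python
import math


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 95 / 100)
    return ordered[rank - 1]


def simulate(slow_share: float, total: int = 1000) -> list[float]:
    """Build a hypothetical sample in which slow_share of requests take 1.5 s."""
    slow = round(total * slow_share)
    return [1.5] * slow + [0.05] * (total - slow)


print(p95(simulate(0.10)))  # 1.5  (more than 5% of requests are slow)
print(p95(simulate(0.02)))  # 0.05 (fewer than 5% of requests are slow)
```

P95 reflects the slow responses only while they make up more than 5% of all requests, so a load balancer that pushes the slow share below that threshold removes them from the P95 figure entirely.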
