
Alibaba Cloud Service Mesh:ASM Playground for a peak EWMA load balancing scenario

Last Updated:Dec 31, 2024

This topic describes how to use Service Mesh (ASM) Playground to create a peak exponentially weighted moving average (EWMA) load balancing scenario and demonstrates the standard interaction process in the scenario.

Important

Before you get started, make sure that you have read and understood the content in ASM Playground overview.

Scenario introduction

A peak EWMA load balancer distributes ASM traffic based on endpoint status weights. It proactively reduces the temporary weights of endpoints whose performance has deteriorated, for example, due to increased latency or failed requests. This improves the success rate and reduces the latency of applications, and allows a peak EWMA load balancer to significantly outperform a traditional load balancer when unexpected backend exceptions occur.

In this scenario, a simple-server service is deployed, with two versions provided by separate Deployments: simple-server-normal is a normal version, and simple-server-high-latency is a version with occasional high latency. In this example, load balancing algorithms are switched and test traffic is initiated. On a monitoring panel, you can intuitively compare the performance of the peak EWMA load balancer with that of the default LEAST_REQUEST load balancer when application latency occasionally increases. The following figure shows the complete trace and deployment topology in this scenario.
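To make the algorithm concrete, the following is a minimal, hypothetical Python sketch of peak-EWMA endpoint picking. It is not ASM's or Envoy's actual implementation; the `Endpoint` class, decay constant, and update rule are illustrative assumptions. The key ideas are that a latency spike raises an endpoint's estimate immediately (the "peak" behavior), while fast samples only pull the estimate down gradually, so a degraded endpoint is deprioritized quickly and rehabilitated over time.

```python
import math
import time

class Endpoint:
    """One backend endpoint with an EWMA latency estimate (illustrative)."""

    def __init__(self, name, decay_seconds=10.0):
        self.name = name
        self.decay = decay_seconds
        self.ewma = 0.0                    # current latency estimate, seconds
        self.last_update = time.monotonic()

    def observe(self, latency):
        """Fold a new latency sample (seconds) into the estimate.

        The weight of the old estimate decays with the time elapsed since
        the last sample, so stale estimates fade over time.
        """
        now = time.monotonic()
        elapsed = now - self.last_update
        self.last_update = now
        w = math.exp(-elapsed / self.decay)
        # "Peak" behavior: a sample above the current estimate is taken at
        # face value instead of being smoothed, so spikes register at once.
        if latency > self.ewma:
            self.ewma = latency
        else:
            self.ewma = self.ewma * w + latency * (1.0 - w)

def pick(endpoints):
    """Choose the endpoint with the lowest current latency estimate."""
    return min(endpoints, key=lambda e: e.ewma)

normal = Endpoint("simple-server-normal")
slow = Endpoint("simple-server-high-latency")
normal.observe(0.05)   # ~50 ms response
slow.observe(1.2)      # one slow response raises the estimate immediately
print(pick([normal, slow]).name)   # -> simple-server-normal
```

With this selection rule, the endpoint backed by simple-server-high-latency stops receiving most traffic as soon as slow responses are observed, which is the behavior demonstrated in the steps below.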

(Figure: the complete trace and deployment topology in this scenario)

Create a scenario

You can create this scenario by specifying ewmaLb as the scene ID. For more information, see Create a playground instance.

Scenario interaction method

Use ASM Playground CRs to interact with the scenario

You can use ASM Playground custom resources (CRs) to control the playground instance. The following fields are available for this scenario.

| Field | Type | Description |
| --- | --- | --- |
| spec.scene.ewmaLb.testTrafficStartTimestamp | int64 | The timestamp at which test traffic is initiated. To initiate test traffic, change the value to the current timestamp, or to any non-zero number that differs from the previous value. If the field does not exist, you can manually add it. |
| spec.scene.ewmaLb.enableEwmaForSimpleServer | bool | Specifies whether to enable a peak EWMA load balancer for the simple-server service. If this field is set to true, the ASM Playground controller creates a destination rule for the simple-server service and sets the load balancing algorithm to PEAK_EWMA. If this field is set to false, no destination rule is deployed and the default LEAST_REQUEST algorithm is used for load balancing. |

Scenario interaction example

This section describes the standard interaction process in this scenario. You can follow this procedure to experience the scenario, or modify ASM Playground CRs based on your business requirements. In this example, the following process is used for demonstration:

  1. Initiate test traffic. In this case, the default load balancer is used.

  2. View the monitoring panel.

  3. Change the value of spec.scene.ewmaLb.enableEwmaForSimpleServer to true to enable the peak EWMA load balancer for the simple-server service.

  4. Initiate test traffic again and compare the test results with the previous test results.

Step 1: Use the default load balancer to initiate test traffic

  1. Use kubectl to connect to the playground instance based on the information in the kubeconfig file and then run the following command:

    kubectl edit asmplayground default

    Set the value of spec.scene.ewmaLb.testTrafficStartTimestamp to 1. To initiate test traffic, you can set the value to a timestamp or to any non-zero number that has not been used before. Then, save the modification and exit the text editor.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
  2. Run the following command to query pod information about the playground instance:

    kubectl get pod

    Expected output:

    NAME                                          READY   STATUS      RESTARTS   
    curl-job-npgbd                                2/2     Running     0          
    simple-server-high-latency-7968d5978b-cnrqt   2/2     Running     0          
    simple-server-normal-66bd9d546-kvn2m          2/2     Running     0          

    A pod whose name is prefixed with curl-job is started, which indicates that test traffic has been initiated. The pod keeps sending test traffic to the simple-server service for 5 minutes.

Step 2: View the monitoring panel

  1. While the pod prefixed with curl-job sends test traffic, you can view the response time on the Grafana monitoring panel that is built into the playground instance. Grafana is exposed to the Internet over port 3000 of the playground instance. Use kubectl to connect to the playground instance based on the information in the kubeconfig file, and then run the following command to obtain the URL of Grafana:

    kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}:3000'

    Expected output:

     xxx.xxx.xxx.xxx:3000

    Use this URL to open the Grafana console. Choose Dashboards > Istio > Istio Workload Dashboard to open the workload monitoring panel. Then, configure a filter according to the following settings:

    • datasource: Prometheus

    • Namespace: default

    • Workload: curl-job

    • Reporter: source & destination

    • Inbound Workload Namespace: All

    • Inbound Workload: All

    • Destination Service: simple-server.default.svc.cluster.local

    (Figure: Istio Workload Dashboard filter settings)

  2. View the dashboards in the Outbound Services section. You can see that the P95 latency of the traffic sent from the pod prefixed with curl-job to the simple-server service intermittently exceeds 1 second. This is because the load balancing algorithm routes some traffic to an endpoint with higher latency (the pod deployed by the simple-server-high-latency Deployment).

    (Figure: Outbound Services P95 latency with the default LEAST_REQUEST load balancer)
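As a back-of-the-envelope check of why the tail latency is dominated by the slow endpoint, the following Python snippet computes a nearest-rank P95 over synthetic numbers. These latencies are made up for illustration, not measured from the scenario: even when most requests are fast, a modest fraction of slow responses pushes the P95 above 1 second.

```python
# 100 synthetic latency samples in seconds: mostly fast responses, plus a
# small share of slow responses from the high-latency endpoint (made up).
latencies = [0.05] * 50 + [1.5] * 10 + [0.1] * 40

def p95(samples):
    """Nearest-rank 95th percentile of a list of samples."""
    ordered = sorted(samples)
    idx = int(0.95 * len(ordered)) - 1
    return ordered[idx]

print(p95(latencies))   # -> 1.5
```

Here only 10% of the samples are slow, yet the P95 equals the slow-path latency, which matches the intermittent spikes seen on the dashboard.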

Step 3: Enable the peak EWMA load balancer

  1. Use kubectl to connect to the playground instance based on the information in the kubeconfig file and then run the following command:

    kubectl edit asmplayground default

    Set the value of spec.scene.ewmaLb.enableEwmaForSimpleServer to true. Then, save the modification and exit the text editor.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMPlayground
    metadata:
      name: default
    spec:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
          enableEwmaForSimpleServer: true
    status:
      scene:
        ewmaLb:
          testTrafficStartTimestamp: 1
  2. Run the following command to query the configuration of the destination rule:

    kubectl get destinationrule -o yaml

    Expected output:

    apiVersion: v1
    items:
    - apiVersion: networking.istio.io/v1beta1
      kind: DestinationRule
      metadata:
        creationTimestamp: "2024-07-18T01:33:26Z"
        generation: 1
        labels:
          provider: asm
        name: simple-server
        namespace: default
        resourceVersion: "134230265"
        uid: bafdcd48-a90c-4b68-8517-9dbc99dcb94e
      spec:
        host: simple-server.default.svc.cluster.local
        trafficPolicy:
          loadBalancer:
            simple: PEAK_EWMA
    kind: List
    metadata:
      resourceVersion: ""

    The output indicates that a destination rule named simple-server is deployed in the playground instance and that the load balancing algorithm is set to PEAK_EWMA.

Step 4: Initiate test traffic again and compare the test results with the previous test results

Perform the operations in Step 1: Use the default load balancer to initiate test traffic again to initiate a test, and then view the test results as described in Step 2: View the monitoring panel.

(Figure: Outbound Services P95 latency after the peak EWMA load balancer is enabled)

The monitoring panel shows that the P95 latency is significantly reduced. This is because when the peak EWMA load balancer detects an increase in the latency or error rate of an endpoint, it reduces the weight of that endpoint for a period of time, so that more requests are routed to endpoints under normal load. This reduces the overall latency and significantly improves the performance of the service.
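The "for a period of time" part of this behavior can be sketched with a small, self-contained Python snippet. The update rule, sample values, and decay constant below are illustrative assumptions, not ASM's actual parameters: a latency spike raises the estimate at once, and subsequent fast samples pull it back down, so a recovered endpoint gradually regains traffic.

```python
import math

def ewma_step(estimate, sample, elapsed, decay=10.0):
    """One peak-EWMA update with an explicit elapsed time in seconds."""
    if sample > estimate:
        return sample            # spikes are taken at face value
    w = math.exp(-elapsed / decay)
    return estimate * w + sample * (1.0 - w)

est = 0.05                       # healthy endpoint: ~50 ms estimate
est = ewma_step(est, 1.2, 1.0)   # one slow response -> estimate jumps to 1.2 s
for _ in range(30):              # endpoint recovers; fast samples arrive each second
    est = ewma_step(est, 0.05, 1.0)
print(round(est, 3))             # estimate has decayed back toward 0.05
```

Because the penalty decays on its own, no manual intervention is needed to restore traffic to an endpoint once its latency returns to normal.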