The Mixerless Telemetry technology of Alibaba Cloud Service Mesh (ASM) allows you to obtain telemetry data on containers in a non-intrusive manner. You can use Prometheus to collect the monitoring metrics of an application, such as the number of requests, the average latency of requests, and the P99 latency of requests. Then, a Horizontal Pod Autoscaler (HPA) automatically scales the pods of the application based on the collected metrics. This topic describes how to use Mixerless Telemetry to scale the pods of an application.

Prerequisites

Application monitoring metrics are collected by Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.

Step 1: Deploy a metrics adapter and a Flagger load tester

  1. Use kubectl to connect to a Container Service for Kubernetes (ACK) cluster. For more information, see Connect to Kubernetes clusters by using kubectl.
  2. Run the following command to deploy a metrics adapter:
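    Note: To obtain the complete source code and Helm chart of the metrics adapter, visit GitHub. In the following command, $KUBE_METRICS_ADAPTER_SRC is the local directory that contains the source code of the metrics adapter.
    helm --kubeconfig <Path of the kubeconfig file> -n kube-system install asm-custom-metrics \
      $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter \
      --set prometheus.url=http://prometheus.istio-system.svc:9090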
  3. Verify whether the metrics adapter is deployed as expected.
    1. Run the following command to view the pod of the metrics adapter:
      kubectl --kubeconfig <Path of the kubeconfig file> get po -n kube-system | grep metrics-adapter

      Expected output:

      asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv   1/1     Running     0          30s
    2. Run the following command to check that the autoscaling/v2beta1 and autoscaling/v2beta2 API versions are available:
      kubectl --kubeconfig <Path of the kubeconfig file> api-versions | grep "autoscaling/v2beta"

      Expected output:

      autoscaling/v2beta1
      autoscaling/v2beta2
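    3. Run the following command to check that the external metrics API (external.metrics.k8s.io/v1beta1) provided by the metrics adapter is available. At this point, the resources list is empty: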
      kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }
  4. Deploy a Flagger load tester.
    1. Download the required YAML files of the Flagger load tester. For more information, visit GitHub.
    2. Run the following commands to deploy the Flagger load tester:
      kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/deployment.yaml -n test
      kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/service.yaml -n test
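The external metrics API checks in this topic compare the APIResourceList output by eye. The following Python script is a minimal sketch, assuming Python 3 is available on the machine where you run kubectl, that reads that JSON from standard input and exits with status 0 only if a given external metric, such as prometheus-query, is listed. The script name check_external_metric.py and its functions are hypothetical and are not part of ASM or the metrics adapter.

```python
# check_external_metric.py (hypothetical helper, not part of ASM or the metrics adapter)
import json
import sys

EXTERNAL_GROUP_VERSION = "external.metrics.k8s.io/v1beta1"


def has_external_metric(api_resource_list: dict, metric_name: str) -> bool:
    """Return True if metric_name is listed in an external metrics APIResourceList."""
    if api_resource_list.get("groupVersion") != EXTERNAL_GROUP_VERSION:
        return False
    # "resources" may be empty, as in the output before any HPA is created.
    resources = api_resource_list.get("resources") or []
    return any(resource.get("name") == metric_name for resource in resources)


def main(metric_name: str) -> int:
    document = json.load(sys.stdin)
    if has_external_metric(document, metric_name):
        print(f"external metric {metric_name!r} is registered")
        return 0
    print(f"external metric {metric_name!r} is not registered")
    return 1


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    print("usage: kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | "
          "python3 check_external_metric.py <metric-name>", file=sys.stderr)
```

For example, pipe the output of kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" into python3 check_external_metric.py prometheus-query. With the empty resources list shown in Step 1, the script exits with status 1; after an HPA that uses the prometheus-query metric is deployed, it exits with status 0.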

Step 2: Create different HPAs based on your business requirements

  1. Create an HPA that scales the pods of an application based on the istio_requests_total metric, which counts the requests that are sent to the application.
    1. Use the following content to create the requests_total_hpa.yaml file:
      apiVersion: autoscaling/v2beta2
      kind: HorizontalPodAutoscaler
      metadata:
        name: podinfo-total
        namespace: test
        annotations:
          metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
            sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
      spec:
        maxReplicas: 5
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: podinfo
        metrics:
          - type: External
            external:
              metric:
                name: prometheus-query
                selector:
                  matchLabels:
                    query-name: processed-requests-per-second
              target:
                type: AverageValue
                averageValue: "10"
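      • annotations: The metric-config.external.prometheus-query.prometheus/processed-requests-per-second annotation defines a Prometheus query named processed-requests-per-second. The query returns the number of requests per second, calculated from the istio_requests_total metric, that are received by workloads in the test namespace. The query-name: processed-requests-per-second label in the metric selector refers to this query.
      • target: The averageValue parameter is set to 10. Because the target type is AverageValue, the HPA divides the query result by the current number of pods and scales out the pods of the application when the result exceeds 10 requests per second per pod.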
    2. Run the following command to deploy the HPA:
      kubectl --kubeconfig <Path of the kubeconfig file> apply -f resources_hpa/requests_total_hpa.yaml
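    3. Verify that the HPA is deployed as expected. Run the following command and check that the prometheus-query external metric appears in the resources list: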
      kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": [
          {
            "name": "prometheus-query",
            "singularName": "",
            "namespaced": true,
            "kind": "ExternalMetricValueList",
            "verbs": [
              "get"
            ]
          }
        ]
      }
  2. Create an HPA that scales the pods of an application based on the average latency of requests that are sent to the application. The average latency is the rate of the istio_request_duration_milliseconds_sum metric divided by the rate of the istio_request_duration_milliseconds_count metric. Use the following content to create the podinfo-latency-avg.yaml file, and then deploy the HPA by running the kubectl apply command in Substep 2 of Item 1 with this file:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo-latency-avg
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/latency-average: |
          sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
          /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
    spec:
      maxReplicas: 5
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: latency-average
            target:
              type: AverageValue
              averageValue: "0.005"
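    • annotations: The metric-config.external.prometheus-query.prometheus/latency-average annotation defines a Prometheus query named latency-average, which returns the average latency of requests that are received by workloads in the test namespace. The query-name: latency-average label in the metric selector refers to this query.
    • target: The averageValue parameter is set to 0.005. The istio_request_duration_milliseconds_* metrics are measured in milliseconds, so this threshold is 0.005 ms. Because the target type is AverageValue, the HPA divides the query result by the current number of pods and scales out the pods of the application when the result exceeds 0.005 ms. To scale out at 5 ms instead, set averageValue to "5".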
  3. Create an HPA that scales the pods of an application based on the P95 latency of requests that are sent to the application. The P95 latency is calculated from the istio_request_duration_milliseconds_bucket histogram metric by using the histogram_quantile function. Use the following content to create the podinfo-p95.yaml file, and then deploy the HPA by running the kubectl apply command in Substep 2 of Item 1 with this file:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo-p95
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/p95-latency: |
          histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
    spec:
      maxReplicas: 5
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: p95-latency
            target:
              type: AverageValue
              averageValue: "4"
    • annotations: The metric-config.external.prometheus-query.prometheus/p95-latency annotation defines a Prometheus query named p95-latency, which returns the P95 latency, in milliseconds, of requests that are sent to the podinfo service in the test namespace. The query-name: p95-latency label in the metric selector refers to this query.
    • target: The averageValue parameter is set to 4. Because the target type is AverageValue, the HPA divides the query result by the current number of pods and scales out the pods of the application when the result exceeds 4 ms.
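The podinfo-latency-avg query divides the per-second increase of istio_request_duration_milliseconds_sum by the per-second increase of istio_request_duration_milliseconds_count, so the result is the mean latency of requests in the window, in milliseconds. The following Python sketch illustrates that arithmetic with two hypothetical counter samples taken 60 seconds apart. It is a simplification: Prometheus rate() also extrapolates to the window boundaries and handles counter resets, which this sketch does not.

```python
import math


def simple_rate(first: float, last: float, window_seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (last - first) / window_seconds


def average_latency_ms(sum_first: float, sum_last: float,
                       count_first: float, count_last: float,
                       window_seconds: float = 60.0) -> float:
    """Mean latency in ms, mirroring rate(..._sum[1m]) / rate(..._count[1m])."""
    latency_rate = simple_rate(sum_first, sum_last, window_seconds)  # ms of latency per second
    request_rate = simple_rate(count_first, count_last, window_seconds)  # requests per second
    if request_rate == 0:
        return math.nan  # no requests in the window: 0/0 is NaN in PromQL as well
    return latency_rate / request_rate


# Hypothetical samples: 100 requests that took 600 ms in total during the last 60 s.
print(average_latency_ms(sum_first=1000, sum_last=1600, count_first=100, count_last=200))
```

The window length cancels out, so the sketch prints a mean latency of about 6 ms. Because the result is in milliseconds, the averageValue threshold of such an HPA is in milliseconds as well.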
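histogram_quantile(0.95, ...) in the podinfo-p95 query estimates the P95 latency from cumulative histogram buckets: it finds the bucket that contains the 95th-percentile rank and interpolates linearly inside it, and it returns the upper bound of the highest finite bucket if the rank falls into the +Inf bucket. The following Python sketch is modeled on that behavior as described in the Prometheus documentation and is for illustration only. It assumes the input is a list of (le, cumulative value) pairs that is already aggregated by le, sorted, and ends with +Inf. The bucket values in the example are hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, value) buckets.

    Simplified sketch: buckets must be sorted by upper bound, cumulative,
    and end with a +Inf bucket. Non-monotonic input is not corrected.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("this sketch only accepts 0 <= q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # at least two buckets are needed, the last one being +Inf
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # First bucket whose cumulative value reaches the rank (always found: total >= rank).
    b = next(i for i, (_, cumulative) in enumerate(buckets) if cumulative >= rank)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank in the +Inf bucket: return highest finite upper bound
    upper, cumulative = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    # The lowest bucket is assumed to start at 0.
    lower, previous = (0.0, 0.0) if b == 0 else buckets[b - 1]
    in_bucket = cumulative - previous
    if in_bucket == 0:
        return math.nan  # only possible for q == 0 with an empty first bucket
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - previous) / in_bucket


# Hypothetical latency buckets in ms after sum(...) by (le).
buckets_ms = [(1.0, 10.0), (5.0, 90.0), (10.0, 100.0), (math.inf, 100.0)]
print(histogram_quantile(0.95, buckets_ms))
```

For these hypothetical buckets the rank is 95, which falls in the 5 ms to 10 ms bucket (cumulative values 90 to 100), so the sketch returns 7.5 ms. An HPA with averageValue 4 would then compare 7.5 divided by the current number of pods with 4.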

Verify whether the pods of an application can be scaled as expected

In this example, verify the podinfo-total HPA, which scales the pods of the application based on the number of requests. The HPA is expected to scale out the pods when the number of requests per second per pod exceeds 10.

  1. Run the following commands to send requests to the application for 5 minutes by using 2 concurrent workers, each limited to 10 requests per second, which is about 20 requests per second in total. USER_CONFIG is the path of the kubeconfig file.
    alias k="kubectl --kubeconfig $USER_CONFIG"
    loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
    k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898
    • -z: the duration for which requests are sent.
    • -c: the number of workers that send requests concurrently.
    • -q: the rate limit of each worker, in requests per second. The total request rate is the value of -c multiplied by the value of -q.
  2. Run the following command to check whether the pods are scaled out as expected:
    watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

    Expected output:

    Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo                                            East6C16G: Tue Jan 26 18:01:30 2021
    
    NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s

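    A value of 2 appears in the REPLICAS column, which indicates that the application now runs 2 pods: about 20 requests per second divided by the target of 10 requests per second per pod requires 2 pods. The TARGETS column shows the query result divided by the number of pods: 10056m, or about 10.06 requests per second per pod.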

  3. Run the following commands to send requests to the application for 5 minutes by using 2 concurrent workers, each limited to 15 requests per second, which is about 30 requests per second in total.
    alias k="kubectl --kubeconfig $USER_CONFIG"
    loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
    k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898
  4. Run the following command to check whether the pods are scaled out as expected:
    watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

    Expected output:

    Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo                                            East6C16G: Tue Jan 26 18:01:30 2021
    
    NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s

    A value of 3 appears in the REPLICAS column, which indicates that the application now runs 3 pods: about 30 requests per second divided by 10 requests per second per pod. The result shows that the HPA scales out the pods of the application when the number of requests increases. After the load decreases, for example after the hey command finishes, the HPA scales in the pods, and a value of 1 eventually appears in the REPLICAS column. Scale-in is not immediate: by default, the HPA waits for a 5-minute downscale stabilization window before it removes pods.
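The REPLICAS values above can be reproduced with the HPA calculation for an External metric with an AverageValue target. If the ratio between the query result and averageValue multiplied by the current number of pods deviates from 1 by more than the default tolerance of 0.1, the desired number of pods is the query result divided by averageValue, rounded up and clamped to minReplicas and maxReplicas. The following Python sketch models only that step under the default tolerance; it ignores the downscale stabilization window, scaling policies, and metric collection delays. The function names are hypothetical.

```python
import math


def hey_total_rate(workers: int, qps_per_worker: float) -> float:
    """hey applies -q to each of the -c workers, so the aggregate rate is c * q."""
    return workers * qps_per_worker


def desired_replicas(metric_total: float, target_average: float, current_replicas: int,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Desired pods for an External metric with an AverageValue target (simplified)."""
    usage_ratio = metric_total / (target_average * current_replicas)
    desired = current_replicas
    if abs(1.0 - usage_ratio) > tolerance:
        # Scale to ceil(total / per-pod target); small deviations are ignored.
        desired = math.ceil(metric_total / target_average)
    return max(min_replicas, min(max_replicas, desired))


# podinfo-total: averageValue 10, minReplicas 1, maxReplicas 5, starting from 1 pod.
for workers, qps in [(2, 10), (2, 15)]:
    total = hey_total_rate(workers, qps)
    print(f"-c {workers} -q {qps}: {total} req/s -> "
          f"{desired_replicas(total, 10, 1, 1, 5)} pods")
```

Under these assumptions, the loop prints 2 pods for -q 10 and 3 pods for -q 15, which matches the REPLICAS column in the expected outputs. A query result near 0 after the load stops yields minReplicas, which is 1.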