Alibaba Cloud Service Mesh (ASM) collects telemetry data for Container Service for Kubernetes (ACK) clusters in a non-intrusive manner, which makes service communication in the clusters observable. This telemetry helps O&M staff troubleshoot, maintain, and optimize applications without increasing maintenance costs. Based on the four key monitoring metrics of latency, traffic, errors, and saturation, ASM generates a series of metrics for the services that it manages. This topic shows you how to implement auto scaling for workloads by using ASM metrics.

Prerequisites

  • An ASM instance is created, and the Istio version of the instance is 1.6.8.4 or later.
  • An ACK cluster is added to the ASM instance.

Background information

ASM generates a series of metrics for the services that it manages. For more information, visit Istio Standard Metrics.

Auto scaling is an approach that automatically scales workloads up or down based on resource usage. In Kubernetes, auto scaling is implemented by the following two types of autoscalers:
  • Cluster Autoscaler (CA): CAs are used to increase or decrease the number of nodes in a cluster.
  • Horizontal Pod Autoscaler (HPA): HPAs are used to increase or decrease the number of pods that are used to deploy applications.
The aggregation layer of Kubernetes allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. These add-ons can implement the custom metrics API and allow HPAs to query arbitrary metrics. HPAs periodically query core metrics such as CPU utilization and memory usage by using the resource metrics API. In addition, HPAs can use the custom metrics API to query application-specific metrics, such as the observability metrics that are provided by ASM.
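For example, both APIs can be queried directly by using kubectl. The following commands are a minimal sketch that assumes the metrics server and an adapter that implements the external metrics API are installed in the cluster; jq is used only to format the output.

  ## Query core metrics, such as node CPU and memory usage, through the resource metrics API.
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

  ## List the metrics that an adapter exposes through the external metrics API, which is the API used in this topic.
  kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .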

Step 1: Enable Prometheus monitoring for the ASM instance

  1. Log on to the ASM console.
  2. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  3. On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM instance or click Manage in the Actions column of the ASM instance.
  4. On the management page of the ASM instance, click Settings in the upper-right corner.
    Note Make sure that the Istio version of the ASM instance is 1.6.8.4 or later.
  5. In the Settings Update panel, select Enable Prometheus. Then, click OK.

    After that, ASM automatically configures the Envoy filters that are required for Prometheus.
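
    To confirm that Prometheus is collecting Istio metrics, you can query it directly. The following commands are a sketch that assumes the Prometheus instance is exposed by the prometheus service in the istio-system namespace (the same address that the adapter uses in Step 2) and that curl and jq are available.

      ## Forward the in-cluster Prometheus service to a local port (run in a separate terminal).
      kubectl -n istio-system port-forward svc/prometheus 9090:9090

      ## Query a standard Istio metric to confirm that telemetry data is being collected.
      curl -s "http://localhost:9090/api/v1/query?query=istio_requests_total" | jq '.data.result | length'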

Step 2: Deploy the adapter for the custom metrics API

  1. Download the installation package of the adapter. For more information, visit kube-metrics-adapter. Then, install and deploy the adapter for the custom metrics API in the ACK cluster.
    ## Use Helm 3.
    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter  --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. After the installation is complete, run the following commands to check whether kube-metrics-adapter is deployed.
    • Check whether the autoscaling/v2beta API group exists.
      kubectl api-versions |grep "autoscaling/v2beta"

      The following output is expected:

      autoscaling/v2beta1
      autoscaling/v2beta2
    • Check the status of the pod of kube-metrics-adapter.
      kubectl get po -n kube-system |grep metrics-adapter

      The following output is expected:

      asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2cm57          1/1     Running   0          19s
    • Query the custom metrics that are provided by kube-metrics-adapter.
      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      The following output is expected:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }
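    • (Optional) Check the logs of kube-metrics-adapter to confirm that the adapter is running and can connect to Prometheus. The resources list in the preceding output is empty because no HPA that references a Prometheus query has been deployed yet; the list is populated after you deploy the HPA in Step 4. The following command is a sketch that assumes the Deployment name matches the pod name in the preceding output.
      kubectl -n kube-system logs deployment/asm-custom-metrics-kube-metrics-adapter --tail=20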

Step 3: Deploy a sample application

  1. Create a namespace named test. For more information, see Create a namespace.
  2. Enable automatic sidecar injection. For more information, see Install a sidecar proxy.
  3. Deploy a sample application.
    1. Create a file named podinfo.yaml.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        minReadySeconds: 5
        strategy:
          rollingUpdate:
            maxUnavailable: 0
          type: RollingUpdate
        selector:
          matchLabels:
            app: podinfo
        template:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
            labels:
              app: podinfo
          spec:
            containers:
            - name: podinfod
              image: stefanprodan/podinfo:latest
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 9898
                name: http
                protocol: TCP
              command:
              - ./podinfo
              - --port=9898
              - --level=info
              livenessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/healthz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              readinessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/readyz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              resources:
                limits:
                  cpu: 2000m
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 64Mi
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        type: ClusterIP
        ports:
          - name: http
            port: 9898
            targetPort: 9898
            protocol: TCP
        selector:
          app: podinfo
    2. Deploy the podinfo application.
      kubectl apply -n test -f podinfo.yaml
  4. To trigger auto scaling, deploy a load testing service in the test namespace to generate requests.
    1. Create a file named loadtester.yaml.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        selector:
          matchLabels:
            app: loadtester
        template:
          metadata:
            labels:
              app: loadtester
            annotations:
              prometheus.io/scrape: "true"
          spec:
            containers:
              - name: loadtester
                image: weaveworks/flagger-loadtester:0.18.0
                imagePullPolicy: IfNotPresent
                ports:
                  - name: http
                    containerPort: 8080
                command:
                  - ./loadtester
                  - -port=8080
                  - -log-level=info
                  - -timeout=1h
                livenessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                readinessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                resources:
                  limits:
                    memory: "512Mi"
                    cpu: "1000m"
                  requests:
                    memory: "32Mi"
                    cpu: "10m"
                securityContext:
                  readOnlyRootFilesystem: true
                  runAsUser: 10001
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        type: ClusterIP
        selector:
          app: loadtester
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: http
    2. Deploy the load testing service.
      kubectl apply -n test -f loadtester.yaml
  5. Check whether the sample application and the load testing service are deployed.
    1. Check the pod status.
      kubectl get pod -n test

      The following output is expected:

      NAME                          READY   STATUS    RESTARTS   AGE
      loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
      podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
    2. Log on to the container for load testing and run the hey command to generate loads.
      export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
      kubectl -n test exec -it ${loadtester} -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
      If the load test runs and returns a response summary, the sample application and the load testing service are deployed. You can also verify that the generated requests are recorded by Prometheus, as shown in the following sketch.
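
      The following commands are a sketch that assumes the in-cluster Prometheus service (prometheus.istio-system.svc:9090) is forwarded to localhost:9090 and that curl and jq are available. The query is the same Prometheus query that the HPA uses in Step 4; a non-zero value indicates that the requests generated by hey are recorded.

      curl -s "http://localhost:9090/api/v1/query" \
        --data-urlencode 'query=sum(rate(istio_requests_total{destination_workload="podinfo",destination_workload_namespace="test",reporter="destination"}[1m]))' \
        | jq '.data.result[0].value[1]'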

Step 4: Configure an HPA by using ASM metrics

Define an HPA that scales the podinfo Deployment based on the number of requests per second that the application receives. When the application receives more than 10 requests per second on average, the HPA increases the number of replicas.

  1. Create a file named hpa.yaml.
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
          sum(
              rate(
                  istio_requests_total{
                    destination_workload="podinfo",
                    destination_workload_namespace="test",
                    reporter="destination"
                  }[1m]
              )
          ) 
    spec:
      maxReplicas: 10
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: processed-requests-per-second
            target:
              type: AverageValue
              averageValue: "10"
  2. Deploy the HPA.
    kubectl apply -f hpa.yaml
  3. Check whether the HPA is deployed.

    Query the custom metrics that are provided by kube-metrics-adapter.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

    The following output is expected:

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "prometheus-query",
          "singularName": "",
          "namespaced": true,
          "kind": "ExternalMetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }

    The output contains the resource list of custom ASM metrics, which indicates that the HPA is deployed.
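
    (Optional) You can also query the current value of the metric through the external metrics API. The following command is a sketch; the path follows the external metrics API convention and uses the query-name label that is defined in the HPA annotation, so the exact output depends on the current request rate.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/prometheus-query?labelSelector=query-name%3Dprocessed-requests-per-second" | jq .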

Verify auto scaling

  1. Log on to the container for load testing and run the hey command to generate loads.
    kubectl -n test exec -it ${loadtester} -- sh
    ~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. View the effect of auto scaling.
    Note Metrics are synchronized every 30 seconds by default, and a workload can be scaled again only if no rescaling occurred within the previous 3 to 5 minutes. This prevents the HPA from making conflicting scaling decisions and gives the previous scaling operation time to take effect.
    watch kubectl -n test get hpa/podinfo

    The following output is expected:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
    The HPA starts to scale out the workload after about 1 minute and continues until the average number of requests per second drops below the specified threshold. After the load test is complete, the number of requests per second drops to zero, and the HPA starts to reduce the number of pods. A few minutes later, the number of replicas decreases from the value in the preceding output to one.
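
    To further observe the scaling behavior, you can watch the pods of the podinfo Deployment and view the events that the HPA records. The following commands are a sketch; the exact output depends on the load and the timing.

    ## Watch the number of podinfo pods change as the HPA scales the Deployment.
    watch kubectl -n test get pods -l app=podinfo

    ## View the scaling events that are recorded by the HPA.
    kubectl -n test describe hpa podinfo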