Alibaba Cloud Service Mesh: Implement auto scaling for workloads by using ASM metrics

Last Updated: Dec 25, 2025

Service Mesh (ASM) provides a non-intrusive method to generate telemetry data for service communication within Alibaba Cloud Container Service for Kubernetes (ACK) and Alibaba Cloud Container Service (ACS) clusters. This telemetry feature provides observability into service behavior. It helps operations and maintenance (O&M) engineers troubleshoot, maintain, and optimize applications without requiring changes to the application code. Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM generates a series of metrics for the services that it manages. This topic describes how to use ASM metrics to implement automatic scaling for workloads.

Background information

Service Mesh generates a series of metrics for the services that it manages. For more information, see Istio standard metrics.
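
For example, istio_requests_total counts the requests exchanged between services in the mesh. The following sample is purely illustrative (the label values are hypothetical), but the reporter, destination_workload, and destination_workload_namespace labels shown here are the ones that the HPA query in Step 4 filters on.

    # Illustrative sample of the istio_requests_total counter; the values are hypothetical.
    istio_requests_total{
      reporter="destination",
      destination_workload="podinfo",
      destination_workload_namespace="test",
      response_code="200"
    } 42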

Automatic scaling is a method that scales workloads up or down based on resource usage, without manual intervention. Kubernetes provides automatic scaling in two dimensions:

  • Cluster Autoscaler (CA): handles node scaling operations to increase or decrease the number of nodes.

  • Horizontal Pod Autoscaler (HPA): automatically scales the number of pod replicas for a workload, such as a Deployment.

The aggregation layer in Kubernetes allows third-party applications to extend the Kubernetes API by registering as API add-on components. These add-on components can implement the Custom Metrics API and allow the HPA to access any metric. The HPA periodically queries core metrics, such as CPU or memory, through the Resource Metrics API. It also retrieves application-specific metrics, including the observability metrics that ASM provides, through the Custom Metrics API.
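
You can check whether an external metrics provider is registered in a cluster by querying the corresponding APIService object. This is a minimal sketch; the APIService exists only after an adapter, such as the one deployed in Step 2, is installed.

    kubectl get apiservice v1beta1.external.metrics.k8s.io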

Step 1: Enable collection of Prometheus monitoring metrics

For more information, see Collect monitoring metrics in Managed Service for Prometheus.
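
Optionally, verify that Prometheus already collects Istio metrics before you continue. The following sketch assumes the in-cluster Prometheus service prometheus.istio-system.svc:9090 that the adapter is pointed at in Step 2; adjust the service name and namespace to match your environment.

    # Forward the Prometheus API to your local machine.
    kubectl -n istio-system port-forward svc/prometheus 9090:9090 &

    # A non-zero result count confirms that Istio metrics are being collected.
    curl -s 'http://localhost:9090/api/v1/query?query=istio_requests_total' | jq '.data.result | length'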

Step 2: Deploy the custom metrics API adapter

  1. Download the kube-metrics-adapter Helm chart and install it in the ACK cluster.

    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. Confirm that kube-metrics-adapter is enabled.

    1. Confirm that autoscaling/v2 exists.

      kubectl api-versions | grep "autoscaling/v2"

      Expected output:

      autoscaling/v2
    2. Check the status of the kube-metrics-adapter pod.

      kubectl get po -n kube-system | grep metrics-adapter

      Expected output:

      asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****          1/1     Running   0          19s
    3. List the custom external metrics that the Prometheus adapter provides. At this stage, the list is empty, as explained below.

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }
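
      An empty resources list is expected at this point: kube-metrics-adapter registers an external metric only after an HPA defines one in its annotations, which happens in Step 4. If the adapter itself appears unhealthy, you can inspect its logs. The deployment name in the following command is inferred from the pod name in the previous output and may differ in your cluster.

      kubectl -n kube-system logs deploy/asm-custom-metrics-kube-metrics-adapter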

Step 3: Deploy the sample application

  1. Create the test namespace and enable automatic sidecar injection. For more information, see Manage namespaces and quotas and Enable automatic injection.

  2. Deploy the sample application.

    1. Create a file named podinfo.yaml with the following content.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        minReadySeconds: 5
        strategy:
          rollingUpdate:
            maxUnavailable: 0
          type: RollingUpdate
        selector:
          matchLabels:
            app: podinfo
        template:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
            labels:
              app: podinfo
          spec:
            containers:
            - name: podinfod
              image: stefanprodan/podinfo:latest
              imagePullPolicy: IfNotPresent
              ports:
              - containerPort: 9898
                name: http
                protocol: TCP
              command:
              - ./podinfo
              - --port=9898
              - --level=info
              livenessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/healthz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              readinessProbe:
                exec:
                  command:
                  - podcli
                  - check
                  - http
                  - localhost:9898/readyz
                initialDelaySeconds: 5
                timeoutSeconds: 5
              resources:
                limits:
                  cpu: 2000m
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 64Mi
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: podinfo
        namespace: test
        labels:
          app: podinfo
      spec:
        type: ClusterIP
        ports:
          - name: http
            port: 9898
            targetPort: 9898
            protocol: TCP
        selector:
          app: podinfo
    2. Deploy podinfo.

      kubectl apply -n test -f podinfo.yaml
  3. Deploy a load testing service in the test namespace to trigger automatic scaling.

    1. Create a file named loadtester.yaml.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        selector:
          matchLabels:
            app: loadtester
        template:
          metadata:
            labels:
              app: loadtester
            annotations:
              prometheus.io/scrape: "true"
          spec:
            containers:
              - name: loadtester
                image: weaveworks/flagger-loadtester:0.18.0
                imagePullPolicy: IfNotPresent
                ports:
                  - name: http
                    containerPort: 8080
                command:
                  - ./loadtester
                  - -port=8080
                  - -log-level=info
                  - -timeout=1h
                livenessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                readinessProbe:
                  exec:
                    command:
                      - wget
                      - --quiet
                      - --tries=1
                      - --timeout=4
                      - --spider
                      - http://localhost:8080/healthz
                  timeoutSeconds: 5
                resources:
                  limits:
                    memory: "512Mi"
                    cpu: "1000m"
                  requests:
                    memory: "32Mi"
                    cpu: "10m"
                securityContext:
                  readOnlyRootFilesystem: true
                  runAsUser: 10001
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: loadtester
        namespace: test
        labels:
          app: loadtester
      spec:
        type: ClusterIP
        selector:
          app: loadtester
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: http
    2. Deploy the load testing service.

      kubectl apply -n test -f loadtester.yaml
  4. Verify that the sample application and the load testing service are deployed.

    1. Check the pod status.

      kubectl get pod -n test

      Expected output:

      NAME                          READY   STATUS    RESTARTS   AGE
      loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
      podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
    2. Log on to the load tester container and generate a load.

      export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
      kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

      A successful response indicates that a load is generated and that the sample application and the load testing service are deployed. With -c 10 -q 2, hey runs 10 concurrent workers at 2 queries per second each, about 20 requests per second in total. You can also check the generated metrics directly, as shown below.
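
      Optionally, you can confirm that the sidecar generates ASM metrics for this traffic before you configure the HPA. The following sketch reads the Prometheus endpoint of the Istio proxy (port 15090 is the standard proxy telemetry port) and assumes that curl is available in the istio-proxy container, which is the case for standard Istio proxy images.

      kubectl -n test exec deploy/podinfo -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total | head -n 3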

Step 4: Configure an HPA using ASM metrics

Define an HPA that scales the podinfo workload based on the number of requests received per second. When the average traffic load exceeds 10 requests per second, the HPA scales out the deployment.

Note

This example uses the HPA API version autoscaling/v2, which is available in Kubernetes 1.23 and later. Clusters that run Kubernetes 1.26 or later must use v2, because the autoscaling/v2beta2 version was removed in Kubernetes 1.26.

  1. Create a file named hpa.yaml.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
          sum(
              rate(
                  istio_requests_total{
                    destination_workload="podinfo",
                    destination_workload_namespace="test",
                    reporter="destination"
                  }[1m]
              )
          ) 
    spec:
      maxReplicas: 10
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: processed-requests-per-second
            target:
              type: AverageValue
              averageValue: "10"
  2. Deploy the HPA.

    kubectl apply -f hpa.yaml
  3. Verify that the HPA is deployed.

    List the custom external metrics that the Prometheus adapter provides.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

    Expected output:

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "prometheus-query",
          "singularName": "",
          "namespaced": true,
          "kind": "ExternalMetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }

    The output contains the resource list of custom ASM metrics. This indicates that the HPA is deployed successfully. You can also query the metric value itself, as shown below.
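
    Once the HPA exists, you can also read the current metric value directly from the external metrics API. This is a sketch; the URL path combines the test namespace, the prometheus-query metric name, and the query-name label selector that hpa.yaml defines.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/prometheus-query?labelSelector=query-name%3Dprocessed-requests-per-second" | jq .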

Verify automatic scaling

  1. Log on to the load tester container to generate workload requests. With -c 10 -q 5, hey generates about 50 requests per second in total, well above the target average of 10 requests per second per replica.

    kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. Check the automatic scaling status.

    Note

    By default, metrics are synchronized every 30 seconds. A scaling operation can occur only if the workload has not been rescaled in the last 3 to 5 minutes. This prevents the HPA from making rapid, conflicting decisions and allows time for the cluster autoscaler to operate.

    watch kubectl -n test get hpa/podinfo

    Expected output:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

    After about one minute, the HPA starts to scale out the workload until the number of requests per second per replica falls below the target value. When the load test is complete, the number of requests per second drops to zero, and the HPA scales the workload back in. After a few minutes, the number of replicas in the command output returns to one.
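
    To see the individual scaling decisions and the reasons for them, inspect the events that the HPA records:

    kubectl -n test describe hpa podinfo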