Alibaba Cloud Service Mesh: Use ASM metrics to automatically scale workloads

Last Updated: Mar 11, 2026

Service Mesh (ASM) collects telemetry data from Container Service for Kubernetes (ACK) clusters without modifying application code. Based on four dimensions -- latency, traffic, errors, and saturation -- ASM produces Istio Standard Metrics for every managed service.

Feeding these metrics into a Kubernetes Horizontal Pod Autoscaler (HPA) lets you scale workloads based on real traffic patterns rather than CPU or memory usage alone. This guide walks through connecting ASM metrics to Prometheus, deploying a custom metrics adapter, and configuring an HPA that scales pods when requests per second exceed a threshold.

How it works

Kubernetes supports two autoscaling mechanisms:

  • Cluster Autoscaler (CA) -- adjusts the number of nodes in a cluster.

  • Horizontal Pod Autoscaler (HPA) -- adjusts the number of pod replicas for a workload.

By default, HPAs query core metrics such as CPU utilization and memory usage through the resource metrics API. To scale on application-specific metrics, you need the custom metrics API. The Kubernetes aggregation layer allows third-party adapters to register as API extensions, bridging external metric sources (like Prometheus) into the HPA control loop.

The following diagram illustrates the data flow:

Architecture diagram showing the relationship between ASM, Prometheus, the custom metrics adapter, and HPA

The data flows through four stages:

  1. ASM sidecar proxies emit Istio standard metrics (for example, istio_requests_total).

  2. Prometheus scrapes and stores these metrics.

  3. The kube-metrics-adapter queries Prometheus and exposes the results through the Kubernetes external metrics API.

  4. The HPA periodically reads these external metrics and adjusts the replica count.
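
The metric named in stage 1, istio_requests_total, is a cumulative counter, so the value that is useful for scaling is a per-second rate derived from it. The following is a minimal sketch of that idea, assuming two hand-picked samples. The counter_rate helper is hypothetical and is not part of ASM, Prometheus, or the adapter. The PromQL rate() function also handles counter resets and extrapolates to the window edges, which this sketch ignores.

```shell
#!/bin/sh
# Hypothetical helper: simplified per-second rate between two counter samples.
# Usage: counter_rate <first_value> <first_ts_seconds> <last_value> <last_ts_seconds>
counter_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

# Example: the counter grew from 1200 to 1800 requests over 60 seconds.
counter_rate 1200 0 1800 60   # prints 10.00
```

A result of 10.00 means the workload handled an average of 10 requests per second during that window.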

Prerequisites

Before you begin, make sure you have:

Step 1: Enable Prometheus monitoring for ASM

Configure ASM to export metrics to Prometheus. For instructions, see Collect metrics to Managed Service for Prometheus.

Step 2: Deploy the custom metrics adapter

The kube-metrics-adapter bridges Prometheus metrics into the Kubernetes external metrics API, making them available to HPAs.

  1. Install the adapter using Helm 3. See the kube-metrics-adapter chart on GitHub for chart details. The prometheus.url parameter points to the Prometheus instance that scrapes ASM metrics.

       helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
         --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. Verify the installation.

    Check that the autoscaling/v2beta API is registered:

       kubectl api-versions | grep "autoscaling/v2beta"

    Expected output:

       autoscaling/v2beta

    Confirm the adapter pod is running:

       kubectl get po -n kube-system | grep metrics-adapter

    Expected output:

       asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0          19s

    Query the external metrics API to confirm the endpoint is available:

       kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

    Expected output:

       {
         "kind": "APIResourceList",
         "apiVersion": "v1",
         "groupVersion": "external.metrics.k8s.io/v1beta1",
         "resources": []
       }

    The resources array is empty because no HPA has registered a metric query yet. It populates after you configure an HPA in Step 4.

Step 3: Deploy a sample application

To demonstrate auto scaling, deploy a sample application (podinfo) and a load testing tool in the same namespace.

  1. Create a namespace named test. See Manage namespaces and resource quotas.

  2. Enable automatic sidecar proxy injection for the test namespace so that ASM can collect metrics. See Enable automatic sidecar proxy injection.

  3. Deploy the podinfo application. Create a file named podinfo.yaml with the following content:

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: podinfo
         namespace: test
         labels:
           app: podinfo
       spec:
         minReadySeconds: 5
         strategy:
           rollingUpdate:
             maxUnavailable: 0
           type: RollingUpdate
         selector:
           matchLabels:
             app: podinfo
         template:
           metadata:
             annotations:
               prometheus.io/scrape: "true"
             labels:
               app: podinfo
           spec:
             containers:
             - name: podinfod
               image: stefanprodan/podinfo:latest
               imagePullPolicy: IfNotPresent
               ports:
               - containerPort: 9898
                 name: http
                 protocol: TCP
               command:
               - ./podinfo
               - --port=9898
               - --level=info
               livenessProbe:
                 exec:
                   command:
                   - podcli
                   - check
                   - http
                   - localhost:9898/healthz
                 initialDelaySeconds: 5
                 timeoutSeconds: 5
               readinessProbe:
                 exec:
                   command:
                   - podcli
                   - check
                   - http
                   - localhost:9898/readyz
                 initialDelaySeconds: 5
                 timeoutSeconds: 5
               resources:
                 limits:
                   cpu: 2000m
                   memory: 512Mi
                 requests:
                   cpu: 100m
                   memory: 64Mi
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: podinfo
         namespace: test
         labels:
           app: podinfo
       spec:
         type: ClusterIP
         ports:
           - name: http
             port: 9898
             targetPort: 9898
             protocol: TCP
         selector:
           app: podinfo

    Apply the manifest:

       kubectl apply -n test -f podinfo.yaml
  4. Deploy the load testing tool. Create a file named loadtester.yaml with the following content:

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: loadtester
         namespace: test
         labels:
           app: loadtester
       spec:
         selector:
           matchLabels:
             app: loadtester
         template:
           metadata:
             labels:
               app: loadtester
             annotations:
               prometheus.io/scrape: "true"
           spec:
             containers:
               - name: loadtester
                 image: weaveworks/flagger-loadtester:0.18.0
                 imagePullPolicy: IfNotPresent
                 ports:
                   - name: http
                     containerPort: 8080
                 command:
                   - ./loadtester
                   - -port=8080
                   - -log-level=info
                   - -timeout=1h
                 livenessProbe:
                   exec:
                     command:
                       - wget
                       - --quiet
                       - --tries=1
                       - --timeout=4
                       - --spider
                       - http://localhost:8080/healthz
                   timeoutSeconds: 5
                 readinessProbe:
                   exec:
                     command:
                       - wget
                       - --quiet
                       - --tries=1
                       - --timeout=4
                       - --spider
                       - http://localhost:8080/healthz
                   timeoutSeconds: 5
                 resources:
                   limits:
                     memory: "512Mi"
                     cpu: "1000m"
                   requests:
                     memory: "32Mi"
                     cpu: "10m"
                 securityContext:
                   readOnlyRootFilesystem: true
                   runAsUser: 10001
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: loadtester
         namespace: test
         labels:
           app: loadtester
       spec:
         type: ClusterIP
         selector:
           app: loadtester
         ports:
           - name: http
             port: 80
             protocol: TCP
             targetPort: http

    Apply the manifest:

       kubectl apply -n test -f loadtester.yaml
  5. Verify that both pods are running:

       kubectl get pod -n test

    Expected output:

       NAME                          READY   STATUS    RESTARTS   AGE
       loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
       podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m

    Each pod shows 2/2 because the ASM sidecar proxy runs alongside the application container.
  6. Send a test request to confirm connectivity. A successful response confirms that the load tester can reach the podinfo service through the mesh.

       export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
       kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
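
The hey flags in the test request determine how much traffic reaches podinfo: -z sets the test duration, -c sets the number of concurrent workers, and -q caps the rate of each worker in queries per second. The sketch below estimates the resulting upper bound on the request rate. The offered_rps helper is a hypothetical name used only for illustration, and the actual rate can be lower if the target responds slowly.

```shell
#!/bin/sh
# Hypothetical helper: upper bound on the request rate that hey generates.
# Usage: offered_rps <workers (-c)> <qps_per_worker (-q)>
offered_rps() {
  echo $(( $1 * $2 ))
}

# The connectivity test above: 10 workers at 2 QPS each.
offered_rps 10 2   # prints 20
```

At no more than 20 requests per second for 5 seconds, the test request sends roughly 100 requests in total.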

Step 4: Configure an HPA with ASM metrics

Define an HPA that scales the podinfo Deployment based on incoming request rate. When the average requests per second exceeds 10, the HPA adds replicas. When traffic drops, it removes them.
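
To get a feel for how a per-pod target translates into a replica count, the following sketch applies a simplified form of the calculation described in the Kubernetes HPA documentation for an average-value target: divide the total metric value by the per-pod target, round up, and clamp to the replica range. The desired_replicas helper is hypothetical, and the sketch deliberately ignores the tolerance band and the stabilization behavior of the real controller.

```shell
#!/bin/sh
# Hypothetical helper: simplified replica calculation for an AverageValue target.
# Usage: desired_replicas <total_rps> <target_per_pod> <min_replicas> <max_replicas>
desired_replicas() {
  awk -v total="$1" -v target="$2" -v min="$3" -v max="$4" 'BEGIN {
    ratio = total / target
    n = int(ratio)
    if (n < ratio) n++        # round up to a whole pod
    if (n < min) n = min      # never go below minReplicas
    if (n > max) n = max      # never go above maxReplicas
    print n
  }'
}

desired_replicas 50 10 1 10    # prints 5
desired_replicas 4 10 1 10     # prints 1
desired_replicas 250 10 1 10   # prints 10
```

The three calls cover a load that needs several pods, a load below the threshold, and a load that would exceed the maximum replica count.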

  1. Create a file named hpa.yaml with the following content:

       apiVersion: autoscaling/v2beta2
       kind: HorizontalPodAutoscaler
       metadata:
         name: podinfo
         namespace: test
         annotations:
           metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
             sum(
                 rate(
                     istio_requests_total{
                       destination_workload="podinfo",
                       destination_workload_namespace="test",
                       reporter="destination"
                     }[1m]
                 )
             )
       spec:
         maxReplicas: 10
         minReplicas: 1
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: podinfo
         metrics:
           - type: External
             external:
               metric:
                 name: prometheus-query
                 selector:
                   matchLabels:
                     query-name: processed-requests-per-second
               target:
                 type: AverageValue
                 averageValue: "10"

    About the PromQL query: sum(rate(istio_requests_total{...}[1m])) calculates the per-second request rate to the podinfo workload over a 1-minute window, summed across all instances. The label filters restrict the query to requests reported by the destination sidecar (reporter="destination") targeting the podinfo workload in the test namespace.

    How the annotation-to-metric mapping works: The kube-metrics-adapter uses a naming convention to connect HPA metric specs with PromQL queries:

    1. You define a PromQL query in an annotation: metric-config.external.prometheus-query.prometheus/processed-requests-per-second.

    2. In spec.metrics, you reference prometheus-query as the metric name and processed-requests-per-second as the query-name label.

    3. The adapter matches the label value to the annotation suffix to find the corresponding query.

    HPA field reference:

    | Field | Description |
    | --- | --- |
    | annotations | Embeds the PromQL query. The kube-metrics-adapter reads queries from annotations that follow the naming convention metric-config.external.prometheus-query.prometheus/<query-name>. |
    | maxReplicas / minReplicas | The replica range. The HPA scales between 1 and 10 replicas. |
    | scaleTargetRef | The Deployment that the HPA manages. |
    | metrics[].type: External | Tells the HPA to read from the external metrics API rather than the resource metrics API. The External type is used because the metric comes from an external monitoring system (Prometheus) rather than from a Kubernetes object directly. |
    | metrics[].external.metric.selector.matchLabels.query-name | Links the metric spec to the annotation-defined PromQL query by name (processed-requests-per-second). |
    | target.type: AverageValue | The threshold is evaluated per pod. With averageValue: "10", the HPA scales up when the average per-pod request rate exceeds 10 requests per second. |
  2. Apply the HPA:

       kubectl apply -f hpa.yaml
  3. Verify that the external metrics API now lists the registered metric:

       kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

    Expected output:

       {
         "kind": "APIResourceList",
         "apiVersion": "v1",
         "groupVersion": "external.metrics.k8s.io/v1beta1",
         "resources": [
           {
             "name": "prometheus-query",
             "singularName": "",
             "namespaced": true,
             "kind": "ExternalMetricValueList",
             "verbs": [
               "get"
             ]
           }
         ]
       }

    The prometheus-query resource appears in the list, which confirms that the adapter is serving the HPA's metric query.
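
Beyond listing the resource, you can read the current metric value the same way the HPA does, by requesting the namespaced metric with a label selector. The sketch below only builds the request path, following the standard external metrics API layout. The external_metric_path helper is a hypothetical name, and the response format depends on the adapter, so treat the kubectl call in the comment as something to try rather than a guaranteed result.

```shell
#!/bin/sh
# Hypothetical helper: build the external metrics API path for one query.
# Usage: external_metric_path <namespace> <metric_name> <query_name>
external_metric_path() {
  # %%3D emits %3D, the URL-encoded "=" in the label selector query-name=<query_name>.
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=query-name%%3D%s\n' \
    "$1" "$2" "$3"
}

external_metric_path test prometheus-query processed-requests-per-second

# Against a live cluster (not run here):
#   kubectl get --raw "$(external_metric_path test prometheus-query processed-requests-per-second)" | jq .
```

The namespace, metric name, and query name used here are the ones defined in hpa.yaml.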

Verify auto scaling

Generate sustained traffic and observe the HPA scaling the podinfo Deployment up and back down.

  1. Open a shell in the load tester container:

       kubectl -n test exec -it ${loadtester} -c loadtester -- sh

    Inside the container, run a 5-minute load test:

       hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. In a separate terminal, watch the HPA status:

       watch kubectl -n test get hpa/podinfo

    Expected output (after approximately 1 minute):

       NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
       podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

    The TARGETS column shows the current metric value (8308m, or about 8.3 requests per second per pod) against the threshold (10). REPLICAS shows the scaled-up count.

    Note

    The adapter synchronizes metrics every 30 seconds by default, and a workload can be rescaled only if no rescaling occurred within the last 3 to 5 minutes. This delay prevents the HPA from executing conflicting scaling decisions in quick succession and leaves time for each scaling action to take effect.
  3. After the load test finishes, requests drop to zero. The HPA gradually reduces replicas back to 1 over several minutes.
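
The TARGETS value 8308m uses the Kubernetes milli-unit suffix, where m means one thousandth. As a sanity check, the sketch below converts that value to a decimal and multiplies it by the replica count from the sample output. The milli_to_decimal helper is hypothetical, and it handles only plain numbers and the m suffix, not the other Kubernetes quantity suffixes.

```shell
#!/bin/sh
# Hypothetical helper: convert a milli-unit quantity to a decimal number.
# Usage: milli_to_decimal <quantity>   e.g. 8308m or 10
milli_to_decimal() {
  case "$1" in
    *m) awk -v q="${1%m}" 'BEGIN { printf "%.3f\n", q / 1000 }' ;;
    *)  awk -v q="$1"     'BEGIN { printf "%.3f\n", q }' ;;
  esac
}

avg=$(milli_to_decimal 8308m)   # per-pod average from the TARGETS column
echo "$avg"                     # prints 8.308

# Total request rate across the 6 replicas shown in the sample output.
awk -v a="$avg" -v r=6 'BEGIN { printf "%.1f\n", a * r }'   # prints 49.8
```

A total of about 49.8 requests per second is consistent with the load test settings, where 10 workers at 5 queries per second each offer up to 50 requests per second.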