
Alibaba Cloud Service Mesh:Use ASM metrics to automatically scale workloads

Last Updated:Mar 11, 2026

Service Mesh (ASM) generates telemetry data for every request flowing through your service mesh, without any changes to your application code. Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM produces a set of metrics for the services it manages. Feed these metrics into a Kubernetes Horizontal Pod Autoscaler (HPA) to scale workloads based on real traffic signals such as request rate, rather than relying solely on CPU or memory utilization.

This walkthrough covers the end-to-end setup: collecting ASM metrics in Prometheus, deploying a custom metrics adapter, and configuring an HPA that scales a sample workload when the average request rate exceeds a threshold.

How it works

Kubernetes supports two dimensions of automatic scaling:

  • Cluster Autoscaler (CA) -- adds or removes nodes based on pending pod scheduling demands.

  • Horizontal Pod Autoscaler (HPA) -- adjusts the replica count of a Deployment or StatefulSet.

The Kubernetes aggregation layer lets third-party components extend the Kubernetes API by registering as API services. Such add-ons can implement the Custom Metrics API or the External Metrics API, which gives the HPA access to arbitrary metrics. In total, the HPA can retrieve metrics from three Kubernetes APIs:

| API | Purpose | Example metrics |
| --- | --- | --- |
| Resource Metrics API (metrics.k8s.io) | Core resource metrics | CPU, memory |
| Custom Metrics API (custom.metrics.k8s.io) | Application-specific metrics tied to Kubernetes objects | Pods, Ingresses |
| External Metrics API (external.metrics.k8s.io) | Metrics from systems outside the cluster | Prometheus queries, cloud monitoring |

ASM metrics originate from Prometheus, which sits outside the Kubernetes object model. This makes the External Metrics API the right choice. The kube-metrics-adapter bridges Prometheus and the External Metrics API so the HPA can query ASM metrics directly.

Architecture: ASM sidecar proxies report Istio metrics to Prometheus. The kube-metrics-adapter exposes those metrics through the External Metrics API. The HPA reads the metrics and scales the target Deployment.
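The HPA's core decision rule for an AverageValue target can be sketched in a few lines of Python (a simplified mental model; the real controller also applies a tolerance band and stabilization windows):

```python
import math

def desired_replicas(current_replicas: int, metric_total: float,
                     target_average: float) -> int:
    """Simplified HPA rule for an AverageValue target: compute the per-pod
    average of the metric, then scale so the average returns to the target."""
    current_average = metric_total / current_replicas
    return math.ceil(current_replicas * current_average / target_average)

# 50 req/s spread across 2 pods with a 10 req/s-per-pod target -> 5 replicas
print(desired_replicas(current_replicas=2, metric_total=50.0, target_average=10.0))
```

Because the per-pod average is the total divided by the replica count, the rule effectively scales the Deployment until metric_total / replicas drops back under the target.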

Available ASM metrics for autoscaling

ASM generates Istio standard metrics for HTTP, HTTP/2, and gRPC traffic. The most commonly used metrics for autoscaling are:

| Metric | Type | Use case | Example PromQL |
| --- | --- | --- | --- |
| istio_requests_total | Counter | Scale on request rate | sum(rate(istio_requests_total{destination_workload="<workload>"}[1m])) |
| istio_request_duration_milliseconds | Distribution | Scale on P99 latency | histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="<workload>"}[1m])) by (le)) |
| istio_requests_total (filtered) | Counter | Scale on error rate | sum(rate(istio_requests_total{destination_workload="<workload>", response_code=~"5.."}[1m])) |

Key labels used in these queries:

  • destination_workload -- the name of the target workload

  • destination_workload_namespace -- the namespace of the target workload

  • reporter -- set to destination for server-side reporting, source for client-side

This tutorial uses istio_requests_total to scale based on request rate. Replace the PromQL query with one of the alternatives above to scale on latency or error rate instead.
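To make the queries above concrete, here is a rough Python sketch of what PromQL's rate() computes from raw counter samples (synthetic values, not real scrape data; the actual function also handles counter resets and extrapolates to the window boundaries):

```python
def per_second_rate(samples):
    """Approximate PromQL rate(): per-second increase of a monotonically
    increasing counter across a window of (timestamp_s, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# istio_requests_total grew from 1200 to 1800 over a 60-second window
print(per_second_rate([(0, 1200), (30, 1500), (60, 1800)]))  # 10.0 req/s
```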

Prerequisites

Before you begin, make sure that you have:

  • An ASM instance with a Kubernetes cluster added to it, and a Prometheus instance that monitors the cluster

  • kubectl configured to access the cluster

  • Helm and jq installed on the machine where you run the commands in this topic

Step 1: Enable Prometheus metric collection

Enable ASM to export metrics to your Prometheus instance. For detailed instructions, see Collect monitoring metrics in Managed Service for Prometheus.

Step 2: Deploy the custom metrics adapter

Install kube-metrics-adapter, which registers itself as a provider for the External Metrics API and translates HPA metric requests into Prometheus queries.

  1. Install the adapter with Helm. Replace http://prometheus.istio-system.svc:9090 with the URL of your Prometheus server if it runs in a different namespace or uses a different service name.

       helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
         --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. Verify the installation.

    1. Confirm that the autoscaling/v2 API is available.

         kubectl api-versions | grep "autoscaling/v2"

      Expected output:

         autoscaling/v2
    2. Check that the adapter pod is running.

         kubectl get po -n kube-system | grep metrics-adapter

      Expected output:

         asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0   19s
    3. Confirm that the External Metrics API endpoint is registered.

         kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

         {
           "kind": "APIResourceList",
           "apiVersion": "v1",
           "groupVersion": "external.metrics.k8s.io/v1beta1",
           "resources": []
         }

      The resources array is empty because no HPA has registered a metric query yet. It populates after you deploy an HPA in Step 4.

Step 3: Deploy the sample application

Create the namespace

Create a namespace called test and enable automatic sidecar injection. For more information, see Manage namespaces and quotas and Enable automatic injection.

Deploy the podinfo application

  1. Create a file named podinfo.yaml with the following content.

    podinfo.yaml

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: podinfo
         namespace: test
         labels:
           app: podinfo
       spec:
         minReadySeconds: 5
         strategy:
           rollingUpdate:
             maxUnavailable: 0
           type: RollingUpdate
         selector:
           matchLabels:
             app: podinfo
         template:
           metadata:
             annotations:
               prometheus.io/scrape: "true"
             labels:
               app: podinfo
           spec:
             containers:
             - name: podinfod
               image: stefanprodan/podinfo:latest
               imagePullPolicy: IfNotPresent
               ports:
               - containerPort: 9898
                 name: http
                 protocol: TCP
               command:
               - ./podinfo
               - --port=9898
               - --level=info
               livenessProbe:
                 exec:
                   command:
                   - podcli
                   - check
                   - http
                   - localhost:9898/healthz
                 initialDelaySeconds: 5
                 timeoutSeconds: 5
               readinessProbe:
                 exec:
                   command:
                   - podcli
                   - check
                   - http
                   - localhost:9898/readyz
                 initialDelaySeconds: 5
                 timeoutSeconds: 5
               resources:
                 limits:
                   cpu: 2000m
                   memory: 512Mi
                 requests:
                   cpu: 100m
                   memory: 64Mi
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: podinfo
         namespace: test
         labels:
           app: podinfo
       spec:
         type: ClusterIP
         ports:
           - name: http
             port: 9898
             targetPort: 9898
             protocol: TCP
         selector:
           app: podinfo
  2. Apply the manifest.

       kubectl apply -n test -f podinfo.yaml

Deploy the load tester

  1. Create a file named loadtester.yaml with the following content.

    loadtester.yaml

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: loadtester
         namespace: test
         labels:
           app: loadtester
       spec:
         selector:
           matchLabels:
             app: loadtester
         template:
           metadata:
             labels:
               app: loadtester
             annotations:
               prometheus.io/scrape: "true"
           spec:
             containers:
               - name: loadtester
                 image: weaveworks/flagger-loadtester:0.18.0
                 imagePullPolicy: IfNotPresent
                 ports:
                   - name: http
                     containerPort: 8080
                 command:
                   - ./loadtester
                   - -port=8080
                   - -log-level=info
                   - -timeout=1h
                 livenessProbe:
                   exec:
                     command:
                       - wget
                       - --quiet
                       - --tries=1
                       - --timeout=4
                       - --spider
                       - http://localhost:8080/healthz
                   timeoutSeconds: 5
                 readinessProbe:
                   exec:
                     command:
                       - wget
                       - --quiet
                       - --tries=1
                       - --timeout=4
                       - --spider
                       - http://localhost:8080/healthz
                   timeoutSeconds: 5
                 resources:
                   limits:
                     memory: "512Mi"
                     cpu: "1000m"
                   requests:
                     memory: "32Mi"
                     cpu: "10m"
                 securityContext:
                   readOnlyRootFilesystem: true
                   runAsUser: 10001
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: loadtester
         namespace: test
         labels:
           app: loadtester
       spec:
         type: ClusterIP
         selector:
           app: loadtester
         ports:
           - name: http
             port: 80
             protocol: TCP
             targetPort: http
  2. Apply the manifest.

       kubectl apply -n test -f loadtester.yaml

Verify the deployment

  1. Check that both pods are running with sidecar injection (2/2 containers).

       kubectl get pod -n test

     Expected output:

       NAME                          READY   STATUS    RESTARTS   AGE
       loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
       podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
  2. Send a test request from the load tester to confirm connectivity. A successful response confirms that the sample application is reachable through the mesh.

       export loadtester=$(kubectl -n test get pod -l "app=loadtester" \
         -o jsonpath='{.items[0].metadata.name}')
       kubectl -n test exec -it ${loadtester} -c loadtester -- \
         hey -z 5s -c 10 -q 2 http://podinfo.test:9898
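As a sanity check on the flags, the expected request volume of this smoke test works out as follows (approximate; hey caps the rate per worker, so slow responses reduce the total):

```python
# hey -z 5s -c 10 -q 2: run for 5 s with 10 workers, each capped at 2 req/s
workers, qps_per_worker, duration_s = 10, 2, 5
total_requests = workers * qps_per_worker * duration_s
print(total_requests)  # ~100 requests
```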

Step 4: Configure the HPA with ASM metrics

Define an HPA that scales the podinfo Deployment based on the request rate. The configuration maps three pieces together:

| What you configure | Where it goes | Purpose |
| --- | --- | --- |
| PromQL query | HPA annotation | Defines which Prometheus metric to query |
| Metric name + label | spec.metrics[].external | Lookup key the HPA uses to reference the annotation |
| Target threshold | spec.metrics[].external.target | Request rate at which scale-out begins |

Note

This example uses HPA API version autoscaling/v2, which requires Kubernetes 1.23 or later. The v2beta2 version was removed in Kubernetes 1.26.
  1. Create a file named hpa.yaml with the following content.

    Key fields explained:

     | Field | Description |
     | --- | --- |
     | metric-config.external.prometheus-query.prometheus/processed-requests-per-second | Annotation that embeds the PromQL query. The suffix (processed-requests-per-second) is a user-defined name that links the annotation to the HPA metric selector. |
     | metrics[].external.metric.name | Must be prometheus-query -- the fixed identifier that kube-metrics-adapter recognizes. |
     | metrics[].external.metric.selector.matchLabels.query-name | Must match the suffix of the annotation key (processed-requests-per-second). |
     | target.averageValue | The threshold per pod. The HPA divides the total query result by the current replica count and compares it to this value. |

       apiVersion: autoscaling/v2
       kind: HorizontalPodAutoscaler
       metadata:
         name: podinfo
         namespace: test
         annotations:
           # Annotation format: metric-config.external.prometheus-query.prometheus/<query-name>
           # <query-name> must match the query-name label in spec.metrics below.
           # The value is the PromQL query that kube-metrics-adapter sends to Prometheus.
           metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
             sum(
                 rate(
                     istio_requests_total{
                       destination_workload="podinfo",
                       destination_workload_namespace="test",
                       reporter="destination"
                     }[1m]
                 )
             )
       spec:
         maxReplicas: 10
         minReplicas: 1
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: podinfo
         metrics:
           - type: External          # Use External because Prometheus metrics are not Kubernetes objects
             external:
               metric:
                 name: prometheus-query                  # Fixed metric name for kube-metrics-adapter
                 selector:
                   matchLabels:
                     query-name: processed-requests-per-second  # Must match the annotation key above
               target:
                 type: AverageValue
                 averageValue: "10"   # Scale out when the average request rate exceeds 10 req/s per pod
  2. Apply the HPA.

       kubectl apply -f hpa.yaml
  3. Verify that the External Metrics API now lists the registered metric.

       kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

     Expected output:

       {
         "kind": "APIResourceList",
         "apiVersion": "v1",
         "groupVersion": "external.metrics.k8s.io/v1beta1",
         "resources": [
           {
             "name": "prometheus-query",
             "singularName": "",
             "namespaced": true,
             "kind": "ExternalMetricValueList",
             "verbs": [
               "get"
             ]
           }
         ]
       }

     The prometheus-query resource confirms that the HPA registered its metric with the adapter.
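The annotation-to-selector linkage can be pictured with a small Python sketch (a rough mental model of the lookup, not kube-metrics-adapter's actual code):

```python
def resolve_query(hpa_annotations: dict, query_name: str) -> str:
    """Model of the adapter's lookup: the query-name label in the HPA's
    external metric selector picks the annotation that holds the PromQL."""
    key = f"metric-config.external.prometheus-query.prometheus/{query_name}"
    return hpa_annotations[key]

annotations = {
    "metric-config.external.prometheus-query.prometheus/processed-requests-per-second":
        'sum(rate(istio_requests_total{destination_workload="podinfo"}[1m]))',
}
print(resolve_query(annotations, "processed-requests-per-second"))
```

If the query-name label and the annotation suffix do not match, the lookup fails and the HPA reports the metric as unavailable.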

Verify automatic scaling

  1. Generate sustained load from the load tester pod. This sends 50 requests per second (10 concurrent workers at 5 QPS each) for 5 minutes.

       # Get the load tester pod name (skip this if $loadtester is already set from Step 3)
       export loadtester=$(kubectl -n test get pod -l "app=loadtester" \
         -o jsonpath='{.items[0].metadata.name}')
    
       kubectl -n test exec -it ${loadtester} -c loadtester -- \
         hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. Monitor the HPA in a separate terminal.

       watch kubectl -n test get hpa/podinfo

     Expected output (after approximately one minute):

       NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
       podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

     Note

     The HPA syncs metrics every 30 seconds. A scaling decision takes effect only if no scaling event occurred in the previous 3--5 minutes. This cooldown prevents rapid, conflicting scaling decisions and gives the Cluster Autoscaler time to provision new nodes if needed.
  3. Observe scale-down after the load test ends. When the load test completes, the request rate drops to zero. The HPA gradually scales down the Deployment until the replica count returns to 1.
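The numbers in the sample HPA output are internally consistent, which a quick arithmetic check confirms (values taken from the load command and the sample TARGETS column above):

```python
# hey -z 5m -c 10 -q 5: 10 workers x 5 req/s each -> ~50 req/s of total load
total_rps = 10 * 5
replicas = 6                   # REPLICAS column in the sample output
avg_per_pod = total_rps / replicas
print(round(avg_per_pod, 2))   # ~8.33 req/s per pod, the "8308m/10 (avg)" reading
```

At 6 replicas the per-pod average sits just under the 10 req/s target, so the HPA holds steady rather than scaling further.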

What's next

  • To scale on latency or error rate instead of request rate, replace the PromQL query in the HPA annotation with one of the alternatives from the Available ASM metrics for autoscaling table.

  • To tune scale-up and scale-down behavior (stabilization windows, scaling policies), add a behavior field to the HPA spec. See Configurable scaling behavior in the Kubernetes documentation.

  • For a full list of Istio metrics and their labels, see Istio standard metrics.