Configure HPA to autoscale workloads using ASM metrics - Alibaba Cloud Service Mesh

Service Mesh (ASM) collects telemetry data from Container Service for Kubernetes (ACK) clusters without modifying application code. Based on four dimensions -- latency, traffic, errors, and saturation -- ASM produces Istio Standard Metrics for every managed service.

Feeding these metrics into a Kubernetes Horizontal Pod Autoscaler (HPA) lets you scale workloads based on real traffic patterns rather than CPU or memory usage alone. This guide walks through connecting ASM metrics to Prometheus, deploying a custom metrics adapter, and configuring an HPA that scales pods when requests per second exceed a threshold.

How it works

Kubernetes supports two autoscaling mechanisms:

Cluster Autoscaler (CA) -- adjusts the number of nodes in a cluster.
Horizontal Pod Autoscaler (HPA) -- adjusts the number of pod replicas for a workload.

By default, HPAs query core metrics such as CPU utilization and memory usage through the resource metrics API. To scale on application-specific metrics, you need the custom metrics API or the external metrics API; this guide uses the external metrics API. The Kubernetes aggregation layer allows third-party adapters to register as API extensions, bridging external metric sources (like Prometheus) into the HPA control loop.
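
As a rough orientation, the sketch below maps each HPA metric source type to the aggregated API group that serves it. The group names are the standard Kubernetes ones; the helper name `metrics_api_path` and the default version `v1beta1` are assumptions made for illustration.

```python
# Which aggregated API group backs each HPA metric source type.
METRIC_API_GROUPS = {
    "Resource": "metrics.k8s.io",           # CPU and memory (resource metrics API)
    "Pods": "custom.metrics.k8s.io",        # per-pod custom metrics
    "Object": "custom.metrics.k8s.io",      # metrics describing another Kubernetes object
    "External": "external.metrics.k8s.io",  # metrics from a system outside the cluster objects
}


def metrics_api_path(metric_type: str, version: str = "v1beta1") -> str:
    """Return the discovery path of the API group behind an HPA metric type.

    Hypothetical helper for illustration; it is not part of any Kubernetes client.
    """
    try:
        group = METRIC_API_GROUPS[metric_type]
    except KeyError:
        raise ValueError(f"unknown HPA metric type: {metric_type!r}") from None
    return f"/apis/{group}/{version}"


for metric_type in METRIC_API_GROUPS:
    print(f"{metric_type:9s} -> {metrics_api_path(metric_type)}")
```

The `External` entry corresponds to the path that this guide later queries with `kubectl get --raw`.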

The following diagram illustrates the data flow:

Architecture diagram showing the relationship between ASM, Prometheus, the custom metrics adapter, and HPA

The data flows through four stages:

ASM sidecar proxies emit Istio standard metrics (for example, istio_requests_total).
Prometheus scrapes and stores these metrics.
The kube-metrics-adapter queries Prometheus and exposes the results through the Kubernetes external metrics API.
The HPA periodically reads these external metrics and adjusts the replica count.
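
To make the first two stages concrete: istio_requests_total is a cumulative counter, so a query has to turn scraped samples into a per-second rate before the HPA can compare it to a threshold. The following is a minimal sketch of that idea, not Prometheus's actual implementation (the real `rate()` also extrapolates to the window boundaries). The sample values and the function name `simple_rate` are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Approximate the per-second increase of a counter over the sampled window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # A drop means the counter was reset (for example, a proxy restart),
            # so the new value is the increase since the reset.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed


# Hypothetical scrape results for two pods over a 60-second window.
pod_a = [(0, 1000), (15, 1150), (30, 1300), (45, 1450), (60, 1600)]
pod_b = [(0, 400), (15, 520), (30, 640), (45, 760), (60, 880)]

# Summing the per-pod rates plays the role of sum(...) in a PromQL query.
total_rps = simple_rate(pod_a) + simple_rate(pod_b)
print(total_rps)
```

Summing across pods yields one workload-level number, which is the kind of value the adapter hands to the HPA.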

Prerequisites

Before you begin, make sure you have:

An ACK managed cluster -- see Create an ACK managed cluster
An ASM instance -- see Create an ASM instance
A Prometheus instance and a Grafana instance deployed in the ACK cluster -- see Use open source Prometheus to monitor an ACK cluster
Prometheus monitoring enabled for the ASM instance -- see Monitor ASM instances by using a self-managed Prometheus instance

Step 1: Enable Prometheus monitoring for ASM

Configure ASM to export metrics to Prometheus. For instructions, see Collect metrics to Managed Service for Prometheus.

Step 2: Deploy the custom metrics adapter

The kube-metrics-adapter bridges Prometheus metrics into the Kubernetes external metrics API, making them available to HPAs.

Install the adapter using Helm 3. See the kube-metrics-adapter chart on GitHub for chart details. The prometheus.url parameter points to the Prometheus instance that scrapes ASM metrics.
```
   helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
     --set prometheus.url=http://prometheus.istio-system.svc:9090
```

Verify the installation.

Check that the autoscaling/v2beta API is registered:

   kubectl api-versions | grep "autoscaling/v2beta"

Expected output:

   autoscaling/v2beta

Confirm the adapter pod is running:

   kubectl get po -n kube-system | grep metrics-adapter

Expected output:

   asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0          19s

Query the external metrics API to confirm the endpoint is available:

   kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

   {
     "kind": "APIResourceList",
     "apiVersion": "v1",
     "groupVersion": "external.metrics.k8s.io/v1beta1",
     "resources": []
   }

The resources array is empty because no HPA has registered a metric query yet. It populates after you configure an HPA in Step 4.

Step 3: Deploy a sample application

To demonstrate auto scaling, deploy a sample application (podinfo) and a load testing tool in the same namespace.

Create a namespace named test. See Manage namespaces and resource quotas.
Enable automatic sidecar proxy injection for the test namespace so that ASM can collect metrics. See Enable automatic sidecar proxy injection.

Deploy the podinfo application. Create a file named podinfo.yaml with the following content, and then apply the manifest with the kubectl apply command that follows it:

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: podinfo
     namespace: test
     labels:
       app: podinfo
   spec:
     minReadySeconds: 5
     strategy:
       rollingUpdate:
         maxUnavailable: 0
       type: RollingUpdate
     selector:
       matchLabels:
         app: podinfo
     template:
       metadata:
         annotations:
           prometheus.io/scrape: "true"
         labels:
           app: podinfo
       spec:
         containers:
         - name: podinfod
           image: stefanprodan/podinfo:latest
           imagePullPolicy: IfNotPresent
           ports:
           - containerPort: 9898
             name: http
             protocol: TCP
           command:
           - ./podinfo
           - --port=9898
           - --level=info
           livenessProbe:
             exec:
               command:
               - podcli
               - check
               - http
               - localhost:9898/healthz
             initialDelaySeconds: 5
             timeoutSeconds: 5
           readinessProbe:
             exec:
               command:
               - podcli
               - check
               - http
               - localhost:9898/readyz
             initialDelaySeconds: 5
             timeoutSeconds: 5
           resources:
             limits:
               cpu: 2000m
               memory: 512Mi
             requests:
               cpu: 100m
               memory: 64Mi
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: podinfo
     namespace: test
     labels:
       app: podinfo
   spec:
     type: ClusterIP
     ports:
       - name: http
         port: 9898
         targetPort: 9898
         protocol: TCP
     selector:
       app: podinfo

   kubectl apply -n test -f podinfo.yaml

Deploy the load testing tool. Create a file named loadtester.yaml with the following content, and then apply the manifest with the kubectl apply command that follows it:

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: loadtester
     namespace: test
     labels:
       app: loadtester
   spec:
     selector:
       matchLabels:
         app: loadtester
     template:
       metadata:
         labels:
           app: loadtester
         annotations:
           prometheus.io/scrape: "true"
       spec:
         containers:
           - name: loadtester
             image: weaveworks/flagger-loadtester:0.18.0
             imagePullPolicy: IfNotPresent
             ports:
               - name: http
                 containerPort: 8080
             command:
               - ./loadtester
               - -port=8080
               - -log-level=info
               - -timeout=1h
             livenessProbe:
               exec:
                 command:
                   - wget
                   - --quiet
                   - --tries=1
                   - --timeout=4
                   - --spider
                   - http://localhost:8080/healthz
               timeoutSeconds: 5
             readinessProbe:
               exec:
                 command:
                   - wget
                   - --quiet
                   - --tries=1
                   - --timeout=4
                   - --spider
                   - http://localhost:8080/healthz
               timeoutSeconds: 5
             resources:
               limits:
                 memory: "512Mi"
                 cpu: "1000m"
               requests:
                 memory: "32Mi"
                 cpu: "10m"
             securityContext:
               readOnlyRootFilesystem: true
               runAsUser: 10001
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: loadtester
     namespace: test
     labels:
       app: loadtester
   spec:
     type: ClusterIP
     selector:
       app: loadtester
     ports:
       - name: http
         port: 80
         protocol: TCP
         targetPort: http

   kubectl apply -n test -f loadtester.yaml

Verify that both pods are running:

   kubectl get pod -n test

Expected output:

   NAME                          READY   STATUS    RESTARTS   AGE
   loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
   podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m

Each pod shows 2/2 because the ASM sidecar proxy runs alongside the application container.

Send a test request to confirm connectivity. A successful response confirms that the load tester can reach the podinfo service through the mesh.

   export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
   kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

Step 4: Configure an HPA with ASM metrics

Define an HPA that scales the podinfo Deployment based on incoming request rate. When the average requests per second exceeds 10, the HPA adds replicas. When traffic drops, it removes them.
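
The scaling decision can be approximated with a few lines of arithmetic. The sketch below is a simplified model of how the HPA treats an external metric with a per-pod average target. It assumes the Kubernetes default tolerance of 0.1 and ignores stabilization windows, scaling policies, and pod readiness. The function name `desired_replicas` is hypothetical; the defaults of 10, 1, and 10 mirror the threshold and replica range used in this step.

```python
import math


def desired_replicas(total_rps: float, current_replicas: int,
                     target_average: float = 10.0,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified replica calculation for an external metric with a per-pod average target."""
    if current_replicas <= 0:
        raise ValueError("current_replicas must be positive")
    # Ratio of the observed per-pod average to the target average.
    usage_ratio = total_rps / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(total_rps / target_average)
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(50, current_replicas=1))    # 50 rps on 1 pod
print(desired_replicas(0, current_replicas=6))     # traffic stopped
print(desired_replicas(250, current_replicas=3))   # more than the range allows
```

In this model, 50 requests per second against a target of 10 per pod asks for 5 replicas, zero traffic falls back to the minimum of 1, and very high traffic is capped at the maximum of 10.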

Create a file named hpa.yaml with the following content:

   apiVersion: autoscaling/v2beta2
   kind: HorizontalPodAutoscaler
   metadata:
     name: podinfo
     namespace: test
     annotations:
       metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
         sum(
             rate(
                 istio_requests_total{
                   destination_workload="podinfo",
                   destination_workload_namespace="test",
                   reporter="destination"
                 }[1m]
             )
         )
   spec:
     maxReplicas: 10
     minReplicas: 1
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: podinfo
     metrics:
       - type: External
         external:
           metric:
             name: prometheus-query
             selector:
               matchLabels:
                 query-name: processed-requests-per-second
           target:
             type: AverageValue
             averageValue: "10"

About the PromQL query: sum(rate(istio_requests_total{...}[1m])) calculates the per-second request rate to the podinfo workload over a 1-minute window, summed across all instances. The label filters restrict the query to requests reported by the destination sidecar (reporter="destination") targeting the podinfo workload in the test namespace.

How the annotation-to-metric mapping works: The kube-metrics-adapter uses a naming convention to connect HPA metric specs with PromQL queries:

You define a PromQL query in an annotation: metric-config.external.prometheus-query.prometheus/processed-requests-per-second.
In spec.metrics, you reference prometheus-query as the metric name and processed-requests-per-second as the query-name label.
The adapter matches the label value to the annotation suffix to find the corresponding query.

HPA field reference:

| Field | Description |
| --- | --- |
| `annotations` | Embeds the PromQL query. The kube-metrics-adapter reads queries from annotations that follow the naming convention `metric-config.external.prometheus-query.prometheus/<query-name>`. |
| `maxReplicas` / `minReplicas` | The replica range. The HPA scales between 1 and 10 replicas. |
| `scaleTargetRef` | The Deployment that the HPA manages. |
| `metrics[].type: External` | Tells the HPA to read from the external metrics API rather than the resource metrics API. The `External` type is used because the metric comes from an external monitoring system (Prometheus) rather than from a Kubernetes object directly. |
| `metrics[].external.metric.selector.matchLabels.query-name` | Links the metric spec to the annotation-defined PromQL query by name (`processed-requests-per-second`). |
| `target.type: AverageValue` | The threshold is evaluated per pod. With `averageValue: "10"`, the HPA scales up when the average per-pod request rate exceeds 10 requests per second. |
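
The annotation-to-metric naming convention described in this step can be sketched as a simple lookup. This is an illustration of the convention only, not the adapter's actual code; the function name `resolve_query` and the shortened query string are hypothetical.

```python
ANNOTATION_PREFIX = "metric-config.external.prometheus-query.prometheus/"


def resolve_query(annotations: dict, metric_name: str, match_labels: dict) -> str:
    """Find the PromQL query that an HPA metric spec refers to (illustrative only)."""
    if metric_name != "prometheus-query":
        raise LookupError(f"metric {metric_name!r} is not a Prometheus query metric")
    query_name = match_labels.get("query-name")
    if query_name is None:
        raise LookupError("metric selector has no query-name label")
    # The label value is the suffix of the annotation key.
    key = ANNOTATION_PREFIX + query_name
    if key not in annotations:
        raise LookupError(f"no annotation named {key!r}")
    return annotations[key]


# Values taken from hpa.yaml, with the query abbreviated for readability.
hpa_annotations = {
    ANNOTATION_PREFIX + "processed-requests-per-second":
        'sum(rate(istio_requests_total{destination_workload="podinfo"}[1m]))',
}
query = resolve_query(
    hpa_annotations,
    metric_name="prometheus-query",
    match_labels={"query-name": "processed-requests-per-second"},
)
print(query)
```

A mismatch between the query-name label and the annotation suffix leaves the lookup without a result, so it is worth checking that the two strings are identical.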

Apply the HPA:
```
   kubectl apply -f hpa.yaml
```

Verify that the external metrics API now lists the registered metric:

   kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output (the prometheus-query resource appears in the list, which confirms that the adapter is serving the HPA's metric query):

   {
     "kind": "APIResourceList",
     "apiVersion": "v1",
     "groupVersion": "external.metrics.k8s.io/v1beta1",
     "resources": [
       {
         "name": "prometheus-query",
         "singularName": "",
         "namespaced": true,
         "kind": "ExternalMetricValueList",
         "verbs": [
           "get"
         ]
       }
     ]
   }
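
If you prefer a scripted check over reading the JSON by eye, the following sketch extracts the metric names from an APIResourceList document. It assumes you have saved the command output to a string; the function name `external_metric_names` is hypothetical, and the two sample documents are shortened copies of the outputs shown in this guide.

```python
import json


def external_metric_names(api_resource_list: str) -> list:
    """Return the names of the resources listed in an APIResourceList JSON document."""
    doc = json.loads(api_resource_list)
    if doc.get("kind") != "APIResourceList":
        raise ValueError("not an APIResourceList document")
    return [resource["name"] for resource in doc.get("resources", [])]


# Output before any HPA registers a query (Step 2).
before = '{"kind": "APIResourceList", "apiVersion": "v1", "resources": []}'

# Output after hpa.yaml is applied (Step 4), reduced to the relevant fields.
after = """
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "prometheus-query", "namespaced": true, "kind": "ExternalMetricValueList"}
  ]
}
"""

print(external_metric_names(before))
print("prometheus-query" in external_metric_names(after))
```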

Verify auto scaling

Generate sustained traffic and observe the HPA scaling the podinfo Deployment up and back down.

Open a shell in the load tester container:

   kubectl -n test exec -it ${loadtester} -c loadtester -- sh

Inside the container, run a 5-minute load test:

   hey -z 5m -c 10 -q 5 http://podinfo.test:9898

In a separate terminal, watch the HPA status:
```
   watch kubectl -n test get hpa/podinfo
```
Expected output (after approximately 1 minute):
```
   NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
   podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
```
The TARGETS column shows the current metric value (8308m, or about 8.3 requests per second per pod) against the threshold (10). REPLICAS shows the scaled-up count.

Note
The adapter synchronizes metrics every 30 seconds by default, and the HPA rescales the workload only if no scaling action occurred in the previous 3 to 5 minutes. This delay keeps the HPA from acting on conflicting scaling decisions in quick succession.
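
As a sanity check on the sample output, the per-pod average multiplied by the replica count should land close to the load that hey generates, because `-q` limits the rate per worker and `-c 10 -q 5` therefore targets about 50 requests per second. The sketch below does that arithmetic. The helper names are hypothetical, and the parser only handles plain numbers and the `m` (milli) suffix, not the full Kubernetes quantity syntax.

```python
def parse_quantity(text: str) -> float:
    """Convert a quantity such as '8308m' or '10' to a float (milli suffix only)."""
    text = text.strip()
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def parse_targets(cell: str):
    """Split a TARGETS cell such as '8308m/10 (avg)' into (current, target)."""
    values = cell.split()[0]              # drop the trailing '(avg)'
    current, target = values.split("/")
    return parse_quantity(current), parse_quantity(target)


current_avg, target_avg = parse_targets("8308m/10 (avg)")
replicas = 6                              # REPLICAS column in the sample output
observed_total = current_avg * replicas   # total requests per second across pods
nominal_total = 10 * 5                    # hey -c 10 -q 5: workers x rate per worker

print(f"per-pod average: {current_avg} rps (target {target_avg})")
print(f"observed total:  {observed_total:.1f} rps, nominal load: {nominal_total} rps")
```

The observed total of roughly 49.8 requests per second matches the nominal 50, which indicates that the metric the HPA sees reflects the generated load.
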
After the load test finishes, requests drop to zero. The HPA gradually reduces replicas back to 1 over several minutes.