
Container Service for Kubernetes:Use ASM metrics to automatically scale workloads

Last Updated: Mar 26, 2026

Service Mesh (ASM) collects telemetry data from ACK and ACS clusters non-intrusively and generates service metrics based on the four golden signals: latency, traffic, errors, and saturation. This topic explains how to configure a Horizontal Pod Autoscaler (HPA) that scales workloads based on these ASM metrics, going beyond CPU and memory to scale on real traffic patterns.

How it works

ASM exposes service metrics (such as requests per second) to Prometheus. A custom metrics adapter, kube-metrics-adapter, registers these Prometheus metrics with the Kubernetes aggregation layer, making them available to HPAs through the external metrics API.


The end-to-end flow:

  1. ASM collects request metrics and writes them to Prometheus.

  2. kube-metrics-adapter queries Prometheus and registers the metrics as external metrics in Kubernetes.

  3. The HPA polls the external metrics API every 30 seconds and adjusts the replica count when the metric value crosses the threshold.

For a full list of ASM-generated metrics, see Istio Standard Metrics.
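The scale-out decision in step 3 follows the standard HPA algorithm: for an external metric with an AverageValue target, the desired replica count is ceil(metricTotal / targetAverage). A minimal Python sketch of that decision, omitting the 30-second sync period and scale-down stabilization (the 10% tolerance is the Kubernetes default):

```python
import math

def desired_replicas(current_replicas: int, metric_total: float, target_average: float) -> int:
    """Standard HPA formula for an external metric with an AverageValue target:
    desired = ceil(metric_total / target_average), clamped to min/max replicas elsewhere."""
    current_average = metric_total / current_replicas
    # Within the default 10% tolerance of the target, the HPA does not scale.
    if abs(current_average / target_average - 1.0) <= 0.1:
        return current_replicas
    return math.ceil(metric_total / target_average)

# 50 requests/s across all replicas with a target of 10 requests/s per replica:
print(desired_replicas(1, 50.0, 10.0))  # 5
```

At 5 replicas the average is exactly at the target, so the HPA holds steady; real clusters also apply a stabilization window before scaling back down.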

Prerequisites

Before you begin, ensure that you have:

  • An ACK or ACS cluster that has been added to an ASM instance.

  • A kubectl client that is configured to connect to the cluster.

  • Helm 3 installed on your client machine.

Step 1: Enable Prometheus monitoring for the ASM instance

Follow the instructions in Collect metrics to Managed Service for Prometheus to enable Prometheus scraping for your ASM instance.

Step 2: Deploy the custom metrics adapter

The custom metrics adapter (kube-metrics-adapter) bridges Prometheus metrics and the Kubernetes external metrics API, so HPAs can query ASM metrics directly.

  1. Install kube-metrics-adapter into the kube-system namespace using Helm 3. Set prometheus.url to the in-cluster address of your Prometheus instance. For the chart source, see kube-metrics-adapter.

    Parameter            Description
    asm-custom-metrics   Helm release name
    prometheus.url       In-cluster address of the Prometheus instance that scrapes ASM metrics

    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
      --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. Verify that the adapter is running:

    1. Check that the autoscaling/v2beta API group is registered:

      kubectl api-versions | grep "autoscaling/v2beta"

      Expected output:

      autoscaling/v2beta1
      autoscaling/v2beta2
    2. Check that the adapter pod is running:

      kubectl get po -n kube-system | grep metrics-adapter

      Expected output:

      asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s
    3. Check that the external metrics API is available (no metrics registered yet):

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }
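Under the hood, kube-metrics-adapter runs the configured PromQL query against the Prometheus HTTP API (/api/v1/query) and exposes the scalar result through the external metrics API. A sketch of extracting the value from a hypothetical instant-vector response (the sample values are illustrative, not real output):

```python
import json

# Hypothetical response body from GET /api/v1/query?query=sum(rate(istio_requests_total[1m]))
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {}, "value": [1712000000, "8.308"]}
    ]
  }
}
""")

# An instant-vector result holds [<unix timestamp>, "<value as string>"] pairs.
samples = response["data"]["result"]
value = float(samples[0]["value"][1]) if samples else 0.0
print(value)  # 8.308
```

The adapter republishes this number under the metric name prometheus-query, which is why the HPA in Step 4 references that name rather than the raw Prometheus metric.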

Step 3: Deploy a sample application

This step deploys a podinfo application and a load testing service in the test namespace, so you can later trigger and observe auto scaling.

  1. Create the test namespace. See Manage namespaces and resource quotas.

  2. Enable automatic sidecar proxy injection for the test namespace. See Enable automatic sidecar proxy injection.

  3. Deploy the podinfo application. Create a file named podinfo.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      minReadySeconds: 5
      strategy:
        rollingUpdate:
          maxUnavailable: 0
        type: RollingUpdate
      selector:
        matchLabels:
          app: podinfo
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
          labels:
            app: podinfo
        spec:
          containers:
          - name: podinfod
            image: stefanprodan/podinfo:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 9898
              name: http
              protocol: TCP
            command:
            - ./podinfo
            - --port=9898
            - --level=info
            livenessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/healthz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            readinessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/readyz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 2000m
                memory: 512Mi
              requests:
                cpu: 100m
                memory: 64Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      type: ClusterIP
      ports:
        - name: http
          port: 9898
          targetPort: 9898
          protocol: TCP
      selector:
        app: podinfo

    Apply the manifest:

    kubectl apply -n test -f podinfo.yaml
  4. Deploy the load testing service. Create a file named loadtester.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      selector:
        matchLabels:
          app: loadtester
      template:
        metadata:
          labels:
            app: loadtester
          annotations:
            prometheus.io/scrape: "true"
        spec:
          containers:
            - name: loadtester
              image: weaveworks/flagger-loadtester:0.18.0
              imagePullPolicy: IfNotPresent
              ports:
                - name: http
                  containerPort: 8080
              command:
                - ./loadtester
                - -port=8080
                - -log-level=info
                - -timeout=1h
              livenessProbe:
                exec:
                  command:
                    - wget
                    - --quiet
                    - --tries=1
                    - --timeout=4
                    - --spider
                    - http://localhost:8080/healthz
                timeoutSeconds: 5
              readinessProbe:
                exec:
                  command:
                    - wget
                    - --quiet
                    - --tries=1
                    - --timeout=4
                    - --spider
                    - http://localhost:8080/healthz
                timeoutSeconds: 5
              resources:
                limits:
                  memory: "512Mi"
                  cpu: "1000m"
                requests:
                  memory: "32Mi"
                  cpu: "10m"
              securityContext:
                readOnlyRootFilesystem: true
                runAsUser: 10001
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      type: ClusterIP
      selector:
        app: loadtester
      ports:
        - name: http
          port: 80
          protocol: TCP
          targetPort: http

    Apply the manifest:

    kubectl apply -n test -f loadtester.yaml
  5. Verify that both workloads are running:

    kubectl get pod -n test

    Expected output (both pods show 2/2 Running, indicating the app container and the Istio sidecar are both ready):

    NAME                          READY   STATUS    RESTARTS   AGE
    loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
    podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
  6. Send a short burst of traffic to confirm the setup is working end-to-end:

    export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
    kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

    A successful response from hey confirms that the podinfo service is reachable through the mesh.
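The hey flags translate into a predictable request volume: -z sets the duration, -c the number of concurrent workers, and -q the per-worker rate cap. A quick sanity check of the expected totals, assuming hey sustains the full rate:

```python
def total_requests(duration_s: float, workers: int, qps_per_worker: float) -> float:
    """Approximate number of requests hey sends: duration * workers * per-worker QPS."""
    return duration_s * workers * qps_per_worker

# Burst above: -z 5s -c 10 -q 2 -> about 100 requests total
print(total_requests(5, 10, 2))
```

By the same arithmetic, the sustained test used later (-z 5m -c 10 -q 5) works out to roughly 50 requests per second in total.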

Step 4: Configure an HPA using ASM metrics

Define an HPA that scales the podinfo deployment based on the number of incoming requests per second, as measured by the istio_requests_total metric in Prometheus.

The HPA uses two Kubernetes constructs together:

  • An annotation that embeds the PromQL query and gives it a name (processed-requests-per-second).

  • A metric reference in spec.metrics that points to the named query and sets the scale threshold.

Create a file named hpa.yaml with the following content, then apply it:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    # The annotation key format is:
    # metric-config.external.prometheus-query.prometheus/<query-name>
    # The query-name must match the value of the matchLabels selector below.
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
          rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
          )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second  # matches the annotation key suffix above
        target:
          type: AverageValue
          averageValue: "10"  # scale out when average RPS per replica exceeds 10

Apply the manifest:

kubectl apply -f hpa.yaml

Key fields explained:

  • metric-config.external.prometheus-query.prometheus/<query-name> (annotation): defines the PromQL query that kube-metrics-adapter runs against Prometheus. The <query-name> suffix must match the query-name label in spec.metrics.

  • query-name label (processed-requests-per-second): links the annotation (the PromQL query) to the metric reference in spec.metrics.

  • averageValue ("10"): the HPA scales out when the average number of requests per second per replica exceeds 10.

  • minReplicas / maxReplicas (1 / 10): lower and upper bounds on the replica count.
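The PromQL expression in the annotation turns the istio_requests_total counter into a per-second rate: rate() takes the increase of the counter over the [1m] window and divides by the window length, and sum() then aggregates across pods. A sketch of that computation on hypothetical counter samples:

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second rate over a window of (timestamp, cumulative counter) samples,
    approximated as (last - first) / elapsed, as PromQL rate() does for
    monotonically increasing counters (counter resets are ignored here)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical istio_requests_total samples over a 60-second window:
# the counter grows from 1200 to 1800 requests -> 10 requests/second.
print(counter_rate([(0, 1200), (30, 1500), (60, 1800)]))  # 10.0
```

Because the query uses reporter="destination", only metrics reported by the podinfo sidecar are counted, so retries and client-side failures do not inflate the rate.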

After applying the HPA, verify that the external metric is now registered:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

The prometheus-query entry in resources confirms that kube-metrics-adapter has registered the metric and the HPA is active.

Verify auto scaling

  1. Open a terminal and start a sustained load against podinfo (5 minutes, 10 concurrent workers, 5 requests/second each, about 50 requests/second in total):

    kubectl -n test exec -it ${loadtester} -c loadtester -- sh
    ~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. In a separate terminal, watch the HPA scale up:

    watch kubectl -n test get hpa/podinfo

    Note: Metrics are synchronized every 30 seconds by default, and the HPA enforces a cooldown of 3–5 minutes between scale events to prevent thrashing.

    As load increases above the threshold, the HPA scales out:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

    The value 8308m is Kubernetes milli-unit notation for 8.308 requests per second per replica. Because this average is below the target of 10, the HPA stops scaling out and stabilizes at 6 replicas. If the load were higher, the HPA would continue scaling up toward the 10-replica maximum.

  3. After the load test finishes, the request rate drops to zero. The HPA begins scaling down, and within a few minutes the replica count returns to 1.
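The milli-unit notation in the HPA TARGETS column follows the general Kubernetes quantity convention: a trailing m divides the number by 1000. A small helper to decode it (a hypothetical illustration, not part of any kubectl API):

```python
def parse_quantity(s: str) -> float:
    """Decode a Kubernetes quantity with an optional milli suffix: '8308m' -> 8.308."""
    if s.endswith("m"):
        return float(s[:-1]) / 1000.0
    return float(s)

# '8308m/10 (avg)' from the watch output above: 8.308 RPS per replica vs a target of 10.
print(parse_quantity("8308m"))  # 8.308
```

Reading the TARGETS column this way makes it easy to verify the HPA math by hand: 6 replicas at 8.308 RPS each is roughly the 50 RPS that the load test generates.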