Service Mesh (ASM) collects telemetry data from ACK and ACS clusters non-intrusively, generating service metrics based on the four golden signals: latency, traffic, errors, and saturation. This topic explains how to configure a Horizontal Pod Autoscaler (HPA) that scales workloads based on these ASM metrics — going beyond CPU and memory to scale on real traffic patterns.
How it works
ASM exposes service metrics (such as requests per second) to Prometheus. A custom metrics adapter — kube-metrics-adapter — registers these Prometheus metrics with the Kubernetes aggregation layer, making them available to HPAs via the custom metrics API.
The end-to-end flow:
- ASM collects request metrics and writes them to Prometheus.
- kube-metrics-adapter queries Prometheus and registers the metrics as external metrics in Kubernetes.
- The HPA polls the external metrics API every 30 seconds and adjusts the replica count when the metric value crosses the threshold.
For a full list of ASM-generated metrics, see Istio Standard Metrics.
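For an `AverageValue` target (the kind used later in this topic), the controller's core arithmetic is `desiredReplicas = ceil(currentMetricTotal / targetAverageValue)`, clamped to the configured replica bounds. The following is a minimal sketch of that calculation with hypothetical numbers (83 total requests per second, a target of 10 per replica); quantities are written in milli-units, the form in which the metrics API reports them:

```shell
# Hypothetical illustration of the HPA's AverageValue arithmetic;
# the numbers are made up and this is not part of the tutorial setup.
total_rps=83000     # 83 requests/second summed across all replicas (milli-units)
target_avg=10000    # averageValue: "10" -> 10 RPS per replica (milli-units)

# desiredReplicas = ceil(total / target), via integer math
desired=$(( (total_rps + target_avg - 1) / target_avg ))
echo "desired replicas: ${desired}"   # prints 9
```

The HPA then clamps this result between `minReplicas` and `maxReplicas` before applying it to the Deployment.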
Prerequisites
Before you begin, ensure that you have:
- An ACK cluster or ACS cluster. See Create an ACK managed cluster or Create an ACS cluster.
- An ASM instance. See Create an ASM instance.
- A Prometheus instance and a Grafana instance deployed in the clusters. See Use open source Prometheus to monitor an ACK cluster.
- A Prometheus instance configured to monitor the ASM instance. See Monitor ASM instances by using a self-managed Prometheus instance.
Step 1: Enable Prometheus monitoring for the ASM instance
Follow the instructions in Collect metrics to Managed Service for Prometheus to enable Prometheus scraping for your ASM instance.
Step 2: Deploy the custom metrics adapter
The custom metrics adapter (kube-metrics-adapter) bridges Prometheus metrics and the Kubernetes external metrics API, so HPAs can query ASM metrics directly.
- Install kube-metrics-adapter into the `kube-system` namespace using Helm 3. Set `prometheus.url` to the in-cluster address of your Prometheus instance. For the chart source, see kube-metrics-adapter.

  | Parameter | Description |
  |---|---|
  | `asm-custom-metrics` | Helm release name |
  | `prometheus.url` | In-cluster address of the Prometheus instance that scrapes ASM metrics |

  ```shell
  helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
    --set prometheus.url=http://prometheus.istio-system.svc:9090
  ```
- Verify that the adapter is running:

  - Check that the `autoscaling/v2beta` API groups are registered:

    ```shell
    kubectl api-versions | grep "autoscaling/v2beta"
    ```

    Expected output:

    ```
    autoscaling/v2beta1
    autoscaling/v2beta2
    ```

  - Check that the adapter pod is running:

    ```shell
    kubectl get po -n kube-system | grep metrics-adapter
    ```

    Expected output:

    ```
    asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s
    ```

  - Check that the external metrics API is available (no metrics registered yet):

    ```shell
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
    ```

    Expected output:

    ```json
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": []
    }
    ```
Step 3: Deploy a sample application
This step deploys a podinfo application and a load testing service in the test namespace, so you can later trigger and observe auto scaling.
- Create the `test` namespace. See Manage namespaces and resource quotas.
- Enable automatic sidecar proxy injection for the `test` namespace. See Enable automatic sidecar proxy injection.
- Deploy the podinfo application. Create a file named `podinfo.yaml` with the following content, then apply it.

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: podinfo
    namespace: test
    labels:
      app: podinfo
  spec:
    minReadySeconds: 5
    strategy:
      rollingUpdate:
        maxUnavailable: 0
      type: RollingUpdate
    selector:
      matchLabels:
        app: podinfo
    template:
      metadata:
        annotations:
          prometheus.io/scrape: "true"
        labels:
          app: podinfo
      spec:
        containers:
        - name: podinfod
          image: stefanprodan/podinfo:latest
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 9898
            name: http
            protocol: TCP
          command:
          - ./podinfo
          - --port=9898
          - --level=info
          livenessProbe:
            exec:
              command:
              - podcli
              - check
              - http
              - localhost:9898/healthz
            initialDelaySeconds: 5
            timeoutSeconds: 5
          readinessProbe:
            exec:
              command:
              - podcli
              - check
              - http
              - localhost:9898/readyz
            initialDelaySeconds: 5
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 2000m
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 64Mi
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: podinfo
    namespace: test
    labels:
      app: podinfo
  spec:
    type: ClusterIP
    ports:
    - name: http
      port: 9898
      targetPort: 9898
      protocol: TCP
    selector:
      app: podinfo
  ```

  ```shell
  kubectl apply -n test -f podinfo.yaml
  ```
- Deploy the load testing service. Create a file named `loadtester.yaml` with the following content, then apply it.

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: loadtester
    namespace: test
    labels:
      app: loadtester
  spec:
    selector:
      matchLabels:
        app: loadtester
    template:
      metadata:
        labels:
          app: loadtester
        annotations:
          prometheus.io/scrape: "true"
      spec:
        containers:
        - name: loadtester
          image: weaveworks/flagger-loadtester:0.18.0
          imagePullPolicy: IfNotPresent
          ports:
          - name: http
            containerPort: 8080
          command:
          - ./loadtester
          - -port=8080
          - -log-level=info
          - -timeout=1h
          livenessProbe:
            exec:
              command:
              - wget
              - --quiet
              - --tries=1
              - --timeout=4
              - --spider
              - http://localhost:8080/healthz
            timeoutSeconds: 5
          readinessProbe:
            exec:
              command:
              - wget
              - --quiet
              - --tries=1
              - --timeout=4
              - --spider
              - http://localhost:8080/healthz
            timeoutSeconds: 5
          resources:
            limits:
              memory: "512Mi"
              cpu: "1000m"
            requests:
              memory: "32Mi"
              cpu: "10m"
          securityContext:
            readOnlyRootFilesystem: true
            runAsUser: 10001
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: loadtester
    namespace: test
    labels:
      app: loadtester
  spec:
    type: ClusterIP
    selector:
      app: loadtester
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  ```

  ```shell
  kubectl apply -n test -f loadtester.yaml
  ```
- Verify that both workloads are running:

  ```shell
  kubectl get pod -n test
  ```

  Expected output (both pods show `2/2 Running`, indicating the app container and the Istio sidecar are both ready):

  ```
  NAME                          READY   STATUS    RESTARTS   AGE
  loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
  podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
  ```

- Send a short burst of traffic to confirm the setup is working end-to-end:

  ```shell
  export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
  kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
  ```

  A successful response from `hey` confirms that the podinfo service is reachable through the mesh.
Step 4: Configure an HPA using ASM metrics
Define an HPA that scales the podinfo deployment based on the number of incoming requests per second, as measured by the istio_requests_total metric in Prometheus.
The HPA uses two Kubernetes constructs together:
- An annotation that embeds the PromQL query and gives it a name (`processed-requests-per-second`).
- A metric reference in `spec.metrics` that points to the named query and sets the scale threshold.
Create a file named `hpa.yaml` with the following content, then apply it:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    # The annotation key format is:
    # metric-config.external.prometheus-query.prometheus/<query-name>
    # The query-name must match the value of the matchLabels selector below.
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="podinfo",
            destination_workload_namespace="test",
            reporter="destination"
          }[1m]
        )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: processed-requests-per-second # matches the annotation key suffix above
      target:
        type: AverageValue
        averageValue: "10" # scale out when average RPS per replica exceeds 10
```

```shell
kubectl apply -f hpa.yaml
```
Key fields explained:
| Field | Value | Description |
|---|---|---|
| `metric-config.external.prometheus-query.prometheus/<query-name>` annotation | PromQL expression | Defines the query that kube-metrics-adapter runs against Prometheus. The `<query-name>` suffix must match the `query-name` label in `spec.metrics`. |
| `query-name` label | `processed-requests-per-second` | Links the annotation (the PromQL query) to the metric reference in `spec.metrics`. |
| `averageValue` | `"10"` | The HPA scales out when the average number of requests per second per replica exceeds 10. |
| `minReplicas` / `maxReplicas` | `1` / `10` | Replica count bounds. |
After applying the HPA, verify that the external metric is now registered:
```shell
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```
Expected output:
```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
```
The prometheus-query entry in resources confirms that kube-metrics-adapter has registered the metric and the HPA is active.
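Metric values served through this API are Kubernetes quantities and are often reported in milli-units (a trailing `m`), such as `8308m` for 8.308 requests per second. A small shell sketch of that conversion, using a made-up sample value:

```shell
# Convert a Kubernetes milli-quantity such as "8308m" to a decimal value.
# The sample value is hypothetical.
value="8308m"
millis=${value%m}     # strip the trailing "m"
rps=$(printf '%d.%03d' $((millis / 1000)) $((millis % 1000)))
echo "${value} = ${rps} requests per second"
```

Keeping the conversion in mind helps when reading the `TARGETS` column of `kubectl get hpa`, which uses the same notation.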
Verify auto scaling
- Open a terminal and start a sustained load against podinfo (5 minutes, 10 concurrent users, 5 requests/second each):

  ```shell
  kubectl -n test exec -it ${loadtester} -c loadtester -- sh
  ~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  ```

- In a separate terminal, watch the HPA scale up. Metrics are synchronized every 30 seconds by default, and the HPA also enforces a cooldown of 3–5 minutes between scale events to prevent thrashing.

  ```shell
  watch kubectl -n test get hpa/podinfo
  ```

  As load increases above the threshold, the HPA scales out:

  ```
  NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
  podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
  ```

  The value `8308m` is Kubernetes milli-unit notation for 8.308 requests per second. Because the average RPS per replica (8.3) is below the threshold of 10, the HPA has stabilized at 6 replicas. If the load were higher, the HPA would continue scaling up toward the 10-replica maximum.

- After the load test finishes, the request rate drops to zero. The HPA begins scaling down, and within a few minutes the replica count returns to 1.
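On clusters running Kubernetes 1.18 or later, the scale-down delay described above can be tuned explicitly through the optional `spec.behavior` field of the `autoscaling/v2beta2` HPA. The following fragment is an illustrative sketch, not part of the original `hpa.yaml`:

```yaml
# Optional addition to the HPA spec (Kubernetes 1.18+):
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling down
      policies:
      - type: Pods
        value: 2                        # remove at most 2 pods per minute
        periodSeconds: 60
```

Without this field, the HPA falls back to the default scale-down stabilization window of 300 seconds.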