Service Mesh (ASM) collects telemetry data from Container Service for Kubernetes (ACK) clusters without modifying application code. Based on four dimensions -- latency, traffic, errors, and saturation -- ASM produces Istio Standard Metrics for every managed service.
Feeding these metrics into a Kubernetes Horizontal Pod Autoscaler (HPA) lets you scale workloads based on real traffic patterns rather than CPU or memory usage alone. This guide walks through connecting ASM metrics to Prometheus, deploying a custom metrics adapter, and configuring an HPA that scales pods when requests per second exceed a threshold.
How it works
Kubernetes supports two autoscaling mechanisms:
Cluster Autoscaler (CA) -- adjusts the number of nodes in a cluster.
Horizontal Pod Autoscaler (HPA) -- adjusts the number of pod replicas for a workload.
By default, HPAs query core metrics such as CPU utilization and memory usage through the resource metrics API. To scale on application-specific metrics, you need the custom metrics API or the external metrics API. The Kubernetes aggregation layer allows third-party adapters to register as API extensions, bridging external metric sources (like Prometheus) into the HPA control loop. This guide uses the external metrics API.
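As a rough guide to which aggregated API serves each HPA metric type, the mapping can be sketched as a small shell helper. The function name metrics_api_group is hypothetical and exists only for illustration; the API group names are the standard Kubernetes ones.

```shell
# Map an HPA metric type (the value of spec.metrics[].type) to the
# API group that the HPA controller queries for it.
# metrics_api_group is an illustrative helper, not part of kubectl.
metrics_api_group() {
  case "$1" in
    Resource|ContainerResource) echo "metrics.k8s.io" ;;          # CPU, memory
    Pods|Object)                echo "custom.metrics.k8s.io" ;;   # metrics tied to Kubernetes objects
    External)                   echo "external.metrics.k8s.io" ;; # metrics from outside the cluster API
    *) echo "unknown metric type: $1" >&2; return 1 ;;
  esac
}

metrics_api_group External   # prints external.metrics.k8s.io

# On a live cluster, the registered extension APIs can be listed with:
#   kubectl get apiservices | grep metrics.k8s.io
```

The HPA in this guide uses the External type, so the adapter only has to serve the external.metrics.k8s.io group.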
The following diagram illustrates the data flow:
The data flows through four stages:
ASM sidecar proxies emit Istio standard metrics (for example, istio_requests_total).
Prometheus scrapes and stores these metrics.
The kube-metrics-adapter queries Prometheus and exposes the results through the Kubernetes external metrics API.
The HPA periodically reads these external metrics and adjusts the replica count.
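To check the second stage by hand, you can ask Prometheus for the same request rate that the HPA later consumes. The sketch below builds the PromQL expression and sends it to the Prometheus HTTP API with curl. The helper names build_request_rate_query and query_prometheus are hypothetical, and the Prometheus URL is the in-cluster address used later in this guide, so the curl call only works from inside the cluster or through a port-forward.

```shell
# Build the per-second request-rate query for one workload.
# $1 = destination workload, $2 = namespace, $3 = rate window (for example 1m)
build_request_rate_query() {
  printf 'sum(rate(istio_requests_total{destination_workload="%s",destination_workload_namespace="%s",reporter="destination"}[%s]))\n' "$1" "$2" "$3"
}

# Send an instant query to Prometheus.
# $1 = Prometheus base URL, $2 = PromQL expression
query_prometheus() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Print the expression for the podinfo workload used in this guide.
build_request_rate_query podinfo test 1m

# Example call (requires network access to Prometheus):
#   query_prometheus http://prometheus.istio-system.svc:9090 \
#     "$(build_request_rate_query podinfo test 1m)"
```

An empty result usually means that no traffic has reached the workload yet or that the sidecar metrics are not being scraped.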
Prerequisites
Before you begin, make sure you have:
An ACK managed cluster -- see Create an ACK managed cluster
An ASM instance -- see Create an ASM instance
A Prometheus instance and a Grafana instance deployed in the ACK cluster -- see Use open source Prometheus to monitor an ACK cluster
Prometheus monitoring enabled for the ASM instance -- see Monitor ASM instances by using a self-managed Prometheus instance
Step 1: Enable Prometheus monitoring for ASM
Configure ASM to export metrics to Prometheus. For instructions, see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the custom metrics adapter
The kube-metrics-adapter bridges Prometheus metrics into the Kubernetes external metrics API, making them available to HPAs.
Install the adapter using Helm 3. See the kube-metrics-adapter chart on GitHub for chart details. The prometheus.url parameter points to the Prometheus instance that scrapes ASM metrics.
    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
      --set prometheus.url=http://prometheus.istio-system.svc:9090
Verify the installation.
Check that the autoscaling/v2beta API is registered:
    kubectl api-versions | grep "autoscaling/v2beta"
Expected output:
    autoscaling/v2beta
Confirm the adapter pod is running:
    kubectl get po -n kube-system | grep metrics-adapter
Expected output:
    asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s
Query the external metrics API to confirm the endpoint is available:
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": []
    }
The resources array is empty because no HPA has registered a metric query yet. It populates after you configure an HPA in Step 4.
Step 3: Deploy a sample application
To demonstrate auto scaling, deploy a sample application (podinfo) and a load testing tool in the same namespace.
Create a namespace named test. See Manage namespaces and resource quotas.
Enable automatic sidecar proxy injection for the test namespace so that ASM can collect metrics. See Enable automatic sidecar proxy injection.
Deploy the podinfo application. Create a file named podinfo.yaml that contains the podinfo manifest, then apply it:
    kubectl apply -n test -f podinfo.yaml
Deploy the load testing tool. Create a file named loadtester.yaml that contains the load tester manifest, then apply it:
    kubectl apply -n test -f loadtester.yaml
Verify that both pods are running:
    kubectl get pod -n test
Expected output:
    NAME                          READY   STATUS    RESTARTS   AGE
    loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
    podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
Each pod shows 2/2 because the ASM sidecar proxy runs alongside the application container.
Send a test request to confirm connectivity:
    export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
    kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
A successful response confirms that the load tester can reach the podinfo service through the mesh.
Step 4: Configure an HPA with ASM metrics
Define an HPA that scales the podinfo Deployment based on incoming request rate. When the average request rate per pod exceeds 10 requests per second, the HPA adds replicas. When traffic drops, it removes them.
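The scaling arithmetic behind a per-pod average target can be sketched as follows. This is a simplified model, not the HPA controller's full algorithm: it divides the total request rate by the per-pod target, rounds up, and clamps the result to the replica range. It deliberately ignores the controller's tolerance band and stabilization behavior. The function name desired_replicas is hypothetical.

```shell
# Estimate the replica count for an External metric with an AverageValue target.
# $1 = total metric value across the workload (requests per second)
# $2 = per-pod target (averageValue)
# $3 = minReplicas, $4 = maxReplicas
desired_replicas() {
  awk -v total="$1" -v target="$2" -v min="$3" -v max="$4" 'BEGIN {
    raw = total / target
    n = int(raw)
    if (n < raw) n++          # round up to the next whole replica
    if (n < min) n = min      # never go below minReplicas
    if (n > max) n = max      # never exceed maxReplicas
    print n
  }'
}

desired_replicas 50 10 1 10    # prints 5
desired_replicas 55 10 1 10    # prints 6
desired_replicas 0 10 1 10     # prints 1
```

With a target of 10 and a range of 1 to 10 replicas, any sustained load above 100 requests per second leaves the workload pinned at 10 replicas.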
Create a file named hpa.yaml with the following content:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
      annotations:
        metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
          sum(
            rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
            )
          )
    spec:
      maxReplicas: 10
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
        - type: External
          external:
            metric:
              name: prometheus-query
              selector:
                matchLabels:
                  query-name: processed-requests-per-second
            target:
              type: AverageValue
              averageValue: "10"
About the PromQL query: sum(rate(istio_requests_total{...}[1m])) calculates the per-second request rate to the podinfo workload over a 1-minute window, summed across all instances. The label filters restrict the query to requests reported by the destination sidecar (reporter="destination") targeting the podinfo workload in the test namespace.
How the annotation-to-metric mapping works: The kube-metrics-adapter uses a naming convention to connect HPA metric specs with PromQL queries:
You define a PromQL query in an annotation: metric-config.external.prometheus-query.prometheus/processed-requests-per-second.
In spec.metrics, you reference prometheus-query as the metric name and processed-requests-per-second as the query-name label.
The adapter matches the label value to the annotation suffix to find the corresponding query.
HPA field reference:
annotations -- Embeds the PromQL query. The kube-metrics-adapter reads queries from annotations that follow the naming convention metric-config.external.prometheus-query.prometheus/<query-name>.
maxReplicas / minReplicas -- The replica range. The HPA scales between 1 and 10 replicas.
scaleTargetRef -- The Deployment that the HPA manages.
metrics[].type: External -- Tells the HPA to read from the external metrics API rather than the resource metrics API. The External type is used because the metric comes from an external monitoring system (Prometheus) rather than from a Kubernetes object directly.
metrics[].external.metric.selector.matchLabels.query-name -- Links the metric spec to the annotation-defined PromQL query by name (processed-requests-per-second).
target.type: AverageValue -- The threshold is evaluated per pod. With averageValue: "10", the HPA scales up when the average per-pod request rate exceeds 10 requests per second.
Apply the HPA:
    kubectl apply -f hpa.yaml
Verify that the external metrics API now lists the registered metric:
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "prometheus-query",
          "singularName": "",
          "namespaced": true,
          "kind": "ExternalMetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }
The prometheus-query resource appears in the list, which confirms that the adapter is serving the HPA's metric query.
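The naming convention can also be used to read the current metric value straight from the external metrics API, which helps when the HPA reports an unknown target. The sketch below derives the annotation key and the API path from a query name. Both helper names are hypothetical, and the path follows the general external metrics API layout (namespace, metric name, label selector); whether the adapter answers such a direct request the same way it answers the HPA controller is an assumption worth verifying on your cluster.

```shell
# Annotation key that holds the PromQL query for a given query name.
annotation_key() {
  printf 'metric-config.external.prometheus-query.prometheus/%s\n' "$1"
}

# External metrics API path for one metric, filtered by the query-name label.
# $1 = namespace, $2 = metric name, $3 = query-name label value
external_metric_path() {
  # %%3D emits a literal "%3D", the URL-encoded "=" inside the label selector.
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=query-name%%3D%s\n' "$1" "$2" "$3"
}

annotation_key processed-requests-per-second
external_metric_path test prometheus-query processed-requests-per-second

# Example call against a live cluster:
#   kubectl get --raw "$(external_metric_path test prometheus-query processed-requests-per-second)" | jq .
```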
Verify auto scaling
Generate sustained traffic and observe the HPA scaling the podinfo Deployment up and back down.
Open a shell in the load tester container:
    kubectl -n test exec -it ${loadtester} -c loadtester -- sh
Inside the container, run a 5-minute load test:
    hey -z 5m -c 10 -q 5 http://podinfo.test:9898
In a separate terminal, watch the HPA status:
    watch kubectl -n test get hpa/podinfo
Expected output (after approximately 1 minute):
    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
The TARGETS column shows the current metric value (8308m, or about 8.3 requests per second per pod) against the threshold (10). REPLICAS shows the scaled-up count.
Note: The adapter synchronizes metrics every 30 seconds by default. A workload can be rescaled only if no scaling occurred within the last 3 to 5 minutes. This keeps the HPA from acting on conflicting decisions in quick succession and gives each scaling action time to take effect.
After the load test finishes, requests drop to zero. The HPA gradually reduces replicas back to 1 over several minutes.
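The TARGETS value can be sanity-checked with a little arithmetic. The sketch below converts the milli-unit quantity to a decimal and multiplies it by the replica count to recover the total request rate. The helper names are hypothetical, and the conversion only handles plain numbers and the m suffix, not other Kubernetes quantity suffixes.

```shell
# Convert a quantity such as "8308m" (milli-units) to a decimal number.
quantity_to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%.3f\n", v / 1000 }' ;;
    *)  awk -v v="$1"     'BEGIN { printf "%.3f\n", v }' ;;
  esac
}

# Total rate across the workload = per-pod average x replica count.
# $1 = per-pod average as shown in TARGETS, $2 = replica count
total_rate() {
  awk -v avg="$(quantity_to_decimal "$1")" -v n="$2" 'BEGIN { printf "%.1f\n", avg * n }'
}

quantity_to_decimal 8308m   # prints 8.308
total_rate 8308m 6          # prints 49.8
```

A total of about 49.8 requests per second matches the load that hey was asked to generate: 10 workers (-c 10), each limited to 5 requests per second (-q 5), for roughly 50 requests per second.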