Service Mesh (ASM) generates telemetry data for every request flowing through your service mesh, without any changes to your application code. Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM produces a set of metrics for the services it manages. Feed these metrics into a Kubernetes Horizontal Pod Autoscaler (HPA) to scale workloads based on real traffic signals such as request rate, rather than relying solely on CPU or memory utilization.
This walkthrough covers the end-to-end setup: collecting ASM metrics in Prometheus, deploying a custom metrics adapter, and configuring an HPA that scales a sample workload when the average request rate exceeds a threshold.
How it works
Kubernetes supports two dimensions of automatic scaling:
- Cluster Autoscaler (CA) -- adds or removes nodes based on pending pod scheduling demands.
- Horizontal Pod Autoscaler (HPA) -- adjusts the replica count of a Deployment or StatefulSet.
The Kubernetes aggregation layer allows third-party components to extend the Kubernetes API by registering as API add-ons. Such an add-on can implement the Custom Metrics API or the External Metrics API, giving the HPA access to arbitrary metrics. The HPA retrieves metrics from three Kubernetes APIs:
| API | Purpose | Example metrics |
|---|---|---|
| Resource Metrics API (`metrics.k8s.io`) | Core resource metrics | CPU, memory |
| Custom Metrics API (`custom.metrics.k8s.io`) | Application-specific metrics tied to Kubernetes objects | Pods, Ingresses |
| External Metrics API (`external.metrics.k8s.io`) | Metrics from systems outside the cluster | Prometheus queries, cloud monitoring |
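To confirm which of these APIs are currently served in your cluster, you can list the registered APIService objects. This optional check is a sketch; the exact results depend on which adapters are installed:

```shell
# List the APIService registrations behind the metrics APIs.
# metrics.k8s.io is typically served by metrics-server; the external
# metrics API appears only after an adapter registers it (Step 2).
kubectl get apiservices | grep -E "metrics\.k8s\.io"
```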
ASM metrics originate from Prometheus, which sits outside the Kubernetes object model. This makes the External Metrics API the right choice. The kube-metrics-adapter bridges Prometheus and the External Metrics API so the HPA can query ASM metrics directly.
Available ASM metrics for autoscaling
ASM generates Istio standard metrics for HTTP, HTTP/2, and gRPC traffic. The most commonly used metrics for autoscaling are:
| Metric | Type | Use case | Example PromQL |
|---|---|---|---|
| `istio_requests_total` | Counter | Scale on request rate | `sum(rate(istio_requests_total{destination_workload="<workload>"}[1m]))` |
| `istio_request_duration_milliseconds` | Distribution | Scale on P99 latency | `histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="<workload>"}[1m])) by (le))` |
| `istio_requests_total` (filtered) | Counter | Scale on error rate | `sum(rate(istio_requests_total{destination_workload="<workload>", response_code=~"5.."}[1m]))` |
Key labels used in these queries:
- `destination_workload` -- the name of the target workload
- `destination_workload_namespace` -- the namespace of the target workload
- `reporter` -- set to `destination` for server-side reporting, `source` for client-side
This tutorial uses istio_requests_total to scale based on request rate. Replace the PromQL query with one of the alternatives above to scale on latency or error rate instead.
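Before wiring a query into the HPA, you can run it directly against the Prometheus HTTP API to confirm that it returns data. A minimal sketch, assuming the Prometheus service URL used in Step 2 and the podinfo workload deployed in Step 3:

```shell
# Run the request-rate query against the Prometheus HTTP API from a
# temporary pod. Adjust the Prometheus URL for your namespace/service.
kubectl run promql-check --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -s 'http://prometheus.istio-system.svc:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(istio_requests_total{destination_workload="podinfo"}[1m]))'
```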
Prerequisites
Before you begin, make sure that you have:
- An ACK cluster or ACS cluster
- An ASM instance
- A Prometheus and Grafana instance deployed in the cluster
- Prometheus integrated for mesh monitoring
Step 1: Enable Prometheus metric collection
Enable ASM to export metrics to your Prometheus instance. For detailed instructions, see Collect monitoring metrics in Managed Service for Prometheus.
Step 2: Deploy the custom metrics adapter
Install kube-metrics-adapter, which registers itself as a provider for the External Metrics API and translates HPA metric requests into Prometheus queries.
Install the adapter with Helm. Replace `http://prometheus.istio-system.svc:9090` with the URL of your Prometheus server if it runs in a different namespace or uses a different service name.

```shell
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090
```

Verify the installation.
Confirm that the `autoscaling/v2` API is available.

```shell
kubectl api-versions | grep "autoscaling/v2"
```

Expected output:

```
autoscaling/v2
```

Check that the adapter pod is running.

```shell
kubectl get po -n kube-system | grep metrics-adapter
```

Expected output:

```
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0          19s
```

Confirm that the External Metrics API endpoint is registered.

```shell
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
```

The `resources` array is empty because no HPA has registered a metric query yet. It populates after you deploy an HPA in Step 4.
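If the endpoint does not appear, the adapter logs usually explain why. The Deployment name below is an assumption derived from the Helm release name `asm-custom-metrics` used above:

```shell
# Inspect recent adapter logs for registration or Prometheus errors.
kubectl -n kube-system logs deploy/asm-custom-metrics-kube-metrics-adapter --tail=50
```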
Step 3: Deploy the sample application
Create the namespace
Create a namespace called test and enable automatic sidecar injection. For more information, see Manage namespaces and quotas and Enable automatic injection.
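For reference, a minimal kubectl sketch, assuming your ASM instance honors the standard `istio-injection` namespace label (the ASM console provides an equivalent switch):

```shell
# Create the test namespace and opt it in to automatic sidecar injection.
kubectl create namespace test
kubectl label namespace test istio-injection=enabled
```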
Deploy the podinfo application
Create a file named `podinfo.yaml` with the following content.
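The manifest below is a minimal sketch, assuming the public `stefanprodan/podinfo` image. The Deployment name, the `app=podinfo` label, and port 9898 must match the HPA in Step 4 and the test URL `http://podinfo.test:9898`:

```yaml
# Sketch of the sample workload; adjust the image tag as needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  labels:
    app: podinfo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo:6.0.0  # assumed image; any podinfo release works
          ports:
            - name: http
              containerPort: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  labels:
    app: podinfo
spec:
  selector:
    app: podinfo
  ports:
    - name: http
      port: 9898
      targetPort: 9898
```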
Apply the manifest.

```shell
kubectl apply -n test -f podinfo.yaml
```
Deploy the load tester
Create a file named `loadtester.yaml` with the following content.
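A minimal sketch of the load generator, assuming the Flagger loadtester image, which bundles the `hey` load-testing tool used in the commands below; the image and tag are assumptions, and any image that provides `hey` works. The `app=loadtester` label must match the pod lookup used later:

```yaml
# Sketch of the load generator; swap in any image that ships `hey`.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
  labels:
    app: loadtester
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester
    spec:
      containers:
        - name: loadtester
          image: ghcr.io/fluxcd/flagger-loadtester:0.24.0  # assumed tag
          ports:
            - name: http
              containerPort: 8080
```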
Apply the manifest.

```shell
kubectl apply -n test -f loadtester.yaml
```
Verify the deployment
Check that both pods are running with sidecar injection (2/2 containers).

```shell
kubectl get pod -n test
```

Expected output:

```
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
```

Send a test request from the load tester to confirm connectivity. A successful response confirms that the sample application is reachable through the mesh.

```shell
export loadtester=$(kubectl -n test get pod -l "app=loadtester" \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- \
  hey -z 5s -c 10 -q 2 http://podinfo.test:9898
```
Step 4: Configure the HPA with ASM metrics
Define an HPA that scales the podinfo Deployment based on the request rate. The configuration ties three pieces together:
| What you configure | Where it goes | Purpose |
|---|---|---|
| PromQL query | HPA annotation | Defines which Prometheus metric to query |
| Metric name + label | spec.metrics[].external | Lookup key the HPA uses to reference the annotation |
| Target threshold | spec.metrics[].external.target | Request rate at which scale-out begins |
This example uses HPA API version `autoscaling/v2`, which requires Kubernetes 1.23 or later. The `autoscaling/v2beta2` version was removed in Kubernetes 1.26.
Create a file named `hpa.yaml` with the following content.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    # Annotation format: metric-config.external.prometheus-query.prometheus/<query-name>
    # <query-name> must match the query-name label in spec.metrics below.
    # The value is the PromQL query that kube-metrics-adapter sends to Prometheus.
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="podinfo",
            destination_workload_namespace="test",
            reporter="destination"
          }[1m]
        )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External  # Use External because Prometheus metrics are not Kubernetes objects
      external:
        metric:
          name: prometheus-query  # Fixed metric name for kube-metrics-adapter
          selector:
            matchLabels:
              query-name: processed-requests-per-second  # Must match the annotation key above
        target:
          type: AverageValue
          averageValue: "10"  # Scale out when the average request rate exceeds 10 req/s per pod
```

Key fields explained:

| Field | Description |
|---|---|
| `metric-config.external.prometheus-query.prometheus/processed-requests-per-second` | Annotation that embeds the PromQL query. The suffix (`processed-requests-per-second`) is a user-defined name that links the annotation to the HPA metric selector. |
| `metrics[].external.metric.name` | Must be `prometheus-query`, the fixed identifier that kube-metrics-adapter recognizes. |
| `metrics[].external.metric.selector.matchLabels.query-name` | Must match the suffix of the annotation key (`processed-requests-per-second`). |
| `target.averageValue` | The threshold per pod. The HPA divides the total query result by the current replica count and compares it to this value. |

Apply the HPA.
```shell
kubectl apply -f hpa.yaml
```

Verify that the External Metrics API now lists the registered metric.

```shell
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
```

The `prometheus-query` resource confirms that the HPA registered its metric with the adapter.
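You can also read the live metric value that the HPA sees. A sketch, using the `test` namespace and the `query-name` label defined in the HPA above:

```shell
# Query the External Metrics API directly for the current value.
# %3D is the URL-encoded "=" in the label selector.
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/prometheus-query?labelSelector=query-name%3Dprocessed-requests-per-second" | jq .
```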
Verify automatic scaling
Generate sustained load from the load tester pod. This sends 50 requests per second (10 concurrent workers at 5 QPS each) for 5 minutes.
```shell
# Get the load tester pod name (skip this if $loadtester is already set from Step 3)
export loadtester=$(kubectl -n test get pod -l "app=loadtester" \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- \
  hey -z 5m -c 10 -q 5 http://podinfo.test:9898
```

Monitor the HPA in a separate terminal.
Note: The HPA syncs metrics every 30 seconds. A scaling decision takes effect only if no scaling event occurred in the previous 3 to 5 minutes. This cooldown prevents rapid, conflicting scaling decisions and gives the Cluster Autoscaler time to provision new nodes if needed.
```shell
watch kubectl -n test get hpa/podinfo
```

Expected output (after approximately one minute):

```
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
```

Observe scale-down after the load test ends. When the load test completes, the request rate drops to zero. The HPA gradually scales down the Deployment until the replica count returns to 1.
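To inspect the individual scaling events and the metric value behind each decision, describe the HPA:

```shell
# Show current metrics, conditions, and recent scaling events.
kubectl -n test describe hpa podinfo
```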
What's next
- To scale on latency or error rate instead of request rate, replace the PromQL query in the HPA annotation with one of the alternatives from the Available ASM metrics for autoscaling table.
- To tune scale-up and scale-down behavior (stabilization windows, scaling policies), add a `behavior` field to the HPA spec; a sketch follows this list. See Configurable scaling behavior in the Kubernetes documentation.
- For a full list of Istio metrics and their labels, see Istio standard metrics.
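A sketch of such a `behavior` block; the values are illustrative, not recommendations:

```yaml
# Added under spec of the HPA above. The stabilization window makes the
# HPA wait 300 seconds of consistently lower metrics before removing
# replicas, and at most one pod is removed per minute.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 100
        periodSeconds: 60
```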