When application traffic fluctuates, manually adjusting pod replicas is slow and error-prone. Mixerless Telemetry in Alibaba Cloud Service Mesh (ASM) collects telemetry data from Istio sidecar proxies without code changes. By feeding these metrics into a Horizontal Pod Autoscaler (HPA), pods scale automatically based on real-time traffic signals -- request rate, average latency, or P95 latency.
How it works
Autoscaling with Mixerless Telemetry follows this data flow:
Istio Sidecar Proxy --> Prometheus --> Metrics Adapter --> HPA --> Pod Scale Out/In

1. Istio sidecar proxies generate telemetry metrics (istio_requests_total, istio_request_duration_milliseconds_*) for every request.
2. Prometheus scrapes and stores these metrics.
3. A metrics adapter exposes Prometheus query results through the Kubernetes External Metrics API.
4. The HPA evaluates the metrics against your defined thresholds and adjusts the pod replica count.
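In the last step, the HPA computes the desired replica count with the standard Kubernetes formula desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured min/max. The following Python sketch illustrates that formula; the function name and sample values are ours, not part of the product:

```python
import math

def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_r: int = 1, max_r: int = 5) -> int:
    """Standard Kubernetes HPA scaling formula, clamped to [min_r, max_r]."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_r, min(max_r, desired))

# 1 pod averaging 20 req/s against a 10 req/s target -> scale out to 2 pods
print(desired_replicas(1, 20.0, 10.0))
# 2 pods averaging 4 req/s each against a 10 req/s target -> scale back to 1
print(desired_replicas(2, 4.0, 10.0))
```

With an AverageValue target (used throughout this topic), currentMetricValue is the metric total divided by the current pod count, which is why adding pods lowers the observed value back toward the target.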
Prerequisites
Before you begin, ensure that you have:
- Application metrics collected by Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.
- kubectl configured to connect to your cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Deploy the metrics adapter and load tester
Before creating HPAs, deploy a metrics adapter to bridge Prometheus and the Kubernetes Metrics API, and a load tester to simulate traffic for validation.
Deploy the metrics adapter
Run the following Helm command to install the metrics adapter in the kube-system namespace:
```shell
helm --kubeconfig <kubeconfig-path> -n kube-system install asm-custom-metrics \
  $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090
```

Replace <kubeconfig-path> with the path to your kubeconfig file.
For the complete deployment script, see demo_hpa.sh on GitHub.
Verify the metrics adapter
Check that the metrics adapter pod is running:

```shell
kubectl --kubeconfig <kubeconfig-path> get po -n kube-system | grep metrics-adapter
```

Expected output:

```
asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv   1/1   Running   0   30s
```

Confirm that the autoscaling/v2beta2 API is available:

```shell
kubectl --kubeconfig <kubeconfig-path> api-versions | grep "autoscaling/v2beta"
```

Expected output:

```
autoscaling/v2beta1
autoscaling/v2beta2
```

Verify that the External Metrics API endpoint is accessible:

```shell
kubectl --kubeconfig <kubeconfig-path> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
```

The resources array is empty because no HPA has registered external metrics yet.
Deploy the Flagger load tester
Download the Flagger YAML files from the Flagger GitHub repository.
Deploy the load tester in the test namespace. Replace <flagger-path> with the local path to the downloaded Flagger repository:

```shell
kubectl --kubeconfig <kubeconfig-path> apply -f <flagger-path>/kustomize/tester/deployment.yaml -n test
kubectl --kubeconfig <kubeconfig-path> apply -f <flagger-path>/kustomize/tester/service.yaml -n test
```
Create HPAs based on Istio metrics
This section shows three HPA configurations, each targeting a different Istio metric. Choose the metric that best matches your scaling needs, or deploy multiple HPAs for the same workload.
Metrics reference
The following table summarizes the Istio metrics available for HPA scaling:
| Metric | Type | Typical use |
|---|---|---|
| istio_requests_total | COUNTER | Scale based on requests per second |
| istio_request_duration_milliseconds_sum / _count | COUNTER | Scale based on average response time (the two counters are divided to compute average latency) |
| istio_request_duration_milliseconds_bucket | HISTOGRAM | Scale based on percentile latency (P95, P99) |
Common PromQL labels:
| Label | Description |
|---|---|
| destination_workload_namespace | Namespace of the destination workload |
| destination_canonical_service | Canonical name of the destination service |
| reporter | destination for server-side metrics, source for client-side metrics |
Option 1: Scale based on request count
This HPA scales the podinfo deployment when the per-second request rate exceeds a threshold.
Create a file named requests_total_hpa.yaml:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"
```

Key fields:

| Field | Description |
|---|---|
| annotations | Defines the PromQL query. This query computes the per-second request rate for all services in the test namespace, using server-side (reporter="destination") metrics over a 1-minute window. |
| averageValue: "10" | Triggers a scale-out when the average request rate reaches or exceeds 10 requests per second. |
| maxReplicas / minReplicas | Pod count stays between 1 and 5. |

Deploy the HPA:

```shell
kubectl --kubeconfig <kubeconfig-path> apply -f resources_hpa/requests_total_hpa.yaml
```

Verify that the HPA registered its external metric:

```shell
kubectl --kubeconfig <kubeconfig-path> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [ "get" ]
    }
  ]
}
```

The prometheus-query resource now appears in the list, confirming the HPA is connected to the metrics adapter.
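The query above relies on PromQL's rate(), which converts a monotonically increasing counter such as istio_requests_total into a per-second rate over the lookback window. A simplified Python sketch of that calculation (sample timestamps and counter values are invented; real rate() additionally handles counter resets and extrapolates to the window boundaries):

```python
def per_second_rate(samples):
    """Approximate PromQL rate(): (last - first) / elapsed seconds.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    t0, v0 = samples[0]
    t1, v1 = samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter grew from 1200 to 1800 requests over a 60-second window -> 10 req/s
window = [(0, 1200), (30, 1500), (60, 1800)]
print(per_second_rate(window))
```

sum(...) in the annotation then adds these per-service rates into a single namespace-wide requests-per-second figure that the HPA compares against the target.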
Option 2: Scale based on average latency
This HPA scales the podinfo deployment when the average request latency exceeds a threshold.
Create a file named podinfo-latency-avg.yaml, then deploy it with kubectl apply:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-latency-avg
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/latency-average: |
      sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
      /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: latency-average
        target:
          type: AverageValue
          averageValue: "0.005"
```

Key fields:
| Field | Description |
|---|---|
| annotations | The PromQL query divides the rate of total request duration by the rate of request count to compute the average latency. Because the source metric is istio_request_duration_milliseconds, the result is in milliseconds. |
| averageValue: "0.005" | Triggers a scale-out when the average latency reaches or exceeds 0.005 ms. |
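The sum/count division in this query is the standard way to derive an average from a Prometheus counter pair: the rate of accumulated duration divided by the rate of requests yields the mean latency in the metric's native unit (milliseconds here). A toy illustration with invented values:

```python
def average_latency_ms(duration_sum_rate: float, request_count_rate: float) -> float:
    """Average latency = rate(duration_sum) / rate(request_count).

    duration_sum_rate: ms of total request time accumulated per second
    request_count_rate: requests completed per second
    """
    return duration_sum_rate / request_count_rate

# 500 ms of request time accumulated per second across 100 req/s -> 5 ms average
print(average_latency_ms(500.0, 100.0))
```

Dividing the two rates (rather than the raw counters) keeps the average scoped to the 1-minute window instead of the pods' whole lifetime.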
Option 3: Scale based on P95 latency
This HPA scales the podinfo deployment when the 95th percentile (P95) request latency exceeds a threshold.
Create a file named podinfo-p95.yaml, then deploy it with kubectl apply:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-p95
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/p95-latency: |
      histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: p95-latency
        target:
          type: AverageValue
          averageValue: "4"
```

Key fields:
| Field | Description |
|---|---|
| annotations | The PromQL query uses histogram_quantile(0.95, ...) to compute P95 latency from the histogram buckets. It uses irate over a 5-minute lookback window; irate considers only the last two samples in the window, so it reacts quickly to recent changes in latency. |
| averageValue: "4" | Triggers a scale-out when the P95 latency reaches or exceeds 4 ms. |
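histogram_quantile estimates a percentile from cumulative bucket counts by locating the bucket that contains the target rank and interpolating linearly between its bounds. A simplified Python sketch (bucket bounds and counts are invented; the real PromQL function works on rate-adjusted counts and has special handling for the lowest and highest buckets):

```python
def histogram_quantile(q, buckets):
    """Simplified PromQL histogram_quantile.

    buckets: list of (upper_bound_ms, cumulative_count), sorted by bound.
    Returns the estimated q-quantile via linear interpolation inside
    the bucket where the q*total rank falls.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # interpolate between the previous and current bucket bounds
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts for le="1", "2.5", "5", "10" ms buckets
buckets = [(1, 50), (2.5, 80), (5, 95), (10, 100)]
print(histogram_quantile(0.95, buckets))
```

Because only bucket bounds are known, the result is an estimate: the accuracy of the P95 value depends on how finely the histogram buckets are spaced around the latency range you care about.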
Verify autoscaling behavior
These steps validate the request-count-based HPA (podinfo-total). To test other HPAs, adjust the load parameters to trigger the corresponding metric threshold.
Send sustained traffic for 5 minutes at 10 requests per second per connection:

```shell
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898
```

| Flag | Description |
|---|---|
| -z 5m | Send requests for 5 minutes |
| -c 2 | Use 2 concurrent connections |
| -q 10 | Limit each connection to 10 requests per second |

In a separate terminal, watch the HPA status:

```shell
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
```

Expected output:

```
NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s
```

The REPLICAS column shows 2 -- the HPA scaled the deployment from 1 to 2 pods in response to the traffic load.

Increase the rate limit to 15 requests per second per connection:

```shell
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898
```

Watch the HPA status again:

```shell
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
```

Expected output:

```
NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s
```

The REPLICAS column now shows 3. As traffic increases, the HPA adds pods up to the maxReplicas limit (5). When traffic drops, the HPA scales the deployment back down to minReplicas (1).
What's next
Use Mixerless Telemetry to observe ASM instances -- Set up Prometheus metric collection for ASM.
Horizontal pod autoscaling -- Learn about advanced HPA features such as scaling behavior policies and stabilization windows.