
Alibaba Cloud Service Mesh:Autoscale pods based on Mixerless Telemetry metrics

Last Updated: Mar 11, 2026

When application traffic fluctuates, manually adjusting pod replicas is slow and error-prone. Mixerless Telemetry in Alibaba Cloud Service Mesh (ASM) collects telemetry data from Istio sidecar proxies without code changes. By feeding these metrics into a Horizontal Pod Autoscaler (HPA), pods scale automatically based on real-time traffic signals -- request rate, average latency, or P95 latency.

How it works

Autoscaling with Mixerless Telemetry follows this data flow:

Istio sidecar proxy (collects metrics) --> Prometheus (stores & queries) --> Metrics adapter (bridges the Kubernetes External Metrics API) --> HPA (evaluates thresholds) --> Pods scale out/in
  1. Istio sidecar proxies generate telemetry metrics (istio_requests_total, istio_request_duration_milliseconds_*) for every request.

  2. Prometheus scrapes and stores these metrics.

  3. A metrics adapter exposes Prometheus query results through the Kubernetes External Metrics API.

  4. The HPA evaluates the metrics against your defined thresholds and adjusts the pod replica count.
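
The flow above ends in a simple calculation. Conceptually, for an External metric with an AverageValue target, the HPA computes the desired replica count as ceil(totalMetric / targetAverageValue), clamped to the configured bounds. A minimal Python sketch of that decision (illustration only; it ignores the HPA's tolerance and stabilization windows):

```python
import math

def desired_replicas(total_metric: float, target_average: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Sketch of the HPA decision for an External metric with an
    AverageValue target: ceil(totalMetric / targetAverage), clamped
    to [minReplicas, maxReplicas]."""
    desired = math.ceil(total_metric / target_average)
    return max(min_replicas, min(max_replicas, desired))

# 23 req/s observed across the namespace, 10 req/s-per-pod target:
print(desired_replicas(23.0, 10.0, 1, 5))  # 3
```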

Prerequisites

Before you begin, ensure that you have:

- An ASM instance with Mixerless Telemetry enabled, and a Kubernetes cluster added to the instance.
- Prometheus deployed in the cluster and reachable at http://prometheus.istio-system.svc:9090.
- The kubeconfig file of the cluster, with kubectl and Helm installed locally.
- A namespace named test whose workloads have Istio sidecar injection enabled.

Deploy the metrics adapter and load tester

Before creating HPAs, deploy a metrics adapter to bridge Prometheus and the Kubernetes Metrics API, and a load tester to simulate traffic for validation.

Deploy the metrics adapter

Run the following Helm command to install the metrics adapter in the kube-system namespace:

helm --kubeconfig <kubeconfig-path> -n kube-system install asm-custom-metrics \
  $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090

Replace <kubeconfig-path> with the path to your kubeconfig file. The KUBE_METRICS_ADAPTER_SRC variable must point to your local copy of the kube-metrics-adapter source repository, which contains the Helm chart used above.

Note

For the complete deployment script, see demo_hpa.sh on GitHub.

Verify the metrics adapter

  1. Check that the metrics adapter pod is running:

       kubectl --kubeconfig <kubeconfig-path> get po -n kube-system | grep metrics-adapter

     Expected output:

       asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv   1/1     Running     0          30s
  2. Confirm that the autoscaling/v2beta2 API is available:

       kubectl --kubeconfig <kubeconfig-path> api-versions | grep "autoscaling/v2beta"

     Expected output:

       autoscaling/v2beta1
       autoscaling/v2beta2
  3. Verify that the External Metrics API endpoint is accessible:

       kubectl --kubeconfig <kubeconfig-path> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

     Expected output:

       {
         "kind": "APIResourceList",
         "apiVersion": "v1",
         "groupVersion": "external.metrics.k8s.io/v1beta1",
         "resources": []
       }

     The resources array is empty because no HPA has registered external metrics yet.

Deploy the Flagger load tester

  1. Download the Flagger YAML files from the Flagger GitHub repository.

  2. Deploy the load tester in the test namespace:

       kubectl --kubeconfig <kubeconfig-path> apply -f <flagger-path>/kustomize/tester/deployment.yaml -n test
       kubectl --kubeconfig <kubeconfig-path> apply -f <flagger-path>/kustomize/tester/service.yaml -n test

     Replace <flagger-path> with the local path to the downloaded Flagger repository.

Create HPAs based on Istio metrics

This section shows three HPA configurations, each targeting a different Istio metric. Choose the metric that best matches your scaling needs, or deploy multiple HPAs for the same workload.

Metrics reference

The following table summarizes the Istio metrics available for HPA scaling:

Metric | Type | Typical use
istio_requests_total | COUNTER | Scale based on requests per second
istio_request_duration_milliseconds_sum / _count | COUNTER | Used together to compute average latency; scale based on average response time
istio_request_duration_milliseconds_bucket | HISTOGRAM | Scale based on percentile latency (P95, P99)

Common PromQL labels:

Label | Description
destination_workload_namespace | Namespace of the destination workload
destination_canonical_service | Canonical name of the destination service
reporter | "destination" for server-side metrics, "source" for client-side metrics
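
Combining these labels narrows a query to a single workload. For example, this hypothetical variant of the Option 1 request-rate query below counts only server-side requests to the podinfo service:

```promql
sum(rate(istio_requests_total{destination_workload_namespace="test",destination_canonical_service="podinfo",reporter="destination"}[1m]))
```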

Option 1: Scale based on request count

This HPA scales the podinfo deployment when the per-second request rate exceeds a threshold.

  1. Create a file named requests_total_hpa.yaml:

       apiVersion: autoscaling/v2beta2
       kind: HorizontalPodAutoscaler
       metadata:
         name: podinfo-total
         namespace: test
         annotations:
           metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
             sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
       spec:
         maxReplicas: 5
         minReplicas: 1
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: podinfo
         metrics:
           - type: External
             external:
               metric:
                 name: prometheus-query
                 selector:
                   matchLabels:
                     query-name: processed-requests-per-second
               target:
                 type: AverageValue
                 averageValue: "10"

     Key fields:

     Field | Description
     annotations | Defines the PromQL query. This query computes the per-second request rate for all services in the test namespace, using server-side (reporter="destination") metrics over a 1-minute window.
     averageValue: "10" | Triggers a scale-out when the average request rate reaches or exceeds 10 requests per second per pod.
     maxReplicas / minReplicas | Pod count stays between 1 and 5.
  2. Deploy the HPA:

       kubectl --kubeconfig <kubeconfig-path> apply -f resources_hpa/requests_total_hpa.yaml
  3. Verify that the HPA registered its external metric:

       kubectl --kubeconfig <kubeconfig-path> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

     Expected output:

       {
         "kind": "APIResourceList",
         "apiVersion": "v1",
         "groupVersion": "external.metrics.k8s.io/v1beta1",
         "resources": [
           {
             "name": "prometheus-query",
             "singularName": "",
             "namespaced": true,
             "kind": "ExternalMetricValueList",
             "verbs": [
               "get"
             ]
           }
         ]
       }

     The prometheus-query resource now appears in the list, confirming that the HPA is connected to the metrics adapter.

Option 2: Scale based on average latency

This HPA scales the podinfo deployment when the average request latency exceeds a threshold.

Create a file named podinfo-latency-avg.yaml, then deploy it with kubectl apply:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-latency-avg
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/latency-average: |
      sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
      /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: latency-average
        target:
          type: AverageValue
          averageValue: "0.005"

Key fields:

Field | Description
annotations | The PromQL query divides the total request duration by the request count to compute the average latency. Because the istio_request_duration_milliseconds metrics are recorded in milliseconds, the result is in milliseconds.
averageValue: "0.005" | Triggers a scale-out when the average latency reaches or exceeds 0.005 (compared in the same unit as the query result, milliseconds).
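
As a toy illustration of the arithmetic in the latency-average query (made-up rates, not real telemetry): dividing the per-second rate of the _sum series (milliseconds of request time accumulated per second) by the rate of the _count series (requests per second) yields the mean latency in milliseconds:

```python
# sum(rate(istio_request_duration_milliseconds_sum[1m]))   -> ms of latency per second
# sum(rate(istio_request_duration_milliseconds_count[1m])) -> requests per second
duration_ms_rate = 52.0   # hypothetical value of the _sum rate
request_rate = 10.0       # hypothetical value of the _count rate

avg_latency_ms = duration_ms_rate / request_rate
print(avg_latency_ms)  # 5.2
```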

Option 3: Scale based on P95 latency

This HPA scales the podinfo deployment when the 95th percentile (P95) request latency exceeds a threshold.

Create a file named podinfo-p95.yaml, then deploy it with kubectl apply:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-p95
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/p95-latency: |
      histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: p95-latency
        target:
          type: AverageValue
          averageValue: "4"

Key fields:

Field | Description
annotations | The PromQL query uses histogram_quantile(0.95, ...) to compute the P95 latency from the histogram buckets, applying irate over a 5-minute lookback window. Because irate uses only the two most recent samples in that window, it reacts quickly to latency changes.
averageValue: "4" | Triggers a scale-out when the P95 latency reaches or exceeds 4 ms.
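
To make the bucket math concrete, here is a rough Python sketch of how histogram_quantile estimates a quantile from cumulative le buckets (toy rates and simplified edge-case handling compared with Prometheus):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative ("le") histogram buckets,
    given as (upper_bound_ms, cumulative_rate) pairs sorted by bound and
    ending with float("inf"). Linearly interpolates inside the bucket
    that contains the q-th observation, as Prometheus does."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # fall back to the last finite bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Toy cumulative request rates per latency bucket (ms):
buckets = [(1, 40.0), (5, 90.0), (10, 99.0), (float("inf"), 100.0)]
print(round(histogram_quantile(0.95, buckets), 2))  # 7.78
```

Here 95% of requests fall at or below a point inside the 5-10 ms bucket, so the estimate lands between those bounds.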

Verify autoscaling behavior

These steps validate the request-count-based HPA (podinfo-total). To test other HPAs, adjust the load parameters to trigger the corresponding metric threshold.

  1. Send sustained traffic (about 20 requests per second in total) for 5 minutes:

       alias k="kubectl --kubeconfig $USER_CONFIG"
       loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
       k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898

     Flag | Description
     -z 5m | Send requests for 5 minutes
     -c 2 | Use 2 concurrent connections
     -q 10 | Limit each connection to 10 requests per second
  2. In a separate terminal, watch the HPA status:

       watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

     Expected output:

       NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
       podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s

     The REPLICAS column shows 2: the HPA scaled the deployment from 1 to 2 pods in response to the traffic load.
  3. Increase the traffic to 15 requests per second per connection (about 30 req/s in total):

       alias k="kubectl --kubeconfig $USER_CONFIG"
       loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
       k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898
  4. Watch the HPA status again:

       watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

     Expected output:

       NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
       podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s

     The REPLICAS column now shows 3. As traffic increases, the HPA adds pods up to the maxReplicas limit (5). When traffic drops, the HPA scales the deployment back down to minReplicas (1).
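
The TARGETS column uses Kubernetes milli notation: 10056m means 10.056. A small Python sketch of how to read that value, and why a near-target reading does not trigger further scaling (the HPA controller's default tolerance is 10%, a controller-manager setting):

```python
def parse_milli(value: str) -> float:
    """Parse kubectl's milli-suffixed quantities, e.g. "10056m" -> 10.056.
    (Sketch only: real Kubernetes quantities support more suffixes.)"""
    return float(value[:-1]) / 1000.0 if value.endswith("m") else float(value)

current_avg = parse_milli("10056m")   # observed average per pod
ratio = current_avg / 10.0            # vs. the 10 req/s target
print(round(ratio, 4))                # 1.0056
# |ratio - 1| is below the default 10% tolerance, so the HPA
# leaves the replica count unchanged.
```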

What's next