
Container Service for Kubernetes:Use ASM metrics to automatically scale workloads

Last Updated: Mar 26, 2026

Service Mesh (ASM) collects telemetry data from ACK and ACS clusters non-intrusively and generates service metrics based on the four golden signals: latency, traffic, errors, and saturation. This topic explains how to configure a Horizontal Pod Autoscaler (HPA) that scales workloads based on these ASM metrics, going beyond CPU and memory to scale on real traffic patterns.

How it works

ASM exposes service metrics (such as requests per second) to Prometheus. A custom metrics adapter, kube-metrics-adapter, registers these Prometheus metrics with the Kubernetes aggregation layer, making them available to HPAs through the external metrics API.


The end-to-end flow:

  1. ASM collects request metrics and writes them to Prometheus.

  2. kube-metrics-adapter queries Prometheus and registers the metrics as external metrics in Kubernetes.

  3. The HPA polls the external metrics API every 30 seconds and adjusts the replica count when the metric value crosses the threshold.

For a full list of ASM-generated metrics, see Istio Standard Metrics.
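The scale-out decision in step 3 follows the standard HPA algorithm: for an external metric with an AverageValue target, the desired replica count is ceil(metricTotal / targetAverage). A minimal Python sketch of that decision, omitting the 30-second sync period and scale-down stabilization (the 10% tolerance is the Kubernetes default):

```python
import math

def desired_replicas(current_replicas: int, metric_total: float, target_average: float) -> int:
    """Standard HPA formula for an external metric with an AverageValue target:
    desired = ceil(metric_total / target_average), clamped to min/max replicas elsewhere."""
    current_average = metric_total / current_replicas
    # Within the default 10% tolerance of the target, the HPA does not scale.
    if abs(current_average / target_average - 1.0) <= 0.1:
        return current_replicas
    return math.ceil(metric_total / target_average)

# 50 requests/s across all replicas with a target of 10 requests/s per replica:
print(desired_replicas(1, 50.0, 10.0))  # 5
```

At 5 replicas the average is exactly at the target, so the HPA holds steady; real clusters also apply a stabilization window before scaling back down.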

Prerequisites

Before you begin, ensure that you have:

  • An ACK or ACS cluster that has been added to an ASM instance.

  • A kubectl client that is configured to connect to the cluster.

  • Helm 3 installed on your client machine.

Step 1: Enable Prometheus monitoring for the ASM instance

Follow the instructions in Collect metrics to Managed Service for Prometheus to enable Prometheus scraping for your ASM instance.

Step 2: Deploy the custom metrics adapter

The custom metrics adapter (kube-metrics-adapter) bridges Prometheus metrics and the Kubernetes external metrics API, so HPAs can query ASM metrics directly.

  1. Install kube-metrics-adapter into the kube-system namespace using Helm 3. Set prometheus.url to the in-cluster address of your Prometheus instance. For the chart source, see kube-metrics-adapter.

    Parameter            Description
    asm-custom-metrics   Helm release name
    prometheus.url       In-cluster address of the Prometheus instance that scrapes ASM metrics

    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
      --set prometheus.url=http://prometheus.istio-system.svc:9090
  2. Verify that the adapter is running:

    1. Check that the autoscaling/v2beta API group is registered:

      kubectl api-versions | grep "autoscaling/v2beta"

      Expected output:

      autoscaling/v2beta1
      autoscaling/v2beta2
    2. Check that the adapter pod is running:

      kubectl get po -n kube-system | grep metrics-adapter

      Expected output:

      asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s
    3. Check that the external metrics API is available (no metrics registered yet):

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

      Expected output:

      {
        "kind": "APIResourceList",
        "apiVersion": "v1",
        "groupVersion": "external.metrics.k8s.io/v1beta1",
        "resources": []
      }
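Under the hood, kube-metrics-adapter runs the configured PromQL query against the Prometheus HTTP API (/api/v1/query) and exposes the scalar result through the external metrics API. A sketch of extracting the value from a hypothetical instant-vector response (the sample values are illustrative, not real output):

```python
import json

# Hypothetical response body from GET /api/v1/query?query=sum(rate(istio_requests_total[1m]))
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {}, "value": [1712000000, "8.308"]}
    ]
  }
}
""")

# An instant-vector result holds [<unix timestamp>, "<value as string>"] pairs.
samples = response["data"]["result"]
value = float(samples[0]["value"][1]) if samples else 0.0
print(value)  # 8.308
```

The adapter republishes this number under the metric name prometheus-query, which is why the HPA in Step 4 references that name rather than the raw Prometheus metric.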

Step 3: Deploy a sample application

This step deploys a podinfo application and a load testing service in the test namespace, so you can later trigger and observe auto scaling.

  1. Create the test namespace. See Manage namespaces and resource quotas.

  2. Enable automatic sidecar proxy injection for the test namespace. See Enable automatic sidecar proxy injection.

  3. Deploy the podinfo application. Create a file named podinfo.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      minReadySeconds: 5
      strategy:
        rollingUpdate:
          maxUnavailable: 0
        type: RollingUpdate
      selector:
        matchLabels:
          app: podinfo
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
          labels:
            app: podinfo
        spec:
          containers:
          - name: podinfod
            image: stefanprodan/podinfo:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 9898
              name: http
              protocol: TCP
            command:
            - ./podinfo
            - --port=9898
            - --level=info
            livenessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/healthz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            readinessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/readyz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 2000m
                memory: 512Mi
              requests:
                cpu: 100m
                memory: 64Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      type: ClusterIP
      ports:
        - name: http
          port: 9898
          targetPort: 9898
          protocol: TCP
      selector:
        app: podinfo

    Apply the manifest:

    kubectl apply -n test -f podinfo.yaml
  4. Deploy the load testing service. Create a file named loadtester.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      selector:
        matchLabels:
          app: loadtester
      template:
        metadata:
          labels:
            app: loadtester
          annotations:
            prometheus.io/scrape: "true"
        spec:
          containers:
            - name: loadtester
              image: weaveworks/flagger-loadtester:0.18.0
              imagePullPolicy: IfNotPresent
              ports:
                - name: http
                  containerPort: 8080
              command:
                - ./loadtester
                - -port=8080
                - -log-level=info
                - -timeout=1h
              livenessProbe:
                exec:
                  command:
                    - wget
                    - --quiet
                    - --tries=1
                    - --timeout=4
                    - --spider
                    - http://localhost:8080/healthz
                timeoutSeconds: 5
              readinessProbe:
                exec:
                  command:
                    - wget
                    - --quiet
                    - --tries=1
                    - --timeout=4
                    - --spider
                    - http://localhost:8080/healthz
                timeoutSeconds: 5
              resources:
                limits:
                  memory: "512Mi"
                  cpu: "1000m"
                requests:
                  memory: "32Mi"
                  cpu: "10m"
              securityContext:
                readOnlyRootFilesystem: true
                runAsUser: 10001
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      type: ClusterIP
      selector:
        app: loadtester
      ports:
        - name: http
          port: 80
          protocol: TCP
          targetPort: http

    Apply the manifest:

    kubectl apply -n test -f loadtester.yaml
  5. Verify that both workloads are running:

    kubectl get pod -n test

    Expected output (both pods show 2/2 Running, indicating the app container and the Istio sidecar are both ready):

    NAME                          READY   STATUS    RESTARTS   AGE
    loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
    podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
  6. Send a short burst of traffic to confirm the setup is working end-to-end:

    export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
    kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

    A successful response from hey confirms that the podinfo service is reachable through the mesh.
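The hey flags translate into a predictable request volume: -z sets the duration, -c the number of concurrent workers, and -q the per-worker rate cap. A quick sanity check of the expected totals, assuming hey sustains the full rate:

```python
def total_requests(duration_s: float, workers: int, qps_per_worker: float) -> float:
    """Approximate number of requests hey sends: duration * workers * per-worker QPS."""
    return duration_s * workers * qps_per_worker

# Burst above: -z 5s -c 10 -q 2 -> about 100 requests total
print(total_requests(5, 10, 2))
```

By the same arithmetic, the sustained test used later (-z 5m -c 10 -q 5) works out to roughly 50 requests per second in total.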

Step 4: Configure an HPA using ASM metrics

Define an HPA that scales the podinfo deployment based on the number of incoming requests per second, as measured by the istio_requests_total metric in Prometheus.

The HPA uses two Kubernetes constructs together:

  • An annotation that embeds the PromQL query and gives it a name (processed-requests-per-second).

  • A metric reference in spec.metrics that points to the named query and sets the scale threshold.

Create a file named hpa.yaml with the following content, then apply it:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    # The annotation key format is:
    # metric-config.external.prometheus-query.prometheus/<query-name>
    # The query-name must match the value of the matchLabels selector below.
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
          rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
          )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second  # matches the annotation key suffix above
        target:
          type: AverageValue
          averageValue: "10"  # scale out when average RPS per replica exceeds 10

Apply the manifest:

kubectl apply -f hpa.yaml

Key fields explained:

  • metric-config.external.prometheus-query.prometheus/<query-name> (annotation): defines the PromQL query that kube-metrics-adapter runs against Prometheus. The <query-name> suffix must match the query-name label in spec.metrics.

  • query-name label (processed-requests-per-second): links the annotation (the PromQL query) to the metric reference in spec.metrics.

  • averageValue ("10"): the HPA scales out when the average number of requests per second per replica exceeds 10.

  • minReplicas / maxReplicas (1 / 10): lower and upper bounds on the replica count.
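The PromQL expression in the annotation turns the istio_requests_total counter into a per-second rate: rate() takes the increase of the counter over the [1m] window and divides by the window length, and sum() then aggregates across pods. A sketch of that computation on hypothetical counter samples:

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second rate over a window of (timestamp, cumulative counter) samples,
    approximated as (last - first) / elapsed, as PromQL rate() does for
    monotonically increasing counters (counter resets are ignored here)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical istio_requests_total samples over a 60-second window:
# the counter grows from 1200 to 1800 requests -> 10 requests/second.
print(counter_rate([(0, 1200), (30, 1500), (60, 1800)]))  # 10.0
```

Because the query uses reporter="destination", only metrics reported by the podinfo sidecar are counted, so retries and client-side failures do not inflate the rate.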

After applying the HPA, verify that the external metric is now registered:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

The prometheus-query entry in resources confirms that kube-metrics-adapter has registered the metric and the HPA is active.

Verify auto scaling

  1. Open a terminal and start a sustained load against podinfo (5 minutes, 10 concurrent workers, 5 requests/second each, about 50 requests/second in total):

    kubectl -n test exec -it ${loadtester} -c loadtester -- sh
    ~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
  2. In a separate terminal, watch the HPA scale up:

    watch kubectl -n test get hpa/podinfo

    Note: Metrics are synchronized every 30 seconds by default, and the HPA enforces a cooldown of 3–5 minutes between scale events to prevent thrashing.

    As load increases above the threshold, the HPA scales out:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

    The value 8308m is Kubernetes milli-unit notation for 8.308 requests per second per replica. Because this average is below the target of 10, the HPA stops scaling out and stabilizes at 6 replicas. If the load were higher, the HPA would continue scaling up toward the 10-replica maximum.

  3. After the load test finishes, the request rate drops to zero. The HPA begins scaling down, and within a few minutes the replica count returns to 1.
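The milli-unit notation in the HPA TARGETS column follows the general Kubernetes quantity convention: a trailing m divides the number by 1000. A small helper to decode it (a hypothetical illustration, not part of any kubectl API):

```python
def parse_quantity(s: str) -> float:
    """Decode a Kubernetes quantity with an optional milli suffix: '8308m' -> 8.308."""
    if s.endswith("m"):
        return float(s[:-1]) / 1000.0
    return float(s)

# '8308m/10 (avg)' from the watch output above: 8.308 RPS per replica vs a target of 10.
print(parse_quantity("8308m"))  # 8.308
```

Reading the TARGETS column this way makes it easy to verify the HPA math by hand: 6 replicas at 8.308 RPS each is roughly the 50 RPS that the load test generates.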