Container Service for Kubernetes: Configure custom metrics using AHPA for application scaling

Last Updated: May 26, 2026

For workloads such as web services or message processing, scaling on CPU or memory alone often wastes resources or adds latency, because real load correlates with business signals such as queries per second (QPS) and queue depth. To scale on signals that track real traffic, configure AdvancedHorizontalPodAutoscaler (AHPA) to use custom metrics from Managed Service for Prometheus.

How it works

  1. Expose metrics: Applications expose business metrics, such as requests_per_second, in Prometheus format through an HTTP endpoint, such as /metrics.

  2. Scrape metrics: The Managed Service for Prometheus agent in the cluster discovers and periodically scrapes the metric data exposed by the application and stores the data in Managed Service for Prometheus.

  3. Query metrics: The AHPA controller periodically queries the external metrics API to get the custom metric's current value.

  4. Adapt metrics: The request is forwarded to the ack-alibaba-cloud-metrics-adapter, which translates it into a PromQL query and sends it to Managed Service for Prometheus.

  5. Return value: After executing the query, Managed Service for Prometheus returns the result to the adapter.

  6. Report result: The metrics adapter reports the result to the AHPA controller.

  7. Scaling decision: The AHPA controller calculates the desired number of pod replicas from the current metric values and the preset target threshold. It then adjusts the number of replicas for the target Deployment to scale the application in or out.
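
Step 1 of this flow relies on the Prometheus text exposition format. The following shell sketch shows what a scrape of /metrics might return and how the current value could be read from it. The payload is a hypothetical sample (the HELP text and the value 16 are invented for illustration), and metric_value is an illustrative helper, not part of any ACK tooling.

```shell
# Hypothetical sample of what GET /metrics might return.
metrics_payload='# HELP requests_per_second Requests handled per second.
# TYPE requests_per_second gauge
requests_per_second 16'

# Print the value of the first sample whose name matches $1.
# Comment lines start with "#", so field 1 never equals the metric name.
# The second condition also accepts a sample that carries labels,
# for example requests_per_second{pod="x"} 16 (labels without spaces).
metric_value() {
  printf '%s\n' "$metrics_payload" |
    awk -v name="$1" '$1 == name || index($1, name "{") == 1 { print $2; exit }'
}

metric_value requests_per_second
```

In a real cluster, the same check would run against the output of the application's /metrics endpoint instead of an inline string.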

Before you begin

Step 1: Deploy the application and ServiceMonitor

First, deploy a sample application that can expose custom metrics, then configure a ServiceMonitor so that Prometheus can scrape the application's metrics endpoint.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Deployments.

  3. On the Deployments page, click Create from YAML and follow the on-screen instructions to deploy the following YAML. It creates the sample-app Deployment, a Service for in-cluster access, and a ServiceMonitor for metrics collection.

    • Deployment

      The container exposes the custom metric requests_per_second, which indicates the number of requests per second, at the /metrics path on port 8080.

      YAML content

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: sample-app
        namespace: default
        labels:
          app: sample-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: sample-app
        template:
          metadata:
            labels:
              app: sample-app
          spec:
            containers:
            - image: registry-cn-hangzhou.ack.aliyuncs.com/acs/ahpa-external-sample:v1
              name: metrics-provider
              ports:
              - name: http
                containerPort: 8080
              env:
              - name: NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
              - name: POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
    • Service

      This creates a stable in-cluster endpoint for the Deployment.

      YAML content

      apiVersion: v1
      kind: Service
      metadata:
        name: sample-app
        namespace: default
        labels:
          app: sample-app
      spec:
        # Selects pods with the label app: sample-app.
        selector:
          app: sample-app
        ports:
        - name: http
          port: 8080
          targetPort: 8080
        type: ClusterIP
    • ServiceMonitor

      After this resource is created, metric scraping begins. ServiceMonitor is enabled by default. To verify its status, see Enable features.

      YAML content

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: sample-app
        namespace: default
      spec:
        endpoints:
        - interval: 30s
          port: http
          path: /metrics
        namespaceSelector:
          any: true
        selector:
          matchLabels:
            # Associates with the previously created Service through this label.
            app: sample-app

Step 2: Deploy the metrics adapter

The metrics adapter acts as a bridge between AHPA and Prometheus. After deploying the add-on, configure it to connect to your Managed Service for Prometheus instance.

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Helm.

  2. Click Create, then search for and deploy ack-alibaba-cloud-metrics-adapter.

    • Chart Version: Use the latest version.

    • Parameter configuration: In the Parameter section, set prometheus.url and prometheus.prometheusHeader in the chart YAML, and then click OK.

      • prometheus.url: The HTTP API address of Managed Service for Prometheus (the Grafana read endpoint). For more information, see How to obtain the Prometheus data request URL.

      • prometheus.prometheusHeader:

        Detailed steps

        • Prometheus V1 (token authentication is disabled by default): If token authentication is enabled, copy the token from the Prometheus console and set it in the adapter configuration.

          prometheus:
            prometheusHeader:
            - Authorization: {Token}
        • Prometheus V2 (AccessKey authentication is enabled by default): If password-free access is not enabled, Base64-encode your AccessKey ID and AccessKey secret, then set them in the adapter configuration.

          1. Generate a Base64-encoded string.

            Concatenate the AccessKey ID and AccessKey secret, separated by a colon, then Base64-encode the result. In the following command, replace accessKey with your AccessKey ID and secretKey with your AccessKey secret.

            echo -n 'accessKey:secretKey' | base64
          2. Configure the component.

            Set the Authorization field of prometheusHeader to the generated string in the Basic <encoded string> format.

            ...
                prometheus:
                  prometheusHeader:
                  - Authorization: Basic YWxxxxeQ==
            ...
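
Because a stray newline or line wrap in the encoded credential makes authentication fail, it can help to build the header value in a script and decode it again before pasting it into the chart. The following is a minimal sketch, assuming a base64 command that supports the -d flag (as GNU coreutils does). The credentials are the placeholder values from the command above, and the variable names are illustrative.

```shell
# Placeholder credentials; substitute your AccessKey ID and AccessKey secret.
credentials='accessKey:secretKey'

# printf writes no trailing newline, and tr removes any line wrapping that
# some base64 implementations add to long output.
encoded=$(printf '%s' "$credentials" | base64 | tr -d '\n')
auth_header="Basic $encoded"

# Round-trip check: decoding must return exactly the original string.
decoded=$(printf '%s' "$encoded" | base64 -d)
if [ "$decoded" = "$credentials" ]; then
  echo "Authorization: $auth_header"
else
  echo "encoding mismatch" >&2
fi
```

The printed value is what goes into the Authorization entry of prometheusHeader.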

Step 3: Configure custom metrics

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Helm.

  2. Locate ack-alibaba-cloud-metrics-adapter and click Update in the Actions column.

  3. Replace the corresponding parameters in the template with the following YAML content, and then click Update.

    In the example, replace requests_per_second with the actual metric for requests per second in Prometheus.

      ......
      prometheus:
        adapter:
          rules:
            custom:
            - metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>})
              name:
                as: requests_per_second
              resources:
                overrides:
                  namespace:
                    resource: namespace
              seriesQuery: requests_per_second # Set the metric name. Make sure this name matches the metric in Managed Service for Prometheus.
      ......
  4. Query the External Metrics API to confirm that the metric is available.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/requests_per_second"

    Expected output:

    {"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{},"items":[{"metricName":"requests_per_second","metricLabels":{},"timestamp":"2025-10-15T07:57:00Z","value":"1"}]}
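
The metricsQuery value in the rule is a template: the adapter substitutes the series name for <<.Series>> and the label selectors of the request for <<.LabelMatchers>> before it sends the query to Prometheus. The following shell sketch imitates that substitution with sed to show roughly what PromQL results. The label matchers are example values assumed for illustration, and the real adapter may order or format them differently.

```shell
# Rule template from the adapter configuration.
template='sum(<<.Series>>{<<.LabelMatchers>>})'

# Example values (assumed) that the adapter would fill in for a request.
series='requests_per_second'
matchers='namespace="default",service="sample-app"'

# Replace each placeholder with its value.
promql=$(printf '%s' "$template" |
  sed -e "s/<<\.Series>>/$series/" -e "s/<<\.LabelMatchers>>/$matchers/")

echo "$promql"
```

Running the resulting expression directly against Managed Service for Prometheus is a quick way to tell whether an empty API response comes from the rule or from missing data.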

Step 4: Create an AHPA rule and verify scaling

Next, create an AHPA rule to automate scaling and run a stress test to verify its behavior.

  1. Create an AHPA resource.

    Use the following YAML to create an AHPA resource. AHPA scales out when the average requests_per_second value per pod exceeds 10, and scales in when it falls below 10.

    • Configure external.metric by specifying the metric name and matchLabels. The metric name must match the one specified in Step 3: Configure custom metrics. In this example, the custom metric is requests_per_second.

    • Set the target threshold. For example, set averageValue to 10 so that a scale-out begins when the average number of requests per second per pod exceeds 10.

    YAML content

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: customer-deployment
      namespace: default
    spec:
      metrics:
      - external:
          metric:
            # This must match the name.as value in the metrics adapter rule.
            name: requests_per_second
            selector:
              matchLabels:
                # Filter the metric source by using labels.
                namespace: default
                service: sample-app
          target:
            # Indicates that the target value is an average per pod.
            type: AverageValue
            # Indicates that a scale-out is triggered when the average QPS per pod exceeds 10.
            averageValue: 10
        type: External
      # Minimum and maximum replica limits.
      minReplicas: 0
      maxReplicas: 50
      prediction:
        quantile: 95
        scaleUpForward: 180
      scaleStrategy: observer
      # Declare the target workload for AHPA to scale.
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: sample-app
      instanceBounds:
      - startTime: "2023-08-01 00:00:00"
        endTime: "2033-08-01 00:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 4
        - cron: "* 9-15 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 5
        - cron: "* 16-23 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 1
  2. After the stress test, check the status of the AHPA object.

    kubectl get ahpa

    Expected output:

    NAME                  STRATEGY   PERIODICITY   REFERENCE               METRIC                TARGETS   DESIREDPODS   REPLICAS   MINPODS   MAXPODS   AGE
    customer-deployment   observer                 Deployment/sample-app   requests_per_second   16/10     2             1          1         50        102s
    • TARGETS: The current metric value is 16, and the target value is 10.

    • DESIREDPODS: AHPA calculates the desired number of replicas as 2, because Current Value (16) / Target Value (10) = 1.6, which is rounded up to 2.

    • REPLICAS: Displays the actual number of replicas of sample-app.

      Because the STRATEGY of this AHPA is observer, AHPA only calculates and reports the recommendation and does not perform scaling. Therefore, even though DESIREDPODS is 2, REPLICAS remains 1.

    To check the actual pod replica count, run the kubectl get deployment sample-app command.
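
The DESIREDPODS value in this output can be reproduced by hand. The following shell sketch applies the usual ratio rule for an AverageValue target (total metric value divided by the per-pod target, rounded up) and then clamps the result to the replica bounds. It is a simplified model under that assumption: desired_replicas is a hypothetical helper, and the predictive part of AHPA is not modeled.

```shell
# desired_replicas TOTAL TARGET MIN MAX
#   TOTAL  : metric value summed over all pods (integer)
#   TARGET : averageValue target per pod (integer, greater than 0)
#   MIN/MAX: replica bounds in effect
desired_replicas() {
  total=$1
  target=$2
  min=$3
  max=$4
  # Integer ceiling of total / target.
  desired=$(( (total + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 16 10 1 50   # values from the sample output
desired_replicas 16 10 5 50   # same load with a minimum of 5 replicas
```

The first call corresponds to the sample output (16/10 with bounds 1 and 50). The second shows the effect of a higher lower bound, such as the minReplicas of 5 in one of the instanceBounds windows.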

Production considerations

  • Metric selection: Use smoothed metrics that reflect the actual workload rather than instantaneous values. This prevents scaling fluctuation caused by traffic spikes.

  • Scaling policy configuration:

    • Set explicit safety boundaries (maxReplicas and minReplicas) for scaling actions to prevent resource exhaustion and uncontrolled costs caused by traffic attacks or metric anomalies.

    • Configure a scale-in stabilization window by setting a longer observation period for scale-in than for scale-out. This prevents a premature scale-in when traffic drops, avoiding fluctuation if traffic rebounds quickly.

  • Monitoring and alerting: Set up alerts for the AHPA operational status to promptly identify potential issues, such as capacity bottlenecks, improperly configured policies, or abnormal upstream traffic.
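
To illustrate the scale-in stabilization window, the following shell sketch uses the approach of the standard Kubernetes HPA: keep the recommendations from the window and apply the highest one, so that a scale-in happens only after every recommendation in the window has dropped. This shows the general idea only. stabilized_replicas is a hypothetical helper, and the source does not show how AHPA itself exposes this setting.

```shell
# stabilized_replicas REC...
# Takes the replica recommendations recorded during the stabilization
# window (oldest first) and prints the highest one, which is the count
# to apply.
stabilized_replicas() {
  highest=0
  for rec in "$@"; do
    if [ "$rec" -gt "$highest" ]; then
      highest=$rec
    fi
  done
  echo "$highest"
}

stabilized_replicas 5 3 2 2   # the earlier 5 is still inside the window
stabilized_replicas 3 2 2 2   # the 5 has aged out, so scale-in to 3 proceeds
```

A longer window keeps more history, which delays scale-in further and reduces fluctuation when traffic rebounds quickly.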