
Container Service for Kubernetes: Configure custom metrics using AHPA for application scaling

Last Updated: Mar 26, 2026

CPU and memory metrics don't always reflect real application load. When your scaling signal is business-level — such as HTTP requests per second (RPS) or message queue depth — custom metrics give you a more accurate trigger. This guide shows how to configure AdvancedHorizontalPodAutoscaler (AHPA) with the ack-alibaba-cloud-metrics-adapter component to autoscale a deployment based on a Prometheus-scraped metric.

AHPA uses the Kubernetes External Metrics API, which lets it query any metric available in your Prometheus instance — not just pod-level metrics. Compared to standard HPA custom metrics (Pods/Object types), External Metrics provide broader flexibility. Use External Metrics when your scaling signal comes from a monitoring service such as Managed Service for Prometheus.
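Concretely, every External Metric is addressed through a fixed API path under `external.metrics.k8s.io`. The following minimal Python sketch (the function name is illustrative) shows how that path is assembled; the same path appears later in this guide when you verify the adapter with `kubectl get --raw`:

```python
def external_metric_path(namespace: str, metric_name: str) -> str:
    """Build the raw API path that AHPA (or kubectl get --raw) uses to
    read an External Metric exposed by a metrics adapter."""
    return (
        "/apis/external.metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/{metric_name}"
    )

print(external_metric_path("default", "requests_per_second"))
# /apis/external.metrics.k8s.io/v1beta1/namespaces/default/requests_per_second
```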

Prerequisites

Before you begin, ensure that you have:

  • An ACK cluster for which Managed Service for Prometheus is enabled

  • The AHPA controller installed in the cluster

  • kubectl configured to connect to the cluster

Step 1: Deploy the sample app and configure metric scraping

This step deploys a sample app that exposes requests_per_second as a Prometheus metric and a load generator, then configures Prometheus to scrape it.

Deploy the app and load generator

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of your cluster. In the left navigation pane, click Workloads > Deployments.

  3. On the Deployments page, click Create from YAML. Paste the following YAML, then click Create. This deploys:

    • sample-app: a server that exposes the requests_per_second metric at /metrics on port 8080

    • fib-loader-qps: a load generator that sends traffic to sample-app

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-app
      labels:
        app: sample-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sample-app
      template:
        metadata:
          labels:
            app: sample-app
        spec:
          containers:
          - image: registry.cn-hangzhou.aliyuncs.com/acs/knative-sample-fib-server:v1
            name: metrics-provider
            ports:
            - name: http
              containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sample-app
      namespace: default
      labels:
        app: sample-app
    spec:
      ports:
        - port: 8080
          name: http
          protocol: TCP
          targetPort: 8080
      selector:
        app: sample-app
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fib-loader-qps
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: fib-loader-qps
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: fib-loader-qps
        spec:
          containers:
          - args:
            - -c
            - |
              /ko-app/fib-loader --service-url="http://sample-app.${NAMESPACE}:8080/" --save-path=/tmp/fib-loader-chart.html
            command:
            - sh
            env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
            imagePullPolicy: IfNotPresent
            name: loader
            ports:
            - containerPort: 8090
              name: chart
              protocol: TCP
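The sample app publishes requests_per_second at /metrics in the Prometheus text exposition format. As a rough sketch of what Prometheus does on each scrape, the following parses a minimal payload of that format (the payload shown is illustrative, not the app's exact output):

```python
def parse_exposition(text: str) -> dict:
    """Parse a minimal Prometheus text-exposition payload into
    {metric_name: value}, skipping # HELP / # TYPE comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")  # value is the last field
        samples[name] = float(value)
    return samples

# Illustrative payload; the real /metrics output of sample-app may differ.
payload = """\
# HELP requests_per_second Current request rate.
# TYPE requests_per_second gauge
requests_per_second 10
"""
print(parse_exposition(payload)["requests_per_second"])  # 10.0
```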

Create a ServiceMonitor

  1. On the same Create page, paste the following YAML to create a ServiceMonitor, then click Create. The ServiceMonitor tells Prometheus to scrape the /metrics endpoint of the sample-app Service every 30 seconds.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: sample-app
      namespace: default
    spec:
      endpoints:
      - interval: 30s
        port: http
        path: /metrics
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app: sample-app
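The ServiceMonitor's selector.matchLabels picks up every Service whose labels contain all of the listed key/value pairs (a subset check). A minimal sketch of that matching rule:

```python
def selector_matches(match_labels: dict, service_labels: dict) -> bool:
    """A matchLabels selector matches when every selector key/value pair
    is present in the target Service's labels."""
    return all(service_labels.get(k) == v for k, v in match_labels.items())

# The ServiceMonitor above selects the sample-app Service,
# even if the Service carries additional labels:
print(selector_matches({"app": "sample-app"},
                       {"app": "sample-app", "team": "demo"}))  # True
```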

Enable the ServiceMonitor in ARMS

  1. Log on to the ARMS console. In the left navigation pane, click Integration Management.

  2. On the Integrated Environments tab, click the Container Service tab. Find your ACK instance and click Metric Scraping in the Actions column.

  3. On the Metric Collection page, click the Service Monitor tab. Find sample-app, click Enable in the Actions column, and confirm.


Step 2: Deploy the metrics adapter

The metrics adapter bridges Managed Service for Prometheus to the Kubernetes External Metrics API so AHPA can query your custom metric.

Get the Prometheus internal endpoint

  1. Log on to the ARMS console. In the left navigation pane, choose Managed Service for Prometheus > Instances.

  2. On the Prometheus Instances page, select the region of your instance and click the instance name. Instance names follow the format arms_metrics_{RegionId}_XXX.

  3. In the left navigation pane, click Settings. Under HTTP API Address (Grafana Read Address), copy the Internal network URL.

    • If token-based authentication is enabled for your instance, also copy the access token.


Install ack-alibaba-cloud-metrics-adapter

  1. In the ACK console, click Marketplace > Marketplace in the left navigation pane.

  2. Click the App Catalog tab, search for ack-alibaba-cloud-metrics-adapter, and click Deploy in the upper-right corner.

  3. On the Basic Information page, select your Cluster and Namespace, then click Next.

  4. In the Parameter Configuration wizard, select a Chart Version. In the Parameters area, set the following values, then click OK.

    • prometheus.url (required): the internal network URL of the Managed Service for Prometheus instance.

    • prometheus.prometheusHeader (optional): the Authorization header for token-based authentication. Leave it empty if authentication is not enabled.

    The following example shows the corresponding values:

    prometheus:
      enabled: true
      # Required: the internal network URL of Managed Service for Prometheus
      url: http://cn-beijing-intranet.arms.aliyuncs.com:9090/api/v1/prometheus/<instance-id>/<uid>/<cluster-id>/<region>
      # Required only if token-based authentication is enabled
      prometheusHeader:
      - Authorization: <your-access-token>

Configure custom metrics

  1. In the ACK console, click Applications > Helm in the left navigation pane. Find alibaba-cloud-metrics-adapter and click Update in the Actions column.

  2. Copy the following YAML content and paste it into the template, overwriting the corresponding parameters. Replace requests_per_second with the actual name of your Prometheus metric, then click Update.

    This is a partial configuration snippet. Overwrite only the corresponding parameters in the existing template; do not replace the entire Helm values file.

    • seriesQuery (required): the metric name in Prometheus. Must match exactly.

    • name.as (required): the name exposed through the External Metrics API. AHPA references this name.

    • metricsQuery (required): the PromQL aggregation applied when AHPA queries the metric.

    • enabled (required): set to true to activate the Prometheus adapter.
    ...
    prometheus:
      adapter:
        rules:
          custom:
          - metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>})
            name:
              as: requests_per_second   # Name exposed to the External Metrics API
            resources:
              overrides:
                namespace:
                  resource: namespace
            seriesQuery: requests_per_second  # Must match the metric name in Prometheus
          default: false
      enabled: true   # Set to true to enable the Prometheus adapter
    ...
  3. Verify the metric is available via the External Metrics API:

     kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/requests_per_second"

    Expected output:

     {"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{},"items":[{"metricName":"requests_per_second","metricLabels":{},"timestamp":"2023-08-15T07:59:09Z","value":"10"}]}

    If the metric appears in the response, the adapter is correctly configured.
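The metricsQuery field in the adapter rule is a Go template: when AHPA queries the metric, <<.Series>> is replaced with the series name and <<.LabelMatchers>> with the label selectors from the request. A rough Python approximation of that expansion (the real adapter uses Go's text/template engine):

```python
def render_metrics_query(template: str, series: str, label_matchers: dict) -> str:
    """Approximate how the Prometheus adapter expands its metricsQuery
    template into a concrete PromQL query string."""
    matchers = ",".join(f'{k}="{v}"' for k, v in label_matchers.items())
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", matchers))

query = render_metrics_query(
    "sum(<<.Series>>{<<.LabelMatchers>>})",
    "requests_per_second",
    {"namespace": "default"},
)
print(query)  # sum(requests_per_second{namespace="default"})
```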

Step 3: Create an AHPA resource

With the External Metrics API returning data, create an AHPA resource to autoscale sample-app based on requests_per_second.

  1. Apply the following YAML. Adjust the metric name, averageValue threshold, and minReplicas/maxReplicas to match your workload.

    This example sets minReplicas: 0, which allows AHPA to scale the deployment to zero. When the deployment is at zero replicas, AHPA handles the 0-to-1 transition: it scales up as soon as the metric exceeds the averageValue threshold. Once at least one replica is running, the standard 1-to-N scaling logic applies against the same threshold.
    • metrics[].external.metric.name: must match the name.as value in the adapter configuration.

    • target.averageValue: the trigger threshold per pod. AHPA scales out when the metric exceeds this value.

    • prediction.quantile: the percentile of the historical metric forecast used for proactive scaling.

    • prediction.scaleUpForward: the number of seconds to scale ahead of a predicted traffic spike.

    • scaleStrategy: set to observer to let AHPA observe and report without enforcing scale decisions.

    • instanceBounds: time-based minimum/maximum replica overrides defined with cron expressions.
    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: customer-deployment
      namespace: default
    spec:
      metrics:
      - external:
          metric:
            name: requests_per_second      # Must match name.as in the adapter config
            selector:
              matchLabels:
                namespace: default
                service: sample-app
          target:
            type: AverageValue
            averageValue: 10               # Scale out when average RPS per pod exceeds 10
        type: External
      minReplicas: 0
      maxReplicas: 50
      prediction:
        quantile: 95                       # Use the 95th-percentile forecast for proactive scaling
        scaleUpForward: 180                # Pre-scale 180 seconds before predicted demand spike
      scaleStrategy: observer
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: sample-app
      instanceBounds:
      - startTime: "2023-08-01 00:00:00"
        endTime: "2033-08-01 00:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 4
        - cron: "* 9-15 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 5
        - cron: "* 16-23 ? * MON-FRI"
          maxReplicas: 50
          minReplicas: 1
  2. Verify that AHPA is scaling correctly:

    kubectl get ahpa

    Expected output:

    NAME                  STRATEGY   REFERENCE                   METRIC                TARGETS     DESIREDPODS   REPLICAS   MINPODS   MAXPODS   AGE
    customer-deployment   observer   Deployment/sample-app       requests_per_second   60000m/10   6             1          1         50        7h53m

    Kubernetes expresses metric values in milli units (m) for precision. In this example, 60000m equals 60 requests per second. With a threshold of 10, AHPA calculates DESIREDPODS as 6 (60 / 10).
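The arithmetic above can be sketched as follows. This is a simplified model of the standard HPA/AHPA scale-out calculation for an AverageValue target (ceil of the metric total divided by the per-pod target), with a small helper to parse milli-unit quantities; function names are illustrative:

```python
import math

def parse_quantity(q: str) -> float:
    """Parse a Kubernetes metric quantity: '60000m' -> 60.0, '10' -> 10.0."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def desired_replicas(current_total: str, target_average: str) -> int:
    """ceil(currentMetricValue / targetAverageValue): the scale-out
    arithmetic behind the DESIREDPODS column shown above."""
    return math.ceil(parse_quantity(current_total) / parse_quantity(target_average))

print(desired_replicas("60000m", "10"))  # 6, matching DESIREDPODS in the output
```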

Troubleshoot AHPA scaling issues

If AHPA is not scaling as expected, inspect the status conditions:

kubectl describe ahpa customer-deployment

Check the Conditions section of the output. Key fields:

  • AbleToScale: whether AHPA can issue scale decisions. False indicates a controller or permission issue.

  • ScalingActive: whether AHPA successfully retrieved the metric. False usually means the External Metrics API is not returning data.

  • ScalingLimited: whether the current replica count is constrained by minReplicas, maxReplicas, or instanceBounds.

If ScalingActive is False, re-run the verification command from Step 2 to confirm the metric is available:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/requests_per_second"

What's next

  • To move from observer mode to active autoscaling, change scaleStrategy from observer to auto.

  • To apply this pattern to a different metric, update seriesQuery and name.as in the adapter configuration, then update the metrics[].external.metric.name in the AHPA resource to match.