
Container Compute Service: Deploy and use AHPA to predict resource demand

Last Updated: Mar 26, 2026

Advanced Horizontal Pod Autoscaler (AHPA) uses machine learning to predict your application's resource needs for the next 24 hours based on the last seven days of historical data. It scales out pods before predicted demand peaks arrive and scales in ahead of troughs — so your application handles traffic spikes without over-provisioning during quiet periods.

This tutorial walks you through a complete AHPA deployment: installing the controller, connecting Prometheus as a data source, deploying a test workload, creating an AHPA policy, and reading the prediction results.

Prerequisites

Before you begin, ensure that you have:

  • An ACS cluster

  • A Managed Service for Prometheus instance that monitors the cluster

  • kubectl configured to connect to the cluster

How it works

AHPA collects historical metric data through Managed Service for Prometheus and applies machine learning algorithms to predict the number of pods required over the next 24 hours. It provides two complementary prediction modes that work together:

  • Proactive prediction — scales pods out ahead of forecasted demand peaks and prefetches resources to absorb cold-start latency

  • Reactive prediction — responds to real-time metric signals, similar to standard HPA

At any point in time, AHPA recommends a pod count based on the proactive prediction, the reactive prediction, and the maximum and minimum numbers of pods defined in instanceBounds for the current time window. You can observe this in the AHPA dashboard before enabling automatic scaling.
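One plausible reading of that combination is: take the larger of the two predicted pod counts, then clamp the result to the instanceBounds of the current time window. The function below is only an illustration of that rule under this assumption, not the controller's actual implementation.

```shell
# Illustrative sketch (assumed combination rule, not AHPA's real code):
# the larger prediction wins, then the result is clamped to the
# min/max replicas of the current instanceBounds window.
recommend_pods() {
  proactive=$1; reactive=$2; min=$3; max=$4
  want=$(( proactive > reactive ? proactive : reactive ))  # larger prediction wins
  if [ "$want" -lt "$min" ]; then want=$min; fi            # raise to window minimum
  if [ "$want" -gt "$max" ]; then want=$max; fi            # cap at window maximum
  echo "$want"
}

recommend_pods 8 12 4 15   # prints 12 (reactive spike wins)
recommend_pods 2 3 4 15    # prints 4  (raised to the window minimum)
```

In observer mode, comparing this kind of clamped value against the dashboard's "Recommended number of pods" series is a quick way to build intuition before enabling automatic scaling.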

Step 1: Install the AHPA controller

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the ID of the cluster you want to manage. In the left-side navigation pane of the cluster details page, choose Operations > Add-ons.

  3. On the Add-ons page, click the Others tab. Find AHPA Controller and click Install. Follow the on-screen instructions to complete the installation.

Step 2: Add Prometheus as a data source for AHPA

AHPA needs the internal endpoint of your Managed Service for Prometheus instance to pull historical metric data. This step records the endpoint, creates a ConfigMap in the cluster with that endpoint, and then registers AHPA as a monitored component in Prometheus.

Record the Prometheus endpoint

  1. Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.

  2. On the Instances page, select the region where your Prometheus instance is deployed. Find the instance named after your ACS cluster — its Instance Type column shows General-purpose.

  3. In the Actions column, click Settings. In the HTTP API URL (Grafana Read URL) section, record the internal endpoint.

Create the application-intelligence ConfigMap

This ConfigMap tells AHPA where to find your Prometheus instance.

  1. Create a file named application-intelligence.yaml with the following content:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: application-intelligence
      namespace: kube-system
    data:
      prometheusUrl: "http://cn-hangzhou-intranet.arms.aliyuncs.com:9443/api/v1/prometheus/da9d7dece901db4c9fc7f5b9c40****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
      token: "eyJhxxxxx"

    Replace the prometheusUrl value with the internal endpoint you recorded. If access tokens are enabled, replace the token value with your access token.

    To display Prometheus metrics on the AHPA dashboard, add the following keys to the ConfigMap:

    • prometheus_writer_url — the internal remote write endpoint of the Prometheus instance

    • prometheus_writer_ak — the AccessKey ID of the Alibaba Cloud account

    • prometheus_writer_sk — the AccessKey secret of the Alibaba Cloud account
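With the three optional dashboard keys included, the ConfigMap would look like the following sketch. Every value is a placeholder to replace with your own endpoint, token, and AccessKey pair.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: application-intelligence
  namespace: kube-system
data:
  prometheusUrl: "<internal HTTP API endpoint>"             # recorded in the ARMS console
  token: "<access token>"                                   # only if access tokens are enabled
  prometheus_writer_url: "<internal remote write endpoint>" # optional, for the dashboard
  prometheus_writer_ak: "<AccessKey ID>"                    # optional, for the dashboard
  prometheus_writer_sk: "<AccessKey secret>"                # optional, for the dashboard
```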
  2. Apply the ConfigMap:

    kubectl apply -f application-intelligence.yaml

Enable Prometheus monitoring for AHPA

This step registers AHPA as a monitored component so that Prometheus starts collecting AHPA metrics.

  1. Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.

  2. In the top navigation bar, click Integrate Other Components to go to the Integration Center page. Search for AHPA and click the AHPA card.

  3. On the ACK AHPA page, choose Select a Kubernetes cluster > Select Cluster. Select your ACS cluster from the drop-down list.

  4. In the Configuration Information section, set the following parameters and click OK:

    • Exporter Name — A name that is unique among the exporters collecting monitoring data from AHPA

    • Metrics Collection Interval (Seconds) — The interval at which the service collects monitoring data
  5. After the Integration Status Check step completes, click Integration Management and confirm that Managed Service for Prometheus is enabled for AHPA.

Step 3: Deploy a test service

Deploy a test setup that lets you compare AHPA predictions against standard HPA scaling behavior. The setup includes:

  • fib-deployment — the workload being scaled

  • fib-svc — a Service that exposes fib-deployment

  • fib-loader — a load generator that simulates traffic fluctuation

  • fib-hpa — a standard HPA that scales fib-deployment at 50% CPU utilization, used as a baseline

  1. Create a file named demo.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fib-deployment
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fib-deployment
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: fib-deployment
        spec:
          containers:
          - image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
            imagePullPolicy: IfNotPresent
            name: user-container
            ports:
            - containerPort: 8080
              name: user-port
              protocol: TCP
            resources:
              limits:
                cpu: "1"
                memory: 2000Mi
              requests:
                cpu: "1"
                memory: 2000Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: fib-svc
      namespace: default
    spec:
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        app: fib-deployment
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fib-loader
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: fib-loader
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: fib-loader
        spec:
          containers:
          - args:
            - -c
            - |
              /ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
            command:
            - sh
            env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
            imagePullPolicy: IfNotPresent
            name: loader
            ports:
            - containerPort: 8090
              name: chart
              protocol: TCP
            resources:
              limits:
                cpu: "8"
                memory: 16000Mi
              requests:
                cpu: "2"
                memory: 4000Mi
    ---
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: fib-hpa
      namespace: default
    spec:
      maxReplicas: 50
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: fib-deployment
      targetCPUUtilizationPercentage: 50
  2. Deploy the test service:

    kubectl apply -f demo.yaml

    Verify that all pods are running before proceeding:

    kubectl get pods

    Expected output (all pods in Running state):

    NAME                              READY   STATUS    RESTARTS   AGE
    fib-deployment-xxx                1/1     Running   0          1m
    fib-loader-xxx                    1/1     Running   0          1m
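If you prefer a scripted check, the filter below counts pods that are not yet Running. It is shown against canned sample output so the logic is visible; with a live cluster you would pipe `kubectl get pods --no-headers` into the same awk filter.

```shell
# Count pods whose STATUS (column 3) is not "Running". The sample text
# stands in for the output of `kubectl get pods --no-headers`.
sample_output='fib-deployment-xxx   1/1   Running   0   1m
fib-loader-xxx       1/1   Pending   0   1m'

not_running=$(printf '%s\n' "$sample_output" | awk '$3 != "Running" {n++} END {print n+0}')
echo "$not_running"   # prints 1 (fib-loader is still Pending in the sample)
```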

Step 4: Create an AHPA policy

An AHPA policy is a custom resource of kind AdvancedHorizontalPodAutoscaler. The example below starts in observer mode — AHPA generates predictions but does not scale. Use this mode to validate predictions before enabling automatic scaling.

  1. Create a file named ahpa-demo.yaml with the following content:

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedHorizontalPodAutoscaler
    metadata:
      name: ahpa-demo
    spec:
      scaleTargetRef:                  # Required. The Deployment to manage.
        apiVersion: apps/v1
        kind: Deployment
        name: fib-deployment
      metrics:                         # Required. Metrics used to drive scaling decisions.
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 40     # Required. Scale when average CPU utilization exceeds 40%.
      maxReplicas: 100                 # Required. Hard upper bound on pod count.
      minReplicas: 2                   # Required. Hard lower bound on pod count.
      scaleStrategy: observer          # Optional. Default: observer.
                                       # auto: AHPA scales automatically.
                                       # observer: observe predictions without scaling.
                                       # scalingUpOnly: scale out only, never scale in.
                                       # proactive: proactive prediction only.
                                       # reactive: reactive prediction only.
      stabilizationWindowSeconds: 300  # Optional. Default: 300 seconds. Cooldown for scale-in.
      prediction:
        quantile: 0.95                 # Required. Default: 0.99. Range: 0–1, two decimal places.
                                       # Higher value = more conservative (fewer false scale-outs).
                                       # Recommended range: 0.90–0.99.
        scaleUpForward: 180            # Required. Pod cold-start duration in seconds
                                       # (time from pod creation to Ready state).
      instanceBounds:                  # Optional. Scheduled replica limits.
      - startTime: "2021-12-16 00:00:00"
        endTime: "2031-12-16 00:00:00"
        bounds:
        - cron: "* 0-8 ? * MON-FRI"   # Mon–Fri, 00:00–08:59
          maxReplicas: 15
          minReplicas: 4
        - cron: "* 9-15 ? * MON-FRI"  # Mon–Fri, 09:00–15:59
          maxReplicas: 15
          minReplicas: 10
        - cron: "* 16-23 ? * MON-FRI" # Mon–Fri, 16:00–23:59
          maxReplicas: 20
          minReplicas: 15

    The following list describes the key parameters.

    • scaleTargetRef — Required. The Deployment to manage.

    • metrics — Required. The metrics that drive scaling. Supported: CPU, GPU, memory, QPS (queries per second), and RT (response time).

    • averageUtilization — Required. The scaling threshold. averageUtilization: 40 means AHPA scales when average CPU utilization exceeds 40%.

    • maxReplicas — Required. Maximum number of pods.

    • minReplicas — Required. Minimum number of pods.

    • scaleStrategy — Optional. Default: observer. The scaling mode.

    • stabilizationWindowSeconds — Optional. Default: 300. Scale-in cooldown period, in seconds.

    • prediction.quantile — Required. Default: 0.99. The probability that actual demand stays at or below the predicted value; a higher value is more conservative. Range: 0–1. Recommended: 0.90–0.99.

    • prediction.scaleUpForward — Required. Pod cold-start duration: the time from pod creation to the Ready state, in seconds.

    • instanceBounds — Optional. Time windows with scheduled maxReplicas and minReplicas overrides.

    • instanceBounds.bounds.cron — Optional. Cron schedule for a replica limit window.
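One way to read prediction.scaleUpForward: if pods take 180 seconds to become Ready, a peak predicted for 09:00:00 should trigger scale-out by 08:57:00 so capacity is available when demand arrives. The arithmetic below is a simplified sketch of that reading; the controller's exact scheduling is internal.

```shell
# Simplified reading of scaleUpForward (assumption, not AHPA internals):
# shift the scale-out ahead of the predicted peak by the cold-start time.
# Times are expressed as seconds since midnight.
peak=$((9 * 3600))                    # peak predicted at 09:00:00
scale_up_forward=180                  # pods need 3 minutes to become Ready
scale_at=$((peak - scale_up_forward))
printf '%02d:%02d:%02d\n' $((scale_at / 3600)) $((scale_at % 3600 / 60)) $((scale_at % 60))
# prints 08:57:00
```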

    Cron expressions in instanceBounds.bounds.cron use a five-field format separated by spaces. The fields are:

    Field Required Valid values Special characters
    Minutes Yes 0–59 * / , -
    Hours Yes 0–23 * / , -
    Day of month Yes 1–31 * / , - ?
    Month Yes 1–12 or JAN–DEC * / , -
    Day of week No 0–6 or SUN–SAT (default: *) * / , - ?

    Special character meanings:

    • * — any value

    • / — increment (e.g., */5 means every 5 units)

    • , — list separator

    • - — range

    • ? — placeholder (use in Day of month or Day of week when the other field is specified)

    The Month and Day of week fields are case-insensitive. For example, SUN, Sun, and sun are all valid.

    For more information, see Cron expressions.
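As a quick sanity check on a bounds expression, you can split it into labeled fields from the shell. Glob expansion must be disabled first so the `*` fields survive word splitting.

```shell
# Split a five-field bounds cron expression into labeled fields.
cron='* 9-15 ? * MON-FRI'
set -f            # disable globbing so '*' is not expanded to filenames
set -- $cron      # word-split into the five positional fields
set +f
echo "minutes=$1 hours=$2 day_of_month=$3 month=$4 day_of_week=$5"
# prints: minutes=* hours=9-15 day_of_month=? month=* day_of_week=MON-FRI
```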

  2. Apply the AHPA policy:

    kubectl apply -f ahpa-demo.yaml

Step 5: View prediction results

AHPA builds predictions from the last seven days of historical data. For a newly deployed workload, wait at least seven days after applying the policy before evaluating prediction accuracy. For an application that already has metric history, select its Deployment in the AHPA dashboard to view predictions directly.

Open the AHPA dashboard

On the Integration Management page, click the name of your cluster on the Container Service tab. In the Addon Type section, select ACK AHPA. Click the Dashboards tab and then click ahpa-dashboard.

Read the dashboard charts

The dashboard shows three charts:

  • CPU utilization & actual POD — displays the average CPU utilization and the current pod count for the Deployment. Use this chart to confirm that fib-loader is generating the expected CPU load.

  • Actual and predicted CPU usage — compares actual CPU usage (green line, driven by HPA) with AHPA's predicted CPU usage (yellow line). When the yellow line runs higher than the green line, AHPA has reserved enough headroom. When the yellow line rises earlier than the green line, AHPA has prepared resources in advance of the actual demand increase.

  • Pod trends — shows three pod count series:

Series Description
Current number of pods Pods currently running
Recommended number of pods The pod count AHPA recommends, based on the proactive prediction, the reactive prediction, and the maximum and minimum numbers of pods for the current time window
Proactively predicted number of pods The pod count AHPA predicts based on historical patterns

Interpret the example results

In this example, scaleStrategy is set to observer, so AHPA generates predictions without scaling. The following figure compares AHPA predictions with the HPA baseline:

[Figure: AHPA predictions compared with the HPA baseline in observer mode]

Key observations from the figure:

  • Actual and predicted CPU usage: The predicted CPU usage (yellow) is consistently higher than the actual usage (green), confirming that AHPA has sized capacity conservatively. The yellow line also rises ahead of the green line, confirming that resources are prepared before demand arrives.

  • Pod trends: The predicted pod count (yellow) is lower than the HPA-provisioned count (green), and the yellow curve is smoother. This means AHPA's recommendations would produce fewer abrupt scaling events, improving workload stability.

Key AHPA metrics

Metric Description
ahpa_proactive_pods Proactively predicted pod count
ahpa_reactive_pods Reactively predicted pod count
ahpa_requested_pods Recommended pod count
ahpa_max_pods Maximum pod count
ahpa_min_pods Minimum pod count
ahpa_target_metric Scaling threshold
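These series can be queried directly in Prometheus or Grafana. For example, the following PromQL expression shows how much headroom remains between the recommended pod count and the window maximum; the series names come from the table above, while any extra labels (such as a policy name) depend on your instance and should be checked there.

```
# Headroom between the window maximum and what AHPA currently recommends.
ahpa_max_pods - ahpa_requested_pods
```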

Enable automatic scaling

After confirming that the predictions match expectations, set scaleStrategy to auto in ahpa-demo.yaml and reapply:

kubectl apply -f ahpa-demo.yaml

AHPA then automatically scales fib-deployment based on its predictions.

What's next