Advanced Horizontal Pod Autoscaler (AHPA) uses machine learning to predict your application's resource needs for the next 24 hours based on the last seven days of historical data. It scales out pods before predicted demand peaks arrive and scales in ahead of troughs — so your application handles traffic spikes without over-provisioning during quiet periods.
This tutorial walks you through a complete AHPA deployment: installing the controller, connecting Prometheus as a data source, deploying a test workload, creating an AHPA policy, and reading the prediction results.
## Prerequisites
Before you begin, ensure that you have:

- An ACS cluster. For more information, see Create an ACS cluster.
- Managed Service for Prometheus enabled for the cluster. For more information, see Use Managed Service for Prometheus to monitor ACS clusters.
## How it works
AHPA collects historical metric data through Managed Service for Prometheus and applies machine learning algorithms to predict the number of pods required over the next 24 hours. It provides two complementary prediction modes that work together:
- Proactive prediction — scales pods out ahead of forecasted demand peaks and prefetches resources to absorb cold-start latency
- Reactive prediction — responds to real-time metric signals, similar to a standard HPA
At any point in time, AHPA recommends a pod count based on the proactive prediction, the reactive prediction, and the maximum and minimum numbers of pods defined in `instanceBounds` for the current time window. You can observe this in the AHPA dashboard before enabling automatic scaling.
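The combination rule can be sketched in a few lines. This is a simplified illustration, not AHPA's published algorithm: it assumes the recommendation simply covers the larger of the two predictions and clamps it to the bounds of the current window.

```python
def recommended_pods(proactive: int, reactive: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Illustrative sketch of how AHPA might combine its two predictions.

    The controller's real algorithm is not public; this simply takes the
    larger of the two forecasts and clamps it to the instanceBounds limits
    for the current time window.
    """
    candidate = max(proactive, reactive)  # cover the higher of the two forecasts
    return max(min_replicas, min(candidate, max_replicas))  # clamp to bounds

# During a predicted peak, the proactive forecast dominates:
print(recommended_pods(proactive=12, reactive=7, min_replicas=4, max_replicas=15))  # 12
# A sudden unforecasted spike is caught by the reactive side, capped by maxReplicas:
print(recommended_pods(proactive=5, reactive=22, min_replicas=4, max_replicas=15))  # 15
```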
## Step 1: Install the AHPA controller
1. Log on to the ACS console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, click the ID of the cluster you want to manage. In the left-side navigation pane of the cluster details page, choose Operations > Add-ons.
3. On the Add-ons page, click the Others tab. Find AHPA Controller and click Install. Follow the on-screen instructions to complete the installation.
## Step 2: Add Prometheus as a data source for AHPA
AHPA needs the internal endpoint of your Managed Service for Prometheus instance to pull historical metric data. This step records the endpoint, creates a ConfigMap in the cluster with that endpoint, and then registers AHPA as a monitored component in Prometheus.
### Record the Prometheus endpoint
1. Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
2. On the Instances page, select the region where your Prometheus instance is deployed. Find the instance named after your ACS cluster — its Instance Type column shows General-purpose.
3. In the Actions column, click Settings. In the HTTP API URL (Grafana Read URL) section, record the internal endpoint.
### Create the `application-intelligence` ConfigMap
This ConfigMap tells AHPA where to find your Prometheus instance.
1. Create a file named `application-intelligence.yaml` with the following content:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: application-intelligence
     namespace: kube-system
   data:
     prometheusUrl: "http://cn-hangzhou-intranet.arms.aliyuncs.com:9443/api/v1/prometheus/da9d7dece901db4c9fc7f5b9c40****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
     token: "eyJhxxxxx"
   ```

   Replace the `prometheusUrl` value with the internal endpoint you recorded. If access tokens are enabled, replace the `token` value with your access token.

   To display Prometheus metrics on the AHPA dashboard, add the following keys to the ConfigMap:

   - `prometheus_writer_url` — the internal remote write endpoint of the Prometheus instance
   - `prometheus_writer_ak` — the AccessKey ID of the Alibaba Cloud account
   - `prometheus_writer_sk` — the AccessKey secret of the Alibaba Cloud account
2. Apply the ConfigMap:

   ```shell
   kubectl apply -f application-intelligence.yaml
   ```
### Enable Prometheus monitoring for AHPA
This step registers AHPA as a monitored component so that Prometheus starts collecting AHPA metrics.
1. Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
2. In the top navigation bar, click Integrate Other Components to go to the Integration Center page. Search for AHPA and click the AHPA card.
3. On the ACK AHPA page, choose Select a Kubernetes cluster > Select Cluster. Select your ACS cluster from the drop-down list.
4. In the Configuration Information section, set the following parameters and click OK:

   | Parameter | Description |
   |---|---|
   | Exporter Name | A name that is unique among the exporters collecting monitoring data from AHPA |
   | metrics collection interval (seconds) | The interval at which the service collects monitoring data |

5. After the Integration Status Check step completes, click Integration Management and confirm that Managed Service for Prometheus is enabled for AHPA.
## Step 3: Deploy a test service
Deploy a test setup that lets you compare AHPA predictions against standard HPA scaling behavior. The setup includes:
- `fib-deployment` — the workload being scaled
- `fib-svc` — a Service that exposes `fib-deployment`
- `fib-loader` — a load generator that simulates traffic fluctuation
- `fib-hpa` — a standard HPA that scales `fib-deployment` at 50% CPU utilization, used as a baseline
1. Create a file named `demo.yaml` with the following content:

   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: fib-deployment
     namespace: default
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: fib-deployment
     strategy:
       rollingUpdate:
         maxSurge: 25%
         maxUnavailable: 25%
       type: RollingUpdate
     template:
       metadata:
         labels:
           app: fib-deployment
       spec:
         containers:
         - image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
           imagePullPolicy: IfNotPresent
           name: user-container
           ports:
           - containerPort: 8080
             name: user-port
             protocol: TCP
           resources:
             limits:
               cpu: "1"
               memory: 2000Mi
             requests:
               cpu: "1"
               memory: 2000Mi
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: fib-svc
     namespace: default
   spec:
     ports:
     - name: http
       port: 80
       protocol: TCP
       targetPort: 8080
     selector:
       app: fib-deployment
     sessionAffinity: None
     type: ClusterIP
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: fib-loader
     namespace: default
   spec:
     progressDeadlineSeconds: 600
     replicas: 1
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: fib-loader
     strategy:
       rollingUpdate:
         maxSurge: 25%
         maxUnavailable: 25%
       type: RollingUpdate
     template:
       metadata:
         labels:
           app: fib-loader
       spec:
         containers:
         - args:
           - -c
           - |
             /ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
           command:
           - sh
           env:
           - name: NAMESPACE
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: metadata.namespace
           image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
           imagePullPolicy: IfNotPresent
           name: loader
           ports:
           - containerPort: 8090
             name: chart
             protocol: TCP
           resources:
             limits:
               cpu: "8"
               memory: 16000Mi
             requests:
               cpu: "2"
               memory: 4000Mi
   ---
   apiVersion: autoscaling/v1
   kind: HorizontalPodAutoscaler
   metadata:
     name: fib-hpa
     namespace: default
   spec:
     maxReplicas: 50
     minReplicas: 1
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: fib-deployment
     targetCPUUtilizationPercentage: 50
   ```

2. Deploy the test service:

   ```shell
   kubectl apply -f demo.yaml
   ```

   Verify that all pods are running before proceeding:

   ```shell
   kubectl get pods
   ```

   Expected output (all pods in `Running` state):

   ```
   NAME                 READY   STATUS    RESTARTS   AGE
   fib-deployment-xxx   1/1     Running   0          1m
   fib-loader-xxx       1/1     Running   0          1m
   ```
## Step 4: Create an AHPA policy
An AHPA policy is a custom resource of kind `AdvancedHorizontalPodAutoscaler`. The example below starts in observer mode — AHPA generates predictions but does not scale. Use this mode to validate predictions before enabling automatic scaling.
1. Create a file named `ahpa-demo.yaml` with the following content:

   ```yaml
   apiVersion: autoscaling.alibabacloud.com/v1beta1
   kind: AdvancedHorizontalPodAutoscaler
   metadata:
     name: ahpa-demo
   spec:
     scaleTargetRef:                  # Required. The Deployment to manage.
       apiVersion: apps/v1
       kind: Deployment
       name: fib-deployment
     metrics:                         # Required. Metrics used to drive scaling decisions.
     - type: Resource
       resource:
         name: cpu
         target:
           type: Utilization
           averageUtilization: 40     # Required. Scale when average CPU utilization exceeds 40%.
     maxReplicas: 100                 # Required. Hard upper bound on pod count.
     minReplicas: 2                   # Required. Hard lower bound on pod count.
     scaleStrategy: observer          # Optional. Default: observer.
                                      #   auto: AHPA scales automatically.
                                      #   observer: observe predictions without scaling.
                                      #   scalingUpOnly: scale out only, never scale in.
                                      #   proactive: proactive prediction only.
                                      #   reactive: reactive prediction only.
     stabilizationWindowSeconds: 300  # Optional. Default: 300 seconds. Cooldown for scale-in.
     prediction:
       quantile: 95                   # Required. Default: 0.99. Range: 0–1, two decimal places.
                                      # Higher value = more conservative (fewer false scale-outs).
                                      # Recommended range: 0.90–0.99.
       scaleUpForward: 180            # Required. Pod cold-start duration in seconds
                                      # (time from pod creation to Ready state).
     instanceBounds:                  # Optional. Scheduled replica limits.
     - startTime: "2021-12-16 00:00:00"
       endTime: "2031-12-16 00:00:00"
       bounds:
       - cron: "* 0-8 ? * MON-FRI"    # Mon–Fri, 00:00–08:59
         maxReplicas: 15
         minReplicas: 4
       - cron: "* 9-15 ? * MON-FRI"   # Mon–Fri, 09:00–15:59
         maxReplicas: 15
         minReplicas: 10
       - cron: "* 16-23 ? * MON-FRI"  # Mon–Fri, 16:00–23:59
         maxReplicas: 20
         minReplicas: 15
   ```

   The following table describes the key parameters.

   | Parameter | Required | Default | Description |
   |---|---|---|---|
   | `scaleTargetRef` | Yes | — | The Deployment to manage |
   | `metrics` | Yes | — | The metrics that drive scaling. Supported: CPU, GPU, memory, QPS (queries per second), and RT (response time) |
   | `averageUtilization` | Yes | — | The scaling threshold. `averageUtilization: 40` means AHPA scales when average CPU utilization exceeds 40% |
   | `maxReplicas` | Yes | — | Maximum number of pods |
   | `minReplicas` | Yes | — | Minimum number of pods |
   | `scaleStrategy` | No | `observer` | Scaling mode |
   | `stabilizationWindowSeconds` | No | `300` | Scale-in cooldown period, in seconds |
   | `prediction.quantile` | Yes | `0.99` | Probability threshold for the predicted metric not exceeding the scaling threshold. Range: 0–1. Recommended: 0.90–0.99 |
   | `prediction.scaleUpForward` | Yes | — | Pod cold-start duration: time from pod creation to the Ready state, in seconds |
   | `instanceBounds` | No | — | Time windows with scheduled `maxReplicas` and `minReplicas` overrides |
   | `instanceBounds.bounds.cron` | No | — | Cron schedule for a replica limit window |

   Cron expressions in `instanceBounds.bounds.cron` use a five-field format (Quartz-compatible) separated by spaces. The fields are:

   | Field | Required | Valid values | Special characters |
   |---|---|---|---|
   | Minutes | Yes | 0–59 | `*` `/` `,` `-` |
   | Hours | Yes | 0–23 | `*` `/` `,` `-` |
   | Day of month | Yes | 1–31 | `*` `/` `,` `-` `?` |
   | Month | Yes | 1–12 or JAN–DEC | `*` `/` `,` `-` |
   | Day of week | No | 0–6 or SUN–SAT (default: `*`) | `*` `/` `,` `-` `?` |

   Special character meanings:

   - `*` — any value
   - `/` — increment (for example, `*/5` means every 5 units)
   - `,` — list separator
   - `-` — range
   - `?` — placeholder (use in Day of month or Day of week when the other field is specified)

   The Month and Day of week fields are case-insensitive. For example, `SUN`, `Sun`, and `sun` are all valid. For more information, see Cron expressions.
2. Apply the AHPA policy:

   ```shell
   kubectl apply -f ahpa-demo.yaml
   ```
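To sanity-check which replica bounds apply at a given moment, the five-field cron windows above can be evaluated with a toy matcher. This is an illustration, not the controller's parser; for simplicity it maps day names onto Python's `weekday()` numbering (Monday = 0), whereas the documented numeric form uses 0–6 = SUN–SAT.

```python
from datetime import datetime

# Toy day-name table aligned with Python's weekday() (Monday = 0).
DAY_NAMES = {"MON": 0, "TUE": 1, "WED": 2, "THU": 3, "FRI": 4, "SAT": 5, "SUN": 6}

def _resolve(token, names):
    token = token.strip().upper()
    return names[token] if token in names else int(token)

def field_matches(field, value, names=None):
    """Match one cron field: supports *, ?, comma lists, and N-M ranges."""
    names = names or {}
    if field in ("*", "?"):
        return True
    for part in field.split(","):
        if "-" in part:
            lo, hi = (_resolve(t, names) for t in part.split("-"))
            if lo <= value <= hi:
                return True
        elif _resolve(part, names) == value:
            return True
    return False

def bounds_for(now, bounds):
    """Return (minReplicas, maxReplicas) of the first matching window, else None."""
    for b in bounds:
        minute_f, hour_f, _dom, _month, dow_f = b["cron"].split()
        if (field_matches(minute_f, now.minute)
                and field_matches(hour_f, now.hour)
                and field_matches(dow_f, now.weekday(), DAY_NAMES)):
            return b["minReplicas"], b["maxReplicas"]
    return None  # no window applies; the spec-level minReplicas/maxReplicas govern

bounds = [
    {"cron": "* 0-8 ? * MON-FRI",  "minReplicas": 4,  "maxReplicas": 15},
    {"cron": "* 9-15 ? * MON-FRI", "minReplicas": 10, "maxReplicas": 15},
    {"cron": "* 16-23 ? * MON-FRI", "minReplicas": 15, "maxReplicas": 20},
]
# A Wednesday at 10:30 falls in the 09:00-15:59 window:
print(bounds_for(datetime(2025, 1, 8, 10, 30), bounds))  # (10, 15)
```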
## Step 5: View prediction results
AHPA builds predictions from the last seven days of historical data. Wait at least seven days after applying the policy before evaluating prediction accuracy. For an existing application, select the corresponding Deployment in the AHPA dashboard.
### Open the AHPA dashboard
On the Integration Management page, click the name of your cluster on the Container Service tab. In the Addon Type section, select ACK AHPA. Click the Dashboards tab and then click ahpa-dashboard.
### Read the dashboard charts
The dashboard shows three charts:
**CPU utilization & actual POD** — Displays the average CPU utilization and the current pod count for the Deployment. Use this chart to confirm that `fib-loader` is generating the expected CPU load.

**Actual and predicted CPU usage** — Compares actual CPU usage (green line, driven by HPA) with AHPA's predicted CPU usage (yellow line). When the yellow line runs higher than the green line, AHPA has reserved enough headroom. When the yellow line rises earlier than the green line, AHPA has prepared resources in advance of the actual demand increase.

**Pod trends** — Shows three pod count series:
| Series | Description |
|---|---|
| Current number of pods | Pods currently running |
| Recommended number of pods | The pod count AHPA recommends, generated based on the proactive prediction, the reactive prediction, and the maximum and minimum numbers of pods within the current time period |
| Proactively predicted number of pods | The pod count AHPA predicts based on historical patterns |
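The two properties to check on the Actual and predicted CPU usage chart, headroom and lead time, can be computed mechanically. The series below are synthetic stand-ins for the green (actual) and yellow (predicted) lines; the numbers are made up for illustration.

```python
# Synthetic per-interval CPU series, in millicores (hypothetical values).
actual    = [300, 320, 380, 500, 700, 820, 760, 600]
predicted = [420, 480, 620, 760, 900, 950, 880, 700]

threshold = 600  # the level at which scaling would be triggered

# Headroom: the predicted line should never dip below the actual line.
headroom_ok = all(p >= a for p, a in zip(predicted, actual))

# Lead time: the predicted line should cross the threshold earlier,
# meaning capacity is prepared before real demand arrives.
first_actual_cross = next(i for i, a in enumerate(actual) if a > threshold)
first_predicted_cross = next(i for i, p in enumerate(predicted) if p > threshold)
lead_intervals = first_actual_cross - first_predicted_cross

print(headroom_ok, lead_intervals)  # True 2: prediction leads by 2 intervals
```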
### Interpret the example results
In this example, `scaleStrategy` is set to `observer`, so AHPA generates predictions without scaling. The following figure compares AHPA predictions with the HPA baseline:
Key observations from the figure:
- Actual and predicted CPU usage: The predicted CPU usage (yellow) is consistently higher than the actual usage (green), confirming that AHPA has sized capacity conservatively. The yellow line also rises ahead of the green line, confirming that resources are prepared before demand arrives.
- Pod trends: The predicted pod count (yellow) is lower than the HPA-provisioned count (green), and the yellow curve is smoother. This means AHPA's recommendations would produce fewer abrupt scaling events, improving workload stability.
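How conservative the sizing is comes down to `prediction.quantile`. The sketch below illustrates what a 0.95 quantile target means for capacity using synthetic demand data and back-of-the-envelope arithmetic; it is not AHPA's model, and the sample distribution is invented.

```python
import math
import random
import statistics

# Hypothetical sketch: size pods so that 95th-percentile historical CPU
# demand still keeps average utilization at or below the 40% threshold.
random.seed(7)
history_m = [random.gauss(600, 120) for _ in range(7 * 24 * 60)]  # ~7 days of per-minute samples

q = 0.95
# statistics.quantiles(n=100) returns the 99 percentile cut points;
# index 94 is the 95th percentile of the sampled demand, in millicores.
p95_demand = statistics.quantiles(history_m, n=100)[int(q * 100) - 1]

pod_capacity_m = 1000      # fib-deployment requests 1 CPU (1000m) per pod
target_utilization = 0.40  # averageUtilization: 40

# Pods needed so the 95th-percentile load stays under the 40% threshold.
# A higher quantile raises p95_demand and therefore the recommended count.
pods_needed = math.ceil(p95_demand / (pod_capacity_m * target_utilization))
print(pods_needed)
```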
### Key AHPA metrics
| Metric | Description |
|---|---|
| `ahpa_proactive_pods` | Proactively predicted pod count |
| `ahpa_reactive_pods` | Reactively predicted pod count |
| `ahpa_requested_pods` | Recommended pod count |
| `ahpa_max_pods` | Maximum pod count |
| `ahpa_min_pods` | Minimum pod count |
| `ahpa_target_metric` | Scaling threshold |
## Enable automatic scaling
After confirming that the predictions match expectations, set `scaleStrategy` to `auto` in `ahpa-demo.yaml` and reapply:

```shell
kubectl apply -f ahpa-demo.yaml
```
AHPA then automatically scales `fib-deployment` based on its predictions.
## What's next
- AHPA overview — learn about the prediction algorithms and policy design
- Cron expressions — reference for `instanceBounds.bounds.cron` syntax