
Container Compute Service:AHPA overview

Last Updated: Mar 25, 2026

Kubernetes-native scaling is reactive by design: resources scale only after a metric threshold is crossed, which means pods spin up after the traffic spike has already arrived. ACS supports predictive scaling through the Advanced Horizontal Pod Autoscaler (AHPA), which learns from historical metric data to predict, minute by minute, how many pods a workload needs over the next 24 hours, and provisions them before demand peaks.

Why existing scaling approaches fall short

| Method | Limitation |
| --- | --- |
| Manual pod count | Idle pods waste resources during off-peak hours |
| Horizontal Pod Autoscaler (HPA) | Reacts only after a metric threshold is crossed: scale-out triggers when resource usage exceeds the threshold, and scale-in only when it drops below, so capacity always lags demand |
| CronHPA | Requires a separate schedule for every time slot (up to 1,440 entries to cover 24 hours at per-minute granularity) and manual updates whenever traffic patterns change |
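To see why per-slot scheduling becomes unmanageable, consider a CronHPA policy as implemented by the open-source kubernetes-cronhpa-controller: each time slot requires its own job entry. The resource names and schedules below are illustrative, not taken from this document:

```yaml
# Illustrative CronHPA policy (kubernetes-cronhpa-controller CRD).
# Every time slot needs a separate job; per-minute coverage of a full
# day would require up to 1,440 entries like these.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: web-cronhpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical target workload
  jobs:
  - name: scale-up-morning
    schedule: "0 0 8 * * *"  # 08:00 every day
    targetSize: 10
  - name: scale-down-night
    schedule: "0 0 22 * * *" # 22:00 every day
    targetSize: 3
```

Whenever the traffic pattern shifts, each affected job entry must be edited by hand, which is the maintenance burden AHPA's prediction model removes.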

AHPA addresses these limitations through predictive scaling: it provisions pods ahead of demand, eliminating the gap between a traffic spike and available capacity.

The diagram below contrasts the two approaches. With traditional HPA, pods spin up after a workload spike. With AHPA, pods reach the Ready state before the spike.

image

How it works

AHPA combines three mechanisms to guarantee sufficient resources for applications: proactive prediction, passive prediction, and service degradation.

image

The diagram above shows how the AHPA controller coordinates with HPA and Deployments to schedule pods ahead of demand.

| Mechanism | Behavior |
| --- | --- |
| Proactive prediction | Analyzes historical metric data to forecast workload trends and pre-schedules pods. Best suited for workloads with periodic or predictable fluctuation patterns |
| Passive prediction | Detects and responds to workload fluctuations in real time, deploying resources as fluctuations occur |
| Service degradation | Allows you to specify the maximum and minimum numbers of pods within one or more time periods, giving operators a guaranteed floor and ceiling regardless of prediction results |
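These mechanisms correspond to fields of the AHPA custom resource. The sketch below assumes the `AdvancedHorizontalPodAutoscaler` CRD shape published for Alibaba Cloud Kubernetes clusters; exact field names and API versions may differ in your environment, and the workload name, time windows, and replica counts are illustrative:

```yaml
# Sketch of an AHPA resource, assuming the AdvancedHorizontalPodAutoscaler
# CRD; verify field names against the CRD installed in your cluster.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical target workload
  metrics:                   # metric history feeds proactive prediction
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  prediction:
    quantity: 95             # prediction quantile / accuracy target
  instanceBounds:            # service degradation: guaranteed floor/ceiling
  - startTime: "2026-01-01 00:00:00"
    endTime: "2030-01-01 00:00:00"
    bounds:
    - cron: "* 0-8 ? * MON-FRI"
      maxReplicas: 15
      minReplicas: 4
```

The `instanceBounds` window caps what the predictor may do: even if proactive or passive prediction suggests a count outside 4–15 pods during that period, the bound wins.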

Supported metrics: CPU, GPU, memory, queries per second (QPS), response time (RT), and external metrics.

Scaling execution: AHPA can act through HPA, which simplifies HPA configuration and compensates for its built-in scaling delay, or operate directly on Deployments.
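Before letting predictions drive real scaling, it is common to run the controller in a dry-run mode first. The fragment below assumes the AHPA CRD exposes a `scaleStrategy` field with `observer` and `auto` modes, as in Alibaba Cloud's published AHPA examples; confirm against your cluster's CRD:

```yaml
# Assumed scaleStrategy field; verify it exists in your AHPA CRD version.
spec:
  scaleStrategy: observer   # record predictions without acting on them
  # Switch to "auto" once predicted pod counts track actual demand well.
```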

Performance

| Metric | Value |
| --- | --- |
| Workload fluctuation detection | Within milliseconds |
| Resource provisioning | Within seconds |
| Prediction accuracy | Greater than 95%, combining proactive and passive prediction |
| Scheduling granularity | Pod counts specified down to the minute |

Use cases

• Periodic workloads: Applications with predictable traffic cycles, such as live streaming, online education, and gaming, benefit most from proactive prediction, which pre-warms capacity before each peak.

• Fixed baseline with burst traffic: If your deployment already runs a fixed pod count for steady-state traffic, AHPA handles unexpected surges without requiring you to manage multiple scaling policies.

• Custom autoscaling integrations: If your platform team needs to integrate scaling recommendations into internal tooling or CI/CD pipelines, AHPA exposes a standard Kubernetes API so you can retrieve prediction results and act on them programmatically.

Next steps