Container Service for Kubernetes: AHPA overview

Last Updated: Mar 25, 2026

Advanced Horizontal Pod Autoscaler (AHPA) is a predictive autoscaling feature in Container Service for Kubernetes (ACK) that analyzes historical metric data to forecast workload demand and provisions pods before traffic spikes occur — eliminating the reactive delay inherent in standard Kubernetes autoscaling.

Background

Standard Kubernetes autoscaling approaches each carry trade-offs:

Method | Limitation
Fixed pod count | Idle pods waste resources during off-peak hours
Horizontal Pod Autoscaler (HPA) | Scales only after resource usage crosses a threshold, so pods are not ready until after demand has already increased
CronHPA | Requires manually defining pod counts for every time slot (1,440 separate schedules to cover a full day at per-minute granularity) and must be updated whenever traffic patterns change

AHPA addresses these limitations by learning from historical data. Rather than reacting to demand after it arrives, AHPA forecasts workload fluctuations from the historical values of specific metrics and from the time a pod takes to reach the Ready state. It then pre-provisions the required number of pods for each minute of the next 24 hours, so capacity is already in place when traffic arrives.
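The idea of provisioning ahead of demand can be illustrated with a minimal sketch. This is not AHPA's actual algorithm: the function name `plan_replicas`, the constant per-pod capacity, and the sample forecast are all assumptions made for illustration. The sketch converts a per-minute load forecast into pod counts, then moves each scale-up earlier by the pod startup time so that pods are Ready when the load arrives.

```python
import math


def plan_replicas(forecast_load, per_pod_capacity, ready_minutes):
    """Return a per-minute pod plan that accounts for pod startup time.

    forecast_load: predicted load for each upcoming minute (for example, QPS).
    per_pod_capacity: load one pod can serve (hypothetical, assumed constant).
    ready_minutes: minutes a pod needs to reach the Ready state.
    """
    # Pods required to serve the forecast load in each minute.
    needed = [max(1, math.ceil(load / per_pod_capacity)) for load in forecast_load]

    plan = []
    for minute in range(len(needed)):
        # Look ahead by the startup time: pods requested now must cover
        # the highest demand expected before later pods could become Ready.
        window = needed[minute : minute + ready_minutes + 1]
        plan.append(max(window))
    return plan


# Illustrative forecast: a spike in minutes 3 and 4.
forecast = [100, 100, 100, 400, 400, 100]
print(plan_replicas(forecast, per_pod_capacity=100, ready_minutes=2))
# prints [1, 4, 4, 4, 4, 1]
```

In this example the spike begins at minute 3, but the plan already requests four pods at minute 1, two minutes earlier, which matches the assumed startup time. A full-day plan at per-minute granularity would use a forecast of 1,440 values.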

How it works

Supported metrics

AHPA scales workloads based on CPU, GPU, memory, queries per second (QPS), response time (RT), and external metrics.

Prediction mechanisms

AHPA uses two complementary prediction mechanisms:

  • Proactive prediction — Analyzes historical metric data to detect periodic patterns and predict future demand. Suited for workloads with regular, recurring traffic cycles.

  • Passive prediction — Monitors metrics in real time and adjusts capacity dynamically as conditions change.

Together, these mechanisms identify workload fluctuations with an accuracy above 95%.
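One way to picture how the two mechanisms complement each other is the sketch below. It is an assumption-laden illustration, not the AHPA implementation: the periodic forecast is a plain average of the same minute across past days, the real-time view is simply the latest sample, and taking the larger of the two is a design choice assumed here so that an unforeseen spike is never under-provisioned. All function names and parameters are hypothetical.

```python
import math
from statistics import mean


def proactive_estimate(history_by_day, minute_of_day):
    """Periodic forecast: average the same minute across past days."""
    return mean(day[minute_of_day] for day in history_by_day)


def passive_estimate(recent_samples):
    """Real-time view: the most recent observed metric value."""
    return recent_samples[-1]


def desired_replicas(history_by_day, minute_of_day, recent_samples, target_per_pod):
    """Combine both estimates into one pod count (illustrative rule)."""
    predicted = proactive_estimate(history_by_day, minute_of_day)
    observed = passive_estimate(recent_samples)
    # Assumption: use the larger value so neither signal is ignored.
    load = max(predicted, observed)
    return max(1, math.ceil(load / target_per_pod))


# Two short "days" of history with three samples each (illustrative data).
history = [[100, 200, 300], [120, 220, 280]]

# Normal case: live load is below the periodic forecast of 210.
print(desired_replicas(history, 1, [150, 180], target_per_pod=100))  # prints 3

# Unexpected spike: live load of 500 exceeds the forecast.
print(desired_replicas(history, 1, [150, 500], target_per_pod=100))  # prints 5
```

When traffic follows its usual cycle, the periodic forecast drives the pod count. When live traffic departs from the pattern, the real-time value takes over.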

Service degradation

When workloads behave unexpectedly, AHPA lets you define minimum and maximum pod counts for specific time periods — down to the minute — as a safety floor and ceiling.
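The floor-and-ceiling behavior described above amounts to clamping a predicted pod count to the bounds of the current time window. The following sketch shows that logic only. The `BOUNDS` table, the default bounds, and the function names are hypothetical and do not reflect how AHPA is configured.

```python
from datetime import time

# Hypothetical per-period bounds: (start, end, min_pods, max_pods).
BOUNDS = [
    (time(9, 0), time(18, 0), 10, 50),   # business hours
    (time(18, 0), time(23, 59), 4, 20),  # evening
]
DEFAULT_BOUNDS = (2, 10)  # assumed fallback outside the listed periods


def bounds_at(now):
    """Return (min_pods, max_pods) for the period that contains `now`."""
    for start, end, low, high in BOUNDS:
        if start <= now < end:
            return low, high
    return DEFAULT_BOUNDS


def clamp_replicas(predicted, now):
    """Keep a predicted pod count within the bounds of the current period."""
    low, high = bounds_at(now)
    return min(max(predicted, low), high)


print(clamp_replicas(3, time(10, 30)))   # prints 10 (raised to the floor)
print(clamp_replicas(80, time(10, 30)))  # prints 50 (capped at the ceiling)
print(clamp_replicas(15, time(20, 0)))   # prints 15 (already within bounds)
```

Because the periods are expressed with `datetime.time`, the bounds can be set to the minute, and a prediction that is far too low or far too high is limited to a range you chose in advance.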

Scaling targets

AHPA integrates with three scaling mechanisms:

Scaling target | How AHPA uses it
Knative | Resolves cold start latency in serverless scenarios by scaling based on concurrency, QPS, or RT
HPA | Wraps standard HPA with predictive logic, simplifying policy configuration and removing the reactive delay
Deployment | Scales Deployments directly without requiring HPA as an intermediary

Key capabilities

Capability | Detail
Prediction speed | Forecasts workload fluctuations within milliseconds
Scale speed | Provisions pods within seconds
Accuracy | Identifies workload fluctuations with >95% accuracy
Time granularity | Supports pod count bounds accurate to the minute

Use cases

  • Periodic workloads — Applications with predictable traffic cycles, such as live streaming, online education, and gaming platforms.

  • Burst traffic handling — Deployments that combine a fixed baseline pod count with autoscaling to absorb unexpected traffic spikes in regular business scenarios.

  • Capacity planning integration — Systems that need pod provisioning recommendations. AHPA exposes a standard Kubernetes API so prediction results can be retrieved and integrated into your own systems.
