Auto scaling dynamically adjusts computing resources to match workload demand, reducing costs and maintaining service stability. Container Compute Service (ACS) supports five auto scaling components: Horizontal Pod Autoscaler (HPA), Cron Horizontal Pod Autoscaler (CronHPA), Adaptive Horizontal Pod Autoscaling (AHPA), Kubernetes Event-driven Autoscaling (KEDA), and Automatic Vertical Pod Autoscaler (AVPA). Use a single component or combine multiple components based on your workload type.
Auto scaling components
| Component | Description | Workload support | Constraints |
|---|---|---|---|
| HPA | Built-in Kubernetes component for online applications. Scales based on resource metrics such as CPU and memory usage. | Deployments, StatefulSets | — |
| CronHPA | Open source component for workloads with predictable, recurring resource patterns. Compatible with HPA; run both together to handle scheduled and metric-driven scaling. | Deployments, StatefulSets | — |
| AHPA | Open source component that predicts future resource needs from historical usage patterns. Suited for workloads with periodic fluctuations such as livestreaming, online education, and gaming. | Deployments, StatefulSets | Requires at least 7 days of historical data to generate predictions. |
| KEDA | Open source component that scales based on external event sources. Suited for event-driven jobs, offline audio and video transcoding, and stream data processing. | Deployments, StatefulSets | — |
| AVPA | Marketplace component that vertically adjusts pod resource specifications instead of changing the replica count. Suited for stateful workloads or jobs not suited for horizontal scaling, such as gaming services and offline jobs. | All workloads | Supports CPU adjustment only; memory adjustment is not supported. Suited for load changes lasting 10 minutes or longer, not for load bursts that last only seconds. |
HPA
Horizontal Pod Autoscaler (HPA) is a built-in Kubernetes component that scales the number of pod replicas based on observed resource metrics such as CPU and memory utilization. HPA works best for online applications whose load varies in ways that cannot be predicted in advance.
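The replica calculation described in the upstream Kubernetes HPA documentation can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the function name and the `min_replicas`/`max_replicas` defaults are hypothetical, and the 0.1 tolerance mirrors the upstream default, which a given cluster may override.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,    # hypothetical default for illustration
                     max_replicas: int = 10,   # hypothetical default for illustration
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA rule: ceil(current_replicas * current / target)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # When usage is close enough to the target, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_metric / target_metric)

    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 4 replicas averaging 90% CPU against a 60% target, `desired_replicas(4, 90, 60)` yields 6. The calculation only runs after the metric has already changed, which is why HPA is described as reactive.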
To get started, see HPA.
CronHPA
CronHPA is an open source component that scales workloads on a defined schedule, making it well suited for applications with predictable, recurring resource patterns, such as a web service that sees heavy traffic every weekday morning.
Unlike HPA, which reacts to current metrics, CronHPA acts proactively based on time. The two components are compatible: running CronHPA and HPA together lets you handle both scheduled scaling and unexpected metric-driven spikes.
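As a rough mental model of time-based scaling, the sketch below looks up a replica target from a schedule and then combines it with a metric-driven count. Everything here is an assumption made for illustration: the `SCHEDULE` values, the function names, and in particular the "take the larger value" combination rule, which is a simplification and not a description of how ACS arbitrates between CronHPA and HPA.

```python
from datetime import time
from typing import List, Tuple

# Hypothetical schedule: (start time, target replicas).
SCHEDULE: List[Tuple[time, int]] = [
    (time(0, 0), 2),    # overnight baseline
    (time(8, 0), 10),   # weekday-morning peak
    (time(20, 0), 2),   # back to baseline
]


def scheduled_replicas(now: time, schedule: List[Tuple[time, int]]) -> int:
    """Return the target of the most recent rule that started at or before `now`."""
    ordered = sorted(schedule)
    # Before the first rule of the day, the last rule of the previous day applies.
    target = ordered[-1][1]
    for start, replicas in ordered:
        if start <= now:
            target = replicas
    return target


def combined_replicas(scheduled: int, metric_driven: int) -> int:
    """Simplified assumption: the schedule sets a floor, metrics can push higher."""
    return max(scheduled, metric_driven)
```

Under these assumptions, at 09:30 the schedule asks for 10 replicas; if a metric-driven calculation asks for 14 during an unexpected spike, the combined result is 14.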
To get started, see CronHPA.
AHPA
Adaptive Horizontal Pod Autoscaling (AHPA) is an open source component that analyzes historical workload metrics to predict future resource needs and scales pods in advance. This proactive approach avoids the delayed scaling that can occur with standard HPA.
AHPA is suited for workloads with periodic fluctuations, such as livestreaming, online education, and gaming services, where load patterns repeat over days or weeks.
Constraint: AHPA requires at least 7 days of historical data before it can generate predictions. It supports Deployments and StatefulSets.
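To make the idea of predicting from history concrete, here is a deliberately simple sketch that averages the load seen at each hour across past days and sizes replicas ahead of time. It is not AHPA's prediction algorithm, which this page does not describe. The function names, the hourly granularity, and `load_per_replica` are hypothetical; only the 7-day minimum comes from the constraint above.

```python
import math
from statistics import mean
from typing import List, Sequence

MIN_HISTORY_DAYS = 7  # from the constraint stated above


def predict_hourly_load(history: Sequence[Sequence[float]]) -> List[float]:
    """history[d][h] is the observed load on day d at hour h (24 values per day)."""
    if len(history) < MIN_HISTORY_DAYS:
        raise ValueError("need at least 7 days of history to predict")
    if any(len(day) != 24 for day in history):
        raise ValueError("each day must contain 24 hourly samples")
    # Naive seasonal estimate: average the same hour across all recorded days.
    return [mean(day[h] for day in history) for h in range(24)]


def replicas_ahead(predicted_load: float, load_per_replica: float) -> int:
    """Replicas to provision before the predicted load arrives (at least 1)."""
    if load_per_replica <= 0:
        raise ValueError("load_per_replica must be positive")
    return max(1, math.ceil(predicted_load / load_per_replica))
```

A scaler built this way can add pods before a recurring peak, instead of waiting for metrics to rise as a reactive HPA would.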
To get started, see Adaptive Horizontal Pod Autoscaling (AHPA).
KEDA
Kubernetes Event-driven Autoscaling (KEDA) scales workloads based on external event sources rather than resource metrics. KEDA differs from HPA in a fundamental way: HPA is metrics-driven, while KEDA is event-driven.
KEDA is suited for offline audio and video transcoding, event-driven jobs, and stream data processing, where the volume of incoming events is a more meaningful scaling signal than resource utilization.
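The difference in scaling signal can be illustrated with a backlog-based calculation: the replica count follows the number of pending events rather than CPU or memory usage. The sketch below is an assumption-laden illustration, not KEDA's implementation. `pending_events`, `events_per_replica`, and the bounds are hypothetical parameters; the default lower bound of zero reflects that upstream KEDA can scale idle workloads to zero.

```python
def replicas_for_backlog(pending_events: int,
                         events_per_replica: int,
                         max_replicas: int,
                         min_replicas: int = 0) -> int:
    """Size the workload so each replica handles at most `events_per_replica` events."""
    if events_per_replica <= 0:
        raise ValueError("events_per_replica must be positive")
    if pending_events <= 0:
        # No backlog: fall back to the lower bound (possibly zero).
        return min_replicas

    # Integer ceiling division avoids floating-point rounding issues.
    desired = (pending_events + events_per_replica - 1) // events_per_replica
    return max(min_replicas, min(max_replicas, desired))
```

For instance, 250 queued transcoding tasks with a target of 100 tasks per replica give 3 replicas, regardless of how busy the existing pods' CPUs are.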
To get started, see Kubernetes Event-driven Autoscaling (KEDA).
AVPA
Automatic Vertical Pod Autoscaler (AVPA) adjusts the CPU resource specifications of individual pods rather than changing the number of replicas. It is suited for stateful workloads or jobs that cannot scale horizontally, such as gaming services and offline jobs, as well as workloads with improperly sized resource requests.
AVPA also supports application startup acceleration by proactively adjusting resource specifications before load increases.
Constraints:
- Supports CPU resource adjustment only. Memory adjustment is not supported.
- Suited for load changes lasting 10 minutes or longer. Not suited for load bursts that last only seconds.
To get started, see vertical pod autoscaling (AVPA).
What's next
- To combine CronHPA with HPA for scheduled and metric-driven scaling, see CronHPA.
- To use predictive scaling for periodic workloads, see Adaptive Horizontal Pod Autoscaling (AHPA).
- To scale based on external event sources, see Kubernetes Event-driven Autoscaling (KEDA).