
Container Service for Kubernetes: Auto scaling overview

Last Updated: Mar 25, 2026

When traffic spikes unexpectedly or follows a known schedule, manually adjusting pod resources is slow and error-prone. ACK Serverless provides four auto scaling solutions that dynamically adjust pod replicas or resource allocations to match workload demand, so your applications stay responsive without over-provisioning. This service is suitable for online workloads, large-scale computing and training tasks, GPU-accelerated deep learning tasks, and model inference and model training tasks that use shared GPUs.

Before choosing a solution, it helps to understand two dimensions of scaling:

  • Horizontal vs. vertical: Horizontal scaling adjusts the number of pod replicas. Vertical scaling adjusts the CPU and memory allocated to each pod without changing the replica count.

  • Reactive vs. predictive: Reactive scaling responds to current metrics. Predictive scaling uses historical data to scale ahead of anticipated demand.
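
The reactive and predictive cases can be sketched in a few lines of Python. The reactive function follows the replica calculation documented for the upstream Kubernetes Horizontal Pod Autoscaler, including its default tolerance of 0.1. The predictive function is only a toy stand-in that sizes for the largest value in a historical sample; it is not the forecasting algorithm used by any of the solutions described here. All function names, parameters, and sample numbers are illustrative assumptions.

```python
import math


def reactive_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Reactive horizontal scaling, per the upstream Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Upstream HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def predictive_replicas(history, per_pod_capacity):
    """Toy predictive scaling (an assumption for illustration only):
    provision for the highest load seen in the historical sample."""
    forecast = max(history)
    return math.ceil(forecast / per_pod_capacity)


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(reactive_replicas(4, 90, 60))
    # Requests per second seen at the same hour on earlier days; 100 RPS per pod.
    print(predictive_replicas([800, 950, 900], 100))
```

The reactive path needs only the current reading, while the predictive path needs stored history, which is why predictive scaling suits workloads whose load repeats over time.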

The following table summarizes all four solutions.

| Solution | How it works | Scaling metrics | Use case | Supported resources | Documentation |
| --- | --- | --- | --- | --- | --- |
| HPA (Horizontal Pod Autoscaler) | The standard Kubernetes solution for horizontal scaling. Scales pod replicas out when demand spikes and in when demand drops. | CPU and memory usage; custom metrics | Services with unpredictable traffic spikes, such as e-commerce, online education, and financial services | Deployments and StatefulSets (any resource that implements the scale interface) | Horizontal pod autoscaling |
| CronHPA (Cron Horizontal Pod Autoscaler) | Scales pods on a crontab-like schedule. Supports time zones, specific execution dates, and exclusion dates such as holidays. Works alongside HPA. | Schedule-based | Applications with predictable peak periods or applications that need to perform tasks at specific times | Deployments and StatefulSets | Use CronHPA for scheduled horizontal scaling |
| VPA (Vertical Pod Autoscaler) | Monitors pod resource consumption and right-sizes CPU and memory requests and limits automatically, without changing the replica count. | VPA recommends and automatically adjusts the requests and limits for CPU and memory resources in the containers of the pod | Stateful or monolithic applications that need stable resource supply; also useful when recovering pods from anomalies | Deployments, DaemonSets, and StatefulSets | Vertical pod autoscaling |
| AHPA (Advanced Horizontal Pod Autoscaler) | Extends HPA with predictive scaling. Analyzes historical metric patterns to forecast demand and scales pods before traffic arrives. | CPU, memory, and GPU usage; queries per second (QPS); response time (RT); custom metrics | Applications with periodic traffic patterns, such as live streaming, online education, and gaming | Deployments and Knative Services | AHPA overview |
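
To illustrate the HPA row in the table, the following Python sketch builds a HorizontalPodAutoscaler object that scales a Deployment on average CPU utilization. It assumes the cluster serves the upstream Kubernetes `autoscaling/v2` API. The helper name `build_hpa`, the object names `web-hpa` and `web`, and the replica bounds and CPU target are hypothetical values chosen for the example.

```python
import json


def build_hpa(name, deployment, min_replicas, max_replicas, cpu_target_percent):
    """Return an autoscaling/v2 HorizontalPodAutoscaler object as a dict.

    All argument values are supplied by the caller; nothing here is a
    recommended setting.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count is adjusted.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and limits, for illustration only.
    print(json.dumps(build_hpa("web-hpa", "web", 2, 10, 60), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.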