When traffic spikes unexpectedly or follows a known schedule, manually adjusting pod resources is slow and error-prone. ACK Serverless provides four auto scaling solutions that dynamically adjust pod replicas or resource allocations to match workload demand, so your applications stay responsive without over-provisioning. Auto scaling is suitable for online workloads, large-scale computing and training tasks, GPU-accelerated deep learning tasks, and model inference and training tasks that use shared GPUs.
Before choosing a solution, it helps to understand two dimensions of scaling:
- Horizontal vs. vertical: Horizontal scaling adjusts the number of pod replicas. Vertical scaling adjusts the CPU and memory allocated to each pod without changing the replica count.
- Reactive vs. predictive: Reactive scaling responds to current metrics. Predictive scaling uses historical data to scale ahead of anticipated demand.

The following table summarizes all four solutions.
| Solution | How it works | Scaling metrics | Use case | Supported resources | Documentation |
|---|---|---|---|---|---|
| HPA (Horizontal Pod Autoscaler) | The standard Kubernetes solution for horizontal scaling. Scales pod replicas out when demand spikes and in when demand drops. | CPU and memory usage; custom metrics | Services with unpredictable traffic spikes — e-commerce, online education, and financial services | Deployments and StatefulSets (any resource that implements the scale interface) | Horizontal pod autoscaling |
| CronHPA (Cron Horizontal Pod Autoscaler) | Scales pods on a crontab-like schedule. Supports time zones, specific execution dates, and exclusion dates such as holidays. Works alongside HPA. | Schedule-based | Applications with predictable peak periods or applications that need to perform tasks at specific times | Deployments and StatefulSets | Use CronHPA for scheduled horizontal scaling |
| VPA (Vertical Pod Autoscaler) | Monitors pod resource consumption and automatically right-sizes the CPU and memory requests and limits of the pod's containers, without changing the replica count. | CPU and memory usage of the containers in each pod | Stateful or monolithic applications that need a stable resource supply; also useful when recovering pods from anomalies | Deployments, DaemonSets, and StatefulSets | Vertical pod autoscaling |
| AHPA (Advanced Horizontal Pod Autoscaler) | Extends HPA with predictive scaling. Analyzes historical metric patterns to forecast demand and scales pods before traffic arrives. | CPU, memory, and GPU usage; queries per second (QPS); response time (RT); custom metrics | Applications with periodic traffic patterns — live streaming, online education, and gaming | Deployments and Knative Services | AHPA overview |
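
To make the reactive, horizontal case concrete, the following is a minimal sketch of the replica calculation documented for the upstream Kubernetes HPA: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the parameter names, and the example bounds are hypothetical, and the 10% tolerance is the upstream Kubernetes default; whether ACK Serverless uses the same defaults is an assumption.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the upstream Kubernetes HPA replica formula.

    desired = ceil(current_replicas * current_value / target_value)

    All names and default bounds here are illustrative assumptions,
    not values taken from ACK Serverless.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")

    ratio = current_value / target_value

    # Skip scaling when the metric is close enough to the target,
    # which avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured replica bounds (minReplicas / maxReplicas
    # in an HPA spec).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 replicas averaging 30% CPU against a 60% target: scale in.
    print(desired_replicas(4, 30, 60))
```

CronHPA replaces the metric ratio with a schedule, and AHPA would apply a similar calculation to forecast rather than current metric values, so this sketch covers only the reactive HPA row of the table.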