Background information of auto scaling and related components - Container Service for Kubernetes

When traffic spikes unexpectedly or follows a known schedule, manually adjusting pod resources is slow and error-prone. ACK Serverless provides four auto scaling solutions that dynamically adjust pod replicas or resource allocations to match workload demand — so your applications stay responsive without over-provisioning. This service is suitable for online workloads, large-scale computing and training tasks, GPU-accelerated deep learning tasks, and model inference and model training tasks that use shared GPUs.

Before choosing a solution, it helps to understand two dimensions of scaling:

Horizontal vs. vertical: Horizontal scaling adjusts the number of pod replicas. Vertical scaling adjusts the CPU and memory allocated to each pod without changing the replica count.
Reactive vs. predictive: Reactive scaling responds to current metrics. Predictive scaling uses historical data to scale ahead of anticipated demand.
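
The difference between reactive and predictive scaling can be illustrated with a minimal sketch. It is not how any ACK component is implemented: the function names, the requests-per-second capacity model, and the "take the maximum of past observations" forecast are all assumptions chosen for illustration.

```python
import math


def replicas_for_load(load_rps: float, per_pod_capacity_rps: float) -> int:
    """Horizontal scaling: how many fixed-size pods are needed for a load.

    Hypothetical capacity model: each pod serves per_pod_capacity_rps.
    """
    return max(1, math.ceil(load_rps / per_pod_capacity_rps))


def reactive_target(current_rps: float, per_pod_capacity_rps: float) -> int:
    """Reactive: size the workload from the metric observed right now."""
    return replicas_for_load(current_rps, per_pod_capacity_rps)


def predictive_target(history_rps: list[float], per_pod_capacity_rps: float) -> int:
    """Predictive: size the workload from historical data before demand arrives.

    history_rps holds the load seen in the same time slot on earlier days.
    Using the maximum as the forecast is a deliberately naive assumption.
    """
    forecast = max(history_rps)
    return replicas_for_load(forecast, per_pod_capacity_rps)


if __name__ == "__main__":
    # Current load is modest, but the same slot was much busier on past days.
    print(reactive_target(250, 100))
    print(predictive_target([400, 520, 480], 100))
```

In this sketch the reactive path only sees the present load, while the predictive path scales ahead of a peak that past data suggests is coming.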

The following table summarizes all four solutions.

Solution	How it works	Scaling metrics	Use case	Supported resources	Documentation
HPA (Horizontal Pod Autoscaler)	The standard Kubernetes solution for horizontal scaling. Scales pod replicas out when demand spikes and in when demand drops.	CPU and memory usage; custom metrics	Services with unpredictable traffic spikes — e-commerce, online education, and financial services	Deployments and StatefulSets (any resource that implements the `scale` interface)	Horizontal pod autoscaling
CronHPA (Cron Horizontal Pod Autoscaler)	Scales pods on a crontab-like schedule. Supports time zones, specific execution dates, and exclusion dates such as holidays. Works alongside HPA.	Schedule-based	Applications with predictable peak periods or applications that need to perform tasks at specific times	Deployments and StatefulSets	Use CronHPA for scheduled horizontal scaling
VPA (Vertical Pod Autoscaler)	Monitors pod resource consumption and right-sizes CPU and memory requests and limits automatically, without changing the replica count.	CPU and memory usage of the containers in the pod, from which VPA recommends and automatically adjusts CPU and memory requests and limits	Stateful or monolithic applications that need a stable resource supply; also useful when recovering pods from anomalies	Deployments, DaemonSets, and StatefulSets	Vertical pod autoscaling
AHPA (Advanced Horizontal Pod Autoscaler)	Extends HPA with predictive scaling. Analyzes historical metric patterns to forecast demand and scales pods before traffic arrives.	CPU, memory, and GPU usage; queries per second (QPS); response time (RT); custom metrics	Applications with periodic traffic patterns — live streaming, online education, and gaming	Deployments and Knative Services	AHPA overview
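
For the HPA row, the replica calculation documented for the standard Kubernetes Horizontal Pod Autoscaler is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default in upstream Kubernetes). The sketch below reproduces only that formula. The function name and the `min_replicas`/`max_replicas` clamping arguments are illustrative, and the real controller also handles missing metrics, pod readiness, and stabilization windows, which are omitted here.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Sketch of the upstream HPA replica formula (simplified)."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (illustrative parameter names).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 90% average CPU with a 60% target: scale out.
    print(hpa_desired_replicas(4, 90, 60))
    # 6 replicas at 20% average CPU with a 60% target: scale in.
    print(hpa_desired_replicas(6, 20, 60))
```

The tolerance check is what keeps the replica count from flapping when usage hovers near the target.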