Auto scaling is a service that dynamically scales computing resources to meet your business requirements and provides a more cost-effective way to manage those resources. This topic introduces auto scaling and its related components.
Background information
Auto scaling is widely used in Container Service for Kubernetes (ACK) clusters. Typically, auto scaling is used in scenarios such as online workload scaling, large-scale computing and training, GPU-accelerated deep learning, inference and training based on shared GPU resources, and periodic workload scheduling. Auto scaling enables elasticity from the following aspects:
- Workload scaling. Auto scaling can scale workloads, such as pods. For example, Horizontal Pod Autoscaler (HPA) is a typical workload scaling component that changes the number of pod replicas to scale a workload.
- Resource scaling. If the resources of a cluster cannot meet the scaling requirements of workloads, Elastic Compute Service (ECS) instances or elastic container instances are added to the cluster.
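As a concrete illustration of workload scaling, the following is a minimal HorizontalPodAutoscaler manifest. The Deployment name `web-app` and the thresholds are hypothetical placeholders, not values from this topic:

```yaml
# Scales the hypothetical Deployment "web-app" between 2 and 10 replicas,
# targeting 50% average CPU utilization across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

When average CPU utilization across the pods rises above 50%, HPA adds replicas (up to 10); when it falls, HPA removes replicas (down to 2).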
Introduction to auto scaling
Components for workload scaling

Component | Description | Scenario | Limits | References |
---|---|---|---|---|
HPA | A built-in component of Kubernetes. HPA is used for online applications. | Online applications | HPA uses Deployments and StatefulSets to scale workloads. | Horizontal Pod Autoscaling |
Vertical Pod Autoscaler (VPA) | An open source component. VPA is used for monolithic applications. | Monolithic applications | VPA is used for applications that cannot be horizontally scaled. Typically, VPA is used when pods are recovered from anomalies. | Vertical pod auto scaling |
CronHPA | An open source component provided by ACK. CronHPA is used for applications whose resource usage periodically changes. | Periodically fluctuating workloads | CronHPA uses Deployments and StatefulSets to scale workloads. CronHPA is compatible with HPA. You can use CronHPA and HPA in combination to scale workloads. | CronHPA |
Elastic-Workload | A component provided by ACK. Elastic-Workload is used in scenarios where fine-grained scaling is required. For example, you can use Elastic-Workload if you want to distribute a workload across different zones. | Scenarios where fine-grained scaling is required | Elastic-Workload is applicable to online workloads that require fine-grained scaling. For example, some pods of a Deployment are scheduled to an ECS instance, and the rest of the pods are scheduled to elastic container instances. | Install ack-kubernetes-elastic-workload |
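Because CronHPA is compatible with HPA, a common pattern is to pre-scale a workload on a fixed schedule for predictable peaks. The sketch below uses the CronHorizontalPodAutoscaler CRD installed by ACK's cronhpa controller; the workload name, schedules, and sizes are hypothetical, and the exact field names should be verified against the installed CRD version:

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: web-app-cronhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app          # hypothetical workload
  jobs:
    # Scale up before the expected morning peak.
    # The cron expression includes a leading seconds field.
    - name: scale-up-morning
      schedule: "0 0 8 * * *"
      targetSize: 10
    # Scale back down after traffic subsides at night.
    - name: scale-down-night
      schedule: "0 0 22 * * *"
      targetSize: 2
```

Each job sets the workload to `targetSize` replicas when its schedule fires, while HPA can continue to adjust replicas between scheduled jobs.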
Components for resource scaling

Component | Description | Scenario | Time cost for delivery | References |
---|---|---|---|---|
cluster-autoscaler | cluster-autoscaler is an open source component provided by Kubernetes that scales the nodes in a cluster. In ACK, cluster-autoscaler is integrated with auto scaling to provide more elastic and cost-effective scaling services. | cluster-autoscaler is applicable to all scenarios, especially online workloads, deep learning, and large-scale computing. | The amount of time that is required to add 1,000 nodes to a cluster: | Auto scaling of nodes |