AHPA overview

Resource demand is difficult to predict in cloud native scenarios. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes reacts only after resource usage changes, so scaling is delayed, and its configuration is complex. To resolve these issues, ACK released Advanced Horizontal Pod Autoscaler (AHPA), which is powered by time series intelligence from DAMO Academy. AHPA automatically learns the pattern of workload fluctuations and predicts resource demand based on historical metric data to help you implement predictive scaling. This topic describes the business architecture, advantages, and scenarios of AHPA.

Background information

Traditionally, the number of pods of an application is managed in one of three ways: manually specify the number of pods, use HPA, or use CronHPA. The following table describes the disadvantages of each method.


Method	Disadvantage
Manually specify the number of pods	Resources are wasted during off-peak hours. Idle resources are still billed.
HPA	Scaling is reactive and therefore delayed. A scale-out activity is triggered only after the resource usage exceeds the threshold, and a scale-in activity is triggered only after the resource usage drops below the threshold.
CronHPA	You need to specify the number of pods required during each time period. If you specify an excessive number of pods, resources are wasted. If you do not specify sufficient pods, the resource demand cannot be met. You must also manually modify the scaling policy whenever the workload pattern changes.
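
To make the HPA row concrete, the following minimal sketch reproduces the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count scaled by the ratio of the observed metric to the target, rounded up, with a default tolerance of 0.1. The function name and the sample values are illustrative only. The point is that the input is the metric value that has already been observed, which is why HPA can only react after the load has changed.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_value: float,
                         target_value: float,
                         tolerance: float = 0.1) -> int:
    """Replica count per the Kubernetes HPA rule: ceil(current * value / target)."""
    ratio = current_value / target_value
    # Within the tolerance band, HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 pods at 90% average CPU against a 60% target: scale out to 6,
# but only after the 90% reading has been observed.
print(hpa_desired_replicas(4, 90, 60))
```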

ACK clusters provide the AHPA component, which supports predictive scaling. You can use AHPA to increase resource utilization and improve the efficiency of resource management. AHPA analyzes historical data and predicts the number of pods that are required for each minute of the next 24 hours. To achieve the same per-minute granularity with CronHPA, you would have to manually create 1,440 (24 hours × 60 minutes) schedules. The following figure shows the difference between traditional horizontal pod scaling and predictive horizontal pod scaling.

Traditional horizontal pod scaling: Scale-out activities are triggered only after the workload increases. Because of the scaling delay, the system cannot provision pods in time to handle the fluctuating workloads.
Predictive horizontal pod scaling: AHPA learns the pattern of workload fluctuations based on the historical values of specific metrics and on the amount of time that a pod takes to enter the Ready state. This way, AHPA can provision pods in advance so that they are ready before a traffic spike occurs. This ensures that resources are allocated at the earliest opportunity.
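
The idea of provisioning ahead of demand can be sketched as follows. This is not the AHPA algorithm, which this topic does not describe in detail. It is a simplified illustration that assumes a per-minute load forecast for 24 hours (1,440 values), a fixed per-pod capacity, and a known pod startup time in minutes. The names `replicas_plan`, `per_pod_capacity`, and `ready_minutes` are hypothetical.

```python
import math

MINUTES_PER_DAY = 24 * 60  # 1,440 per-minute slots


def replicas_plan(forecast_load, per_pod_capacity, ready_minutes):
    """Return per-minute replica counts, provisioned `ready_minutes` ahead of demand."""
    if len(forecast_load) != MINUTES_PER_DAY:
        raise ValueError("expected one forecast value per minute for 24 hours")
    # Pods needed to serve each minute's forecast load (at least one pod).
    needed = [max(1, math.ceil(load / per_pod_capacity)) for load in forecast_load]
    plan = []
    for minute in range(MINUTES_PER_DAY):
        # Look ahead over the startup window so that pods started now
        # are already Ready when the forecast demand arrives.
        window_end = min(minute + ready_minutes, MINUTES_PER_DAY - 1)
        plan.append(max(needed[minute:window_end + 1]))
    return plan


# Assumed forecast: flat load with a one-hour spike starting at 10:00.
forecast = [100] * MINUTES_PER_DAY
forecast[600:660] = [1000] * 60
plan = replicas_plan(forecast, per_pod_capacity=100, ready_minutes=3)
print(plan[596], plan[597], plan[660])
```

In this sketch the plan steps up three minutes before the spike, which matches the assumed startup time, and drops back once the spike is no longer inside the look-ahead window.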

Business architecture

Various metrics: supports metrics such as CPU, GPU, memory, queries per second (QPS), response time (RT), and external metrics.
Stability: uses proactive prediction, passive prediction, and service degradation to guarantee sufficient resources for applications.
- Proactive prediction: predicts the trend of workload fluctuations based on historical metric data. Proactive prediction is suitable for applications whose workloads periodically fluctuate.
- Passive prediction: predicts workload fluctuations in real time based on current metric data, so that AHPA can deploy resources in response to fluctuations as they occur.
- Service degradation: allows you to specify the maximum and minimum numbers of pods within one or more time periods.
Multiple scaling methods: AHPA can use Knative, HPA, and Deployments to perform scaling.
- Knative: AHPA can help resolve the cold start issue in resource scaling based on concurrency, QPS, or RT in serverless scenarios.
- HPA: AHPA can simplify the configuration of HPA scaling policies and help beginners handle the scaling delay issue.
- Deployment: AHPA can directly use Deployments to perform auto scaling.
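
The three stability mechanisms listed above can be pictured with a small sketch. This topic does not specify how AHPA combines its predictions, so the logic below is an assumption made for illustration: take the larger of the proactive and passive predictions, then clamp the result to the minimum and maximum numbers of pods configured for the current time period. The `Bounds` class, the `decide_replicas` function, and the default limits are hypothetical names and values, not part of the AHPA API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bounds:
    start_minute: int  # inclusive, minutes since midnight
    end_minute: int    # exclusive
    min_replicas: int
    max_replicas: int


def decide_replicas(minute: int,
                    proactive: Optional[int],
                    passive: Optional[int],
                    bounds_list: List[Bounds],
                    default_min: int = 1,
                    default_max: int = 100) -> int:
    """Pick a replica count from two predictions, limited by time-period bounds."""
    lo, hi = default_min, default_max
    for b in bounds_list:
        if b.start_minute <= minute < b.end_minute:
            lo, hi = b.min_replicas, b.max_replicas
            break
    candidates = [p for p in (proactive, passive) if p is not None]
    if not candidates:
        # Assumed fallback: with no prediction available, hold the configured floor.
        return lo
    # Assumed combination rule: favor the larger prediction, then clamp.
    return max(lo, min(hi, max(candidates)))


# Between 09:00 and 18:00, keep the pod count between 5 and 20.
bounds = [Bounds(start_minute=540, end_minute=1080, min_replicas=5, max_replicas=20)]
print(decide_replicas(600, proactive=8, passive=12, bounds_list=bounds))
```

Clamping to per-period bounds means that a bad or missing prediction cannot push the application below the configured minimum or above the configured maximum.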

Advantages

High performance: AHPA can predict workload fluctuations within milliseconds and scale resources within seconds.
High accuracy: By combining proactive prediction and passive prediction, AHPA can learn complex patterns of workload fluctuations with an accuracy higher than 95%.
High stability: AHPA allows you to specify the maximum and minimum numbers of pods required within time periods that are defined at minute-level granularity.

Scenarios

Applications whose workloads periodically fluctuate, such as live streaming, online education, and gaming applications.
Scenarios in which a fixed number of pods are deployed and auto scaling is also used to handle workload fluctuations.
Scenarios in which system recommendations on the number of pods to be provisioned are required. AHPA provides a Kubernetes API that allows you to obtain prediction results. You can integrate the API into your business systems.

References

For more information about how to deploy and use AHPA, see Deploy AHPA.
For more information about how to use AHPA to perform predictive scaling based on GPU metrics, see Use AHPA to perform predictive scaling based on GPU metrics.
For more information about how to use AHPA to enable predictive scaling in Knative, see Use AHPA to enable predictive scaling in Knative.