Opening the Door to Elastic Prediction in Kubernetes


Users' expectations of cloud elasticity keep rising, driven mainly by two trends. The first is the rise of the cloud-native concept: from the VM era to the container era, the way the cloud is used has changed. The second is the emergence of new business models that are built on the cloud from day one and therefore have a natural demand for elasticity.

With the cloud, users no longer need to build infrastructure from physical machines and server rooms; the cloud provides a highly flexible foundation. Its biggest advantage is elastic resource supply, and in the cloud-native era users' demand for elasticity is growing ever stronger: in the VM era, scaling was a manual, minute-level operation, while in the container era it has reached the second level. Facing different business scenarios, users' expectations of the cloud are also changing:

• Cyclical business scenarios: New businesses such as live streaming, online education, and gaming share one obvious characteristic: strong periodicity. This pushes customers to rethink their architecture for elasticity, and under the cloud-native model it is natural to spin up a batch of instances on demand and release them when the peak passes.

• The arrival of Serverless: The core idea of Serverless is on-demand usage with automatic elasticity, so users need no capacity planning. In practice, however, problems such as elastic lag and cold starts appear, which is unacceptable for latency-sensitive services.

Given these scenarios, can the existing elastic solutions in Kubernetes solve the problem?

Problems with traditional elastic solutions

Generally, there are three ways to manage the number of application instances in Kubernetes: a fixed instance count, HPA, and CronHPA. A fixed instance count is the most common, and its biggest problem is obvious resource waste during business troughs. HPA was introduced to solve this waste, but HPA's elastic trigger is reactive and therefore delayed, which lags resource supply; failing to supply resources in time can degrade business stability. CronHPA can scale at fixed times, which seems to solve the lag problem, but how fine should the schedule granularity be, and must the timed policy be manually adjusted every time the traffic pattern changes? Doing so adds heavy operations complexity and is error-prone.
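For reference, the two traditional approaches might be configured as follows. The names, thresholds, and schedules are illustrative; the CronHPA format follows the open-source kubernetes-cronhpa-controller, so verify the exact schema against its documentation:

```yaml
# Reactive HPA: scales only after CPU utilization has already exceeded the target.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # illustrative target workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
# Timed CronHPA: every schedule must be maintained by hand as traffic shifts.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: web-cronhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  jobs:
  - name: scale-up-before-peak
    schedule: "0 0 19 * * *"   # 19:00 every day
    targetSize: 20
  - name: scale-down-after-peak
    schedule: "0 0 23 * * *"   # 23:00 every day
    targetSize: 2
```

The HPA half reacts after load arrives (the lag problem), while the CronHPA half scales on time but hard-codes traffic assumptions into cron expressions (the maintenance problem).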

AHPA elastic prediction

The main idea of AHPA elastic prediction is to do "timed planning" based on the detected period and scale out in advance according to that plan. Since any plan can have omissions, AHPA must also be able to adjust the planned instance count in real time. The scheme therefore has two elastic strategies: active prediction and passive prediction. Active prediction uses the RobustPeriod algorithm [1] from Alibaba DAMO Academy to identify the period length, then uses the RobustSTL algorithm [2] to extract the periodic trend and predict the number of instances the application will need in the next period. Passive prediction sets the instance count based on the application's real-time data, which copes well with sudden traffic. In addition, AHPA adds a bottom-line protection policy: users can set upper and lower bounds on the instance count. The instance count that finally takes effect is the maximum of the active prediction, the passive prediction, and the bottom-line bound.
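The combination rule above can be sketched as a small function. This is an illustration of the stated rule, not AHPA's actual implementation; clipping to the upper bound is an assumption drawn from the "upper and lower bounds" statement:

```python
def effective_replicas(active: int, passive: int,
                       lower: int, upper: int) -> int:
    """Illustrative AHPA combination rule: take the maximum of the active
    prediction, the passive prediction, and the lower bound, then clip
    the result to the configured upper bound."""
    return min(max(active, passive, lower), upper)
```

For example, with an active prediction of 8, a passive prediction of 3, and bounds of [2, 10], the effective count is 8; if both predictions fall below the lower bound, the lower bound protects the workload.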


Elasticity must be carried out on the premise of business stability. Its core purpose is not only to help users save costs, but also to enhance overall business stability, reduce the operations burden, and free teams to build core competitiveness. The basic principles of the AHPA architecture design are:

• Stability: scale elastically while keeping the user's service stable

• Operations-free: add no extra operations burden for users; no new controller is required on the user side, and the Autoscaler configuration semantics are clearer than HPA's

• Serverless-oriented: the design is user- and application-centric at the Pod dimension, independent of K8s node utilization; it can be assumed that users run ECI Pods. It follows elastic best practices for the Serverless (nodeless) scenario and enhances ASK's ability to run long-running workloads

The architecture is as follows:

• Rich data metrics: including CPU, Memory, QPS, RT, and external metrics

• Stability guarantee: AHPA's elastic logic combines active warm-up with a passive fallback strategy, plus degradation protection, to ensure resource stability.

• Active prediction: predicts the trend for a future period based on history; suitable for periodic applications.

• Passive prediction: real-time prediction; for sudden-traffic scenarios, resources are prepared in real time.

• Degradation protection: supports configuring maximum and minimum instance counts over multiple time ranges.

• Multiple scaling methods: AHPA supports scaling via Knative, HPA, and Deployment:

• Knative: solves the elastic cold-start problem based on concurrency/QPS/RT in Serverless application scenarios

• HPA: simplifies HPA elasticity policy configuration, lowers the barrier to using elasticity, and solves the cold-start problem when using HPA

• Deployment: uses a Deployment directly for automatic scale-out and scale-in
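For the Deployment method, an AHPA custom resource might look like the sketch below. The field names follow published ACK AHPA examples, but treat names such as `scaleStrategy` and `instanceBounds`, along with all values, as assumptions to verify against the current Alibaba Cloud documentation:

```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo              # illustrative name
spec:
  scaleStrategy: auto          # apply predictions automatically
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # illustrative target workload
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  # Degradation protection: min/max instance bounds per time range.
  instanceBounds:
  - startTime: "2024-01-01 00:00:00"
    endTime: "2025-01-01 00:00:00"
    bounds:
    - cron: "* 0-8 ? * MON-FRI"    # working-day off-peak hours
      minReplicas: 2
      maxReplicas: 10
```

The `instanceBounds` section corresponds to the degradation-protection item above: even if a prediction is wrong, the effective replica count stays inside the configured bounds for that time range.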

Applicable scenarios

AHPA is suitable for the following scenarios:

• Workloads with obvious periodicity

• A fixed instance count plus an elastic portion

• Scenarios that need a recommended instance count

Prediction effect

After AHPA elasticity is enabled, a visual page is provided to view its effect. The following is an example of prediction based on the CPU metric, compared with HPA:


• Predict CPU Observer: blue shows the actual CPU usage under HPA, and green shows the predicted CPU usage. The green curve staying above the blue indicates that the predicted capacity is sufficient.

• Predict POD Observer: blue shows the actual number of Pods scaled out by HPA, and green shows the predicted number of Pods. The green curve staying below the blue indicates that predictive elasticity uses fewer Pods.

• Periodicity: based on 7 days of historical data, the prediction algorithm detects that the application is periodic.

Conclusion: the results show that the predicted elasticity trend meets expectations.
