Three Misconceptions About Kubernetes HPA and How to Avoid the Pitfalls

Elasticity is one of the main advantages of cloud computing. In cloud-native scenarios, Kubernetes provides the Horizontal Pod Autoscaler (HPA), which scales applications out and in based on real-time metrics. However, how HPA actually behaves can differ from intuition, and some common misconceptions exist. This article summarizes three misconceptions that EDAS users often run into when using HPA:

Myth 1: HPA has a scale-up dead zone

Symptom: With Request = Limit and a target utilization above 90%, the workload fails to scale up.

Cause: HPA has a tolerance (10% by default). When the metric deviates from the target by less than the tolerance, HPA ignores the scaling action. With a target utilization of 90%, any actual utilization between 81% and 99% is therefore ignored.
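The dead zone can be reproduced with a few lines of arithmetic. This is a minimal sketch of the tolerance check, not the controller's actual code; the 0.1 default corresponds to the kube-controller-manager `--horizontal-pod-autoscaler-tolerance` flag:

```python
# Sketch of HPA's tolerance check (default tolerance is 0.1).
# Scaling is skipped while current/target stays within [1 - tol, 1 + tol].

def within_tolerance(current_utilization: float, target_utilization: float,
                     tolerance: float = 0.1) -> bool:
    """Return True if HPA would ignore this metric reading."""
    ratio = current_utilization / target_utilization
    return 1.0 - tolerance <= ratio <= 1.0 + tolerance

# With a 90% target, readings from 81% to 99% fall inside the band:
print(within_tolerance(81, 90))   # True  -> no scaling
print(within_tolerance(99, 90))   # True  -> no scaling
print(within_tolerance(100, 90))  # False -> would scale up
```

With Request = Limit, utilization tops out near 100%, so only the narrow 99%–100% band can ever trigger a scale-up at a 90% target.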

Pitfall guide: First, when Request = Limit, avoid setting the target utilization too high, so you do not hit the dead zone. Second, reactive scaling always has some delay, so leave extra headroom to absorb sudden traffic spikes.

Myth 2: Misunderstanding how utilization is calculated, so HPA scales earlier than expected

Symptom: With Limit > Request and a target utilization of 50%, the workload scales up before usage reaches 50% of the Limit.

Cause: HPA calculates utilization against Request, not Limit. When Limit > Request, the actual utilization can even exceed 100%.
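A quick sketch with hypothetical numbers: with Request = 500m CPU and Limit = 1000m, a pod using 400m is already at 80% utilization from HPA's point of view, even though it uses only 40% of its Limit:

```python
# HPA computes utilization against the container's Request, not its Limit.

def hpa_utilization(usage_m: int, request_m: int) -> float:
    """Utilization (%) as HPA sees it: usage / request * 100."""
    return usage_m / request_m * 100

# Hypothetical pod: Request = 500m CPU, Limit = 1000m CPU.
usage = 400  # millicores actually in use

print(hpa_utilization(usage, request_m=500))  # 80.0  -> already past a 50% target
print(usage / 1000 * 100)                     # 40.0  -> only 40% of the Limit
print(hpa_utilization(600, request_m=500))    # 120.0 -> utilization can exceed 100%
```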

Pitfall guide: For critical applications, set Request = Limit so that resources are guaranteed exclusively. For applications that can tolerate resource sharing, do not set the target utilization too high: when cluster resources are tight, a Pod that overuses resources is likely to be killed, causing a service interruption.

Myth 3: Scaling always lags, and the scaling behavior does not match expectations

Symptom: When a metric spikes, HPA does not scale up immediately; the scale-up may happen in several rounds, and the final replica count once the metric stabilizes may differ from the expected number.

Cause: HPA's design means scaling is always reactive and therefore lags the metric. In addition, scaling is governed jointly by the configured scaling behavior and the tolerance. The scaling behavior limits the rate of change, so HPA does not jump to the desired replica count in one step, while the tolerance ignores small metric changes. Across several rounds of scaling, this can make the final replica count differ from the one computed at the start.

Pitfall guide: Read on to understand how HPA works, and configure a reasonable scaling behavior.

Working mechanism of HPA

Before dispelling these misconceptions, let's walk through how HPA works.

As shown in the figure, the HPA controller's scaling workflow consists of four steps:

1. Watch HPA resources. As soon as an HPA resource is created or its configuration changes, the HPA controller detects it and reacts.

2. Fetch the corresponding metric data from the Metrics API. The Metrics Servers behind it fall into three categories:

a. Kubernetes Metrics Server: provides container-level CPU/memory usage

b. Custom Metrics Server: provides metric data from custom resources inside the Kubernetes cluster

c. External Metrics Server: provides metric data from outside the Kubernetes cluster

3. Compute a desired replica count for each metric separately, then take the maximum across all metrics as the workload's desired replica count.

4. Adjust the workload accordingly.

Steps 2-4 run every 15 seconds by default. To change the interval, adjust the kube-controller-manager (KCM) flag --horizontal-pod-autoscaler-sync-period.
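The computation in step 3 can be sketched as follows. The function names are illustrative, not the controller's actual code; the core formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), is the one HPA documents:

```python
import math

# Illustrative sketch of step 3 of the HPA loop: compute a desired replica
# count per metric, then take the maximum across all configured metrics.

def desired_replicas(current: int, metric_value: float, target: float) -> int:
    # Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric)
    return math.ceil(current * metric_value / target)

def reconcile(current, metrics, targets):
    # metrics/targets are hypothetical dicts keyed by metric name.
    return max(desired_replicas(current, metrics[name], targets[name])
               for name in targets)

# Two metrics configured: CPU at a 60% target, QPS at a target of 100.
print(reconcile(4,
                metrics={"cpu": 90.0, "qps": 150.0},
                targets={"cpu": 60.0, "qps": 100.0}))  # 6
```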

Data Sources

As shown in the figure above, HPA currently supports five metric sources and three metric services (Metrics Servers). In brief:

1. Resource: provides CPU/memory usage at the Pod level

2. ContainerResource: provides CPU/memory usage at the container level

3. Object: provides metrics of any resource inside the Kubernetes cluster

4. Pods: provides Pod-related metrics inside the Kubernetes cluster

5. External: provides metric data from outside the Kubernetes cluster

Note that in a self-managed Kubernetes cluster, these three Metrics Servers must be installed separately, and they all run outside of KCM. The following table lists how Metrics Servers are deployed in several kinds of Kubernetes clusters.

Metric Calculation

HPA supports three target types:

1. Value: the raw total

2. AverageValue = Value / current replica count

3. Utilization = AverageValue / Request

Note that utilization is calculated against Request, so HPA may not work properly if Request is not set.
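With a hypothetical metric totalling 300 across 3 pods, each with Request = 100, the three target types relate like this:

```python
# The three HPA target types, computed for a hypothetical metric.
total = 300.0      # Value: raw total across all pods
replicas = 3
request = 100.0    # per-pod Request for the resource

average_value = total / replicas       # AverageValue = Value / replicas
utilization = average_value / request  # Utilization = AverageValue / Request

print(average_value)      # 100.0
print(utilization * 100)  # 100.0 (%)
```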

The figure below shows which target types each of the five metric sources supports. Notably, all metric sources support AverageValue.

For a single metric, the desired replica count is calculated as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), with one refinement:

HPA introduces the concept of tolerance: small jitter around the target value is tolerated and ignored. The rationale is that metric values fluctuate constantly; if minor changes were not ignored, the application would keep scaling up and down, hurting the stability of the whole system.

As shown in the figure below, when the metric value falls inside the pink tolerance band, the desired replica count equals the current replica count. The lower and upper bounds of the band are 0.9× and 1.1× the target value, respectively.
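Putting the formula and the tolerance band together, the single-metric rule can be sketched like this (illustrative, not the controller's actual code):

```python
import math

def single_metric_desired(current: int, metric: float, target: float,
                          tolerance: float = 0.1) -> int:
    """Desired replicas for one metric, with the tolerance band applied."""
    ratio = metric / target
    # Inside the band (0.9x .. 1.1x of the target): keep the current count.
    if 1.0 - tolerance <= ratio <= 1.0 + tolerance:
        return current
    return math.ceil(current * ratio)

print(single_metric_desired(current=5, metric=95, target=90))   # 5 (ignored)
print(single_metric_desired(current=5, metric=120, target=90))  # 7
```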

When multiple metric rules are configured, the final desired replica count is determined as follows:

In one sentence: small fluctuations in a single metric are ignored, the maximum is taken across multiple metrics, and the final replica count is clamped between the configured lower and upper limits.
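The multi-metric rule reduces to a one-liner (minReplicas/maxReplicas are the HPA's configured bounds):

```python
def final_desired(per_metric, min_replicas, max_replicas):
    # Max across all metrics, then clamped to [minReplicas, maxReplicas].
    return min(max(max(per_metric), min_replicas), max_replicas)

# Three metrics recommend 3, 6, and 4 replicas; the HPA allows 2..5.
print(final_desired([3, 6, 4], min_replicas=2, max_replicas=5))  # 5 (capped)
```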

Scaling Behavior

In some cases, metric data jitters frequently and significantly. As shown in the figure below, metric jitter or intermittent traffic dips can push utilization down by more than the tolerance range; yet from the standpoint of application stability, we do not want the application to scale in. To solve this, HPA introduced a configuration that controls how scaling proceeds: the scaling behavior. It was introduced in autoscaling/v2beta2 and requires Kubernetes >= 1.18.

HPA's scaling behavior is split into a scale-up behavior and a scale-down behavior, each consisting of three parts:

• Stabilization window: looks at the desired replica counts computed over a recent period and picks the extreme value as the final result, keeping the system stable within that window. The minimum is taken when scaling up and the maximum when scaling down.

• Scaling policies: limit how much the replica count may change within a period. A policy consists of three parts: a type, a value, and a period. Note that this period is different from the stabilization window above: it defines how far back to look when computing how much the replica count has already changed.

• Policy selection (selectPolicy): chooses among the results of multiple scaling policies. Three options are supported: Max, Min, and Disabled.
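The three parts can be sketched together. This simulates one scale-down decision with a stabilization window and two policies; all names and values are illustrative, not the controller's actual code:

```python
# Illustrative sketch of HPA scaling behavior for a scale-down decision.

def stabilized(recommendations, scaling_up):
    # Stabilization window: min of recent recommendations when scaling up,
    # max when scaling down (the conservative extreme in each direction).
    return min(recommendations) if scaling_up else max(recommendations)

def policy_limit(current, policies, select="Max"):
    # Each policy caps how many pods may be removed within its period.
    # Type "Pods": at most `value` pods; "Percent": at most value% of current.
    limits = [value if ptype == "Pods" else current * value // 100
              for ptype, value in policies]
    return max(limits) if select == "Max" else min(limits)

current = 10
window = [4, 6, 5]  # desired counts computed over the stabilization window
target = stabilized(window, scaling_up=False)                    # 6 (the max)
allowed = policy_limit(current, [("Pods", 2), ("Percent", 50)])  # remove up to 5
print(max(target, current - allowed))  # 6: the more conservative result wins
```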

Review and summary

At this point we have a rough picture of how HPA works. Used properly, HPA can significantly improve resource utilization. Along the way we have collected some precautions; keep these in mind to avoid the pitfalls when using HPA.

1. HPA's design means it can only scale reactively in response to metrics, so scaling lag is inherent. Alibaba Cloud Container Service now offers AHPA, which adds prediction capability and can effectively reduce this lag.

2. HPA calculates utilization against Request, so it is normal for actual or target utilization to exceed 100%. Before configuring a high target utilization, plan cluster resources carefully and review the corresponding risks.

3. HPA's tolerance mitigates the system oscillation caused by metric fluctuation, but it also introduces the scale-up dead zone, which operators need to work around.

4. HPA's design allows various metric types to be plugged in, each requiring the corresponding Metrics Server to be developed or installed. For example, EDAS provides users with microservice RT and QPS metrics.

5. HPA always has a scaling behavior; even without explicit configuration there is a default. The scale-up stabilization window defaults to 0. If noisy data often triggers unnecessary scale-ups, set a short scale-up stabilization window to filter out spikes.

6. A single HPA supports multiple metrics. Do not configure multiple HPAs for one application; they will interfere with each other and cause the application to thrash.

In cloud-native scenarios, elasticity is richer and the metrics that drive it can be customized for the business. An application PaaS platform (such as the Enterprise Distributed Application Service, EDAS) can combine the cloud vendors' underlying compute, storage, and networking capabilities to lower the cost of using the cloud, though this brings some challenges for business applications (such as statelessness and decoupling configuration from code). More broadly, this is the challenge the cloud-native era poses to application architecture: the more cloud-native applications become, the closer the technology dividend of the cloud comes to us.
