
Container Service for Kubernetes:Fine-tune HPA scaling behavior

Last Updated:Feb 05, 2026

Optimize your Kubernetes Horizontal Pod Autoscaler (HPA) behavior to achieve responsive scaling that matches your application's performance requirements. Configure the behavior section to customize scale-in (scaleDown) and scale-out (scaleUp) policies for different workload patterns. This enables rapid response to traffic spikes, prevents unnecessary scaling during metric fluctuations, and ensures stability for mission-critical applications.

Prerequisites

  • Ensure your ACK cluster runs Kubernetes version 1.23 or later, which includes stable HPA behavior support. For cluster upgrade instructions, see Manually update ACK clusters.

  • HPA must be enabled in your cluster. For setup instructions, see Implement horizontal pod autoscaling.

  • When using kubectl, ensure that you use HPA API version autoscaling/v2, which supports the behavior field.

Understanding HPA behavior configuration

The behavior section in HPA configuration allows you to customize scaling policies for both scale-in and scale-out operations. This optional configuration helps prevent scaling thrashing, optimize resource utilization, and ensure application stability during varying load conditions.

The following example shows a complete HPA configuration with default behavior settings. You can modify individual fields based on your specific requirements. Unspecified fields will use Kubernetes default values.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: optimized-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-application
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:  # Custom behavior configuration
    scaleDown:  # Scale-in policies
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 5
        periodSeconds: 60
      - type: Percent
        value: 10
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:  # Scale-out policies
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

In this configuration, the scaleDown section implements conservative scale-in behavior with a 300-second stabilization window. The HPA controller evaluates scale-in recommendations over the past 5 minutes and selects the maximum recommended replica count, which prevents premature scale-in. It then applies two policies: removing up to 5 pods or 10% of current pods every 60 seconds, choosing whichever removes fewer pods (selectPolicy: Min).

The scaleUp section enables immediate scaling response with stabilizationWindowSeconds: 0. Two aggressive policies allow rapid scale-out: increasing by 100% of current pods or adding 4 pods every 15 seconds. The controller selects the policy that adds more pods (selectPolicy: Max) for maximum responsiveness.
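The policy arithmetic described above can be sketched in a few lines. The following is a simplified model of how selectPolicy combines Percent and Pods policies for a single scaling period, not the actual HPA controller code:

```python
# Simplified sketch (not the real HPA controller) of how selectPolicy
# picks among multiple scaling policies for one period.

def allowed_change(current, policies, select_policy):
    """Return how many pods one scaling period may add or remove.

    policies: list of (type, value) tuples, e.g. [("Percent", 100), ("Pods", 4)]
    select_policy: "Min", "Max", or "Disabled"
    """
    if select_policy == "Disabled":
        return 0
    changes = []
    for ptype, value in policies:
        if ptype == "Percent":
            changes.append(current * value // 100)  # percent of current replicas
        else:  # "Pods"
            changes.append(value)
    return max(changes) if select_policy == "Max" else min(changes)

# scaleUp from the example: +100% of current pods or +4 pods, selectPolicy: Max
print(allowed_change(2, [("Percent", 100), ("Pods", 4)], "Max"))   # 4
# scaleDown from the example: -5 pods or -10% of current pods, selectPolicy: Min
print(allowed_change(20, [("Percent", 10), ("Pods", 5)], "Min"))   # 2
```

With 20 replicas, the scale-in policies allow removing 5 pods or 10% (2 pods); selectPolicy: Min picks the smaller change, so at most 2 pods are removed per 60-second period.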

The following list describes the key behavior configuration fields:

  • stabilizationWindowSeconds: Defines the time window (in seconds) for evaluating scaling recommendations to prevent thrashing. For scale-down, the controller selects the maximum recommended replica count within this window. For scale-up, it selects the minimum recommended count. Set to 0 for immediate scaling.

  • policies: Defines one or more scaling policies. Each policy specifies a type (Percent or Pods) and a value that determines how many pods can be added or removed within the specified periodSeconds.

  • selectPolicy: Determines which policy to apply when multiple policies are defined. Options: Min (conservative), Max (aggressive), or Disabled (prevents scaling in that direction). For scale-down, Min removes fewer pods; for scale-up, it adds fewer pods.

  • periodSeconds: Specifies the time interval (in seconds) over which each policy's value is applied. Controls the rate of scaling operations.
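The stabilization window can be pictured as the controller keeping recent desired-replica recommendations and choosing the most conservative one. A minimal sketch, not the actual controller implementation:

```python
# Minimal sketch of the stabilization window: keep the recommendations
# computed within the window and pick the most conservative one.

def stabilized_recommendation(recommendations, direction):
    """recommendations: desired replica counts computed within the window.
    For scaleDown the highest value wins (delays pod removal);
    for scaleUp the lowest value wins (damps metric spikes)."""
    return max(recommendations) if direction == "scaleDown" else min(recommendations)

# Metrics briefly dipped (recommending 6 replicas), then recovered:
print(stabilized_recommendation([10, 6, 9, 10], "scaleDown"))  # 10, so no scale-in yet
```

Because the temporary dip to 6 is outvoted by the later recommendations of 9 and 10, no pods are removed until low demand persists for the whole window.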

The following sections provide optimized behavior configurations for common scaling scenarios. Choose the approach that best matches your application's requirements and performance characteristics.

Scenario 1: Rapid scale-out for burst workloads

Use this configuration for applications experiencing sudden traffic spikes such as flash sales, product launches, or promotional campaigns:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 500  # 5x current pods
      periodSeconds: 30
    - type: Pods
      value: 10   # Additional 10 pods
      periodSeconds: 30
    selectPolicy: Max

This configuration enables extremely rapid scaling by allowing 500% growth or 10 additional pods every 30 seconds. With an initial replica count of 2, scaling progression would be: 2 → 12 → 72 → 432 pods. The controller selects the more aggressive policy (Max) to maximize responsiveness while respecting maxReplicas limits.
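The quoted progression can be reproduced with a small calculation. This is a simplified sketch of the selectPolicy: Max arithmetic, not the controller's actual implementation; the max_replicas cap is illustrative because Scenario 1 does not set one:

```python
# Reproduces the progression 2 -> 12 -> 72 -> 432 under the Scenario 1
# policies: +500% of current pods or +10 pods every 30 seconds, selectPolicy: Max.

def step(current, max_replicas):
    # selectPolicy: Max picks the larger allowance of the two policies
    grow = max(current * 500 // 100, 10)
    # the HPA never scales beyond maxReplicas
    return min(current + grow, max_replicas)

replicas, history = 2, [2]
for _ in range(3):  # three 30-second periods
    replicas = step(replicas, max_replicas=1000)  # illustrative cap
    history.append(replicas)
print(history)  # [2, 12, 72, 432]
```

In the first period both policies allow +10 pods; from 12 replicas onward the Percent policy dominates, which is why growth accelerates so sharply.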

Scenario 2: Balanced scaling with conservative scale-in

This configuration provides rapid response to traffic increases while preventing premature scale-in that could impact user experience:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 200  # Double current pods
      periodSeconds: 30
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 600  # 10-minute evaluation window
    policies:
    - type: Pods
      value: 1
      periodSeconds: 300  # Remove 1 pod every 5 minutes
    selectPolicy: Min

Scale-out occurs immediately with up to 200% growth every 30 seconds. Scale-in requires sustained low metrics for 10 minutes before removing one pod every 5 minutes, ensuring temporary dips don't trigger unnecessary scaling.
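As a back-of-the-envelope check on this scale-in pace, the following rough lower bound (not controller code; the start and end replica counts are illustrative) shows how slowly capacity drains:

```python
import math

# Rough lower bound on scale-in duration: the stabilization window must
# elapse once, then each batch of removed pods costs one policy period.
def min_scale_in_seconds(from_replicas, to_replicas,
                         stabilization_window=600, pods_per_period=1, period=300):
    removals = from_replicas - to_replicas
    periods = math.ceil(removals / pods_per_period)
    return stabilization_window + periods * period

# Shrinking from 12 pods to 2 takes at least an hour at this pace
print(min_scale_in_seconds(12, 2) / 60)  # 60.0 minutes
```

At one pod per 5 minutes, even a large over-provisioned deployment drains gradually, giving traffic plenty of time to return before capacity is gone.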

Scenario 3: Prevent scale-in for critical applications

Use this configuration for stateful applications, batch processing jobs, or services handling long-running transactions where scale-in could disrupt operations:

behavior:
  scaleDown:
    selectPolicy: Disabled  # Completely disable scale-in
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
    selectPolicy: Max

This configuration disables all scale-in operations while maintaining responsive scale-out capability. The application can grow to handle increased load but will maintain its current replica count during reduced demand, ensuring continuous service availability.

Scenario 4: Cost-optimized scaling with controlled growth

This configuration balances performance needs with resource constraints and budget considerations:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 120  # Wait 2 minutes before scaling
    policies:
    - type: Percent
      value: 50   # 50% growth
      periodSeconds: 120
    - type: Pods
      value: 2    # 2 additional pods
      periodSeconds: 120
    selectPolicy: Min  # Conservative growth
  scaleDown:
    stabilizationWindowSeconds: 900  # 15-minute evaluation
    policies:
    - type: Percent
      value: 25   # Remove 25% of pods
      periodSeconds: 600
    - type: Pods
      value: 3    # Remove up to 3 pods
      periodSeconds: 600
    selectPolicy: Min

This conservative approach prevents resource exhaustion during traffic spikes while ensuring gradual scale-down. The 2-minute scale-up delay prevents overreaction to temporary load increases, and the 15-minute scale-down evaluation ensures sustained low demand before reducing capacity.
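The effect of selectPolicy: Min on growth can be illustrated numerically. This is a sketch with an illustrative starting replica count, not controller code:

```python
# Scenario 4 scale-up: selectPolicy: Min applies the smaller allowance
# of +50% of current pods or +2 pods per 120-second period.

def conservative_growth(current):
    return current + min(current * 50 // 100, 2)

replicas = 10  # illustrative starting count
for _ in range(3):  # three 120-second periods
    replicas = conservative_growth(replicas)
print(replicas)  # 16
```

Once the deployment has 4 or more replicas, 50% growth always exceeds 2 pods, so selectPolicy: Min effectively caps growth at 2 pods per period, keeping spend predictable.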

Scenario 5: Adaptive scaling for unpredictable workloads

For applications with variable traffic patterns, use this flexible configuration that adapts to different load conditions:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
    - type: Pods
      value: 3
      periodSeconds: 45
    - type: Percent
      value: 75
      periodSeconds: 45
    selectPolicy: Max  # Choose more aggressive policy
  scaleDown:
    stabilizationWindowSeconds: 600
    policies:
    - type: Pods
      value: 2
      periodSeconds: 300
    - type: Percent
      value: 20
      periodSeconds: 300
    selectPolicy: Min  # Choose more conservative policy

This configuration provides responsive scale-out (adding 3 pods or 75% growth every 45 seconds) while maintaining cautious scale-in behavior (removing 2 pods or 20% every 5 minutes). The different selectPolicy values ensure aggressive growth during demand spikes but conservative reduction during lulls.

Best practices and monitoring

Monitoring HPA behavior

Regularly monitor your HPA configuration using these commands:

# View current HPA status and recent scaling events
kubectl describe hpa <hpa-name>

# Monitor HPA metrics and decisions
kubectl get hpa <hpa-name> -o yaml

# Check pod resource usage
kubectl top pods -l <label-selector>

Testing recommendations

  • Simulate realistic load patterns in staging environments before applying configurations to production

  • Validate that chosen metrics accurately reflect your application's performance requirements

  • Test edge cases including rapid traffic spikes and sustained low utilization scenarios

Additional resources

  • Consider combining HPA with node scaling to ensure adequate cluster resources

  • Evaluate using Vertical Pod Autoscaler (VPA) alongside HPA for optimal resource allocation

  • Explore custom metrics and external adapters for more sophisticated scaling triggers