After you enable horizontal pod autoscaling (HPA) for your cluster, if the default scaling behavior cannot meet your requirements, you can modify the behavior setting to fine-tune the scale-in (scaleDown) and scale-out (scaleUp) behavior. For example, you can quickly scale out when business traffic spikes, scale in or out to handle fluctuating workloads, or forbid scale-in for status-sensitive applications.
Precautions
The behavior setting reached the Stable state in Kubernetes 1.23. Make sure that your cluster runs Kubernetes 1.24 or later and that HPA is enabled for the cluster. For more information, see Implement horizontal pod autoscaling. For more information about how to update a cluster, see Manually update ACK clusters.
If you use kubectl to manually deploy the HPA controller, make sure that the version of the HPA API is v2beta2 or later.
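Upstream Kubernetes no longer serves the autoscaling/v2beta2 API as of version 1.26, whereas autoscaling/v2 has been stable since 1.23 and accepts the same behavior fields. The following is a minimal sketch of an HPA declared with autoscaling/v2. The CPU metric and the 60% target are illustrative assumptions and are not taken from this topic.

```yaml
apiVersion: autoscaling/v2        # stable API version; assumed to be served by your cluster
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60    # illustrative threshold, not a recommendation
```

The behavior section, if you add one, goes under spec at the same level as metrics.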
How to configure the behavior setting
The behavior setting in the HPA configuration is optional. You can configure this setting to fine-tune the scaling behavior. The scaleDown and scaleUp fields in the behavior setting control scale-in and scale-out respectively. Tuning them helps you avoid resource exhaustion or resource shortage, improve resource utilization, and optimize application performance.
The following code block is an example of the HPA configuration that contains the behavior setting. The behavior setting is set to the default in this example. You can modify the values of the existing fields on demand. If you do not specify the value of a field, the default setting is used.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - pods:
      metric:
        name: http_requests_per_second
      target:
        averageValue: 50
        type: AverageValue
    type: Pods
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  behavior: # The default behavior setting.
    scaleDown: # Modify this field to fine-tune the scale-in behavior.
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp: # Modify this field to fine-tune the scale-out behavior.
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
In the sample YAML file, the scaleDown field defines a stabilization window (stabilizationWindowSeconds) of 300 seconds and defines the scaling policy in the policies section. With this stabilization window, the HPA controller considers all recommendations about the number of replicated pods from the previous 5 minutes and uses the greatest value. This avoids flapping in the number of pods. The policy allows the HPA controller to remove at most 100% of the current replicated pods every 15 seconds, so the workload can be scaled in to the recommended number of pods in a single step.
The scaleUp field defines a stabilization window of 0 seconds, which indicates that the HPA controller performs scale-out operations immediately. Two scale-out policies are defined: one adds at most 100% of the current number of replicated pods every 15 seconds, and the other adds at most four replicated pods every 15 seconds. Because selectPolicy is set to Max, the HPA controller selects the policy that adds more pods.
The following table describes the fields supported by scaleDown and scaleUp.
Field | Description |
stabilizationWindowSeconds | The stabilization window used to avoid flapping in the number of replicated pods when the value of the scaling metric keeps fluctuating. The HPA controller considers all recommendations about the number of replicated pods within the stabilization window. If you set this field to 0, the HPA controller acts on the latest recommendation immediately. |
policies | One or more scaling policies. Each policy consists of type, value, and periodSeconds. |
selectPolicy | The policy to use when multiple scaling policies are available. Valid values: Max, Min, and Disabled. |
The following sections describe how to fine-tune the behavior setting based on different scenarios.
Quick scale-out
During events such as flash sales and product releases, use the following configuration to handle sudden workload spikes:
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 15
This configuration allows the HPA controller to add up to 900% of the current number of pods every 15 seconds, which is a tenfold increase per period, subject to the maxReplicas limit. If the initial number of pods is 1, the scale-out conditions are consistently met, and maxReplicas is at least 1000, the number of pods changes as follows:
1 -> 10 -> 100 -> 1000
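The progression stops at maxReplicas, so the upper limit must be raised for the last step to be reachable. The following fragment is a sketch of how the policy sits under spec next to that limit. The maxReplicas value of 1000 is an assumption chosen to match the progression, not a recommended setting.

```yaml
spec:
  maxReplicas: 1000          # assumed ceiling; scale-out never exceeds this value
  behavior:
    scaleUp:
      policies:
      - type: Percent
        value: 900           # add at most 900% of the current pods
        periodSeconds: 15    # per 15-second period
```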
Quick scale-out and slow scale-in
A traffic spike can arrive shortly after a scale-in activity. To keep the application stable and responsive in this case, use the following configuration to scale out quickly and scale in slowly.
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # Remove a pod every 10 minutes.
During a scale-out activity, the HPA controller adds at most 900% of the current number of pods every 60 seconds. During a scale-in activity, if the metric value consistently remains below the threshold, the HPA controller removes at most one pod every 600 seconds (10 minutes).
Forbid scale-in
Scale-in activities can interrupt key tasks or status-sensitive applications, or cause workloads to migrate. To keep such applications available and stable, use the following configuration to forbid scale-in.
behavior:
  scaleDown:
    selectPolicy: Disabled
Extend or reduce the scale-in time window
In scenarios where resources or budget are limited, fluctuating workloads can trigger repeated scale-in and scale-out activities that waste resources or increase costs. You can use stabilizationWindowSeconds to control how frequently scale-in is triggered by fluctuating workloads.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    policies:
    - type: Pods
      value: 5
      periodSeconds: 600
In the preceding configuration, even if the metric value drops below the scale-in threshold, the HPA controller does not perform a scale-in activity immediately. Instead, it considers all recommendations from the previous 600 seconds (10 minutes) and uses the greatest one, so it scales in only if the metric value stays below the threshold for the entire window. The policy then limits scale-in to at most five pods every 600 seconds.
To scale in immediately when the scale-in condition is met, set stabilizationWindowSeconds to 0.
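The immediate scale-in case can be sketched as the following fragment. The Percent policy and its values are illustrative assumptions that cap how fast pods are removed once the window no longer delays the decision.

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 0   # act on the latest recommendation without waiting
    policies:
    - type: Percent
      value: 50                     # illustrative: remove at most 50% of the current pods
      periodSeconds: 60             # illustrative: per 60-second period
```

Without a stabilization window, short dips in the metric can cause the number of pods to flap, so a rate-limiting policy such as the one above is usually worth keeping.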
Use multiple scale-out policies
If the growth pattern of your business traffic is unpredictable, you can use the following configuration to define multiple scale-out policies that handle traffic fluctuations.
behavior:
  scaleUp:
    policies:
    - type: Pods # Scale out based on the number of pods.
      value: 4
      periodSeconds: 60
    - type: Percent # Scale out based on the specified percentage.
      value: 50
      periodSeconds: 60
    selectPolicy: Max
In the preceding configuration, scaleUp defines two policies.
One policy adds at most four pods every minute.
The other policy adds at most 50% of the current number of pods every minute.
selectPolicy is set to Max, which indicates that the HPA controller selects the policy that adds the most pods.
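For contrast, the same two policies can be combined with selectPolicy set to Min, which makes the HPA controller select the policy that allows the smallest change. The following fragment is a sketch of that variant and is not a recommended setting.

```yaml
behavior:
  scaleUp:
    policies:
    - type: Pods
      value: 4             # add at most four pods per minute
      periodSeconds: 60
    - type: Percent
      value: 50            # add at most 50% of the current pods per minute
      periodSeconds: 60
    selectPolicy: Min      # choose the policy that adds the fewest pods
```

For example, with 10 current pods, the Pods policy allows 4 more pods and the Percent policy allows 5 more. Max selects the Percent policy, whereas Min selects the Pods policy.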
References
We recommend that you use HPA together with node scaling to ensure that all pods have sufficient compute resources. For more information, see Overview of node scaling.
If you have any questions when using HPA, see Workload scaling FAQ.