You can enable the Horizontal Pod Autoscaler (HPA) feature to automatically scale pods based on CPU utilization, memory usage, or other metrics. HPA can quickly scale out replicated pods to handle surges in workload and scale in appropriately to save resources when workloads decrease. The entire process is automated and requires no human intervention. It is ideal for services with large traffic fluctuations, large numbers of services, or frequent scaling requirements, such as e-commerce, online education, and financial services.
Before you begin
To help you better use the HPA feature, we recommend that you read the Kubernetes official documentation Horizontal Pod Autoscaling to understand the basic principles, algorithm details, and scaling configurations of HPA before reading this topic.
In addition, Container Service for Kubernetes (ACK) clusters provide various workload scaling solutions for scheduling layer elasticity and node scaling solutions for resource layer elasticity. We recommend that you read the Auto scaling overview to understand the applicable scenarios and usage limits of different solutions before using the HPA feature.
Prerequisites
An ACK managed cluster or ACK dedicated cluster is created. For more information, see Create a cluster.
If you use the kubectl command to implement HPA, you must make sure that a kubectl client is connected to the Kubernetes cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Create an application that has HPA enabled in the ACK console
ACK is integrated with HPA. You can create an application that has HPA enabled in the ACK console. You can enable HPA when you create an application or for an existing application. We recommend that you create only one application that has HPA enabled for a workload.
Enable HPA when you create an application
The following example describes how to enable HPA when you create a Deployment. The steps for other workload types are similar.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, click Create from Image.
On the Create page, enter the basic information, container configuration, service configuration, and scaling configuration as prompted to create a Deployment that supports HPA.
For more information about specific steps and configuration parameters, see Create a stateless application by using a Deployment. The following list describes the key parameters.
Basic Information: Set the information of the application, such as the name and number of replicas.
Container: Select the image and the required CPU and memory resources.
You can use the resource profiling feature to analyze historical data of resource usage and get recommendations for configuring container requests and limits. For more information, see Resource profiling.
Important: You must configure the request resources required by the application. Otherwise, you cannot enable HPA.
Advanced:
In the Access Control section, click Create to the right of Services to set the parameters.
In the Scaling section, select Enable for HPA and configure the scaling threshold and related parameters.
Metrics: Select CPU Usage or Memory Usage, which must be the same as the one you have specified in the Required Resources field. If both CPU Usage and Memory Usage are specified, HPA will perform scaling operations when any one of the metrics reaches the scaling threshold.
Condition: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
Max. Replicas: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
Min. Replicas: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
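The console settings above map onto a standard Kubernetes HorizontalPodAutoscaler object. As an illustration (the names and values below are assumptions, not taken from this topic), a policy that scales a Deployment between 2 and 10 replicas at 70% average CPU utilization could look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # Corresponds to the HPA policy created in the console
spec:
  scaleTargetRef:          # The workload that the HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: web              # Hypothetical Deployment name
  minReplicas: 2           # Min. Replicas in the console
  maxReplicas: 10          # Max. Replicas in the console
  metrics:
  - type: Resource
    resource:
      name: cpu            # Metrics: CPU Usage
      target:
        type: Utilization
        averageUtilization: 70   # Condition: the resource usage threshold
```

From these settings, HPA computes the desired replica count as ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped between minReplicas and maxReplicas.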
Enable HPA for an existing application
The following example describes how to enable HPA for an existing Deployment. The steps for other workload types are similar.
Use the workload page
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, click the name of the target application. Click the Pod Scaling tab, and then click Create to the right of HPA.
In the Create dialog box, configure the HPA settings as prompted.
Name: Enter a name for the HPA policy.
Metric: Select CPU Usage or Memory Usage, which must be the same as the one you have specified in the Required Resources field. If both CPU Usage and Memory Usage are specified, HPA will perform scaling operations when any one of the metrics reaches the scaling threshold.
Threshold: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
Max. Containers: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
Min. Containers: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
Use the workload scaling page
This page is available only to users in the whitelist. To use it, submit a ticket to apply.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, open the workload scaling page.
In the upper right corner of the page, click Create Auto Scaling, select the target workload, select HPA on the HPA and CronHPA tab, and configure the HPA policy as prompted.
Scaling Policy Name: Enter a name for the HPA policy.
Min. Containers: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
Max. Containers: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
Scaling Metric: Select CPU or memory, which must match the resource types specified in the required resources. If both CPU and memory are specified, HPA performs scaling operations when either metric reaches its scaling threshold.
Threshold: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
Result verification
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, open the workload scaling page.
Click the Horizontal Scaling tab, and then select HPA to view the scaling status and task list.
After the application starts to run, the number of pods is automatically scaled based on the pod load. You can also verify that HPA works in a staging environment by performing a CPU stress test on the pods of the application.
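One lightweight way to perform such a stress test, assuming the application is exposed through a Service named nginx (a hypothetical name), is to run a temporary load-generator pod that requests the Service in a loop:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: load-generator      # Temporary pod used only to generate load for the test
spec:
  restartPolicy: Never
  containers:
  - name: load-generator
    image: busybox
    # Continuously request the Service to drive up CPU utilization on the backend pods.
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://nginx; done"]
```

Delete the pod after the test, and watch the HPA status on the Pod Scaling tab (or with kubectl get hpa) to observe the replica count change.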
Create an application that has HPA enabled by using kubectl
You can also create an HPA by using an orchestration template and associate the HPA with the Deployment for which you want to enable HPA. Then, you can run kubectl commands to enable HPA. We recommend that you create only one application that has HPA enabled for a workload. In the following example, HPA is enabled for an NGINX application.
Create a file named nginx.yml and copy the following content to the file.
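The content of nginx.yml is not reproduced in this topic; a minimal sketch of an NGINX Deployment with CPU requests configured (the image tag and resource values are illustrative assumptions) could look like the following:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx               # Referenced later by scaleTargetRef in the HPA
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # Illustrative image tag
        resources:
          requests:
            cpu: 250m       # Required: HPA computes utilization against this request
            memory: 256Mi
```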
Important: You must configure the request resources required by the application. Otherwise, you cannot enable HPA. You can use the resource profiling feature to analyze historical data of resource usage and get recommendations for configuring container requests and limits. For more information, see Resource profiling.
Run the following command to create an NGINX application:
kubectl apply -f nginx.yml
Create a file named hpa.yml and copy the following content to the file to create an HPA.
Use the scaleTargetRef parameter to associate the HPA with the nginx Deployment and trigger scaling operations when the average CPU utilization of all containers in the pods reaches 50%.
YAML template for clusters whose Kubernetes versions are 1.24 and later:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1   # The minimum number of pods that must run for the Deployment. The value must be an integer greater than or equal to 1.
  maxReplicas: 10  # The maximum number of pods to which the Deployment can be scaled. The value must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # The target average utilization: the ratio of the average resource usage to the requested amount.
YAML template for clusters whose Kubernetes versions are earlier than 1.24
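The pre-1.24 template is not reproduced in this topic. Based on the Kubernetes API history (an assumption, not stated here), such clusters typically serve the autoscaling/v2beta2 API, and the manifest differs only in its apiVersion:

```yaml
apiVersion: autoscaling/v2beta2   # Beta API served by Kubernetes versions earlier than 1.24
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```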
If you need to specify both CPU and memory metrics, you can specify both cpu and memory resources under the metrics field instead of creating two HPAs. If HPA detects that either metric reaches its scaling threshold, it performs scaling operations.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
Run the following command to create an HPA:
kubectl apply -f hpa.yml
At this point, if you run the kubectl describe hpa <HPA name> command, a warning similar to the following output may be returned, indicating that the HPA is still being deployed. The HPA name in this example is nginx-hpa. You can run the kubectl get hpa command to check the status of the HPA.
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
Wait for the HPA to be created and the pods to meet the scaling condition, which in this example is when the CPU utilization of the NGINX pods exceeds 50%. Then, run the kubectl describe hpa <HPA name> command again to check the horizontal scaling status.
If an event similar to the following output is returned, the HPA is running as expected:
Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
Related operations
The autoscaling/v2 API allows you to use the behavior parameter of HPA to configure scaling settings. You can specify the scaleUp and scaleDown fields in the behavior parameter to configure scale-out and scale-in settings.
Feature status: Kubernetes v1.23 [stable]
Configure HPA to perform only scale-out or only scale-in operations
You can configure HPA to perform only scale-out operations or only scale-in operations by using the selectPolicy field. For more information, see Configurable scaling behavior.
By default, scale-out and scale-in are not disabled.
Disable scale-out: Set selectPolicy to Disabled in the scaleUp field. Example:
behavior:
  scaleUp:
    selectPolicy: Disabled
Disable scale-in: Set selectPolicy to Disabled in the scaleDown field. Example:
behavior:
  scaleDown:
    selectPolicy: Disabled
Configure the stabilization window
If the scaling metrics fluctuate frequently, you can add the stabilizationWindowSeconds field under the behavior parameter to set a stabilization window that limits fluctuation in the number of replicas. The window specifies how long the HPA controller must observe metric data before it determines the desired state of the system and triggers a scale-in or scale-out, preventing excessively frequent or unnecessary scaling caused by short-term metric fluctuations.
In the example below, if the metrics indicate that the target should be scaled in, the autoscaling algorithm considers all desired states computed in the past 5 minutes and uses the maximum value within the window.
behavior:
scaleDown:
stabilizationWindowSeconds: 300
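Beyond the stabilization window, the behavior field also accepts rate-limit policies. The following sketch (the values are illustrative, not taken from this topic) caps scale-in at 10% of the current replicas per minute while allowing scale-out to add up to 4 pods every 15 seconds:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # Observe metrics for 5 minutes before scaling in
    policies:
    - type: Percent
      value: 10                       # Remove at most 10% of the current replicas...
      periodSeconds: 60               # ...per 60-second period
  scaleUp:
    policies:
    - type: Pods
      value: 4                        # Add at most 4 pods...
      periodSeconds: 15               # ...per 15-second period
    selectPolicy: Max                 # When multiple policies apply, use the one that allows the largest change
```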
FAQ
What do I do if unknown is displayed in the current field in the HPA metrics?
What do I do if HPA fails to scale up or down with abnormal metrics?
What do I do if excess pods are added by HPA during a rolling update?
What do I do if HPA does not scale pods when the scaling threshold is reached?
How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?
References
Other related documentation
For more information about how to configure external metrics supported by Kubernetes to implement HPA based on Alibaba Cloud metrics, see Implement horizontal auto scaling based on Alibaba Cloud metrics.
For more information about how to convert Managed Service for Prometheus metrics to metrics that HPA supports, see Implement horizontal auto scaling based on Prometheus metrics.
For frequently asked questions that you may encounter when you use HPA, see FAQ about auto scaling for troubleshooting.
For more information about how to enable CronHPA and HPA to interact without conflicts, see Make CronHPA compatible with HPA.
Other workload scaling solutions
If your application resource usage has periodic changes and you need to scale pods regularly according to a Crontab-like strategy, see Use CronHPA for scheduled horizontal scaling.
If your application resource usage has periodic changes that are difficult to define by rules, you can use Advanced Horizontal Pod Autoscaling (AHPA) to automatically identify resource usage cycles and scale pods based on historical metrics. For more information, see AHPA overview.
If you need to automatically set resource limits for pods based on pod resource usage to ensure that pods get enough computing resources, see Use Vertical Pod Autoscaler.
If you need to flexibly customize scaling policies for pods based on message queues, timing strategies, custom metrics, and other Kubernetes events, see ACK KEDA.
Combination solutions
You can use HPA with the node auto scaling feature to automatically scale nodes when cluster node resources are insufficient. For more information, see Enable node auto scaling.