
Container Service for Kubernetes:Use Horizontal Pod Autoscaling (HPA)

Last Updated: Mar 01, 2026

Horizontal Pod Autoscaling (HPA) automatically adjusts the number of pod replicas based on observed CPU usage, memory usage, or custom metrics. When traffic spikes, HPA scales out replicas to handle the load. When demand drops, it scales them back in to free up resources. This keeps your application responsive without manual intervention.

HPA works well for workloads with unpredictable or fluctuating traffic patterns, such as e-commerce platforms, online education services, and financial applications.

How the scaling algorithm works

HPA uses the following formula to determine the desired replica count:

desiredReplicas = ceil[ currentReplicas x (currentMetricValue / targetMetricValue) ]

For example, if two pods are running at an average CPU utilization of 90% and the target is 60%:

desiredReplicas = ceil[ 2 x (90 / 60) ] = ceil[ 3.0 ] = 3

HPA scales the Deployment from 2 to 3 replicas.

If the ratio of current to target utilization is within a 10% tolerance (0.9 to 1.1 by default), HPA does not trigger a scaling event.
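The formula and tolerance rule can be sketched in a few lines of Python (a simplified model; the real controller also handles unready pods and missing metrics):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Sketch of the HPA scaling decision with the default 10% tolerance."""
    ratio = current_metric / target_metric
    # Within tolerance (0.9 to 1.1 by default): no scaling event.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(2, 90, 60))  # 3: scale out from 2 to 3 replicas
print(desired_replicas(3, 63, 60))  # 3: ratio 1.05 is within tolerance, no change
print(desired_replicas(4, 30, 60))  # 2: scale in from 4 to 2 replicas
```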

Key timing parameters

| Parameter | Default value | Description |
| --- | --- | --- |
| Metrics API check interval | 15 seconds | How often HPA queries the Metrics API for changes |
| Kubelet metrics collection | 60 seconds | How often the kubelet reports resource usage to the Metrics API |
| Effective HPA update cycle | 60 seconds | The practical interval at which HPA reacts to metric changes |
| Scale-out delay | None | No built-in delay for scale-out events (Kubernetes 1.12 and later) |
| Scale-in delay | 5 minutes | Default stabilization window before scaling in |

For more details on the core algorithm and configurable behaviors, see the Kubernetes Horizontal Pod Autoscaling documentation.

Container Service for Kubernetes (ACK) provides several workload and node scaling solutions. For a comparison of available options, see Auto Scaling.

Prerequisites

Before you begin, make sure that an ACK cluster is created and that you can access it from the console or with kubectl.

Important

HPA requires resource requests on your containers to calculate utilization. Without resource requests, HPA cannot determine current usage relative to the target and the metric shows as unknown. Use the resource profile feature to get recommendations for requests and limits based on historical usage.
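For example, a container spec that satisfies this requirement sets requests for each metric the HPA policy uses (the image and values below are placeholders, not recommendations):

```yaml
containers:
- name: app
  image: registry.example.com/app:latest   # placeholder image
  resources:
    requests:
      cpu: 250m        # HPA computes CPU utilization relative to this value
      memory: 256Mi    # required only if the HPA policy uses the memory metric
```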

Create HPA in the ACK console

The ACK console provides three entry points for creating an HPA policy. The core configuration parameters are the same regardless of the entry point. Create only one HPA policy per workload to avoid conflicting scaling decisions.

HPA configuration parameters

The following table describes the parameters available across all console entry points. The parameter labels vary slightly depending on which page you use.

| Parameter | Labels in console | Description |
| --- | --- | --- |
| Policy name | Name / Policy Name | A name for the HPA policy. |
| Metric | Metric | The resource metric to monitor. Options are CPU Usage and Memory Usage (additional metrics are available on the Workload Scaling page). The metric type must match the resource type for which you set a request. |
| Target utilization | Condition / Threshold | The target average utilization percentage. HPA triggers a scale-out when usage exceeds this value. |
| Minimum replicas | Min. Replicas / Min. Containers | The minimum number of pod replicas. Must be an integer greater than or equal to 1. |
| Maximum replicas | Max. Replicas / Max. Containers | The maximum number of pod replicas. Must be greater than the minimum. |

If you specify both CPU and memory metrics, HPA triggers a scaling event when either metric exceeds its threshold.

Metric availability by entry point:

| Entry point | Supported metrics |
| --- | --- |
| Create with new application | CPU, memory |
| Add to existing application (Pod Scaling tab) | CPU, memory |
| Workload Scaling page | CPU, memory (default). GPU, Nginx Ingress QPS, and custom metrics require ack-alibaba-cloud-metrics-adapter. |

Option 1: Create HPA with a new application

This example uses a stateless Deployment. The steps are similar for other workload types.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click Create From Image.

  4. On the Create page, configure the application. For complete configuration details, see Create a stateless workload (Deployment).

    • Basic Information: Set the application name, replica count, and other details.

    • Container Configuration: Set the container image and resource requests (CPU and memory).

    • Advanced Configuration > Scaling: Select HPA and click Enable, then configure the metric, target utilization, minimum replicas, and maximum replicas.

After the Deployment is created, click the Deployment name on the Deployments page and open the Pod Scaling tab. This tab shows HPA metrics (CPU and memory usage, replica range) and provides options to update or disable the policy.

Option 2: Add HPA to an existing application

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click the target application name. Open the Pod Scaling tab and click Create in the HPA section.

  4. In the Create dialog box, configure the HPA policy:

    • Name: Enter a policy name.

    • Metric: Click Add to select a metric (CPU Usage or Memory Usage) and set the Threshold (target utilization percentage).

    • Max. Containers: Set the maximum replica count.

    • Min. Containers: Set the minimum replica count.

After the policy is created, click the Deployment name and open the Pod Scaling tab to view HPA metrics and manage the policy.

Option 3: Use the Workload Scaling page

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, click Workload Scaling.

  3. In the upper-right corner, click Create Auto Scaling, then open the HPA and CronHPA tab.

  4. Select the target workload. In the Configure Scaling Policy section, select the HPA checkbox and configure the policy:

    • Scaling Policy Name: Enter a policy name.

    • Min. Containers: Set the minimum replica count (integer >= 1).

    • Max. Containers: Set the maximum replica count (must be greater than the minimum).

    • Scaling Metric: Select one or more metric categories and configure the threshold for each:

      • Resource: CPU usage and memory usage. Available by default.

      • Custom: GPU memory usage, GPU utilization, and custom metrics. Requires ack-alibaba-cloud-metrics-adapter.

      • External: Nginx Ingress QPS and custom metrics. Requires ack-alibaba-cloud-metrics-adapter.

    Note

    If you select Custom or External metrics and ack-alibaba-cloud-metrics-adapter is not installed, the console displays an Install button. Click Install to deploy the adapter before configuring these metrics.

After the policy is created, view and manage it on the Workload Scaling page. The Actions column provides options to view metrics, update the configuration, or disable the policy.

Create HPA with kubectl

Create an HPA resource using a YAML manifest and attach it to a Deployment. Create only one HPA per workload.

Step 1: Deploy a sample application

Create a file named nginx.yml:

Sample YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry-cn-hangzhou-vpc.ack.aliyuncs.com/acs/nginx:1.27.0 # Replace the region ID with the region of your cluster.
        ports:
        - containerPort: 80
        resources:
          requests:         # Required. Without requests, HPA cannot calculate utilization.
            cpu: 500m

Apply the Deployment:

kubectl apply -f nginx.yml

Step 2: Create the HPA resource

Create a file named hpa.yml. The scaleTargetRef field points to the Deployment that HPA manages. The example below scales the Deployment between 1 and 10 replicas, targeting 50% average CPU utilization.

Kubernetes 1.24 and later

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Kubernetes versions earlier than 1.24

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

To scale on both CPU and memory, add both resource types in the metrics field of a single HPA. Do not create separate HPAs for each metric.

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50

Apply the HPA:

kubectl apply -f hpa.yml

Step 3: Verify the HPA

After applying the HPA, check its status:

kubectl get hpa nginx-hpa

During initial deployment, you may see transient FailedGetResourceMetric warnings while HPA collects its first metrics:

Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu

If the missing request for cpu message persists, the target pods lack CPU resource requests; verify that requests are set as described in the prerequisites.

Wait for HPA to start collecting metrics, then verify that it is operating normally:

kubectl describe hpa nginx-hpa

Expected output when HPA is running and the load is below the threshold:

Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Verify HPA with a load test

To confirm that HPA scales correctly, generate artificial load and observe the scaling behavior.

  1. Generate load. Open a separate terminal and run a load generator pod:

    Note

    Replace the region ID (cn-hangzhou) in the image path with the region of your cluster.

       kubectl run -i --tty load-generator --rm \
         --image=registry-cn-hangzhou-vpc.ack.aliyuncs.com/acs/nginx:1.27.0 \
         --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx; done"
  2. Monitor scaling. In another terminal, watch the HPA status. As CPU utilization exceeds the 50% target, HPA increases the replica count; it may take one to two minutes for the changes to appear.

       kubectl get hpa nginx-hpa --watch
  3. Stop the load. Press Ctrl+C in the load generator terminal, or delete the pod:

       kubectl delete pod load-generator
  4. Observe scale-in. After the load stops, wait approximately five minutes (the default stabilization window). HPA gradually reduces the replica count as utilization drops below the target.

Note

In a production environment, HPA scales based on actual pod load. Use a staging environment for load testing to avoid impacting live traffic.

Customize scaling behavior

If the default scaling speed does not match your requirements, use the behavior field to fine-tune scale-in (scaleDown) and scale-out (scaleUp) policies. For details, see Configurable scaling behavior.

Common scenarios:

| Scenario | Configuration approach |
| --- | --- |
| Fast scale-out during traffic spikes | Increase the scaleUp pods-per-period value or reduce the stabilization window |
| Fast scale-out with slow scale-in | Configure a short scaleUp stabilization window and a long scaleDown stabilization window |
| Disable scale-in for stateful workloads | Set the scaleDown selectPolicy to Disabled so replicas are never reduced |
| Limit scaling speed in cost-sensitive environments | Use stabilizationWindowSeconds to smooth out transient fluctuations |

For configuration examples specific to ACK, see Adjust the scaling sensitivity of HPA.
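As a sketch of the "fast scale-out with slow scale-in" pattern, a behavior block like the following could be added to the HPA spec (the values are illustrative examples, not recommendations):

```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
      policies:
      - type: Pods
        value: 4                        # add at most 4 pods per period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes before scaling in
      policies:
      - type: Percent
        value: 10                       # remove at most 10% of replicas per minute
        periodSeconds: 60
```

When multiple scaleDown policies are listed, the controller applies the selectPolicy rule (Max by default) to choose among them.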

Best practices

  • Set target utilization to 60-70%. Leave headroom for traffic bursts. A target of 50% is safe but may over-provision; 80% or higher risks latency spikes before HPA can react.

  • Always define resource requests. HPA cannot calculate utilization without them. Use the resource profile feature to determine appropriate values from historical data.

  • Create one HPA per workload. Multiple HPAs targeting the same workload cause conflicting scaling decisions and unpredictable replica counts.

  • Do not set spec.replicas to 0. HPA cannot scale from zero replicas. Set minReplicas to at least 1.

  • Combine HPA with node autoscaling. If HPA scales out pods but the cluster lacks node capacity, pods remain in Pending state. Enable node autoscaling to automatically add nodes when resources are insufficient.

  • Avoid frequent pod recreation. Make sure pods and nodes remain healthy to prevent unnecessary churn that can interfere with HPA metrics.
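The target-utilization recommendation above can be sanity-checked with the scaling formula: a lower target means each pod carries more spare capacity during the minute or so before HPA reacts, at the cost of running more replicas. A small sketch:

```python
import math

def replicas_after_burst(current_replicas, burst_utilization, target):
    """HPA's formula applied once metrics reflect a traffic burst."""
    return math.ceil(current_replicas * burst_utilization / target)

# 4 replicas whose average CPU utilization bursts to 100%:
print(replicas_after_burst(4, 100, 60))  # 7 replicas desired at a 60% target
print(replicas_after_burst(4, 100, 80))  # 5 replicas desired at an 80% target
```

At an 80% target the pods absorb only a 1.25x burst before saturating, which is why the text warns about latency spikes before HPA can react.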

Related topics

Use HPA with node autoscaling to automatically add nodes when pod scaling exhausts available cluster resources.