Enable node auto scaling to automatically scale nodes - Container Service for Kubernetes

Before you begin

To make the most of node auto scaling, read Node scaling and understand the following concepts:

- How node auto scaling works and its features
- Use cases where node auto scaling can meet your business requirements
- Important considerations before using node auto scaling

During a scale-in, subscription instances are removed but not released. To avoid extra costs, use pay-as-you-go instances when you enable this feature.

Prerequisites

Make sure you have activated Auto Scaling.
See Usage notes to understand the quotas and limitations of node scaling.
Node auto scaling has known limitations with certain scheduling policies, which may lead to unexpected scaling results. If your workloads or components use an unsupported scheduling policy, consider one of the following solutions:
- Solution 1: Switch to node instant scaling.
- Solution 2: Deploy the affected workloads or components to a node pool without node scaling enabled.
  
  For example, to deploy the ack-node-local-dns-admission-controller component, place it in a node pool without node scaling enabled and declare the following node affinity requirement in the component's configuration:
```
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: "k8s.aliyun.com"
        operator: "NotIn"
        values: ["true"]
```
The cluster-autoscaler component requires node resources for updates or deployments. Insufficient resources may cause these operations to fail and lead to scaling issues. Ensure that your nodes have adequate resources.
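
To see why the node affinity requirement in the example above keeps a component off auto-scaled nodes, the following Python sketch mimics how a Kubernetes `NotIn` match expression is evaluated against node labels. It assumes, as that snippet implies, that nodes in a node pool with node scaling enabled carry the label `k8s.aliyun.com=true`. The helper name `matches_not_in` and the sample label sets are hypothetical.

```python
def matches_not_in(node_labels: dict, key: str, values: list) -> bool:
    """Evaluate a NotIn match expression against a node's labels.

    Kubernetes treats NotIn as satisfied when the label is missing or when
    its value is not one of the listed values.
    """
    return node_labels.get(key) not in values


# Hypothetical label sets, for illustration only.
autoscaled_node = {"k8s.aliyun.com": "true"}
manual_node = {"kubernetes.io/os": "linux"}

for name, labels in [("autoscaled", autoscaled_node), ("manual", manual_node)]:
    eligible = matches_not_in(labels, "k8s.aliyun.com", ["true"])
    print(f"{name}: eligible={eligible}")
# autoscaled: eligible=False
# manual: eligible=True
```

Because `NotIn` also matches nodes that lack the label entirely, the rule does not require you to label the nodes in the pool without node scaling.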

This feature involves the following steps:

Step 1: Enable node auto scaling for the cluster. You must enable node auto scaling at the cluster level before the scaling policies of your node pools can take effect.
Step 2: Configure a node pool for auto scaling. Node auto scaling affects only node pools that are configured for it, so you must set the scaling mode of the target node pools to Auto.

Step 1: Enable node auto scaling for the cluster

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.
On the Node Pools page, click Enable next to Node Scaling.
If you are using node auto scaling for the first time, follow the on-screen instructions to activate the Auto Scaling service and grant the required permissions. You can skip this step if you have already done this.
- ACK managed cluster: Authorize the AliyunCSManagedAutoScalerRole role.
- ACK dedicated cluster: Authorize the KubernetesWorkerRole role and attach the AliyunCSManagedAutoScalerRolePolicy.
  
  In the Node Scaling Configuration dialog box, after the precheck passes, click the RAM role link (such as KubernetesWorkerRole-xxxx) to complete authorization in the RAM console.

In the Node Scaling Configuration dialog box, set Node Scaling Plan to Auto Scaling, configure the scaling parameters, and then click OK.

You can switch the node scaling method after the initial configuration. To do so, change Node Scaling Plan to node instant scaling, then carefully read and follow the on-screen instructions to complete the process.

Parameter	Description
Node Pool Scale-out Policy	Random Policy: If multiple node pools are available for scale-out, one is chosen at random. Default Policy: If multiple node pools are available for scale-out, the one that results in the least resource waste is chosen. Priority-based Policy: If multiple node pools are available for scale-out, the one with the highest priority is chosen. Node pool priority is defined by the Node Pool Scale-out Priority parameter.
Node Pool Scale-out Priority	Sets the scale-out priority for node pools. This parameter takes effect only when Node Pool Scale-out Policy is set to Priority-based Policy. The value can be an integer from 1 to 100. A larger value indicates a higher priority. Click Add next to the parameter, select a node pool with auto scaling enabled, and set its priority. If no node pools with auto scaling enabled are available, you can ignore this parameter for now and configure it after you complete Step 2: Configure a node pool for auto scaling.
Scaling Sensitivity	The interval at which the system checks for scaling conditions. The default value is 60s. During auto scaling, the scaling component automatically triggers scale-out based on scheduling conditions. Important: For ECS nodes, a scale-in can occur only when all three conditions are met: Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period. For GPU nodes, a scale-in can occur only when all three conditions are met: GPU Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period.
Allow Scale-in	Specifies whether to allow node scale-in operations. If disabled, scale-in settings do not take effect. Use this setting with caution.
Scale-in Threshold	The ratio of total resource requests to the total capacity of a single node in a node pool where node auto scaling is enabled. A node is considered for scale-in only when this ratio is below the configured threshold for both CPU and memory.
GPU Scale-in Threshold	The scale-in threshold for GPU instances. A GPU node is considered for scale-in only when its CPU, memory, and GPU utilization are all below the GPU Scale-in Threshold.
Scale-in Trigger Delay	The time to wait from when a scale-in condition is detected to when the scale-in operation is actually performed. Unit: minutes. Default value: 10 minutes. Important: The scaling component performs a node scale-in only after the Scale-in Threshold is met and the Scale-in Trigger Delay period has passed.
Cooldown Period	The period after a scale-out event during which the scaling component will not perform a scale-in. During the cooldown period, the scaler does not perform scale-ins but continues to evaluate nodes against the scale-in conditions. After the cooldown ends, if a node has met the scale-in threshold for longer than the scale-in delay, the scaler removes it. For example, if the cooldown is 10 minutes and the scale-in delay is 5 minutes, the scaler will not scale in any nodes for 10 minutes after the last scale-out. However, during these 10 minutes, it still checks for nodes that are eligible for scale-in. Once the 10-minute cooldown ends, if a node has met the scale-in threshold for more than the 5-minute delay, it is scaled in.
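
The interplay of Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period described in the table above can be sketched as a single check. This is a simplified model, not the cluster-autoscaler's actual implementation: it covers ECS nodes only (no GPU threshold), the class and function names are hypothetical, the `0.5` threshold is an arbitrary illustrative value, and treating the exact boundary (`>=`) as "elapsed" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScaleInSettings:
    allow_scale_in: bool = True
    threshold: float = 0.5          # hypothetical value; set yours in the console
    trigger_delay_min: float = 10   # Scale-in Trigger Delay (default: 10 minutes)
    cooldown_min: float = 10        # Cooldown Period; 10 is taken from the example above


def can_scale_in(settings, cpu_ratio, mem_ratio, minutes_below, minutes_since_scale_out):
    """Return True when every scale-in condition from the table holds."""
    if not settings.allow_scale_in:
        return False
    # Both CPU and memory request ratios must be below the threshold.
    below_threshold = cpu_ratio < settings.threshold and mem_ratio < settings.threshold
    # The node must have stayed below the threshold for the trigger delay.
    delay_elapsed = minutes_below >= settings.trigger_delay_min
    # No scale-in is performed during the cooldown after a scale-out.
    cooldown_over = minutes_since_scale_out >= settings.cooldown_min
    return below_threshold and delay_elapsed and cooldown_over


# The example from the Cooldown Period row: 10-minute cooldown, 5-minute delay.
settings = ScaleInSettings(trigger_delay_min=5, cooldown_min=10)

# 8 minutes after the last scale-out: still cooling down.
print(can_scale_in(settings, 0.2, 0.3, minutes_below=7, minutes_since_scale_out=8))   # False
# 11 minutes after: cooldown is over and the node has been idle for 9 minutes.
print(can_scale_in(settings, 0.2, 0.3, minutes_below=9, minutes_since_scale_out=11))  # True
```

Note that `minutes_below` keeps accumulating during the cooldown, which mirrors the statement that the scaler continues to evaluate nodes while it is not allowed to remove them.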

Advanced settings

Parameter	Description
Pod Termination Timeout	Maximum wait time for pod termination during scale-in. Unit: seconds. If a pod is not evicted before the timeout, the node is not released.
Minimum Pod Replicas	Scale-in protection threshold. Nodes with `ReplicationController` or `ReplicaSet` pods are not scaled in if the replica count falls below this value. Applies only to `ReplicationController` and `ReplicaSet` pods, not `StatefulSet` or `DaemonSet`.
Enable DaemonSet Pod Eviction	When enabled, DaemonSet pods are evicted when their node is scaled in.
Skip nodes with pods in the kube-system namespace	When enabled, nodes with kube-system namespace pods are excluded from scale-in. Note: This does not apply to DaemonSet pods or mirror pods.
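
Returning to the Node Pool Scale-out Policy parameter configured in this step, the three options (Random Policy, Default Policy, Priority-based Policy) differ only in how one node pool is picked from several candidates. The sketch below illustrates that selection under stated assumptions: `NodePoolCandidate`, `choose_node_pool`, and the `wasted_fraction` metric are hypothetical names, and how the scaling component actually measures resource waste or breaks ties is not specified in this document.

```python
import random
from dataclasses import dataclass


@dataclass
class NodePoolCandidate:
    name: str
    wasted_fraction: float  # hypothetical metric: share of new capacity left unused
    priority: int = 1       # Node Pool Scale-out Priority, 1 to 100 (larger is higher)


def choose_node_pool(candidates, policy, rng=random):
    """Pick one node pool for scale-out according to the chosen policy."""
    if not candidates:
        return None
    if policy == "random":
        # Random Policy: any available node pool may be chosen.
        return rng.choice(candidates)
    if policy == "default":
        # Default Policy: the node pool that wastes the least resources.
        return min(candidates, key=lambda c: c.wasted_fraction)
    if policy == "priority":
        # Priority-based Policy: the node pool with the highest priority value.
        return max(candidates, key=lambda c: c.priority)
    raise ValueError(f"unknown policy: {policy}")


pools = [
    NodePoolCandidate("pool-a", wasted_fraction=0.40, priority=10),
    NodePoolCandidate("pool-b", wasted_fraction=0.15, priority=50),
    NodePoolCandidate("pool-c", wasted_fraction=0.25, priority=90),
]
print(choose_node_pool(pools, "default").name)   # pool-b
print(choose_node_pool(pools, "priority").name)  # pool-c
```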

Step 2: Configure a node pool for auto scaling

You can either configure an existing node pool by changing its Scaling Mode to Auto, or create a new node pool with auto scaling enabled.

For more information, see Create and manage node pools. The key parameters are described below:

Parameter	Description
Scaling Mode	Manual: ACK keeps the node count in the node pool at the configured Expected Number of Nodes. For details, see Manually scale node pools. Auto: When cluster capacity cannot meet the scheduling demands of application pods, ACK automatically scales node resources within the configured minimum and maximum instance counts. Clusters running Kubernetes 1.24 or later default to node instant scaling; clusters running earlier versions default to node auto scaling. For details, see Node scaling.
Instances	The scalable range of nodes in the node pool, defined by Min. Instances and Max. Instances. This does not include your existing instances. Note: If Min. Instances is greater than 0, the corresponding number of ECS instances is created automatically after the scaling group takes effect. We recommend setting Max. Instances to a value no less than the current number of nodes in the node pool. Otherwise, a scale-in will be triggered immediately after the auto scaling feature takes effect.
Instance-related configurations	When scaling out, nodes are allocated from the configured ECS instance families. To improve scale-out success rates, select multiple instance types across multiple zones to avoid unavailability or insufficient inventory. The specific instance type used for scaling is determined by the configured Scaling Policy. To ensure business stability and accurate resource scheduling, do not mix GPU and non-GPU instance types in the same node pool. Configure instance types for scaling in one of two ways: Specific types: Specify exact instance types based on vCPU, memory, family, architecture, and other dimensions. Generalized configuration: Select instance types to use or exclude based on attributes (vCPU, memory, etc.) to further improve scale-out success rates. For details, see Configure node pools using specified instance attributes. Refer to the console's elasticity strength recommendations for configuration, or view node pool elasticity strength after creation. For ACK-unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations. Cloud resource and billing information: ECS instance, GPU instance
Operating System	When auto scaling is enabled, you can select Alibaba Cloud Linux, Windows, or Windows Core images. When the selected image is a Windows image or a Windows Core image, the system automatically configures the taint `{ effect: 'NoSchedule', key: 'os', value: 'windows' }`.
Node Labels	Node labels added in the cluster are automatically applied to nodes created by auto scaling. Important: Auto scaling can recognize node labels and taints only after they are mapped to node pool tags. Because a node pool can have only a limited number of tags, the total number of ECS tags, taints, and node labels for a node pool with auto scaling enabled must be 12 or fewer.
Scaling Policy	Configure how the node pool selects instances during scaling. Priority-based Policy: Scales based on the vSwitch priority configured in the cluster (vSwitches listed from top to bottom are in decreasing priority). If instances cannot be created in the zone of a higher-priority vSwitch, the next vSwitch in priority order is used automatically. Cost Optimization: Scales from the lowest to the highest vCPU unit price. When the node pool uses Preemptible Instance, spot instances are prioritized. You can configure Percentage of pay-as-you-go instances (%) to automatically supplement with pay-as-you-go instances when spot instances cannot be created due to insufficient inventory or other reasons. Distribution Balancing: Distributes ECS instances evenly across zones. This policy takes effect only in multi-zone scenarios. If zone distribution becomes unbalanced due to inventory shortages, you can rebalance.
Use Pay-as-you-go Instances When Spot Instances Are Insufficient	Requires selecting spot instances as the billing method. When enabled, if sufficient spot instances cannot be created due to price or inventory reasons, ACK automatically attempts to create pay-as-you-go instances as a supplement. Cloud resource and billing information: ECS instance
Enable Supplemental Spot Instance	Requires selecting spot instances as the billing method. When enabled, upon receiving a system notification that a spot instance will be reclaimed (5 minutes before reclamation), ACK attempts to scale out new instances for compensation. Compensation successful: ACK drains the old node and removes it from the cluster. Compensation failed: ACK does not drain the old node, and the instance is reclaimed after 5 minutes. When inventory is restored or price conditions are met, ACK automatically purchases instances to maintain the desired node count. For details, see Spot instance node pool best practices. Active release of spot instances may cause business disruptions. To improve compensation success rates, we recommend also enabling Use Pay-as-you-go Instances When Spot Instances Are Insufficient. Cloud resource and billing information: ECS instance
Scaling Mode	Requires enabling Auto Scaling for the node pool and setting Scaling Mode to Auto. Standard: Scales by creating and releasing ECS instances. Swift: Scales by creating, stopping, and restarting ECS instances. When scaling is needed again, stopped instances are restarted directly, improving scaling speed. Stopped ECS instances do not incur compute resource fees, only storage fees (except for instance families with local storage capabilities, such as big data and local SSD types). For billing details and considerations about ECS instance stop modes, see Economical mode.
Taints	After you add a taint to a node, the cluster no longer schedules pods to it unless they tolerate the taint.
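
Two of the recommendations in the table above lend themselves to a quick pre-flight check: Max. Instances should be no less than the current node count, and the total number of ECS tags, taints, and node labels must be 12 or fewer. The following sketch shows one way to validate a planned configuration before applying it in the console. The function name `check_node_pool` and the sample values are hypothetical; this is not an ACK API.

```python
MAX_TAGS = 12  # limit on ECS tags + taints + node labels stated in the table


def check_node_pool(max_instances, current_nodes, ecs_tags, node_labels, taints):
    """Return a list of warnings for an auto scaling node pool configuration."""
    warnings = []
    if max_instances < current_nodes:
        warnings.append(
            f"Max. Instances ({max_instances}) is below the current node count "
            f"({current_nodes}); a scale-in would start as soon as auto scaling takes effect."
        )
    total = len(ecs_tags) + len(node_labels) + len(taints)
    if total > MAX_TAGS:
        warnings.append(
            f"{total} ECS tags, node labels, and taints exceed the limit of {MAX_TAGS}."
        )
    return warnings


# Hypothetical configuration for illustration: 2 tags + 9 labels + 2 taints = 13.
issues = check_node_pool(
    max_instances=3,
    current_nodes=5,
    ecs_tags={"team": "data", "env": "prod"},
    node_labels={f"label-{i}": "x" for i in range(9)},
    taints=[
        {"key": "os", "value": "windows", "effect": "NoSchedule"},
        {"key": "gpu", "value": "true", "effect": "NoSchedule"},
    ],
)
for issue in issues:
    print(issue)
```

With the sample values, both warnings are reported. An empty list means neither recommendation is violated.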

Step 3: (Optional) Verify the results

After you complete the steps, node auto scaling is active. The node pool's status will indicate that auto scaling has started, and the cluster-autoscaler component will be automatically installed.

Node pool has auto scaling enabled

On the Node Pools page, the list shows the node pools that have auto scaling enabled.

Installed cluster-autoscaler component

In the left navigation pane of the cluster management page, choose Workload > Deployments.
Select the kube-system namespace. The cluster-autoscaler component is displayed.
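
If you prefer to verify from a terminal instead of the console, a rough equivalent is to read the Deployment with `kubectl` and compare ready replicas against desired replicas. The sketch below assumes that `kubectl` is configured for the cluster and that the Deployment is named `cluster-autoscaler` in the `kube-system` namespace (inferred from the component name above, not confirmed by this document). The function names are hypothetical.

```python
import json
import subprocess


def fetch_deployment(name="cluster-autoscaler", namespace="kube-system"):
    """Fetch a Deployment as a dict via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def is_ready(deployment):
    """True when all desired replicas of the Deployment report ready."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    # status.readyReplicas is omitted from the object when no replica is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired > 0 and ready >= desired


# Offline illustration with a trimmed, hypothetical Deployment object.
sample = {"spec": {"replicas": 1}, "status": {"readyReplicas": 1}}
print(is_ready(sample))  # True

# Against a real cluster: print(is_ready(fetch_deployment()))
```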

FAQ

Category	Subcategory	Link
Scaling behavior of node auto scaling	Known limitations
Scaling behavior of node auto scaling	Scale-out behavior	What scheduling policies does the cluster-autoscaler use to determine whether an unschedulable pod can be scheduled to a node pool with auto scaling enabled? What resources can the cluster-autoscaler simulate for scheduling analysis? Why does the node auto scaling add-on fail to scale out nodes? How does the autoscaler calculate the resources of a scaling group that contains multiple instance types? During a scale-out, how does the autoscaler choose between multiple enabled node pools? How to configure custom resources for a node pool with auto scaling enabled? Why does enabling auto scaling for a node pool fail?
Scaling behavior of node auto scaling	Scale-in behavior	Why does the cluster-autoscaler fail to scale in a node? How to enable or disable eviction for a specific DaemonSet? What types of pods can prevent the cluster-autoscaler from removing a node?
Scaling behavior of node auto scaling	Extension support	Does the cluster-autoscaler support CustomResourceDefinitions (CRDs)?
Custom scaling behavior	Control scaling behavior by using pods	How to delay the cluster-autoscaler's scale-out response to an unschedulable pod?
Custom scaling behavior	Control scaling behavior by using nodes	How to prevent a specific node from being scaled in by the cluster-autoscaler? How to influence node scale-in by using pod annotations?
cluster-autoscaler component		How to upgrade the cluster-autoscaler to the latest version? What operations trigger an automatic update of the cluster-autoscaler? Why does node scaling fail on my ACK managed cluster even after I have granted the required role permissions?