Container Service for Kubernetes: Enable node auto scaling

Last Updated: Jun 24, 2026

If your cluster has insufficient capacity to schedule application Pods, you can use node auto scaling to automatically scale nodes in and out. Node auto scaling is ideal for scenarios with smaller scaling requirements, such as having fewer than 20 auto-scaling node pools or fewer than 100 nodes in those pools. It is also suitable for workloads with stable traffic, predictable resource demands that a single scaling operation can meet, and periodic or foreseeable resource needs.

Before you begin

To make the most of node auto scaling, read Node scaling and understand the following concepts:

  • How node auto scaling works and its features

  • Use cases where node auto scaling can meet your business requirements

  • Important considerations before using node auto scaling

  • During a scale-in, subscription instances are removed but not released. To avoid extra costs, use pay-as-you-go instances when you enable this feature.

Prerequisites

  • Make sure you have activated Auto Scaling.

  • See Usage notes to understand the quotas and limitations of node scaling.

  • Node auto scaling has known limitations with certain scheduling policies, which may lead to unexpected scaling results. If your workloads or components use an unsupported scheduling policy, consider one of the following solutions:

    • Solution 1: Switch to node instant scaling.

    • Solution 2: Deploy the affected workloads or components to a node pool without node scaling enabled.

      For example, to deploy the ack-node-local-dns-admission-controller component, place it in a node pool without node scaling enabled and declare the following node affinity requirement in the component's configuration:

      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "k8s.aliyun.com"
              operator: "NotIn"
              values: ["true"]
  • The cluster-autoscaler component requires node resources for updates or deployments. Insufficient resources may cause these operations to fail and lead to scaling issues. Ensure that your nodes have adequate resources.

This feature involves the following steps:

  1. Step 1: Enable node auto scaling for the cluster: You must first enable node auto scaling at the cluster level before the scaling policies of your node pools can take effect.

  2. Step 2: Configure a node pool for auto scaling: The node auto scaling feature only affects node pools that are configured for auto scaling. Therefore, you must set the scaling mode of the target node pools to Auto.

Step 1: Enable node auto scaling for the cluster

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.

  3. On the Node Pools page, click Enable next to Node Scaling.

  4. If you are using node auto scaling for the first time, follow the on-screen instructions to activate the Auto Scaling service and grant the required permissions. You can skip this step if you have already done this.

    • ACK managed cluster: Authorize the AliyunCSManagedAutoScalerRole role.

    • ACK dedicated cluster: Authorize the KubernetesWorkerRole role and attach the AliyunCSManagedAutoScalerRolePolicy.

      In the Node Scaling Configuration dialog box, after the precheck passes, click the RAM role link (such as KubernetesWorkerRole-xxxx) to complete authorization in the RAM console.

  5. In the Node Scaling Configuration dialog box, set Node Scaling Plan to Auto Scaling, configure the scaling parameters, and then click OK.

    You can switch the node scaling method after the initial configuration. To do this, change the selection here to node instant scaling. Carefully read and follow the on-screen instructions to complete the process.

    Parameter

    Description

    Node Pool Scale-out Policy

    • Random Policy: If multiple node pools are available for scale-out, one is chosen at random.

    • Default Policy: If multiple node pools are available for scale-out, the one that results in the least resource waste is chosen.

    • Priority-based Policy: If multiple node pools are available for scale-out, the one with the highest priority is chosen.

      Node pool priority is defined by the Node Pool Scale-out Priority parameter.

    Node Pool Scale-out Priority

    Sets the scale-out priority for node pools. This parameter takes effect only when Node Pool Scale-out Policy is set to Priority-based Policy.

    The value can be an integer from 1 to 100. A larger value indicates a higher priority.

    Click Add next to the parameter, select a node pool with auto scaling enabled, and set its priority.

    If no node pools with auto scaling enabled are available, you can ignore this parameter for now and configure it after you complete Step 2: Configure a node pool for auto scaling.

    Scaling Sensitivity

    The interval at which the system checks for scaling conditions. The default value is 60s.

    During auto scaling, the scaling component automatically triggers scale-out based on scheduling conditions.

    Important
    • ECS nodes: A node scale-in can occur only when all three conditions are met: Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period.

    • GPU nodes: A GPU node scale-in can occur only when all three conditions are met: GPU Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period.

    Allow Scale-in

    Specifies whether to allow node scale-in operations. If disabled, scale-in settings do not take effect. Use this setting with caution.

    Scale-in Threshold

    The ratio of total resource requests to the total capacity of a single node in a node pool where node auto scaling is enabled.

    A node is considered for scale-in only when this ratio is below the configured threshold for both CPU and memory, that is, when both its CPU request ratio and its memory request ratio are below the Scale-in Threshold.

    GPU Scale-in Threshold

    The scale-in threshold for GPU instances.

    A GPU node is considered for scale-in only when its CPU, memory, and GPU request ratios are all below the GPU Scale-in Threshold.

    Scale-in Trigger Delay

    The time to wait from when a scale-in condition is detected to when the scale-in operation is actually performed. Unit: minutes. Default value: 10 minutes.

    Important

    The scaling component performs a node scale-in only after the Scale-in Threshold is met and the Scale-in Trigger Delay period has passed.

    Cooldown Period

    The period after a scale-out event during which the scaling component will not perform a scale-in.

    During the cooldown period, the scaler does not perform scale-ins but continues to evaluate nodes against the scale-in conditions. After the cooldown ends, if a node has met the scale-in threshold for longer than the scale-in delay, the scaler removes it. For example, if the cooldown is 10 minutes and the scale-in delay is 5 minutes, the scaler will not scale in any nodes for 10 minutes after the last scale-out. However, during these 10 minutes, it still checks for nodes that are eligible for scale-in. Once the 10-minute cooldown ends, if a node has met the scale-in threshold for more than the 5-minute delay, it is scaled in.
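    The way Scale-in Threshold, Scale-in Trigger Delay, and Cooldown Period combine can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the cluster-autoscaler implementation: the `NodeState` fields, the function names, and the example threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of one node; field names are illustrative only."""
    cpu_requested: float            # sum of pod CPU requests (cores)
    cpu_capacity: float             # node CPU capacity (cores)
    mem_requested: float            # sum of pod memory requests (GiB)
    mem_capacity: float             # node memory capacity (GiB)
    minutes_below_threshold: float  # how long the node has stayed below the threshold


def below_threshold(node, threshold):
    """True only when BOTH the CPU and memory request ratios are below the threshold."""
    cpu_ratio = node.cpu_requested / node.cpu_capacity
    mem_ratio = node.mem_requested / node.mem_capacity
    return cpu_ratio < threshold and mem_ratio < threshold


def can_scale_in(node, threshold, delay_minutes, cooldown_minutes,
                 minutes_since_last_scale_out, allow_scale_in=True):
    """Combine the three conditions: threshold, trigger delay, and cooldown."""
    if not allow_scale_in:
        return False
    if minutes_since_last_scale_out < cooldown_minutes:
        return False  # cooling down: nodes are still evaluated, but none is removed
    return (below_threshold(node, threshold)
            and node.minutes_below_threshold > delay_minutes)


# Cooldown of 10 minutes and delay of 5 minutes, as in the example above.
idle = NodeState(cpu_requested=1.0, cpu_capacity=4.0,
                 mem_requested=2.0, mem_capacity=16.0,
                 minutes_below_threshold=7)
print(can_scale_in(idle, 0.5, 5, 10, minutes_since_last_scale_out=8))   # False
print(can_scale_in(idle, 0.5, 5, 10, minutes_since_last_scale_out=12))  # True
```

    In this model the node has been below the threshold for 7 minutes, which exceeds the 5-minute delay, but it is removed only once the 10-minute cooldown has also elapsed.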

    View advanced settings

    Parameter

    Description

    Pod Termination Timeout

    Maximum wait time for pod termination during scale-in. Unit: seconds.

    If a pod is not evicted before the timeout, the node is not released.

    Minimum Pod Replicas

    Scale-in protection threshold. Nodes with ReplicationController or ReplicaSet pods are not scaled in if the replica count falls below this value.

    Applies only to ReplicationController and ReplicaSet pods, not StatefulSet or DaemonSet.

    Enable DaemonSet Pod Eviction

    When enabled, DaemonSet pods are evicted when their node is scaled in.

    Skip nodes with pods in the kube-system namespace

    When enabled, nodes with kube-system namespace pods are excluded from scale-in.

    Note

    This does not apply to DaemonSet pods or mirror pods.
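The three Node Pool Scale-out Policy options described earlier in this step can be sketched as a small selection function. This is an illustration only: `PoolOption`, `wasted_fraction`, and the policy strings are hypothetical names, and the actual measure of "resource waste" used by the scaling component is not specified here.

```python
import random
from dataclasses import dataclass


@dataclass
class PoolOption:
    """Hypothetical scale-out candidate; not an ACK API object."""
    name: str
    priority: int           # Node Pool Scale-out Priority, 1 to 100
    wasted_fraction: float  # assumed measure of unused resources after scale-out


def choose_pool(options, policy, rng=random):
    """Pick one node pool among those that could host the pending pods."""
    if not options:
        return None
    if policy == "random":
        return rng.choice(options)
    if policy == "default":
        # Default Policy: the pool with the least resource waste wins.
        return min(options, key=lambda o: o.wasted_fraction)
    if policy == "priority":
        # Priority-based Policy: a larger value indicates a higher priority.
        return max(options, key=lambda o: o.priority)
    raise ValueError(f"unknown policy: {policy}")


pools = [PoolOption("pool-a", priority=10, wasted_fraction=0.4),
         PoolOption("pool-b", priority=50, wasted_fraction=0.6)]
print(choose_pool(pools, "default").name)   # pool-a
print(choose_pool(pools, "priority").name)  # pool-b
```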

Step 2: Configure a node pool for auto scaling

You can either configure an existing node pool by changing its Scaling Mode to Auto, or create a new node pool with auto scaling enabled.

For more information, see Create and manage node pools. The key parameters are described below:

Parameter

Description

Scaling Mode

  • Manual: ACK keeps the number of nodes in the node pool at the configured Expected Number of Nodes. For details, see Manually scale node pools.

  • Auto: When cluster capacity planning cannot meet application pod scheduling demands, ACK automatically scales node resources based on the configured minimum and maximum instance counts. Clusters running Kubernetes 1.24 or later default to node instant scaling; clusters running earlier versions default to node auto scaling. For details, see Node scaling.

Instances

The scalable range of nodes in the node pool, defined by Min. Instances and Max. Instances. This does not include your existing instances.

Note
  • If Min. Instances is greater than 0, the corresponding number of ECS instances is created automatically after the scaling group takes effect.

  • We recommend setting Max. Instances to a value no less than the current number of nodes in the node pool. Otherwise, a scale-in will be triggered immediately after the auto scaling feature takes effect.

Instance-related configurations

When scaling out, nodes are allocated from the configured ECS instance families. To improve scale-out success rates, select multiple instance types across multiple zones to avoid unavailability or insufficient inventory. The specific instance type used for scaling is determined by the configured Scaling Policy.

To ensure business stability and accurate resource scheduling, do not mix GPU and non-GPU instance types in the same node pool.

Configure instance types for scaling in one of two ways:

  • Specific types: Specify exact instance types based on vCPU, memory, family, architecture, and other dimensions.

  • Generalized configuration: Select instance types to use or exclude based on attributes (vCPU, memory, etc.) to further improve scale-out success rates. For details, see Configure node pools using specified instance attributes.

Refer to the console's elasticity strength recommendations for configuration, or view node pool elasticity strength after creation.

For ACK-unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations.

Cloud resource and billing information: ECS instance, GPU instance

Operating System

When auto scaling is enabled, you can select Alibaba Cloud Linux, Windows, or Windows Core images.

When the selected image is a Windows image or a Windows Core image, the system automatically configures the taint { effect: 'NoSchedule', key: 'os', value: 'windows' }.

Node Labels

Node labels added in the cluster are automatically applied to nodes created by auto scaling.

Important

Auto scaling can recognize node labels and taints only after they are mapped to node pool tags. A node pool has a limit on the number of tags it can have. Therefore, the total number of ECS tags, taints, and node labels for a node pool with auto scaling enabled must be 12 or fewer.
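The tag limit above is simple arithmetic, and a pre-flight check along these lines can catch a configuration that exceeds it. This is a minimal sketch; the function name and the example tags, taints, and labels are hypothetical.

```python
MAX_NODE_POOL_TAGS = 12  # limit stated above for node pools with auto scaling enabled


def within_tag_limit(ecs_tags, taints, node_labels):
    """Return True when ECS tags + taints + node labels total 12 or fewer."""
    total = len(ecs_tags) + len(taints) + len(node_labels)
    return total <= MAX_NODE_POOL_TAGS


# Hypothetical node pool: 3 ECS tags, 1 taint, 5 node labels (9 in total).
ecs_tags = {"team": "data", "env": "prod", "cost-center": "42"}
taints = [{"key": "os", "value": "windows", "effect": "NoSchedule"}]
node_labels = {"tier": "batch", "zone-group": "a", "gpu": "false",
               "pool": "spot", "owner": "platform"}
print(within_tag_limit(ecs_tags, taints, node_labels))  # True
```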

Scaling Policy

Configure how the node pool selects instances during scaling.

  • Priority-based Policy: Scales based on the vSwitch priority configured in the cluster (vSwitch order from top to bottom indicates decreasing priority). If instances cannot be created in the higher-priority zone, the next priority vSwitch is used automatically.

  • Cost Optimization: Scales from lowest to highest vCPU unit price.

    When the node pool uses preemptible (spot) instances as the billing method, spot instances are prioritized. You can configure the Percentage of pay-as-you-go instances (%) to automatically supplement with pay-as-you-go instances when spot instances cannot be created due to insufficient inventory or other reasons.

  • Distribution Balancing: Distributes ECS instances evenly across multiple zones, but only in multi-zone scenarios. If zone distribution becomes unbalanced due to inventory shortages, you can rebalance.

Use Pay-as-you-go Instances When Spot Instances Are Insufficient

Requires selecting spot instances as the billing method.

When enabled, if sufficient spot instances cannot be created due to price or inventory reasons, ACK automatically attempts to create pay-as-you-go instances as a supplement.

Cloud resource and billing information: ECS instance

Enable Supplemental Spot Instance

Requires selecting spot instances as the billing method.

When enabled, upon receiving a system notification that a spot instance will be reclaimed (5 minutes before reclamation), ACK attempts to scale out new instances for compensation.

  • Compensation successful: ACK drains the old node and removes it from the cluster.

  • Compensation failed: ACK does not drain the old node, and the instance is reclaimed after 5 minutes. When inventory is restored or price conditions are met, ACK automatically purchases instances to maintain the desired node count. For details, see Spot instance node pool best practices.

Active release of spot instances may cause business disruptions. To improve compensation success rates, we recommend also enabling Use Pay-as-you-go Instances When Spot Instances Are Insufficient.

Cloud resource and billing information: ECS instance

Scaling Mode

Requires enabling Auto Scaling for the node pool and setting Scaling Mode to Auto.

  • Standard: Scales by creating and releasing ECS instances.

  • Swift: Scales by creating, stopping, and restarting ECS instances. When scaling is needed again, stopped instances are restarted directly, improving scaling speed.

    Stopped ECS instances do not incur compute resource fees, only storage fees (except for instance families with local storage capabilities, such as big data and local SSD types). For billing details and considerations about ECS instance stop modes, see Economical mode.

Taints

After you add a taint to a node, the cluster no longer schedules pods to it unless they declare a matching toleration.
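To make the effect of a taint concrete, the sketch below models how a toleration is matched against a taint, using the Windows taint shown under Operating System as the example. It is a simplified Python model of the standard Kubernetes matching rules, not scheduler code; the function names are hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified Kubernetes matching of one toleration against one taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect in the toleration matches any effect
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return not key or key == taint["key"]
    # Equal: both key and value must match.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def schedulable(pod_tolerations, node_taints):
    """A pod can be placed only if every hard taint on the node is tolerated."""
    hard_effects = ("NoSchedule", "NoExecute")
    return all(
        any(tolerates(t, taint) for t in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in hard_effects
    )


windows_taint = {"key": "os", "value": "windows", "effect": "NoSchedule"}
windows_toleration = {"key": "os", "operator": "Equal",
                      "value": "windows", "effect": "NoSchedule"}

print(schedulable([], [windows_taint]))                    # False
print(schedulable([windows_toleration], [windows_taint]))  # True
```

A pod without the toleration stays off the tainted Windows nodes, while a pod that declares it can be scheduled there.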

Step 3: (Optional) Verify the results

After you complete the steps, node auto scaling is active. The node pool's status will indicate that auto scaling has started, and the cluster-autoscaler component will be automatically installed.

Node pool has auto scaling enabled

On the Node Pools page, the list shows the node pools that have auto scaling enabled.

Installed cluster-autoscaler component

  1. In the left navigation pane of the cluster management page, choose Workload > Deployments.

  2. Select the kube-system namespace. The cluster-autoscaler component is displayed.

FAQ

Scaling behavior of node auto scaling

  • Known limitations

  • Scale-out behavior

  • Scale-in behavior

  • Extension support: Does the cluster-autoscaler support CustomResourceDefinitions (CRDs)?

Custom scaling behavior

  • Control scaling behavior by using pods

  • Control scaling behavior by using nodes

cluster-autoscaler component