
Container Service for Kubernetes:Node scaling overview

Last Updated: Nov 21, 2025

If your cluster's capacity planning cannot meet application pod scheduling requirements, you can use the node scaling feature to automatically scale node resources and increase scheduling capacity. ACK provides two elastic scaling solutions: node auto scaling and node instant scaling. Node instant scaling offers faster scaling, higher delivery efficiency, and a lower barrier to entry than node auto scaling.

Before you begin

This overview helps you understand the node scaling solutions provided by ACK and select the one that best fits your business needs before you enable the node scaling feature.

Before you read this topic, we recommend that you familiarize yourself with scaling concepts such as manual scaling, auto scaling, horizontal scaling, and vertical scaling by reading the official Kubernetes documentation.

How it works

In Kubernetes, node scaling works differently from the traditional model that is based on usage thresholds. Understanding this difference is important when you migrate from traditional data centers or other orchestration systems to a Kubernetes cluster.

The traditional elastic scaling model is based on usage. For example, if the CPU and memory usage of the nodes in a cluster exceed specific thresholds, the system scales out by adding new nodes. However, this model has the following issues.

How are thresholds selected and evaluated?

In a cluster, some hot spot nodes may have high utilization while other nodes have low utilization.

  • If elastic scaling is determined by the average resource utilization of the entire cluster, the high utilization of hot spot nodes is averaged out. This causes delayed scaling for hot spot nodes.

  • If elastic scaling is determined by the highest node utilization, it can lead to a waste of resources from unnecessary scale-outs and affect the overall service of the cluster.

How is pressure relieved after instances are scaled out?

In a Kubernetes cluster, applications are deployed as pods on different nodes. When a pod has high resource utilization, it can trigger a scale-out of the node or cluster. However, the number of pods for the application and each pod's resource limits do not change. Therefore, the load pressure on the original node is not transferred to the newly added nodes.

How are instance scale-ins evaluated and executed?

If a node scale-in is determined by resource utilization, pods that request many resources but use only a small amount are likely to be evicted. If a cluster contains many such pods, they occupy a large amount of scheduling resources, which can cause other pods to become unschedulable.

To address these issues, ACK uses a two-layer elastic model: node scaling at the resource layer and workload scaling at the scheduling layer. The workload scaling layer adjusts the number of application replicas, which are the scheduling units, based on resource usage, and the node scaling layer then provides the node resources needed to schedule those replicas. The following sections describe the technical details.
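As a sketch of the scheduling-layer half of this model, a HorizontalPodAutoscaler adjusts the replica count based on usage; replicas that cannot fit on existing nodes become pending pods, which is the signal the node scaling layer reacts to. The Deployment name and utilization target below are hypothetical values for illustration.

```yaml
# Scheduling layer: an HPA scales replicas based on average CPU utilization.
# "web" and the 70% target are hypothetical examples, not ACK defaults.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

When the HPA raises the replica count beyond the cluster's free capacity, the extra pods stay Pending with an insufficient-resources event, and the node scaling layer scales out nodes so that they can be scheduled.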

How are node scale-outs evaluated?

Node scaling monitors for pods that fail to schedule and determines whether to trigger a scale-out. If a pod fails to schedule due to insufficient resources, node scaling runs a scheduling simulation. The simulation calculates which auto-scaling-enabled node pool can provide the required node resources for the pending pods. If a suitable node pool is found, the corresponding nodes are scaled out.

Note

During a scheduling simulation, an auto-scaling-enabled node pool is treated as an abstract node. The instance types configured in the node pool determine the CPU, memory, or GPU capacity of the abstract node. The configured labels and taints also become the labels and taints of the abstract node. The simulation scheduler includes this abstract node in its scheduling evaluation. If the scheduling conditions are met, the simulation scheduler calculates the required number of nodes and triggers the node pool to scale out.
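The simulation matches a pending pod's scheduling requirements against the abstract node built from the node pool's configuration. A minimal sketch of such a pod follows; the label key, taint, and image are hypothetical examples, so substitute the labels and taints you actually configured on your node pool.

```yaml
# A pod that can only land on the abstract node of a matching node pool:
# its nodeSelector must match a node pool label, its toleration must cover
# the node pool taint, and its resource request must fit the configured
# instance types. All names and values below are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-task
spec:
  nodeSelector:
    workload-type: gpu          # matches a label configured on the node pool
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule        # tolerates the node pool's taint
  containers:
    - name: main
      image: registry.example.com/gpu-task:v1
      resources:
        limits:
          nvidia.com/gpu: 1     # must fit the node pool's instance types
```

If the abstract node satisfies all three conditions, the simulation computes how many such nodes are needed and triggers the scale-out.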

How are node scale-ins evaluated?

Node scaling only scales in nodes that are in an auto-scaling-enabled node pool. It cannot manage static nodes. Each node is evaluated for scale-in individually. When the resource utilization of a node falls below a configured threshold, a scale-in evaluation is triggered. At this point, node scaling simulates evicting the workloads on the node to determine if the node can be safely drained. The presence of certain pods, such as non-DaemonSet pods in the kube-system namespace or pods protected by a Pod Disruption Budget (PDB), will prevent the node from being drained. If a node can be drained, its pods are evicted to other nodes, and then the node is removed.
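For example, a PDB like the following blocks a drain whenever the eviction would leave fewer than the declared number of replicas running; the name and selector are hypothetical.

```yaml
# A PodDisruptionBudget that prevents node scale-in from evicting pods
# while fewer than 2 replicas of the app would remain available.
# "app: web" is a hypothetical label selector.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

If evicting a pod on a candidate node would violate this budget, the node cannot be drained and the scale-in for that node is skipped.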

How is a node pool selected from multiple auto-scaling-enabled node pools?

Choosing between different auto-scaling-enabled node pools is equivalent to choosing between different abstract nodes. Similar to pod scheduling, a scoring mechanism is used to select a node pool. The elastic scaling component first filters for node pools that match the pod's scheduling requirements, and then makes further selections based on policies such as node affinity.

If a suitable node pool cannot be selected based on the preceding policies, node auto scaling selects a node pool based on the least-waste policy by default. The core of the least-waste policy is to find the option that results in the least amount of unused resources after a simulated scale-out.

Note

If a GPU node pool and a CPU node pool with auto scaling enabled can both be scaled out, the CPU node pool is prioritized by default.

By default, node instant scaling comprehensively evaluates inventory availability and costs. It prioritizes selecting instance types with sufficient inventory and lower costs from multiple feasible scale-out options.

How to improve the success rate of elastic scaling

The success rate of elastic scaling mainly depends on the following two factors:

  • Is the scheduling policy met?

    After you configure an auto-scaling-enabled node pool, you must confirm the range of pod scheduling policies that the node pool can support. If you cannot determine this directly, you can use a nodeSelector that targets the node pool's label to perform a pre-scaling simulation.
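A lightweight way to run this pre-scaling check is to deploy a small pod pinned to the node pool through one of its labels and watch whether a scale-out is triggered. The label key/value, image, and resource request below are hypothetical; substitute a label actually configured on your node pool.

```yaml
# Pre-scaling check: a pod pinned to the auto-scaling node pool via a
# nodeSelector on one of the pool's labels. All values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: scaling-precheck
spec:
  nodeSelector:
    np-example-label: "true"    # replace with a label on your node pool
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
```

If the pod stays Pending and no scale-out occurs, the node pool's labels, taints, or instance types do not satisfy the pod's scheduling policy, and you should adjust the node pool configuration before relying on elastic scaling.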

  • Sufficient resource configuration

    After the scheduling simulation passes, the system selects an auto-scaling-enabled node pool to scale out instances. However, the inventory of the ECS instance types configured in the node pool directly affects whether instances can be successfully scaled out. Therefore, we recommend that you configure multiple zones and various instance types to improve the scale-out success rate.

How to improve the speed of elastic scaling

  • Method 1: Use the swift mode to accelerate the scale-out speed. After an auto-scaling-enabled node pool is warmed up by completing one scale-out and one scale-in cycle, the node pool can enter the swift scaling mode. For more information, see Enable node auto scaling.

  • Method 2: Use a custom image based on Alibaba Cloud Linux 3 to improve the resource delivery speed of the IaaS layer by up to 50%. For more information, see Optimize elastic scaling with custom images.

Elastic scaling solutions: node auto scaling and node instant scaling

Node scaling is an elastic scaling capability at the resource layer. It automatically scales node resources to increase scheduling capacity when the cluster's capacity cannot meet application pod scheduling requirements. ACK provides two node scaling solutions.

Introduction

Important
  • Only one elastic scaling component can run in a cluster. The two elastic scaling solutions cannot be used together. To enable the node scaling feature, follow the standard procedure in Enable node auto scaling or Enable node instant scaling.

  • The elastic scaling performance data provided in this topic are theoretical values based on custom images that are optimized for elastic scaling. The actual performance may vary depending on your business environment. For more information about custom images, see Optimize elastic scaling with custom images.

Solution 1: node auto scaling

  • Elastic scaling component: cluster-autoscaler

  • Description: Periodically polls and checks the cluster state to find conditions that meet scale-out or scale-in requirements, and then automatically scales cluster nodes.

Solution 2: node instant scaling

  • Elastic scaling component: node instant scaling component

  • Description: An event-driven node scaling controller. It ensures better elastic resource delivery in scenarios such as large-scale clusters (for example, a node pool with auto scaling enabled has more than 100 nodes, or there are more than 20 such node pools) and consecutive scale-out activities. The scaling speed (the time from the first pod scheduling failure to a successful scheduling) is stable at 45s, the success rate can reach 99%, and resource fragmentation is reduced by about 30%. It also offers better extensibility for custom scaling policies.

Solution comparison

If a node pool in your cluster has automatic elastic scaling enabled and its Scaling Mode is set to Non-swift Mode, node instant scaling is compatible with the semantics and behavior of the node auto scaling component. This allows for a seamless transition for all types of applications. This section describes the optimized features of node instant scaling compared with node auto scaling.

Scaling speed and efficiency

  • Node auto scaling: For a single scaling activity, the scaling speed is approximately 60s in standard mode and 50s in swift mode. At around 1 minute, the scaling speed encounters a bottleneck, and it shows significant jitter at different scales (multiple node pools) and in different scenarios (consecutive scaling). For example, when the number of node pools exceeds 100, the scaling speed decreases to a range of 100s to 150s. Node auto scaling uses a polling model and is limited by its dependency on cluster state maintenance; the minimum elastic scaling sensitivity is 5s.

  • Node instant scaling: Triggers scaling actions through an event-driven mechanism and uses Alibaba Cloud ContainerOS capabilities to accelerate elastic scaling. The scaling speed is approximately 45±10s. Performance does not significantly degrade as the number of node pools and pods increases, which makes it more suitable for scenarios with high requirements for elastic delivery speed. Node instant scaling is event-driven and uses a responsive model; the elastic scaling sensitivity is 1s to 3s.

Resource delivery certainty

  • Node auto scaling: The inventory of cloud resources changes frequently. Due to issues such as complex instance type combinations and insufficient inventory, the elastic scaling success rate of node auto scaling is approximately 97%. It scales out the same instance types as configured in the node pool; if multiple types are configured, it selects the smallest instance type that meets the requirements. When resource delivery fails, it retries periodically, which is a reactive approach.

  • Node instant scaling: Supports an automatic inventory selection policy. It can filter out-of-stock instance types from thousands of Alibaba Cloud instance type combinations based on your configured filter conditions and order, and then selects the most suitable type for scale-out or compensates with a qualified type if inventory is insufficient. This greatly reduces the burden on O&M engineers to select instance types and increases the delivery success rate to 99%. It supports scaling out different instance types. When resource delivery fails, it supports an inventory alert feature that provides advance notice of potential risks associated with instance type combinations.

Use and O&M threshold

Compared to node auto scaling, node instant scaling has a lower barrier to entry. This is mainly reflected in the following aspects.

  • Node pool configuration maintenance: Node instant scaling can automatically select instances from multiple instance types and zones based on instance properties to accommodate pending pods. In contrast, with node auto scaling, you must manually maintain the configurations of each node pool to ensure proper pod scheduling. Therefore, when pod configurations change, the corresponding node pool configuration often needs to be updated as well.

  • Node O&M: For developers, any exceptions related to the scaling process are synchronized through pod events. They only need to manage the pod lifecycle.

  • Feature extension: Supports extension mechanisms, such as using Descheduler to prepare elastic resources. Node instant scaling supports non-intrusive filter interactions between resource supply policies, node lifecycle management, and your custom behaviors, which provides more possibilities for custom development.

Scheduling policy

In addition to all the scheduling features of node auto scaling, node instant scaling also supports the following features:

  • Topology spread constraints: Often used to meet high availability requirements across zones.

  • Pod Disruption Budgets: Limits the number of pods in a multi-replica application that can be voluntarily evicted at the same time to ensure stability during changes.

Node instant scaling supports selecting the optimal Bin Packing and PreBind policies (custom features) based on the pod, which can reduce the scheduling fragmentation rate by up to 30%.
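The cross-zone topology support above uses the standard Kubernetes `topologySpreadConstraints` field. A minimal sketch follows; the Deployment name, selector, and image are hypothetical.

```yaml
# Spread replicas evenly across zones; a scaler that honors topology
# spread constraints must place new nodes so the constraint stays
# satisfiable. "web" and its labels are hypothetical examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: registry.example.com/web:v1
```

With `whenUnsatisfiable: DoNotSchedule`, replicas that would skew the zone distribution stay Pending, so scale-out decisions must account for which zone the new node lands in.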

Limits of node instant scaling

Understanding the limits of node instant scaling is an important part of evaluating the node instant scaling solution.

  • The swift mode is not supported.

  • A node pool cannot scale out more than 180 nodes in a single batch.

  • Disabling scale-in at the cluster level is not currently supported.

  • Node instant scaling does not support checking the inventory of spot instances. For a node pool where the Billing Method is set to Spot Instance and Use On-Demand Instances To Supplement Spot Instance Capacity is enabled, on-demand instances may be scaled out even when the spot instance inventory is sufficient.

Suggestions on selecting a solution

Based on the preceding Solution comparison and Limits of node instant scaling, you can select the appropriate solution for your needs. If your business has relatively low requirements for scaling speed, resource delivery certainty, and O&M costs, or if you cannot accept the limits of node instant scaling, node auto scaling may be sufficient. Conversely, if you have the following business requirements, node instant scaling is the recommended solution.

  • The cluster is large. For example, if an auto-scaling-enabled node pool has more than 100 nodes, or if there are more than 20 such node pools, the scale-out efficiency of node auto scaling decreases significantly as the cluster size grows. In contrast, the performance of node instant scaling fluctuates less.

  • You have high requirements for resource delivery speed (elastic scaling speed). In a single scaling scenario, the elastic scaling speed of node auto scaling in standard mode is approximately 60s, while for node instant scaling it is approximately 45s.

  • Business workload batches are unpredictable, and you often need to perform consecutive scale-outs for the same elastic node pool. In consecutive scaling mode, the performance of node auto scaling decreases and shows significant jitter. In contrast, node instant scaling can still achieve a scaling speed of approximately 45s.

Notes

Quotas and limits

  • You can add up to 200 custom routes to a route table of a virtual private cloud (VPC). To increase this quota, go to Quota Center and submit a request. For more information about the quotas of other resources and how to increase them, see Quotas for underlying cloud dependencies.

  • We recommend that you properly configure the maximum number of nodes in an auto-scaling-enabled node pool. Make sure that dependent resources and quotas, such as VPC CIDR blocks and vSwitches, are sufficient for the specified number of nodes. Otherwise, scale-out activities may fail. For more information about how to configure the maximum number of nodes, see Configure the number of instances. For more information about network planning for ACK, see Network planning for an ACK managed cluster.

  • The node scaling feature does not support subscription nodes. When you create a new auto-scaling-enabled node pool, do not select subscription as the billing method. To enable auto scaling for an existing node pool, make sure that the node pool does not contain any subscription nodes.

  • The node scaling feature does not currently support Sidecar Containers. Deploy workloads that use Sidecar Containers to a node pool where auto scaling is not enabled.

Maintenance of dependent resources

If you bind EIPs to nodes, do not directly delete the ECS nodes that are scaled out by node scaling in the ECS console. Otherwise, the EIPs cannot be automatically released.

What to do next