Container Service for Kubernetes: Manually scale a node pool

Last updated: Nov 24, 2025

You can manually scale a node pool by adjusting its expected number of nodes. This helps maintain the desired number of nodes and improves O&M efficiency. You can scale out a node pool to ensure that you have enough nodes to support your services or scale in a node pool to save resource costs.

Note

ACK also supports autoscaling. You can choose from two elasticity solutions, node autoscaling or Node Instant Elasticity, to automatically scale node resources and increase your scheduling capacity. For more information, see Node Scaling.

Introduction to node pool scaling

The expected number of nodes is the target number of nodes for a node pool to maintain. It represents the desired state of the node pool. After you specify the expected number of nodes for a node pool, the node pool automatically triggers a scale-out or scale-in operation based on the current number of nodes. This process maintains the number of nodes at the expected level without requiring manual intervention.
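
The comparison between the expected and current number of nodes can be sketched as follows. This is a minimal illustration under simplifying assumptions, not ACK's implementation; the function name `plan_scaling` and its return values are hypothetical.

```python
from typing import Tuple


def plan_scaling(current_nodes: int, expected_nodes: int) -> Tuple[str, int]:
    """Return the action and node count needed to reach the expected number.

    Hypothetical helper for illustration only; ACK performs this
    comparison internally.
    """
    if current_nodes < 0 or expected_nodes < 0:
        raise ValueError("node counts must be non-negative")
    delta = expected_nodes - current_nodes
    if delta > 0:
        return ("scale-out", delta)   # more nodes are expected than exist
    if delta < 0:
        return ("scale-in", -delta)   # fewer nodes are expected than exist
    return ("none", 0)                # already at the expected number


print(plan_scaling(current_nodes=3, expected_nodes=5))
print(plan_scaling(current_nodes=5, expected_nodes=2))
```

The first call reports a scale-out of 2 nodes and the second a scale-in of 3 nodes.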

Scale out a node pool

If you set the expected number of nodes to a value greater than the current number of nodes in the node pool, the system triggers a scale-out operation. If the system fails to add a scale-out node, it automatically retries the operation until the number of nodes in the node pool reaches the expected number. The configuration of the new nodes depends on the node pool configuration. The instance type and zone of the nodes are determined by the scaling policy. For more information about scaling policies, see Scaling policies.

During a node pool scale-out, you are charged based on the instance types that are created. For example, suppose that a node pool is configured with two instance types, the Billing Method is Pay-as-you-go, and the Scaling Policy is Priority. During the scale-out, 2 Node A instances are created in the zone of the first-priority vSwitch. When the resources for Node A become insufficient, 3 Node B instances are created in the zone of the second-priority vSwitch. The cost is calculated using the formula Unit price of instance type × Number of nodes × Billing duration. For one hour, the cost in this example is Unit price of Node A × 2 × 1 + Unit price of Node B × 3 × 1.
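
The cost formula in this example can be worked through in a few lines of Python. The unit prices below are placeholders chosen for easy arithmetic, not real ECS prices, and the function name is hypothetical.

```python
def scale_out_cost(created, hours):
    """Sum unit price x number of nodes x billing duration per instance type.

    `created` is a list of (hourly_unit_price, node_count) pairs.
    """
    return sum(price * count * hours for price, count in created)


# Placeholder hourly unit prices (assumptions for illustration only).
NODE_A_PRICE = 1.0
NODE_B_PRICE = 0.5

# 2 Node A instances and 3 Node B instances, billed for 1 hour.
cost = scale_out_cost([(NODE_A_PRICE, 2), (NODE_B_PRICE, 3)], hours=1)
print(cost)  # 1.0 * 2 * 1 + 0.5 * 3 * 1 = 3.5
```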

A node pool scale-out consists of two steps.

  1. Create ECS instances: ACK node pools use Auto Scaling as the underlying service to create nodes. After you adjust the expected number of nodes, ACK modifies the expected number of instances in the Auto Scaling group and performs a scale-out based on the node pool configuration. The status of the node pool changes to Scaling Out. After Auto Scaling creates the ECS instances, the node pool status changes to Active. For more information about the expected number of instances, see Expected number of instances.

    Important

    ECS Bare Metal GPU instances (instance families ebmgn7 and ebmgn7e) do not support automatic multi-instance GPU (MIG) cleanup. Therefore, when ACK adds nodes of these types, it resets existing MIG settings. The duration of the reset is unpredictable. If the reset takes too long, the automatic addition of nodes may fail.

  2. Add an ECS instance to a cluster: After Auto Scaling provisions an ECS instance, the instance automatically runs the cloud-init script (maintained by ACK) to initialize the node and add it to the node pool. The execution logs are saved to the /var/log/messages file on the node. You can log on to the node and run the grep cloud-init /var/log/messages command to view the execution logs.

    Note
    • If a node is successfully added to the node pool, the log messages in /var/log/messages are automatically purged. Therefore, you can reference these logs only when a node fails to be added to the cluster.

    • If a node fails to be added to the cluster, key information from the /var/log/messages log is extracted and included in the task result. You can click the target cluster and view the reason on the Cluster Tasks tab.
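
If you copy the log from a node that failed to join, the filter that grep cloud-init applies can also be sketched in Python. The sample lines below are synthetic placeholders, not real ACK log output, and the helper name `cloud_init_lines` is hypothetical.

```python
from typing import Iterable, List


def cloud_init_lines(lines: Iterable[str]) -> List[str]:
    """Keep only lines that mention cloud-init (same idea as `grep cloud-init`)."""
    return [line.rstrip("\n") for line in lines if "cloud-init" in line]


# Synthetic placeholder lines; real /var/log/messages content will differ.
sample = [
    "placeholder: cloud-init step one\n",
    "placeholder: unrelated service message\n",
    "placeholder: cloud-init step two\n",
]
for line in cloud_init_lines(sample):
    print(line)

# On a node, you could pass an open file instead:
# with open("/var/log/messages", errors="replace") as f:
#     print("\n".join(cloud_init_lines(f)))
```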

Scale in a node pool

If you set the expected number of nodes to a value less than the current number of nodes, the system triggers a scale-in operation and removes nodes.

  • When scaling in nodes:

    • If the node pool scaling policy is set to Priority, the system scales in the most recently created instances.

    • If the node pool scaling policy is set to Distribution Balancing, the system filters ECS instances by zone according to the balanced release policy and then selects the most recently created instances for the scale-in. This ensures that the number of ECS instances in each zone of the scaling group is roughly equal after the instances are removed.

    • If the node pool scaling policy is set to Cost Optimization, the system prioritizes removing ECS instances with the highest vCPU unit prices during a scale-in.

  • When you scale in a node pool by changing the expected number of nodes, the nodes are removed even if the draining operation fails. If you want to drain the nodes before they are removed, you must remove specific nodes. For more information, see Remove a node.

  • When you scale in a node pool, subscription ECS instances are not released. To release subscription instances, log on to the ECS console, convert the subscription instances to pay-as-you-go instances, and then release them. For more information, see Convert a subscription instance to a pay-as-you-go instance.

  • The lifecycles of the system disk and data disks are linked to the node. When a node is released during a scale-in, its disks are also released. All data on the disks is permanently lost and cannot be recovered. To ensure data persistence, use a PersistentVolume (PV). This decouples the storage data from the node lifecycle and ensures data security.
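
The instance selection rules listed above for each scaling policy can be modeled roughly as follows. This is a simplified sketch, not the actual Auto Scaling algorithm: the `Instance` fields, the policy strings, and the function name are hypothetical, and the Distribution Balancing branch assumes that balance is reached by always removing the newest instance from the zone that currently has the most instances.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    zone: str
    created_at: int          # larger value = created more recently
    vcpu_unit_price: float   # placeholder price per vCPU


def pick_for_scale_in(instances: List[Instance], count: int, policy: str) -> List[Instance]:
    """Pick `count` instances to remove under a simplified model of each policy."""
    if policy == "priority":
        # Most recently created instances first.
        return sorted(instances, key=lambda i: i.created_at, reverse=True)[:count]
    if policy == "cost_optimization":
        # Highest vCPU unit price first.
        return sorted(instances, key=lambda i: i.vcpu_unit_price, reverse=True)[:count]
    if policy == "distribution_balancing":
        by_zone = defaultdict(list)
        for inst in instances:
            by_zone[inst.zone].append(inst)
        for members in by_zone.values():
            members.sort(key=lambda i: i.created_at)  # newest last
        picked = []
        while len(picked) < count and any(by_zone.values()):
            # Assumption: take from the currently largest zone to even out counts.
            largest = max(by_zone, key=lambda z: len(by_zone[z]))
            picked.append(by_zone[largest].pop())
        return picked
    raise ValueError(f"unknown policy: {policy}")


pool = [
    Instance("i-a1", "zone-a", 1, 0.05),
    Instance("i-a2", "zone-a", 2, 0.08),
    Instance("i-a3", "zone-a", 4, 0.05),
    Instance("i-b1", "zone-b", 3, 0.10),
]
for policy in ("priority", "cost_optimization", "distribution_balancing"):
    print(policy, [i.instance_id for i in pick_for_scale_in(pool, 2, policy)])
```

With this sample pool, the balancing branch removes two instances from zone-a and leaves one instance in each zone.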

Procedure

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Node Pools.

  3. In the Actions column for the target node pool, click Scale and set Scaling Mode to Manual.

  4. (Optional) If you have not authorized CloudOps Orchestration Service (OOS), you must grant authorization by creating the AliyunOOSLifecycleHook4CSRole role. Click AliyunOOSLifecycleHook4CSRole and follow the on-screen instructions to complete the authorization.

    Note
    • If you are using an Alibaba Cloud account, click AliyunOOSLifecycleHook4CSRole to grant the required permissions.

    • If you are using a RAM user, ensure that your Alibaba Cloud account has been assigned the AliyunOOSLifecycleHook4CSRole role. Then, attach the AliyunRAMReadOnlyAccess policy to the RAM user. For more information, see Grant permissions to a RAM user.

  5. Enter a value for Expected Nodes and follow the prompts to submit the configuration.

    After submission, the node pool status changes to Updating, followed by Scaling Out or Removing Node.
    • In the Status column of the node pool list, a status of Scaling Out indicates that the node pool is being scaled out. A status of Active indicates that the scale-out is complete.

      Important

      When you scale out nodes in a cluster, if the security group denies access to 100.64.0.0/10, the nodes cannot be added to the cluster.

    • In the Status column of the node pool list, a status of Removing indicates that the node pool is being scaled in. A status of Active indicates that the scale-in is complete.
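
The security group requirement noted in step 5 can be checked offline against a list of denied CIDR blocks. The following is a minimal sketch, assuming you have already exported the deny rules as CIDR strings. The function name and the sample rules are hypothetical, and the code does not call any Alibaba Cloud API.

```python
import ipaddress

# Address range that must stay reachable for nodes to join the cluster.
REQUIRED_RANGE = ipaddress.ip_network("100.64.0.0/10")


def rules_blocking_required_range(denied_cidrs):
    """Return the denied CIDR blocks that overlap 100.64.0.0/10."""
    return [
        cidr
        for cidr in denied_cidrs
        if ipaddress.ip_network(cidr).overlaps(REQUIRED_RANGE)
    ]


# Hypothetical deny rules for illustration.
denied = ["100.64.0.0/10", "10.0.0.0/8", "100.100.0.0/16"]
print(rules_blocking_required_range(denied))
```

Any rule that the function returns denies at least part of 100.64.0.0/10 and is worth reviewing before you scale out.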

Non-standard operations and recommended actions

The expected number of nodes is the target number of nodes that a node pool should maintain. Certain non-standard operations can prevent the node pool from scaling as expected and may lead to resource loss. The following table describes common non-standard operations and recommended actions.

Important

Do not perform any non-standard operations.

Non-standard operation

Node pool behavior

Recommendation

Run the kubectl delete node command to remove a node.

The expected number of nodes is based on the number of ECS instances in the Auto Scaling group, not the number of nodes in the cluster.

If you use the API server to remove nodes, the corresponding ECS instances are not released. As a result, the actual number of nodes in the node pool does not change. However, because the nodes are removed from the cluster, their status is displayed as Unknown in the node pool's node list.

  • If you have performed this operation, you can click a node pool's name on the node pool page and remove the node on the Nodes tab.

    Note

    You do not need to select Drain Node because the node has already been removed from the cluster. You can then choose whether to Release ECS Instance.

  • The following types of nodes are not automatically released. After you remove them from the node pool, you must log on to the ECS console to manage them manually.

    • Nodes that are manually added to the cluster.

    • Subscription nodes.

Release an ECS instance from the ECS console or by calling an OpenAPI operation.

The node pool detects the release of the ECS instance and automatically creates a new ECS instance to maintain the expected number of nodes.

  • ACK detects node releases based on the expected number of nodes and automatically creates new instances to replace them, which can cause unexpected costs or service disruptions. We recommend that you use the ACK console to remove nodes. For more information, see Remove nodes.

  • The following types of nodes are not automatically released. After you remove them from the node pool, you must log on to the ECS console to manage them manually.

    • Nodes that are manually added to the cluster.

    • Subscription nodes.

Remove an ECS instance from a scaling group using Auto Scaling without modifying the expected number of instances.

The node pool detects the release of the ECS instance and automatically creates a new ECS instance to maintain the expected number of nodes.

Do not directly manage the scaling group associated with the node pool. This can cause the node pool to behave unexpectedly.

A subscription ECS instance is released upon expiration.

The node pool detects the release of the ECS instance and automatically creates a new ECS instance to maintain the expected number of nodes.

ACK detects the release of the node and creates a new instance to maintain the expected number of nodes. This can result in unexpected costs. Handle expiring ECS instances promptly. You can either remove the node or renew the subscription for the ECS instance.

Manually enable health checks for Auto Scaling group instances from the Auto Scaling console or by calling an OpenAPI operation.

After health checks are enabled for an Auto Scaling group, a new ECS instance is automatically created whenever an unhealthy instance, such as a stopped instance, is detected.

By default, ACK does not enable Auto Scaling health checks. New ECS instances are created only when a node is released. Do not directly manage the Auto Scaling group of a node pool. This can cause the node pool to behave unexpectedly.
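
The kubectl delete node case in the table above hinges on the difference between ECS instances in the Auto Scaling group and nodes registered in the cluster. The sketch below illustrates that difference with plain sets. The instance IDs and the function name are hypothetical, and no ACK or ECS API is called.

```python
def find_unknown_nodes(scaling_group_instance_ids, cluster_node_instance_ids):
    """Return instances still in the scaling group but no longer registered as nodes."""
    return sorted(set(scaling_group_instance_ids) - set(cluster_node_instance_ids))


# Hypothetical IDs: i-example-2 was removed with `kubectl delete node`,
# so its ECS instance still exists in the scaling group.
ecs_in_group = ["i-example-1", "i-example-2", "i-example-3"]
nodes_in_cluster = ["i-example-1", "i-example-3"]

print(find_unknown_nodes(ecs_in_group, nodes_in_cluster))  # ['i-example-2']
print(len(ecs_in_group))  # still 3, so the node pool does not scale out
```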

Error codes for scaling failures and solutions

Node pool scaling can fail for reasons such as insufficient inventory. To view the reason for the failure, click the name of the target cluster on the Clusters page. Then, on the Cluster Tasks tab, click View Cause.

The following table describes common error codes for scale-out failures.

Error code

Cause

Solution

RecommendEmpty.InstanceTypeNoStock

The ECS instance inventory in the current zone is insufficient.

Edit the node pool to add vSwitches in different zones and configure multiple instance types. This increases the probability of successful node creation.

The node pool list identifies node pools with low elastic strength. This lets you evaluate the availability of the node pool configuration and the health of instance provisioning. For more information, see View the elastic strength of a node pool.

NodepoolScaleFailed.FailedJoinCluster

The node failed to be added to the ACK cluster.

Log on to the scale-out node and run the grep cloud-init /var/log/messages command to view the execution log and retrieve the error message.

InvalidAccountStatus.NotEnoughBalance

Your account has an insufficient balance.

Add funds to your account and retry the operation.

InvalidParameter.NotMatch

The error message `Image bootMode BIOS does not match instanceType bootMode` indicates that the boot mode of the specified OS image is not compatible with the specified instance type.

Modify the instance type.

  • Click Details in the row of the target node pool to view information such as its operating system and image ID on the Overview tab.

  • You can query the instance types that an OS image supports by calling the DescribeImageSupportInstanceTypes operation in OpenAPI.

  • For more information about the images that ACK supports, see Operating system.

QuotaExceed.ElasticQuota

The number of ECS instances of the selected instance type in the current region exceeds your quota.

You can perform one of the following operations:

  • Select other instance types.

  • Reduce the current number of ECS instances.

  • Go to the Quota Center to request a quota increase.

InvalidResourceType.NotSupported

The specified ECS instance type is not supported or is out of stock in the current zone.

Call the DescribeAvailableResource operation to check whether the instance type is available in the zone. Then, modify the instance type of the node pool.

InvalidImage.NotSupported

The error message `The specified image does not support vSGX instance.` indicates that the node pool's OS image does not support security-enhanced instances.

Change the instance type.

  • You can click Details for the target node pool to view its information, such as the operating system and image ID, on the Overview tab.

  • You can query the instance types that an OS image supports by calling the DescribeImageSupportInstanceTypes operation in OpenAPI.

  • For more information about the OS images that are supported by the security-enhanced instance family, see Create an instance using the console.

InvalidParameter.NotMatch

The error message `The specified instanceType only support vTPM image.` indicates that the specified OS image does not support the security-enhanced instance family.

Change the instance type.

  • You can click Details for the target node pool to view its information, such as the operating system and image ID, on the Overview tab.

  • You can query the instance types that an OS image supports by calling the DescribeImageSupportInstanceTypes operation in OpenAPI.

  • For more information about the OS images that are supported by the security-enhanced instance family, see Create an instance using the console.

QuotaExceeded.PrivateIpAddress

The vSwitch has an insufficient number of available private IP addresses.

Configure more vSwitches for the node pool and then retry the operation.

InvalidParameter.KmsNotEnabled

The KMS key that you specified is not enabled.

Log on to the Key Management Service (KMS) console to check the key status.

InvalidInstanceType.NotSupported

The error message `The specified instanceType is not supported by the image architecture.` indicates that the instance type is not compatible with the OS image architecture.

Modify the instance type.

  • Click Details in the row of the target node pool to view information such as its operating system and image ID on the Overview tab.

  • You can query the instance types that an OS image supports by calling the DescribeImageSupportInstanceTypes operation in OpenAPI.

  • For more information about the images that ACK supports, see Operating system.

InsufficientBalance.CreditPay

Your account has an insufficient balance.

Add funds to your account to proceed.

ApiServer.InternalError

The error message `an error on the server ("Get \"https://192.168.xxx.xxx:xxx/api/v1/nodes\": dial tcp 192.168.xxx.xxx:xxx: connect: connection refused") has prevented the request from succeeding` indicates that the API server of the ACK cluster cannot be accessed.

Check whether the cluster's API server is available and accessible. For more information, see Troubleshoot issues with accessing a cluster from the console.

RecommendEmpty.InstanceTypeNotAuthorized

The instance type that you specified requires authorization before use.

You can submit a ticket to ECS to request authorization.

Account.Arrearage

You have an insufficient account balance.

Add funds to your account before you proceed.

Err.QueryEndpoints

Access to the ACK cluster's API server has failed.

Check whether your cluster's API server is available and accessible. For more information, see Troubleshoot issues with accessing a cluster from the console.

RecommendEmpty.DiskTypeNoStock

The disk inventory in the specified zone is insufficient.

Add more zones (vSwitches) to the node pool, or change the disk type, and then retry the operation.

InvalidParameter.KMSKeyId.KMSUnauthorized

You are not authorized to access Key Management Service (KMS).

Log on to the ECS console to grant ECS the `AliyunECSDiskEncryptDefaultRole` service role, which allows ECS to access KMS. For more information, see Permissions for encryption.

InvalidParameter.Conflict

The error message `The specified disk category (xxxx) is not support the specified instance type.` indicates that the specified instance type does not support the specified disk category.

Change the instance type or disk type and then retry the operation.

NotSupportSnapshotEncrypted.DiskCategory

System disk encryption is supported only for Enhanced SSDs (ESSDs).

Select another disk type. For more information about disk types and encryption, see Create and manage node pools.

ScalingActivityInProgress

The node pool is already undergoing a scaling activity. Try again later.

To avoid scaling activity conflicts, do not scale nodes in or out directly from the Auto Scaling console.

Instance.StartInstanceFailed

The ECS instance failed to start.

Try the operation again later. To troubleshoot the issue, you can submit a ticket to ECS.

OperationDenied.NoStock

The ECS instance type that you selected is out of stock in the specified zone.

Change the instance types in the node pool configuration and try again.

Elastic strength indicates the probability of a successful node pool scale-out based on real-time inventory. For more information, see View the elastic strength of a node pool.

NodepoolScaleFailed.WaitForDesiredSizeTimeout

The scale-out task timed out.

Perform the following steps to view the details of the scaling activity.

  1. Log on to the ACK console. In the navigation pane on the left, click Clusters.

  2. On the Clusters page, find the cluster you want and click its name. In the left-side pane, choose Nodes > Node Pools.

  3. Click the target node pool name to view the scale-out details on the Scaling Activities tab.

ApiServer.TooManyRequests

The API server is throttling the scale-out job.

Reduce the number of requests sent to the API server or retry the job later.

NodepoolScaleFailed.PartialSuccess

The scale-out was partially successful. Some nodes were created, but others failed due to insufficient inventory.

Select different instance types and retry the operation.

Elastic strength indicates the probability of a successful node pool scale-out based on real-time inventory. For more information, see View the elastic strength of a node pool.
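
If you process scaling failures in a script, several of the error codes in the table above can be grouped by their first remediation step. The sketch below covers only a subset of the codes. The grouping, the category names, and the function name are assumptions made for illustration and are not part of any ACK API. The remediation text is paraphrased from the table.

```python
# Subset of error codes from the table, grouped by a hypothetical category.
ERROR_CATEGORIES = {
    "RecommendEmpty.InstanceTypeNoStock": "inventory",
    "OperationDenied.NoStock": "inventory",
    "NodepoolScaleFailed.PartialSuccess": "inventory",
    "InvalidAccountStatus.NotEnoughBalance": "balance",
    "InsufficientBalance.CreditPay": "balance",
    "Account.Arrearage": "balance",
    "QuotaExceed.ElasticQuota": "quota",
    "QuotaExceeded.PrivateIpAddress": "ip-addresses",
    "ApiServer.InternalError": "api-server",
    "Err.QueryEndpoints": "api-server",
    "ApiServer.TooManyRequests": "throttling",
}

FIRST_STEPS = {
    "inventory": "Add vSwitches in other zones or change the instance types, then retry.",
    "balance": "Add funds to your account, then retry.",
    "quota": "Select other instance types, reduce the number of ECS instances, "
             "or request a quota increase in the Quota Center.",
    "ip-addresses": "Configure more vSwitches for the node pool, then retry.",
    "api-server": "Check whether the cluster's API server is available and accessible.",
    "throttling": "Reduce the number of requests sent to the API server or retry later.",
}


def first_step(error_code: str) -> str:
    """Return a first remediation step, or point back to the table if unmapped."""
    category = ERROR_CATEGORIES.get(error_code)
    if category is None:
        return "See the error code table for this code."
    return FIRST_STEPS[category]


print(first_step("Account.Arrearage"))
print(first_step("QuotaExceeded.PrivateIpAddress"))
```

Codes that are not in the mapping fall back to a pointer to the table, so the helper does not guess at a remediation.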

References

  • For detailed steps and precautions about removing a node from a cluster, see Remove a node.

  • For information about O&M operations for node pools, such as upgrading node pools, enabling automatic node recovery, and fixing OS CVEs, see Node pool O&M.

  • For information about best practices for node pools, such as using deployment sets to distribute nodes across different physical servers for high availability or creating node pools from spot instances, see Best practices for nodes and node pools.