Add existing ECS instances to a node pool as ACK worker nodes

You can add existing ECS instances to a cluster as worker nodes, or re-add worker nodes that were removed from a node pool. This allows you to quickly reuse computing resources without interrupting existing workloads.

ACK provides two methods for adding nodes: auto mode and manual mode. An instance's billing method and instance type remain unchanged after it is added.

Item	Auto mode	Manual mode
OS reset	Resets and initializes the instance's operating system based on the current configuration of the node pool. The original system disk is released and its data is lost. Manual snapshots of the system disk are retained. The retention of automatic snapshots depends on whether the Delete Automatic Snapshots While Releasing Disk setting is enabled. To configure this setting, log on to the ECS console and refer to Set the 'Delete Automatic Snapshots While Releasing Disk' attribute. The original data disk is not released and its data is unaffected, but the disk ID changes.	Preserves the instance's original operating system for greater flexibility.
Use cases	You want the instance configuration to be consistent with the node pool for standardized management.	You need to preserve the instance's existing operating system or specific configurations.

Limitations

Before you begin, make sure that your environment and instances meet the following requirements.

Category	Item	Description
Instance and node pool	Cluster node quota	The total number of nodes in the cluster cannot exceed the quota. To request a quota increase, go to Quota Center. The default node quota for an ACK Basic cluster is 10.
	Instance ownership	The instance and the cluster must be in the same Alibaba Cloud account, region, and VPC. Otherwise, migrate the instance or create a new instance or cluster that meets the requirements. You cannot add an ECS instance from the other end of a VPC peering connection.
	Cluster ownership	You cannot add an instance that already belongs to another ACK cluster. You must first remove the node from the original cluster before adding it to the new one.
	Scaling group (ESS) ownership	You cannot add an instance that is already part of another scaling group. You must manually remove it from the scaling group first.
	Node pool type	You cannot add existing nodes to a node pool in auto mode. You cannot add existing nodes to a node pool that has auto scaling enabled. After you add an existing node to a node pool, you can no longer enable auto scaling for that node pool.
	Operating system	Swap must be disabled on the operating system. In manual mode, you cannot add instances that run Windows or ContainerOS. For more information, see Operating systems.
	Instance type	The instance must not be an unsupported ECS instance type. For ECS Bare Metal GPU instances (ECS instance families ebmgn7 and ebmgn7e), automatic Multi-Instance GPU (MIG) cleanup is not supported. When you add such an instance, its existing MIG settings are reset. This reset might take an extended time and cause the automatic node addition to fail. For troubleshooting information, see What do I do if I fail to add a bare metal instance node?.
Network	API Server access	The IP address of the instance must be in the API Server access whitelist. Otherwise, the node cannot communicate with the control plane. For more information, see Configure access control for the API Server.
	Security group	Type consistency: The security group type of the instance (basic or enterprise) must match that of the node pool. Rule compatibility: The instance's security group must allow access to the API Server internal endpoint on port 6443, and its rules must not conflict with the security group rules of the cluster and node pool. You can find the API Server internal endpoint on the Basic Information tab of the Cluster Information page. Quota: The number of security groups that an instance can join must not exceed the security group quota. To change the security group type of an instance or to add an instance to the node pool's security group in advance, see Associate a security group with an instance (primary ENI). To request a security group quota increase, see View or increase ECS quotas.
	Terway - Maximum pods	The maximum number of pods that the instance supports depends on the elastic network interface (ENI) mode and on the maximum number of ENIs that the instance type supports. For information about how to calculate this limit, see How to calculate the pod quota for a node. The instance must meet the following requirements: In shared ENI mode, the maximum number of supported pods must be greater than 10. In exclusive ENI mode, the maximum number of supported pods must be greater than 5. If a requirement is not met, upgrade or downgrade node resources or purchase a new instance.
	Terway - vSwitch configuration	If the instance and the node pool are in different availability zones, you must update the Terway vSwitch configuration. Otherwise, Terway allocates pod IP addresses from the vSwitch of the node's primary ENI, which can cause pod IP allocation errors. For more information, see Modify pod vSwitches.
	Terway - ENI	When you add the instance, its existing bound ENIs are retained, and pod IP addresses are allocated from the vSwitches associated with these ENIs. Ensure the instance has only one primary ENI. If a pod IP address does not belong to a configured vSwitch, remove the node from the cluster, delete all secondary ENIs, and then add the node back to the cluster.
	Terway - Worker RAM role	The instance must be bound to the node pool's Worker RAM role to prevent permission issues that could lead to an incorrect calculation of the maximum available pods (MaxPod). On the Node Pools page, click a node pool name to view its Worker RAM role on the Basic Information tab. To grant the RAM role, see Grant a RAM role to an ECS instance.
	Terway - IPv6 dual-stack	If IPv6 dual-stack is enabled for the cluster, you must assign an IPv6 address to the instance's primary ENI. For more information, see IPv6 communication.
	Flannel	The number of custom route entries in the system route table of the cluster's VPC cannot exceed the route table quota. To request a quota increase, go to Quota Center.
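
Some of the requirements above can be checked before you open the console. The following Python sketch is a minimal pre-flight check, not an ACK tool. The `Candidate` fields, `swap_is_enabled`, and `preflight` are hypothetical names introduced for illustration, and you would have to gather the values yourself (for example, from the ECS console). It covers only the same-region and same-VPC rule, the swap rule, and the Terway pod minimums.

```python
from dataclasses import dataclass
from typing import List

# Thresholds from the Terway row above: the supported maximum number of
# pods must be strictly greater than these values.
TERWAY_POD_THRESHOLD = {"shared": 10, "exclusive": 5}


@dataclass
class Candidate:
    """Hypothetical description of an ECS instance to be added."""
    region: str
    vpc_id: str
    swap_enabled: bool
    eni_mode: str   # "shared" or "exclusive" (Terway clusters only)
    max_pods: int   # maximum pods the instance type supports


def swap_is_enabled(proc_swaps_text: str) -> bool:
    """Return True if the text of /proc/swaps lists any swap area.

    On Linux the file typically holds one header line followed by one
    line per active swap area, so more than one non-empty line means
    swap is on.
    """
    lines = [ln for ln in proc_swaps_text.splitlines() if ln.strip()]
    return len(lines) > 1


def preflight(c: Candidate, cluster_region: str, cluster_vpc_id: str) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if c.region != cluster_region:
        problems.append("instance and cluster are in different regions")
    if c.vpc_id != cluster_vpc_id:
        problems.append("instance and cluster are in different VPCs")
    if c.swap_enabled:
        problems.append("swap must be disabled")
    threshold = TERWAY_POD_THRESHOLD.get(c.eni_mode)
    if threshold is None:
        problems.append(f"unknown ENI mode: {c.eni_mode!r}")
    elif c.max_pods <= threshold:
        problems.append(
            f"max pods {c.max_pods} must be greater than {threshold} "
            f"in {c.eni_mode} ENI mode"
        )
    return problems
```

An empty list means only that these few rules passed. The remaining rows of the table, such as the security group, API Server whitelist, and RAM role requirements, still need to be verified separately.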

Usage notes

Data backup: Before you begin, create a manual snapshot of the instance's system disk and data disks to prevent data loss.

To ensure you have sufficient snapshot quota, we recommend deleting unnecessary manual and automatic snapshots to avoid creation failures.
Instance release and billing: For node pools that do not have the expected number of nodes enabled, instances added to the node pool are not released when you delete the cluster or node pool. You must manually remove the nodes. Monitor the ECS billing status to avoid unexpected charges.

Procedure

Time required: The node addition process, which includes system disk replacement (auto mode only) and node initialization, takes about 5 minutes. The actual time may vary depending on network conditions, OS image size, and other factors.

Adding an existing node does not affect the existing nodes and applications in the cluster. To avoid compatibility issues, we recommend that you do not add an ECS instance that already runs services as a worker node.

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.
On the Node Pools page, find the target node pool and, in its Actions column, open the menu and click Add Existing Node.

If the target ECS instance is not in the server list, it does not meet the conditions for being added to the cluster. You can select Show Unavailable Instances to view the unavailable ECS instances and the reasons. For more information about the reasons, see Limitations and Usage notes.
Read the on-screen notes carefully and select a method for adding the node.

Manual add

With this method, you obtain an installation command and run it on the target instance. You can add only one ECS instance at a time.

Set Method to Manual. In the list of existing cloud servers, select the ECS instance to add, and then click Next.

On the Specify Instance Information page, confirm the cluster and instance details. Configure the data disk and instance name, then click Next.

Parameter	Description
Data Disk	Specifies whether to store container and image data on a data disk. This separates the system disk from data disks to improve stability. If the ECS instance has a data disk attached and the file system of the last data disk is not initialized, ACK automatically formats the last data disk as ext4. This disk is then used exclusively to store data in /var/lib/containerd or /var/lib/docker (the default data directories for container runtimes) and /var/lib/kubelet (the default data directory for the kubelet component). Important The existing data on the formatted data disk will be lost. We recommend that you create a snapshot to back up your data in advance. If you store containers and images on a data disk, only the ext4 and xfs file systems are supported. If the ECS instance does not have a data disk attached, ACK does not automatically attach a new data disk, regardless of whether you select this option.
Retain Instance Name	Enabled: Uses the instance name as the node name. Disabled: ACK renames the node based on the custom node naming rules.

On the Complete page, copy the node join command automatically generated by ACK for use in a later step, and then click Finish.
Log on to the ECS console. In the left-side navigation pane, choose Instances & Images > Instance. Select the region where the cluster is located, and then select the target instance.
Click Connect for the target instance and select a remote connection method.
Follow the on-screen instructions to log on, and then run the node join command that you copied from the Complete page to automatically configure the instance and add it to the cluster.

After the script runs successfully, a success message appears. In the node list, wait for the new node's status to change to Ready.
```
Worker node joined successfully
+ exit_code=0
+ set +x
```
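
The Ready check can also be done from a machine that has `kubectl` access to the cluster. The sketch below is an illustration under stated assumptions: it parses the JSON that `kubectl get node <name> -o json` prints and looks for the standard Kubernetes `Ready` condition. The function name `node_is_ready` and the sample document are hypothetical. Only the `status.conditions` layout comes from the Kubernetes Node API.

```python
import json


def node_is_ready(node_json: str) -> bool:
    """Return True if a Node object reports the Ready condition as "True".

    `node_json` is expected to be the output of
    `kubectl get node <name> -o json` (a single Node object).
    """
    node = json.loads(node_json)
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical, heavily trimmed Node object used only for illustration.
sample = json.dumps({
    "kind": "Node",
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    },
})
```

Calling `node_is_ready(sample)` returns `True` for the sample above. A node that is still initializing usually reports `Ready` with status `False` or `Unknown`, for which the function returns `False`.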

Auto add

You can automatically add instances from the console.

Set Method to Auto. From the list of existing cloud servers, select the desired ECS instances and click Next.

On the Specify Instance Information page, confirm the cluster and instance information as prompted. Configure the data disk and instance name, and then click Next.

Parameter	Description
Data Disk	Specifies whether to store container and image data on a data disk. This separates the system disk from data disks to improve stability. If the ECS instance has a data disk attached and the file system of the last data disk is not initialized, ACK automatically formats the last data disk as ext4. This disk is then used exclusively to store data in /var/lib/containerd or /var/lib/docker (the default data directories for container runtimes) and /var/lib/kubelet (the default data directory for the kubelet component). Important The existing data on the formatted data disk will be lost. We recommend that you create a snapshot to back up your data in advance. If you store containers and images on a data disk, only the ext4 and xfs file systems are supported. If the ECS instance does not have a data disk attached, ACK does not automatically attach a new data disk, regardless of whether you select this option.
Logon method and password	If the node pool's Logon Type is configured as Password, you must reset the instance password.
Retain Instance Name	Enabled: Uses the instance name as the node name. Disabled: ACK renames the node based on the custom node naming rules.

In the dialog box that appears, read the notes carefully and then click OK.

After the node is added, wait in the node list until initialization completes and the node's status changes to Ready.

FAQ

Does adding nodes affect workloads?

Adding an existing node, in either manual or auto mode, does not affect existing cluster workloads.

How does instance scaling affect workloads?

Upgrading or downgrading an ECS instance can include changing the instance type, public bandwidth billing method, public bandwidth, or data disk billing method. For more information, see Overview of instance configuration changes. The impact on the ECS instance varies based on the upgrade or downgrade method.

Operations that do not require a restart: The impact on your business depends on your specific scenario.
Operations that require an ECS instance restart: Operations such as changing the instance type cause service disruptions. Before you upgrade or downgrade node resources, we recommend that you check the current workload to determine whether you need to add redundant nodes to take over the pods. Then, drain the node and remove it from the scaling group and the ACK cluster. For more information, see Remove nodes.

After the upgrade or downgrade is complete, add the node back to the cluster by following the instructions in this topic.

Can I use different instance types?

Yes. ACK allows you to manage nodes of multiple instance types in the same node pool. This helps prevent scale-out failures caused by instance type unavailability or insufficient inventory. Before you add an ECS instance, make sure that its instance type is included in the node pool's list of instance types. Follow these steps:

Edit or create a node pool and configure the required node instance types. For more information, see Create and manage a node pool.
Drain and remove the target node. Do not release the ECS instance. For more information, see Remove nodes.
Add the ECS instances of different instance types to the node pool by following the instructions in the Limitations and Procedure sections of this topic.

How do I move nodes between clusters?

ACK does not support moving nodes directly between clusters. However, you can achieve this by adding an existing node. Follow these steps:

Drain and remove the target node from the source cluster. Do not release the ECS instance. For more information, see Remove nodes.
Add the target ECS instance to a node pool in the destination cluster by following the instructions in the Limitations and Procedure sections of this topic.
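
The console procedure in Remove nodes is the documented path for the first step. If you prefer to drain from the command line before removing the node, the step might look like the sketch below, which only builds a `kubectl drain` command and leaves running it to you. `build_drain_command` and the node name are hypothetical. The flags are standard `kubectl drain` options, and whether you need them depends on your workloads.

```python
import shlex
from typing import List


def build_drain_command(node_name: str, timeout_seconds: int = 300) -> List[str]:
    """Build (but do not run) a kubectl drain command for one node."""
    if not node_name:
        raise ValueError("node_name must not be empty")
    return [
        "kubectl", "drain", node_name,
        "--ignore-daemonsets",        # DaemonSet pods cannot be evicted
        "--delete-emptydir-data",     # allow eviction of pods using emptyDir
        f"--timeout={timeout_seconds}s",
    ]


# Hypothetical node name, shown only to illustrate the resulting command.
cmd = build_drain_command("example-node-1")
print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues. Draining does not release the ECS instance, so the instance remains available to be added to the destination cluster.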

Can I add a node with an EOL OS?

Manual mode: You can manually add an existing instance that runs an unsupported operating system to a node pool. However, you must ensure that the OS version of the instance is compatible with the current cluster version. For more information, see Operating systems.

For example, CentOS 7 and Alibaba Cloud Linux 2 are supported only in clusters of version 1.30 and earlier.
Auto mode: Yes. ACK initializes the instance using the OS image specified in the node pool configuration.
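
The manual-mode rule amounts to a version comparison. The sketch below encodes only the example given above (CentOS 7 and Alibaba Cloud Linux 2 are supported in clusters of version 1.30 and earlier). The table `LAST_SUPPORTED`, the OS labels, and the function names are hypothetical, and the authoritative compatibility list is in Operating systems.

```python
import re
from typing import Optional, Tuple

# Last cluster (major, minor) version that supports each OS, taken only
# from the example in this topic. This is an illustrative, incomplete table.
LAST_SUPPORTED = {
    "CentOS 7": (1, 30),
    "Alibaba Cloud Linux 2": (1, 30),
}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '1.30.1-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def manual_add_compatible(os_name: str, cluster_version: str) -> Optional[bool]:
    """Return True or False for an OS in the table, or None if it is unknown."""
    last = LAST_SUPPORTED.get(os_name)
    if last is None:
        return None  # not covered by this sketch; check Operating systems
    return parse_major_minor(cluster_version) <= last
```

Returning `None` for an OS that is not in the table avoids implying that an unlisted OS is either supported or unsupported.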

Is user data overwritten when adding nodes?

Whether the original instance's user data is overwritten depends on the addition method.

Auto mode: ACK initializes the system disk, overwriting the instance's original user data with the user data configured for the node pool.
Manual mode: The user data of the original instance is not overwritten. After the instance is added to the node pool, it continues to use its original user data.

How do I fix node addition timeouts?

Check the network connectivity between the node and the API Server. First, verify that the security group meets the requirements. For information about security group limitations when adding an existing node, see Limitations. For information about other network connectivity issues, see Network management FAQ.
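
A quick way to test the connectivity described above is to attempt a TCP connection from the instance to the API Server internal endpoint on port 6443. The following is a minimal sketch that uses only the Python standard library. `can_reach` is a hypothetical helper name, and the host you pass in is the internal endpoint shown on the Basic Information tab of the Cluster Information page.

```python
import socket


def can_reach(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A `False` result points to a security group, whitelist, or routing problem between the node and the control plane. A `True` result confirms only that the TCP handshake succeeds. It does not validate TLS or Kubernetes authentication.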

Does adding a node change the expected count?

Yes. After you add an existing node, the Expected Nodes count increases by the number of nodes added. For example, if the Expected Nodes for a node pool is set to 5 and you add one ECS instance to the node pool, the count automatically becomes 6.

References

In addition to the console, you can add ECS instances to an ACK cluster by calling an API operation (Obtain the script for adding existing nodes to a node pool) or running a CLI command (Add existing ECS instances).
Older clusters created before the node pool feature was released may contain free nodes (nodes that do not belong to any node pool). For centralized management, you can migrate them to a node pool. For more information, see Migrate free nodes to a node pool.
If a node, pod, or another component is not working as expected, you can troubleshoot the issue. For more information, see Troubleshoot node exceptions, Troubleshoot pod exceptions, and FAQ about nodes and node pools.