Add a purchased Elastic Compute Service (ECS) instance as a worker node to an ACK cluster, or re-add a previously removed worker node to a node pool. ACK supports two modes: Auto and Manual. After the node joins, its billing method and instance type remain unchanged.
Prerequisites
Before you begin, make sure that:
- The node quota in your cluster is sufficient. To increase the quota, submit a request in the Quota Center console.
- The ECS instances belong to the same Alibaba Cloud account as the cluster and are in the same region and Virtual Private Cloud (VPC).
- The ECS instances do not belong to another cluster. If they do, remove them from that cluster first. See Remove a node from a cluster or node pool.
- The Scaling Mode of the node pool is not set to Auto. You cannot add existing instances to node pools that have auto scaling enabled.
- (Recommended) Back up your data before proceeding. See Create a snapshot.
Limitations
Instance type limits
GPU-accelerated ECS Bare Metal instances of the ebmgn7 and ebmgn7e families do not automatically clear their Multi-Instance GPU (MIG) configuration. When you add such an instance, ACK resets its MIG configuration. The reset can take a long time and may cause the add operation to fail. For troubleshooting steps, see What do I do if I fail to add ECS Bare Metal instances that are equipped with NVIDIA A100 GPUs?
Network limits
Terway
Terway restricts which instance types you can add, based on the maximum number of pods the instance supports:
- Shared elastic network interface (ENI) mode: instance types with a maximum pod count of 10 or fewer cannot be added.
- Exclusive ENI mode: instance types with a maximum pod count of 5 or fewer cannot be added.
The maximum pod count depends on how many ENIs the instance type supports. For calculation details and how to query the limit for a specific instance type, see Work with Terway. To resolve a limit violation, upgrade the instance type or create a new instance that meets the requirement.
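The two thresholds above can be expressed as a small pre-check. The following is a minimal Python sketch. It assumes you have already looked up the instance type's maximum pod count; it does not calculate that value, which depends on the ENI limits described in Work with Terway. The function name and the mode labels are hypothetical.

```python
# Minimal sketch of the Terway instance-type limits described above.
# Assumption: max_pods is the instance type's maximum pod count, looked up
# separately; this code does not derive it from ENI counts.

# An instance type whose maximum pod count is at or below this value
# cannot be added in the given Terway mode.
REJECT_AT_OR_BELOW = {
    "shared": 10,     # shared ENI mode
    "exclusive": 5,   # exclusive ENI mode
}


def can_add_to_terway(max_pods: int, eni_mode: str) -> bool:
    """Return True if the instance type's max pod count passes the limit."""
    if eni_mode not in REJECT_AT_OR_BELOW:
        raise ValueError(f"unknown Terway ENI mode: {eni_mode!r}")
    return max_pods > REJECT_AT_OR_BELOW[eni_mode]


if __name__ == "__main__":
    # Hypothetical max pod counts, for illustration only.
    for pods, mode in [(10, "shared"), (11, "shared"), (5, "exclusive"), (6, "exclusive")]:
        print(f"{mode:9} max_pods={pods:2} -> {can_add_to_terway(pods, mode)}")
```

If the check fails, the resolution is the one given above: upgrade the instance type or create a new instance that meets the requirement.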
Additional Terway requirements:
- When adding a node in a new zone, update the vSwitch configuration of Terway. Otherwise, pods on that node get IP addresses from the primary ENI's vSwitch rather than the zone's vSwitch. See Modify pod vSwitches.
- The node being added must have only its primary ENI attached, with no secondary ENIs. When the node joins a node pool, the attached ENI is retained and its vSwitch is used to assign pod IP addresses. If pods end up with IP addresses outside that vSwitch's CIDR block, remove the node, delete all secondary ENIs, and then re-add it.
- The cluster RAM role must be assigned to the ECS instance. Without it, the system cannot correctly calculate the maximum number of pods (MaxPod) and ENIs for the node. See Use RBAC to manage the operation permissions on resources in a cluster.
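To detect the symptom described in the second item, compare the pod IP addresses on the node against the vSwitch CIDR block. The following is a minimal sketch. It assumes you have already collected the pod IPs, for example with kubectl get pods -o wide --all-namespaces --field-selector spec.nodeName=<node-name>, and looked up the vSwitch CIDR block in the VPC console. The function name and sample addresses are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def pods_outside_vswitch(pod_ips: Iterable[str], vswitch_cidr: str) -> List[str]:
    """Return the pod IPs that are not inside the vSwitch CIDR block."""
    network = ipaddress.ip_network(vswitch_cidr)
    # The membership test returns False (no error) if the IP versions differ.
    return [ip for ip in pod_ips if ipaddress.ip_address(ip) not in network]


if __name__ == "__main__":
    # Hypothetical vSwitch CIDR block and pod IPs.
    print(pods_outside_vswitch(["192.168.1.10", "192.168.1.77", "10.0.5.3"], "192.168.1.0/24"))
    # prints ['10.0.5.3']
```

A non-empty result means the node should be removed, stripped of its secondary ENIs, and re-added, as described above.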
Flannel
The number of custom route entries in the VPC's system route table must not exceed the quota. To increase this quota, submit a request in the Quota Center console. For background on route tables, see Route tables.
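The following is a rough capacity check to run before you add nodes to a Flannel cluster. The sketch assumes that each Flannel node consumes one custom route entry, for its pod CIDR block, in the VPC system route table. Verify this assumption against Route tables for your cluster. You read the current entry count and the quota from the VPC and Quota Center consoles. The function name and example numbers are hypothetical.

```python
def route_quota_ok(current_custom_entries: int, quota: int, nodes_to_add: int) -> bool:
    """Return True if adding nodes keeps custom route entries within the quota.

    Assumption: each new Flannel node adds exactly one custom route entry.
    """
    if min(current_custom_entries, quota, nodes_to_add) < 0:
        raise ValueError("counts must be non-negative")
    return current_custom_entries + nodes_to_add <= quota


if __name__ == "__main__":
    # Hypothetical values: 195 entries in use, quota of 200.
    print(route_quota_ok(195, 200, 5))  # True
    print(route_quota_ok(195, 200, 6))  # False
```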
IPv4/IPv6 dual-stack
Assign an IPv6 address to the primary ENI of the ECS instance before adding it. See IPv6 communication.
Security group limits
| Limit | Description | Resolution |
|---|---|---|
| Security group type | An ECS instance's security group type must match the node pool's type. A basic security group and an advanced security group cannot coexist on the same instance. Check the node pool's security group on the Overview tab of the node pool details page. | Replace the instance's security group with the node pool's security group (see Associate security groups with an instance), or remove the instance from the conflicting group before adding it. |
| Security group rules | The instance's security group rules must not conflict with those of the node pool or cluster. | Add the node to the node pool's security group first, then use the security group rule check feature to verify each rule. |
| Security group count | By default, an instance can belong to at most 5 security groups. Adding the instance to a node pool also adds it to the node pool's security group, which counts toward this limit. | See Security group limits for how to increase this quota. |
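You can pre-check the type and count limits in the table using data from the ECS console. The following is a minimal sketch. The SecurityGroup class, the "basic" and "advanced" labels, and the function name are hypothetical. The sketch does not model rule conflicts; use the security group rule check feature for those.

```python
from dataclasses import dataclass
from typing import List, Sequence

DEFAULT_MAX_GROUPS_PER_INSTANCE = 5  # default limit from the table above


@dataclass(frozen=True)
class SecurityGroup:
    group_id: str
    group_type: str  # "basic" or "advanced" (labels used only in this sketch)


def security_group_problems(
    instance_groups: Sequence[SecurityGroup],
    node_pool_group: SecurityGroup,
    max_groups: int = DEFAULT_MAX_GROUPS_PER_INSTANCE,
) -> List[str]:
    """Return human-readable problems; an empty list means these checks pass.

    Only the type and count limits are modeled; rule conflicts are not.
    """
    problems = []
    for sg in instance_groups:
        if sg.group_type != node_pool_group.group_type:
            problems.append(
                f"{sg.group_id} is {sg.group_type}, node pool group is {node_pool_group.group_type}"
            )
    # Joining the node pool adds its group unless the instance is already in it.
    resulting_ids = {sg.group_id for sg in instance_groups} | {node_pool_group.group_id}
    if len(resulting_ids) > max_groups:
        problems.append(
            f"instance would belong to {len(resulting_ids)} security groups (limit {max_groups})"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical security group IDs: five existing groups plus the pool's group.
    pool = SecurityGroup("sg-pool", "basic")
    current = [SecurityGroup(f"sg-{i}", "basic") for i in range(5)]
    print(security_group_problems(current, pool))
```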
Choose a mode
| | Auto mode | Manual mode |
|---|---|---|
| What happens | ACK resets the OS to match the node pool's OS, releases the system disk (data disk is retained with a new disk ID), and registers the node automatically. | The OS is preserved as-is. You obtain an installation script and run it on the instance yourself. |
| OS change | OS is reset to the node pool's OS | OS is retained |
| System disk | Released; automatic snapshot retention depends on the Delete Automatic Snapshots While Releasing Disk setting | Not affected |
| When to use | You want a clean, consistent node configuration | You need to retain the existing OS, or services are already deployed on the instance |
| Limitation | None | Only one instance per operation |
Adding nodes does not affect existing nodes or applications in the cluster. If services are already deployed on an instance, do not add it in Auto mode. Auto mode resets the OS and releases the system disk, so those services would be lost.
Additional limits for Manual mode:
- Instances running ContainerOS cannot be added. See ContainerOS.
- Swap must be disabled on the instance's OS.
- To store container data and images on a data disk, the disk must use an ext or XFS file system.
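You can check the last two requirements on the instance itself before you run the installation command. The following is a minimal Python sketch for Linux that reads the standard /proc/swaps and /proc/mounts files. /mnt/data is a hypothetical mount point for your data disk, and the function names are hypothetical. The sketch does not check the ContainerOS requirement.

```python
from pathlib import Path
from typing import Optional


def swap_is_enabled(proc_swaps: str) -> bool:
    """Return True if /proc/swaps lists at least one active swap area.

    The first non-empty line of /proc/swaps is a column header; each
    further non-empty line is an active swap device or file.
    """
    lines = [line for line in proc_swaps.splitlines() if line.strip()]
    return len(lines) > 1


def mounted_fs_type(mount_point: str, proc_mounts: str) -> Optional[str]:
    """Return the file system type mounted at mount_point, or None.

    /proc/mounts fields: device, mount point, fs type, options, dump, pass.
    If the path is mounted more than once, the last entry is the visible one.
    """
    fs_type = None
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]
    return fs_type


def fs_is_supported(fs_type: Optional[str]) -> bool:
    """ext* (ext2/ext3/ext4) or xfs satisfies the Manual-mode requirement."""
    return fs_type is not None and (fs_type == "xfs" or fs_type.startswith("ext"))


if __name__ == "__main__":
    swaps, mounts = Path("/proc/swaps"), Path("/proc/mounts")
    if swaps.exists() and mounts.exists():  # Linux only
        print("swap enabled:", swap_is_enabled(swaps.read_text()))
        fs = mounted_fs_type("/mnt/data", mounts.read_text())  # hypothetical mount point
        print("data disk fs:", fs, "supported:", fs_is_supported(fs))
```

If swap is enabled, swapoff -a disables it until the next reboot. To keep it disabled after a reboot, also remove the swap entries from /etc/fstab.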
Navigate to the node pool
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the cluster name. In the left navigation pane, choose Nodes > Node Pools.
- Find the target node pool. In the Actions column, click the more actions icon and select Add Existing Node.
Add nodes manually
Only one ECS instance can be added per manual operation.
- On the Select Existing ECS Instance page, set Mode to Manual, select the instance, and click Next Step.
- On the Specify Instance Information page, configure the following parameters and click Next Step.
  - Cluster ID/Name: Automatically populated. Identifies the target cluster.
  - Data Disk: Specifies whether to use a data disk for container and image data. Important: Formatting a disk erases all data on it. Back up the disk before proceeding.
    - If a data disk is attached and the file system of the last data disk is not initialized, the system formats that disk to ext4 and uses it for /var/lib/docker and /var/lib/kubelet.
    - If no data disk is attached, the system does not provision one.
  - Retain Instance Name: Enabled by default. Disable it if you want nodes renamed according to the node naming rules.
  - Instance Information: The IDs and names of the selected instances.
- On the Complete page, copy the installation command and click Done.
- Log on to the ECS console. In the left navigation pane, choose Instances & Images > Instances. Select the region and find the target instance.
- In the Actions column, click Connect and select a connection method. See Methods for connecting to an ECS instance.
- Run the installation command you copied. The script registers the instance as a worker node and adds it to the cluster.
- To verify, go back to the Node Pools page. Click Details in the Actions column of the node pool, then check the Nodes tab to confirm that the node appears.
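You can also confirm from the command line that the new node is registered and Ready. The following is a minimal sketch that parses the JSON printed by kubectl get node <node-name> -o json. The function name is hypothetical. The Ready condition it reads is part of the standard Kubernetes Node status.

```python
import json


def node_is_ready(node_json: str) -> bool:
    """Return True if the node's Ready condition has status "True".

    node_json is the output of: kubectl get node <node-name> -o json
    """
    node = json.loads(node_json)
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False  # no Ready condition reported yet
```

For example, pass it the stdout of subprocess.run(["kubectl", "get", "node", "<node-name>", "-o", "json"], capture_output=True, text=True, check=True). A newly registered node can report NotReady briefly while its components start, so check again before you start troubleshooting.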
Add nodes automatically
All ECS instances in the current account that meet the requirements are listed for selection.
- On the Select Existing ECS Instance page, set Mode to Auto, select the instances, and click Next Step. If the list is empty, none of your instances meet the requirements. Select Show Unavailable Instances to see why each instance is ineligible. If the list is still empty, check that the instances are in the same region and VPC as the cluster, and review the Prerequisites and Limitations sections.
- On the Specify Instance Information page, configure the same parameters as in the manual flow (Cluster ID/Name, Data Disk, Retain Instance Name, Instance Information) and click Next Step.
- In the Confirm dialog, review the precautions and click Confirm.
- To verify, go to the Node Pools page. Click Details in the Actions column of the node pool, then check the Nodes tab to confirm that the nodes appear.
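When you add several instances in Auto mode, a script can report which of them are not yet Ready in the node pool. The following is a minimal sketch that parses the output of kubectl get nodes -o json, and it rests on two assumptions. First, it assumes ACK marks node pool membership with the alibabacloud.com/nodepool-id label; confirm this by checking the labels on one of your nodes with kubectl get node <node-name> --show-labels. Second, it assumes you know which node names to expect; these depend on the Retain Instance Name setting and the node naming rules. The function name and sample values are hypothetical.

```python
import json
from typing import Iterable, List

# Assumption: ACK labels nodes with their node pool ID under this key.
NODEPOOL_LABEL = "alibabacloud.com/nodepool-id"


def nodes_not_ready_in_pool(nodes_json: str, nodepool_id: str, expected: Iterable[str]) -> List[str]:
    """Return expected node names that are absent from the pool or not Ready.

    nodes_json is the output of: kubectl get nodes -o json
    """
    ready = set()
    for node in json.loads(nodes_json).get("items", []):
        metadata = node.get("metadata", {})
        if metadata.get("labels", {}).get(NODEPOOL_LABEL) != nodepool_id:
            continue  # node belongs to another pool or has no pool label
        for condition in node.get("status", {}).get("conditions", []):
            if condition.get("type") == "Ready" and condition.get("status") == "True":
                ready.add(metadata.get("name"))
    return sorted(set(expected) - ready)
```

An empty result means every expected node appears in the node pool and reports Ready.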
Usage notes
- Deleting a cluster or node pool does not release the ECS instances added to it. Release them manually to avoid unnecessary charges. See Remove a node from a cluster or node pool and Billing overview.
- When ACK resets the system disk in Auto mode, user snapshots of that disk are retained. Automatic snapshots are retained or deleted based on the Delete Automatic Snapshots While Releasing Disk setting of the system disk, which you can change in the ECS console. See Enable the Delete Automatic Snapshots While Releasing Disk attribute.
- Delete snapshots you no longer need so that enough snapshot quota remains for automatic snapshot policies.
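The following sketch summarizes the snapshot retention rule in the second item. It models only the documented behavior and does not call any ECS API. The (snapshot_id, kind) representation, the function name, and the snapshot IDs are hypothetical.

```python
from typing import Iterable, List, Tuple

VALID_KINDS = {"user", "auto"}


def snapshots_kept_after_release(
    snapshots: Iterable[Tuple[str, str]], delete_auto_snapshots_on_release: bool
) -> List[str]:
    """Return IDs of system-disk snapshots that survive the disk release.

    snapshots: (snapshot_id, kind) pairs, where kind is "user" or "auto".
    User snapshots are always kept; automatic snapshots are kept only if the
    Delete Automatic Snapshots While Releasing Disk setting is off.
    """
    kept = []
    for snapshot_id, kind in snapshots:
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown snapshot kind: {kind!r}")
        if kind == "user" or not delete_auto_snapshots_on_release:
            kept.append(snapshot_id)
    return kept
```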
FAQ
Does adding or removing nodes affect the expected node count of a node pool?
Yes. Each time you add an existing node, the Expected Nodes count for the node pool increases by the number of nodes added. For example, if Expected Nodes is 5 and you add one instance, it becomes 6.
What do I do if a timeout error occurs after adding a node?
Check network connectivity between the node and the Classic Load Balancer (CLB) instance of the API server. Also verify that the security group rules meet the requirements described in Security group limits. For other network connectivity issues, see FAQ about network management.
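To test connectivity from the node, open a plain TCP connection to the API server endpoint. The following is a minimal sketch. It assumes the API server listens on port 6443, the Kubernetes default; confirm the address and port in the server field of your kubeconfig. The function name is hypothetical. A successful connection shows only that the port is reachable. It does not show that TLS or authentication will succeed.

```python
import socket


def can_reach(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return False


if __name__ == "__main__":
    import sys

    # Usage: python3 check_apiserver.py <api-server-address> [port]
    if len(sys.argv) >= 2:
        port = int(sys.argv[2]) if len(sys.argv) >= 3 else 6443
        print(can_reach(sys.argv[1], port))
```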
Can I add nodes of different instance types to the same node pool?
Yes. Mixing instance types in a node pool reduces the risk of scale-out failures when a specific type is unavailable or out of stock. To do this:
- Create or update the node pool to include the desired instance types. See Create and manage a node pool.
- Drain and remove the existing nodes without releasing the ECS instances. See Remove a node from a cluster or node pool.
- Re-add the instances following the steps in this topic.
How do I move a node from one ACK cluster to another?
Nodes cannot be moved directly between clusters. To transfer a node:
- Drain and remove the node from the source cluster without releasing the ECS instance. See Remove a node from a cluster or node pool.
- Add the ECS instance to the target cluster following the steps in this topic.
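You can drain the node from the command line with kubectl drain before you remove it in the console. The following is a minimal sketch that only builds the command. It assumes kubectl is configured for the source cluster. The function name and node name are hypothetical. After you review the command, run it with subprocess.run(cmd, check=True).

```python
import shlex
from typing import List


def drain_command(node_name: str) -> List[str]:
    """Build a kubectl drain command for a node you plan to remove.

    --ignore-daemonsets: proceed even though DaemonSet-managed pods are
      present (they are not evicted).
    --delete-emptydir-data: allow evicting pods that use emptyDir volumes
      (their emptyDir data is lost). Older kubectl releases use
      --delete-local-data instead.
    """
    return ["kubectl", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"]


if __name__ == "__main__":
    # Hypothetical node name; use the name shown by `kubectl get nodes`.
    print(shlex.join(drain_command("cn-hangzhou.192.168.0.10")))
```

kubectl drain cordons the node first, so no new pods are scheduled on it while it is being emptied.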
Does upgrading or downgrading an ECS instance affect the cluster?
ECS instance upgrades or downgrades can change the instance type, bandwidth, and billing method. The impact depends on whether a restart is required:
- No restart required: Assess the impact based on your workload before proceeding.
- Restart required (for example, instance type changes): The node becomes unavailable during the restart. Before proceeding, determine whether additional nodes are needed to host any evicted pods, drain the node, and remove it from the cluster. See Remove a node from a cluster or node pool and Upgrade or downgrade worker node resources. After the change is complete, re-add the node following the steps in this topic.
For an overview of instance configuration change options, see Overview of instance configuration changes.
What's next
- Add nodes using the API or CLI instead of the console: Manually add an existing instance to a specified node pool and Add existing ECS instances.
- Move nodes that don't belong to any node pool (free nodes in older clusters) into a node pool: Add free nodes to a node pool.
- Troubleshoot node or pod issues: Troubleshoot node exceptions, Pod troubleshooting, and FAQ about nodes and node pools.