Update a node pool for an ACK cluster - Container Service for Kubernetes

This topic describes how to update a node pool without affecting the data disks. The system runs a precheck before updating a node pool. The precheck notifies you of the risks that may impact the update. You can update the kubelet, operating system, and container runtime in any combination based on your business requirements.

Usage notes

To use the node pool update feature, make sure that the node pool contains at least one node.
When the system updates a node pool by replacing system disks, the nodes in the node pool are drained. The system performs pod eviction based on the specified PodDisruptionBudget (PDB) during node draining and the graceful shutdown period of pods is 30 minutes. If the pods on a node are not completely evicted within 30 minutes, the node is still updated. For more information, see PodDisruptionBudgets.

Feature overview

The node pool update feature allows you to update the following components in any combination.

Component	Description	Update procedure	Consideration
kubelet	When the control plane uses a new kubelet version, you can update the kubelet on all nodes in a node pool to the new version. The new kubelet version that a node pool can use depends on the kubelet version used by the control plane.	If you select only Kubelet Update, an in-place update is performed to update the kubelet. If you also select Change Operating System, the update method that is used to change the operating system is applied. If you select Upgrade Node Pool by Replacing System Disk, the kubelet is updated by replacing the system disks of nodes.	For more information about the update notes for the kubelet, see Manually update ACK clusters.
Operating system	You can perform the following operations: Update the operating system. Change the operating system. Note The supported operating systems are displayed in the Available Version parameter on the Node Pool Upgrade page of the Container Service for Kubernetes (ACK) console.	You can select Change Operating System and update the nodes in a node pool by replacing system disks. You can select Create Snapshot before Update to create snapshots for the nodes in the node pool.	To update or change the operating system of the nodes in a node pool, the system replaces the system disks of the nodes in the node pool in batches. Therefore, do not store important data on the system disks, or back up that data before you update the node pool. The data disks are not affected by the update.
Container runtime	If a new container runtime version is available, you can update the container runtime of the nodes in a node pool to the new version.	Select Runtime Update. If you change the container runtime from Docker to containerd, the change is applied by replacing the system disks of the nodes in the node pool. For more information, see Change the container runtime from Docker to containerd. If you choose to update the containerd version, an in-place update is performed on the node pool by default. The `/etc/containerd/config.toml` file on each node is replaced with the new version provided by ACK. You can select Upgrade Node Pool by Replacing System Disk to perform the update by replacing system disks. You can select Create Snapshot before Update to create snapshots for the nodes in the node pool.	If the runtime update is performed by replacing system disks, do not store important data on the system disks, or back up that data before you update the node pool. The data disks are not affected by the update. During the runtime update, pod probes and lifecycle hooks may fail to run, and pods may perform in-place restarts.

Note

When the system replaces the system disks of the nodes in a node pool, the system uses the node pool configuration to render the node component parameters. This ensures that the configurations of the node components and node pool are the same.

Procedure

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Nodes > Node Pools.
On the Node Pools page, find the node pool that you want to update and choose More > Upgrade in the Actions column.
Select the update items (node OS, container runtime, and the kubelet), configure the parameters, and then click Precheck. After the precheck is completed, click Start Update.
The following content describes the Batch Update Policy:
- Update methods
  - Specify whether to update a node pool by replacing system disks. If you select this update method, do not store important data on the system disks, or back up that data before you update the node pool.
  - Specify whether to create a snapshot before the update. Snapshots can be used to back up and restore data. For more information about snapshots and snapshot billing, see Overview and Snapshots.
- Maximum Number of Nodes per Batch: You can specify the maximum number of nodes that can be concurrently updated in a batch. Maximum value: 10. For more information, see In-place updates and updates by replacing system disks.
- Automatic Pause Policy: Specify whether to pause the update or set the frequency of pauses. You can select Automatic to monitor the health status of each batch.
- Interval Between Batches: If Automatic Pause Policy is not configured, you can choose whether to set an interval between batches. The interval between batches can be set to 5 to 120 minutes.
Note
If the cluster fails to pass the precheck or the precheck result contains warnings, refer to Suggestions on how to fix cluster issues to troubleshoot the issues or click View Details to view detailed information on the Report page.

What to do next

During the update process, you can perform the following operations in the Event Rotation section:

Pause or resume the update: If you want to pause or resume the update, click Pause or Continue.
- Do not perform operations after you pause the node pool update. In addition, we recommend that you resume and complete the update at your earliest convenience. If the update is paused for more than 7 days, the system automatically terminates the update process. The events and log data that are generated during the update process are also deleted.
- After you click Pause, a kubelet or container runtime that is already updated cannot be rolled back. Only the OS image can be rolled back, provided that the node pool supports the original OS image.
Cancel the update: If you want to cancel the update, click Cancel.
After you click Cancel, the update is canceled. However, canceling does not roll back a kubelet or container runtime that is already updated. Only the OS image can be rolled back, provided that the node pool supports the original OS image.

In-place updates and updates by replacing system disks

Procedure

The following section describes the procedure for in-place updates and updates by replacing system disks. You can specify the maximum number of nodes that can be concurrently updated in a batch. The maximum concurrency supported is 10. The number of nodes to be updated per batch increases batch by batch in the following sequence: 1, 2, 4, 8... After the maximum concurrency is reached, the maximum number of nodes to be updated in each batch is equal to the maximum concurrency. If you set the maximum concurrency to 4, one node is updated in the first batch, two nodes are concurrently updated in the second batch, and four nodes are concurrently updated in the third batch and subsequent batches.

The following figure shows the batch update process when the maximum concurrency is N. The number of nodes to be updated per batch increases batch by batch in the following sequence: 1, 2, 4, 8, ..., N
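The ramp-up described above can be sketched in a few lines of Python. This is only an illustration of the documented sequence, not ACK's implementation: the function name `batch_sizes` is hypothetical, and the handling of the final partial batch and of a maximum concurrency that is not a power of two (for example, 8 followed by 10) is an assumption.

```python
def batch_sizes(total_nodes: int, max_concurrency: int) -> list[int]:
    """Return the number of nodes updated in each batch (illustrative sketch)."""
    if total_nodes < 0:
        raise ValueError("total_nodes must not be negative")
    # The documented maximum value for nodes per batch is 10.
    if not 1 <= max_concurrency <= 10:
        raise ValueError("max_concurrency must be between 1 and 10")

    sizes = []
    remaining = total_nodes
    size = 1  # the first batch always updates a single node
    while remaining > 0:
        # Assumption: the last batch simply takes whatever nodes are left.
        current = min(size, remaining)
        sizes.append(current)
        remaining -= current
        # Double the batch size until the maximum concurrency is reached.
        size = min(size * 2, max_concurrency)
    return sizes


if __name__ == "__main__":
    print(batch_sizes(10, 4))  # [1, 2, 4, 3]
```

With a maximum concurrency of 4, the first three batches match the example in the text (1, 2, and 4 nodes); later batches stay at 4 until the remaining nodes run out.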

How an in-place update is performed on a node

Perform the precheck before the update. If the container has critical issues such as ttrpc request processing failures or container processes not responding to signals, the update is suspended.
Save the current container and pod status to the tmp temporary directory.
Update containerd, crictl, and related configuration files to the latest versions provided by ACK, and restart containerd (this operation does not affect running containers). If you have modified the /etc/containerd/config.toml file on the node, this update will overwrite your changes.
Ensure that the kubelet is running as normal and the node is ready.

How a node is updated by replacing the system disk

The node is drained. When the node is drained, it is set to unschedulable.
The Elastic Compute Service (ECS) instance is stopped.
The system disk is replaced and the disk ID is changed. The category of the system disk, the IP addresses of the ECS instance, and the MAC addresses of the elastic network interfaces (ENIs) that are bound to the ECS instance remain unchanged.
The node is re-initialized.
The node is restarted and ready. When the node is ready, it is set to schedulable.

FAQ

How do I restore data from snapshots?

You can create a snapshot for the nodes in a node pool when you update the node pool. By default, the snapshot is retained for 30 days. You can manually delete the snapshot before its retention period ends. If the data is lost after you update a node pool, you can use the following methods to restore the data:

If an in-place update is performed to update only the kubelet, you can use a snapshot to roll back the disk. For more information, see Roll back a disk by using a snapshot.
If the operating system or container runtime is updated by replacing the system disks of the nodes in the node pool, you can create a disk from the snapshot. For more information, see Create a disk from a snapshot.

How long does it take to update the nodes in a batch?

In-place update: within 5 minutes.

Update by replacing the system disk: within 8 minutes if snapshots are not created. If you select Create Snapshot before Update, ACK starts to update the nodes after the snapshots are created. The timeout period of snapshot creation is 40 minutes. If snapshot creation times out, the node update fails to start. If no business data is stored in the system disks, we recommend that you clear Create Snapshot before Update.
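As a rough planning aid, the per-batch figures above can be combined with the batch ramp-up (1, 2, 4, 8, ... up to the maximum concurrency) to estimate an upper bound for a whole node pool. The sketch below rests on several assumptions: batches run strictly one after another, each batch takes the full 5 or 8 minutes, the optional interval is added between consecutive batches, and snapshot creation time is excluded. The function names are hypothetical.

```python
IN_PLACE_MINUTES = 5      # documented upper bound per batch, in-place update
REPLACE_DISK_MINUTES = 8  # documented upper bound per batch, no snapshots


def count_batches(total_nodes: int, max_concurrency: int) -> int:
    """Count batches for the 1, 2, 4, 8, ... ramp-up capped at max_concurrency."""
    batches, size, remaining = 0, 1, total_nodes
    while remaining > 0:
        remaining -= size
        size = min(size * 2, max_concurrency)
        batches += 1
    return batches


def estimate_upper_bound_minutes(
    total_nodes: int,
    max_concurrency: int,
    replace_system_disk: bool,
    interval_minutes: int = 0,
) -> int:
    """Estimate the worst-case duration in minutes (snapshots not included)."""
    batches = count_batches(total_nodes, max_concurrency)
    if batches == 0:
        return 0
    per_batch = REPLACE_DISK_MINUTES if replace_system_disk else IN_PLACE_MINUTES
    # Assumption: the interval applies only between batches, not after the last.
    return batches * per_batch + (batches - 1) * interval_minutes


if __name__ == "__main__":
    # 10 nodes, maximum concurrency 4 -> batches of 1, 2, 4, 3
    print(estimate_upper_bound_minutes(10, 4, replace_system_disk=True))
```

If Create Snapshot before Update is selected, the real duration can be considerably longer, because each node waits for its snapshot before the update starts.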

Are applications affected during the update?

In-place update: Pods are not restarted and applications are not affected.

Update by replacing the system disk: Nodes are drained during the update. If an application runs in multiple pods that are spread across multiple nodes and graceful shutdown is enabled for the pods, the application is not affected. For more information about graceful shutdown, see Graceful shutdown and zero downtime deployments in Kubernetes. To ensure that all the nodes that run the pods of an application are not updated in the same batch, we recommend that you set the maximum concurrency to a value less than the number of pods in which the application runs.
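The recommendation above can be turned into a small helper that picks a maximum concurrency from the pod counts of your applications. This is a minimal sketch of the rule of thumb only: the function name and the `pod_counts` mapping are hypothetical, and it assumes that each application's pods are spread across different nodes.

```python
MAX_ALLOWED_CONCURRENCY = 10  # documented maximum number of nodes per batch


def recommend_max_concurrency(pod_counts: dict[str, int]) -> int:
    """Pick a maximum concurrency below the smallest application pod count."""
    if not pod_counts:
        raise ValueError("pod_counts must contain at least one application")
    smallest = min(pod_counts.values())
    if smallest < 2:
        # A single-pod application is interrupted whenever its node is drained,
        # so no concurrency setting can satisfy the recommendation.
        raise ValueError("every application needs at least 2 pods")
    return min(MAX_ALLOWED_CONCURRENCY, smallest - 1)


if __name__ == "__main__":
    print(recommend_max_concurrency({"web": 3, "api": 5}))  # 2
```

The helper looks only at pod counts. PodDisruptionBudgets and graceful shutdown settings still determine how evictions actually proceed during draining.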

Can I roll back a node pool after I update the node pool?

You cannot roll back the kubelet and container runtime after you update them. You can roll back only the OS image after you update it, provided that the original OS image that you want to use is supported by the node pool.

Does data loss occur when a node is updated?

If the update is performed by replacing system disks, do not store important data on the system disks, or back up that data before you update the node pool. The data disks are not affected by the update.

Does the IP address of a node change after the system disk of the node is replaced?

After the system disk is replaced, the disk ID is changed but the category of the system disk, the IP addresses of the ECS instance, and the MAC addresses of the ENIs that are bound to the ECS instance remain unchanged. For more information, see Replace the operating system of an instance.

How do I update free nodes?

Nodes that are not added to node pools are called free nodes. Free nodes exist in clusters that were created before the node pool feature was released. To update a free node, add the node to a node pool and then update the node pool. For more information about how to add a free node to a node pool, see Add free nodes to a node pool.

What do I do if the Docker directory still exists and occupies disk space after I change the container runtime of a node from Docker to containerd?

In addition to cluster-related containers, images, and logs, the file paths that you created are also included in the Docker directory. If you no longer need the data in the Docker directory, you can manually delete the directory.
