This topic provides answers to some frequently asked questions about node management.
- How do I manually update the kernel version of GPU-accelerated nodes in a cluster?
- What do I do if no container is launched on a GPU-accelerated node?
- How do I resolve the issues that occur when I add nodes to a cluster?
- How do I fix the "drain-node job execute timeout" error that occurs when I remove a node?
- How do I change the hostname of a worker node in an ACK cluster?
- How do I change the operating system for a node pool?
- What are the differences between node pools that are configured with the Expected Nodes parameter and those that are not configured with this parameter?
How do I manually update the kernel version of GPU-accelerated nodes in a cluster?
3.10.0-957.21.3.
Confirm the kernel version to which you want to update. Proceed with caution when you perform the update.
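As a rough aid for confirming versions, the following sketch compares a node's kernel release string with a target version. The function names `version_lt`, `kernel_base_version`, and `report_kernel` are hypothetical helpers and are not part of ACK. The sketch assumes GNU `sort` with the `-V` option and a CentOS-style release string such as `3.10.0-957.21.3.el7.x86_64`. It only reports versions and does not update anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are not ACK tooling.

# Succeeds (exit status 0) if version $1 is strictly older than version $2.
# Assumes GNU sort, whose -V option sorts version strings numerically.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strips a distribution suffix such as ".el7.x86_64" from a kernel release string.
kernel_base_version() {
  printf '%s\n' "$1" | sed -E 's/\.el[0-9]+.*$//'
}

# Prints whether the kernel release in $1 is older than the target version in $2.
report_kernel() {
  local current target
  current="$(kernel_base_version "$1")"
  target="$2"
  if version_lt "$current" "$target"; then
    echo "kernel $current is older than $target"
  else
    echo "kernel $current is not older than $target"
  fi
}
```

On a node, a call such as `report_kernel "$(uname -r)" 3.10.0-957.21.3` prints the comparison result. The version string in that call is only a sample value taken from this topic.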
The following procedure shows how to update the NVIDIA driver. Details about how to update the kernel version are not shown.
What do I do if no container is launched on a GPU-accelerated node?
service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
service docker stop
Redirecting to /bin/systemctl stop docker.service
service docker start
Redirecting to /bin/systemctl start docker.service
service kubelet start
Redirecting to /bin/systemctl start kubelet.service
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker info | grep -i cgroup
Cgroup Driver: cgroupfs
The output shows that the cgroup driver is set to cgroupfs. To resolve the issue, perform the following steps:
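The cgroup driver check shown above can also be scripted. The following sketch only performs the diagnosis and changes nothing on the node. The function name `cgroup_driver_from_info` is a hypothetical helper, and the sketch assumes that `docker info` prints a line of the form `Cgroup Driver: <name>`, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.
# Reads `docker info` text on stdin and prints the cgroup driver name.
# Prints nothing if no "Cgroup Driver" line is found.
cgroup_driver_from_info() {
  awk -F': *' 'tolower($1) ~ /cgroup driver/ { print $2; exit }'
}
```

On a node, `docker info 2>/dev/null | cgroup_driver_from_info` prints `cgroupfs` for the situation described above.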
How do I change the hostname of a worker node in an ACK cluster?
How do I change the operating system for a node pool?
Update the operating system
To update the operating system of a node pool, for example to the latest CentOS version, modify the node pool in the ACK console. After you modify the node pool, only nodes that are newly added to the node pool use the new operating system version. To update the operating system of the existing nodes in the node pool, you must first add intermediate nodes to the node pool.
Change the type of operating system
To change the type of operating system for a node pool, for example from CentOS to Alibaba Cloud Linux, you must create a new node pool, because the type of operating system used by the nodes in an existing node pool cannot be changed.
What are the differences between node pools that are configured with the Expected Nodes parameter and those that are not configured with this parameter?
The Expected Nodes parameter specifies the number of nodes that you want to keep in a node pool. You can change the value of this parameter to adjust the number of nodes in the node pool. This feature is disabled for existing node pools that are not configured with the Expected Nodes parameter.
Node pools that are configured with the Expected Nodes parameter respond differently from node pools that are not configured with this parameter to operations such as removing nodes and releasing ECS instances. The following table describes the differences.
Operation | Node pool that is configured with the Expected Nodes parameter | Node pool that is not configured with the Expected Nodes parameter | Suggestion |
---|---|---|---|
Remove specified nodes in the ACK console or by calling the ACK API. | The value of the Expected Nodes parameter automatically decreases by the number of nodes that you remove. For example, if the value of the Expected Nodes parameter is 10 and you remove three nodes, the value changes to 7. | The specified nodes are removed as expected. | To scale in a node pool, we recommend that you use this method. |
Remove nodes by running the kubectl delete node command. | The value of the Expected Nodes parameter remains unchanged. | The nodes are not removed. | We recommend that you do not use this method to remove nodes. |
Manually release ECS instances in the ECS console or by calling the ECS API. | New ECS instances are automatically added to the node pool to keep the expected number of nodes. | No ECS instances are added to the node pool. After you release the ECS instances, the nodes remain in the Unknown state until they are removed from the Nodes list on the node pool details page in the ACK console. | This operation may cause inconsistencies among the ACK console, the Auto Scaling console, and the actual state of the nodes. We recommend that you do not use this method to remove nodes. To remove nodes, we recommend that you use the ACK console or call the ACK API. For more information, see Remove a node. |
The subscriptions of ECS instances expire. | New ECS instances are automatically added to the node pool to keep the expected number of nodes. | No ECS instances are added to the node pool. After the subscriptions of the ECS instances expire, the nodes remain in the Unknown state until they are removed from the Nodes list on the node pool details page in the ACK console. | This operation may cause inconsistencies among the ACK console, the Auto Scaling console, and the actual state of the nodes. We recommend that you do not use this method to remove nodes. To remove nodes, we recommend that you use the ACK console or call the ACK API. For more information, see Remove a node. |
You manually enable the health check feature of Auto Scaling for the ECS instances in a scaling group, and the ECS instances fail health checks, for example because they are suspended. | New ECS instances are automatically added to the node pool to keep the expected number of nodes. | New ECS instances are automatically added to replace the ECS instances that are suspended. | We recommend that you do not perform operations on the scaling group of a node pool. |
Remove ECS instances from scaling groups in the Auto Scaling console without changing the value of the Expected Nodes parameter. | New ECS instances are automatically added to the node pool to keep the expected number of nodes. | No ECS instances are added to the node pool. | We recommend that you do not perform operations on the scaling group of a node pool. |
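
The arithmetic behind the table can be sketched as a toy model. This is not how ACK or Auto Scaling is implemented. The function names `expected_after_removal` and `instances_to_add` are hypothetical, and the model assumes only what the table states: removing nodes in the ACK console or by calling the ACK API lowers the Expected Nodes value, and a node pool that is configured with the Expected Nodes parameter adds instances until the actual node count matches that value.

```shell
#!/usr/bin/env bash
# Toy model for illustration only; not the ACK implementation.

# Expected Nodes value after removing $2 nodes through the ACK console or API.
expected_after_removal() {
  local expected="$1" removed="$2"
  echo $(( expected - removed ))
}

# Number of ECS instances that a node pool configured with Expected Nodes
# would add when the actual node count ($2) is below the expected value ($1).
instances_to_add() {
  local expected="$1" actual="$2"
  if [ "$actual" -lt "$expected" ]; then
    echo $(( expected - actual ))
  else
    echo 0
  fi
}
```

For example, `expected_after_removal 10 3` prints `7`, which matches the example in the table, and `instances_to_add 10 8` prints `2` for a node pool in which two ECS instances were released manually.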