Container Service for Kubernetes: FAQ about nodes and node pools

Last Updated: Jul 15, 2024

This topic provides answers to some frequently asked questions (FAQ) about nodes and node pools. For example, you can obtain answers to questions such as how to change the maximum number of pods that are supported by a node, how to change the operating system for a node pool, and how to solve the timeout error related to a node.

How do I change the operating system for a node pool?

The method used to change the operating system for a node pool is similar to that used to update a node pool. To change the operating system for a node pool, perform the following steps:

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Nodes > Node Pools.

  3. On the Node Pools page, find the node pool that you want to modify and choose More > Upgrade in the Actions column.

  4. Select Change Operating System, select the image that is used to replace the original image, and then click Start Update.

    Note

    By default, Kubelet Update and Upgrade Node Pool by Replacing System Disk are selected when you change the operating system for a node pool. Select Create Snapshot before Update based on your business requirements.
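
After the update is complete, you can verify the operating system image that each node reports. The following kubectl check uses standard fields from the Kubernetes node status:

# List each node with the OS image and kernel version that it reports.
kubectl get nodes -o custom-columns=NAME:.metadata.name,OS-IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion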

Am I able to leave the Expected Nodes parameter empty when I create a node pool?

No, you cannot leave the Expected Nodes parameter empty when you create a node pool.

For more information about how to remove or release a node, see Remove a node. For more information about how to add a node, see Add existing ECS instances to an ACK cluster. After you remove nodes from or add existing nodes to a cluster, the value of the Expected Nodes parameter is automatically set to the actual number of nodes after the modification.

What are the differences between node pools that are configured with the Expected Nodes parameter and those that are not configured with this parameter?

The Expected Nodes parameter specifies the number of nodes that you want to keep in a node pool. You can change the value of this parameter to modify the number of nodes in the node pool. This feature is disabled for existing node pools that are not configured with the Expected Nodes parameter.

Node pools that are configured with the Expected Nodes parameter and those that are not configured with this parameter have different reactions to operations such as removing nodes and releasing Elastic Compute Service (ECS) instances. The following table describes the details.

| Operation | Node pool with Expected Nodes configured | Node pool without Expected Nodes configured | Suggestion |
| --- | --- | --- | --- |
| Decrease the expected number of nodes by calling the API operations of Container Service for Kubernetes (ACK) or by using the ACK console. | The node pool is scaled in until the number of existing nodes equals the specified expected number of nodes. | If the number of existing nodes is greater than the value that you specify, the node pool is scaled in to that value. At the same time, the system enables the Expected Nodes feature. | N/A |
| Remove specific nodes in the ACK console or by calling the API operations of ACK. | The value of the Expected Nodes parameter automatically decreases by the number of removed nodes. For example, if the value is 10 before you remove nodes and you remove three nodes, the value changes to 7. | The specified nodes are removed as expected. | N/A |
| Remove nodes by running the kubectl delete node command. | The value of the Expected Nodes parameter remains unchanged. | The nodes are not removed. | We recommend that you do not use this method to remove nodes. |
| Manually release ECS instances in the ECS console or by calling the API operations of ECS. | New ECS instances are automatically added to the node pool to maintain the expected number of nodes. | The node pool does not respond and no ECS instances are added. After the subscriptions to the ECS instances expire, the nodes remain in the Unknown state until they are removed from the Nodes list on the node pool details page in the ACK console. | We recommend that you do not use this method to remove nodes. Otherwise, the data of ACK and Auto Scaling may be inconsistent with the actual data. For more information, see Remove nodes. |
| The subscriptions to ECS instances expire. | New ECS instances are automatically added to the node pool to maintain the expected number of nodes. | The node pool does not respond and no ECS instances are added. Nodes that are deleted from the node pool remain in the Unknown state for a period of time. | We recommend that you do not use this method to remove nodes. Otherwise, the data of ACK and Auto Scaling may be inconsistent with the actual data. For more information, see Remove nodes. |
| The health check feature of Auto Scaling is manually enabled for ECS instances in the scaling group, and the instances fail health checks, for example, because they are suspended. | New ECS instances are automatically added to the node pool to maintain the expected number of nodes. | New ECS instances are automatically added to replace the suspended ECS instances. | We recommend that you do not perform operations on the scaling group of a node pool. |
| Remove ECS instances from the scaling group by using Auto Scaling without modifying the expected number of nodes. | New ECS instances are automatically added to the node pool to maintain the expected number of nodes. | No ECS instances are added to the node pool. | We recommend that you do not perform operations on the scaling group of a node pool. |

How do I add free nodes to a node pool?

Free nodes exist in clusters created before the node pool feature was released. If you no longer need free nodes, you can release the Elastic Compute Service (ECS) instances that are used to deploy the nodes. If you want to retain free nodes, we recommend that you add them to node pools. This way, you can manage the nodes in groups.

You can create and scale out a node pool, remove free nodes, and then add the corresponding ECS instances to the node pool. For more information, see Add free nodes to a node pool.
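
If you want to identify free nodes from the command line first, the following sketch assumes that ACK adds the alibabacloud.com/nodepool-id label to nodes that belong to a node pool, so nodes without this label are free nodes. Verify the label on your cluster before you rely on it.

# List nodes that do not carry the assumed node pool label.
kubectl get nodes -l '!alibabacloud.com/nodepool-id'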

How do I use preemptible instances in a node pool?

You can use preemptible instances when you create a node pool. You can also use preemptible instances in a node pool by using the spot-instance-advisor command-line tool. For more information, see Best practices for preemptible instance-based node pools.

Note

When you create a cluster, you cannot select preemptible instances for the node pool of the cluster.

How do I modify the configurations of a node?

To ensure business continuity and simplify node management, the following rules apply:

  • Some configuration items, such as the container runtime and the virtual private cloud (VPC) to which the nodes belong, cannot be modified after the node pool is created.

  • Some configuration items can be modified only in limited ways. For example, you can change the operating system of a node only by upgrading the original image to the latest version; you cannot change the image type.

  • Some configuration items, such as the vSwitch, billing method, and instance type, can be modified without limits.

Modifications on specific configuration items, such as the public IP address and the CloudMonitor plug-in, take effect only on nodes that are newly added to the node pool. For more information, see Modify a node pool.

If you need nodes with different configurations, we recommend that you create a node pool that uses the new configurations, set the nodes in the old node pool to the Unschedulable state, and then drain the old nodes. After your workloads run stably on the new nodes, release the old nodes.
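
The following is a minimal kubectl sketch of this migration. It assumes that nodes in the old node pool carry the alibabacloud.com/nodepool-id label; verify the label on your nodes and replace np-old-xxxx with the ID of the old node pool.

# Cordon and drain every node in the old node pool so that pods are rescheduled onto the new node pool.
for node in $(kubectl get nodes -l alibabacloud.com/nodepool-id=np-old-xxxx -o jsonpath='{.items[*].metadata.name}'); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --grace-period=120
done

After you confirm that your workloads run as expected on the new nodes, release the old nodes or delete the old node pool in the console.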

How do I release a specific ECS instance?

You can release an ECS instance by removing a node. After an ECS instance is released, the expected number of nodes automatically changes to the actual number of nodes. You do not need to modify the expected number of nodes. You cannot release an ECS instance by modifying the expected number of nodes.

How do I update the container runtime of a worker node that does not belong to a node pool?

Perform the following operations:

  1. Remove the worker node. During the removal, the system sets the node to the Unschedulable state and drains it. If the drain fails, the removal stops. If the drain succeeds, the node is removed from the cluster.

  2. Add the node to a node pool. You can add the node to an existing node pool. Alternatively, you can create an empty node pool and add the node to the node pool. After the node is added to a node pool, the container runtime of the node automatically becomes the same as that of the node pool.

    Note

    Node pools are free of charge. However, you are charged for the cloud resources such as ECS instances that are used in node pools. For more information, see Cloud service billing.
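
After the node joins the node pool, you can confirm the container runtime that the node reports, for example:

# The CONTAINER-RUNTIME column shows the runtime and version of each node.
kubectl get nodes -o wide

# Alternatively, query a single node. Replace <node-name> with the name of your node.
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'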

What do I do if a timeout error occurs after I add an existing node?

Check whether the node can connect to the Classic Load Balancer (CLB) instance that exposes the API server of the cluster, and whether the security groups meet the requirements. For more information about the limits on security groups, see Limits on security groups. For more information about other network connectivity issues, see FAQ about network management.
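
As a quick connectivity test, you can run the following check from the node before you add it again. This is a sketch that assumes the API server is exposed on port 6443 behind the CLB instance; replace <apiserver-clb-address> with the address of that CLB instance. Any HTTP response, including 401 or 403, indicates that the network path is open, while a timeout points to a connectivity or security group issue.

# Test whether the node can reach the API server port through the CLB instance.
curl -k -m 5 https://<apiserver-clb-address>:6443/healthz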

How do I change the hostname of a worker node in an ACK cluster?

After you create an ACK cluster, you cannot directly change the hostnames of worker nodes. If you want to change the hostname of a worker node, modify the node naming rule of the relevant node pool, remove the worker node from the node pool, and then add the worker node to the node pool again.

Note

When you create an ACK cluster, you can modify the hostnames of worker nodes in the Custom Node Name section. For more information, see Create an ACK managed cluster.

  1. Remove the worker node.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane of the details page, choose Nodes > Nodes.

    3. On the Nodes page, find the worker node that you want to remove and choose More > Remove in the Actions column.

    4. In the dialog box that appears, select I understand the above information and want to remove the node(s). and click OK.

  2. Add the worker node to the node pool again. For more information, see the Manually add nodes section of the "Add existing ECS instances to an ACK cluster" topic.

    Then, the worker node is renamed based on the new node naming rule of the node pool.
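
To confirm the result, you can list the nodes and check that the re-added node follows the new naming rule:

# The NAME column reflects the node names generated by the naming rule of the node pool.
kubectl get nodes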

How do I manually update the kernel version of GPU-accelerated nodes in a cluster?

To manually update the kernel version of GPU-accelerated nodes in a cluster, perform the following steps:

Note

This procedure applies only when the current kernel version is earlier than 3.10.0-957.21.3.

Confirm the kernel version to which you want to update and proceed with caution.

The following procedure describes how to update the NVIDIA driver. It does not describe the kernel update itself in detail.
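
Before you start, you can check the current kernel version on the node, for example:

# Print the kernel version that the GPU-accelerated node is running.
uname -r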

  1. Obtain the kubeconfig file of the cluster and use kubectl to connect to the cluster.

  2. Set the GPU-accelerated node that you want to manage to the Unschedulable state. In this example, the node cn-beijing.i-2ze19qyi8votgjz12345 is used.

    kubectl cordon cn-beijing.i-2ze19qyi8votgjz12345
    
    node/cn-beijing.i-2ze19qyi8votgjz12345 already cordoned
  3. Migrate the pods on the GPU-accelerated node to other nodes.

    kubectl drain cn-beijing.i-2ze19qyi8votgjz12345 --grace-period=120 --ignore-daemonsets=true
    
    node/cn-beijing.i-2ze19qyi8votgjz12345 cordoned
    WARNING: Ignoring DaemonSet-managed pods: flexvolume-9scb4, kube-flannel-ds-r2qmh, kube-proxy-worker-l62sf, logtail-ds-f9vbg
    pod/nginx-ingress-controller-78d847fb96-5fkkw evicted
  4. Uninstall the existing nvidia-driver.

    Note

    In this example, the driver version to be uninstalled is 384.111. If your driver version is not 384.111, download the installation package that matches your driver version from the official NVIDIA website and replace 384.111 in the following commands with your version.

    1. Log on to the GPU-accelerated node and run the nvidia-smi command to check the driver version.

      sudo nvidia-smi -a | grep 'Driver Version'
      Driver Version                      : 384.111
    2. Download the driver installation package.

      cd /tmp/
      sudo curl -O https://cn.download.nvidia.cn/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
      Note

      The installation package is required for uninstalling the NVIDIA driver.

    3. Uninstall the driver.

      sudo chmod u+x NVIDIA-Linux-x86_64-384.111.run
      sudo sh ./NVIDIA-Linux-x86_64-384.111.run --uninstall -a -s -q
  5. Update the kernel.

    Update the kernel version based on your business requirements.

  6. Restart the GPU-accelerated node.

    sudo reboot
  7. Log on to the GPU node and run the following command to install the kernel-devel package.

    sudo yum install -y kernel-devel-$(uname -r)
  8. Go to the official NVIDIA website to download the required driver and install it on the GPU-accelerated node. In this example, the driver version 410.79 is used.

    cd /tmp/
    sudo curl -O https://cn.download.nvidia.cn/tesla/410.79/NVIDIA-Linux-x86_64-410.79.run
    sudo chmod u+x NVIDIA-Linux-x86_64-410.79.run
    sudo sh ./NVIDIA-Linux-x86_64-410.79.run -a -s -q
    
    # Warm up the GPU
    sudo nvidia-smi -pm 1 || true
    sudo nvidia-smi -acp 0 || true
    sudo nvidia-smi --auto-boost-default=0 || true
    sudo nvidia-smi --auto-boost-permission=0 || true
    sudo nvidia-modprobe -u -c=0 -m || true
  9. Make sure that the /etc/rc.d/rc.local file contains the following lines. If it does not, add them to the file.

    sudo nvidia-smi -pm 1 || true
    sudo nvidia-smi -acp 0 || true
    sudo nvidia-smi --auto-boost-default=0 || true
    sudo nvidia-smi --auto-boost-permission=0 || true
    sudo nvidia-modprobe -u -c=0 -m || true
  10. Restart kubelet and Docker.

    sudo service kubelet stop
    sudo service docker restart
    sudo service kubelet start
  11. Set the GPU-accelerated node to schedulable.

    kubectl uncordon cn-beijing.i-2ze19qyi8votgjz12345
    
    node/cn-beijing.i-2ze19qyi8votgjz12345 already uncordoned
  12. Run the following command in the nvidia-device-plugin container to check the version of the driver installed on the GPU-accelerated node.

    kubectl exec -n kube-system -t nvidia-device-plugin-cn-beijing.i-2ze19qyi8votgjz12345 -- nvidia-smi
    Thu Jan 17 00:33:27 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   27C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    Note

    If no container is started on the GPU-accelerated node after you run the docker ps command, see the What do I do if no container is started on a GPU-accelerated node? section of this topic.

What do I do if no container is started on a GPU-accelerated node?

In specific Kubernetes versions, no containers are started on GPU-accelerated nodes after you restart kubelet and Docker, as shown in the following example:

sudo service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
sudo service docker stop
Redirecting to /bin/systemctl stop docker.service
sudo service docker start
Redirecting to /bin/systemctl start docker.service
sudo service kubelet start
Redirecting to /bin/systemctl start kubelet.service

sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Run the following command to check the cgroup driver:

sudo docker info | grep -i cgroup
Cgroup Driver: cgroupfs

The returned results indicate that the cgroup driver is set to cgroupfs.

To resolve the issue, perform the following steps:

  1. Back up /etc/docker/daemon.json. Then, run the following command to update the file:

    sudo tee /etc/docker/daemon.json > /dev/null <<-EOF
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "exec-opts": ["native.cgroupdriver=systemd"],
        "log-driver": "json-file",
        "log-opts": {
            "max-size": "100m",
            "max-file": "10"
        },
        "oom-score-adjust": -1000,
        "storage-driver": "overlay2",
        "storage-opts":["overlay2.override_kernel_check=true"],
        "live-restore": true
    }
    EOF
  2. Run the following commands to restart the Docker runtime and kubelet:

    sudo service kubelet stop
    Redirecting to /bin/systemctl stop kubelet.service
    sudo service docker restart
    Redirecting to /bin/systemctl restart docker.service
    sudo service kubelet start
    Redirecting to /bin/systemctl start kubelet.service
  3. Run the following command to check whether the cgroup driver is set to systemd.

    sudo docker info | grep -i cgroup
    Cgroup Driver: systemd
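
Optionally, you can also confirm that kubelet uses the same cgroup driver as Docker. The following sketch assumes that kubelet reads its configuration from /var/lib/kubelet/config.yaml, which may differ across cluster versions:

# Check the cgroup driver configured for kubelet (assumed configuration file location).
grep -i cgroupDriver /var/lib/kubelet/config.yaml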

What is the path of the kubelet in an ACK cluster? Am I able to customize the path?

ACK does not allow you to customize the path of the kubelet. The default path of the kubelet is /var/lib/kubelet. Do not change the path.

How do I migrate multiple pods to other nodes when a node fails?

You can set the faulty node to unschedulable and drain the node. This way, ACK migrates application pods from the faulty node to other nodes.

  1. Log on to the ACK console. On the Nodes page of the cluster, find the faulty node and choose More > Drain in the Actions column. ACK sets the node to unschedulable and migrates application pods from the node to other nodes.

  2. Troubleshoot node exceptions. For more information, see Troubleshoot node exceptions.

    You can also submit a ticket to contact the ACK technical team.
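
If you prefer the command line, the console Drain operation corresponds to cordoning and draining the node with kubectl. The following is a minimal sketch in which cn-beijing.i-xxxxxxxx is a placeholder node name:

# List the pods that are running on the faulty node.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=cn-beijing.i-xxxxxxxx

# Mark the node as unschedulable and evict its pods to other nodes.
kubectl cordon cn-beijing.i-xxxxxxxx
kubectl drain cn-beijing.i-xxxxxxxx --ignore-daemonsets --grace-period=120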

When a cluster that contains nodes in different zones fails, how does the cluster evict pods from nodes?

In most scenarios, when a node fails, the node controller evicts pods from the node. The default value of the --node-eviction-rate parameter is 0.1 node per second, which means that pods are evicted from at most one node every 10 seconds.

When an ACK cluster that contains nodes residing in multiple zones fails, the node controller determines how to evict pods based on the zone status and the cluster size.

A zone can be in one of the following states:

  • FullDisruption: No healthy node resides in the zone and at least one unhealthy node exists.

  • PartialDisruption: At least two unhealthy nodes exist in the zone, and the ratio of unhealthy nodes (unhealthy nodes/(unhealthy nodes + healthy nodes)) is greater than 0.55.

  • Normal: All nodes in the zone are healthy.

A cluster can be classified into two types based on the cluster size:

  • Large cluster: The cluster contains more than 50 nodes.

  • Small cluster: The cluster contains 50 or fewer nodes.

The eviction rate of the node controller is calculated based on the following rules:

  • If all zones are in the FullDisruption state, the eviction feature is disabled for all zones.

  • If not all zones are in the FullDisruption state, the eviction rate is determined in the following ways.

    • If a zone is in the FullDisruption state, the eviction rate is set to the default value (0.1), regardless of the cluster size.

    • If a zone is in the PartialDisruption state, the eviction rate depends on the cluster size. In a large cluster, the eviction rate of the zone is 0.01. In a small cluster, the eviction rate of the zone is 0, which indicates that no pod is evicted.

    • If a zone is in the Normal state, the eviction rate is set to the default value (0.1), regardless of the cluster size.
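
For reference, these thresholds correspond to the following kube-controller-manager parameters and their upstream Kubernetes defaults. ACK manages this component for you; the snippet only illustrates which settings the behavior described above maps to.

# Upstream defaults of the kube-controller-manager parameters that control node eviction.
# --node-eviction-rate:           normal eviction rate, in nodes per second (default 0.1)
# --secondary-node-eviction-rate: reduced rate for PartialDisruption zones in large clusters (default 0.01)
# --unhealthy-zone-threshold:     unhealthy-node ratio at which a zone becomes PartialDisruption (default 0.55)
# --large-cluster-size-threshold: clusters with more nodes than this value are treated as large (default 50)
kube-controller-manager \
  --node-eviction-rate=0.1 \
  --secondary-node-eviction-rate=0.01 \
  --unhealthy-zone-threshold=0.55 \
  --large-cluster-size-threshold=50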

For more information, see Rate limits on eviction.