This topic provides answers to frequently asked questions (FAQs) regarding nodes and node pools in Alibaba Cloud Container Service for Kubernetes (ACK). It covers common operational tasks, such as modifying pod limits, updating OS images, and troubleshooting node-related timeout issues.
To diagnose and troubleshoot node issues, see Troubleshoot node exceptions.
How do I use spot instances in a node pool?
You can use spot instances by creating a new node pool or using the spot-instance-advisor command. For details, see Best practices for spot instance node pools.
To maintain consistency within a node pool, you cannot convert an existing pay-as-you-go or subscription node pool to a spot instance node pool, nor can you convert a spot node pool into other billing types.
Can I configure different ECS instance types in a single node pool?
Yes, you can. To prevent scale-out failures caused by instance unavailability or inventory shortages, we recommend the following strategies:
Configure multiple vSwitches for a node pool across different availability zones.
Select multiple Elastic Compute Service (ECS) instance types, or specify instance types based on vCPU and memory specifications.
You can view the scalability level of a node pool after creation.
For unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations.
How do I calculate the maximum number of pods per node?
The maximum number of pods supported per node depends on the network plugin used by the cluster. For detailed calculation methods, see Maximum number of pods per node.
Terway: Max pods per node = Maximum Elastic Network Interface (ENI)-based pods + host-network pods.
Flannel: The limit is defined by the Number of Pods per Node specified during cluster creation.
You can view the maximum number of pods (the Pod Quota column) in the node list on the Nodes page of the ACK console.
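The Terway arithmetic can be sketched as plain shell math. The ENI, IP, and host-network counts below are hypothetical placeholders, not values read from any API; check your ECS instance type's actual ENI quota before relying on such numbers.

```shell
# Illustrative Terway-style calculation; ENIS, IPS_PER_ENI, and
# HOST_NETWORK_PODS are assumed example values, not real quotas.
ENIS=8                 # ENIs provided by the ECS instance type (assumption)
IPS_PER_ENI=20         # private IPs available per ENI (assumption)
HOST_NETWORK_PODS=3    # pods that use the host network (assumption)

# In this sketch, one ENI is reserved for the node's primary interface.
MAX_ENI_PODS=$(( (ENIS - 1) * IPS_PER_ENI ))
MAX_PODS=$(( MAX_ENI_PODS + HOST_NETWORK_PODS ))
echo "Max pods per node: $MAX_PODS"
```

To read the effective quota on a live node, you can also inspect the node's allocatable pods, for example with `kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'`.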
How do I adjust the pod capacity when a node reaches its pod limit?
The maximum number of pods supported by a single worker node is determined by the network plugin type and is immutable in most cases.
Terway mode: The maximum pods per node depends on the number of ENIs provided by the ECS instance.
Flannel mode: The maximum pods per node is defined during cluster creation and cannot be modified once set.
If the pod count in your cluster reaches its limit, we recommend scaling out the node pool to add more nodes, which increases the total available pod capacity in your cluster. For more information, see Increase the maximum number of pods in a cluster.
How do I modify node configurations?
To ensure cluster stability, certain parameters—specifically those related to networking and high availability—are immutable after a node pool is created. For example, you cannot change the container runtime or the virtual private cloud (VPC) to which a node belongs.
For mutable parameters, changes typically apply only to newly created nodes. Existing nodes remain unaffected unless otherwise specified (such as Update ECS Tags of Existing Nodes and Update Labels and Taints of Existing Nodes).
Best practices for applying new configurations:
To apply new settings to existing nodes, follow these steps:
Create a new node pool with the desired configuration.
Cordon and drain the nodes in the old node pool to migrate workloads to the new nodes.
Once the migration is complete, release the instances in the old node pool.
For more information about which parameters can be modified and when the modifications take effect, see Edit a node pool.
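The cordon-and-drain step of the migration can be sketched as a simple loop. The node names below are placeholders, and the script only prints the commands it would run; in practice you would list the old pool's nodes (for example with a node-pool label selector, whose exact label name may differ) and execute the commands.

```shell
# Hedged sketch: build a cordon/drain plan for the old node pool.
# Node names are invented placeholders; commands are printed, not executed.
NODES="cn-hangzhou.i-old1 cn-hangzhou.i-old2"
PLAN=""
for node in $NODES; do
  PLAN="${PLAN}kubectl cordon $node
kubectl drain $node --ignore-daemonsets --grace-period=120
"
done
printf '%s' "$PLAN"
```

Draining with `--grace-period` gives workloads time to terminate cleanly before their pods are rescheduled onto the new pool.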
Can I disable the Expected Nodes feature?
If the Scaling Mode of a node pool is set to Manual, the Expected Nodes parameter is mandatory and cannot be disabled.
If you want to remove or release a specific node, see Remove a node. If you want to add a specific node, see Add an existing node. After you remove a node or add an existing node, the expected number of instances is automatically adjusted to the new number of nodes. You do not need to manually change it.
What is the difference between a node pool with and without Expected Nodes enabled?
The Expected Nodes parameter defines the intended capacity of a node pool. You can scale out or scale in a node pool by adjusting this parameter. While most modern node pools use this for reconciliation and scaling, some legacy node pools may not have this feature enabled.
The following table describes how the system responds to different operations based on this setting:
| Operation | Expected Nodes enabled | Expected Nodes disabled (legacy) | Recommendation |
| --- | --- | --- | --- |
| Scale in by reducing the Expected Nodes value via the console or OpenAPI | The system terminates nodes until the count matches the expected value. | If the current number of nodes is greater than the expected number, nodes are scaled in until the counts match. The Expected Nodes feature is then enabled. | N/A |
| Remove specific nodes via the console or OpenAPI | The expected count decreases by the number of nodes removed. For example, if Expected Nodes is 10 and you remove 3 nodes, the value is updated to 7. | The specified nodes are removed from the cluster. | N/A |
| Remove a node via | The expected count remains unchanged. | No change to the pool state. | Not recommended |
| Manually release an ECS instance via the console or OpenAPI | The system automatically creates a new ECS instance to maintain the expected count. | The node pool is unaware of the change. No new ECS instance is created. The deleted node will display an | Not recommended. This causes data inconsistency between ACK and Auto Scaling (ESS). For the recommended method, see Remove a node. |
| ECS subscription expiration | The system automatically creates a new ECS instance to maintain the expected count. | The node pool is unaware of the change. No new ECS instance is created. The deleted node will display an | Not recommended. This causes data inconsistency between ACK and ESS. Renew instances or remove them via the ACK console before expiration. For the recommended method, see Remove a node. |
| ECS instance fails the ESS health check (e.g., the node is stopped) | The system automatically creates a new ECS instance to maintain the expected count. | The system replaces the stopped instance with a new one. | Not recommended. Do not directly operate on scaling groups associated with node pools. |
| Remove an ECS instance from the ESS scaling group without modifying Expected Nodes | The system automatically creates a new ECS instance to maintain the expected count. | No new ECS instance is created. | Not recommended. Do not directly operate on scaling groups associated with node pools. |
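The enabled-mode behavior in the table amounts to reconciling the pool toward a desired count. A toy sketch of that idea (this is illustrative only, not ACK's implementation, which works against ESS scaling groups):

```shell
# Toy reconciliation loop illustrating the Expected Nodes concept.
# All counts are invented; ACK's real controller does not work like this.
expected=5
current=3   # e.g., after an ECS instance was released out-of-band
while [ "$current" -lt "$expected" ]; do
  current=$((current + 1))
  echo "replacement node added: $current/$expected"
done
echo "node pool reconciled at $current nodes"
```

This is why releasing an instance directly in ECS is discouraged: the reconciler simply replaces it, while the legacy (disabled) mode leaves the pool in an inconsistent state.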
How do I add free nodes to a node pool?
Worker nodes created in legacy clusters before the introduction of the node pool feature are considered free nodes. If you no longer need them, release the corresponding ECS instances. Otherwise, to benefit from group management and automated O&M, we recommend migrating them into a node pool.
Create a new node pool or expand an existing one, remove the free nodes from the cluster, then add them to the target node pool. For details, see Add free nodes to a node pool.
How do I change the OS image of a node pool?
You can switch the OS as needed. For example, from CentOS to Alibaba Cloud Linux or upgrade to a newer version of the current OS. Before proceeding, review the OS image release notes for compatibility and usage limits.
For step-by-step instructions, see Replace the OS of a node pool.
How do I release a specific ECS instance?
To release a specific ECS instance, you must remove the node via the ACK console. This ensures the Expected Nodes count is updated automatically and correctly without manual intervention. Simply decreasing the Expected Nodes count will trigger a random scale-in, which might not target the specific instance you intend to release.
What do I do if adding an existing node fails with a timeout error?
Check connectivity: Ensure the node has network access to the API server Classic Load Balancer (CLB) instance.
Security groups: Verify that the Security Group rules allow the required traffic. Refer to the Security group limits for adding existing nodes.
General networking: For more complex issues, see Network management FAQ.
How do I change the hostname of a worker node in an ACK cluster?
A worker node's hostname cannot be modified directly. However, if the node pool defines a Custom Node Name rule (configured when the cluster is created), you can rename a node by removing it and adding it back. For details, see Create an ACK managed cluster.
Then, perform the following:
Remove the node from the cluster.
Add the removed node back to the node pool. For instructions, see Manually add nodes.
The node will be automatically renamed upon re-joining the cluster based on the node pool's naming template.
How do I manually upgrade the kernel and NVIDIA drivers on GPU nodes?
Prerequisite: The current kernel version is earlier than 3.10.0-957.21.3.
This procedure involves kernel and driver changes. Confirm your target versions and perform these steps with caution.
This guide focuses on the driver upgrade required after or during a kernel upgrade. The kernel upgrade itself is not covered.
Connect to the cluster: Obtain the cluster kubeconfig and use kubectl to connect to the cluster.
Cordon the node: Prevent new pods from being scheduled on the target GPU node. This example uses the node cn-beijing.i-2ze19qyi8votgjz*****.

kubectl cordon cn-beijing.i-2ze19qyi8votgjz*****
node/cn-beijing.i-2ze19qyi8votgjz***** already cordoned

Drain the node: Evict existing pods to other nodes.

kubectl drain cn-beijing.i-2ze19qyi8votgjz***** --grace-period=120 --ignore-daemonsets=true
node/cn-beijing.i-2ze19qyi8votgjz***** cordoned
WARNING: Ignoring DaemonSet-managed pods: flexvolume-9scb4, kube-flannel-ds-r2qmh, kube-proxy-worker-l62sf, logtail-ds-f9vbg
pod/nginx-ingress-controller-78d847fb96-***** evicted

Uninstall the current NVIDIA driver:

Note: This example uses version 384.111. Replace it with your actual version.

Log on to the GPU node and run the nvidia-smi command to check the driver version.

sudo nvidia-smi -a | grep 'Driver Version'
Driver Version : 384.111

Download the matching installer from NVIDIA to perform the uninstallation.

cd /tmp/
sudo curl -O https://cn.download.nvidia.cn/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run

Note: You must use the installation package to uninstall the NVIDIA driver.

Uninstall the driver.

sudo chmod u+x NVIDIA-Linux-x86_64-384.111.run
sudo sh ./NVIDIA-Linux-x86_64-384.111.run --uninstall -a -s -q
Upgrade the kernel.
You can upgrade the kernel as needed.
Restart the GPU node.
sudo reboot

Install kernel headers: Log on to the GPU node again and install the kernel-devel package that matches the running kernel.

sudo yum install -y kernel-devel-$(uname -r)

Install the new NVIDIA driver: Go to the NVIDIA website to download and install the required NVIDIA driver. This example uses version 410.79.

# Change directory to /tmp
cd /tmp/
# Download the NVIDIA driver installer
sudo curl -O https://cn.download.nvidia.cn/tesla/410.79/NVIDIA-Linux-x86_64-410.79.run
# Make the installer executable
sudo chmod u+x NVIDIA-Linux-x86_64-410.79.run
# Run the installer in silent mode
sudo sh ./NVIDIA-Linux-x86_64-410.79.run -a -s -q
# Warm up the GPU
sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true

Configure persistence mode: Ensure the following GPU warm-up settings are in /etc/rc.d/rc.local. Add them manually if necessary.

sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true

Restart services:

sudo service kubelet stop
sudo service docker restart
sudo service kubelet start

Uncordon the GPU node:

kubectl uncordon cn-beijing.i-2ze19qyi8votgjz*****
node/cn-beijing.i-2ze19qyi8votgjz***** already uncordoned

Verify: Run nvidia-smi inside the nvidia-device-plugin pod to confirm the version.

kubectl exec -n kube-system -t nvidia-device-plugin-cn-beijing.i-2ze19qyi8votgjz***** nvidia-smi
Thu Jan 17 00:33:27 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   27C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Note: If you run the docker ps command and find that no containers are started on the GPU node, see Fix container startup failures on GPU nodes.
Fix container startup failures on GPU nodes
Symptom
In certain Kubernetes versions, after restarting kubelet and Docker on a GPU-enabled node, no containers are initialized or displayed when running docker ps.
sudo service kubelet stop
# Redirecting to /bin/systemctl stop kubelet.service
sudo service docker stop
# Redirecting to /bin/systemctl stop docker.service
sudo service docker start
# Redirecting to /bin/systemctl start docker.service
sudo service kubelet start
# Redirecting to /bin/systemctl start kubelet.service
sudo docker ps
# Output: CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES

Diagnosis
This issue typically occurs because the Docker Cgroup Driver is incorrectly configured as cgroupfs instead of systemd, causing a mismatch with the Kubernetes orchestration layer.
Run the following command to check the current Cgroup Driver:
sudo docker info | grep -i cgroup

Expected output for the error state:

Cgroup Driver: cgroupfs

Solution
Update the Docker configuration: You must align the Cgroup Driver with systemd and ensure the NVIDIA container runtime is set as the default.

Back up your existing configuration (the /etc/docker/daemon.json file).
Apply the corrected configuration: Run the following command to overwrite /etc/docker/daemon.json with the required settings.
sudo tee /etc/docker/daemon.json >/dev/null <<-EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "10"
    },
    "oom-score-adjust": -1000,
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "live-restore": true
}
EOF
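Before restarting Docker it can help to syntax-check the new configuration, since a malformed daemon.json prevents the daemon from starting. A minimal sketch using a scratch copy (python3 -m json.tool is just a convenient JSON parser here; any JSON validator works, and the snippet below is a shortened stand-in for the full file):

```shell
# Validate JSON syntax on a scratch copy before touching /etc/docker/daemon.json.
# The content below is an abbreviated example, not the full configuration.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
{
    "default-runtime": "nvidia",
    "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
if python3 -m json.tool "$tmp" >/dev/null 2>&1; then
    result="daemon.json snippet: valid JSON"
else
    result="daemon.json snippet: SYNTAX ERROR"
fi
echo "$result"
rm -f "$tmp"
```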
Restart services for the changes to take effect.
sudo service kubelet stop
# Redirecting to /bin/systemctl stop kubelet.service
sudo service docker restart
# Redirecting to /bin/systemctl restart docker.service
sudo service kubelet start
# Redirecting to /bin/systemctl start kubelet.service

Confirm that the Cgroup Driver has been successfully switched to systemd.

sudo docker info | grep -i cgroup
Cgroup Driver: systemd
When a node fails, how do I migrate pods in batch for redeployment?
You can set the failed node to unschedulable and drain it to move application pods to healthy nodes.
Log on to the ACK console.
On the Nodes page, find the node that you want to manage. In the Actions column, choose More > Drain. This operation sets the old node to unschedulable and gradually migrates the applications from the old node to a new node.
Troubleshoot the failed node. For troubleshooting details, see Troubleshoot node issues.
If a cluster with nodes across multiple zones fails, how does the cluster determine the node eviction policy?
Typically, when a node fails, the node controller evicts pods from the unhealthy node. The default eviction rate (--node-eviction-rate) is 0.1 nodes per second, which means pods are evicted from at most one node every 10 seconds.
However, when an ACK cluster with nodes in multiple zones fails, the node controller determines the eviction policy based on zone health and cluster size.
There are three types of zone health state.
FullDisruption: The zone has no healthy nodes and at least one unhealthy node.
PartialDisruption: The zone has at least two unhealthy nodes, and the proportion of unhealthy nodes, calculated as Number of unhealthy nodes / (Number of unhealthy nodes + Number of healthy nodes), is greater than 0.55.
Normal: Neither of the above.
The eviction rate of the node controller is calculated as follows based on the three zone health states:
If all zones are in the FullDisruption state, the eviction feature is disabled for the entire cluster.
If some zones are in the FullDisruption state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
If a zone is in the PartialDisruption state, the eviction rate is affected by the cluster size.
Large clusters (>50 nodes): The eviction rate drops to 0.01/s.
Small clusters (≤50 nodes): The eviction rate for the zone is 0, which means no eviction occurs.
If a zone is in the Normal state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
For more information, see Rate limits on eviction.
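The zone classification above can be checked with simple integer arithmetic. The node counts in this sketch are invented examples; it only illustrates the thresholds described in this section, not the controller's actual logic.

```shell
# Classify a zone using the thresholds described above (counts are assumptions).
unhealthy=6
healthy=4
total=$((unhealthy + healthy))
ratio_pct=$((unhealthy * 100 / total))   # integer percentage vs the 0.55 threshold
if [ "$healthy" -eq 0 ] && [ "$unhealthy" -ge 1 ]; then
  state="FullDisruption"
elif [ "$unhealthy" -ge 2 ] && [ "$ratio_pct" -gt 55 ]; then
  state="PartialDisruption"
else
  state="Normal"
fi
echo "zone state: $state (unhealthy ratio: ${ratio_pct}%)"
```

With 6 of 10 nodes unhealthy, the ratio is 60%, so the zone is classified as PartialDisruption and the reduced (or zero) eviction rate applies depending on cluster size.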
Can I customize the kubelet directory path?
No. The kubelet path is fixed at /var/lib/kubelet and cannot be customized in ACK.
Can I mount a data disk to a custom directory in an ACK node pool?
This feature is currently in canary release. To apply for this feature, submit a ticket.
Once enabled, you can format and mount disks to specific paths, with the following restrictions:
Do not mount to the following reserved OS directories:
/
/etc
/var/run
/run
/boot
Do not mount to the following directories that are used by the system and container runtimes, or their subdirectories:
/usr
/bin
/sbin
/lib
/lib64
/ostree
/sysroot
/proc
/sys
/dev
/var/lib/kubelet
/var/lib/docker
/var/lib/containerd
/var/lib/container
The mount directories for different data disks must be unique.
The mount directory must be an absolute path starting with /.
The mount directory cannot contain carriage return or line feed characters (the C-style escape characters \r and \n) and cannot end with a backslash (\).
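The path rules above can be partially expressed as a shell check. This is a hedged sketch covering only a subset of the rules (a few reserved directories, the absolute-path requirement, and the trailing-backslash rule); it is not a complete validator and the paths tested are examples.

```shell
# Partial validator for candidate data-disk mount paths (subset of the rules above).
valid_mount() {
  case "$1" in
    /|/etc|/var/run|/run|/boot) return 1 ;;                  # reserved OS directories
    /var/lib/kubelet|/var/lib/kubelet/*) return 1 ;;         # kubelet directory and subdirs
    /var/lib/docker|/var/lib/docker/*) return 1 ;;           # runtime dirs (subset shown)
    /var/lib/containerd|/var/lib/containerd/*) return 1 ;;
    /*) ;;                                                   # absolute path: keep checking
    *) return 1 ;;                                           # not an absolute path
  esac
  case "$1" in *\\) return 1 ;; esac                         # must not end with a backslash
  return 0
}
valid_mount /mnt/data01 && r1=accepted || r1=rejected
valid_mount /var/lib/kubelet/pods && r2=accepted || r2=rejected
echo "/mnt/data01: $r1; /var/lib/kubelet/pods: $r2"
```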
How do I modify the maximum number of file descriptors?
The maximum number of file descriptors is the maximum number of files that can be opened at the same time. Alibaba Cloud Linux and CentOS systems have two levels of limits:
System level: The maximum number of files that can be simultaneously opened by the processes of all users.
User level: The maximum number of files that can be opened by the processes of a single user.
In a container environment, there is another limit: the maximum number of file descriptors for a single process inside a container.
Manual changes made via CLI may be overwritten during node pool upgrades. We recommend editing the node pool in the console for persistent settings.
Modify system-level limit
Modify user-level limit
Log on to the node and check the /etc/security/limits.conf file.

cat /etc/security/limits.conf

The maximum file descriptors for individual user processes are defined by the following parameters:

...
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535

Run the sed command to modify the file descriptor limit. The following example sets the value to 65535 (recommended):

sudo sed -i "s/nofile.[0-9]*$/nofile 65535/g" /etc/security/limits.conf

Log on to the node again and run the following command to check whether the modification took effect:

ulimit -n

If the output matches the value you configured (for example, 65535), the modification was successful.
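You can inspect the current per-process limits without editing anything; ulimit -Sn and ulimit -Hn show the soft and hard values for the current shell:

```shell
# Read-only check of the current shell's open-file limits.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft nofile limit: $soft"
echo "hard nofile limit: $hard"
```

The soft limit is what a process actually hits; the hard limit is the ceiling to which an unprivileged process may raise its own soft limit.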
Modify container-level limit
This requires restarting the Docker or containerd service, which will interrupt running containers. Perform this operation during off-peak hours.
Log on to the node and run the following command to view the configuration file.
containerd node:
cat /etc/systemd/system/containerd.service

Docker node:

cat /etc/systemd/system/docker.service
The file descriptors limit for a single process in a container is set by the following parameters:
...
LimitNOFILE=1048576    # Maximum number of file handles for a single process
LimitNPROC=1048576     # Maximum number of processes
...

Run the following command to modify the parameter values. 1048576 is the recommended value for the file descriptor limit.

containerd node:

sudo sed -i "s/LimitNOFILE=[0-9a-zA-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/containerd.service && sudo sed -i "s/LimitNPROC=[0-9a-zA-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/containerd.service && sudo systemctl daemon-reload && sudo systemctl restart containerd

Docker node:

sudo sed -i "s/LimitNOFILE=[0-9a-zA-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/docker.service && sudo sed -i "s/LimitNPROC=[0-9a-zA-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/docker.service && sudo systemctl daemon-reload && sudo systemctl restart docker
Run the following command to view the file descriptors limit for a single process in a container.
If the returned value is the same as the value you set, the modification is successful.
containerd node:
cat /proc/$(pidof containerd)/limits | grep files
Max open files            1048576              1048576              files

Docker node:

cat /proc/$(pidof dockerd)/limits | grep files
Max open files            1048576              1048576              files
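The same /proc-based check works for any process, including the current shell, which makes it a convenient way to see the columns referenced above without a container runtime present:

```shell
# Read the open-files limits of the current process from /proc (Linux only).
# Column 4 is the soft limit, column 5 the hard limit on the "Max open files" row.
soft_fd=$(awk '/Max open files/ {print $4}' /proc/self/limits)
hard_fd=$(awk '/Max open files/ {print $5}' /proc/self/limits)
echo "current process: soft=$soft_fd hard=$hard_fd open files"
```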
How do I upgrade the container runtime for worker nodes that do not belong to a node pool?
In legacy clusters created before the introduction of the node pool feature, free worker nodes may exist. To upgrade the container runtime of these nodes, you must first migrate them into a node pool.
Procedure:
Create a node pool: If no suitable node pool exists, create one with the same configuration as the free node.
Remove the node: During the node removal process, the system sets the node to unschedulable and drains it. If the draining fails, the system automatically cordons the node (sets it to unschedulable) and performs a drain operation to evict pods. If the draining succeeds, the node is removed from the cluster.
Add an existing node: Add the target node to an existing node pool. Once the node re-joins the cluster, its container runtime will be automatically updated to match the runtime specified in the node pool configuration.
Note: While the node pool feature itself is free of charge, you are billed for the underlying ECS instances and other cloud resources. For details, see Cloud resource fees.
Why does the console display the source of a node pool as Other Nodes?
ACK allows you to add computing resources via the console, OpenAPI, or CLI (see Add an existing node). If you add nodes through custom methods not recognized by ACK's standard lifecycle management, the console classifies them under the Other Nodes group.
ACK cannot manage these nodes through a node pool, meaning features such as automated O&M, lifecycle management, and guaranteed technical support are unavailable.
If you want to continue using these nodes, you must ensure their compatibility with cluster add-ons and assume the potential risks. These risks include but are not limited to:
Version incompatibility: During control plane or system component upgrades, the OS and resident components on these nodes may become incompatible with the new version, risking service disruption.
Scheduling conflicts: The cluster may fail to accurately report availability zones or remaining resource capacity for these nodes. This can lead to improper workload scheduling and degraded performance.
Data plane mismatches: Compatibility between node-side components/OS and the cluster control plane has not been validated, posing stability risks.
O&M failures: Maintenance operations performed via the ACK console or OpenAPI may fail or yield unexpected results because the underlying management channel for these nodes is unverified.
How do I configure network ACLs for the vSwitches used by cluster nodes?
If an access control list (ACL) is associated with the vSwitch of a node pool, you must explicitly allow specific CIDR blocks. Otherwise, new nodes will fail to join the cluster or will appear in a Failed or Offline state.
Procedure to allow traffic and re-add nodes:
Configure network ACL rules: Ensure both inbound and outbound rules allow traffic for the following CIDR blocks:
100.104.0.0/16: ACK control plane management CIDR.
100.64.0.0/10: Alibaba Cloud internal service CIDR.
100.100.100.200/32: ECS metadata service address.
VPC/vSwitch CIDR: The primary and secondary CIDR blocks of the VPC, or the specific CIDR of the node's vSwitch.
Remove faulty nodes: Remove any nodes that were in a Failed or Offline state before the ACL rules were applied.
Create a node pool or expand an existing node pool: If the node status transitions to Ready, the network ACL rules have been configured correctly.
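When auditing ACL rules, it helps to confirm that a given address actually falls inside one of the CIDR blocks listed above. A self-contained sketch using plain integer arithmetic (the addresses tested are taken from the list; this is a checking aid, not part of any ACK tooling):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# in_cidr <ip> <cidr>: succeed if <ip> is inside <cidr>.
in_cidr() {
  ip=$(ip2int "$1")
  net=$(ip2int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

in_cidr 100.100.100.200 100.64.0.0/10 && echo "100.100.100.200 is covered by 100.64.0.0/10"
in_cidr 100.104.12.34 100.104.0.0/16 && echo "100.104.12.34 is covered by 100.104.0.0/16"
```

Note that the metadata address 100.100.100.200/32 already lies inside 100.64.0.0/10, but listing it explicitly keeps the intent of each rule clear.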