A pod in a Kubernetes cluster may fail to initialize after a node that runs Alibaba Cloud Linux 3 with an earlier Systemd version is added to the cluster. This topic describes the cause of this issue and how to resolve it.
Problem description
A node that runs Alibaba Cloud Linux 3 is added to a Kubernetes cluster while a pod in the cluster is being initialized. The new node refreshes its Kubernetes-related configurations and runs a script that executes the systemctl daemon-reload command. The pod then fails to initialize, and an error message indicates that control group (cgroup) configuration failed or that the cgroup.procs file does not exist.
Scope of impact
Alibaba Cloud Linux 3 operating systems that include a Systemd version earlier than systemd-239-82.0.3.4.al8.2.
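To check whether a node falls within this scope, you can compare its installed Systemd build against 239-82.0.3.4.al8.2. The following is a minimal sketch, not an official tool. The function name is_affected is hypothetical. The sketch also assumes GNU coreutils, because it relies on sort -V. sort -V approximates RPM's version ordering but is not identical to it, so treat the result as a quick check only.

```bash
#!/usr/bin/env bash
# Sketch: report whether a Systemd VERSION-RELEASE string is older than the
# fixed build 239-82.0.3.4.al8.2. Assumes GNU `sort -V`, which approximates
# (but is not identical to) RPM's own version comparison.

FIXED="239-82.0.3.4.al8.2"

# Hypothetical helper: returns 0 (affected) if the given string sorts before FIXED.
is_affected() {
  local installed="$1"
  # The fixed build itself is not affected.
  [ "$installed" = "$FIXED" ] && return 1
  local oldest
  oldest="$(printf '%s\n%s\n' "$installed" "$FIXED" | sort -V | head -n 1)"
  # If the installed build sorts first, it is older than the fix.
  [ "$oldest" = "$installed" ]
}

# Query the installed package only when rpm and the systemd package are present.
if command -v rpm >/dev/null 2>&1 && rpm -q systemd >/dev/null 2>&1; then
  installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' systemd | head -n 1)"
  if is_affected "$installed"; then
    echo "systemd $installed is older than $FIXED; upgrade it."
  else
    echo "systemd $installed is not older than $FIXED."
  fi
fi
```

If the script reports that the build is older, follow the upgrade steps in the Solution section.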
Cause
The systemctl daemon-reload command reloads all units. If the units are reloaded while the pod is being initialized, a race condition can occur. As a result, D-Bus communication fails or unit files cannot be found.
When the systemctl daemon-reload command is run, specific process IDs are not saved or written to the cgroups. As a result, a cgroup empty event may be triggered, and the cgroups may be deleted.
Solution
Update Systemd to systemd-239-82.0.3.4.al8.2 or later.
1. Check the current version of Systemd.
   rpm -q systemd
2. Update Systemd.
   In this example, Systemd is updated to systemd-239-82.0.3.4.al8.2.x86_64. To install a different version, replace systemd-239-82.0.3.4.al8.2.x86_64 with the package version that you want to use.
   sudo dnf upgrade -y systemd-239-82.0.3.4.al8.2.x86_64
3. Restart the instance for the update to take effect.
   Warning: Restarting the instance stops it for a short period of time and may interrupt the services that run on it. This may result in data loss. Back up critical instance data before you restart the instance, and restart it during off-peak hours.
   sudo reboot
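The three steps above can be combined into a short script. The following is a minimal sketch under stated assumptions. TARGET and DRY_RUN are hypothetical variables for this sketch. By default, the script only prints the commands that it would run. It never reboots the instance, so the restart, and the warning that comes with it, stay under your control.

```bash
#!/usr/bin/env bash
# Sketch of the Solution steps: check, upgrade, then confirm before a manual reboot.
# Assumptions: TARGET and DRY_RUN are hypothetical variables for this sketch.
# By default (DRY_RUN=1) commands are only printed; set DRY_RUN=0 to run them.
set -u

TARGET="${TARGET:-systemd-239-82.0.3.4.al8.2.x86_64}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_systemd() {
  # Step 1: show the currently installed Systemd package.
  run rpm -q systemd || return 1
  # Step 2: upgrade to the target package (override TARGET for another build).
  run sudo dnf upgrade -y "$TARGET" || return 1
  # rpm -q exits non-zero unless this exact package is installed.
  run rpm -q "$TARGET" || { echo "$TARGET is not installed; check the dnf output." >&2; return 1; }
}

# Step 3 is left to the operator: this script never reboots the instance.
if upgrade_systemd; then
  echo "Done. Run 'sudo reboot' during off-peak hours for the new Systemd to take effect."
fi
```

For example, if the script is saved as upgrade-systemd.sh (a hypothetical file name), run DRY_RUN=0 bash upgrade-systemd.sh to execute the commands. Run it without DRY_RUN=0 first to review them.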