This topic describes how to fix the softlockup hang that occurs when you delete cgroups in an Elastic Compute Service (ECS) instance.

Problem description

The instance hangs when you delete containers, which removes their cgroups, and the kernel prints a call stack similar to the following:
[3302742.447940] Kernel panic - not syncing: softlockup: hung tasks
[3302742.448677] CPU: 18 PID: 1 Comm: systemd Kdump: loaded Tainted: G OEL ------------ T 3.10.0-862.14.4.el7.x86_64 #1
[3302742.450167] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014
[3302742.462123] [] mem_cgroup_reparent_charges+0x16d/0x3c0
[3302742.463243] [] mem_cgroup_css_offline+0x84/0x140
[3302742.464327] [] cgroup_destroy_locked+0xea/0x370
[3302742.465414] [] cgroup_rmdir+0x22/0x40
[3302742.466434] [] vfs_rmdir+0xdc/0x150
[3302742.467449] [] do_rmdir+0x1f1/0x220
[3302742.468470] [] ? ____fput+0xe/0x10
[3302742.469495] [] ? task_work_run+0xc0/0xe0
[3302742.470578] [] SyS_rmdir+0x16/0x20
[3302742.471628] [] system_call_fastpath+0x22/0x27

Cause

When you delete a cgroup in the instance, the kernel reparents its memory charges: it repeatedly moves the accounting for the memory pages that are still in use to the parent cgroup in the hierarchy. If the cgroup has a large amount of memory charged to it, this takes a long time. The reparenting loop contains no scheduling points, so the CPU is not released during the calculation, which results in a softlockup error.
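Because the time spent depends on how much memory is charged to the cgroup, it can help to inspect that amount before you delete the cgroup. The following is a minimal sketch, assuming the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory. The function name cgroup_memory_bytes, the CGROUP_DIR variable, and the "example" cgroup path are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Print the memory currently charged to a cgroup v1 memory cgroup, in bytes.
# $1 is the cgroup directory; returns non-zero if the counter cannot be read.
cgroup_memory_bytes() {
    usage_file="$1/memory.usage_in_bytes"
    [ -r "$usage_file" ] || return 1
    cat "$usage_file"
}

# Hypothetical cgroup path (assumption); replace it with the cgroup you plan to delete.
CGROUP_DIR="${CGROUP_DIR:-/sys/fs/cgroup/memory/example}"
if bytes=$(cgroup_memory_bytes "$CGROUP_DIR"); then
    echo "$CGROUP_DIR: $bytes bytes charged"
else
    echo "cannot read memory.usage_in_bytes in $CGROUP_DIR" >&2
fi
```

A large value suggests that deleting the cgroup on an affected kernel may take long enough to trigger the softlockup.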

Solution

Notice: Before you perform the following operations, we recommend that you create snapshots of the ECS instance to back up its data and prevent data loss caused by accidental operations. For more information about snapshots, see Snapshot overview.
The solution depends on the operating system that the instance runs.
  • If your instance runs a CentOS operating system, we recommend that you upgrade the kernel version.
    1. Run the following command to upgrade the kernel version:
      yum update kernel
    2. Run the following command to restart the instance:
      reboot
    3. Run the following command to check whether the kernel version is 3.10.0-1160 or later:
      uname -r
  • If your instance runs an Alibaba Cloud Linux operating system, this softlockup error does not occur.
  • If your instance runs an operating system other than the preceding ones, we recommend that you manually upgrade the kernel version to 4.17 or later.
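
The version check in step 3 for CentOS can also be scripted. The following is a minimal sketch, assuming a version-aware sort (the -V option of GNU sort) is available. The function name kernel_at_least and the variables REQUIRED and CURRENT are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Succeed if kernel release $1 is at or above minimum release $2.
# The releases are compared with a version-aware sort; the lower one sorts first.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

REQUIRED="3.10.0-1160"   # minimum CentOS kernel version named in this topic
CURRENT=$(uname -r)      # release of the running kernel

if kernel_at_least "$CURRENT" "$REQUIRED"; then
    echo "Kernel $CURRENT is at or above $REQUIRED."
else
    echo "Kernel $CURRENT is older than $REQUIRED; upgrade the kernel and restart the instance."
fi
```

Run the script after the restart, because uname -r reports the kernel that is currently running, not the one that was most recently installed.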

If you have requests or feedback, you can submit a ticket to contact Alibaba Cloud.