A softlockup can occur when some earlier versions of the Linux kernel write back file caches. This topic describes the cause of the issue and how to resolve it.

Problem description

On a Linux Elastic Compute Service (ECS) instance with a kernel version earlier than 4.15, a softlockup occurs when the kernel writes back file caches.
Note You can run the uname -r command to check the Linux kernel version. For example, a command output of 3.10.0-514.26.2.el7.x86_64 indicates that the kernel version is 3.10.0.
The kernel log contains call stack information similar to the following:
[3507707.671883]  [<ffffffff8127cf7a>] redirty_tail+0x3a/0x40
[3507707.671884]  [<ffffffff81280ea4>] __writeback_inodes_wb+0x64/0xc0
[3507707.671885]  [<ffffffff81281238>] wb_writeback+0x268/0x300
[3507707.671887]  [<ffffffff812819f4>] wb_workfn+0xb4/0x380
[3507707.671889]  [<ffffffff810a5dc9>] process_one_work+0x189/0x420
[3507707.671890]  [<ffffffff810a625b>] worker_thread+0x1fb/0x4b0
[3507707.671891]  [<ffffffff810a6060>] ? process_one_work+0x420/0x420
[3507707.671893]  [<ffffffff810ac696>] kthread+0xe6/0x100
[3507707.671894]  [<ffffffff810ac5b0>] ? kthread_park+0x60/0x60
[3507707.671897]  [<ffffffff81741dd9>] ret_from_fork+0x39/0x50
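
To check whether an instance has logged this kind of event, you can search the kernel ring buffer. The following is a minimal sketch, not an official diagnostic: has_soft_lockup is a hypothetical helper name, and the search string assumes the usual "soft lockup" wording of the kernel message. Compare any call stack it prints with the one above (redirty_tail, wb_writeback, wb_workfn).

```shell
#!/bin/sh
# has_soft_lockup is a hypothetical helper: it reads kernel log text on stdin
# and prints each matching line with up to 20 lines of following context.
# It returns 0 if at least one match is found.
has_soft_lockup() {
    grep -i -A 20 "soft lockup"
}

# Reading dmesg may require root privileges, depending on the
# kernel.dmesg_restrict setting of the system.
if dmesg 2>/dev/null | has_soft_lockup; then
    echo "Soft lockup reports found."
else
    echo "No soft lockup reports found in the kernel ring buffer."
fi
```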

Cause

When the instance runs low on memory, the operating system kernel frequently calls the wakeup_flusher_threads function. Each call creates a large number of writeback tasks (wb_writeback_work). The writeback threads then spend all their time processing these tasks without yielding the CPU, and a softlockup occurs on the operating system.
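
One way to observe the conditions described above is to look at the free-memory and writeback counters in /proc/meminfo. This is only a sketch for observation: show_writeback_state is a hypothetical helper name, and low MemFree or high Dirty values on their own do not confirm that this issue is the cause.

```shell
#!/bin/sh
# show_writeback_state is a hypothetical helper. It prints the MemFree,
# Dirty, and Writeback counters from a meminfo-style file
# (default: /proc/meminfo).
show_writeback_state() {
    meminfo=${1:-/proc/meminfo}
    grep -E '^(MemFree|Dirty|Writeback):' "$meminfo"
}

# Only read the live counters if the file is available.
if [ -r /proc/meminfo ]; then
    show_writeback_state
fi
```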

Solution

To resolve the issue, update the kernel to a version later than 4.15. This issue does not occur on Alibaba Cloud Linux operating systems because their kernel version is 4.19. The following procedure applies to Linux distributions other than Alibaba Cloud Linux.
Notice Before you perform the operations, we recommend that you create snapshots of the ECS instance to back up its data. This prevents data loss caused by accidental operations. For more information about snapshots, see Snapshot overview.
  1. Connect to the instance.
    For more information about how to connect to an instance, see Connection methods.
  2. Run the following command to view the kernel version of the operating system:
    uname -r
  3. If the kernel version is 4.15 or earlier, run the following command to update the kernel:
    yum update kernel
    If the kernel version is later than 4.15, the softlockup does not occur on the operating system and you do not need to perform the subsequent operations.
  4. After the kernel is updated, run the following command to restart the instance:
    reboot
  5. After the instance is restarted, run the following command again to check whether the kernel version is later than 4.15:
    uname -r
    If the yum update kernel command cannot update the kernel to a version later than 4.15, download a kernel RPM package of a version later than 4.15 and install it manually.
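
The version check in steps 2 and 5 can be sketched as a small script. is_later_than_4_15 is a hypothetical helper name. The sketch compares only the major and minor numbers, so it assumes that every 4.15.x release counts as "4.15" and still needs the update.

```shell
#!/bin/sh
# is_later_than_4_15 is a hypothetical helper. It returns 0 if the kernel
# release string in $1 (as printed by uname -r) is later than 4.15.
# Assumption: only major.minor is compared, so 4.15.x is not "later".
is_later_than_4_15() {
    release=$1
    major=${release%%.*}        # text before the first dot, e.g. "3"
    rest=${release#*.}          # text after the first dot, e.g. "10.0-514..."
    minor=${rest%%[!0-9]*}      # leading digits of the rest, e.g. "10"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -gt 15 ]; }
}

current=$(uname -r)
if is_later_than_4_15 "$current"; then
    echo "Kernel $current is later than 4.15. No update is needed."
else
    echo "Kernel $current is 4.15 or earlier. Update the kernel."
fi
```

Run the script once before the update and once after the restart. The second run should report that no update is needed.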