When memory reclamation cannot resolve low memory on a Linux system, the kernel invokes the OOM Killer to forcibly terminate processes. This topic covers common causes of OOM Killer events on Alibaba Cloud Linux and how to resolve them.
Problem description
The following log shows the test process triggering the OOM Killer:
[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014
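To find out which processes triggered the OOM Killer on a running instance, the kernel log can be filtered for the "invoked oom-killer" line shown above. The following is a minimal sketch; the helper name oom_invokers is hypothetical, and it assumes the log lines carry a bracketed timestamp prefix as in the sample.

```shell
# Print the name of each process that triggered the OOM Killer.
# Reads kernel log lines from standard input; lines without the
# "invoked oom-killer" marker are ignored.
oom_invokers() {
    sed -n 's/.*] \(.*\) invoked oom-killer.*/\1/p'
}

# Example (reading the kernel log may require root privileges):
#   dmesg -T | oom_invokers | sort | uniq -c
```

Counting the names with sort and uniq -c shows whether one process triggers the events repeatedly.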
Potential causes
OOM Killer events result from either insufficient global memory or insufficient cgroup memory. The following table lists common scenarios:
| Cause | Scenario example |
| --- | --- |
| Insufficient cgroup memory | In this log, the OOM Killer fires for cgroup Cause: The |
| Insufficient parent cgroup memory | Here, the Cause: The parent cgroup |
| Insufficient global memory | In this log, Cause: Free memory dropped below the minimum threshold and memory reclamation could not free enough pages. |
| Insufficient memory on a memory node | Key indicators in this log: Cause: In a NUMA architecture, |
| Insufficient buddy system memory due to memory fragmentation | Key indicators in this log: Cause: Memory fragmentation prevents the buddy system from allocating a contiguous block of the required size, even though total free memory is sufficient. Note: The buddy system is a Linux kernel memory allocator that manages contiguous blocks of varying sizes to reduce fragmentation. |
Solutions
Resolve the issue based on its cause.
Insufficient memory in a cgroup or parent cgroup
Terminate unnecessary processes to free memory. If your workload requires more memory, upgrade the instance.
- Upgrade the instance.
- After upgrading, manually adjust the cgroup's memory limit.
  sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'
  Replace <value> with the new memory limit for the cgroup in bytes and <cgroup_name> with the name of your cgroup.
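Because the limit must be given in bytes, a small wrapper can convert a human-readable size and show the current values before writing the new one. This is only a sketch: the function names to_bytes and set_cgroup_limit and the CGROUP_ROOT variable are hypothetical, and it assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory as in the command above.

```shell
# Assumption: cgroup v1 memory controller mount point.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/memory}

# Convert a size such as 512M or 2G (binary units) to bytes.
# A plain number is treated as bytes already.
to_bytes() {
    tb_number=${1%[KkMmGg]}
    tb_unit=${1#"$tb_number"}
    case $tb_number in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    case $tb_unit in
        '')  echo "$tb_number" ;;
        K|k) echo $((tb_number * 1024)) ;;
        M|m) echo $((tb_number * 1024 * 1024)) ;;
        G|g) echo $((tb_number * 1024 * 1024 * 1024)) ;;
    esac
}

# Show the current usage and limit of a cgroup, then write the new limit.
# Usage: set_cgroup_limit <cgroup_name> <size>   (run as root)
set_cgroup_limit() {
    cg_dir=$CGROUP_ROOT/$1
    cg_limit=$(to_bytes "$2") || return 1
    echo "current usage: $(cat "$cg_dir/memory.usage_in_bytes") bytes"
    echo "current limit: $(cat "$cg_dir/memory.limit_in_bytes") bytes"
    echo "$cg_limit" > "$cg_dir/memory.limit_in_bytes"
}
```

For example, running set_cgroup_limit <cgroup_name> 2G as root writes 2147483648 to the cgroup's memory.limit_in_bytes file.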
Insufficient global memory
Investigate the following areas:
- Check the slab_unreclaimable memory usage.
  cat /proc/meminfo | grep "SUnreclaim"
  slab_unreclaimable is memory the system cannot reclaim. If it exceeds 10% of total memory, a slab memory leak is likely. To troubleshoot, see "What do I do if an instance has a high percentage of slab_unreclaimable memory?" If the issue persists, submit a ticket.
- Check the systemd memory usage.
  cat /proc/1/status | grep "RssAnon"
  The kernel skips process 1 (systemd) during OOM kills, so systemd memory usage should not exceed 200 MB. If usage is abnormally high, update systemd to a newer version.
- Review the performance of Transparent Huge Pages (THP).
  THP can cause memory bloating that leads to OOM events. To mitigate this, tune THP settings as described in "How do I use THP to tune performance in Alibaba Cloud Linux?"
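The first two checks compare a raw value against a threshold (10% of total memory, 200 MB), so it can help to print the values in those units directly. The sketch below does that with awk; the function names slab_unreclaim_pct and rss_anon_mb are hypothetical, and each accepts an optional file argument so that a saved copy of the file can be analyzed instead of the live system.

```shell
# Print SUnreclaim as a percentage of MemTotal.
# Optional argument: a meminfo-style file (default: /proc/meminfo).
slab_unreclaim_pct() {
    awk '/^MemTotal:/   { total = $2 }
         /^SUnreclaim:/ { slab = $2 }
         END { if (total > 0) printf "%.1f\n", slab * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Print the anonymous resident memory (RssAnon) of a process in MB.
# Optional argument: a status file (default: /proc/1/status, systemd).
rss_anon_mb() {
    awk '/^RssAnon:/ { printf "%.1f\n", $2 / 1024 }' "${1:-/proc/1/status}"
}

# Example:
#   slab_unreclaim_pct   # investigate a slab leak if this exceeds 10
#   rss_anon_mb          # investigate systemd if this exceeds 200
```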
Insufficient memory on a memory node
Reconfigure cpuset.mems to allow the cgroup to use memory from additional nodes.
- Identify the memory nodes in your system.
  cat /proc/buddyinfo
- Configure the cpuset.mems parameter.
  sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'
  Replace <value> with the corresponding memory node numbers and <cgroup_name> with the name of your cgroup.
  For example, if your system has three nodes (Node 0, Node 1, and Node 2) and you want the cgroup to use memory from Node 0 and Node 2, set <value> to 0,2.
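The output of /proc/buddyinfo repeats each node once per zone, so the node numbers have to be deduplicated before they are written to cpuset.mems. One possible way to do this is sketched below; the function name numa_nodes is hypothetical, and it assumes each line starts with "Node <number>," as the kernel prints it.

```shell
# Print the memory node numbers found in a buddyinfo-style file as a
# comma-separated list, in the format that cpuset.mems accepts.
# Optional argument: the file to read (default: /proc/buddyinfo).
numa_nodes() {
    awk '$1 == "Node" { sub(",", "", $2); print $2 }' "${1:-/proc/buddyinfo}" |
        sort -un |
        paste -sd, -
}

# Example:
#   numa_nodes    # prints all node numbers; pick the subset for the cgroup
```

The printed list contains every node on the system. Remove the nodes the cgroup should not use before writing the value.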
Insufficient buddy system memory due to memory fragmentation
Perform memory compaction during off-peak hours:
sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
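To judge whether compaction helped, the number of free blocks that are large enough for the failed allocation can be compared before and after. In /proc/buddyinfo, the columns after the zone name list the free block counts for order 0, order 1, and so on. The sketch below sums the counts at or above a given order; the function name free_blocks_at_least is hypothetical, and the order to use is the order= value from the OOM log.

```shell
# Print, per node and zone, how many free blocks of at least the given
# order exist. Field 5 of each buddyinfo line is order 0, field 6 is
# order 1, and so on.
# Usage: free_blocks_at_least <order> [file]   (default: /proc/buddyinfo)
free_blocks_at_least() {
    awk -v min="$1" '$1 == "Node" {
        n = 0
        for (i = 5 + min; i <= NF; i++) n += $i
        printf "node %s zone %s: %d\n", $2 + 0, $4, n
    }' "${2:-/proc/buddyinfo}"
}

# Example: compare the counts for order 3 before and after compaction.
#   free_blocks_at_least 3
#   sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
#   free_blocks_at_least 3
```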