When a Linux operating system runs low on memory, it first triggers memory reclamation to free memory for other processes. If memory reclamation cannot resolve the low-memory condition, the system invokes the Out of Memory Killer (OOM Killer) to forcibly terminate a process, which frees memory and relieves the memory pressure. This topic describes the causes of OOM Killer events on an Alibaba Cloud Linux operating system and their solutions.
Problem description
The following log snippet is an example of an OOM Killer event on an Alibaba Cloud Linux operating system. The test process triggered the OOM Killer.
[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014
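To check whether the OOM Killer was invoked on your instance, you can search the kernel log for similar entries. This is a general sketch, assuming the messages are still in the kernel ring buffer:
dmesg -T | grep -i "oom-killer"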
Potential causes
The OOM Killer is triggered by insufficient memory, which occurs for two main reasons: insufficient global memory on the instance or insufficient memory within a cgroup. The following examples describe common scenarios and causes of OOM Killer events.
Insufficient cgroup memory
Scenario example: The log shows that the OOM Killer is triggered for a process in a specific cgroup.
Cause: The memory usage of the cgroup reached the upper limit specified by its memory.limit_in_bytes interface, so the OOM Killer terminated a process within the cgroup.
Insufficient parent cgroup memory
Scenario example: The log shows that the process that triggered the OOM Killer belongs to a child cgroup, but the limit that was reached belongs to its parent cgroup.
Cause: The memory usage of the child cgroup is still below its own limit, but the total memory usage of the parent cgroup reached the parent's limit, which triggered the OOM Killer.
Insufficient global memory
Scenario example: The log shows that the OOM Killer is triggered at the instance level rather than within a cgroup.
Cause: The amount of free memory on the instance is smaller than the lower limit of free memory, and memory reclamation cannot resolve the memory shortage.
Insufficient memory on a memory node
Scenario example: Several key lines in the log indicate that a specific memory node is out of memory even though the instance still has free memory.
Cause: In a Non-Uniform Memory Access (NUMA) architecture, an operating system can have multiple memory nodes. You can run the cat /proc/buddyinfo command to view information about the nodes. If the cpuset.mems interface binds a cgroup to memory nodes that have run out of memory, the OOM Killer can be triggered even though other memory nodes still have free memory.
Insufficient buddy system memory due to memory fragmentation
Scenario example: The log shows that the OOM Killer is triggered while the instance still has free memory, but no contiguous memory block of the requested size is available.
Cause: The system triggers the OOM Killer when memory fragmentation prevents the buddy system from finding a contiguous memory block of the requested size, even if total free memory is sufficient.
Note: The buddy system in Linux is a kernel mechanism for memory management. It mitigates memory fragmentation and efficiently allocates and deallocates memory blocks of various sizes.
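To confirm a cgroup-related cause, you can compare a cgroup's current memory usage with its limit. The following is a minimal sketch, assuming the cgroup v1 memory hierarchy; replace <cgroup_name> with the name of your cgroup:
cat /sys/fs/cgroup/memory/<cgroup_name>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/<cgroup_name>/memory.failcnt
If memory.failcnt keeps increasing, the cgroup is repeatedly hitting its memory limit.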
Solutions
Follow these steps to resolve the issue based on its cause.
Insufficient memory in a cgroup or parent cgroup
Evaluate the processes currently consuming memory on your instance and terminate any unnecessary processes to free up memory. If your workload requires more memory, upgrade the instance to increase its memory capacity.
Upgrade the instance.
For more information, see Overview of instance configuration changes.
After upgrading, manually adjust the cgroup's memory limit.
sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'
Replace <value> with the new memory limit for the cgroup in bytes and <cgroup_name> with the name of your cgroup.
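For example, to set the limit to 2 GiB (2147483648 bytes) for a hypothetical cgroup named app:
sudo bash -c 'echo 2147483648 > /sys/fs/cgroup/memory/app/memory.limit_in_bytes'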
Insufficient global memory
If you encounter insufficient global memory, investigate the following areas:
Check the slab_unreclaimable memory usage.
cat /proc/meminfo | grep "SUnreclaim"
slab_unreclaimable is memory that the system cannot reclaim. If it accounts for more than 10% of the total memory, this may indicate a slab memory leak (a quick way to calculate this percentage is shown after this list). If you suspect a memory leak, troubleshoot it manually. For detailed instructions, see What do I do if an instance has a high percentage of slab_unreclaimable memory? If the issue persists, submit a ticket.
Check the systemd memory usage.
cat /proc/1/status | grep "RssAnon"
When the kernel triggers the OOM Killer, it skips process 1 (systemd). Therefore, systemd memory usage should typically not exceed 200 MB. If you observe abnormally high usage, try updating the systemd tools to a newer version.
Review the performance of Transparent Huge Pages (THP).
Enabling THP can cause memory bloat, which may lead to OOM Killer events. You can tune THP to mitigate this issue. For more information, see How do I use THP to tune performance in Alibaba Cloud Linux?
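As a quick check against the 10% threshold mentioned above, the following sketch prints SUnreclaim as a percentage of MemTotal. It assumes the standard /proc/meminfo format, in which both values are reported in kB:
awk '/^MemTotal/{t=$2} /^SUnreclaim/{s=$2} END{printf "SUnreclaim: %d kB (%.1f%% of MemTotal)\n", s, s*100/t}' /proc/meminfo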
Insufficient memory on a memory node
To resolve an OOM Killer event caused by insufficient memory on a specific memory node, reconfigure the cpuset.mems interface to allow the cgroup to use memory from other available nodes.
Identify the memory nodes in your system.
cat /proc/buddyinfo
Configure the cpuset.mems parameter.
sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'
Replace <value> with the corresponding memory node numbers and <cgroup_name> with the name of your cgroup.
For example, if your system has three nodes (Node 0, Node 1, and Node 2) and you want the cgroup to use memory from Node 0 and Node 2, set <value> to 0,2.
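To illustrate with the three-node example above and a hypothetical cpuset cgroup named app, you can view the current node binding before changing it:
cat /sys/fs/cgroup/cpuset/app/cpuset.mems
sudo bash -c 'echo 0,2 > /sys/fs/cgroup/cpuset/app/cpuset.mems'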
Insufficient buddy system memory due to memory fragmentation
To resolve OOM Killer events caused by memory fragmentation, perform memory compaction during off-peak hours. To initiate memory compaction, run the following command:
sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
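If fragmentation recurs, you can schedule compaction to run automatically during off-peak hours. The following is a minimal sketch that uses the root user's crontab (sudo crontab -e); the 03:00 schedule is an assumption, so adjust it to your own off-peak window:
# Trigger memory compaction daily at 03:00 (example time).
0 3 * * * echo 1 > /proc/sys/vm/compact_memory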