When a Linux operating system runs low on memory, it first reclaims memory and allocates the reclaimed memory to other processes. If memory reclamation does not resolve the shortage, the system triggers the Out of Memory Killer (OOM Killer) to forcibly free the memory that is occupied by processes, which alleviates memory pressure. This topic describes the possible causes of the OOM Killer being triggered in Alibaba Cloud Linux and how to resolve the issue.
Problem description
The following sample log indicates that the test process triggered the OOM Killer in Alibaba Cloud Linux:
[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014
Possible causes
The OOM Killer is triggered when the system runs out of memory. This memory shortage can be global, affecting the entire instance, or specific to a cgroup. Common scenarios and causes are as follows:
Cause | Example scenario |
A cgroup does not have sufficient memory. | In the following log example, the OOM Killer is triggered in the Cause: The memory usage of the |
A parent cgroup does not have sufficient memory. | In the following log example, the Cause: The memory usage of the |
System-wide out of memory | In the following log example, Cause: The amount of free memory on the instance is smaller than the lower limit of free memory, and memory reclamation cannot resolve the issue of insufficient memory. |
A memory node does not have sufficient memory. | In a scenario in which OOM Killer is triggered as recorded in the following log, the log data provides the following information:
Cause: In a Non-Uniform Memory Access (NUMA) architecture, an operating system can have multiple memory nodes. You can run the cat /proc/buddyinfo command to view information about the nodes. If the |
A buddy system does not have sufficient memory in the event of memory fragmentation. | In a scenario in which OOM Killer is triggered as recorded in the following log, the log data provides the following information:
Cause: If the buddy system does not have sufficient memory when the operating system allocates memory, the system triggers OOM Killer to free up memory and allocates the freed memory to the buddy system. Note The buddy system is a kernel memory management mechanism in Linux that mitigates memory fragmentation and efficiently allocates and frees up memory blocks of different sizes. |
Solutions
Perform the following steps based on the scenario to troubleshoot the issue.
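Before you pick a scenario, it can help to list which processes invoked the OOM Killer and how often. The following is a minimal sketch that assumes the kernel log lines look like the sample in the problem description, with the process name placed directly before "invoked oom-killer". The function name oom_invokers is hypothetical, and process names that contain spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count OOM Killer invocations per process name.
# Reads kernel log lines from standard input and prints "<name>: <count>".
# Assumes the process name directly precedes "invoked oom-killer",
# as in the sample log of this topic.
oom_invokers() {
    sed -nE 's/^.*\] *([^ ]+) invoked oom-killer.*$/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}
```

One way to use it is to pipe the kernel log into the function, for example dmesg -T | oom_invokers. Reading the kernel log may require root permissions on some systems.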
A cgroup or parent cgroup does not have sufficient memory
We recommend that you assess the processes that are occupying memory and terminate unnecessary processes to free up memory. If your business requires a large amount of memory and the instance type of your instance does not meet this requirement, you can upgrade to an instance type that has a larger memory size.
Upgrade the instance type of your instance.
For more information, see Overview of instance configuration changes.
After the upgrade, manually adjust the cgroup memory limit based on the increased memory size.
sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'
In this command, replace <value> with the new memory limit for the cgroup and <cgroup_name> with the actual cgroup name.
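Because memory.limit_in_bytes is expressed in bytes, it can be convenient to convert a size such as 2G before you write it, and to compare the current usage with the current limit. The sketch below assumes a cgroup v1 memory hierarchy like the path in the preceding command. The function names size_to_bytes and cgroup_memory_report are hypothetical, and the units are binary (1K = 1024 bytes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size such as 512M or 2G to bytes.
# Accepts a plain integer or an integer followed by K, M, or G.
size_to_bytes() {
    local size="$1" number unit
    number="${size%[KkMmGg]}"      # strip one trailing unit letter, if any
    unit="${size#"$number"}"       # whatever was stripped
    case "$number" in
        ''|*[!0-9]*) echo "invalid size: $size" >&2; return 1 ;;
    esac
    case "$unit" in
        '')  echo "$number" ;;
        K|k) echo $(( number * 1024 )) ;;
        M|m) echo $(( number * 1024 * 1024 )) ;;
        G|g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    esac
}

# Hypothetical helper: print usage, limit, and usage percentage for a
# cgroup v1 memory directory, for example /sys/fs/cgroup/memory/<cgroup_name>.
cgroup_memory_report() {
    local dir="$1" usage limit
    usage="$(cat "$dir/memory.usage_in_bytes")" || return 1
    limit="$(cat "$dir/memory.limit_in_bytes")" || return 1
    echo "usage=$usage limit=$limit percent=$(( usage * 100 / limit ))"
}
```

For example, size_to_bytes 2G prints 2147483648, which can be used as <value> in the preceding command.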
System-wide out of memory
If an instance does not have sufficient memory, check the following items:
Usage of the slab_unreclaimable memory
cat /proc/meminfo | grep "SUnreclaim"
The slab_unreclaimable memory is the memory that cannot be reclaimed by the system. When the slab_unreclaimable memory takes up more than 10% of the total memory, the system may have slab memory leaks. For information about how to troubleshoot memory leaks, see What do I do if an instance has a high percentage of slab_unreclaimable memory? If the issue persists, submit a ticket.
Usage of the systemd memory
cat /proc/1/status | grep "RssAnon"
When the OOM Killer is triggered, the kernel skips the first process of the system (PID 1), so the memory that systemd occupies is not freed. Under normal conditions, the memory usage of systemd does not exceed 200 MB. If the usage is abnormal, you can update the systemd version.
Usage of the Transparent Huge Pages (THP) feature
If the THP feature is enabled, memory bloat may occur and trigger the OOM Killer. You can optimize THP performance. For more information, see How do I use THP to tune performance in Alibaba Cloud Linux?
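The three checks above can be sketched as small shell functions. This is an illustrative sketch, not an official tool: the function names are hypothetical, the optional file argument exists only so that the functions can be tried against a saved copy of the file, and /sys/kernel/mm/transparent_hugepage/enabled is the standard kernel path for the THP mode rather than something stated in this topic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: SUnreclaim as a percentage of MemTotal.
# Compare the result with the 10% threshold described above.
sunreclaim_percent() {
    awk '/^MemTotal:/   { total = $2 }
         /^SUnreclaim:/ { slab = $2 }
         END {
             if (total > 0) printf "%.1f\n", slab * 100 / total
             else exit 1
         }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: anonymous resident memory of a process in MB
# (rounded down). Defaults to PID 1; compare the result with 200 MB.
rss_anon_mb() {
    awk '/^RssAnon:/ { print int($2 / 1024) }' "${1:-/proc/1/status}"
}

# Hypothetical helper: print the active THP mode, which the kernel
# marks with square brackets, for example "always [madvise] never".
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

When called without arguments, each function reads the live file on the instance. A sunreclaim_percent value above 10 or an rss_anon_mb value above 200 suggests that the corresponding item needs a closer look.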
A memory node does not have sufficient memory
If the OOM Killer is triggered because a memory node has insufficient memory, reconfigure the cpuset.mems parameter. This ensures that the cgroup can use memory from the correct nodes.
Run the following command to query the number of memory nodes in the system:
cat /proc/buddyinfo
Configure the cpuset.mems parameter.
sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'
In this command, replace <value> with the number of the corresponding memory node and <cgroup_name> with the actual cgroup name.
For example, if the system has three nodes (Node 0, Node 1, and Node 2) and you want to allow the cgroup to use memory from Node 0 and Node 2, set <value> to 0,2.
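The output of /proc/buddyinfo contains one line per node and zone, so the same node number appears several times. The following sketch reduces that output to a comma-separated list of node numbers in the format that cpuset.mems accepts. The function name list_memory_nodes is hypothetical, and the optional file argument is only for trying the function on a saved copy of the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the memory node numbers found in
# /proc/buddyinfo as a comma-separated list, for example "0,1".
# Each input line starts with "Node <n>, zone <name> ...".
list_memory_nodes() {
    awk '{ sub(/,/, "", $2); print $2 }' "${1:-/proc/buddyinfo}" \
        | sort -nu \
        | paste -sd, -
}
```

The result lists every node that exists on the system. Remove the nodes that the cgroup should not use before you write the value to cpuset.mems.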
A buddy system does not have sufficient memory in the event of memory fragmentation
If OOM Killer is triggered due to memory fragmentation, defragment the memory on a regular basis during off-peak hours. You can run the following command to defragment the memory:
sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
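To judge whether compaction helped, you can compare the number of large free blocks before and after you run the command. The sketch below assumes the usual /proc/buddyinfo layout, in which the columns after the zone name are the counts of free blocks of order 0, 1, 2, and so on. The function name high_order_free is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each node and zone, count the free blocks
# whose order is at least min_order.
# Usage: high_order_free <min_order> [buddyinfo_file]
high_order_free() {
    local min_order="$1" file="${2:-/proc/buddyinfo}"
    awk -v min="$min_order" '{
        sub(/,/, "", $2)                 # "0," -> "0"
        total = 0
        # Field 5 holds the order-0 count, so order N is field 5 + N.
        for (i = 5 + min; i <= NF; i++) total += $i
        printf "node %s zone %s: %d\n", $2, $4, total
    }' "$file"
}
```

For example, run high_order_free 4 before and after compaction. A count that stays at or near zero for the zone that serves the failed allocation suggests that the memory is still fragmented.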