Alibaba Cloud Linux:Causes of OOM Killer and solutions

Last Updated: Dec 26, 2025

When a Linux operating system does not have sufficient memory, the system reclaims memory and allocates the reclaimed memory to other processes. If memory reclamation does not resolve the memory insufficiency issue, the system triggers Out of Memory Killer (OOM Killer) to forcefully free up the memory that is occupied by processes. This alleviates memory pressure. This topic describes the possible causes of the issue that OOM Killer is triggered in Alibaba Cloud Linux and how to resolve the issue.

Problem description

The following sample log indicates that the test process triggered the OOM Killer in Alibaba Cloud Linux:

[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014
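To confirm that the OOM Killer was triggered and to view the related messages, you can search the kernel log. The following commands are a basic example; the log file path may vary with your logging configuration.

# Search the kernel ring buffer for OOM Killer events (with human-readable timestamps).
sudo dmesg -T | grep -i "oom-killer"
# Alternatively, search the persistent system log if kernel messages are written to /var/log/messages.
sudo grep -i "oom-killer" /var/log/messages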

Possible causes

The OOM Killer is triggered when the system runs out of memory. This memory shortage can be global, affecting the entire instance, or specific to a cgroup. Common scenarios and causes are as follows:

A cgroup does not have sufficient memory.

In the following log example, the OOM Killer is triggered in the /mm_test cgroup, which contains the test process.

[Wed Sep  8 18:01:32 2021] test invoked oom-killer: gfp_mask=0x240****(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=0
[Wed Sep  8 18:01:32 2021] Task in /mm_test killed as a result of limit of /mm_test
[Wed Sep  8 18:01:32 2021] memory: usage 204800kB, limit 204800kB, failcnt 26

Cause: The memory usage of the /mm_test cgroup reached its limit (200 MB), which triggered the OOM Killer.
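To verify this scenario, you can inspect the memory statistics of the cgroup. The following example assumes that the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory and that the cgroup is named mm_test, as in the preceding log.

# Current memory usage of the cgroup, in bytes.
cat /sys/fs/cgroup/memory/mm_test/memory.usage_in_bytes
# Memory limit of the cgroup, in bytes.
cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes
# Number of times the memory usage hit the limit.
cat /sys/fs/cgroup/memory/mm_test/memory.failcnt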

A parent cgroup does not have sufficient memory.

In the following log example, the test process belongs to the /mm_test/2 cgroup, but the OOM Killer is triggered in the parent cgroup /mm_test.

[Fri Sep 10 16:15:14 2021] test invoked oom-killer: gfp_mask=0x240****(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=0
[Fri Sep 10 16:15:14 2021] Task in /mm_test/2 killed as a result of limit of /mm_test
[Fri Sep 10 16:15:14 2021] memory: usage 204800kB, limit 204800kB, failcnt 1607

Cause: The memory usage of the /mm_test/2 cgroup did not reach its limit, but the memory usage of the parent cgroup /mm_test reached its limit (200 MB), which triggered the OOM Killer.
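To verify this scenario, compare the usage and limit of the child cgroup with those of its parent. The following example assumes the cgroup v1 hierarchy from the preceding log (/mm_test/2 under /mm_test).

# Usage and limit of the child cgroup /mm_test/2.
cat /sys/fs/cgroup/memory/mm_test/2/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/mm_test/2/memory.limit_in_bytes
# Usage and limit of the parent cgroup /mm_test. With hierarchical accounting enabled,
# the parent's usage includes the memory that its child cgroups use.
cat /sys/fs/cgroup/memory/mm_test/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes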

System-wide out of memory

In the following log example, limit of host indicates a global memory shortage on the instance. The log data shows that the free memory (free) on Node 0 has dropped below the low watermark (low).

[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0,
[Sat Sep 11 12:24:42 2021] Task in /user.slice killed as a result of limit of host
[Sat Sep 11 12:24:42 2021] Node 0 DMA32 free:155160kB min:152412kB low:190512kB high:228612kB
[Sat Sep 11 12:24:42 2021] Node 0 Normal free:46592kB min:46712kB low:58388kB high:70064kB

Cause: The free memory on the instance fell below the memory watermarks, and memory reclamation could not free enough memory to resolve the shortage.
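To check whether the instance is globally short of memory, you can compare the free memory with the system watermarks, for example with the following commands. The per-zone watermark values in /proc/zoneinfo are reported in pages.

# Overall memory usage of the instance.
free -m
# Minimum amount of memory that the kernel tries to keep free; the per-zone min/low/high watermarks are derived from it.
cat /proc/sys/vm/min_free_kbytes
# Per-zone watermarks (values are in pages, typically 4 KB each).
grep -E "Node|min |low |high " /proc/zoneinfo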

A memory node does not have sufficient memory.

In the following log example, the OOM Killer is triggered and the log data shows the following information:

  • limit of host indicates a memory shortage on a memory node.

  • The instance has two memory nodes: Node 0 and Node 1.

  • The free memory (free) on Node 1 is below the low watermark (low).

  • The instance still has a large amount of free memory (free:4111496).

[Sat Sep 11 09:46:24 2021] main invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 09:46:24 2021] main cpuset=mm_cpuset mems_allowed=1
[Sat Sep 11 09:46:24 2021] Task in / killed as a result of limit of host
[Sat Sep 11 09:46:24 2021] Mem-Info:
[Sat Sep 11 09:46:24 2021] active_anon:172 inactive_anon:4518735 isolated_anon:
    free:4111496 free_pcp:1 free_cma:0
[Sat Sep 11 09:46:24 2021] Node 1 Normal free:43636kB min:45148kB low:441424kB high:837700kB
[Sat Sep 11 09:46:24 2021] Node 1 Normal: 856*4kB (UME) 375*8kB (UME) 183*16kB (UME) 184*32kB (UME) 87*64kB (ME) 45*128kB (UME) 16*256kB (UME) 5*512kB (UE) 14*1024kB (UME) 0*2048kB 0*4096kB = 47560kB
[Sat Sep 11 09:46:24 2021] Node 0 hugepages_total=360 hugepages_free=360 hugepages_surp=0 hugepages_size=1048576kB
[Sat Sep 11 09:46:24 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sat Sep 11 09:46:24 2021] Node 1 hugepages_total=360 hugepages_free=360 hugepages_surp=0 hugepages_size=1048576kB
[Sat Sep 11 09:46:25 2021] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Cause: In a Non-Uniform Memory Access (NUMA) architecture, an operating system can have multiple memory nodes. You can run the cat /proc/buddyinfo command to view information about the nodes. If the cpuset.mems parameter restricts a cgroup to use memory from only specific nodes, the OOM Killer can be triggered even if the instance has sufficient free memory overall.
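To verify this scenario, check which memory nodes the cgroup is allowed to use and how much free memory each node has. The following example assumes that the cgroup v1 cpuset controller is mounted at /sys/fs/cgroup/cpuset; replace <cgroup_name> with the actual cgroup name.

# Memory nodes that the cgroup is allowed to use.
cat /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems
# Free memory on each memory node.
grep MemFree /sys/devices/system/node/node*/meminfo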

A buddy system does not have sufficient memory in the event of memory fragmentation.

In the following log example, the OOM Killer is triggered and the log data shows the following information:

  • The OOM Killer is triggered by an order=3 memory allocation, which requires a contiguous 32 KB block (eight contiguous 4 KB pages).

  • The free memory (free) on Node 0 is still above the low watermark (low).

  • The buddy system on Node 0 has no free blocks of 32 KB or larger (0*32kB (M), and all higher orders are also 0).

[Sat Sep 11 15:22:46 2021] insmod invoked oom-killer: gfp_mask=0x60****(GFP_KERNEL), nodemask=(null), order=3, oom_score_adj=0
[Sat Sep 11 15:22:46 2021] insmod cpuset=/ mems_allowed=0
[Sat Sep 11 15:22:46 2021] Task in /user.slice killed as a result of limit of host
[Sat Sep 11 15:22:46 2021] Node 0 Normal free:23500kB min:15892kB low:19864kB high:23836kB active_anon:308kB inactive_anon:194492kB active_file:384kB inactive_file:420kB unevictable:0kB writepending:464kB present:917504kB managed:852784kB mlocked:0kB kernel_stack:2928kB pagetables:9188kB bounce:0kB
[Sat Sep 11 15:22:46 2021] Node 0 Normal: 1325*4kB (UME) 966*8kB (UME) 675*16kB (UME) 0*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =

Cause: Although the total free memory is above the low watermark, memory fragmentation leaves the buddy system without a contiguous block that is large enough for the requested order. The system triggers OOM Killer to free up memory and replenish the buddy system.

Note

The buddy system is a kernel memory management mechanism in Linux that mitigates memory fragmentation and efficiently allocates and frees up memory blocks of different sizes.
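To check the degree of memory fragmentation, you can inspect the buddy system directly. Each column in /proc/buddyinfo is the number of free blocks of a given order; with 4 KB pages, the columns correspond to 4 KB, 8 KB, 16 KB, and so on up to 4 MB.

# Show the number of free blocks per order for each zone.
# Many zero values in the higher-order columns indicate fragmentation.
cat /proc/buddyinfo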

Solutions

Perform the following steps based on the scenario to troubleshoot the issue.

A cgroup or parent cgroup does not have sufficient memory

We recommend that you assess the processes that are occupying memory and terminate unnecessary processes to free up memory. If your business requires a large amount of memory and your current instance type does not meet this requirement, you can upgrade to an instance type that has a larger memory size.

  1. Upgrade your instance type.

    For more information, see Overview of instance configuration changes.

  2. Adjust the memory limit of the cgroup to match the increased memory size.

    sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'

    In this command, replace <value> with the new memory limit for the cgroup and <cgroup_name> with the actual cgroup name, as shown in the following example.
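For example, the following commands set the limit of a cgroup named mm_test to 512 MB and then verify the new value. The cgroup name and value are placeholders for illustration; memory.limit_in_bytes accepts a plain byte count or a value with a K, M, or G suffix.

# Set the memory limit of the mm_test cgroup to 512 MB.
sudo bash -c 'echo 512M > /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes'
# Verify the new limit. The value is reported in bytes (536870912 for 512 MB).
cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes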

System-wide out of memory

If an instance does not have sufficient memory, check the following items:

  • Usage of the slab_unreclaimable memory

    cat /proc/meminfo | grep "SUnreclaim"

    The slab_unreclaimable memory is memory that the system cannot reclaim. If it accounts for more than 10% of the total memory, the system may have a slab memory leak. You can inspect the largest slab caches as shown in the example after this list. For information about how to troubleshoot memory leaks, see What do I do if an instance has a high percentage of slab_unreclaimable memory? If the issue persists, submit a ticket.

  • Usage of the systemd memory

    cat /proc/1/status | grep "RssAnon"

    When the OOM Killer is triggered, the kernel skips the first process of the system (PID 1, which is systemd), so memory held by systemd cannot be reclaimed. Under normal conditions, the systemd memory usage (RssAnon) does not exceed 200 MB. If it does, update systemd to a later version.

  • Usage of the Transparent Huge Pages (THP) feature

    If the THP feature is enabled, memory bloat may occur and trigger OOM Killer. You can check the current THP mode as shown in the example after this list and optimize THP performance. For more information, see How do I use THP to tune performance in Alibaba Cloud Linux?.
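The following commands are a basic example of the checks mentioned in this list. The slabtop utility is provided by the procps-ng package on most distributions and may need to be installed.

# Show the largest slab caches sorted by cache size (slab leak candidates appear at the top).
sudo slabtop -o -s c | head -n 15
# Check the anonymous memory usage of systemd (PID 1).
grep RssAnon /proc/1/status
# Check whether THP is enabled; the value in brackets is the current mode.
cat /sys/kernel/mm/transparent_hugepage/enabled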

A memory node does not have sufficient memory

If the OOM Killer is triggered because a memory node has insufficient memory, reconfigure the cpuset.mems parameter. This ensures that the cgroup can use memory from the correct nodes.

  1. Run the following command to query the number of memory nodes in the system:

    cat /proc/buddyinfo
  2. Configure the cpuset.mems parameter.

    sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'

    In this command, replace <value> with the IDs of the memory nodes that the cgroup is allowed to use and <cgroup_name> with the actual cgroup name.

    For example, if the system has three nodes (Node 0, Node 1, and Node 2) and you want to allow the cgroup to use memory from Node 0 and Node 2, set <value> to 0,2, as shown in the following example.
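The following commands allow a cgroup named mm_test to allocate memory from Node 0 and Node 2 and then verify the setting. The cgroup name is a placeholder for illustration.

# Allow the cgroup to use memory from Node 0 and Node 2.
sudo bash -c 'echo 0,2 > /sys/fs/cgroup/cpuset/mm_test/cpuset.mems'
# Verify the setting.
cat /sys/fs/cgroup/cpuset/mm_test/cpuset.mems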

A buddy system does not have sufficient memory in the event of memory fragmentation

If OOM Killer is triggered due to memory fragmentation, defragment the memory on a regular basis during off-peak hours. You can run the following command to defragment the memory:

sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
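To run the compaction on a regular basis, you can schedule it with cron. The following entry is only an example; the file name and the schedule (03:00 every day) are placeholders that you should adapt to your own off-peak hours. You can compare the output of cat /proc/buddyinfo before and after compaction to confirm the effect.

# Example entry in /etc/cron.d/compact-memory: trigger memory compaction every day at 03:00.
0 3 * * * root /bin/bash -c 'echo 1 > /proc/sys/vm/compact_memory'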