
Alibaba Cloud Linux: What do I do if OOM Killer is triggered?

Last Updated: Mar 18, 2024

When a Linux operating system does not have sufficient memory, the system reclaims memory and allocates the reclaimed memory to other processes. If memory reclamation cannot resolve the memory shortage, the system triggers Out of Memory Killer (OOM Killer) to forcefully free up memory that is occupied by processes and alleviate memory pressure. This topic describes the possible causes of OOM Killer being triggered in Alibaba Cloud Linux and how to resolve the issue.

Problem description

The following sample log indicates that the test process triggered OOM Killer in Alibaba Cloud Linux:

[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 12:24:42 2021] test cpuset=/ mems_allowed=0
[Sat Sep 11 12:24:42 2021] CPU: 1 PID: 29748 Comm: test Kdump: loaded Not tainted 4.19.91-24.1.al7.x86_64 #1
[Sat Sep 11 12:24:42 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e62**** 04/01/2014

Possible causes

OOM Killer is triggered when an instance or a cgroup in the instance does not have sufficient memory. The following possible causes, each illustrated by an example scenario and a log excerpt, can trigger OOM Killer in Alibaba Cloud Linux.

A cgroup does not have sufficient memory.

In a scenario in which OOM Killer is triggered as recorded in the following log, OOM Killer is triggered in the /mm_test cgroup to which the test process belongs:

[Wed Sep  8 18:01:32 2021] test invoked oom-killer: gfp_mask=0x240****(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=0
[Wed Sep  8 18:01:32 2021] Task in /mm_test killed as a result of limit of /mm_test
[Wed Sep  8 18:01:32 2021] memory: usage 204800kB, limit 204800kB, failcnt 26

Cause: The memory usage of the /mm_test cgroup has reached the upper limit of 200 MB.
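To confirm this cause, you can compare the current memory usage of the cgroup with its limit. The following commands are a sketch that assumes the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and the /mm_test cgroup from the log above:

    # Current memory usage and limit of the cgroup, in bytes
    cat /sys/fs/cgroup/memory/mm_test/memory.usage_in_bytes
    cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes
    # Number of times the limit was hit (corresponds to the failcnt value in the log)
    cat /sys/fs/cgroup/memory/mm_test/memory.failcnt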

A parent cgroup does not have sufficient memory.

In a scenario in which OOM Killer is triggered as recorded in the following log, the test process belongs to the /mm_test/2 cgroup but OOM Killer is triggered in the /mm_test cgroup:

[Fri Sep 10 16:15:14 2021] test invoked oom-killer: gfp_mask=0x240****(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=0
[Fri Sep 10 16:15:14 2021] Task in /mm_test/2 killed as a result of limit of /mm_test
[Fri Sep 10 16:15:14 2021] memory: usage 204800kB, limit 204800kB, failcnt 1607

Cause: The memory usage of the /mm_test/2 cgroup has not reached the upper limit, but the memory usage of the /mm_test parent cgroup has reached the upper limit of 200 MB.
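To find the level of the hierarchy that imposes the effective limit, you can read the hierarchical limit that applies to the child cgroup and compare the parent's usage with its limit. This sketch assumes the cgroup v1 memory controller and the paths from the log above:

    # Effective (hierarchical) limit that applies to /mm_test/2, inherited from /mm_test
    grep hierarchical_memory_limit /sys/fs/cgroup/memory/mm_test/2/memory.stat
    # Usage and limit of the parent cgroup, in bytes
    cat /sys/fs/cgroup/memory/mm_test/memory.usage_in_bytes
    cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes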

An instance does not have sufficient memory.

In a scenario in which OOM Killer is triggered as recorded in the following log, limit of host indicates that the instance does not have sufficient memory. In the log data, the amount of free memory (the value of the free parameter) of Node 0 is smaller than the lower limit of free memory (the value of the low parameter).

[Sat Sep 11 12:24:42 2021] test invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0,
[Sat Sep 11 12:24:42 2021] Task in /user.slice killed as a result of limit of host
[Sat Sep 11 12:24:42 2021] Node 0 DMA32 free:155160kB min:152412kB low:190512kB high:228612kB
[Sat Sep 11 12:24:42 2021] Node 0 Normal free:46592kB min:46712kB low:58388kB high:70064kB

Cause: The amount of free memory on the instance is smaller than the lower limit of free memory, and memory reclamation cannot resolve the issue of insufficient memory.
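The free memory and the min, low, and high watermarks of each memory zone can be read from /proc/zoneinfo, which reports the same per-zone values that appear in the log:

    # Per-zone free page count and min/low/high watermarks (values are in pages)
    cat /proc/zoneinfo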

A memory node does not have sufficient memory.

In a scenario in which OOM Killer is triggered as recorded in the following log, the log data provides the following information:

  • limit of host indicates that a memory node does not have sufficient memory.

  • The instance has two memory nodes: Node 0 and Node 1.

  • The amount of free memory (the value of the free parameter) of Node 1 is smaller than the lower limit of free memory (the value of the low parameter).

  • The instance as a whole has a large amount of free memory (free:4111496, a count in pages).

[Sat Sep 11 09:46:24 2021] main invoked oom-killer: gfp_mask=0x62****(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Sat Sep 11 09:46:24 2021] main cpuset=mm_cpuset mems_allowed=1
[Sat Sep 11 09:46:24 2021] Task in / killed as a result of limit of host
[Sat Sep 11 09:46:24 2021] Mem-Info:
[Sat Sep 11 09:46:24 2021] active_anon:172 inactive_anon:4518735 isolated_anon:
    free:4111496 free_pcp:1 free_cma:0
[Sat Sep 11 09:46:24 2021] Node 1 Normal free:43636kB min:45148kB low:441424kB high:837700kB
[Sat Sep 11 09:46:24 2021] Node 1 Normal: 856*4kB (UME) 375*8kB (UME) 183*16kB (UME) 184*32kB (UME) 87*64kB (ME) 45*128kB (UME) 16*256kB (UME) 5*512kB (UE) 14*1024kB (UME) 0*2048kB 0*4096kB = 47560kB
[Sat Sep 11 09:46:24 2021] Node 0 hugepages_total=360 hugepages_free=360 hugepages_surp=0 hugepages_size=1048576kB
[Sat Sep 11 09:46:24 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sat Sep 11 09:46:24 2021] Node 1 hugepages_total=360 hugepages_free=360 hugepages_surp=0 hugepages_size=1048576kB
[Sat Sep 11 09:46:25 2021] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Cause: In Non-Uniform Memory Access (NUMA) mode, an operating system may have multiple memory nodes. You can run the cat /proc/buddyinfo command to query information about these nodes. If you use the cpuset.mems interface to limit a specific cgroup to the memory of specific memory nodes, OOM Killer may be triggered even when the instance as a whole has sufficient free memory.
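To check which memory nodes a running process is allowed to use, you can read its status file and the per-node buddy lists. In this sketch, <pid> is a placeholder for the ID of the affected process:

    # Memory nodes that the process may allocate from (corresponds to mems_allowed in the log)
    grep Mems_allowed_list /proc/<pid>/status
    # Free memory of each memory node, split by allocation order
    cat /proc/buddyinfo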

A buddy system does not have sufficient memory in the event of memory fragmentation.

In a scenario in which OOM Killer is triggered as recorded in the following log, the log data provides the following information:

  • OOM Killer is triggered when the operating system attempts an order-3 allocation, that is, a request for a contiguous 32 KB block of memory.

  • The amount of free memory (the value of the free parameter) of Node 0 is larger than the lower limit of free memory (the value of the low parameter).

  • The buddy system of Node 0 has no free memory blocks of 32 KB or larger (0*32kB (M)), so the requested contiguous block cannot be allocated.

[Sat Sep 11 15:22:46 2021] insmod invoked oom-killer: gfp_mask=0x60****(GFP_KERNEL), nodemask=(null), order=3, oom_score_adj=0
[Sat Sep 11 15:22:46 2021] insmod cpuset=/ mems_allowed=0
[Sat Sep 11 15:22:46 2021] Task in /user.slice killed as a result of limit of host
[Sat Sep 11 15:22:46 2021] Node 0 Normal free:23500kB min:15892kB low:19864kB high:23836kB active_anon:308kB inactive_anon:194492kB active_file:384kB inactive_file:420kB unevictable:0kB writepending:464kB present:917504kB managed:852784kB mlocked:0kB kernel_stack:2928kB pagetables:9188kB bounce:0kB
[Sat Sep 11 15:22:46 2021] Node 0 Normal: 1325*4kB (UME) 966*8kB (UME) 675*16kB (UME) 0*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =

Cause: If the buddy system does not have sufficient memory when the operating system allocates memory, the system triggers OOM Killer to free up memory and allocates the freed memory to the buddy system.

Note

The buddy system is a kernel memory management mechanism in Linux that mitigates memory fragmentation and efficiently allocates and frees up memory blocks of different sizes.
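As a quick check for fragmentation, you can inspect the per-order free block counts of the buddy system. Columns that drop to zero at higher orders indicate that large contiguous blocks are exhausted even if the total free memory looks sufficient:

    # Each column shows the number of free blocks of one order (4 KB, 8 KB, 16 KB, and so on)
    cat /proc/buddyinfo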

Solutions

Perform the following steps based on the scenario to troubleshoot the issue.

A cgroup or parent cgroup does not have sufficient memory

We recommend that you assess the processes that are occupying memory and terminate unnecessary processes to free up memory. If your business requires a large amount of memory and the current instance type cannot provide it, you can upgrade to an instance type that has a larger memory size and then raise the memory limit of the cgroup:

  1. Upgrade the instance type of your instance.

    For more information, see Overview of instance configuration changes.

  2. Run the following command to adjust the upper limit of memory for the specified cgroup:

    sudo bash -c 'echo <value> > /sys/fs/cgroup/memory/<cgroup_name>/memory.limit_in_bytes'

    Replace <value> with a new upper limit of memory and <cgroup_name> with the actual cgroup name.
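    For example, the following sketch raises the limit of the /mm_test cgroup from the scenarios above to 1 GB (1073741824 bytes). The cgroup name and the value are placeholders that you must adapt to your environment:

    sudo bash -c 'echo 1073741824 > /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes'
    # Verify the new limit
    cat /sys/fs/cgroup/memory/mm_test/memory.limit_in_bytes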

An instance does not have sufficient memory

If an instance does not have sufficient memory, check the following items:

  • Usage of the slab_unreclaimable memory

    cat /proc/meminfo | grep "SUnreclaim"

    The slab_unreclaimable memory is memory that cannot be reclaimed by the system. If the slab_unreclaimable memory takes up more than 10% of the total memory (see the sketch after this list for a quick way to compute the percentage), the system may have slab memory leaks. For information about how to troubleshoot memory leaks, see What do I do if an instance has a high percentage of slab_unreclaimable memory? If the issue persists, submit a ticket.

  • Usage of the systemd memory

    cat /proc/1/status | grep "RssAnon"

    When OOM Killer is triggered, the kernel skips the first process (PID 1) of the system. Under normal conditions, the memory usage of systemd does not exceed 200 MB. If the usage is abnormally high, you can update systemd to a newer version.

  • Usage of the Transparent Huge Pages (THP) feature

    If the THP feature is enabled, memory bloat may occur and trigger OOM Killer. You can optimize THP performance. For more information, see How do I use THP to tune performance in Alibaba Cloud Linux?.
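The following sketch combines the slab and THP checks described above. The 10% threshold comes from the description of slab_unreclaimable memory; the sysfs path for THP is the standard location, but verify it on your system:

    # Compute the share of unreclaimable slab memory in the total memory (both values are in kB)
    awk '/MemTotal|SUnreclaim/ {a[$1]=$2} END {printf "SUnreclaim/MemTotal = %.1f%%\n", a["SUnreclaim:"]/a["MemTotal:"]*100}' /proc/meminfo

    # View the current THP mode; the value in brackets is the active mode
    cat /sys/kernel/mm/transparent_hugepage/enabled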

A memory node does not have sufficient memory

If OOM Killer is triggered due to insufficient memory of memory nodes, re-configure the value of the cpuset.mems interface to enable cgroups to properly use the memory of the memory nodes.

  1. Run the following command to query the number of memory nodes in the system:

    cat /proc/buddyinfo
  2. Run the following command to specify the value of the cpuset.mems interface:

    sudo bash -c 'echo <value> > /sys/fs/cgroup/cpuset/<cgroup_name>/cpuset.mems'

    Replace <value> with the actual memory node number and <cgroup_name> with the actual cgroup name.

    For example, assume that the instance has three memory nodes: Node 0, Node 1, and Node 2. To allow the cgroup to use the memory of Node 0 and Node 2, set <value> to 0,2.
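    Continuing this example, the following sketch applies the setting to a hypothetical cgroup named mm_cgroup. Replace the cgroup name and the node list with your own values:

    sudo bash -c 'echo 0,2 > /sys/fs/cgroup/cpuset/mm_cgroup/cpuset.mems'
    # Verify that the new node list is in effect
    cat /sys/fs/cgroup/cpuset/mm_cgroup/cpuset.mems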

A buddy system does not have sufficient memory in the event of memory fragmentation

If OOM Killer is triggered due to memory fragmentation, we recommend that you defragment the memory on a regular basis during off-peak hours. You can run the following command to trigger memory compaction:

sudo bash -c 'echo 1 > /proc/sys/vm/compact_memory'
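If you want to trigger memory compaction automatically, one option is a root cron job. The following entry is a sketch that assumes 04:00 is an off-peak time for your workload; adjust the schedule as needed and add the entry with sudo crontab -e:

    # Trigger memory compaction every day at 04:00
    0 4 * * * /bin/bash -c 'echo 1 > /proc/sys/vm/compact_memory'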