If your Elastic Compute Service (ECS) instance goes down and the "Out of memory and no killable processes" error message appears in an error log, you can use the solution described in this topic to fix the issue.
Problem description
The error log of the instance contains information similar to the following:
[28663.625353] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[28663.625363] [ 1799] 0 1799 26512 245 56 3 0 -1000 sshd
[28663.625367] [29219] 0 29219 10832 126 26 3 0 -1000 systemd-udevd
[28663.625375] Kernel panic - not syncing: Out of memory and no killable processes...
[28663.634374] CPU: 1 PID: 3578 Comm: kworker/u176:4 Tainted: G OE 3.10.0-1062.9.1.el7.x86_64 #1
[28663.676873] Call Trace:
[28663.679312] [<ffffffff8139f342>] dump_stack+0x63/0x81
[28663.684421] [<ffffffff811b2245>] panic+0xf8/0x244
[28663.689184] [<ffffffff811b98db>] out_of_memory+0x2eb/0x550
[28663.694726] [<ffffffff811be254>] __alloc_pages_may_oom+0x114/0x1c0
[28663.700959] [<ffffffff811bedb3>] __alloc_pages_slowpath+0x7d3/0xa40
[28663.707279] [<ffffffff811bf229>] __alloc_pages_nodemask+0x209/0x260
[28663.713599] [<ffffffff81216535>] alloc_pages_current+0x95/0x140
[28663.719573] [<ffffffff811ba5ee>] __get_free_pages+0xe/0x40
[28663.725113] [<ffffffff81075dae>] pgd_alloc+0x1e/0x160
[28663.730225] [<ffffffff810875e4>] mm_init+0x184/0x240
[28663.735249] [<ffffffff81088102>] mm_alloc+0x52/0x60
[28663.740186] [<ffffffff81257640>] do_execveat_common.isra.37+0x250/0x780
[28663.759839] [<ffffffff81257b9c>] do_execve+0x2c/0x30
[28663.764864] [<ffffffff810a231b>] call_usermodehelper_exec_async+0xfb/0x150
[28663.777246] [<ffffffff81741dd9>] ret_from_fork+0x39/0x50
Cause
When the operating system kernel of the instance fails to allocate memory to a process, it attempts to terminate processes to release memory. If none of the processes running on the instance can be terminated, the kernel panics and the instance goes down. This can happen when a memory leak occurs in the operating system kernel, or when processes whose oom_score_adj value is set to -1000 use excessive memory and cannot be terminated. In both cases, the system runs out of available memory.
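To see which of the two causes is more likely, it can help to check how much memory the non-killable processes held at the time of the panic. The following Python sketch parses task rows such as those in the Problem description and returns the processes whose oom_score_adj value is -1000. The function name unkillable_tasks and the regular expression are illustrative only. The column layout of this table differs between kernel versions, so the pattern is assumed to match only the layout shown in this topic.

```python
import re

# Matches one task row of the kernel OOM dump in the layout shown in this topic, e.g.
# [28663.625363] [ 1799] 0 1799 26512 245 56 3 0 -1000 sshd
# Assumption: other kernel versions print different columns and need a different pattern.
ROW = re.compile(
    r"^\[\s*[\d.]+\]\s+\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<nr_pmds>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def unkillable_tasks(log_text):
    """Return the task rows whose oom_score_adj is -1000.

    Each result is a dict with pid, name, and rss. The rss value is copied
    from the log as-is; the kernel reports it in pages, not bytes.
    """
    tasks = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header row, call trace, or unrelated log line
        if int(match.group("adj")) == -1000:
            tasks.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "rss": int(match.group("rss")),
            })
    return tasks


if __name__ == "__main__":
    sample = """\
[28663.625353] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[28663.625363] [ 1799] 0 1799 26512 245 56 3 0 -1000 sshd
[28663.625367] [29219] 0 29219 10832 126 26 3 0 -1000 systemd-udevd
"""
    for task in unkillable_tasks(sample):
        print(task["pid"], task["name"], task["rss"])
```

If the rss values of these processes add up to only a small part of the instance memory, the processes themselves are unlikely to be what exhausted it, and the memory leak check in the Solution section is the better place to start.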
Solution
- Check whether a memory leak occurs. For more information, see Identify the causes of high percentage of the slab_unreclaimable memory in the Linux operating system.
- Check whether the oom_score_adj value is properly set.
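Both checks can be run on an instance that is still up. The following Python sketch reads the standard /proc/meminfo file and the per-process oom_score_adj and status files. The function names slab_unreclaim_ratio and oom_exempt_processes are hypothetical helpers written for this example, not part of any Alibaba Cloud tool. The sketch only reports values and applies no threshold, because what counts as too high depends on the workload.

```python
import os


def slab_unreclaim_ratio(meminfo_text):
    """Return SUnreclaim / MemTotal from /proc/meminfo text, or None if a field is missing."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields.get("MemTotal")
    unreclaim = fields.get("SUnreclaim")
    if not total or unreclaim is None:
        return None
    return unreclaim / total


def oom_exempt_processes(proc_root="/proc"):
    """Return (pid, name, rss_kib) tuples for processes whose oom_score_adj is -1000.

    Results are sorted by resident memory, largest first. Kernel threads
    have no VmRSS line and are reported with rss_kib set to 0.
    """
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score_adj")) as f:
                if int(f.read().strip()) != -1000:
                    continue
            name, rss_kib = "?", 0
            with open(os.path.join(base, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("VmRSS:"):
                        rss_kib = int(line.split()[1])
        except (OSError, ValueError):
            continue  # the process exited or its files are not readable
        found.append((int(entry), name, rss_kib))
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        ratio = slab_unreclaim_ratio(f.read())
    if ratio is not None:
        print(f"Unreclaimable slab share of total memory: {ratio:.1%}")
    for pid, name, rss_kib in oom_exempt_processes():
        print(f"pid={pid} name={name} rss={rss_kib} kB oom_score_adj=-1000")
```

A large unreclaimable slab share points toward the memory leak check in the first item. A process in the second list with a large resident size is a candidate for a less restrictive oom_score_adj value.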
If you have requests or feedback, you can submit a ticket to contact Alibaba Cloud.