If your Elastic Compute Service (ECS) instance goes down and the "Out of memory and no killable processes" error message appears in the system log, you can use the solution described in this topic to fix the issue.

Problem description

An instance goes down while it is running, and the kernel log contains a call stack similar to the following one:
[28663.625353] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[28663.625363] [ 1799]     0  1799    26512      245      56       3        0         -1000 sshd
[28663.625367] [29219]     0 29219    10832      126      26       3        0         -1000 systemd-udevd
[28663.625375] Kernel panic - not syncing: Out of memory and no killable processes...
[28663.634374] CPU: 1 PID: 3578 Comm: kworker/u176:4 Tainted: G           OE   3.10.0-1062.9.1.el7.x86_64 #1
[28663.676873] Call Trace:
[28663.679312]  [<ffffffff8139f342>] dump_stack+0x63/0x81
[28663.684421]  [<ffffffff811b2245>] panic+0xf8/0x244
[28663.689184]  [<ffffffff811b98db>] out_of_memory+0x2eb/0x550
[28663.694726]  [<ffffffff811be254>] __alloc_pages_may_oom+0x114/0x1c0
[28663.700959]  [<ffffffff811bedb3>] __alloc_pages_slowpath+0x7d3/0xa40
[28663.707279]  [<ffffffff811bf229>] __alloc_pages_nodemask+0x209/0x260
[28663.713599]  [<ffffffff81216535>] alloc_pages_current+0x95/0x140
[28663.719573]  [<ffffffff811ba5ee>] __get_free_pages+0xe/0x40
[28663.725113]  [<ffffffff81075dae>] pgd_alloc+0x1e/0x160
[28663.730225]  [<ffffffff810875e4>] mm_init+0x184/0x240
[28663.735249]  [<ffffffff81088102>] mm_alloc+0x52/0x60
[28663.740186]  [<ffffffff81257640>] do_execveat_common.isra.37+0x250/0x780
[28663.759839]  [<ffffffff81257b9c>] do_execve+0x2c/0x30
[28663.764864]  [<ffffffff810a231b>] call_usermodehelper_exec_async+0xfb/0x150
[28663.777246]  [<ffffffff81741dd9>] ret_from_fork+0x39/0x50
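To see which processes the kernel listed before the panic, and which of them were exempt from the OOM killer, you can parse the task dump rows that precede the panic message. The following Python sketch assumes the column layout shown in the header of the log above and 4 KiB pages (typical on x86_64). In this dump, total_vm and rss are page counts, so the sketch converts rss to KiB. The names parse_task_dump, unkillable, and Task are hypothetical helpers for illustration, not part of any Alibaba Cloud tool.

```python
import re
from typing import NamedTuple

# Assumption: 4 KiB pages, which is typical on x86_64.
PAGE_SIZE_KIB = 4

# One row of the task dump, for example:
# [28663.625363] [ 1799]     0  1799    26512      245      56       3        0         -1000 sshd
# Columns follow the header printed above the rows:
# pid uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW_RE = re.compile(
    r"\[\s*[\d.]+\]\s+\[\s*(\d+)\]"      # timestamp, then [ pid ]
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"  # uid, tgid, total_vm, rss
    r"\s+(\d+)\s+(\d+)\s+(\d+)"          # nr_ptes, nr_pmds, swapents
    r"\s+(-?\d+)\s+(\S+)\s*$"            # oom_score_adj, name
)


class Task(NamedTuple):
    pid: int
    name: str
    rss_kib: int        # resident memory, converted from pages to KiB
    oom_score_adj: int


def parse_task_dump(log_text: str) -> list[Task]:
    """Extract the per-process rows from an OOM task dump."""
    tasks = []
    for line in log_text.splitlines():
        m = ROW_RE.search(line)
        if m is None:
            continue  # header, panic message, call trace, and so on
        rss_pages = int(m.group(5))
        tasks.append(Task(pid=int(m.group(1)), name=m.group(10),
                          rss_kib=rss_pages * PAGE_SIZE_KIB,
                          oom_score_adj=int(m.group(9))))
    return tasks


def unkillable(tasks: list[Task]) -> list[Task]:
    """Processes with oom_score_adj -1000 are exempt from the OOM killer."""
    return [t for t in tasks if t.oom_score_adj == -1000]
```

For example, pass the saved kernel log text to parse_task_dump() and print each entry returned by unkillable(). Sorting the result by rss_kib shows which exempt processes held the most memory at the time of the panic.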

Cause

When the operating system kernel of the instance cannot allocate memory, it invokes the out-of-memory (OOM) killer, which terminates a process to free memory. If none of the processes running on the instance can be terminated, the kernel panics with the "Out of memory and no killable processes" message and the instance goes down. This can occur if the operating system kernel leaks memory, or if processes whose oom_score_adj value is set to -1000 use excessive memory. A process with an oom_score_adj value of -1000 is exempt from the OOM killer, so its memory cannot be reclaimed by terminating it. In both cases, the system does not have enough available memory.
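After the instance restarts, you can check for both causes from inside the operating system. The following Python sketch lists processes whose /proc/<pid>/oom_score_adj value is -1000, together with the Name and VmRSS (resident memory, in kB) fields from /proc/<pid>/status. It also reads SUnreclaim (unreclaimable kernel slab memory, in kB) from /proc/meminfo; a value that keeps growing over time is one possible sign of a kernel memory leak, not proof of one. The helpers find_unkillable and read_meminfo_kib are hypothetical and for illustration only.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # value that exempts a process from the OOM killer


def find_unkillable(proc_root: str = "/proc") -> list[tuple[int, str, int]]:
    """Return (pid, name, rss_kib) for every process with oom_score_adj -1000,
    sorted by resident memory, largest first."""
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as meminfo
        try:
            adj = int((entry / "oom_score_adj").read_text().strip())
            if adj != OOM_SCORE_ADJ_MIN:
                continue
            name, rss_kib = "?", 0  # kernel threads have no VmRSS line
            for line in (entry / "status").read_text().splitlines():
                if line.startswith("Name:"):
                    name = line.split(":", 1)[1].strip()
                elif line.startswith("VmRSS:"):
                    rss_kib = int(line.split()[1])  # reported in kB
        except (OSError, ValueError):
            continue  # the process exited or the file is unreadable
        results.append((int(entry.name), name, rss_kib))
    return sorted(results, key=lambda r: r[2], reverse=True)


def read_meminfo_kib(field: str, proc_root: str = "/proc") -> int:
    """Read one kB-valued field, such as SUnreclaim, from meminfo."""
    for line in (Path(proc_root) / "meminfo").read_text().splitlines():
        key, _, rest = line.partition(":")
        if key == field:
            return int(rest.split()[0])
    raise KeyError(field)
```

For example, iterate over find_unkillable() and print each tuple, then call read_meminfo_kib("SUnreclaim") periodically. If a process that uses a large amount of memory appears in the list, check why its oom_score_adj value is set to -1000.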

Solution

Notice Before you perform the following operations, we recommend that you create snapshots of the ECS instance to back up its data and prevent data loss caused by accidental operations. For more information about snapshots, see Snapshot overview.
Perform the following checks:

If you have requests or feedback, you can submit a ticket to contact Alibaba Cloud.