Enterprise Distributed Application Service: Why does the running process of an application suddenly disappear?

Last Updated: Aug 08, 2023

This issue occurs because the operating system runs out of physical memory, or because the Java Virtual Machine (JVM) process that runs the application exits unexpectedly. This topic uses Linux as an example to describe how to identify and resolve both causes.

Out-of-memory (OOM) killer triggered by low physical memory on the operating system

By default, the OOM killer is enabled on Linux. If the system runs low on physical memory and swap space, the OOM killer selectively kills processes to reclaim memory. Linux assigns each running process a score named oom_score, which you can view in /proc/<pid>/oom_score. A higher score means that the process is more likely to be killed: the OOM killer starts with the process that has the highest score.

When the OOM killer kills a process, it writes information such as the process ID (PID) to the operating system (OS) logs. You can therefore search the OS logs to check whether a process was killed by the OOM killer.
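As a rough illustration of how the scores can be inspected, the following sketch lists the processes with the highest oom_score. The function name top_oom_scores is hypothetical, and the sketch assumes a Linux /proc layout in which each process directory contains oom_score and comm files.

```shell
# List up to five processes with the highest oom_score, one per line,
# in the form "<score> <pid> <command name>".
# top_oom_scores is a hypothetical helper; the optional argument is the
# proc root to scan and defaults to /proc.
top_oom_scores() {
  proc_root="${1:-/proc}"
  for dir in "$proc_root"/[0-9]*; do
    [ -r "$dir/oom_score" ] || continue
    # A process may exit while we scan, so skip entries that vanish.
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    name=$(cat "$dir/comm" 2>/dev/null)
    printf '%s %s %s\n' "$score" "${dir##*/}" "${name:-?}"
  done | sort -rn | head -n 5
}

# Usage: top_oom_scores
```

The processes at the top of this list are the ones the OOM killer is most likely to pick first if memory runs out.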

The following logs show that a process in an ECS cluster was killed by the OOM killer:

[Wed Aug 31 16:36:42 2017] Out of memory: Kill process 43805 (keystone-all) score 249 or sacrifice child
[Wed Aug 31 16:36:42 2017] Killed process 43805 (keystone-all) total-vm:4446352kB, anon-rss:4053140kB, file-rss:68kB
[Wed Aug 31 16:56:25 2017] keystone-all invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Wed Aug 31 16:56:25 2017] keystone-all cpuset=/ mems_allowed=0
[Wed Aug 31 16:56:25 2017] CPU: 2 PID: 88196 Comm: keystone-all Not tainted 3.10.0-327.13.1.el7.x86_64 #1

The following logs show that a process in a Swarm cluster was killed by the OOM killer:

Memory cgroup out of memory: Kill process 20911 (beam.smp) score 1001 or sacrifice child
Killed process 20977 (sh) total-vm:4404kB, anon-rss:0kB, file-rss:508kB

To search for logs, you can run the following command:

grep -i 'killed process' /var/log/messages

You can also run the following command:

grep -E "oom-killer|total-vm" /var/log/messages
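To turn the matching log lines into a short list of victims, a filter along the following lines may help. It is a minimal sketch: list_oom_victims is a hypothetical helper name, and the pattern assumes the "Killed process <pid> (<name>)" wording shown in the sample logs above.

```shell
# Read kernel log text on stdin and print "<pid> <name>" for every
# "Killed process" line. list_oom_victims is a hypothetical helper name.
list_oom_victims() {
  sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage: list_oom_victims < /var/log/messages
```

On hosts that do not write kernel messages to /var/log/messages, the same filter can probably be fed from the output of dmesg -T or journalctl -k instead.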

If the logs show that the OOM killer killed the process, use one of the following methods to resolve the issue:

  • Increase the physical memory of the ECS instances, or reduce the amount of memory that is allocated to the killed process.

  • Check whether a swap partition is mounted on each ECS instance. In most cases, the OOM killer is triggered because one or more ECS instances have no swap partition. Using swap can degrade the performance of an instance, but it helps keep processes from being killed when physical memory runs short. If no swap partition is mounted, refer to the documentation of your Linux distribution and mount one.
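One way to check for swap from a script is to read the SwapTotal field of /proc/meminfo, as in the sketch below. The helper name swap_total_kb is hypothetical; the optional file argument exists only so that the function can be tried against a saved copy of meminfo.

```shell
# Print the SwapTotal value in kB from a meminfo-style file.
# swap_total_kb is a hypothetical helper; the file defaults to /proc/meminfo.
swap_total_kb() {
  awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Usage:
#   if [ "$(swap_total_kb)" = "0" ]; then echo "no swap configured"; fi
```

Interactively, commands such as free -h or swapon --show report the same information.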

JVM processes that run applications unexpectedly exit

A JVM process can crash because of invalid Java Native Interface (JNI) calls, out-of-heap-space errors in native C code, and other errors. When such a crash occurs, the JVM writes a file named hs_err_pid<jvm_pid>.log to the working directory of the JVM process. You can run the pwdx <jvm_pid> command on a running JVM process to query its working directory. The log file helps you identify the cause of the error or the thread that was executing when the error occurred. You can also enable core dumps for further analysis.
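The following sketch shows one possible way to locate the newest crash log in a directory and print its summary header. Both function names are hypothetical, and the header extraction assumes that the log begins with a block of lines starting with "#", which is the usual layout of HotSpot fatal error logs.

```shell
# Print the path of the most recently modified hs_err*.log in a directory.
# latest_hs_err is a hypothetical helper; the directory defaults to ".".
latest_hs_err() {
  ls -t "${1:-.}"/hs_err*.log 2>/dev/null | head -n 1
}

# Print the summary header of a crash log: the leading lines that start
# with "#", stopping at the first line that does not.
hs_err_summary() {
  awk '/^#/ { print; next } { exit }' "$1"
}

# Usage: hs_err_summary "$(latest_hs_err /path/to/jvm/working/dir)"
```

The header typically names the signal or error and the library in which it occurred. Core dumps are commonly enabled by running ulimit -c unlimited in the shell that starts the JVM, although whether that applies depends on how the application is launched on your instance.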

In addition, in the Enterprise Distributed Application Service (EDAS) console, you can go to the details page of the application and enable Analysis of Abnormal Exit in the Application Settings section of the Basic Information tab. If the application monitoring and alerting feature is enabled, an alert is triggered when the JVM process unexpectedly exits. You can then log on to the ECS instance to query logs and identify the cause.

Figure: Analysis of the unexpected exit of an application in EDAS