
Explaining Memory Issues in Java Cloud-Native Practices

This article analyzes the problems encountered by EDAS users in the evolution of Java applications toward cloud-native and provides suggestions for cloud-native Java application memory configuration.

By Wenxin Xie (Fengjing)

Java has been one of the most popular programming languages for the past two decades, thanks to its active open-source community and well-established ecosystem. In the cloud-native era, booming cloud-native technology unlocks the dividends of cloud computing, drives the cloud-native transformation of businesses, and accelerates the digital transformation of enterprises.

However, Java's cloud-native transformation faces great challenges, as there are many contradictions between Java's operating mechanism and cloud-native characteristics. As enterprises use cloud-native technology for deep cost optimization, resource cost management has risen to unprecedented importance. Resources on the public cloud are charged on a pay-as-you-go basis, so users are very sensitive to resource usage. In terms of memory usage, the execution mechanism based on the Java virtual machine introduces a fixed base memory overhead for Java programs. Compared with native languages (such as C++ and Golang), Java applications occupy much more memory and are often called memory devourers, so it is more expensive to migrate Java applications to the cloud. In addition, the complexity of the system increases after an application is migrated to the cloud. Ordinary users do not have a clear understanding of the memory of Java applications on the cloud, have no idea how to properly configure memory for their applications, and find it difficult to troubleshoot when Out-Of-Memory (OOM) problems occur.

Why does OOM occur when the heap memory does not even exceed Xmx? How can we understand the memory relationship between the operating system and the JVM? Why does the program occupy a lot more memory than Xmx? Where is the memory used? Why does the program in an online container require more memory? This article analyzes the problems encountered by EDAS users in the evolution of Java applications toward cloud-native and provides suggestions for cloud-native Java application memory configuration.

1. Background

Resource Configuration of Kubernetes Applications

The cloud-native architecture is based on Kubernetes. Applications are deployed on Kubernetes and run as groups of containers. The resource model of Kubernetes has two definitions: resource request and resource limit. Kubernetes ensures that a container gets the requested amount of resources but does not allow it to use resources that exceed the limit. Take the following memory configuration as an example: a container can obtain at least 1024Mi of memory, but at most 4096Mi. Once memory usage exceeds the upper limit, the container is out of memory and is then restarted by the Kubernetes controller.

spec:
  containers:
  - name: edas
    image: alibaba/edas
    resources:
      requests:
        memory: "1024Mi"
      limits:
        memory: "4096Mi"
    command: ["java", "-jar", "edas.jar"]

Container OOM

For the OOM mechanism of containers, first, we need to review the concept of containers. When we talk about containers, we say it is a sandbox technology. As a sandbox, the container is relatively independent inside, with a boundary and a size. The independent running environment inside the container is implemented through Linux's Namespace mechanism: the container's namespaces (such as PID, Mount, UTS, IPC, and Network) isolate it so that the host's namespaces cannot be seen from inside the container, nor can those of other containers. The boundary and size of a container refer to restricting the container's use of CPU, memory, IO, and other resources; otherwise, a single container occupying excessive resources may cause other containers to run slowly or abnormally. Cgroup is a mechanism provided by the Linux kernel that limits the resources used by a single process or a group of processes, and it is the core technology for implementing container resource constraints. To the operating system, a container is nothing more than a special process whose use of resources is constrained by Cgroup. If the amount of memory used by a process exceeds the Cgroup limit, the process is killed by the OOM Killer.

Therefore, the container OOM means the container process running on Linux is out of memory. Cgroup is not an obscure technology. It is implemented by Linux as a file system, which is very consistent with Unix's philosophy that everything is a file. For Cgroup V1, you can view the Cgroup configuration of the current container in the /sys/fs/cgroup/ directory of the container.

For container memory, memory.limit_in_bytes and memory.usage_in_bytes are the two most important parameters in the memory control group. The former identifies the maximum memory that can be used by the current container process group, and the latter is the total memory used by the current container process group. In general, the closer the used value is to the maximum value, the higher the risk of OOM.

# The memory limit of the current container
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
4294967296
# The actual memory usage of the current container
$ cat /sys/fs/cgroup/memory/memory.usage_in_bytes
39215104
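
On clusters that use Cgroup V2, the memory controller exposes differently named files. A hedged equivalent of the commands above (the exact path can vary with the container runtime and distribution):

# The memory limit of the current container (Cgroup V2)
$ cat /sys/fs/cgroup/memory.max
# The actual memory usage of the current container (Cgroup V2)
$ cat /sys/fs/cgroup/memory.current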

JVM OOM

Speaking of OOM, Java developers are more familiar with the JVM OOM. It throws java.lang.OutOfMemoryError when the JVM does not have enough memory to allocate space for an object and the garbage collector has no space to reclaim. According to the JVM specification, all memory regions may throw OOM except the program counter. The most common JVM OOM cases are:

  • java.lang.OutOfMemoryError: Java heap space

Heap space overflow – This error is thrown when the heap does not have enough space to store a newly created object. It is usually caused by a memory leak or an improperly sized heap. For memory leaks, use memory monitoring software to locate the leaking code; the heap size can be adjusted with parameters such as -Xms and -Xmx. A minimal sketch that reproduces this error is shown after this list.

  • java.lang.OutOfMemoryError: PermGen space / Metaspace

PermGen space/Metaspace overflow – The permanent generation stores class information and constants; JDK 1.8 replaced the permanent generation with Metaspace. This error is usually reported because too many classes are loaded or the loaded classes are too large. You can increase -XX:MaxPermSize or -XX:MaxMetaspaceSize to expand the PermGen space/Metaspace.

  • java.lang.OutOfMemoryError: Unable to create new native thread

Each Java thread occupies a certain amount of memory. This error is reported when the JVM asks the underlying operating system to create a new native thread and there are not enough resources to allocate one. Possible causes are insufficient native memory, or a thread leak that pushes the number of threads past operating system limits such as ulimit or kernel.pid_max. The remedies are to add resources, limit the size of the thread pool, or reduce the thread stack size.
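
The heap space error in the first item is easy to reproduce. The following is a minimal sketch (not taken from a real application) that keeps strong references to large arrays until the heap fills up; run it with a small heap, for example java -Xmx64m HeapOomDemo, and the JVM throws java.lang.OutOfMemoryError: Java heap space.

import java.util.ArrayList;
import java.util.List;

public class HeapOomDemo {
    public static void main(String[] args) {
        // Strong references prevent the garbage collector from reclaiming the arrays,
        // so the heap eventually runs out of space and the JVM throws
        // java.lang.OutOfMemoryError: Java heap space.
        List<byte[]> hoard = new ArrayList<>();
        while (true) {
            hoard.add(new byte[1024 * 1024]); // allocate 1MB per iteration
        }
    }
}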

2. Why Does OOM Occur When the Heap Memory Does Not Exceed Xmx?

Here's a scenario I think many of you have encountered: a Java application deployed in Kubernetes restarts frequently, and the container exits with exit code 137, reason: OOMKilled. Everything points to an obvious OOM. However, JVM monitoring data shows that heap memory usage never exceeds the maximum heap limit Xmx. What's more, although the automatic heap dump on OOM parameter is configured, no dump file is generated when the OOM occurs.

According to the preceding background, a Java application in a container can hit two types of OOM exceptions: JVM OOM and container OOM. A JVM OOM is an error caused by insufficient space in a JVM memory area: the JVM actively throws the error and exits the process, the observation data shows memory usage exceeding the upper limit, and the JVM leaves a corresponding error record. A container OOM is a system behavior: the memory used by the entire container process group exceeds the Cgroup limit, and the process is killed by the system OOM Killer, leaving records in the system logs and Kubernetes events.

In general, the memory usage of a Java program is constrained by both the JVM and Cgroup: the Java heap is limited by the Xmx parameter, and a JVM OOM occurs when the heap exceeds that limit, while the entire process is limited by the container memory limit, and a container OOM occurs when the process exceeds it. You need to distinguish between the two and troubleshoot OOM problems based on observation data, JVM error records, system logs, and Kubernetes events, adjusting the configuration as needed.
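
For example, the following checks help tell the two cases apart (a sketch; <pod-name> is a placeholder): a container OOM shows up in the pod status and the node's kernel log, while a JVM OOM shows up in the application log.

# Container OOM: the pod's last state shows reason OOMKilled and exit code 137
$ kubectl describe pod <pod-name>
# Container OOM: the node's kernel log records the OOM Killer action
$ dmesg | grep -i oom
# JVM OOM: the application log or stderr contains an error such as
# java.lang.OutOfMemoryError: Java heap space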

3. How Do We Understand the Relationship between the Operating System and the JVM Memory?

As mentioned above, the Java container OOM essentially means that the memory used by the Java process exceeds the Cgroup limit and is killed by the OOM Killer of the operating system. How do we view the memory of a Java process from the perspective of the operating system? The operating system and the JVM have their own memory models. How do they map? It is important to understand the memory relationship between the JVM and the operating system to explore the OOM problem of Java processes.

Take the most commonly used OpenJDK as an example. The JVM is essentially a C++ process running on the operating system, so its memory model also shares the general characteristics of a Linux process. The virtual address space of a Linux process is divided into kernel space and user space, and the user space is subdivided into many segments. Several segments highly relevant to the JVM are selected here to describe the mapping between JVM memory and process memory.

[Figure: Mapping between the JVM memory model and the Linux process memory layout]

  • Code Segment: Generally, this maps the program's code into memory. Here, it refers to the code of the JVM itself, not Java code.
  • Data Segment: This holds variables that were initialized when the program started. Here, it refers to data belonging to the JVM itself.
  • Heap Space: The runtime heap is the memory segment where a Java process differs most from an ordinary process. In the Linux process memory model, the heap provides space for objects that the process allocates dynamically at run time, and almost everything in the JVM memory model consists of objects the JVM process creates at run time. The Java heap in the JVM memory model is simply a logical space that the JVM builds on top of its process heap space.
  • Stack Space: The running stack of the process is not the thread stack in the JVM memory model; it holds the running data the operating system needs in order to run the JVM.

As mentioned above, the concept of heap space exists in both the Linux process memory layout and the JVM memory layout, but the two differ immensely, so it is easy to confuse one with the other. The Java heap is narrower in scope than the heap of the Linux process: it is a segment of logical space established by the JVM on top of its process heap space, while the process heap space also contains memory that supports running the JVM itself, such as Java thread stacks, the code cache, GC data, and compiler data.

4. Why Does the Program Take Up a Lot More Memory Than Xmx? Where Is the Memory Used?

In the eyes of Java developers, the objects allocated in Java code at run time are all placed in the Java heap, so many people equate Java heap memory with Java process memory, use the Java heap limit parameter Xmx as the process memory limit, and set the container memory limit to the same size as Xmx, only to find that the container runs out of memory.

In fact, in addition to heap memory (Heap), the JVM also has so-called non-heap memory (Non-Heap), and there is also native memory that bypasses the JVM and is not managed by it at all. The memory usage of a Java process is briefly summarized in the following figure:

[Figure: Composition of Java process memory: JVM heap, JVM non-heap areas, and native memory outside the JVM]

JDK 8 introduced the Native Memory Tracking (NMT) feature, which tracks the internal memory usage of the JVM. NMT is turned off by default and is enabled with the JVM parameter: -XX:NativeMemoryTracking=[off | summary | detail]

$ java -Xms300m -Xmx300m -XX:+UseG1GC -XX:NativeMemoryTracking=summary -jar app.jar

Here, the maximum heap memory is limited to 300 MB, G1 is used as the GC algorithm, and NMT is enabled to track the memory usage of the process.

Note: Enabling NMT results in a performance overhead of 5%-10%.

After NMT is enabled, you can use the jcmd command to print the JVM memory usage. Here, only the memory summary information is displayed. The unit is set to MB.

$ jcmd <pid> VM.native_memory summary scale=MB

Total JVM Memory

Native Memory Tracking:
Total: reserved=1764MB, committed=534MB

The NMT report shows that the process currently has 1764MB of reserved memory and 534MB of committed memory, far more than the 300MB maximum heap. Reserved means a contiguous range of virtual address space has been set aside for the process and can be understood as the amount of memory the process might use. Committed means virtual addresses have been mapped to physical memory and can be understood as the amount of memory the process currently occupies.

Note that the memory counted by NMT differs from the memory counted by the operating system. Linux allocates memory lazily: a memory page is mapped into physical memory only when the process actually accesses it. Therefore, the physical memory usage of the process shown by the top command differs from what the NMT report shows. NMT is used here to describe memory usage from the JVM's perspective.
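
As a rough cross-check (a sketch; <pid> is a placeholder), you can read the resident set size (RSS) that the operating system attributes to the JVM process and compare it with the committed value in the NMT report:

# Physical memory (RSS) of the JVM process as seen by the operating system, in kB
$ grep VmRSS /proc/<pid>/status
# The same value queried with ps
$ ps -o rss= -p <pid>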

Java Heap

Java Heap (reserved=300MB, committed=300MB)
    (mmap: reserved=300MB, committed=300MB)

The Java heap, as configured, reserves and commits 300MB of memory space.

Metaspace

Class (reserved=1078MB, committed=61MB)
      (classes #11183)
      (malloc=2MB #19375) 
      (mmap: reserved=1076MB, committed=60MB)

Loaded classes are stored in Metaspace. Here, 11183 classes are loaded, with nearly 1GB reserved and 61MB committed.

The more classes you load, the more metaspace you use. The size of the metaspace is limited by -XX:MaxMetaspaceSize (unlimited by default) and -XX:CompressedClassSpaceSize (1G by default).
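
If Metaspace keeps growing, you can cap it explicitly. A hedged example that reuses the earlier app.jar command (the 256m value is only illustrative):

$ java -Xms300m -Xmx300m -XX:MaxMetaspaceSize=256m -XX:NativeMemoryTracking=summary -jar app.jar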

Thread

Thread (reserved=60MB, committed=60MB)
       (thread #61)
       (stack: reserved=60MB, committed=60MB)

The JVM thread stack also needs to occupy some space. Here, 61 threads occupy 60M of space, and the stack of each thread is about 1M by default. The stack size is controlled by the -Xss parameter.
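
For applications that run many threads, shrinking the per-thread stack reduces this footprint. A hedged example reusing the earlier app.jar command (512k is only illustrative; a stack that is too small can cause StackOverflowError):

$ java -Xss512k -Xms300m -Xmx300m -jar app.jar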

Code Cache

Code (reserved=250MB, committed=36MB)
     (malloc=6MB #9546) 
     (mmap: reserved=244MB, committed=30MB)

The code cache is mainly used to store code compiled by the JIT (just-in-time) compiler and native methods. Currently, 36MB of code is cached. You can use the -XX:ReservedCodeCacheSize parameter to set the capacity of the code cache.

GC

GC (reserved=47MB, committed=47MB)
   (malloc=4MB #11696) 
   (mmap: reserved=43MB, committed=43MB)

The garbage collector (GC) also needs memory to support its operation, and the amount depends on the specific GC algorithm in use. The G1 algorithm here uses 47MB. With the same configuration but SerialGC instead:

GC (reserved=1MB, committed=1MB)
   (mmap: reserved=1MB, committed=1MB)

You can see that the SerialGC algorithm uses only 1MB of memory. This is because SerialGC is a simple serial algorithm with simple data structures and little bookkeeping data, so it occupies little memory. However, a simpler GC algorithm may degrade performance, so you need to balance performance against memory when making a choice.
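
For reference, the SerialGC figures above can be obtained with the same command as before, swapping the collector flag (a sketch based on the earlier example):

$ java -Xms300m -Xmx300m -XX:+UseSerialGC -XX:NativeMemoryTracking=summary -jar app.jar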

Symbol

Symbol (reserved=15MB, committed=15MB)
       (malloc=11MB #113566) 
       (arena=3MB #1)

The JVM's symbol area contains the symbol table and the string table, which together occupy 15MB here.

Non-JVM Memory

NMT can only count memory inside the JVM; some memory is not managed by the JVM at all. In addition to JVM-managed memory, a program can explicitly request off-heap memory with ByteBuffer.allocateDirect, which is limited by the -XX:MaxDirectMemorySize parameter (equal to -Xmx by default). JNI modules loaded through System.loadLibrary can also allocate off-heap memory outside the JVM's control.
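
The following minimal sketch (not from a real application) allocates 128MB of native memory with ByteBuffer.allocateDirect; the allocation does not count toward the -Xmx heap limit but still adds to the memory footprint of the container process.

import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 128MB of native memory allocated outside the Java heap;
        // total direct memory is capped by -XX:MaxDirectMemorySize (defaults to -Xmx)
        ByteBuffer direct = ByteBuffer.allocateDirect(128 * 1024 * 1024);
        System.out.println("Direct buffer capacity: " + direct.capacity() + " bytes");
    }
}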

In summary, no model can accurately measure the memory usage of Java processes. What we can do is take into account as many factors as possible. Some memory areas can be limited by JVM parameters (such as code cache and metaspace), but some memory areas are not controlled by JVM and are related to specific application code.

Total memory = Heap + Code Cache + Metaspace + Thread stacks + 
               Symbol + GC + Direct buffers + JNI + ...

5. Why Do Online Containers Require More Memory Than Local Test Containers?

Users often ask why running the same code in an online container always consumes more memory than running it locally, sometimes even triggering OOM. Here are some possible reasons:

JVM Versions without Container Awareness

On an ordinary physical or virtual machine, when the -Xmx parameter is not set, the JVM determines the maximum amount of memory it can use from a well-known location (for example, the /proc directory in Linux) and uses 1/4 of the host's memory as the default maximum heap size. However, early JVM versions were not adapted to containers: when running in a container, the JVM still sets the maximum heap to 1/4 of the host memory. Since the host memory of a typical cluster node is much larger than that of a local development machine, the Java process in the container opens a larger heap and naturally consumes more memory. At the same time, the container is subject to the Cgroup resource limit, and when the memory usage of the container process group exceeds that limit, the container is killed. For this reason, OpenJDK 8u191 and later introduced the UseContainerSupport parameter, enabled by default, which lets a JVM running in a container perceive the container memory limit and set the maximum heap to 1/4 of the Cgroup memory limit.
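
To check which maximum heap the JVM actually selected inside a container, and to override the default fraction on container-aware JDKs, commands like the following may help (a sketch; the 50% value is only illustrative):

# Print the maximum heap size the JVM selected by default
$ java -XX:+PrintFlagsFinal -version | grep -i maxheapsize
# On container-aware JDKs, set the maximum heap to half of the container memory limit
$ java -XX:MaxRAMPercentage=50.0 -jar app.jar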

Online Business Consumes More Memory

Services that handle external traffic perform more active memory allocation (such as creating new objects and starting execution threads), and these operations need memory space, so online services often consume more memory; the higher the traffic peak, the more memory is consumed. Therefore, you need to size the application's memory configuration according to its business traffic to ensure quality of service.

6. Configuration Recommendations for Cloud-Native Java Application Memory

  1. Use a container-aware JDK version. For clusters that use Cgroup V1, upgrade to 8u191+ or Java 10 and later. For clusters that use Cgroup V2, upgrade to 8u372+ or Java 15 and later.
  2. Use Native Memory Tracking (NMT) to understand the JVM memory usage of your application. NMT tracks the memory usage of the JVM. During testing, it can be used to determine the approximate distribution of the memory used by the program's JVM as a reference for memory capacity configuration. The JVM parameter -XX:NativeMemoryTracking enables NMT. After NMT is enabled, you can run the jcmd command to print the memory usage of the JVM.
  3. Set the container memory limit based on the memory usage of the Java program. The Cgroup memory limit of a container is derived from the memory limit set for the container, and container OOM occurs when the memory used by the container process exceeds it. To prevent OOM during normal operation or business fluctuations, set the container memory limit 20% to 30% higher than the amount of memory actually used by the Java process. If you do not know the actual memory usage of a program being run for the first time, you can set a generous limit, let the program run for a period, and then adjust the container memory limit based on the observed process memory.
  4. Automatically dump a memory snapshot when OOM occurs and configure persistent storage for the dump files, for example, by using a PVC to mount the dump directory to hostPath, OSS, or a NAS file system (see the sketch after this list). This preserves as much on-site data as possible for subsequent troubleshooting.
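
The last recommendation can be sketched as follows, reusing the edas example from the beginning of this article. The dump flags are standard HotSpot options; the volume name dump-storage, the claim name dump-pvc, and the /dumps path are hypothetical.

spec:
  containers:
  - name: edas
    image: alibaba/edas
    # Dump the heap automatically when the JVM throws OutOfMemoryError and
    # write the dump file to the persistent volume mounted at /dumps
    command: ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/dumps", "-jar", "edas.jar"]
    volumeMounts:
    - name: dump-storage
      mountPath: /dumps
  volumes:
  - name: dump-storage
    persistentVolumeClaim:
      claimName: dump-pvc   # hypothetical PVC name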