Java memory analysis using MAT - Alibaba Cloud Developer Forums: Cloud Discussion Forums

Java memory analysis using MAT

Posted time: Aug 23, 2016 15:43
A heap dump is a snapshot of Java heap memory at a specific point in time. A full GC is usually triggered before the dump is written, so the dump contains only the objects that survive that GC.

Content of dump files:
1. All objects: class, fields, primitive values and references;
2. All classes: classloader, class name, superclass and static fields;
3. GC roots: objects that the JVM defines as reachable;
4. Thread stacks and local variables: the call stack of each thread and, for each frame, information about its local objects.
The dump file does not contain memory allocation information, so it cannot tell you who created an object or where the object was created.
Shallow heap is the memory consumed by one object itself. An object needs 32 or 64 bits per reference (depending on the platform architecture), 4 bytes per int, 8 bytes per long, and so on.
Retained set of X is the set of objects that would be removed by the GC when X is garbage collected.
Retained heap of X is the sum of the shallow heap sizes of all objects in the retained set of X. In other words, it is the memory kept alive by X.
More generally, the shallow heap is the memory occupied by the object itself, while the retained heap is the memory that would be released once the object is collected by the GC.
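The definitions above can be tried out on a toy object graph. The following is only a minimal sketch: the object names, references and shallow sizes are invented for illustration, and the retained set is computed by brute force (compare reachability with and without X), which is not necessarily how MAT computes it internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy heap model; names, references and sizes are hypothetical.
final class RetainedHeapSketch {
    static final List<String> ROOTS = List.of("root");
    static final Map<String, List<String>> REFS = Map.of(
            "root", List.of("A", "B"),
            "A", List.of("C", "D"),
            "B", List.of("D"),
            "C", List.of("E"));
    static final Map<String, Long> SHALLOW = Map.of(
            "root", 16L, "A", 16L, "B", 16L, "C", 24L, "D", 32L, "E", 40L);

    // Objects reachable from the roots when `removed` is treated as already
    // collected (pass null to keep every object).
    static Set<String> reachable(String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (String root : ROOTS) {
            if (!root.equals(removed)) pending.push(root);
        }
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (!seen.add(obj)) continue; // already visited
            for (String target : REFS.getOrDefault(obj, List.of())) {
                if (!target.equals(removed)) pending.push(target);
            }
        }
        return seen;
    }

    // Retained set of x: everything that becomes unreachable once x is gone,
    // including x itself.
    static Set<String> retainedSet(String x) {
        Set<String> retained = reachable(null);
        retained.removeAll(reachable(x));
        return retained;
    }

    // Retained heap of x: sum of the shallow sizes over the retained set.
    static long retainedHeap(String x) {
        long total = 0;
        for (String obj : retainedSet(x)) total += SHALLOW.get(obj);
        return total;
    }

    public static void main(String[] args) {
        for (String x : List.of("A", "B", "C")) {
            System.out.println(x + ": shallow=" + SHALLOW.get(x)
                    + " retained=" + retainedHeap(x) + " set=" + retainedSet(x));
        }
    }
}
```

In this graph D is referenced by both A and B, so it belongs to neither retained set: collecting A releases only A, C and E, and collecting B releases only B itself.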

This figure illustrates what the leading set and retained set are.

Dominator tree: an object x dominates an object y if every path from the GC roots to y passes through x. In other words, as long as y is alive, x must be alive as well. The dominator tree is a tree structure derived from the object reference graph. It helps to find the keep-alive dependencies between objects and to identify the largest chunks of retained memory. The immediate dominator x of y is the dominator closest to y.
The dominator tree has several properties:
1. The objects in the subtree of x (the set of objects that x dominates) are the retained set of x;
2. If x is the immediate dominator of y, then the immediate dominator of x also dominates y, and so on up the tree;
3. The edges in the dominator tree do not necessarily correspond to edges in the reference graph, because they are not always direct object references.
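As a rough illustration of these properties, the sketch below derives immediate dominators for a small, made-up reference graph. It applies the definition directly (x dominates y if y becomes unreachable when x is removed), which is simple but quadratic; real tools use faster algorithms, so this should not be read as MAT's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical reference graph: C is referenced by both A and B.
final class DominatorSketch {
    static final String ROOT = "root";
    static final Map<String, List<String>> REFS = Map.of(
            "root", List.of("A", "B"),
            "A", List.of("C", "E"),
            "B", List.of("C"),
            "C", List.of("D"));

    // Objects reachable from the root when `removed` is deleted (null = none).
    static Set<String> reachable(String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        if (!ROOT.equals(removed)) pending.push(ROOT);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (!seen.add(obj)) continue; // already visited
            for (String target : REFS.getOrDefault(obj, List.of())) {
                if (!target.equals(removed)) pending.push(target);
            }
        }
        return seen;
    }

    // For each y: every x != y such that all paths from the root to y pass through x.
    static Map<String, Set<String>> strictDominators() {
        Set<String> all = reachable(null);
        Map<String, Set<String>> doms = new TreeMap<>();
        for (String y : all) doms.put(y, new HashSet<>());
        for (String x : all) {
            Set<String> alive = reachable(x);
            for (String y : all) {
                if (!y.equals(x) && !alive.contains(y)) doms.get(y).add(x);
            }
        }
        return doms;
    }

    // The dominators of y form a chain, so the closest one is the dominator
    // that itself has the most dominators.
    static Map<String, String> immediateDominators() {
        Map<String, Set<String>> doms = strictDominators();
        Map<String, String> idom = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : doms.entrySet()) {
            String closest = null;
            for (String d : e.getValue()) {
                if (closest == null || doms.get(d).size() > doms.get(closest).size()) {
                    closest = d;
                }
            }
            if (closest != null) idom.put(e.getKey(), closest);
        }
        return idom;
    }

    public static void main(String[] args) {
        System.out.println(immediateDominators());
        // prints {A=root, B=root, C=root, D=C, E=A}
    }
}
```

Here the immediate dominator of C is the root, even though the root holds no direct reference to C. This is the third property in action: the tree edge root → C is not an edge of the reference graph.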

This figure demonstrates the conversion from a reference relationship graph to a dominator tree.

GC root: A GC root is an object that is accessible from outside the heap. An object can become a GC root for any of the following reasons:
1. System class: a class loaded by the bootstrap or system class loader, such as java.util.* in rt.jar;
2. JNI local: a local variable in native code, such as user-defined JNI code or JVM internal code;
3. JNI global: a global variable in native code;
4. Thread block: an object referenced from a currently active thread block;
5. Thread: a thread that has been started and has not stopped;
6. Busy monitor: an object on which wait() or notify() has been called, or which is used for synchronization. For synchronized methods, a static method locks the class and a non-static method locks the object;
7. Java local: a local variable, such as the input parameters of a method or variables created inside the method;
8. Native stack: input and output parameters of native code, such as those of file/network/IO methods and reflection;
9. Finalizable: an object waiting in a queue for its finalizer to run;
10. Unfinalized: an object that has a finalize method but has neither been finalized nor been placed in the finalizer queue;
11. Unreachable: an object that cannot be reached from any other root. MAT marks it as a root so that it is retained; otherwise it would not appear in the analysis;
12. Java stack frame: a Java stack frame, which holds local variables. It is only generated when the dump is parsed with the Preferences option that treats Java stack frames as objects;
13. Unknown: an object whose root type is unknown.

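There are several ways to obtain a dump file:
1. Dump at OOM: JVM parameter: -XX:+HeapDumpOnOutOfMemoryError
2. Dump in an interactive environment:
1) JVM parameter: -XX:+HeapDumpOnCtrlBreak (on JVMs that support this flag)
2) Using external tools: jmap -dump:format=b,file=<filename.hprof> <pid>
3) Using external tools: jconsole
4) Using external tools: MAT
5) kill -3 <pid> (this prints a thread dump; it produces a heap dump only in combination with -XX:+HeapDumpOnCtrlBreak)
6) jstack -l <pid> > <dumpfile> (this writes a thread dump, not a heap dump, and is useful for the thread analysis described below)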
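A dump can also be requested from inside the running application, which is not one of the methods listed above. The sketch below assumes a HotSpot-based JVM, because it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumpSketch and the file name sketch.hprof are placeholders.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Assumes a HotSpot JVM; other JVMs may not provide this MXBean.
final class HeapDumpSketch {

    // Writes an hprof heap dump to `file`. With liveOnly = true only reachable
    // objects are dumped, comparable to a dump taken after a full GC.
    static Path dumpHeap(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Files.deleteIfExists(file); // start from a clean target file
        bean.dumpHeap(file.toAbsolutePath().toString(), liveOnly);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path out = dumpHeap(Path.of("sketch.hprof"), true);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in MAT in the same way as a file produced by jmap.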

Here are some troubleshooting methods:
1. Top consumers. The biggest objects found through top consumers can be grouped by class, classloader and package;
2. Immediate dominators. Finding the accountable object through the immediate dominator is very helpful for quickly locating the holder of a group of objects. It directly answers the question "who keeps these objects alive" rather than "who holds a reference to these objects", which is more direct and efficient;
3. Classloader analysis. It matters for two reasons. First, applications can use different classloaders. Second, classes loaded by different classloaders are stored separately in the permanent generation and can in theory be collected. When the same class has been loaded by several classloaders, the number of instances under each loader indicates which loader is the important one, and the other loaders are candidates for collection;
4. Thread analysis. The heap dump contains thread information, and both an overview and the details of each thread can be viewed in MAT. The thread details show the heap memory held by the thread and the thread stack, along with the native stack of the operating system. If no heap dump is available and something is wrong with the system, how can we troubleshoot the problem through threads? First, run top -H -p <pid> to view the Java process in thread mode and locate the thread that consumes the most CPU or memory. Record the thread ID, run printf %x <tid> to convert it to hexadecimal, then run jstack -l <pid> > thread.log to dump the threads of the Java process. Search thread.log for the hexadecimal thread ID (jstack shows it in the nid field) to see which thread is using the system resources;
5. Java container class analysis. Because Java container classes are the most common place to store objects, they carry the highest risk of causing memory leaks. The issue can be examined from several perspectives:
1) Query the array fill ratio (the proportion of non-empty elements in the array). Print the frequency distribution of fill ratios for non-primitive arrays to check how arrays are utilized in the system;
2) Query arrays by size. Print a histogram of arrays grouped by size;
3) Query the collection fill ratio for ArrayList/HashMap/Hashtable/Properties/Vector/WeakHashMap/ConcurrentHashMap$Segment;
4) Print a histogram of collections grouped by size;
5) View all the objects in a list;
6) View all the objects in a hashmap;
7) View the objects in a hashset;
8) Check the map collision ratio;
9) Check all the arrays that are filled with a single constant value.
6. Finalizer analysis. 1) Query objects being processed by the finalizer; 2) Query objects to be processed by the finalizer; 3) View the finalizer thread directly; 4) View the local objects of the finalizer thread.
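The lookup step of the thread analysis in item 4 (convert the thread ID from top to hexadecimal, then search the jstack output for it) can be sketched in Java as follows. This assumes the jstack header format in which the native thread ID appears as nid=0x<hex>; the exact header layout differs between JDK versions, and the sample line in main is invented.

```java
import java.util.Optional;

// Hypothetical helper; the marker format "nid=0x<hex> " is an assumption
// about the jstack output being analyzed.
final class JstackLookupSketch {

    // Same conversion as `printf %x <tid>`, with the 0x prefix used in jstack output.
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // Returns the first line of the dump whose nid field matches the thread ID.
    // The trailing space in the marker prevents 0x1a2 from matching 0x1a2b.
    static Optional<String> findThread(String jstackOutput, long tid) {
        String marker = "nid=" + toNid(tid) + " ";
        return jstackOutput.lines()
                .filter(line -> line.contains(marker))
                .findFirst();
    }

    public static void main(String[] args) {
        String sample = "\"worker-1\" #12 prio=5 tid=0x00007f0000000002 nid=0x1a2b runnable\n";
        System.out.println(toNid(6699));                              // prints 0x1a2b
        System.out.println(findThread(sample, 6699).orElse("not found"));
    }
}
```

In practice the string passed to findThread would be the content of thread.log, and the lines following the matching header are the stack trace of the thread that top flagged.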