Application Real-Time Monitoring Service: Use flame graphs to locate performance bottlenecks

Last Updated: Apr 27, 2025

Application Real-Time Monitoring Service (ARMS) provides a continuous profiling feature that generates flame graphs. You can use the flame graphs to find the root cause of performance bottlenecks such as high CPU utilization, high memory usage, and latency spikes.

What is a flame graph

A flame graph is a visual profiling tool that graphically represents call stack hierarchies and their execution time distribution, enabling developers to identify performance bottlenecks.

A flame graph consists of an x-axis, a y-axis, and multiple boxes. Each box represents a function in the call stack. The width of a box along the x-axis indicates the proportion of the profiled resource, such as CPU time, that the function uses. The position of a box along the y-axis indicates the depth of the function in the call stack. By comparing flame graphs captured at different points in time, you can efficiently diagnose and resolve the performance bottlenecks of a program.
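
To make the two axes concrete, the following minimal sketch shows how box widths and depths could be derived from sampled call stacks. The class name FlameWidths, the sample data, and the semicolon-separated stack format are assumptions made for illustration. The sketch does not describe how ARMS builds its graphs internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ARMS: derives box widths from stack samples.
class FlameWidths {

    // Each sample is one captured call stack, root first, frames separated by ';'.
    // Returns, for every stack prefix (one box in the graph), the fraction of
    // all samples that pass through it. That fraction is the box width.
    static Map<String, Double> widths(List<String> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sample : samples) {
            StringBuilder prefix = new StringBuilder();
            for (String frame : sample.split(";")) {
                if (prefix.length() > 0) {
                    prefix.append(';');
                }
                prefix.append(frame);
                counts.merge(prefix.toString(), 1, Integer::sum);
            }
        }
        Map<String, Double> widths = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            widths.put(e.getKey(), e.getValue() / (double) samples.size());
        }
        return widths;
    }

    // Depth is the number of callers between the frame and the root of the stack.
    static int depth(String prefix) {
        return prefix.split(";").length - 1;
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "main;handle;readFile",
                "main;handle;readFile",
                "main;handle;readFile",
                "main;handle;invokeAPI");
        widths(samples).forEach((box, width) ->
                System.out.printf("depth=%d width=%.2f %s%n", depth(box), width, box));
    }
}
```

In this sample data, three of the four stacks end in readFile, so its box spans three quarters of the graph, while main appears in every sample and spans the full width.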

Categories

Flame graphs are classified into two categories: flame graphs in the narrow sense and icicle graphs. In a flame graph in the narrow sense, the stack top is at the top of the graph and the stack bottom is at the bottom, as shown in Figure 1. In an icicle graph, the orientation is inverted: the stack top is at the bottom of the graph and the stack bottom is at the top, as shown in Figure 2.

Figure 1. Flame graph (narrow sense)

image

Figure 2. Icicle graph

image

Use a flame graph

Because a flame graph represents call stacks, a function with a wide box consumes more of the profiled resource, such as CPU time, than a function with a narrow box.

In computer science, a stack is an abstract data type that serves as a collection of elements with two main operations: push and pop. A push operation adds an element to the top of the stack, and a pop operation removes the element at the top of the stack. In a call stack, the stack bottom contains the functions that are called first, and the stack top contains the child functions that are called most recently. When the child function at the stack top finishes executing, it is removed from the stack. The longer a function takes to execute, the more time is attributed to its parent function and the wider the box of the parent function becomes, as shown in the following figure.

image
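
The relationship between the time of a child function and the box width of its parent can be sketched as follows. The ProfiledFrame class and the timings are hypothetical and chosen only for illustration. The sketch assumes that the width of a box corresponds to total time, that is, the time spent in the function's own code plus the total time of every function it calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one box in a flame graph.
class ProfiledFrame {
    final String name;
    final double selfSeconds; // time spent in the function's own code
    final List<ProfiledFrame> children = new ArrayList<>();

    ProfiledFrame(String name, double selfSeconds) {
        this.name = name;
        this.selfSeconds = selfSeconds;
    }

    // Registers a callee and returns this frame so that calls can be chained.
    ProfiledFrame call(ProfiledFrame child) {
        children.add(child);
        return this;
    }

    // Total (inclusive) time: own time plus the total time of all callees.
    double totalSeconds() {
        double total = selfSeconds;
        for (ProfiledFrame child : children) {
            total += child.totalSeconds();
        }
        return total;
    }

    public static void main(String[] args) {
        ProfiledFrame leaf = new ProfiledFrame("node", 3.0);
        ProfiledFrame middle = new ProfiledFrame("get", 0.5).call(leaf);
        ProfiledFrame root = new ProfiledFrame("readFile", 0.25).call(middle);
        System.out.println(root.name + " total seconds = " + root.totalSeconds());
    }
}
```

Although the hypothetical readFile frame spends only 0.25 seconds in its own code, its total is 3.75 seconds because the time of its callees is included. This is why a slow function at the stack top widens every box beneath it in the stack.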

You can perform the following steps to analyze a flame graph:

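  1. Determine where the stack top is based on the flame graph type: at the top of a flame graph in the narrow sense, or at the bottom of an icicle graph.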

  2. If the total resource usage of the flame graph is high, check whether the stack top has wide boxes.

  3. If the stack top has a wide box, search from the stack top toward the stack bottom, find the first method defined by the application, and then check whether the method can be optimized.
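
Steps 2 and 3 can be sketched in code as follows. This is a minimal illustration, not an ARMS API: the FirstAppFrame class, the com.example. package prefix, the method names, and the resource shares are all assumptions. Each stack is listed from the stack top (the most recent callee) to the stack bottom.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper illustrating steps 2 and 3 of the analysis.
class FirstAppFrame {

    // Step 2: pick the stack whose top box has the largest resource share.
    // Throws NoSuchElementException if the map is empty.
    static List<String> widest(Map<List<String>, Double> shareByStack) {
        return Collections.max(shareByStack.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Step 3: walk from the stack top toward the stack bottom and return the
    // first frame that belongs to the application package.
    static Optional<String> firstApplicationMethod(List<String> stackTopFirst,
                                                   String appPackagePrefix) {
        for (String frame : stackTopFirst) {
            if (frame.startsWith(appPackagePrefix)) {
                return Optional.of(frame);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<List<String>, Double> shareByStack = Map.of(
                List.of("java.util.LinkedList.node(int)",
                        "java.util.LinkedList.get(int)",
                        "com.example.app.ReportService.load()"), 0.7,
                List.of("java.net.SocketInputStream.read(byte[])",
                        "com.example.app.ApiClient.invoke()"), 0.2);
        List<String> stack = widest(shareByStack);
        System.out.println(firstApplicationMethod(stack, "com.example."));
    }
}
```

In practice, the prefix would be the root package of your own application, so that JDK and third-party library frames are skipped.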

Example

The following figure shows a flame graph with high resource usage. Enable the continuous profiling feature and perform the following steps to discover performance bottlenecks.

image

  1. Because the graph is an icicle graph, with the stack top at the bottom and the stack bottom at the top, you need to analyze it from the bottom up.

  2. The java.util.LinkedList.node(int) method on the right side of the stack top has a wide box.

  3. Because the java.util.LinkedList.node(int) method is a library function of Java Development Kit (JDK), you need to search up further. You can find the java.util.LinkedList.get(int) method and its parent method com.alibaba.cloud.pressure.memory.HotSpotAction.readFile(). As the first service method defined by the application, the com.alibaba.cloud.pressure.memory.HotSpotAction.readFile() method consumes 3.89 seconds, accounting for 76.06% of the stack. Therefore, you can conclude that the com.alibaba.cloud.pressure.memory.HotSpotAction.readFile() method consumes a large amount of resources in the specified time period. Use this finding as a starting point: review the logic of the method and of the methods that it calls, and check whether they can be optimized.

    In addition, if you start from the java.net.SocketInputStream frames in the lower-left corner of the flame graph and search up, you can find that the first parent method defined by the application is com.alibaba.cloud.pressure.memory.HotSpotAction.invokeAPI, which accounts for about 23% of the stack.
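
The body of readFile() is not shown here, so the following is only a hypothetical illustration of a pattern that commonly produces a wide java.util.LinkedList.node(int) box: index-based access on a LinkedList. Each get(int) call walks the list node by node, so a loop over n elements does work roughly proportional to n squared, whereas iterating with the list's iterator visits each node once. The LinkedListAccess class and its method names are assumptions made for this sketch.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical example; not the actual HotSpotAction.readFile() implementation.
class LinkedListAccess {

    // Index-based loop: on a LinkedList, every get(i) traverses from one end
    // of the list, which appears in a profile as
    // LinkedList.get(int) -> LinkedList.node(int).
    static long sumByIndex(List<Integer> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // Enhanced for loop: uses the iterator, so each node is visited once.
    static long sumByIterator(List<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new LinkedList<>();
        for (int i = 0; i < 50_000; i++) {
            values.add(i);
        }

        long start = System.nanoTime();
        long byIndex = sumByIndex(values);
        long indexNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long byIterator = sumByIterator(values);
        long iteratorNanos = System.nanoTime() - start;

        System.out.println("same result: " + (byIndex == byIterator));
        System.out.println("index loop ns: " + indexNanos
                + ", iterator loop ns: " + iteratorNanos);
    }
}
```

If a review of the flagged method reveals this pattern, possible remedies include iterating instead of indexing, or switching to a list type with constant-time index access such as ArrayList. Whether either applies depends on the actual code.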

References

When you use the continuous profiling feature: