Use the continuous profiling feature of Application Monitoring - Application Real-Time Monitoring Service

The continuous profiling feature of Application Monitoring helps you identify bottlenecks caused by CPU, memory, or I/O in Java programs, and displays statistics by method name, class name, and line number. This helps developers optimize programs, reduce latency, increase throughput, and save costs. This topic describes how to enable the continuous profiling feature and how to view the profiling data.

The overhead of the continuous profiling feature has been measured in a performance test. With all continuous profiling features enabled for a mainstream Spring Web application, CPU utilization increased by about 5%, off-heap memory usage increased by about 50 MB, and garbage collection (GC) and request latency barely increased.

Prerequisites

Important

Only Application Monitoring Pro Edition supports the continuous profiling feature. For information about how to activate the Pro Edition, see Pay-as-you-go.
The continuous profiling feature is not supported in some regions of Alibaba Finance Cloud and Alibaba Gov Cloud. If you want to use the feature in these regions, join the DingTalk group (ID: 22560019672) to obtain technical support.

Your application is monitored by Application Monitoring. The version of the ARMS agent is 2.7.3.5 or later. For information about how to monitor applications in Application Monitoring, see Application Monitoring overview. For information about how to upgrade the ARMS agent, see Upgrade the ARMS agent.
An Object Storage Service (OSS) bucket is specified in the access control policy of the virtual private cloud (VPC) where the application resides. The OSS bucket is used to store profiling data. If the bucket is not specified in the policy, the data cannot be collected. The name of the OSS bucket is arms-profiling-<regionId>. Replace <regionId> with the region ID. For example, if your application is deployed in the China (Hangzhou) region, the bucket name is arms-profiling-cn-hangzhou.
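The naming rule above can be expressed as a small helper, for example when you check that the VPC access control policy lists the right bucket. The following Python sketch is for illustration only. The function name profiling_bucket_name is hypothetical and is not part of any ARMS or OSS SDK.

```python
def profiling_bucket_name(region_id: str) -> str:
    """Build the profiling bucket name from a region ID such as 'cn-hangzhou'."""
    region_id = region_id.strip()
    if not region_id:
        raise ValueError("region_id must not be empty")
    # Naming rule from this topic: arms-profiling-<regionId>
    return f"arms-profiling-{region_id}"


print(profiling_bucket_name("cn-hangzhou"))  # arms-profiling-cn-hangzhou
```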

Limits

Operating system kernel

The operating system kernel must be Linux 2.6.32-431.23.3.el6.x86_64 or later.

Note

You can run the uname -r command to query the kernel version.
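If you need to check many hosts, the comparison against the minimum kernel version can be scripted. The following Python sketch is a heuristic, not an official check: it compares only the leading numeric fields of the release string and ignores distribution suffixes such as el6. The function names kernel_key and kernel_supported are hypothetical.

```python
import platform
import re

# Minimum kernel release stated in this topic.
MIN_KERNEL = "2.6.32-431.23.3.el6.x86_64"


def kernel_key(release: str) -> tuple:
    """Return the leading numeric fields of a kernel release as a tuple of ints."""
    fields = []
    for token in re.split(r"[.-]", release):
        if not token.isdecimal():
            break  # stop at the first non-numeric token, such as 'el6'
        fields.append(int(token))
    return tuple(fields)


def kernel_supported(release: str) -> bool:
    """Heuristic: compare the numeric fields against those of the minimum release."""
    return kernel_key(release) >= kernel_key(MIN_KERNEL)


if __name__ == "__main__":
    current = platform.release()  # on Linux, the value that `uname -r` prints
    print(current, kernel_supported(current))
```

Because distributions format release strings differently, treat a negative result as a prompt to check the kernel version manually rather than as a definitive answer.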

JDK version

The continuous profiling feature uses the Java Virtual Machine Tool Interface (JVM TI) to obtain the method stacks of an application, which provides CPU utilization and memory usage details at runtime. JVM TI may cause application crashes. This issue is fixed in the following JDK versions: OpenJDK 8u352, 11.0.17, and 17.0.5, and Oracle JDK 11.0.21 and 17.0.9. For more information, see AsyncGetCallTrace may crash JVM on guarantee. For other JDK versions, the Application Monitoring R&D team has run various tests and found that application crashes occur only in a minority of scenarios, so Application Monitoring does not forcibly disable the continuous profiling feature. You can enable the feature for specific application IP addresses when necessary. Nevertheless, to ensure the stability of your application, we recommend that you use a JDK version that meets the requirements.

The continuous profiling feature depends heavily on the debug symbols in JDKs. However, Alpine base images remove the debug symbols from JDKs to control memory usage. As a result, the continuous profiling feature cannot work as expected on these images. We recommend that you do not use Alpine base images.

The following table lists the recommended JDK versions.

JDK	Recommended versions
OpenJDK	8u352 or later, 11.0.17 or later, 17.0.5 or later
Oracle JDK	11.0.21 or later, 17.0.9 or later
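The recommended versions can also be checked programmatically, for example against the value of the java.version system property. The following Python sketch is an illustration under stated assumptions: the vendor keys "openjdk" and "oracle" and the function names are hypothetical, and release lines that the table does not list are reported as not recommended.

```python
import re

# Minimum recommended versions from the table, keyed by (vendor, major release).
MINIMUMS = {
    ("openjdk", 8): (8, 0, 352),
    ("openjdk", 11): (11, 0, 17),
    ("openjdk", 17): (17, 0, 5),
    ("oracle", 11): (11, 0, 21),
    ("oracle", 17): (17, 0, 9),
}


def parse_java_version(version: str) -> tuple:
    """Parse strings such as '1.8.0_352', '8u352', or '11.0.17+8' into int tuples."""
    legacy = re.match(r"1\.8\.0(?:_(\d+))?", version) or re.match(r"8u(\d+)", version)
    if legacy:
        return (8, 0, int(legacy.group(1) or 0))
    modern = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if not modern:
        raise ValueError(f"unrecognized Java version: {version!r}")
    return tuple(int(part or 0) for part in modern.groups())


def is_recommended_jdk(vendor: str, version: str) -> bool:
    """Return True if the version meets the table minimum for its release line."""
    parsed = parse_java_version(version)
    minimum = MINIMUMS.get((vendor.lower(), parsed[0]))
    # Release lines that the table does not list are reported as not recommended.
    return minimum is not None and parsed >= minimum


print(is_recommended_jdk("openjdk", "1.8.0_342"))  # False
print(is_recommended_jdk("oracle", "17.0.9"))      # True
```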

Enable the continuous profiling feature in the ARMS console

Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.
On the Applications page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note
If an icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the left-side navigation pane, click Application Settings. On the page that appears, click the Custom Configuration tab.
In the Continuous profiling section of the Custom Configuration tab, turn on Main Switch, and configure the IP white list or Network segment address parameter.
On the Custom Configuration tab, click Save.
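Before you save the configuration, it can help to reason about which instances an IP whitelist or network segment would cover. This topic does not specify the format of these parameters, so the following Python sketch assumes individual IP addresses for the whitelist and CIDR notation for network segments. It illustrates the matching idea only and does not describe how ARMS evaluates the settings. The function name profiling_applies is hypothetical.

```python
import ipaddress


def profiling_applies(instance_ip, ip_whitelist=(), network_segments=()):
    """Return True if instance_ip is whitelisted or falls inside a network segment."""
    addr = ipaddress.ip_address(instance_ip)
    # Exact match against individually listed addresses.
    if any(addr == ipaddress.ip_address(ip) for ip in ip_whitelist):
        return True
    # Membership test against each segment (assumed to be in CIDR notation).
    return any(
        addr in ipaddress.ip_network(segment, strict=False)
        for segment in network_segments
    )


print(profiling_applies("10.0.1.15", ["192.168.0.8"], ["10.0.1.0/24"]))  # True
print(profiling_applies("10.0.2.15", ["192.168.0.8"], ["10.0.1.0/24"]))  # False
```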

View profiling data

Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.
On the Applications page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note
If an icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the left-side navigation pane, click Continuous profiling.
In the application instance list, select the application instance. On the right side of the page, set the time period.
On the Single View tab in the right-side pane, perform the following operations to query data and view aggregation analysis results.
1. In the Time window size section (icon 1), select a snapshot duration, and drag on the line chart to select a time period.
2. From the drop-down list (icon 2), select the data that you want to view, including data about CPU, JVM heap, and JVM GC.
3. As shown in the figure, data within the time period (icon 3) is displayed. You can click Aggregation & Analysis to view the snapshot details.
  Figure 1. Performance Analytics
  - The Self column displays the time or resources that each method itself consumes within the method stack, excluding the time or resources consumed by its child methods. Use this data to identify methods that spend excessive time or resources on their own.
  - The Total column displays the time or resources that each method consumes, including the time or resources consumed by all of its child methods. Use this data to identify methods that account for the most time or resources overall.
  When you analyze hotspot code, you can locate time-consuming methods by focusing on the Self column or on the wide flames at the bottom of the flame graph on the right. In most cases, a wide flame indicates a system performance bottleneck.
  Figure 2. Metrics
  Figure 3. Snapshots
On the Comparison View tab, you can compare and analyze data generated in two time periods.
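The difference between the Self and Total columns on the Single View tab can be illustrated with a few stack samples. The following Python sketch is a simplified model and not the ARMS implementation: it assumes that each sample is a list of method names ordered from the outermost caller to the method that was running, and it counts samples instead of time or bytes. All method names are hypothetical.

```python
from collections import Counter


def self_and_total(samples):
    """Map each method to (self, total) sample counts."""
    self_counts = Counter()
    total_counts = Counter()
    for stack in samples:
        if not stack:
            continue
        # Self: the sample is charged only to the method at the top of the stack.
        self_counts[stack[-1]] += 1
        # Total: the sample is charged to every method on the stack,
        # counting a recursive method once per sample.
        for method in set(stack):
            total_counts[method] += 1
    return {m: (self_counts[m], total_counts[m]) for m in total_counts}


samples = [
    ["handleRequest", "parseJson"],
    ["handleRequest", "queryDb"],
    ["handleRequest", "queryDb"],
    ["handleRequest", "queryDb", "serialize"],
    ["handleRequest"],
]

result = self_and_total(samples)
print(result["handleRequest"])  # (1, 5)
print(result["queryDb"])        # (2, 3)
```

In this example, handleRequest has the largest Total value because every sample passes through it, while queryDb has the largest Self value and is therefore the first method to inspect.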

Use the hotspot code feature

In addition to the continuous profiling feature, you can enable the hotspot code analysis feature, which uses continuous profiling technology to regularly collect stack snapshots of request threads and simulate code execution. For more information, see Use the code diagnostics feature to diagnose slow traces.