Application Real-Time Monitoring Service: Use the code diagnostics feature to diagnose slow traces

Last Updated: Apr 11, 2024

The code diagnostics feature of Application Real-Time Monitoring Service (ARMS) uses continuous profiling to periodically collect method stack snapshots of threads and, based on these snapshots, simulate how the code was executed.
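
The idea of periodic stack sampling can be illustrated with a minimal sketch. The code below is not the ARMS agent implementation; the class name StackSampler and its parameters are hypothetical and chosen only for illustration. It uses the standard Thread.getAllStackTraces() call to capture snapshots and counts how often each method appears at the top of a stack.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sampler (hypothetical, not ARMS code): periodically captures
// stack snapshots of all live threads and counts the top-of-stack methods.
class StackSampler {

  // Takes `samples` snapshots, `intervalMillis` apart. A method that shows up
  // in many snapshots is likely where threads spend most of their time.
  static Map<String, Integer> sample(int samples, long intervalMillis) {
    Map<String, Integer> hits = new HashMap<>();
    for (int i = 0; i < samples; i++) {
      for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
        if (stack.length == 0) {
          continue; // Thread has no Java frames at this moment.
        }
        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(top, 1, Integer::sum);
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Preserve the interrupt and stop sampling.
        break;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    sample(5, 10).forEach((method, count) -> System.out.println(count + "  " + method));
  }
}
```

A production profiler keeps the full stack of each snapshot rather than only the top frame, which is what makes a flame graph possible.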

Scenarios

  • If slow calls occur during a promotional event, the code diagnostics feature can quickly locate the faulty code.

  • If the system encounters a large number of slow calls, the code diagnostics feature can automatically record the faulty code so that you can analyze it later.

  • If your business is too complex to reproduce occasional slow calls, the code diagnostics feature can simulate code execution and method calls.

  • If methods at non-framework layers are not instrumented and are therefore missing from traces, the code diagnostics feature helps you recover the time consumed by these uninstrumented methods.

Prerequisites

  • The version of the ARMS agent is 3.1.4 or later.

  • The operating system kernel and JDK version meet the requirements. The code diagnostics feature depends on the continuous profiling technology. For more information, see Use the continuous profiling feature.

  • Only synchronous traces support the code diagnostics feature. Asynchronous traces do not support the feature.

Enable the code diagnostics feature

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.

  2. On the Applications page, select a region in the top navigation bar and click the name of the application that you want to manage.

    Note

    If the Java icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.

  3. In the left-side navigation pane, click Application Settings. On the page that appears, click the Custom Configuration tab.

  4. In the Continuous profiling section, turn on Main switch and Code hotspot, and then configure the IP address of an application instance or the CIDR blocks of multiple instances.

  5. In the lower part of the tab, click Save.

    The modification takes effect without the need to restart the application.

View hotspot code data on the Interface Invocation page

Example: Parse and traverse JSON data and call downstream HTTP interfaces.

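import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.io.InputStreamReader;
import java.util.LinkedList;
import org.springframework.web.client.RestTemplate;

// AbsAction and Movie are application classes that are not shown in this example.
public class HotSpotAction extends AbsAction {

  // Gson instance used by readFile(); declared here on the assumption that
  // AbsAction does not already provide one.
  private static final Gson GSON = new Gson();

  private RestTemplate restTemplate = new RestTemplate();

  // The request method.
  @Override
  public void runBusiness() {
    readFile();
    invokeAPI();
  }

  // Execute an HTTP call.
  private void invokeAPI() {
    String url = "https://httpbin.org/get";
    String response = restTemplate.getForObject(url, String.class);
  }

  // Read and parse the file data.
  private double readFile() {
    InputStreamReader reader = new InputStreamReader(
        ClassLoader.getSystemResourceAsStream("data/xxx.json"));
    LinkedList<Movie> movieList = GSON.fromJson(reader, new TypeToken<LinkedList<Movie>>() {
    }.getType());
    double totalCount = 0;
    // Index-based access: on a LinkedList, each get(i) walks the list to reach element i.
    for (int i = 0; i < movieList.size(); i++) {
      totalCount += movieList.get(i).rating();
    }
    return totalCount;
  }
}
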
  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Applications.

  2. On the Applications page, select a region in the top navigation bar and click the name of the application that you want to manage.

    Note

    If the Java icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.

  3. In the left-side navigation pane, click Interface Invocation. On the page that appears, select an interface and click the Interface Snapshot tab.

  4. On the Interface Snapshot tab, click a trace ID.

  5. Click the Magnifier icon in the Details column and then click the Code Hotspot tab.

    image

    The left side of the figure lists the time consumed by each method involved, and the right side shows a flame graph built from the stack information of these methods.

    • The Self column displays the time or resources that each method consumes itself, excluding the time or resources consumed by its child methods. Use this data to identify methods that are expensive on their own.

    • The Total column displays the time or resources that each method consumes, including the time or resources consumed by all of its child methods. Use this data to identify methods that contribute the most time or resources overall.
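
The relationship between the two columns can be sketched as follows. This is only an illustration of the definitions above, not ARMS code; the ProfileNode class and the millisecond values are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical call-tree node: Total = Self + the Total of every child method.
class ProfileNode {
  final String method;
  final long selfMillis; // Time spent in the method body itself.
  final List<ProfileNode> children = new ArrayList<>();

  ProfileNode(String method, long selfMillis) {
    this.method = method;
    this.selfMillis = selfMillis;
  }

  ProfileNode add(ProfileNode child) {
    children.add(child);
    return this;
  }

  // Recursively sums the method's own time and the total time of its children.
  long totalMillis() {
    long total = selfMillis;
    for (ProfileNode child : children) {
      total += child.totalMillis();
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up values: the leaf has a large Self, the caller has a large Total.
    ProfileNode leaf = new ProfileNode("leaf()", 100);
    ProfileNode middle = new ProfileNode("middle()", 5).add(leaf);
    ProfileNode root = new ProfileNode("root()", 10).add(middle);
    System.out.println(root.method + " self=" + root.selfMillis
        + " total=" + root.totalMillis());
  }
}
```

In this sketch, root() has a small Self value but a large Total value, which indicates that the cost lies in a method it calls rather than in its own body.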

    To locate time-consuming methods, focus on the Self column or on the wide frames at the bottom of the flame graph on the right. A wide frame generally indicates a performance bottleneck. For example, the java.lang.Thread.sleep() method in the preceding figure appears as a wide frame because it consumes a large amount of time.

    Based on the preceding figure, perform the following analysis.

    1. Sort the Self column and find the method with the largest value, java.util.LinkedList.node(int). Click the method to display the related methods in the flame graph.

      image

    2. In the flame graph, the java.util.LinkedList.node(int) method has the widest box at the top of the stack.

    3. Because java.util.LinkedList.node(int) is a library method of the Java Development Kit (JDK), move up the call chain to find its callers: java.util.LinkedList.get(int) and then com.alibaba.cloud.pressure.memory.HotSpotAction.readFile(). The readFile() method is the first business method defined by the application in this stack. It consumes 3.75 seconds, which accounts for 69.88% of the stack. You can therefore conclude that com.alibaba.cloud.pressure.memory.HotSpotAction.readFile() consumes a large amount of resources in the specified time period. Review the logic of this method to check whether it can be optimized. In this example, readFile() calls movieList.get(i) in a loop, and each get(i) call on a LinkedList traverses the list to reach index i, so the cost of the loop grows quadratically with the list size.
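
One possible optimization, shown here as a minimal sketch rather than as the required fix, is to traverse the list with an iterator instead of calling get(i). The RatingSum class and its method names are hypothetical, and a list of Double values stands in for the Movie objects of the example.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical comparison of two ways to sum the values in a LinkedList.
class RatingSum {

  // Pattern used by readFile(): each get(i) walks the list to reach index i,
  // so the whole loop is O(n^2) on a LinkedList.
  static double sumByIndex(List<Double> ratings) {
    double total = 0;
    for (int i = 0; i < ratings.size(); i++) {
      total += ratings.get(i);
    }
    return total;
  }

  // The enhanced for loop uses the list iterator and visits each node once: O(n).
  static double sumByIterator(List<Double> ratings) {
    double total = 0;
    for (double rating : ratings) {
      total += rating;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Double> ratings = new LinkedList<>();
    for (int i = 0; i < 20_000; i++) {
      ratings.add(1.0);
    }
    long start = System.nanoTime();
    double byIndex = sumByIndex(ratings);
    long middle = System.nanoTime();
    double byIterator = sumByIterator(ratings);
    long end = System.nanoTime();
    System.out.println("index:    " + byIndex + " in " + (middle - start) / 1_000_000 + " ms");
    System.out.println("iterator: " + byIterator + " in " + (end - middle) / 1_000_000 + " ms");
  }
}
```

Both methods return the same sum. Deserializing into an ArrayList instead of a LinkedList would also make get(i) a constant-time operation, if the rest of the application does not depend on LinkedList.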

FAQ

  • Why is the time consumed for the code less than that for the request?

    To minimize the impact of the code diagnostics feature on application performance, data collection is optimized. As a result, the time displayed is less than the actual time. Generally, the deviation is no more than 20 milliseconds. We recommend that you ignore the deviation of absolute values and focus on the methods that consume the longest time.

  • Do the statistics of code diagnostics have limitations?

    • For requests that take more than 15 minutes, the code diagnostics feature provides only the analysis data for the first 15 minutes.

    • To reduce system overheads, Application Monitoring does not collect code data from requests that consume less than 500 milliseconds. Therefore, the Code Hotspot tab is not available for all requests.
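
The two limits above can be summarized in a small sketch. It only restates the documented thresholds and is not ARMS code; the HotspotLimits class and its method name are hypothetical.

```java
// Hypothetical helper that restates the documented limits of code diagnostics.
class HotspotLimits {
  static final long MIN_REQUEST_MILLIS = 500;              // Shorter requests are not collected.
  static final long MAX_ANALYZED_MILLIS = 15 * 60 * 1000L; // Only the first 15 minutes are analyzed.

  // Returns how many milliseconds of a request are covered by hotspot data.
  // 0 means that no hotspot data is available for the request.
  static long analyzedWindowMillis(long requestMillis) {
    if (requestMillis < MIN_REQUEST_MILLIS) {
      return 0;
    }
    return Math.min(requestMillis, MAX_ANALYZED_MILLIS);
  }

  public static void main(String[] args) {
    System.out.println(analyzedWindowMillis(300));            // Below the 500 ms threshold.
    System.out.println(analyzedWindowMillis(2_000));          // Fully covered.
    System.out.println(analyzedWindowMillis(20 * 60 * 1000L)); // Capped at 15 minutes.
  }
}
```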