Comprehensive troubleshooting lets you quickly locate a faulty trace by searching on a business primary key. This function must be used together with the monitoring function. This topic describes best practices for comprehensive troubleshooting.
- An application monitoring job has been created in the ARMS console, and the ARMS Java Agent has been installed and started with the Java program for application monitoring. For details, see the ARMS Java Agent installation procedures in Manually connect common applications.
- arms-sdk-1.7.0.jar has been added to the program as a dependency.
Note: If you cannot obtain the dependency through the pom.xml file, download arms-sdk-1.7.0-SNAPSHOT.jar directly.
After the preceding prerequisites are met, use the following code to retrieve the TraceId and RpcId:
Span span = Tracer.builder().getSpan();
String traceId = span.getTraceId();
String rpcId = span.getRpcId();
After retrieving the TraceId and RpcId, write them to your business logs as needed. The following shows sample business logs containing TraceId and RpcId. The logs are written to /home/admin/logs/example/example.log. You can also send logs to SLS, MQ, or other channels.
Each of the preceding business logs represents a trajectory of the user.
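As a rough illustration of how the TraceId and RpcId can be carried in a business log line, the following minimal sketch builds one line from the business time, action, username, TraceId, and RpcId. The pipe-delimited layout, the class name BusinessLogLine, and the sample ID values are assumptions made for this example only; they are not an ARMS requirement. In a real application the two IDs would come from span.getTraceId() and span.getRpcId().

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class BusinessLogLine {
    // Business time format used in this sketch (an assumption, not mandated by ARMS).
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Builds one log line in the hypothetical layout:
    // time|action|username|traceId|rpcId
    static String format(LocalDateTime businessTime, String action, String username,
                         String traceId, String rpcId) {
        return String.join("|", businessTime.format(TS), action, username, traceId, rpcId);
    }

    public static void main(String[] args) {
        // Placeholder IDs; in production, read them from the ARMS Span.
        String traceId = "0a1b2c3d4e5f";
        String rpcId = "0.1";
        String line = format(LocalDateTime.of(2018, 7, 12, 14, 20, 5),
                "createOrder", "kevin.yang", traceId, rpcId);
        // prints: 2018-07-12 14:20:05|createOrder|kevin.yang|0a1b2c3d4e5f|0.1
        System.out.println(line);
    }
}
```

Keeping the business time as a separate field makes it possible to select business time rather than system time when the event set is configured.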
Create a custom monitoring job by following the instructions in the topic “Custom monitoring”. Use the preceding sample business logs as the data source, and split the logs in custom mode as shown in the following figure.
Then create a comprehensive troubleshooting event set and configure it as shown in the following figure.
Business Primary Key: indicates the field used to search for business events. In this example, the business primary keys are action and username.
Select a time field: Select business time rather than system time.
TraceID: Set this as required.
RpcID: Set this as required.
After configuring the event set, start the custom monitoring job.
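To make the field mapping concrete, the following sketch splits one log line into the fields that the event set refers to: the time field, the business primary keys (action and username), the TraceID, and the RpcID. It runs locally and only mimics the idea of custom log splitting; the pipe-delimited layout, the field order, and the class name BusinessEventParser are assumptions for illustration, not the ARMS splitting logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BusinessEventParser {
    // Field order of the hypothetical layout: time|action|username|traceId|rpcId
    private static final String[] FIELDS = {"time", "action", "username", "traceId", "rpcId"};

    // Splits one log line into named fields; rejects lines with the wrong field count.
    static Map<String, String> parse(String line) {
        String[] parts = line.split("\\|", -1);
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException(
                    "expected " + FIELDS.length + " fields, got " + parts.length);
        }
        Map<String, String> event = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            event.put(FIELDS[i], parts[i].trim());
        }
        return event;
    }

    public static void main(String[] args) {
        Map<String, String> event =
                parse("2018-07-12 14:20:05|createOrder|kevin.yang|0a1b2c3d4e5f|0.1");
        // prints: kevin.yang createOrder 0a1b2c3d4e5f
        System.out.println(event.get("username") + " " + event.get("action")
                + " " + event.get("traceId"));
    }
}
```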
In this example, the business logs record the user's trajectory on a shopping website. Assume that user kevin.yang complained that he failed to place an order after 14:20 on July 12, 2018. You can identify the cause using either of the following methods:
Method 1: Query traces.
- In the left-side navigation pane, choose Multi-dimensional Query. On the Instances page, click the Trace Query tab.
Enter the date range in Parameter Value for Date, and select Business Primary Key from the Parameter Name drop-down list below. In the right-side Parameter Value text box, enter the business primary key value, for example, username:kevin.yang. Then click Search. All traces within the specified time range are displayed in the search results.
Click the TraceId of an abnormal trace in the search results, and then click the Business Trajectory tab. All business events corresponding to the TraceId are displayed. Identify the causes based on the business events.
Method 2: Query comprehensive troubleshooting events.
- In the left-side navigation pane, choose Multi-dimensional Query, and click the Event Query tab.
Enter the date range in Parameter Value for Date, and select the previously configured event set from the Comprehensive Troubleshooting Event Set drop-down list. Then click Search. All traces within the specified time range are displayed in the search results.
Click Trace Query in the search results, and then click the Business Trajectory tab. All business events corresponding to the TraceId are displayed. Identify the causes based on the business events.
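Conceptually, both methods narrow the business events by business primary key and time range and then group them by TraceId. The following local sketch mimics that idea on a handful of log lines; it is not how ARMS implements the query. The layout time|action|username|traceId|rpcId, the class name TrajectoryLookup, and the sample lines are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TrajectoryLookup {
    // Returns traceId -> matching log lines for one user at or after fromTime.
    // Assumes "yyyy-MM-dd HH:mm:ss" timestamps, which compare correctly as strings.
    static Map<String, List<String>> byTrace(List<String> lines, String username,
                                             String fromTime) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split("\\|", -1);
            if (f.length != 5) {
                continue; // skip lines that do not match the assumed layout
            }
            if (!f[2].equals(username) || f[0].compareTo(fromTime) < 0) {
                continue; // different user, or earlier than the time range
            }
            result.computeIfAbsent(f[3], k -> new ArrayList<>()).add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList(
                "2018-07-12 14:05:10|login|kevin.yang|trace-a|0",
                "2018-07-12 14:21:03|addToCart|kevin.yang|trace-b|0",
                "2018-07-12 14:21:40|createOrder|kevin.yang|trace-b|0.1",
                "2018-07-12 14:22:00|createOrder|other.user|trace-c|0");
        Map<String, List<String>> hits = byTrace(logs, "kevin.yang", "2018-07-12 14:20:00");
        // prints: [trace-b]
        System.out.println(hits.keySet());
    }
}
```

In this sample, the two events that share trace-b form the business trajectory to inspect for the failed order.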