On the Trace Details page, you can query the details of a trace by its trace ID in the selected region.
The Trace Details page shows traces of methods that are called remotely. It does not display methods that are called locally.
Trace details help you locate the elapsed time and exceptions of each step in a distributed call. Local calls are not the focus of tracing, so we recommend that you check the service logs for the elapsed time and exceptions of local calls. For example, if the local method methodA() calls localMethodB() and localMethodC(), the Trace Details page does not display those local calls. As a result, the elapsed time on a parent node can be greater than the total elapsed time on all of its subnodes.
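The following Python sketch illustrates why a parent node can take longer than the sum of its subnodes. It is a toy model, not how the EDAS agent is implemented. Only calls that go through the hypothetical `remote_call` helper record a child span. Calls through `local_call` add time to the parent without leaving a trace. The service names and durations are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One node on the Trace Details page (toy model)."""
    name: str
    duration_ms: float = 0.0
    children: List["Span"] = field(default_factory=list)


def remote_call(parent: Span, name: str, cost_ms: float) -> float:
    # Hypothetical instrumented remote call (HSF, Dubbo, database, ...):
    # it is recorded as a child span of the caller.
    parent.children.append(Span(name, cost_ms))
    return cost_ms


def local_call(cost_ms: float) -> float:
    # Local method call: nothing is recorded, so it never appears as a subnode.
    return cost_ms


def method_a() -> Span:
    root = Span("methodA()")
    elapsed = 0.0
    elapsed += remote_call(root, "UserService.getUser", 40)  # hypothetical HSF call
    elapsed += local_call(150)                                # localMethodB()
    elapsed += local_call(60)                                 # localMethodC()
    elapsed += remote_call(root, "SELECT orders", 30)        # hypothetical DB call
    root.duration_ms = elapsed
    return root


def untraced_ms(span: Span) -> float:
    """Time on the parent node that no subnode accounts for."""
    return span.duration_ms - sum(c.duration_ms for c in span.children)


root = method_a()
print(root.duration_ms, sum(c.duration_ms for c in root.children), untraced_ms(root))
# prints: 280.0 70 210.0
```

The 210 ms that no subnode explains is exactly the time spent in the two local methods. That time is only visible in service logs.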
To view trace details, log on to the EDAS console, choose Microservice Governance > Spring Cloud/Dubbo/HSF, and then click Trace Details. More typically, you open the details of slow or abnormal services from the trace query results. The following example describes how to view the details of a trace from a trace query.
- In the left-side navigation pane, choose Trace Query. On the Trace Query page, perform a trace query.
- On the Trace Query page, find the most time-consuming High-speed Service Framework (HSF) method, database request, or other remote call.
- For database, Redis, Message Queue (MQ), or other simple calls, identify why access to these nodes is slow, for example, by checking for slow SQL statements or network congestion.
- For an HSF method, analyze further why the method takes so long.
- Confirm the elapsed time on a local method.
Move the pointer over the timeline of the method. A pop-up window appears, showing the time it takes the consumer to send the request, the time it takes the provider to process the request, and the time it takes the consumer to receive the response.
If the provider takes a long time to process the request, analyze the provider's service logic. Otherwise, analyze the cause in the same way as a call timeout, as described in the "Locate the timeout error" step.
As shown in the following figure, it takes 606 ms for the provider to process the request.
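The decision above can be sketched as a small Python function over the three durations shown in the pop-up window. The `CallPhases` type and the 0.5 `provider_share` cut-off for "a long time" are hypothetical choices made for illustration. Only the 606 ms figure comes from this page; the other phase values in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class CallPhases:
    """Durations (ms) shown in the timeline pop-up for one remote call."""
    consumer_send_ms: float      # consumer sends request -> provider receives it
    provider_process_ms: float   # provider processes the request
    consumer_receive_ms: float   # provider sends response -> consumer receives it


def next_step(p: CallPhases, provider_share: float = 0.5) -> str:
    # provider_share is a hypothetical threshold for "the provider takes a long time".
    total = p.consumer_send_ms + p.provider_process_ms + p.consumer_receive_ms
    if total > 0 and p.provider_process_ms / total >= provider_share:
        return "analyze the provider's service logic"
    return "analyze as a call timeout (serialization, network, GC)"


print(next_step(CallPhases(2, 606, 1)))     # provider-dominated call
print(next_step(CallPhases(400, 50, 300)))  # transport-dominated call
```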
- Check whether the total elapsed time on the subnodes is close to the elapsed time on this node.
- If the difference is small, most of the time is consumed by network calls. In this case, reduce the number of network calls as much as possible and shorten the elapsed time of each call, as shown in the following figure.
- If the difference is large, the time is spent in the provider's service logic rather than in remote calls. For example, in the following figure, the elapsed time on the parent node is 607 ms, while the total elapsed time on the subnodes is less than 100 ms.
- Locate the time-consuming call.
View the timelines of the nodes to locate the call that was initiated just before the time-consuming stretch. The red box in the following figure marks the time-consuming logic. This is local logic, which requires further troubleshooting.
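The comparison and the timeline inspection above can be sketched in Python as follows. The sketch assumes each subnode is given as a start offset from the parent node plus a duration. `ChildSpan`, the 0.8 `close_enough` threshold, and the span names and offsets are all hypothetical. The parent time of 607 ms and the subnode total below 100 ms mirror the example in this section. `largest_gap` finds the longest stretch of the parent's timeline that no subnode covers, and reports which call ended just before it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChildSpan:
    name: str
    start_ms: float      # offset from the start of the parent node
    duration_ms: float


def subnode_ratio(parent_ms: float, children: List[ChildSpan]) -> float:
    """Total elapsed time on the subnodes divided by the parent's elapsed time."""
    total = sum(c.duration_ms for c in children)
    return total / parent_ms if parent_ms else 0.0


def classify(parent_ms: float, children: List[ChildSpan], close_enough: float = 0.8) -> str:
    # close_enough is a hypothetical threshold for "the difference is small".
    if subnode_ratio(parent_ms, children) >= close_enough:
        return "remote calls dominate: reduce network calls"
    return "local logic dominates: locate the gap"


def largest_gap(parent_ms: float, children: List[ChildSpan]) -> Tuple[float, Optional[str]]:
    """Longest stretch of the parent's timeline covered by no subnode.

    Returns (gap_ms, name of the latest-finishing call before the gap),
    or None as the name if the gap is at the very start of the parent node.
    """
    best: Tuple[float, Optional[str]] = (0.0, None)
    cursor, last = 0.0, None  # end of the latest-finishing call seen so far
    for c in sorted(children, key=lambda c: c.start_ms):
        if c.start_ms - cursor > best[0]:
            best = (c.start_ms - cursor, last)
        if c.start_ms + c.duration_ms > cursor:  # overlapping calls extend coverage
            cursor, last = c.start_ms + c.duration_ms, c.name
    if parent_ms - cursor > best[0]:
        best = (parent_ms - cursor, last)
    return best


children = [
    ChildSpan("cache.get", 2, 10),
    ChildSpan("db.select", 15, 40),
    ChildSpan("hsf.call", 560, 45),
]
print(round(subnode_ratio(607, children), 2))  # 0.16
print(classify(607, children))                 # local logic dominates: locate the gap
print(largest_gap(607, children))              # (505, 'db.select')
```

In this made-up timeline, about 505 ms pass after `db.select` returns and before the next remote call starts. That gap is the local logic to review.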
- After you locate the time-consuming logic, review the code or add logs to the code to pinpoint the specific cause.
- If the code itself does not take that long, check gc.log to see whether garbage collection (GC) occurred.
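One way to check gc.log is to list the long GC pauses, as in the following Python sketch. It assumes JDK 9 or later unified GC logging with the default decorators, for example enabled with `-Xlog:gc:file=gc.log`. JDK 8 logs written with `-Xloggc` use a different format and are not handled. The sample lines and the 100 ms threshold are illustrative. To match a pause to the slow call, you still need to compare its JVM uptime timestamp with the time window of the trace.

```python
import re
from typing import Iterable, List, Tuple

# Assumed JDK 9+ unified logging line, for example:
# [11.020s][info][gc] GC(42) Pause Full (G1 Compaction Pause) 500M->210M(512M) 587.310ms
PAUSE_RE = re.compile(
    r"^\[(?P<uptime>\d+(?:\.\d+)?)s\]"      # [11.020s]  JVM uptime
    r"\[[^\]]*\]\[[^\]]*\]\s*"              # [info][gc]  level and tags
    r"(?P<what>GC\(\d+\) Pause.*?)\s+"      # GC(42) Pause Full (...) 500M->210M(512M)
    r"(?P<ms>\d+(?:\.\d+)?)ms\s*$"          # 587.310ms   pause duration
)


def long_pauses(lines: Iterable[str], min_ms: float) -> List[Tuple[float, float, str]]:
    """Return (uptime_s, pause_ms, description) for GC pauses of at least min_ms."""
    found = []
    for line in lines:
        m = PAUSE_RE.match(line.strip())
        if m and float(m.group("ms")) >= min_ms:
            found.append((float(m.group("uptime")), float(m.group("ms")), m.group("what")))
    return found


sample = [
    "[0.006s][info][gc] Using G1",
    "[10.512s][info][gc] GC(41) Pause Young (Normal) (G1 Evacuation Pause) 310M->95M(512M) 12.407ms",
    "[11.020s][info][gc] GC(42) Pause Full (G1 Compaction Pause) 500M->210M(512M) 587.310ms",
]
for uptime, ms, what in long_pauses(sample, min_ms=100):
    print(f"{uptime:.3f}s {ms:.1f}ms {what}")
```

To scan a real file, pass an open file object instead, for example `with open("gc.log") as f: long_pauses(f, 100)`.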
- Locate the timeout error.
As shown in the following figure, a timeout error occurred. The elapsed time is divided into three parts:
Consumer sends request (0 ms): the elapsed time from when the consumer sends the request to when the provider receives it, including serialization, network transmission, and deserialization. If this stage takes a long time, check whether GC was triggered on the consumer. This stage can also take a long time if the object being serialized or deserialized is large, the network is under a heavy transmission load, or GC occurs on the provider.
Provider processes request (10077 ms): the elapsed time from when the provider receives the request to when it sends the response to the consumer. This stage covers only the provider's processing of the request and does not include the time consumed by other operations.
Consumer receives response (3002 ms): the elapsed time from when the provider sends the response to when the consumer receives it. With a 3 s timeout period, the consumer returns a timeout error as soon as the call times out, while the provider continues processing the request. If this stage takes a long time, troubleshoot it in the same way as the "Consumer sends request" stage.
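The evaluation above can be sketched as a rough Python check over the three durations. The 3000 ms default mirrors the 3 s timeout in this example. `slack_ms` and `slow_send_ratio` are hypothetical tolerances, not EDAS settings, and the finding labels are invented names. The function reads the figure's values as follows: the consumer waited until its timeout, while the provider kept working well past it.

```python
from typing import List


def diagnose_timeout(send_ms: float, process_ms: float, receive_ms: float,
                     timeout_ms: float = 3000, slack_ms: float = 50,
                     slow_send_ratio: float = 0.1) -> List[str]:
    """Rough reading of the three stage durations of a timed-out call.

    slack_ms and slow_send_ratio are hypothetical tolerances.
    """
    findings = []
    if receive_ms >= timeout_ms - slack_ms:
        # The consumer stopped waiting at its timeout and returned a timeout error.
        findings.append("consumer_timed_out")
    if process_ms > timeout_ms:
        # The provider kept processing after the consumer gave up: analyze the provider.
        findings.append("provider_slow")
    if send_ms > slow_send_ratio * timeout_ms:
        # Check consumer GC, payload size, network load, and provider GC.
        findings.append("request_path_slow")
    return findings


print(diagnose_timeout(send_ms=0, process_ms=10077, receive_ms=3002))
# ['consumer_timed_out', 'provider_slow']
```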