Application Real-Time Monitoring Service (ARMS) provides trace data that developers can use to track code execution. A trace captures the end-to-end path of a request through a distributed system. When one service calls another, a trace is generated that visualizes the hierarchical call sequence.
Benefits
In a distributed system, processing a request often involves calling multiple services. If a request times out, encounters an error, or throws an exception, troubleshooting can become challenging. Traces offer the following advantages to O&M engineers:
Troubleshooting: When a request fails or an error occurs, traces reveal the entire request path and the execution status of each service, so the source of the error can be identified quickly.
Performance optimization: By analyzing traces, O&M engineers can identify request execution times and system bottlenecks, facilitating performance improvements.
System monitoring: Traces provide real-time system monitoring and analysis, helping O&M engineers assess system health and resource utilization.
Terms
Trace
A trace captures the complete execution of a request or transaction, from start to finish. For example, the full lifecycle of a client request, from the moment it is received until processing completes, constitutes a trace. Structurally, each trace is a tree of spans and is assigned a unique trace ID. This ID remains consistent throughout the request's lifecycle, so all related spans can be queried in one place for debugging.
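The role of the trace ID as a shared lookup key can be illustrated with a minimal sketch. The record layout and the field names (`trace_id`, `span_id`, `name`) are assumptions made for illustration and do not reflect the ARMS data schema.

```python
from collections import defaultdict

# Hypothetical span records reported by different services.
# The field names are illustrative only, not the ARMS schema.
spans = [
    {"trace_id": "t1", "span_id": "a", "name": "GET /order"},
    {"trace_id": "t2", "span_id": "x", "name": "GET /health"},
    {"trace_id": "t1", "span_id": "b", "name": "inventory.check"},
]


def group_by_trace(records):
    """Collect spans that share a trace ID, preserving input order."""
    traces = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    return dict(traces)


traces = group_by_trace(spans)
# All spans of one request can now be fetched with a single key lookup.
print([s["name"] for s in traces["t1"]])
```

Because every span carries the same trace ID for the lifetime of the request, spans emitted by different services can be reassembled without any other coordination.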
Span
As the basic unit of distributed tracing, a span represents a single logical operation within a trace. A span can be a method call, the execution of a code block, a remote procedure call (RPC), or a database query. Each span is assigned a unique span ID and records its start and end timestamps. Each span also records the ID of its parent span, that is, the upstream span that preceded it. Spans nest to form the trace's tree structure, which maps the service dependencies.
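The span fields described above (span ID, parent span ID, start and end timestamps) are enough to rebuild the trace tree. The following is a minimal sketch under stated assumptions: the `Span` class, its field names, and the sample operations are hypothetical and are not part of any ARMS API. It also assumes that the root span is the one with no parent ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical span record; field names are illustrative."""
    span_id: str
    parent_id: Optional[str]  # assumption: None marks the root span
    name: str
    start_ms: int
    end_ms: int
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def build_tree(spans):
    """Link each span to its parent and return the root span."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    return root


def render(span, depth=0):
    """Return the call hierarchy as indented lines, children by start time."""
    lines = ["  " * depth + f"{span.name} ({span.duration_ms} ms)"]
    for child in sorted(span.children, key=lambda c: c.start_ms):
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    Span("a", None, "GET /order", 0, 120),
    Span("b", "a", "inventory.check", 10, 50),
    Span("c", "a", "orders.insert", 55, 110),
    Span("d", "b", "stock.lookup", 15, 45),
]
root = build_tree(spans)
tree_lines = render(root)
print("\n".join(tree_lines))
```

In this sample, the indentation of each printed line reflects nesting depth, and the per-span durations show where the time of the request was spent.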