Diagnose Latency & Root Cause with Trace Explorer - ARMS

When distributed applications experience slow responses or failures, isolating the root cause across dozens of services requires correlating spans, filtering by duration, and comparing traces at scale. Trace Explorer in Application Real-Time Monitoring Service (ARMS) provides real-time query and analysis of stored full trace data. Combine filter conditions and aggregation dimensions to pinpoint slow requests, diagnose failures, and surface common patterns across traces.

Prerequisites

Before you begin, make sure that you have:

An ARMS agent installed for the application

Important

Application Monitoring provides a redesigned application details page for users on the new billing mode. If you have not switched to the new billing mode, click Switch to New Version on the Application List page to access the new interface.

Open Trace Explorer

Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring > Application List.
On the Application List page, select a region in the top navigation bar and click the application name.
Note
The Language column shows an icon for each application's language: Java, Go, or Python. A hyphen (-) indicates an application monitored through Managed Service for OpenTelemetry.
In the top navigation bar, click Trace Explorer.
In the upper-right corner, select a time range.

Filter traces

Use any of the following methods to narrow down results:

Quick Filter: In the Quick Filter section, filter by status, duration, application name, span name, or host address. Selected conditions appear in the search box.
Search box: Click the search box to modify existing filter conditions or add new ones from the dropdown.
Query statement: Type a query directly in the search box. For syntax details, see Trace Explorer query syntax.

Note

Click the icon next to the search box to save filter conditions.
Click Saved View to load previously saved filters.
Aggregate queried data by specific dimensions for deeper analysis.

Trace list

After you apply filters, Trace Explorer displays the results in several sections: a bar chart showing call counts and HTTP errors, a time series curve for call duration, and span and trace lists. Use the trace list to scan results and navigate to individual traces for deeper investigation.

Span status legends

Status	Source
Normal	Spans with `statusCode=1`
Error	Spans with `statusCode=2`
Exception	Spans where `attributes.excep.ids` contains a value

The bar chart in the upper section color-codes HTTP status codes: 2XX (green), 3XX (yellow), 4XX (orange), and 5XX (red). These codes come from the attributes.http.status_code or attributes.http.response.status_code field. If a span contains both fields, the HTTP status code takes precedence over the span state indicator.
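The color rule above can be sketched in a few lines of Python. This is only an illustration, not ARMS code: the function name `http_bar_color` is hypothetical, the span attributes are assumed to arrive as a flat dictionary keyed by the full field names, and checking `attributes.http.status_code` before `attributes.http.response.status_code` is an assumption, since the order is not documented here.

```python
from typing import Mapping, Optional

# Color per status class, as listed for the bar chart.
HTTP_COLORS = {2: "green", 3: "yellow", 4: "orange", 5: "red"}

# Assumption: the first field that holds a usable value wins.
STATUS_FIELDS = (
    "attributes.http.status_code",
    "attributes.http.response.status_code",
)


def http_bar_color(span: Mapping[str, object]) -> Optional[str]:
    """Return the bar color for a span, or None if it has no 2XX-5XX code."""
    for field in STATUS_FIELDS:
        value = span.get(field)
        if value is None:
            continue
        try:
            code = int(value)
        except (TypeError, ValueError):
            continue  # not a number, try the other field
        return HTTP_COLORS.get(code // 100)
    return None
```

Spans without either field, or with a code outside 2XX to 5XX, return `None` and would fall back to the span status legend.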

Available actions

Click a trace ID or click Details in the Actions column to view trace details and topology. See Trace details.
Click Logs in the Actions column to view trace-related logs. See Log analysis.
Click the icon to expand all spans under a trace ID. By default, only root spans are displayed per trace ID.
Click the icon in the upper-right corner to show or hide list columns.
Hover over a trace and click the icon to add its parameter values as a filter condition.

Scatter chart

The Scatter plot tab plots each trace as a point with time on the X axis and duration on the Y axis. Use this view to identify outlier traces with unusually high latency. Hover over a point to view basic trace information, or click it to open trace details. See Trace details.

Trace aggregation

While the trace list analyzes individual spans, trace aggregation reconstructs complete traces for a higher-level view. It queries up to 5,000 distributed traces based on your conditions, retrieves all corresponding spans by trace ID, and aggregates the results while preserving trace integrity. Use this view to understand how time and errors are distributed across your service call graph.
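The two-step selection described above (match spans, then pull every span of the matching traces) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ARMS implementation: spans are plain dictionaries, the `traceId` key and the `reconstruct_traces` helper are hypothetical names, and the filter is passed in as a Python predicate.

```python
from typing import Callable, Dict, Iterable, List

Span = Dict[str, object]


def reconstruct_traces(
    spans: Iterable[Span],
    matches: Callable[[Span], bool],
    max_traces: int = 5000,
) -> Dict[str, List[Span]]:
    """Group complete traces for up to max_traces matching trace IDs."""
    spans = list(spans)

    # Step 1: collect trace IDs that have at least one matching span.
    selected: Dict[str, List[Span]] = {}
    for span in spans:
        if len(selected) >= max_traces:
            break
        if matches(span):
            selected.setdefault(str(span["traceId"]), [])

    # Step 2: fetch all spans of those traces, matching or not,
    # so that each trace stays complete.
    for span in spans:
        trace_id = str(span["traceId"])
        if trace_id in selected:
            selected[trace_id].append(span)
    return selected
```

The second pass is what preserves trace integrity: a span that does not satisfy the filter is still kept when another span in the same trace does.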

Note

Aggregate queries run on trace data matching your conditions. Combining multiple conditions may increase calculation time.

Aggregation parameters

Parameter	Description
spanName	Name of the span.
serviceName	Application associated with the span.
Number of requests / percentage of requests	Number of requests that call this span, and that number as a percentage of all requests. Formula: requests calling this span / total requests × 100%.
span / Request Multiple	Total number of spans, and the average number of times each request that reaches this span calls it. Formula: total spans / requests calling this span.
Average self-consumption / proportion	Average duration of the span excluding its child spans, and that self-time as a proportion of the average duration. Formula: span duration - total child span durations. Important: For asynchronous calls, child span durations are included.
Average Duration	Average duration of the span.
Number of exceptions / percentage of exceptions	Number of requests with exceptions, and that number as a percentage of the requests that call this span. Formula: requests with exceptions / requests calling this span × 100%. Important: The number of requests with exceptions is not the same as the total exception count. If the request multiple exceeds 1, a single request may produce multiple exceptions.

Aggregation example

Consider a trace where Span A calls Span B and Span C:

spanName	serviceName	Requests / percentage	span / Request Multiple	Average self-consumption / proportion	Average Duration	Exceptions / percentage
A	demo	10 / 100.00%	10 / 1.00	5.00 ms / 25.00%	20 ms	2 / 20.00%
- B	demo	4 / 40.00%	8 / 2.00	16.00 ms / 100.00%	16 ms	2 / 50.00%
- C	demo	1 / 10.00%	1 / 1.00	4.00 ms / 100.00%	4 ms	1 / 100.00%

How to read this table:

Request distribution: Span A has 10 requests (100%). Only 4 of those requests call Span B (40%), and only 1 calls Span C (10%). The remaining requests skip Spans B and C because of conditional logic or exceptions.
Span frequency: Span A is called once per request (multiple = 1.00). Span B has 8 spans across 4 requests, so each request calls Span B twice on average (multiple = 2.00).
Self-time distribution: Span A's self-time is 5.00 ms (25% of the 20 ms average duration), meaning 75% of the time is spent in child spans. Spans B and C show 100% self-time because they have no children.
Exception distribution: Span A has 2 exceptions across 10 requests (20%). Span B has 2 exceptions across 4 requests (50%). Because Span B's request multiple is 2.00, a likely distribution is: 2 of the 4 requests encounter an exception on the first call, with the second call succeeding.
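The arithmetic behind the rows for Span A and Span B can be reproduced with a short Python sketch. It is an illustration only: the `Span` record, its field names, and the `aggregate` function are hypothetical, and the sample data encodes the "likely distribution" described above (2 of the 4 requests fail on their first call to Span B).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    self_ms: float
    has_exception: bool = False


def aggregate(spans):
    """Compute the aggregation columns per span name."""
    total_requests = len({s.trace_id for s in spans})
    groups = defaultdict(list)
    for s in spans:
        groups[s.name].append(s)

    rows = {}
    for name, group in groups.items():
        requests = {s.trace_id for s in group}
        failed = {s.trace_id for s in group if s.has_exception}
        avg_duration = sum(s.duration_ms for s in group) / len(group)
        avg_self = sum(s.self_ms for s in group) / len(group)
        rows[name] = {
            "requests": len(requests),
            "request_pct": 100 * len(requests) / total_requests,
            "spans": len(group),
            "multiple": len(group) / len(requests),
            "avg_self_ms": avg_self,
            "self_pct": 100 * avg_self / avg_duration,
            "exception_requests": len(failed),
            "exception_pct": 100 * len(failed) / len(requests),
        }
    return rows


# 10 requests hit Span A; requests t0..t3 each call Span B twice.
spans = [Span(f"t{i}", "A", 20.0, 5.0, has_exception=i < 2) for i in range(10)]
for i in range(4):
    for call in range(2):
        failed_first_call = i < 2 and call == 0
        spans.append(Span(f"t{i}", "B", 16.0, 16.0, failed_first_call))

rows = aggregate(spans)
```

With this data, `rows["B"]` holds 4 requests (40%), 8 spans, a request multiple of 2.0, and 2 requests with exceptions (50%), which matches the Span B row of the table.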

Note

To view details for a specific trace, hover over the span name (shown in blue) and click the recommended traceId.

Trace topology

The Full Link Topology tab shows the inter-application call topology for aggregated traces. Use this view to visualize service dependencies and identify error-prone or slow links between applications. Each application node displays the request count, error count, and response time.

Analyze slow and failed traces

Trace Explorer surfaces common patterns across slow and failed traces to help locate root causes. Narrow your investigation by host, interface, or combined conditions such as serviceName="arms-demo" AND ip="192.168.1.1". This analysis also helps you identify slow interfaces for targeted optimization.
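If you generate such combined conditions from a script, a small helper keeps the quoting consistent. The sketch below only covers the form shown above (equality tests joined with AND); the `build_condition` name is hypothetical, and anything beyond this form should be checked against Trace Explorer query syntax.

```python
from typing import Mapping


def build_condition(filters: Mapping[str, str]) -> str:
    """Join field="value" pairs with AND, in the order given."""
    parts = []
    for field, value in filters.items():
        # Assumption: escaping rules are not covered here, so values
        # that contain a double quote are rejected instead of guessed at.
        if '"' in value:
            raise ValueError(f"unsupported character in value for {field}")
        parts.append(f'{field}="{value}"')
    return " AND ".join(parts)


condition = build_condition({"serviceName": "arms-demo", "ip": "192.168.1.1"})
```

Here `condition` holds the same string as the example in the paragraph above.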

Slow trace analysis

ARMS analyzes the 1,000 longest-duration traces and surfaces the five dimensions most correlated with slowness.

Slow trace details

ARMS selects the 1,000 longest traces that exceed the configured threshold, samples 1,000 traces below the threshold, compares them, and identifies the three characteristics most correlated with slow calls.

Note

Set the threshold based on your needs. For example, to find characteristics of traces exceeding 1 minute, set the threshold to 60000 milliseconds.
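To make the comparison concrete, the following Python sketch splits traces around a threshold and ranks attribute values by how much more often they occur in the slow group than in the baseline. It is not the algorithm ARMS uses, which is not documented here; the record layout (`duration_ms`, `attrs`) and both function names are assumptions chosen for the example.

```python
import random
from collections import Counter


def split_by_threshold(traces, threshold_ms, limit=1000, seed=0):
    """Return (longest slow traces, random sample of traces below the threshold)."""
    slow = sorted(
        (t for t in traces if t["duration_ms"] > threshold_ms),
        key=lambda t: t["duration_ms"],
        reverse=True,
    )[:limit]
    fast = [t for t in traces if t["duration_ms"] <= threshold_ms]
    baseline = random.Random(seed).sample(fast, min(limit, len(fast)))
    return slow, baseline


def top_characteristics(slow, baseline, top_n=3):
    """Rank (attribute, value) pairs by share in slow minus share in baseline."""

    def share(group):
        counts = Counter()
        for trace in group:
            counts.update(trace["attrs"].items())
        return {pair: n / len(group) for pair, n in counts.items()}

    slow_share = share(slow)
    base_share = share(baseline)
    lift = {pair: p - base_share.get(pair, 0.0) for pair, p in slow_share.items()}
    return sorted(lift.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

A value that appears in every slow trace and in no baseline trace scores 1.0, while a value that is equally common in both groups scores 0.0 and drops out of the ranking.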

Failed trace analysis

ARMS randomly samples 1,000 failed traces and surfaces the five dimensions most correlated with failures.

Failed trace details

ARMS compares failed traces with normal traces and identifies the three characteristics most correlated with failures.

Trace details

The trace details view has four sections, labeled 1 through 4 in the following figure.

Component tags (1)

Tags represent call types and span counts. Call types are determined by the attributes.component.name field. Click a tag to show or hide its related spans.

Trace bar chart (2)

The horizontal bar chart provides a high-level overview of the trace and its span distribution. Use it to quickly identify where time is spent across the trace.

Each bar represents a span. Only spans with a duration exceeding 1% of the total trace duration are shown.
Colors distinguish different applications. In the preceding figure, blue represents the opentelemetry-demo-adservice application.
A black line within each bar represents self-time: the span's total duration minus time spent in child spans. For example, if Span A takes 10 ms and its child Span B takes 8 ms, Span A's self-time is 2 ms.
The timeline shows the full time range of the trace.
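The self-time and 1% rules can be sketched in Python as follows. The `Span` class and function names are hypothetical, the total trace duration is assumed to equal the root span's duration, and child spans are assumed to run synchronously without overlapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)


def self_time_ms(span: Span) -> float:
    """Span duration minus the time spent in its direct child spans."""
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def visible_span_names(root: Span, min_share: float = 0.01) -> List[str]:
    """Names of spans longer than min_share of the total trace duration."""
    total = root.duration_ms  # assumption: root span covers the whole trace
    names = []
    stack = [root]
    while stack:
        span = stack.pop()
        if span.duration_ms > total * min_share:
            names.append(span.name)
        stack.extend(span.children)
    return names


# The example from the text: Span A takes 10 ms, its child Span B takes 8 ms.
span_a = Span("A", 10.0, [Span("B", 8.0)])
span_a_self_ms = self_time_ms(span_a)
```

For this trace, `span_a_self_ms` is 2.0, and both spans pass the 1% cutoff of 0.1 ms.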

Focus and filter spans (3)

Each row represents a span with its parent-child hierarchy. Parent spans display a number indicating their child span count. Available operations:

Collapse/Expand: Click the icon to collapse or expand a span and its children.
Focus: Select a span and click the icon to display only that span and its downstream spans.
Defocus: Click the icon to exit focus mode.
Filter: Enter a span name, application name, or attribute in the search box to filter the trace down to matching spans and their path to the entry span. Clear the search box and click the Search icon to remove the filter.
Zoom: Click the icon to zoom in and hide the bar chart. Click the icon to restore it.

Span details (4)

The span details panel shows attributes, metrics, logs, and exception data for the selected span.

Additional Information: Displays span attributes, resources, details, and events, grouped by type. For field descriptions, see Trace Explorer parameters.
Metrics: Shows metrics related to the span.
- For Java applications monitored by ARMS: JVM and host metrics.
- For traces from open source agents: RED Method metrics (rate, errors, duration).
Logs: Displays business logs linked to the trace. If a Simple Log Service (SLS) Logstore is configured for the application, query business logs by trace ID in the Logstore.
Exceptions: Shows exception information for the span, if any.
Event Config: Configure interaction events for one or more trace attributes to access additional trace details or related logs and metrics. See Configure a custom interaction event for a trace.

Access trace data programmatically

Trace data is stored in Simple Log Service (SLS). Access it programmatically for custom analysis:

SLS project: proj-xtrace-<encode>-<region-id>, where <region-id> is the region ID of the application (for example, cn-hangzhou)
Logstore: logstore-tracing

For data format details, see Trace Explorer parameters. For analysis examples, see Analyze trace data in real time.
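When scripting against SLS, it can help to derive the project and Logstore names in one place. The sketch below only assembles the names from the pattern above; the `trace_store` helper is hypothetical, and the value passed for `<encode>` (here `abc123`) is a made-up placeholder that you must replace with the value from your own project. Querying the Logstore itself is left to the SLS SDK or API of your choice.

```python
from dataclasses import dataclass

TRACE_LOGSTORE = "logstore-tracing"


@dataclass(frozen=True)
class TraceStore:
    project: str
    logstore: str


def trace_store(encode: str, region_id: str) -> TraceStore:
    """Build the SLS project and Logstore names for trace data."""
    if not encode or not region_id:
        raise ValueError("encode and region_id must be non-empty")
    return TraceStore(
        project=f"proj-xtrace-{encode}-{region_id}",
        logstore=TRACE_LOGSTORE,
    )


# "abc123" is a placeholder, not a real <encode> value.
store = trace_store("abc123", "cn-hangzhou")
```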

FAQ

Why are SQL statements truncated in traces?

ARMS limits SQL statements to 1,024 characters by default. To increase this limit, go to the Custom Configurations tab of your application and change Maximum retention length for SQL collection in the Database call configuration section.

Why do Nacos spans with 30-second durations appear after upgrading to ARMS agent V4.x?

ARMS agent V4.x instruments the HTTP long polling requests that Nacos uses for configuration updates and service discovery. ARMS agent V3.x did not capture these spans. For details, see ARMS agent V4.x upgrade announcement.

To exclude these spans, add a filter rule in the Interface call configuration section on the Custom Configurations tab of your application.

Note

This feature requires ARMS agent V4.2.x or later.
