When distributed requests slow down or fail, you need a way to pinpoint which service, span, or attribute is responsible. Trace Explorer in Application Real-Time Monitoring Service (ARMS) lets you filter, aggregate, and analyze stored trace data in real time to diagnose latency bottlenecks, error patterns, and inter-service dependencies across your entire call chain.
Typical troubleshooting workflow
A typical investigation follows this path:
Open Trace Explorer and set a time range.
Filter by status, duration, service, or span name to narrow results.
Review the trace list and HTTP status distribution for error spikes or latency outliers.
Drill into a trace to inspect span-level timing, attributes, and exceptions.
Aggregate span metrics across up to 5,000 traces to identify systemic patterns.
Filter and search traces
Log on to the ARMS console.
In the left-side navigation pane, choose Application Monitoring > Trace Explorer. Select a region in the top navigation bar.
Select a time range in the upper-right corner of the page.
Filter traces using any of the following methods:
Quick Filter: Select from predefined fields -- status, duration, application name, span name, or host address. Selected conditions appear in the search bar.
Drop-down filter panel: Click the search bar to open the panel. Modify existing conditions or add new ones.
Query statement: Type a query directly in the search bar. For syntax details, see Usage methods of Trace Explorer.
To save the current filter conditions as a view, click the icon next to the Aggregation Dimension drop-down list.
To load a saved view, click Saved View and select one from the list.
To group queried data, select an aggregation dimension.
Trace list
After filters are applied, Trace Explorer displays three visualizations:
HTTP status bar chart -- Distribution of HTTP response codes across traces.
Duration time series -- Span duration over time, revealing latency trends.
Span and trace lists -- Individual spans and traces with key metadata.

HTTP status code colors
The bar chart color-codes responses by HTTP status class:
| Status class | Color |
|---|---|
| 2XX | Green |
| 3XX | Yellow |
| 4XX | Orange |
| 5XX | Red |
Status codes are derived from attributes.http.status_code or attributes.http.response.status_code. When both an HTTP status code and a span status exist, the HTTP status code takes precedence.
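The mapping from status code to bar color can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary are hypothetical, the attributes are assumed to arrive as a plain mapping keyed without the `attributes.` prefix, and checking `http.status_code` before `http.response.status_code` is an assumption, since the text does not say which attribute wins when both are set.

```python
from typing import Mapping, Optional

# Color per HTTP status class, as listed in the table above.
STATUS_CLASS_COLORS = {2: "green", 3: "yellow", 4: "orange", 5: "red"}


def http_status_color(attributes: Mapping[str, object]) -> Optional[str]:
    """Return the bar-chart color for a span's HTTP status, or None if unknown."""
    # Assumption: http.status_code is consulted first, then
    # http.response.status_code. The lookup order is not documented.
    raw = attributes.get("http.status_code",
                         attributes.get("http.response.status_code"))
    if raw is None:
        return None
    try:
        code = int(raw)
    except (TypeError, ValueError):
        return None
    # 2XX -> 2, 3XX -> 3, and so on; classes outside the table yield None.
    return STATUS_CLASS_COLORS.get(code // 100)
```

For example, `http_status_color({"http.response.status_code": "503"})` falls into the 5XX class and maps to red.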
Span status indicators
Color bars on the left side of the TraceId column indicate span status:
| Color bar | Status code | Meaning |
|---|---|---|
| ![]() | statusCode=0 | Unset |
| ![]() | statusCode=1 | OK |
| ![]() | statusCode=2 | Error |
The Status column uses icons to indicate span state:
| Icon | State | Condition |
|---|---|---|
| ![]() | Normal | statusCode=0 (unset) or statusCode=1 (OK) |
| ![]() | Error | statusCode=2 |
| ![]() | Exception | attributes.excep.ids contains a value |
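The conditions in the Status column table can be expressed as a small Python function. This is a sketch, not the console's implementation: the function name is hypothetical, the attributes are assumed to be a mapping keyed without the `attributes.` prefix, and giving Exception priority over Error is an assumption because the table does not say which state is shown when both conditions hold.

```python
from typing import Mapping


def span_state(status_code: int, attributes: Mapping[str, object]) -> str:
    """Classify a span as Normal, Error, or Exception per the table above."""
    # Assumption: a recorded exception ID outranks the span status code.
    if attributes.get("excep.ids"):
        return "Exception"
    if status_code == 2:
        return "Error"
    # statusCode=0 (unset) and statusCode=1 (OK) are both shown as Normal.
    return "Normal"
```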
Trace list actions
Click a trace ID or Details in the Actions column to view trace details and topology.
Click Logs in the Actions column to view trace-correlated logs.
Click the icon to expand all spans under a trace ID. By default, only root spans are displayed per trace.
Click the icon in the upper-right corner to customize which columns appear in the list.
Hover over a span and click the icon to add that value as a filter condition.

Scatter plot
The Scatter plot tab plots each trace by time (X-axis) and duration (Y-axis), making latency outliers easy to spot visually. Hover over a point for basic trace information, or click it to open trace details.

Trace aggregation
Trace aggregation queries up to 5,000 distributed traces, retrieves their spans by trace ID, and aggregates the results -- preserving trace integrity throughout. This reveals systemic patterns that individual span analysis cannot surface.
Queries with multiple conditions may take time to compute. Wait for the results to load completely.

Aggregation metrics
| Metric | Description |
|---|---|
| spanName | Name of the span. |
| serviceName | Application associated with the span. |
| Request count / request ratio | Number of requests that call this span, as both a count and a percentage of total requests. Formula: request count / total requests × 100%. |
| Span count / request multiple | Average times each request invokes this span. Formula: span count / request count. A value of 2.0 means each request calls this span twice on average. |
| Average self-time / proportion | Time the span spends in its own logic, excluding child spans. Formula: total span time - time in all child spans. For asynchronous calls, self-time equals total span time. |
| Average duration | Average total duration of the span. |
| Exception count / exception ratio | Number of requests with exceptions in this span, as both a count and a percentage of the requests that call this span. Formula: requests with exceptions / request count of this span. The exception count differs from the total number of exceptions -- if the request multiple exceeds 1, a single request may produce multiple exceptions. |
Aggregation example
Consider a trace where Span A calls Span B and Span C:
| spanName | serviceName | Request count / ratio | Span count / multiple | Avg self-time / proportion | Avg duration | Exception count / ratio |
|---|---|---|---|---|---|---|
| A | demo | 10 / 100% | 10 / 1.00 | 5.00 ms / 25% | 20 ms | 2 / 20% |
| - B | demo | 4 / 40% | 8 / 2.00 | 16.00 ms / 100% | 16 ms | 2 / 50% |
| - C | demo | 1 / 10% | 1 / 1.00 | 4.00 ms / 100% | 4 ms | 1 / 100% |
How to read this table:
Request distribution: All 10 requests pass through Span A, but only 4 reach Span B and 1 reaches Span C. The remaining requests skip these child spans due to conditional logic or exceptions.
Span frequency: Span A is called once per request (multiple = 1.00). Span B is called twice per request on average (8 spans / 4 requests = 2.00), indicating a loop or retry pattern.
Time distribution: Span A's self-time is 5 ms (25% of its 20 ms total duration), meaning 75% of the time is spent in child spans. Span B and Span C have 100% self-time because they have no children.
Exception distribution: Span B has 2 exceptions across 4 requests (50% exception ratio). Since each request calls Span B twice, a likely pattern is: 2 requests succeed entirely, while the other 2 each fail on the first call but succeed on the retry.
Hover over a blue span name to see a recommended trace ID. Click the trace ID to view its details.
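To make the count-based formulas concrete, the following Python sketch recomputes the request, span, and exception columns of the example table from raw span records. It is an illustration only: the record keys (`trace_id`, `name`, `has_exception`) and the sample data are hypothetical, and the exception ratio uses the span's own request count as the denominator, which is what the example numbers imply.

```python
from collections import defaultdict


def aggregate(spans, total_requests):
    """Aggregate span records into per-span-name metrics."""
    trace_ids = defaultdict(set)      # span name -> traces containing it
    span_counts = defaultdict(int)    # span name -> number of span instances
    failed_traces = defaultdict(set)  # span name -> traces with an exception in it
    for s in spans:
        trace_ids[s["name"]].add(s["trace_id"])
        span_counts[s["name"]] += 1
        if s["has_exception"]:
            failed_traces[s["name"]].add(s["trace_id"])

    result = {}
    for name, traces in trace_ids.items():
        requests = len(traces)
        result[name] = {
            "request_count": requests,
            "request_ratio": requests / total_requests,
            "span_count": span_counts[name],
            "request_multiple": span_counts[name] / requests,
            # Counted per request, so two failures in one trace count once.
            "exception_count": len(failed_traces[name]),
            "exception_ratio": len(failed_traces[name]) / requests,
        }
    return result


# Hypothetical data shaped like the example: 10 requests hit A, 4 of them
# call B twice (first call fails in 2 requests), and 1 calls C and fails.
spans = []
for i in range(10):
    spans.append({"trace_id": f"t{i}", "name": "A", "has_exception": i < 2})
for i in range(4):
    spans.append({"trace_id": f"t{i}", "name": "B", "has_exception": i < 2})
    spans.append({"trace_id": f"t{i}", "name": "B", "has_exception": False})
spans.append({"trace_id": "t0", "name": "C", "has_exception": True})

metrics = aggregate(spans, total_requests=10)
```

With this data, `metrics["B"]` contains a request count of 4, a span count of 8, a request multiple of 2.0, and an exception ratio of 0.5, matching row B of the table.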
Trace topology
The Full Link Topology tab displays the inter-application call topology for aggregated traces. Each node represents an application and shows three metrics: request count, error count, and response time.

Analyze slow and failed traces
Trace Explorer automatically analyzes slow and failed traces to surface the dimensions most correlated with performance issues. Slow traces may be concentrated on a specific host, or belong to a particular interface. You can filter by host, interface, or combine multiple filter conditions to locate problems. For example: serviceName="arms-demo" AND ip="192.168.1.1". This analysis also helps you identify slow interfaces for targeted optimization.
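If you assemble such filter expressions in a script, a helper along the following lines may be convenient. It is a minimal sketch that covers only the `field="value"` conditions joined by `AND` shown in the example above. The function name is hypothetical, and values are inserted verbatim because the escaping rules of the query syntax are not described here.

```python
def build_query(conditions):
    """Join field/value pairs into an AND-combined filter expression."""
    # Assumption: values contain no double quotes; no escaping is applied.
    return " AND ".join(f'{field}="{value}"'
                        for field, value in conditions.items())


query = build_query({"serviceName": "arms-demo", "ip": "192.168.1.1"})
```

Here `query` holds the same expression as the example: `serviceName="arms-demo" AND ip="192.168.1.1"`.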
Slow trace analysis
ARMS selects the 1,000 longest traces and identifies five dimensions most strongly correlated with slow performance.

Slow trace details
ARMS selects the 1,000 longest traces above a configurable threshold and samples 1,000 traces below it. By comparing the two groups, it surfaces three characteristics most strongly correlated with high latency.
Set the threshold based on your performance requirements. For example, to analyze traces slower than 1 minute, set the threshold to 60000 milliseconds.
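The idea of comparing a slow group with a normal group can be illustrated with a simple frequency comparison in Python. This is not the algorithm ARMS uses, which is not described here. It is a sketch under stated assumptions: traces are dictionaries with a hypothetical `duration_ms` key plus one key per dimension, the 1,000-trace caps are omitted, and a value is scored by how much more often it appears among slow traces than among normal ones.

```python
from collections import Counter


def correlated_values(traces, threshold_ms, dimension, top_n=3):
    """Rank values of one dimension by over-representation in slow traces."""
    slow = [t for t in traces if t["duration_ms"] > threshold_ms]
    normal = [t for t in traces if t["duration_ms"] <= threshold_ms]
    if not slow or not normal:
        return []
    slow_counts = Counter(t[dimension] for t in slow)
    normal_counts = Counter(t[dimension] for t in normal)
    # Score = share among slow traces minus share among normal traces.
    scores = {
        value: count / len(slow) - normal_counts.get(value, 0) / len(normal)
        for value, count in slow_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical sample: slow traces (over 60000 ms) cluster on one host.
traces = [
    {"duration_ms": 90000, "ip": "192.168.1.1"},
    {"duration_ms": 75000, "ip": "192.168.1.1"},
    {"duration_ms": 61000, "ip": "192.168.1.2"},
    {"duration_ms": 120, "ip": "192.168.1.2"},
    {"duration_ms": 95, "ip": "192.168.1.2"},
    {"duration_ms": 110, "ip": "192.168.1.1"},
]
ranked = correlated_values(traces, threshold_ms=60000, dimension="ip")
```

In this sample, `192.168.1.1` ranks first because it accounts for two thirds of the slow traces but only one third of the normal ones.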

Failed trace analysis
ARMS randomly selects 1,000 failed traces and identifies five dimensions most strongly correlated with failures.

Failed trace details
ARMS compares failed traces against normal traces and surfaces three characteristics most strongly correlated with errors.

View trace details
Click a trace ID to open the trace details view. It consists of four sections:

Component tags
Tags at the top of the view group spans by call type, as defined by the attributes.component.name field. Each tag shows the call type name and its span count. Click a tag to show or hide spans of that type.
Trace timeline bar chart
A horizontal bar chart provides a visual overview of the entire trace:
Each bar represents a span. Only spans exceeding 1% of the total trace duration are shown.
Colors distinguish applications. For example, blue might represent opentelemetry-demo-adservice.
Black line segments within bars indicate self-time -- the span's processing time excluding child spans. If Span A takes 10 ms and its child Span B takes 8 ms, Span A's self-time is 2 ms.
The timeline axis shows the time range of the trace.
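The self-time arithmetic from the list above can be written as a short Python function. This is a sketch with hypothetical names. The `asynchronous` flag reflects the rule from the aggregation metrics that self-time equals total span time for asynchronous calls, and clamping at zero is an added assumption for cases where child durations sum to more than the parent.

```python
def self_time_ms(span_duration_ms, child_durations_ms, asynchronous=False):
    """Self-time = total span time minus time spent in all child spans."""
    if asynchronous:
        # For asynchronous calls, self-time equals the total span time.
        return span_duration_ms
    # Assumption: never report negative self-time for overlapping children.
    return max(span_duration_ms - sum(child_durations_ms), 0.0)


span_a_self_time = self_time_ms(10.0, [8.0])
```

For Span A (10 ms) with child Span B (8 ms), `span_a_self_time` is 2.0 ms.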
Span tree and navigation
The span tree displays each span as a row, showing parent-child relationships through indentation. A number before each parent span indicates how many child spans it contains.
Available controls:
Collapse/Expand: Click the icon to collapse or expand a span's children.
Focus: Click the icon to isolate a span and its downstream calls.
Defocus: Click the icon to restore the full view.
Filter: Enter a span name, application name, or attribute in the search box to filter the tree. The view shows the matching span and all ancestor spans up to the entry span. Clear the search box and click the Search icon to remove the filter.
Zoom: Click the icon to zoom in and hide the bar chart. Click the icon to restore the bar chart.
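The Filter behavior, which keeps each matching span plus its ancestors up to the entry span, can be sketched in Python as follows. The record keys (`span_id`, `parent_id`, `name`) and the function name are hypothetical, the match is limited to span names for brevity, and case-insensitive matching is an assumption.

```python
def filter_span_tree(spans, keyword):
    """Return IDs of spans whose name matches, plus all of their ancestors."""
    by_id = {s["span_id"]: s for s in spans}
    visible = set()
    for s in spans:
        if keyword.lower() in s["name"].lower():
            current = s
            # Walk up to the entry span; stop early on already-visited nodes.
            while current is not None and current["span_id"] not in visible:
                visible.add(current["span_id"])
                current = by_id.get(current["parent_id"])
    return visible


# Hypothetical trace: an entry span with two children and one grandchild.
tree = [
    {"span_id": 1, "parent_id": None, "name": "GET /checkout"},
    {"span_id": 2, "parent_id": 1, "name": "CartService.getCart"},
    {"span_id": 3, "parent_id": 2, "name": "redis GET"},
    {"span_id": 4, "parent_id": 1, "name": "PaymentService.charge"},
]
visible_ids = filter_span_tree(tree, "redis")
```

Searching for `redis` keeps spans 3, 2, and 1 and hides the unrelated sibling span 4.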
Span details panel
Select a span to view its details in the right panel:
Additional Information: Displays span attributes, resources, details, and events, grouped by type. For field definitions, see Trace Explorer parameters.
Metrics: Shows span-related metrics. For Java applications monitored by ARMS, this includes JVM and host metrics. For traces from open-source agents, RED Method metrics (rate, errors, duration) are displayed.

Logs: Business logs correlated with this trace. If a Simple Log Service (SLS) Logstore is configured for the application, click through to query logs by trace ID.
Exceptions: Exception details for the selected span, if any.
Event Config: Configure custom interaction events for one or more span attributes. Use these events to link trace data to related logs, metrics, or external systems. For setup instructions, see Configure a custom interaction event for a trace.
Custom development
Trace data is stored in Simple Log Service (SLS) with the following naming conventions:
| Parameter | Format | Example |
|---|---|---|
| Project name | proj-xtrace-<encode>-<region-id> | proj-xtrace-abc123-cn-hangzhou |
| Logstore name | logstore-tracing | logstore-tracing |
For data field definitions, see Trace Explorer parameters. For examples of building custom analysis on stored trace data, see Analyze trace data in real time by using Trace Explorer.
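When scripting against the stored data, the naming convention above can be captured in a small helper. This is a sketch with a hypothetical function name. It takes the `<encode>` segment as an opaque input and does not attempt to derive it.

```python
def sls_trace_location(encode: str, region_id: str):
    """Return the (project, logstore) pair for stored trace data."""
    # Project format: proj-xtrace-<encode>-<region-id>; the Logstore name is fixed.
    return f"proj-xtrace-{encode}-{region_id}", "logstore-tracing"


project, logstore = sls_trace_location("abc123", "cn-hangzhou")
```

With the example values from the table, `project` is `proj-xtrace-abc123-cn-hangzhou` and `logstore` is `logstore-tracing`.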