Application Real-Time Monitoring Service: Trace Explorer

Last Updated: Mar 11, 2026

When distributed requests slow down or fail, you need a way to pinpoint which service, span, or attribute is responsible. Trace Explorer in Application Real-Time Monitoring Service (ARMS) lets you filter, aggregate, and analyze stored trace data in real time to diagnose latency bottlenecks, error patterns, and inter-service dependencies across your entire call chain.

Typical troubleshooting workflow

A typical investigation follows this path:

  1. Open Trace Explorer and set a time range.

  2. Filter by status, duration, service, or span name to narrow results.

  3. Review the trace list and HTTP status distribution for error spikes or latency outliers.

  4. Drill into a trace to inspect span-level timing, attributes, and exceptions.

  5. Aggregate span metrics across up to 5,000 traces to identify systemic patterns.

Filter and search traces

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, choose Application Monitoring > Trace Explorer. Select a region in the top navigation bar.

  3. Select a time range in the upper-right corner of the page.

  4. Filter traces using any of the following methods:

    • Quick Filter: Select from predefined fields -- status, duration, application name, span name, or host address. Selected conditions appear in the search bar.

    • Drop-down filter panel: Click the search bar to open the panel. Modify existing conditions or add new ones.

    • Query statement: Type a query directly in the search bar. For syntax details, see Usage methods of Trace Explorer.

To save the current filter conditions as a view, click the Save icon next to the Aggregation Dimension drop-down list.
To load a saved view, click Saved View and select one from the list.
To group queried data, select an aggregation dimension.

Trace list

After filters are applied, Trace Explorer displays three visualizations:

  • HTTP status bar chart -- Distribution of HTTP response codes across traces.

  • Duration time series -- Span duration over time, revealing latency trends.

  • Span and trace lists -- Individual spans and traces with key metadata.

HTTP status code colors

The bar chart color-codes responses by HTTP status class:

Status class | Color
2XX | Green
3XX | Yellow
4XX | Orange
5XX | Red

Status codes are derived from attributes.http.status_code or attributes.http.response.status_code. When both an HTTP status code and a span status exist, the HTTP status code takes precedence.
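
The lookup described above can be sketched as follows. This is an illustrative sketch, not ARMS code: it assumes span attributes are available as a flat dictionary keyed without the attributes. prefix, and the order in which the two keys are checked is an assumption because the source does not say which one wins when both are present. The function names are hypothetical.

```python
from typing import Optional

# Colors per HTTP status class, as listed in the table above.
STATUS_CLASS_COLORS = {2: "Green", 3: "Yellow", 4: "Orange", 5: "Red"}


def http_status_code(attributes: dict) -> Optional[int]:
    """Return the HTTP status code from either attribute key, or None."""
    # Assumption: http.status_code is checked first; the source gives no order.
    for key in ("http.status_code", "http.response.status_code"):
        value = attributes.get(key)
        if value is not None:
            return int(value)
    return None


def bar_color(attributes: dict) -> Optional[str]:
    """Map a span's HTTP status code to its status-class color."""
    code = http_status_code(attributes)
    if code is None:
        # No HTTP status code on the span; only the span status is available.
        return None
    # Classes outside 2XX-5XX have no color in the table, so this yields None.
    return STATUS_CLASS_COLORS.get(code // 100)
```

For example, bar_color({"http.status_code": 404}) yields "Orange", and a span without either attribute yields None.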

Span status indicators

Color bars on the left side of the TraceId column indicate span status:

Status code | Meaning
statusCode=0 | Unset
statusCode=1 | OK
statusCode=2 | Error

The Status column uses icons to indicate span state:

State | Condition
Normal | statusCode=0 (unset) or statusCode=1 (OK)
Error | statusCode=2
Exception | attributes.excep.ids contains a value
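
As a rough illustration, the conditions in the table above can be evaluated per span like this. The source does not say which icon is shown when more than one condition holds, so this sketch simply returns every state whose condition matches. The function name and the flat attribute dictionary (keyed without the attributes. prefix) are assumptions.

```python
from typing import List


def matching_states(status_code: int, attributes: dict) -> List[str]:
    """Return every state from the table whose condition the span satisfies."""
    states = []
    if status_code in (0, 1):  # unset or OK
        states.append("Normal")
    if status_code == 2:
        states.append("Error")
    if attributes.get("excep.ids"):  # attribute present and non-empty
        states.append("Exception")
    return states
```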

Trace list actions

  • Click a trace ID or Details in the Actions column to view trace details and topology.

  • Click Logs in the Actions column to view trace-correlated logs.

  • Click the Expand icon to expand all spans under a trace ID. By default, only root spans are displayed per trace.

  • Click the Settings icon in the upper-right corner to customize which columns appear in the list.

  • Hover over a span and click the Filter icon to add that value as a filter condition.

Scatter plot

The Scatter plot tab plots each trace by time (X-axis) and duration (Y-axis), making latency outliers easy to spot visually. Hover over a point for basic trace information, or click it to open trace details.

Trace aggregation

Trace aggregation queries up to 5,000 distributed traces, retrieves their spans by trace ID, and aggregates the results -- preserving trace integrity throughout. This reveals systemic patterns that individual span analysis cannot surface.

Queries with multiple conditions may take time to compute. Wait for the results to load completely.

Aggregation metrics

Metric | Description
spanName | Name of the span.
serviceName | Application associated with the span.
Request count / request ratio | Number of requests that call this span, as both a count and a percentage of total requests. Formula: request count / total requests x 100%.
Span count / request multiple | Average number of times each request invokes this span. Formula: span count / request count. A value of 2.0 means each request calls this span twice on average.
Average self-time / proportion | Time the span spends in its own logic, excluding child spans. Formula: total span time - time in all child spans. For asynchronous calls, self-time equals total span time.
Average duration | Average total duration of the span.
Exception count / exception ratio | Number of requests with exceptions in this span. Formula: requests with exceptions / request count for this span x 100%. The exception count differs from the total number of exceptions: if the request multiple exceeds 1, a single request may produce multiple exceptions.
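
To make the formulas concrete, here is a minimal sketch that computes these metrics from a flat list of spans. It is not the ARMS implementation. The span field names (trace_id, span_id, parent_id, span_name, service_name, duration_ms, has_exception) are hypothetical, self-time subtracts direct children only and is clamped at zero, asynchronous calls are not modeled, and the exception ratio is taken relative to the span's own request count, which is the reading consistent with the table under "Aggregation example".

```python
from collections import defaultdict


def aggregate(spans):
    """Aggregate spans by (span_name, service_name) into the table's metrics."""
    total_requests = len({s["trace_id"] for s in spans})

    # Total duration of direct children, keyed by (trace_id, parent span_id).
    child_ms = defaultdict(float)
    for s in spans:
        if s.get("parent_id") is not None:
            child_ms[(s["trace_id"], s["parent_id"])] += s["duration_ms"]

    groups = defaultdict(list)
    for s in spans:
        groups[(s["span_name"], s["service_name"])].append(s)

    rows = {}
    for key, members in groups.items():
        requests = {s["trace_id"] for s in members}
        failed = {s["trace_id"] for s in members if s.get("has_exception")}
        self_ms = [
            max(s["duration_ms"] - child_ms[(s["trace_id"], s["span_id"])], 0.0)
            for s in members
        ]
        rows[key] = {
            "request_count": len(requests),
            "request_ratio": len(requests) / total_requests,
            "span_count": len(members),
            "request_multiple": len(members) / len(requests),
            "avg_self_ms": sum(self_ms) / len(members),
            "avg_duration_ms": sum(s["duration_ms"] for s in members) / len(members),
            "exception_count": len(failed),
            "exception_ratio": len(failed) / len(requests),
        }
    return rows


# Two made-up traces: t1 calls B twice from A (one call fails), t2 only runs A.
SAMPLE_SPANS = [
    {"trace_id": "t1", "span_id": "a1", "parent_id": None,
     "span_name": "A", "service_name": "demo", "duration_ms": 10.0},
    {"trace_id": "t1", "span_id": "b1", "parent_id": "a1",
     "span_name": "B", "service_name": "demo", "duration_ms": 4.0,
     "has_exception": True},
    {"trace_id": "t1", "span_id": "b2", "parent_id": "a1",
     "span_name": "B", "service_name": "demo", "duration_ms": 4.0},
    {"trace_id": "t2", "span_id": "a2", "parent_id": None,
     "span_name": "A", "service_name": "demo", "duration_ms": 6.0},
]
```

With the sample data, aggregate(SAMPLE_SPANS) reports Span B with a request count of 1, a span count of 2, and a request multiple of 2.0, while the exception count stays at 1 because only one request is affected.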

Aggregation example

Consider a trace where Span A calls Span B and Span C:

spanName | serviceName | Request count / ratio | Span count / multiple | Avg self-time / proportion | Avg duration | Exception count / ratio
A | demo | 10 / 100% | 10 / 1.00 | 5.00 ms / 25% | 20 ms | 2 / 20%
- B | demo | 4 / 40% | 8 / 2.00 | 16.00 ms / 100% | 16 ms | 2 / 50%
- C | demo | 1 / 10% | 1 / 1.00 | 4.00 ms / 100% | 4 ms | 1 / 100%

How to read this table:

  • Request distribution: All 10 requests pass through Span A, but only 4 reach Span B and 1 reaches Span C. The remaining requests skip these child spans due to conditional logic or exceptions.

  • Span frequency: Span A is called once per request (multiple = 1.00). Span B is called twice per request on average (8 spans / 4 requests = 2.00), indicating a loop or retry pattern.

  • Time distribution: Span A's self-time is 5 ms (25% of its 20 ms total duration), meaning 75% of the time is spent in child spans. Span B and Span C have 100% self-time because they have no children.

  • Exception distribution: Span B has exceptions in 2 of its 4 requests (50% exception ratio). Since each request calls Span B twice on average, one pattern consistent with these numbers is: 2 requests succeed entirely, while the other 2 each fail on the first call but succeed on the retry.
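
The ratios in the example table can be rechecked from its raw counts. The short sketch below only redoes the arithmetic; the dictionary layout and names are hypothetical, and the exception ratio uses each span's own request count as the denominator because that is what reproduces the table's values.

```python
TOTAL_REQUESTS = 10  # all 10 requests pass through Span A

# Raw counts and timings copied from the example table.
EXAMPLE_ROWS = {
    "A": {"requests": 10, "spans": 10, "self_ms": 5.0, "duration_ms": 20.0, "failed": 2},
    "B": {"requests": 4, "spans": 8, "self_ms": 16.0, "duration_ms": 16.0, "failed": 2},
    "C": {"requests": 1, "spans": 1, "self_ms": 4.0, "duration_ms": 4.0, "failed": 1},
}


def derived(row):
    """Recompute the ratio columns of one table row from its raw counts."""
    return {
        "request_ratio": row["requests"] / TOTAL_REQUESTS,
        "request_multiple": row["spans"] / row["requests"],
        "self_time_proportion": row["self_ms"] / row["duration_ms"],
        "exception_ratio": row["failed"] / row["requests"],
    }


for name, row in EXAMPLE_ROWS.items():
    print(name, derived(row))
```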

Hover over a blue span name to see a recommended trace ID. Click the trace ID to view its details.

Trace topology

The Full Link Topology tab displays the inter-application call topology for aggregated traces. Each node represents an application and shows three metrics: request count, error count, and response time.

Analyze slow and failed traces

Trace Explorer automatically analyzes slow and failed traces to surface the dimensions most correlated with performance issues. Slow traces may be concentrated on a specific host, or belong to a particular interface. You can filter by host, interface, or combine multiple filter conditions to locate problems. For example: serviceName="arms-demo" AND ip="192.168.1.1". This analysis also helps you identify slow interfaces for targeted optimization.
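
If you generate such filter strings programmatically, a small helper along these lines may be enough. It covers only the field="value" AND field="value" form shown in the example above; the helper name is hypothetical, and because the source does not document escaping rules, values containing a double quote are rejected rather than escaped.

```python
def build_query(conditions: dict) -> str:
    """Join equality conditions with AND, in the form shown in the example."""
    parts = []
    for field, value in conditions.items():
        text = str(value)
        if '"' in text:
            # Escaping rules are not documented in the source, so refuse to guess.
            raise ValueError(f"unsupported character in value for {field!r}")
        parts.append(f'{field}="{text}"')
    return " AND ".join(parts)


print(build_query({"serviceName": "arms-demo", "ip": "192.168.1.1"}))
```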

Slow trace analysis

ARMS selects the 1,000 longest traces and identifies five dimensions most strongly correlated with slow performance.

Slow trace details

ARMS selects the 1,000 longest traces above a configurable threshold and samples 1,000 traces below it. By comparing the two groups, it surfaces three characteristics most strongly correlated with high latency.

Set the threshold based on your performance requirements. For example, to analyze traces slower than 1 minute, set the threshold to 60000 milliseconds.
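
The comparison described above can be approximated offline as follows. This is a sketch under stated assumptions, not the ARMS algorithm: the source does not say how correlation is measured, so the difference in how often each dimension value appears in the two groups is used as a stand-in score. The trace field names, the strict "greater than" threshold test, and both function names are hypothetical.

```python
import random
from collections import Counter


def split_by_threshold(traces, threshold_ms, limit=1000, seed=0):
    """Return (longest traces above the threshold, random sample below it)."""
    slow = sorted(
        (t for t in traces if t["duration_ms"] > threshold_ms),
        key=lambda t: t["duration_ms"],
        reverse=True,
    )[:limit]
    fast = [t for t in traces if t["duration_ms"] <= threshold_ms]
    baseline = random.Random(seed).sample(fast, min(limit, len(fast)))
    return slow, baseline


def top_characteristics(slow, baseline, dimensions, top_n=3):
    """Rank (dimension, value) pairs by how over-represented they are in slow traces."""
    def shares(group):
        if not group:
            return {}
        counts = Counter(
            (d, t[d]) for t in group for d in dimensions if t.get(d) is not None
        )
        return {pair: n / len(group) for pair, n in counts.items()}

    slow_share, base_share = shares(slow), shares(baseline)
    # Stand-in score: share among slow traces minus share among the baseline.
    scores = {pair: slow_share[pair] - base_share.get(pair, 0.0) for pair in slow_share}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For instance, calling split_by_threshold(traces, 60000) followed by top_characteristics(slow, baseline, ["host", "rpc"]) would rank the host and interface values that are most over-represented among traces slower than 1 minute.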

Failed trace analysis

ARMS randomly selects 1,000 failed traces and identifies five dimensions most strongly correlated with failures.

Failed trace details

ARMS compares failed traces against normal traces and surfaces three characteristics most strongly correlated with errors.

View trace details

Click a trace ID to open the trace details view. It consists of four sections:

Component tags

Tags at the top of the view group spans by call type, as defined by the attributes.component.name field. Each tag shows the call type name and its span count. Click a tag to show or hide spans of that type.
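
The grouping behind these tags can be mimicked on exported span data with a simple counter. In this sketch each span is assumed to carry an attributes dictionary with a component.name key; the "unknown" fallback label and both function names are illustrative and not part of ARMS.

```python
from collections import Counter


def component_tags(spans):
    """Count spans per call type, as shown on the component tags."""
    return Counter(s["attributes"].get("component.name", "unknown") for s in spans)


def visible_spans(spans, hidden_types):
    """Drop spans whose call type has been toggled off."""
    return [
        s for s in spans
        if s["attributes"].get("component.name", "unknown") not in hidden_types
    ]
```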

Trace timeline bar chart

A horizontal bar chart provides a visual overview of the entire trace:

  • Each bar represents a span. Only spans exceeding 1% of the total trace duration are shown.

  • Colors distinguish applications. For example, blue might represent opentelemetry-demo-adservice.

  • Black line segments within bars indicate self-time -- the span's processing time excluding child spans. If Span A takes 10 ms and its child Span B takes 8 ms, Span A's self-time is 2 ms.

  • The timeline axis shows the time range of the trace.
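
The self-time rule and the 1% display cutoff from the list above can be expressed in a few lines. This is a simplified sketch: child durations are summed directly, so overlapping or parallel children are not handled, and clamping the result at zero is an assumption. Both function names are hypothetical.

```python
def self_time_ms(duration_ms, child_durations_ms):
    """Span duration minus the time spent in its child spans."""
    return max(duration_ms - sum(child_durations_ms), 0.0)


def visible_in_bar_chart(spans, trace_duration_ms, min_fraction=0.01):
    """Keep spans that exceed 1% of the total trace duration."""
    cutoff = min_fraction * trace_duration_ms
    return [s for s in spans if s["duration_ms"] > cutoff]


# Span A takes 10 ms and its child Span B takes 8 ms, so A's self-time is 2 ms.
print(self_time_ms(10, [8]))
```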

Span tree and navigation

The span tree displays each span as a row, showing parent-child relationships through indentation. A number before each parent span indicates how many child spans it contains.

Available controls:

  • Collapse/Expand: Click the Collapse icon to collapse or expand a span's children.

  • Focus: Click the Focus icon to isolate a span and its downstream calls.

  • Defocus: Click the Defocus icon to restore the full view.

  • Filter: Enter a span name, application name, or attribute in the search box to filter the tree. The view shows the matching span and all ancestor spans up to the entry span. Clear the search box and click the Search icon to remove the filter.

  • Zoom: Click the Zoom in icon to zoom in and hide the bar chart. Click the Zoom out icon to restore the bar chart.
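
The Filter behavior, which keeps each matching span together with its ancestors up to the entry span, can be sketched like this. The span fields (span_id, parent_id, span_name, service_name, attributes) are hypothetical, and case-insensitive substring matching is an assumption about how the search box compares text.

```python
def filter_tree(spans, keyword):
    """Return matching spans plus all of their ancestors, in original order."""
    by_id = {s["span_id"]: s for s in spans}
    needle = keyword.lower()
    keep = set()
    for span in spans:
        fields = [span["span_name"], span["service_name"]]
        fields += [str(v) for v in span.get("attributes", {}).values()]
        if any(needle in field.lower() for field in fields):
            # Walk up the parent chain until the entry span or an already kept span.
            current = span
            while current is not None and current["span_id"] not in keep:
                keep.add(current["span_id"])
                current = by_id.get(current.get("parent_id"))
    return [s for s in spans if s["span_id"] in keep]
```

A search that matches only a leaf span therefore still shows the path from the entry span down to that leaf, while unrelated sibling branches are hidden.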

Span details panel

Select a span to view its details in the right panel:

  • Additional Information: Displays span attributes, resources, details, and events, grouped by type. For field definitions, see Trace Explorer parameters.

  • Metrics: Shows span-related metrics. For Java applications monitored by ARMS, this includes JVM and host metrics. For traces from open-source agents, RED Method metrics (rate, errors, duration) are displayed.

  • Logs: Business logs correlated with this trace. If a Simple Log Service (SLS) Logstore is configured for the application, click through to query logs by trace ID.

  • Exceptions: Exception details for the selected span, if any.

  • Event Config: Configure custom interaction events for one or more span attributes. Use these events to link trace data to related logs, metrics, or external systems. For setup instructions, see Configure a custom interaction event for a trace.

Custom development

Trace data is stored in Simple Log Service (SLS) with the following naming conventions:

Parameter | Format | Example
Project name | proj-xtrace-<encode>-<region-id> | proj-xtrace-abc123-cn-hangzhou
Logstore name | logstore-tracing | logstore-tracing

For data field definitions, see Trace Explorer parameters. For examples of building custom analysis on stored trace data, see Analyze trace data in real time by using Trace Explorer.
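
When scripting against the stored data, the naming convention above can be applied with a small helper. This sketch only formats the names; the function name is hypothetical, and how to obtain the <encode> value for your account is not covered here.

```python
LOGSTORE_NAME = "logstore-tracing"


def sls_trace_location(encode: str, region_id: str):
    """Return the (project name, Logstore name) pair for stored trace data."""
    return f"proj-xtrace-{encode}-{region_id}", LOGSTORE_NAME


# Values taken from the example column of the table above.
print(sls_trace_location("abc123", "cn-hangzhou"))
```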
