What is Managed Service for OpenTelemetry?

Managed Service for OpenTelemetry gives you end-to-end visibility into requests that cross dozens of microservices, so you can pinpoint the root cause of latency spikes and errors quickly.

Managed Service for OpenTelemetry is a component of Application Real-Time Monitoring Service (ARMS) that provides distributed tracing for microservice architectures. It collects trace data from your applications, aggregates it in real time, and generates trace details, performance metrics, and service topology maps to help you identify and resolve performance bottlenecks.

Core concepts

The following concepts are central to distributed tracing:

Trace: A record of a single request as it travels through multiple services. Each trace has a unique ID that ties together all the operations involved in fulfilling that request.
Span: A single operation within a trace. Each span captures the operation name, start time, duration, and the parent span that triggered it. A trace consists of multiple spans arranged in a parent-child hierarchy.
Topology: A visual map of how your services call each other, generated automatically from trace data.
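To make the parent-child hierarchy concrete, the following minimal sketch models a span and assembles the spans of one trace into a tree. The `Span` class, its field names, and the sample data are illustrative assumptions, not the data model that Managed Service for OpenTelemetry uses internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One operation within a trace (field names are illustrative)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    duration_ms: int
    children: List["Span"] = field(default_factory=list)


def build_trace_tree(spans: List[Span]) -> Span:
    """Link the spans of one trace into a parent-child tree and return the root."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}
    root: Optional[Span] = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            by_id[span.parent_id].children.append(span)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, with children ordered by start time."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in sorted(span.children, key=lambda s: s.start_ms):
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical spans for a single request; they may arrive in any order.
spans = [
    Span("t1", "a", None, "GET /checkout", 0, 120),
    Span("t1", "c", "a", "payment.charge", 40, 70),
    Span("t1", "b", "a", "cart.load", 5, 30),
    Span("t1", "d", "c", "db.insert", 50, 20),
]
tree = build_trace_tree(spans)
print("\n".join(render(tree)))
```

All four spans share the trace ID `t1`, which is what ties them to the same request. The `parent_id` field alone is enough to rebuild the call hierarchy.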

Architecture

The following diagram illustrates the data collection and processing pipeline.

Data flow

Instrument your application: Integrate the client SDK into your application to capture service call data. Managed Service for OpenTelemetry provides client SDKs for multiple programming languages and is compatible with open source tracing libraries such as Jaeger and Zipkin. The SDKs support the OpenTracing standard.
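As a rough illustration of what instrumentation captures, the sketch below wraps operations in a toy `start_span` context manager that records a span ID, the parent span, and the duration. It uses only the Python standard library. `start_span`, `finished_spans`, and `handle_order` are hypothetical names, and this is not the API of the Managed Service for OpenTelemetry SDK, Jaeger, or Zipkin. A real SDK does this bookkeeping for you and also reports the spans to the service.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator, List

# ID of the span that is currently active in this execution context.
_current_span = contextvars.ContextVar("current_span", default=None)

# Stand-in for a reporting backend: finished spans are simply appended here.
finished_spans: List[Dict[str, object]] = []


@contextmanager
def start_span(name: str, trace_id: str) -> Iterator[str]:
    """Record one operation as a span nested under the currently active span."""
    span_id = uuid.uuid4().hex[:16]
    parent_id = _current_span.get()
    token = _current_span.set(span_id)
    started = time.perf_counter()
    try:
        yield span_id
    finally:
        # Restore the previous active span, then record this one.
        _current_span.reset(token)
        finished_spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_ms": (time.perf_counter() - started) * 1000.0,
        })


def handle_order() -> None:
    """Hypothetical request handler with two instrumented sub-operations."""
    trace_id = uuid.uuid4().hex
    with start_span("handle_order", trace_id):
        with start_span("load_cart", trace_id):
            time.sleep(0.01)
        with start_span("charge_card", trace_id):
            time.sleep(0.01)


handle_order()
for span in finished_spans:
    print(span["name"], "parent:", span["parent_id"])
```

Inner spans finish before the span that encloses them, so the root span `handle_order` is recorded last and is the only one with no parent.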

Process and visualize: After the SDK reports data, the service aggregates and persists it in real time. It then generates three types of monitoring data to help you troubleshoot slow requests, identify failing services, and understand call patterns.

Data type	Description
Trace details	The full span-by-span breakdown of each request, used for root-cause analysis.
Performance overview	Latency, throughput, and error rate metrics across your services.
Real-time topology	A live map of service dependencies and call relationships.
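The performance overview metrics in the table above can be derived from span data alone. The following sketch computes average latency, throughput, and error rate per service over a fixed time window. The `ServiceSpan` record, the `performance_overview` function, and the sample values are assumptions made for illustration. The aggregation that the service actually performs is not documented here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ServiceSpan(NamedTuple):
    """Minimal span record for aggregation (illustrative fields only)."""
    service: str
    duration_ms: float
    is_error: bool


def performance_overview(
    spans: List[ServiceSpan], window_seconds: float
) -> Dict[str, Dict[str, float]]:
    """Aggregate spans observed in one time window into per-service metrics."""
    grouped: Dict[str, List[ServiceSpan]] = defaultdict(list)
    for span in spans:
        grouped[span.service].append(span)

    overview: Dict[str, Dict[str, float]] = {}
    for service, records in grouped.items():
        count = len(records)
        overview[service] = {
            "avg_latency_ms": sum(r.duration_ms for r in records) / count,
            "throughput_rps": count / window_seconds,
            "error_rate": sum(1 for r in records if r.is_error) / count,
        }
    return overview


# Hypothetical spans collected during a 10-second window.
window = [
    ServiceSpan("cart", 20.0, False),
    ServiceSpan("cart", 40.0, False),
    ServiceSpan("payment", 100.0, False),
    ServiceSpan("payment", 200.0, True),
    ServiceSpan("payment", 300.0, False),
]
metrics = performance_overview(window, window_seconds=10.0)
for service, values in sorted(metrics.items()):
    print(service, values)
```

In this sample, `payment` shows both the higher average latency and a nonzero error rate, which is the kind of signal that points you to the trace details for that service.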

Forward to downstream services: Trace data can be sent to the following Alibaba Cloud services for further analysis.

Service	Use case
Simple Log Service	Correlate traces with application logs and set up alerting rules.
MaxCompute	Run large-scale offline analysis on historical trace data.

Capabilities

Goal	How it helps
Trace requests across services	Collects all spans from distributed microservices and assembles them into end-to-end traces for query and root-cause analysis.
Monitor application performance	Captures request-level data and analyzes service and resource performance in real time, surfacing latency, error rates, and throughput.
Map service dependencies	Automatically discovers how your microservices and related PaaS products call each other, and renders a real-time topology.
Integrate with open source libraries	Works with Jaeger, Zipkin, and other open source tracing libraries built on the OpenTracing standard.
Stream data to analysis platforms	Sends trace data to Simple Log Service for log correlation and alerting, and to MaxCompute for offline analysis.
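The "Map service dependencies" capability rests on a simple idea: whenever a span and its parent belong to different services, one service has called the other. The sketch below derives topology edges from spans in that way. `CallSpan`, `service_edges`, and the sample data are hypothetical, and the sketch is a simplified view of the idea rather than the service's actual discovery logic.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional, Tuple


class CallSpan(NamedTuple):
    """Minimal span record for topology discovery (illustrative fields only)."""
    span_id: str
    parent_id: Optional[str]
    service: str


def service_edges(spans: List[CallSpan]) -> Counter:
    """Count caller -> callee edges between different services."""
    by_id: Dict[str, CallSpan] = {s.span_id: s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        # Spans whose parent is in the same service are internal operations,
        # not calls between services, so they add no edge.
        if parent is not None and parent.service != span.service:
            edge: Tuple[str, str] = (parent.service, span.service)
            edges[edge] += 1
    return edges


# Hypothetical spans from one request entering through a gateway.
spans = [
    CallSpan("1", None, "gateway"),
    CallSpan("2", "1", "orders"),
    CallSpan("3", "2", "orders"),      # internal step inside "orders"
    CallSpan("4", "2", "payment"),
    CallSpan("5", "2", "inventory"),
    CallSpan("6", "1", "orders"),
]
topology = service_edges(spans)
for (caller, callee), calls in sorted(topology.items()):
    print(f"{caller} -> {callee}: {calls} call(s)")
```

The edge counts can serve as weights when the map is drawn, so that heavily used call paths stand out.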

Next steps

Get started by instrumenting your first application with the Managed Service for OpenTelemetry SDK.
Explore the trace query interface to search, filter, and analyze distributed traces.
Set up alerting rules in Simple Log Service to get notified when trace metrics exceed thresholds.