Concepts

Last Updated: Sep 18, 2018

Why use a distributed tracing system?

To handle growing business complexity, developers are adopting agile development, continuous integration, and other modern practices. System architecture is also shifting from large applications running on a single machine to a micro-service architecture. Micro-services are built on different datasets, and the individual modules can be developed by different teams, implemented in different languages, and deployed to more than one server. As a result, a failure in one service can put dozens of applications at risk.

A distributed tracing system records the information within the scope of a request, for example the execution path and duration of a remote method invocation. It is an important tool for troubleshooting the system and measuring its performance.

What is a trace?

Generally, a trace is the execution path of a transaction or process through a (distributed) system. According to the OpenTracing standard, a trace is a Directed Acyclic Graph (DAG) made up of multiple spans, where each span is a named, timed fragment of continuous execution within the trace.

Here is an example of a distributed invocation: after being initiated by the client, a request first reaches the load balancer, then passes through the authentication service and the billing service before requesting resources, and finally the result is returned.

After the data is collected and stored, the distributed tracing system usually presents the trace as a sequence diagram with a timeline.

OpenTracing data model

The general concept

A trace that complies with the OpenTracing standard is defined implicitly by the spans that belong to it. A trace can be regarded as a Directed Acyclic Graph (DAG) of spans. The relationships between spans are called References. For example, here is a trace that consists of eight spans.

  The causal relationship between spans within a single trace

          [Span A]  ←←←(The root span)
              |
       +------+------+
       |             |
   [Span B]      [Span C] ←←←(Span C is the child of Span A, ChildOf)
       |             |
   [Span D]      +---+-------+
                 |           |
             [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                 (Span G is called after Span F, FollowsFrom)
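The span relationships in the diagram can be modeled as plain data. The following is a minimal sketch rather than the OpenTracing API: the `Span`, `Reference`, and `RefType` types and the helper functions are hypothetical names introduced for illustration, and the edges (D under B, E and F under C) are read off the diagram above.

```go
package main

import "fmt"

// RefType names the two reference kinds shown in the diagram.
type RefType string

const (
	ChildOf     RefType = "ChildOf"
	FollowsFrom RefType = "FollowsFrom"
)

// Reference points from a span to an earlier span it depends on.
type Reference struct {
	Type RefType
	To   string // name of the referenced span
}

// Span is a hypothetical, stripped-down span: a name plus its references.
type Span struct {
	Name string
	Refs []Reference
}

// buildTrace returns the eight-span example trace (assumed edges, see lead-in).
func buildTrace() []Span {
	return []Span{
		{Name: "A"},
		{Name: "B", Refs: []Reference{{ChildOf, "A"}}},
		{Name: "C", Refs: []Reference{{ChildOf, "A"}}},
		{Name: "D", Refs: []Reference{{ChildOf, "B"}}},
		{Name: "E", Refs: []Reference{{ChildOf, "C"}}},
		{Name: "F", Refs: []Reference{{ChildOf, "C"}}},
		{Name: "G", Refs: []Reference{{FollowsFrom, "F"}}},
		{Name: "H", Refs: []Reference{{FollowsFrom, "G"}}},
	}
}

// rootSpans returns the names of spans that reference no other span.
func rootSpans(trace []Span) []string {
	var roots []string
	for _, s := range trace {
		if len(s.Refs) == 0 {
			roots = append(roots, s.Name)
		}
	}
	return roots
}

// countRefs counts the references of the given type across the trace.
func countRefs(trace []Span, t RefType) int {
	n := 0
	for _, s := range trace {
		for _, r := range s.Refs {
			if r.Type == t {
				n++
			}
		}
	}
	return n
}

func main() {
	trace := buildTrace()
	fmt.Println("roots:", rootSpans(trace))
	fmt.Println("ChildOf references:", countRefs(trace, ChildOf))
	fmt.Println("FollowsFrom references:", countRefs(trace, FollowsFrom))
}
```

Storing references on the later span (pointing back to the earlier one) keeps the graph acyclic by construction, as long as a span only references spans that already exist.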

In some cases, it is clearer to present a trace as a timeline-based sequence diagram, as follows.

  The time relationship between spans within a single trace

  ––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

   [Span A···················································]
     [Span B··············································]
        [Span D··········································]
      [Span C········································]
           [Span E·······]        [Span F··] [Span G··] [Span H··]
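A timeline view like this can be derived from each span's start and finish timestamps alone. The sketch below is one possible way to do it, not how any particular tracing backend renders traces: `TimedSpan`, `renderTimeline`, and `renderExample` are hypothetical names, and the three sample spans and their timings are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// TimedSpan is a hypothetical record of one finished span.
type TimedSpan struct {
	Name   string
	Start  time.Time
	Finish time.Time
}

// renderTimeline returns one text row per span, ordered by start time.
// Each column represents one step of time since the earliest span start.
func renderTimeline(spans []TimedSpan, step time.Duration) []string {
	if len(spans) == 0 || step <= 0 {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]TimedSpan(nil), spans...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Start.Before(sorted[j].Start)
	})
	origin := sorted[0].Start
	rows := make([]string, 0, len(sorted))
	for _, s := range sorted {
		offset := int(s.Start.Sub(origin) / step)
		width := int(s.Finish.Sub(s.Start) / step)
		if width < 1 {
			width = 1 // always draw at least one column
		}
		rows = append(rows, strings.Repeat(" ", offset)+strings.Repeat("=", width)+" "+s.Name)
	}
	return rows
}

// renderExample builds three made-up spans and renders them in 10 ms columns.
func renderExample() []string {
	t0 := time.Date(2018, time.September, 18, 0, 0, 0, 0, time.UTC)
	at := func(ms int) time.Time { return t0.Add(time.Duration(ms) * time.Millisecond) }
	spans := []TimedSpan{
		{Name: "Span C", Start: at(20), Finish: at(60)},
		{Name: "Span A", Start: at(0), Finish: at(80)},
		{Name: "Span B", Start: at(10), Finish: at(70)},
	}
	return renderTimeline(spans, 10*time.Millisecond)
}

func main() {
	for _, row := range renderExample() {
		fmt.Println(row)
	}
}
```

Because the layout depends only on timestamps, spans can arrive from different services in any order and still be placed correctly once they are collected.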

Tracer

The Tracer interface is used to create a Span (the StartSpan function), parse the context (the Extract function), and pass the context through (the Inject function). It can be used for:

  • Creating a new Span or setting the Span's properties

    /** Create, start, and return a Span with the given operation name and options.
    ** For example:
    ** Create a Span without a parentSpan:
    ** sp := tracer.StartSpan("GetFeed")
    ** Create a Span with a parentSpan:
    ** sp := tracer.StartSpan("GetFeed", opentracing.ChildOf(parentSpan.Context()))
    **/
    StartSpan(operationName string, opts ...StartSpanOption) Span

    Each span contains the following state:

    • Operation name: The name of the operation (also known as the Span name).
    • Start timestamp: The time at which the span started.
    • Finish timestamp: The time at which the span ended.
    • Span tags: A set of key-value pairs. The key must be a String, while the value can be a String, a Boolean, or a number.
    • Span logs: A set of span logs. Each log entry contains one or more key-value pairs and a timestamp. The key must be a String, while the value can be of any type.
    • SpanContext: The context object of the Span. Each SpanContext contains the following state:
      • Any implementation-dependent state needed to identify a unique span across process boundaries (for example, the trace ID and span ID).
      • Baggage Items: Key-value pairs that accompany a trace. They live in the trace and must also be transferred across process boundaries.
    • References (inter-span relationships): Zero or more related spans. These relationships are established through SpanContext.
  • Data pass-through

    Data pass-through is done in two steps:

    1. Parse the SpanContext from the Carrier.

      /** Parse the SpanContext (including traceId, spanId, baggage) from the Carrier according to the format parameter.
      ** The actual type of `carrier` depends on the value of `format`.
      ** For example:
      ** carrier := opentracing.HTTPHeadersCarrier(httpReq.Header)
      ** clientContext, err := tracer.Extract(opentracing.HTTPHeaders, carrier)
      **/
      Extract(format interface{}, carrier interface{}) (SpanContext, error)
    2. Inject the SpanContext into the Carrier.

      /**
      ** Inject the SpanContext `sm` (including traceId, spanId, baggage) into the Carrier according to the format parameter.
      ** The actual type of `carrier` depends on the value of `format`.
      ** For example:
      ** carrier := opentracing.HTTPHeadersCarrier(httpReq.Header)
      ** err := tracer.Inject(span.Context(), opentracing.HTTPHeaders, carrier)
      **/
      Inject(sm SpanContext, format interface{}, carrier interface{}) error
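To make the two pass-through steps concrete, here is a minimal sketch that uses only the Go standard library, with `http.Header` playing the role of the Carrier. It is not the OpenTracing implementation: the `SpanContext` struct, the `inject`, `extract`, and `roundTrip` functions, and the `X-Sketch-*` header names are all assumptions made for illustration. Real tracers define their own wire format.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// Header names are made up for this sketch; real tracers define their own.
const (
	traceIDHeader = "X-Sketch-Trace-Id"
	spanIDHeader  = "X-Sketch-Span-Id"
	baggagePrefix = "X-Sketch-Baggage-"
)

// SpanContext holds the state that must cross process boundaries.
type SpanContext struct {
	TraceID string
	SpanID  string
	Baggage map[string]string
}

// inject writes the SpanContext into the carrier (outgoing HTTP headers).
func inject(sc SpanContext, carrier http.Header) {
	carrier.Set(traceIDHeader, sc.TraceID)
	carrier.Set(spanIDHeader, sc.SpanID)
	for k, v := range sc.Baggage {
		carrier.Set(baggagePrefix+k, v)
	}
}

// extract rebuilds a SpanContext from the carrier (incoming HTTP headers).
func extract(carrier http.Header) (SpanContext, error) {
	sc := SpanContext{
		TraceID: carrier.Get(traceIDHeader),
		SpanID:  carrier.Get(spanIDHeader),
		Baggage: map[string]string{},
	}
	if sc.TraceID == "" || sc.SpanID == "" {
		return SpanContext{}, errors.New("span context not found in carrier")
	}
	for name, values := range carrier {
		if len(values) == 0 || !strings.HasPrefix(name, baggagePrefix) {
			continue
		}
		// Header.Set canonicalizes header names, so this sketch
		// normalizes baggage keys to lower case when reading them back.
		key := strings.ToLower(strings.TrimPrefix(name, baggagePrefix))
		sc.Baggage[key] = values[0]
	}
	return sc, nil
}

// roundTrip simulates a client injecting and a server extracting.
func roundTrip(sc SpanContext) (SpanContext, error) {
	carrier := http.Header{}
	inject(sc, carrier)
	return extract(carrier)
}

func main() {
	client := SpanContext{
		TraceID: "trace-1",
		SpanID:  "span-7",
		Baggage: map[string]string{"user": "alice"},
	}
	server, err := roundTrip(client)
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	fmt.Println(server.TraceID, server.SpanID, server.Baggage["user"])
}
```

In a real call chain the calling service would run the inject step before sending the request, and the receiving service would run the extract step and use the result as the parent reference for its own span.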

How is data reported?

The following figure shows the workflow of reporting data directly, without an Agent.

The following figure shows the workflow of reporting data through an Agent.