An Interpretation of OpenTelemetry Log Specification

This article introduces the OpenTelemetry Log specification and the knowledge and experience related to development and O&M.

Preface

As the current standard solution for observability, OpenTelemetry has developed rapidly in recent years. OpenTelemetry Trace v1.0 was released recently, and Metrics v1.0 will be released in a few months. The Log specification developed more slowly, but through the joint efforts of many companies, its first version was released six months ago. After several updates, it is also moving quickly towards v1.0.

This article mainly introduces the OpenTelemetry Log specification, whose contributors come from many great companies, such as Google, Microsoft, AWS, and DataDog, and from many excellent projects, including Splunk, ES, and Fluentd. It also covers a lot of knowledge and experience related to development and O&M, which deserves our attention.

Original Aspiration and Purpose

The officially proposed purposes are listed below:

  • The log model can express logs of various sources, including applications, machine events, system logs, and container logs.
  • Most existing log formats can be mapped to log models. In turn, log models can be converted into various log formats easily.
  • The log data model and semantic definition can guide the log system in recording, transmitting, storing, and understanding logs.

From the perspective of OpenTelemetry's top-level goals, standardization is the most important aspect of the log model definition: it unifies the common schemas of Metrics, Tracing, and Logging so that the three can be seamlessly interconnected. To be as universal as possible, the log model was defined with reference to a large number of existing log formats, and it keeps the expression of information as flexible as possible.

Log Architecture Evolution

[Figure 1: Log architecture evolution]

In the traditional architecture, Logs, Traces, and Metrics are generated and collected separately. With OpenTelemetry, all data will be collected by OpenTelemetry Collector and transmitted to a unified backend for association. The benefits are listed below:

  1. The application can implement observability merely through an SDK, with fewer dependencies and lower resource consumption.
  2. Only one collector is required, with lower deployment and O&M costs.
  3. The data format is unified, with easier data association.

The preceding figure shows the ultimate goal. However, other log collectors will probably still be required for the next one or two years, since OpenTelemetry Collector does not yet provide solid enough support for logs.

Features

  1. Any type of log should be convertible into LogModel. Logs with the same meaning but different formats should be completely equivalent after conversion into LogModel.
  2. Mapping other log types to LogModel should be meaningful: LogModel must be able to express the semantics of other log types.
  3. Converting logs from type A into LogModel and then into type B should give the same result as converting directly from type A to type B. No data should be lost or added during the conversion.
  4. LogModel should be as efficient as possible to transmit and store. Serialization and deserialization should consume as little CPU and memory as possible, and the stored form should occupy as little space as possible.

In terms of expressiveness, LogModel must be able to express at least the following three types of logs or events:

  1. System Logs: Logs generated by operating systems and hardware, such as Syslog
  2. Third-Party Application Logs: Log formats of some popular third-party software, such as Apache logs and MySQL slow logs
  3. Application Logs: Logs generated by business applications. These logs are generally written by programmers, who can also modify the source code when necessary to adapt the logs to the new LogModel.
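
As a rough illustration of mapping a system log to LogModel, the sketch below parses the `<PRI>` prefix of a syslog line (facility and severity are packed as `facility * 8 + severity`) and fills in a few LogModel fields. The function name `syslog_to_record`, the attribute keys `syslog.facility` and `syslog.severity`, and the severity table are assumptions made for this example, not part of the specification text quoted here.

```python
import re

# Assumed mapping from syslog severity (0-7) to SeverityNumber.
# Illustrative only; check the specification before relying on it.
SYSLOG_TO_SEVERITY_NUMBER = {
    0: 21,  # Emergency     -> FATAL
    1: 19,  # Alert         -> ERROR3
    2: 18,  # Critical      -> ERROR2
    3: 17,  # Error         -> ERROR
    4: 13,  # Warning       -> WARN
    5: 10,  # Notice        -> INFO2
    6: 9,   # Informational -> INFO
    7: 5,   # Debug         -> DEBUG
}

# "<PRI>" followed by the rest of the message.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def syslog_to_record(line: str) -> dict:
    """Convert one syslog line into a LogModel-like dict (hypothetical helper)."""
    match = PRI_PATTERN.match(line)
    if match is None:
        # No PRI prefix: keep the raw text as the Body and set nothing else.
        return {"Body": line}
    facility, severity = divmod(int(match.group(1)), 8)
    return {
        "SeverityNumber": SYSLOG_TO_SEVERITY_NUMBER[severity],
        # Hypothetical attribute names chosen for this sketch.
        "Attributes": {"syslog.facility": facility, "syslog.severity": severity},
        "Body": match.group(2),
    }


record = syslog_to_record("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
print(record["SeverityNumber"], record["Attributes"])
```

Keeping the original facility and severity in Attributes is one way to avoid losing data during conversion, in the spirit of the third feature above.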

Field Types

LogModel only defines the logical representation of a record, regardless of the specific physical format and encoding. Each record has two kinds of fields:

  1. Top-level fields with specific types and meanings
  2. Specific fields, usually stored as key/value pairs, whose types vary according to the top-level field that contains them

LogModel Definition

[Figure 2: LogModel field definitions]

Detailed Explanations of Fields

Timestamp

uint64, in nanoseconds since the Unix epoch
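
A minimal sketch of working with nanosecond timestamps in Python follows. The helper names are hypothetical. Note that the value 1586960586000 used in the examples later in this article has millisecond magnitude, so the sketch converts it to nanoseconds before formatting it.

```python
import time
from datetime import datetime, timezone

UINT64_MAX = 2**64 - 1


def now_unix_nanos() -> int:
    """Current time as nanoseconds since the Unix epoch."""
    return time.time_ns()


def millis_to_nanos(millis: int) -> int:
    """Convert epoch milliseconds to epoch nanoseconds, checking the uint64 range."""
    nanos = millis * 1_000_000
    if not 0 <= nanos <= UINT64_MAX:
        raise ValueError("timestamp does not fit in uint64")
    return nanos


def nanos_to_iso(nanos: int) -> str:
    """Format epoch nanoseconds as a UTC string with nanosecond precision."""
    seconds, remainder = divmod(nanos, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{moment:%Y-%m-%dT%H:%M:%S}.{remainder:09d}Z"


print(nanos_to_iso(millis_to_nanos(1586960586000)))
# prints 2020-04-15T14:23:06.000000000Z
```

Integer arithmetic is used throughout because a 64-bit float cannot represent current epoch nanoseconds exactly.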

TraceId

A byte array – For detailed information, please see: W3C Trace Context

SpanId

A byte array – If there is a SpanId, there must be a TraceId.

TraceFlags

A single byte – For detailed information, please see: W3C Trace Context
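
The three trace fields above can be filled from a W3C Trace Context `traceparent` header, which has the form `version-traceid-parentid-flags` in hex. The sketch below is a simplified parser that only handles the four-part layout of version `00`; the function name `parse_traceparent` and the returned dict layout are assumptions for illustration.

```python
def parse_traceparent(header: str) -> dict:
    """Extract TraceId, SpanId, and TraceFlags from a traceparent header value."""
    parts = header.strip().split("-")
    if len(parts) != 4:
        raise ValueError("expected version-traceid-parentid-flags")
    _version, trace_id_hex, span_id_hex, flags_hex = parts
    if (len(trace_id_hex), len(span_id_hex), len(flags_hex)) != (32, 16, 2):
        raise ValueError("unexpected field length")

    # bytes.fromhex raises ValueError on non-hex input.
    trace_id = bytes.fromhex(trace_id_hex)   # 16 bytes
    span_id = bytes.fromhex(span_id_hex)     # 8 bytes
    trace_flags = bytes.fromhex(flags_hex)[0]  # a single byte

    # All-zero identifiers are invalid in W3C Trace Context.
    if trace_id == bytes(16) or span_id == bytes(8):
        raise ValueError("all-zero trace or span id")
    return {"TraceId": trace_id, "SpanId": span_id, "TraceFlags": trace_flags}


fields = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
sampled = bool(fields["TraceFlags"] & 0x01)  # lowest bit is the sampled flag
print(fields["TraceId"].hex(), fields["SpanId"].hex(), sampled)
```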

SeverityText

The readable description of the log level – If unset, it is derived from SeverityNumber according to the default mapping rule.

SeverityNumber

SeverityNumber is similar to the log level in Syslog. OpenTelemetry defines 24 log levels in 6 categories, covering the definition of all types of log levels.

[Figure 3: SeverityNumber levels and categories]

SeverityNumber and SeverityText can be mapped automatically. Therefore, SeverityText can be left unfilled when a log is generated to reduce the serialization, deserialization, and transmission costs. The mapping relations are listed below:

[Figure 4: Mapping between SeverityNumber and SeverityText]
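
The default mapping can be sketched in a few lines: the 24 numbers are split into 6 categories of 4 levels each, and the first level of each category carries the bare category name. The function name `severity_text` is an assumption for this example.

```python
# Six categories, four levels each: 1-4 TRACE, 5-8 DEBUG, ..., 21-24 FATAL.
SEVERITY_CATEGORIES = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def severity_text(severity_number: int) -> str:
    """Return the default short name for a SeverityNumber in 1..24."""
    if not 1 <= severity_number <= 24:
        raise ValueError("SeverityNumber must be between 1 and 24")
    category, offset = divmod(severity_number - 1, 4)
    name = SEVERITY_CATEGORIES[category]
    # The first level of a category has no numeric suffix (INFO, then INFO2...).
    return name if offset == 0 else f"{name}{offset + 1}"


print(severity_text(9), severity_text(10), severity_text(24))
# prints INFO INFO2 FATAL4
```

Because the text can be derived on the receiving side this way, a producer can send only the number.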

ShortName

A specific word that identifies the log type, usually no more than 50 bytes, for example, ProcessStarted.

Body

The log content. It is of anyType, which can be an int, string, bool, float, array, or map.

Resource

Key/Value pair list – Please see the OpenTelemetry general Resource definition. It includes information such as host name, process number, and service name, which can be used to associate logs with Metrics and Tracing.

Attributes

Key/Value pair list. Key is always a string, and Value is of anyType. For detailed information, please see Definitions of Attributes in Tracing.
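
Taken together, the fields described above could be modeled roughly as follows. This is a sketch of the logical model only, not an official SDK type: the class name `LogRecord`, the snake_case attribute names, and the choice of which rules to validate are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    """Hypothetical in-memory form of the LogModel fields described above."""
    timestamp: int = 0                      # uint64, nanoseconds
    trace_id: Optional[bytes] = None        # byte array
    span_id: Optional[bytes] = None         # byte array, requires trace_id
    trace_flags: Optional[int] = None       # a single byte
    severity_text: Optional[str] = None
    severity_number: Optional[int] = None   # 1..24
    short_name: Optional[str] = None        # e.g. "ProcessStarted"
    body: Any = None                        # anyType
    resource: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0 <= self.timestamp < 2**64:
            raise ValueError("Timestamp must fit in uint64")
        # If there is a SpanId, there must be a TraceId.
        if self.span_id is not None and self.trace_id is None:
            raise ValueError("SpanId requires TraceId")
        if self.severity_number is not None and not 1 <= self.severity_number <= 24:
            raise ValueError("SeverityNumber must be between 1 and 24")


record = LogRecord(
    timestamp=1586960586000000000,
    severity_number=9,
    body="I like donuts",
    resource={"service.name": "donut_shop"},
)
print(record.severity_number, record.resource["service.name"])
```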

LogModel Example

Example 1:

{
  "Timestamp": 1586960586000, // JSON needs to make a decision about
                              // how to represent nanoseconds.
  "Attributes": {
    "http.status_code": 500,
    "http.url": "http://example.com",
    "my.custom.application.tag": "hello"
  },
  "Resource": {
    "service.name": "donut_shop",
    "service.version": "semver:2.0.0",
    "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198"
  },
  "TraceId": "f4dbb3edd765f620", // this is a byte sequence
                                 // (hex-encoded in JSON)
  "SpanId": "43222c2d51a7abe3",
  "SeverityText": "INFO",
  "SeverityNumber": 9,
  "Body": "20200415T072306-0700 INFO I like donuts"
}
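
One way to produce JSON like Example 1 from an in-memory record is sketched below. It assumes the byte-array fields are held as Python `bytes` and hex-encodes them during serialization; the helper names `encode_bytes` and `to_json` are hypothetical, and the hex encoding simply follows the comment in the example rather than any mandated wire format.

```python
import json


def encode_bytes(value):
    """Fallback encoder for json.dumps: hex-encode byte sequences."""
    if isinstance(value, (bytes, bytearray)):
        return value.hex()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json(record: dict) -> str:
    """Serialize a LogModel-like dict, hex-encoding TraceId and SpanId."""
    return json.dumps(record, default=encode_bytes, indent=2)


record = {
    "Timestamp": 1586960586000,
    "Attributes": {"http.status_code": 500, "http.url": "http://example.com"},
    "Resource": {"service.name": "donut_shop"},
    "TraceId": bytes.fromhex("f4dbb3edd765f620"),
    "SpanId": bytes.fromhex("43222c2d51a7abe3"),
    "SeverityText": "INFO",
    "SeverityNumber": 9,
    "Body": "20200415T072306-0700 INFO I like donuts",
}
print(to_json(record))
```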

Example 2:

{
  "Timestamp": 1586960586000,
  ...
  "Body": {
    "i": "am",
    "an": "event",
    "of": {
      "some": "complexity"
    }
  }
}

Example 3:

{
   "Timestamp": 1586960586000,
   "Attributes":{
      "http.scheme":"https",
      "http.host":"donut.mycie.com",
      "http.target":"/order",
      "http.method":"post",
      "http.status_code":500,
      "http.flavor":"1.1",
      "http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
   }
}

Summary

According to the specification above, the OpenTelemetry Log model mainly pursues the following goals:

  1. Logs must be expressive enough, since they are the most detailed of the three kinds of observability data, the others being Metrics and Traces.
  2. Logs in the Trace scenario need to carry a TraceId and SpanId so that they can be associated with Traces. At the same time, Resource allows them to be associated with Traces and Metrics more easily.
  3. Log performance needs to be as high as possible. Logs are the observability data with the largest volume, and they are not sampled, so their impact on application performance must be kept as small as possible.
  4. Excellent compatibility is required. Many log systems have existed for a long time and have clear semantics, so their logs must be convertible into OpenTelemetry logs seamlessly.
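
The association described in the second point can be sketched as a simple grouping of log records by TraceId on the backend side. The function name `group_by_trace` and the dict-shaped records are assumptions for illustration; a real backend would typically do this with an index rather than in memory.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_trace(records: List[dict]) -> Dict[Optional[str], List[dict]]:
    """Group LogModel-like dicts by TraceId; records without one go under None."""
    grouped: Dict[Optional[str], List[dict]] = defaultdict(list)
    for record in records:
        grouped[record.get("TraceId")].append(record)
    return dict(grouped)


logs = [
    {"TraceId": "f4dbb3edd765f620", "SpanId": "43222c2d51a7abe3", "Body": "order received"},
    {"TraceId": "f4dbb3edd765f620", "SpanId": "43222c2d51a7abe3", "Body": "order failed"},
    {"Body": "background job started"},  # not emitted inside a trace
]
by_trace = group_by_trace(logs)
print(len(by_trace["f4dbb3edd765f620"]), len(by_trace[None]))
# prints 2 1
```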

Overall, this model is very suitable for modern IT systems. However, much work is needed to implement the model successfully, including log collection, parsing, transmission, environments, and compatibility with many other existing systems. Fortunately, Fluentd is also a CNCF project. In the future, it may become the log collection kernel of OpenTelemetry, working in coordination with the Collector.

References

  1. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/overview.md
  2. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md
  3. https://www.w3.org/TR/trace-context/#trace-id
  4. https://docs.datadoghq.com/tracing/connect_logs_and_traces/java?tab=log4j2
  5. https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#resources
  6. https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions