
Technical Best Practices for Container Log Processing

This article describes common methods and best practices for container log processing, using Docker as an example.

By Bruce Wu

Background

Docker, Inc. (formerly dotCloud, Inc.) released Docker as an open source project in 2013. Container products, represented by Docker, quickly became popular around the world thanks to features such as good isolation, high portability, low resource consumption, and fast startup. The following figure shows the search trends for Docker and OpenStack starting from 2013.

[Figure 1: Search trends for Docker and OpenStack since 2013]

Container technology makes application deployment and delivery more convenient, but it also creates many challenges for log processing, such as:

  1. If you store logs inside containers, they are gone once the containers are removed. Containers are created and removed frequently, so their lifecycle is much shorter than that of virtual machines, and you need a way to store logs persistently.
  2. In the container era, there are far more objects to manage than with virtual machines and physical machines. Troubleshooting by logging on to each target container complicates the problem and increases the cost.
  3. Container technology makes it easier to implement microservices, but decoupling the system also introduces more components. You need a technology that helps you comprehensively understand the running status of the system, quickly locate problems, and accurately restore the context.

Log Processing

The following sections describe common methods and best practices for container log processing, taking Docker as an example. Distilled by the Alibaba Cloud Log Service team from many years of work in the log processing field, they cover:

  1. Real-time collection of container logs
  2. Query analysis and visualization
  3. Context analysis of logs
  4. LiveTail - tail-f on the cloud

Real-time Collection of Container Logs

Container Log Types

To collect logs, you must first figure out where the logs are stored. This article shows you how to collect NGINX and Tomcat container logs.

NGINX generates two log files: access.log and error.log. The NGINX Dockerfile redirects access.log to STDOUT and error.log to STDERR.
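In the official NGINX image, this redirection is done with symbolic links created in the Dockerfile. A minimal sketch of the relevant lines (paths follow the official image layout):

# Link the NGINX log files to the container's standard streams
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log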

Tomcat generates multiple log files, including catalina.log, access.log, manager.log, and host-manager.log. The Tomcat Dockerfile does not redirect these logs to the standard output; instead, they are stored inside the container.

Most container logs are similar to those of NGINX and Tomcat, so we can categorize container logs into two types:

| Container log type | Description |
| ------------------ | ----------- |
| Standard output | Information written through STDOUT and STDERR, including text files redirected to the standard output. |
| Text log | Log files that are stored inside the container and are not redirected to the standard output. |

Standard Output

Logging Drivers

The standard output of a container is processed uniformly by the configured logging driver. As shown in the following figure, different logging drivers write the standard output to different destinations.

[Figure 2: Different logging drivers write the standard output to different destinations]

The advantage of collecting container logs from the standard output is ease of use. For example:

# Configure a syslog logging driver for all containers at the docker daemon level.
dockerd --log-driver syslog --log-opt syslog-address=udp://1.2.3.4:1111

# Configure a syslog logging driver for the current container only.
docker run --log-driver syslog --log-opt syslog-address=udp://1.2.3.4:1111 alpine echo hello world

Drawbacks

Using a logging driver other than json-file or journald makes the docker logs API unavailable. For example, if you use Portainer to manage containers on a host while collecting container logs with any other logging driver, you cannot view the standard output of those containers through the Portainer user interface.
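Before relying on the docker logs API or a UI such as Portainer, you can check which logging driver is in effect. A quick sketch using standard docker commands:

# Show the daemon-level logging driver (the default for new containers)
docker info --format '{{.LoggingDriver}}'

# Show the logging driver of a specific container
docker inspect --format '{{.HostConfig.LogConfig.Type}}' <container-id>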

Docker Logs API

For containers that use the default logging driver, you can obtain the standard output of container logs by sending a docker logs request to the docker daemon. Log collection tools that use this method include logspout and sematext-agent-docker. You can use the following command to obtain the latest five log entries written since 2018-01-01T15:00:00.

docker logs --since "2018-01-01T15:00:00" --tail 5 <container-id>

Drawbacks

Applying this method when the log volume is large puts significant strain on the docker daemon, which may then fail to respond promptly to commands for creating and removing containers.

json-file Files

By default, the json-file logging driver writes container logs in JSON format to the host file /var/lib/docker/containers/<container-id>/<container-id>-json.log. This allows you to obtain the standard output of containers by collecting that host file directly.

We recommend collecting container logs from the json-file files. This approach does not make the docker logs API unavailable and does not affect the docker daemon. In addition, many tools natively support collecting host files; Filebeat and Logtail are representative examples.
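To verify where a container's json-file log lives before pointing a collector at it, a minimal sketch:

# Print the json-file path of a container, then follow it on the host
docker inspect --format '{{.LogPath}}' <container-id>
sudo tail -f "$(docker inspect --format '{{.LogPath}}' <container-id>)"

Each line in this file is a JSON object of the form {"log":"...","stream":"stdout","time":"..."}, so collection tools can parse the stream and timestamp along with the log content.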

Text Log

Mount the Host File Directory

The simplest way to collect text log files from containers is to mount a host directory onto the container's log directory. You can do this with the bind mounts or volumes method when you start the container, as shown in the following figure.

[Figure 3: Mounting a host directory onto a container log directory with bind mounts or volumes]

For the access logs of a Tomcat container, the command docker run -it -v /tmp/app/vol1:/usr/local/tomcat/logs tomcat mounts the host directory /tmp/app/vol1 onto the container's access log directory /usr/local/tomcat/logs. You can then collect the Tomcat access logs by collecting the files under /tmp/app/vol1.
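To see this end to end, the following sketch starts the container (the name and port mapping are arbitrary), triggers a request, and reads the access log from the host:

# Start Tomcat with its log directory mounted to the host
docker run -d --name tomcat-demo -p 8080:8080 \
    -v /tmp/app/vol1:/usr/local/tomcat/logs tomcat

# Generate an access log entry, then read it from the host directory
curl -s http://localhost:8080/ > /dev/null
tail /tmp/app/vol1/localhost_access_log.*.txt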

Calculate the Mount Point of the Container Rootfs

Collecting container logs by mounting a host directory is slightly intrusive to the application, because it requires you to specify the mount when starting the container. It would be ideal if log collection were completely transparent to users, and this can in fact be achieved by calculating the mount point of the container rootfs.

The storage driver is a concept closely related to the mount point of the container rootfs. In practice, you choose the best-fit storage driver based on factors such as the Linux version, the file system type, and container I/O patterns. Given the type of the storage driver, we can calculate the mount point of the container rootfs and then collect the logs inside the container. The following table lists the rootfs mount points of some storage drivers and how to calculate them.

[Figure 4 (table): Rootfs mount points of common storage drivers and their calculation methods]
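For example, with the overlay2 storage driver, docker inspect exposes the calculated mount point directly. A minimal sketch (the MergedDir field is specific to overlay2):

# Identify the storage driver, then read the rootfs mount point
docker inspect --format '{{.GraphDriver.Name}}' <container-id>
docker inspect --format '{{.GraphDriver.Data.MergedDir}}' <container-id>

# Text logs inside the container are then reachable from the host, for example:
sudo ls "$(docker inspect --format '{{.GraphDriver.Data.MergedDir}}' <container-id>)/usr/local/tomcat/logs"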

Logtail Solution

After comprehensively comparing the various container log collection methods and synthesizing user feedback and feature requests, the Log Service team developed an all-in-one solution for processing container logs.

[Figure 5: Architecture of the Logtail solution]

Features

The Logtail solution has the following features:

  1. Supports collecting host files as well as container logs on the host (both the standard output and log files).
  2. Supports automatic discovery of containers. That is, once you have configured which logs to collect, whenever a newly created container meets the conditions, its target logs are collected automatically.
  3. Supports specifying containers by Docker labels and by filtering environment variables, with both whitelist and blacklist mechanisms, as shown in the sketch after this list.
  4. Supports automatic data tagging, that is, adding source identification information to the collected logs, such as the container name, container IP address, and file path.
  5. Supports collecting K8s container logs.
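As a rough sketch of feature 3, a Logtail collection configuration based on its docker stdout plugin can whitelist containers by label and blacklist them by environment variable. Treat the exact field names here as assumptions to be verified against the official documentation:

{
    "inputs": [
        {
            "type": "service_docker_stdout",
            "detail": {
                "Stdout": true,
                "Stderr": true,
                "IncludeLabel": {"app": "nginx"},
                "ExcludeEnv": {"COLLECT_STDOUT": "false"}
            }
        }
    ]
}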

Core Competitiveness

  1. Ensures at-least-once semantics by using a checkpoint mechanism and deploying an additional monitoring process.
  2. Logtail has withstood multiple Double 11 and Double 12 shopping festivals and has been deployed on over one million clients within Alibaba Group, so its stability and performance are well proven.

Collection of K8s Container Logs

Logtail is deeply integrated with the K8s ecosystem and can conveniently collect K8s container logs. This is another key feature of Logtail.

Collection configuration management:

  1. Supports collection configuration management through the Web console.
  2. Supports collection configuration management through CustomResourceDefinitions (CRDs). This method integrates easily with K8s deployment and publishing procedures, as shown in the sketch below.
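A sketch of the CRD method, modeled on the AliyunLogConfig resource used by the Log Service K8s integration (field names should be verified against the official documentation):

# Declaratively create a collection configuration with kubectl
kubectl apply -f - <<'EOF'
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: tomcat-access
spec:
  logstore: tomcat-access
  logtailConfig:
    inputType: file
    configName: tomcat-access
    inputDetail:
      logType: common_reg_log
      logPath: /usr/local/tomcat/logs
      filePattern: "*.log"
      dockerFile: true
EOF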

Collection modes:

  1. Supports collecting K8s container logs in DaemonSet mode, where each node runs one Logtail collection client. This mode is suitable for single-purpose clusters.
  2. Supports collecting K8s container logs in Sidecar mode, where each pod runs a dedicated Logtail collection client alongside the application containers. This mode is suitable for large, hybrid, and PaaS clusters.

For more information about the Logtail solution, see https://www.alibabacloud.com/help/doc-detail/44259.htm

Query Analysis and Visualization

After collecting logs, you will want to query, analyze, and visualize them. This section takes Tomcat container logs as an example to describe the powerful query, analysis, and visualization features provided by Log Service.

Saved Search

When container logs are collected, log identification information such as container name, container IP, and target file directory is attached to these logs. This allows you to quickly locate the target container and files based on this information when you run queries. For more information about the query feature, see Query syntax.
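For instance, a saved search can combine these identifiers with business fields. A sketch in the Log Service query syntax, where container_name and status are assumed field names that depend on your collection configuration:

container_name: tomcat-demo and status: 404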

Real-Time Analysis

The Real-time analysis feature of Log Service is compatible with the SQL syntax and offers more than 200 aggregate functions. If you know how to write SQL statements, you will be able to easily write analytic statements that meet your business needs. For example:

  1. You can use the following statement to query the top 10 request URIs with the highest access counts.
* | SELECT request_uri, COUNT(*) AS c GROUP BY request_uri ORDER BY c DESC LIMIT 10

  2. You can use the following statement to calculate how the network traffic in the current query period differs from that of one hour earlier.

* | SELECT diff[1] AS c1, diff[2] AS c2, round(diff[1] * 100.0 / diff[2] - 100.0, 2) AS c3 FROM (SELECT compare(flow, 3600) AS diff FROM (SELECT sum(body_bytes_sent) AS flow FROM log))

This statement uses the period-over-period comparison function compare to calculate the traffic volume of the two periods and their percentage difference.

Visualization

To visualize data, you can use the built-in charts of Log Service to display SQL computation results visually and combine multiple charts into a dashboard.

[Figure 6: Built-in charts of Log Service]

The following figure shows a dashboard for Tomcat access logs. It presents information such as the ratio of error requests, the data volume, and the distribution of response codes over time, aggregated across multiple Tomcat containers. You can also use the dashboard filter feature to specify a container name and view the data of an individual container.

[Figure 7: Dashboard for Tomcat access logs]

Context Analysis of Logs

Features such as query analysis and dashboards help us see the global picture and understand the overall operation status of the system. However, to locate specific problems, we usually need the context information of the logs.

Definition of Context

Context refers to the clues surrounding a problem, such as the information immediately before and after an error in a log. The context involves two elements:

  • Minimum differentiation granularity: the smallest unit used to differentiate context, such as the same thread or the same file. This granularity is critical for locating problems because it lets us stay focused during an investigation.
  • Order preservation: within the same minimum differentiation granularity, information must be presented in strict order, even if tens of thousands of operations are performed every second.

The following table shows the minimum differentiation granularity of different data sources.

[Figure 8 (table): Minimum differentiation granularity of different data sources]

Challenges of Context Query

When logs are stored centrally, neither the log collection client nor the server can preserve the original order of the logs:

  1. At the client level, multiple containers run on the same host, and each container has multiple log files to collect. The log collection software must use multiple CPU cores to parse and preprocess logs, and must use multithreaded concurrency or single-threaded asynchronous callbacks to work around slow network I/O when sending data. As a result, log data cannot arrive at the server in the order in which the events were generated on the machine.
  2. On the server, because of the horizontally scaled, load-balanced multi-node architecture, logs from the same client machine are distributed across multiple storage nodes, making it difficult to restore their original order.

Mechanism

Log Service solves these challenges by adding some additional information to each log record and by using the keyword query capability of the server. The following figure shows how.

[Figure 9: How Log Service preserves log context]

  1. When a log record is collected, Log Service automatically adds logging source information (the minimum granularity mentioned previously) to it as source_id. For containers, such information could be the container name and file path.
  2. Log collection clients of Log Service usually upload multiple logs at one time as a log package. The client writes a monotonically increasing package_id to each of these log packages, and each log record of a log package has a package-based offset.
  3. The server combines the source_id, package_id, and offset into a field, and creates an index for this field. This allows us to accurately locate a log entry based on the source_id, package_id, and offset, even when different types of logs are mixed and stored together on the server.

LiveTail - tail-f on the Cloud

Apart from viewing the log context, sometimes we also want to continuously monitor the container output.

Traditional Method

The following table shows how to monitor container logs in real time by using the traditional method.

[Figure 10 (table): Traditional ways to monitor container logs in real time]
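In practice, the traditional method boils down to commands like the following (a sketch; identifiers are placeholders):

# Find the container, then follow its standard output
docker ps | grep tomcat
docker logs -f <container-id>

# For text logs inside the container, exec in and tail the file
docker exec -it <container-id> tail -f /usr/local/tomcat/logs/catalina.out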

Pain Points

Using traditional methods to monitor container logs has the following pain points:

  1. Locating the target container among a large number of containers can be time consuming.
  2. Different types of container logs require different observation methods, which increases the cost.
  3. Key information is not displayed simply and intuitively in the query results.

New Feature and Mechanism

To address these problems, Log Service offers the LiveTail feature. Compared with the traditional method, LiveTail has the following advantages:

  1. It allows you to quickly locate the target container based on a single log entry or by using the query analysis feature of Log Service.
  2. It allows you to use a unified method to view different types of container logs without diving into the target container.
  3. It supports keyword-based filtering.
  4. It supports setting key columns.

[Figure 11: The LiveTail feature]

LiveTail quickly locates the target container and files by using the context query mechanism described in the previous section. The client then periodically sends requests to the server to pull the latest data.
