Comprehensive Analysis of Kubernetes Log Collection Principles

By David Zhang (Yuanyi)

Introduction

As the de facto standard for container orchestration, Kubernetes is used in an increasingly wide range of scenarios. Logs are an important part of observability: they record detailed access requests and error information, which helps locate problems. Applications running on Kubernetes, Kubernetes components, and hosts all generate many kinds of log data, and Log Service (SLS) supports collecting and analyzing all of it. This article describes the basic principles of how SLS collects Kubernetes logs.

Kubernetes Log Collection Modes

Kubernetes log collection uses one of two modes: DaemonSet or Sidecar. Generally, DaemonSet is used in small and medium-sized clusters. Sidecar is used in ultra-large clusters that serve multiple business parties, where each party has clearly defined custom log collection requirements and the total number of collection configurations exceeds 500.

In DaemonSet mode, only one log agent runs on each node and collects all logs on that node. This mode consumes fewer resources, but its scalability and tenant isolation are limited, so it is suitable for clusters with few functions or little business.
In Sidecar mode, a log agent is deployed in each pod and collects the logs of only one business application. This mode consumes more resources but offers strong flexibility and multi-tenant isolation, so it is suitable for large Kubernetes clusters or clusters that serve multiple business parties as a PaaS platform.

SLS Log Collection Principle

A log collection process generally consists of three stages: deploying the agent, configuring the agent, and having the agent work according to its configuration. SLS log collection follows the same process but provides more deployment, configuration, and collection methods than open-source collection software. In Kubernetes, there are two main deployment methods: DaemonSet and Sidecar. The configuration methods include CRD, environment variables, the console, and the API. The collection methods cover container files, container stdout, and standard files. The entire process is as follows:

Deploy Logtail: Both the DaemonSet and Sidecar deployment methods are supported. (The first part of this article describes the collection principles of these two methods.)
Create collection configurations: The CRD, environment variable, console, and API configuration methods are supported. (The second part describes the advantages, disadvantages, and applicable scenarios of these methods.)
Logtail collects data based on the configurations: Logtail obtains the collection configurations created in the second step and works according to them.

Note: Both DaemonSet and Sidecar modes use machine groups with user-defined identifiers by default, which suits auto-scaling nodes and container scenarios. A collection configuration takes effect only after it is applied to a machine group.
Note: All Logtail collection configurations are obtained from the server. After Logtail connects to the server, it synchronizes the collection configurations associated with its machine group to the local host and then starts working.
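The following Python sketch illustrates this pull model with hypothetical classes (`MachineGroup`, `ToyLogServer`). It is not the SLS API or Logtail's code. It assumes that an agent reports a user-defined identifier and receives only the configurations applied to machine groups whose identifiers include that value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MachineGroup:
    """Hypothetical model of a machine group that uses user-defined identifiers."""
    name: str
    custom_identifiers: Set[str]
    applied_configs: List[str] = field(default_factory=list)  # configs applied to this group


@dataclass
class ToyLogServer:
    """Hypothetical stand-in for the SLS server side (not the real API)."""
    configs: Dict[str, dict] = field(default_factory=dict)
    groups: List[MachineGroup] = field(default_factory=list)

    def configs_for(self, agent_identifier: str) -> Dict[str, dict]:
        """Return the configs an agent reporting this identifier would synchronize."""
        result: Dict[str, dict] = {}
        for group in self.groups:
            if agent_identifier in group.custom_identifiers:
                for name in group.applied_configs:
                    result[name] = self.configs[name]
        return result
```

Delivery depends only on the identifier. As a result, a node added by auto-scaling that starts Logtail with the same identifier receives the same configurations without any per-node setup. A configuration that is not applied to any group is never delivered.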

DaemonSet Log Collection Principle

DaemonSet, Deployment, and StatefulSet are all higher-level controllers for orchestrating pods in Kubernetes. Deployment and StatefulSet define a replica count, and Kubernetes schedules pods based on that count. DaemonSet has no replica count. Instead, it runs one pod on each node by default. It is generally used for O&M tasks such as log collection, monitoring, and disk cleanup. Therefore, DaemonSet is the default recommended deployment mode for Logtail.

In DaemonSet mode, Logtail is installed in the kube-system namespace by default. The DaemonSet is named logtail-ds, and the Logtail pod on each node collects data (both stdout and files) from all pods running on that node.

You can run the following command to check the status of the logtail-ds pods: kubectl get pod -n kube-system | grep logtail-ds

Prerequisites

Logtail can collect data from other pods or containers only if it can access both the container runtime on the host and the data of other containers:

Access the container runtime: The Logtail container mounts the socket of the host's container runtime (Docker Engine/containerd) into a container directory. This lets Logtail access the Docker Engine/containerd on its node.
Access the data of other containers: The Logtail container mounts the host's root directory (/) to /logtail_host in the container, so Logtail can read other containers' data through /logtail_host. This works only if one of two conditions holds: the container runtime stores container file systems on the host as an ordinary file system, such as overlayfs, or the container's log directory is mounted to the host through a hostPath or emptyDir volume.
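The following Python sketch shows how a file path inside an application container could be translated into a path that Logtail can open under /logtail_host. It assumes the container metadata has the shape of a subset of `docker inspect` output (`Mounts[].Source`/`Destination` and `GraphDriver.Name`/`Data.UpperDir`). `resolve_log_path` is a hypothetical helper, not Logtail's code. The sketch prefers a volume mount (hostPath or emptyDir) because files written to a volume do not appear in the overlay layer. Otherwise, it uses the overlay upper directory, which holds the files the container wrote. For other storage drivers, such as devicemapper, it derives no host path.

```python
import posixpath
from typing import Optional

HOST_ROOT_IN_LOGTAIL = "/logtail_host"  # host "/" as mounted into the Logtail container


def _is_within(path: str, directory: str) -> bool:
    """True if `path` equals `directory` or lies beneath it."""
    directory = directory.rstrip("/")
    return directory == "" or path == directory or path.startswith(directory + "/")


def resolve_log_path(inspect: dict, container_path: str) -> Optional[str]:
    """Translate a path inside an application container into a path Logtail can read.

    `inspect` is assumed to look like a subset of `docker inspect` output.
    Returns None when no host-visible location can be derived.
    """
    # 1) Prefer the most specific volume mount (hostPath/emptyDir) that covers the path.
    best = None
    for mount in inspect.get("Mounts", []):
        dest = mount["Destination"]
        if _is_within(container_path, dest):
            if best is None or len(dest) > len(best["Destination"]):
                best = mount
    if best is not None:
        relative = posixpath.relpath(container_path, best["Destination"])
        host_path = posixpath.normpath(posixpath.join(best["Source"], relative))
    else:
        # 2) Otherwise use the overlay upper layer, which holds files the container wrote.
        driver = inspect.get("GraphDriver", {})
        if not driver.get("Name", "").startswith("overlay"):
            return None  # e.g. devicemapper: the log directory must be mounted as a volume
        upper = driver["Data"]["UpperDir"]
        host_path = posixpath.join(upper, container_path.lstrip("/"))
    # 3) The host root is visible inside the Logtail container under /logtail_host.
    return posixpath.join(HOST_ROOT_IN_LOGTAIL, host_path.lstrip("/"))
```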

Working Process

Once these two prerequisites are met, Logtail loads its collection configurations from the SLS server and starts working. For container log collection, the work Logtail does after startup falls into two parts:

1. Discover the containers whose logs are to be collected. This mainly involves the following steps:

1) Obtain all containers and their configuration information from the container runtime (Docker Engine/containerd). This information includes the container name, ID, mount points, environment variables, and labels.

2) Select the containers whose logs are to be collected based on the IncludeEnv, ExcludeEnv, IncludeLabel, and ExcludeLabel settings in the collection configuration. Pinpointing the collection targets this way avoids two problems that come from collecting the logs of every container: wasted resources and data that is difficult to split. As shown in the following figure, configuration 1 collects only containers whose ENV value is Pre. Configuration 2 collects only containers whose APP value is APP1. Configuration 3 collects only containers whose ENV value is not Pre. The three configurations send different data to different Logstores.
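The selection logic can be sketched in Python as follows. The classes and `is_selected` are hypothetical, and the matching rules are simplifying assumptions: values must match exactly, pairs within each include or exclude map are combined with OR, and regular-expression values are not supported. Logtail's actual rules are defined in the SLS documentation. The three example configurations are encoded at the end, treating APP as a label and ENV as an environment variable.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContainerInfo:
    """Metadata obtained from the container runtime (simplified)."""
    name: str
    env: Dict[str, str] = field(default_factory=dict)
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class ContainerFilter:
    """The four filter fields of a collection configuration (simplified)."""
    include_env: Dict[str, str] = field(default_factory=dict)
    exclude_env: Dict[str, str] = field(default_factory=dict)
    include_label: Dict[str, str] = field(default_factory=dict)
    exclude_label: Dict[str, str] = field(default_factory=dict)


def _matches_any(attrs: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True if at least one key/value pair in `wanted` appears in `attrs`."""
    return any(attrs.get(key) == value for key, value in wanted.items())


def is_selected(f: ContainerFilter, c: ContainerInfo) -> bool:
    # A non-empty include map must match; an empty one places no restriction.
    if f.include_env and not _matches_any(c.env, f.include_env):
        return False
    if f.include_label and not _matches_any(c.labels, f.include_label):
        return False
    # Any exclude match removes the container.
    return not (_matches_any(c.env, f.exclude_env) or _matches_any(c.labels, f.exclude_label))


# The three example configurations (APP treated as a label, ENV as an environment variable):
config1 = ContainerFilter(include_env={"ENV": "Pre"})
config2 = ContainerFilter(include_label={"APP": "APP1"})
config3 = ContainerFilter(exclude_env={"ENV": "Pre"})
```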

2. Collect the data of the containers. This involves the following steps:

1) Determine where the data to collect is located, including the locations of stdout and of files. This information is part of the container's configuration. For example, the following figure shows the LogPath of stdout and the storage path of the container files. Note the following points:

Stdout: A container's stdout can be collected only after it is saved to a file, so the runtime must write stdout to files (for Docker Engine, the json-file or local log driver). Stdout is generally saved to a file by default, so no action is needed in most cases.
Container files: When the container uses an overlay file system, all files in the container can be found through its UpperDir. However, if the storage driver is devicemapper (as in some containerd configurations), you must mount the log directory to a hostPath or emptyDir volume so that the corresponding path can be found.

2) Collect data from the corresponding location. Stdout is an exception: the stdout file must first be parsed to extract the user's actual output.

3) Parse the raw logs according to the configured parsing rules. These rules include the following:
- A first-line regular expression, which marks where each multi-line entry begins.
- Field extraction, for example regular expression, delimiter, JSON, and anchor modes.
- Filtering/discarding.
- Data masking.

4) Upload the data to SLS.
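Steps 2) and 3) can be sketched in Python as follows. `unwrap_stdout_line` handles the Docker json-file format and the CRI text format that containerd writes. It ignores partial-line markers (CRI `P` entries and Docker chunks without a trailing newline), which a real collector must reassemble. The multi-line splitting, regex extraction, filtering, and masking functions are hypothetical illustrations of the rule types listed above, not Logtail's implementation.

```python
import json
import re
from typing import Callable, Dict, Iterable, List

Fields = Dict[str, str]


def unwrap_stdout_line(line: str) -> str:
    """Step 2): recover the user's output from one line of a runtime stdout file."""
    line = line.rstrip("\n")
    if line.startswith("{"):
        # Docker json-file driver: {"log": "...\n", "stream": "stdout", "time": "..."}
        return json.loads(line)["log"].rstrip("\n")
    # CRI text format written by containerd: "<time> <stdout|stderr> <P|F> <content>"
    return line.split(" ", 3)[3]


def split_multiline(lines: Iterable[str], first_line_regex: str) -> List[str]:
    """Step 3): group lines into entries; each entry starts at a line matching the regex."""
    start = re.compile(first_line_regex)
    entries: List[str] = []
    for line in lines:
        if start.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line  # continuation line, e.g. a stack trace
    return entries


def parse_entries(entries: Iterable[str], field_regex: str,
                  keep: Callable[[Fields], bool],
                  mask: Callable[[Fields], Fields]) -> List[Fields]:
    """Step 3): regex field extraction, then filtering/discarding, then masking."""
    pattern = re.compile(field_regex, re.DOTALL)
    records: List[Fields] = []
    for entry in entries:
        match = pattern.match(entry)
        if match is None or not keep(match.groupdict()):
            continue  # unparsable or filtered out; this sketch simply drops both
        records.append(mask(match.groupdict()))
    return records
```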

Sidecar Log Collection Principle

Multiple containers can run in one Kubernetes pod and share namespaces such as the network namespace. The core working container is called the main container, and the other containers are sidecar containers. A sidecar container plays an auxiliary role, implementing functions such as file synchronization, log collection/monitoring, and file cleanup through shared volumes. Logtail's Sidecar log collection follows the same principle. In Sidecar mode, a Logtail sidecar container runs alongside the main business container, and the two share the log volume. The collection process is as follows:

The business container writes logs to the shared volume (only file logs; stdout cannot be stored in the shared volume).
Logtail monitors the log files on the shared volume and, when it detects changes, collects the new log entries to SLS.
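A minimal polling tailer in Python illustrates the idea of collecting only new content from a file on the shared volume. `FileTailer` is hypothetical and does not reflect how Logtail is implemented. It remembers a byte offset, returns only complete lines, and holds back a trailing partial line. If the file shrinks, it restarts from the beginning, which is a crude stand-in for rotation handling.

```python
import os
from typing import List


class FileTailer:
    """Hypothetical polling tailer: remembers a byte offset and returns complete new lines."""

    def __init__(self, path: str):
        self.path = path
        self.offset = 0
        self.partial = b""  # bytes after the last newline, kept until the line completes

    def poll(self) -> List[str]:
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return []  # the business container has not created the file yet
        if size < self.offset:
            # File truncated or replaced: start over from the beginning.
            self.offset, self.partial = 0, b""
        if size == self.offset:
            return []  # no change since the last poll
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        *complete, self.partial = (self.partial + data).split(b"\n")
        return [line.decode("utf-8", errors="replace") for line in complete]
```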

For best practices on Sidecar mode, see "Use the Log Service console to collect container text logs in Sidecar mode" and "Kubernetes File Collection Practices: Sidecar + hostPath Volumes".

Collection Configuration Principle

Logtail, the SLS log collection agent, supports configuration through CRDs, environment variables, the console, and the API. Our recommendations for different scenarios are as follows:

Users with high requirements for automated deployment and O&M through CI/CD should adopt the CRD configuration method.
In scenarios where releases are infrequent and log collection policies rarely change, we recommend the console configuration method.
Users with strong development capabilities can use the API for custom configuration.
We do not recommend the environment variable method because its functionality is limited.

The advantages and disadvantages of each configuration method are listed in the following table:

	CRD Configuration	Environment Variable Configuration	Console Configuration	API Configuration
Applicable Scenario	CI/CD-automated log collection configuration; complex log collection configuration and management	Low requirements for customized log collection; simple applications with few log types and low complexity (not recommended)	Manual management of log collection configurations	Advanced customization requirements; users with strong development capabilities
Configuration Difficulty	Moderate. You only need to understand the SLS configuration format.	Low. You only need to configure environment variables.	Low. The console guides you through a simple configuration process.	High. You need to use the SLS SDK and understand the Logtail configuration format.
Collection Configuration Customization	High. Supports all SLS configuration parameters.	Low. Only the file path can be configured; parsing configurations are not supported.	High. Supports all SLS configuration parameters.	High. Supports all SLS configuration parameters.
O&M	Relatively high. Configurations are managed as Kubernetes CRD resources.	Low. Configurations can only be created; they cannot be modified or deleted.	Moderate. Manual management is required.	High. You can build a management mode tailored to your scenario on top of the SLS API.
Ability to Integrate with CI/CD	High. A CRD is essentially a Kubernetes interface, so it works with all Kubernetes CI/CD automation processes.	High. Environment variables are configured on pods, so integration is seamless.	Low. Manual processing is required.	High. Suitable for self-developed CI/CD systems.
Documentation	Use CRDs to collect container logs in DaemonSet mode; Use CRDs to collect container text logs in Sidecar mode	Collect log data from containers by using Log Service	Use the Log Service console to collect container stdout and stderr in DaemonSet mode; Use the Log Service console to collect container text logs in Sidecar mode	API operations relevant to Logtail configuration files, Logtail configurations, and Overview of Log Service SDKs

CRD (Operator) Configuration Method

Log Service adds a CustomResourceDefinition (CRD) extension named AliyunLogConfig to Kubernetes. It also provides an alibaba-log-controller that monitors AliyunLogConfig events and automatically creates Logtail collection configurations. When users create, update, or delete AliyunLogConfig resources, the alibaba-log-controller detects the changes and creates, updates, or deletes the corresponding collection configurations in Log Service. In this way, each AliyunLogConfig resource in Kubernetes stays associated with a collection configuration in Log Service.

CRD AliyunLogConfig Implementation

The preceding figure shows this design: the AliyunLogConfig CustomResourceDefinition extension that Log Service adds to Kubernetes, and the alibaba-log-controller that monitors AliyunLogConfig events.

Internal Implementation of alibaba-log-controller

The alibaba-log-controller consists of six modules. The functions and dependencies of each module are shown in the preceding figure:

EventListener: Monitors AliyunLogConfig CRD resources. It is a listener in the broad sense, and its main functions are:
- List all AliyunLogConfig resources during initialization
- Register a watch for AliyunLogConfig change events
- Periodically scan all AliyunLogConfig resources so that no events are missed or left invalid
- Package each event and pass it to the EventHandler
EventHandler: Handles Create, Update, and Delete events and is the core module of the alibaba-log-controller. Its main functions are:
- First, check the corresponding checkpoint in the ConfigMapManager. If the event has already been processed (the version number is the same and the status code is 200), skip it.
- Pull the latest resource state from the server and compare version numbers so that historical events cannot interfere with the result. If the versions differ, use the server version instead.
- Preprocess the event to meet the basic format requirements of the LogSDK
- Invoke the LogSDKWrapper to create the Logstore in Log Service and to create, update, or delete the corresponding configurations
- Update the status of the corresponding AliyunLogConfig resource based on the processing result
ConfigMapManager: Uses the Kubernetes ConfigMap mechanism to manage the controller's checkpoints. Its functions include:
- Maintaining the mapping between checkpoints and ConfigMaps
- Providing basic checkpoint operations: add, delete, modify, and query
LogSDKWrapper: A secondary wrapper around the Alibaba Cloud LOG Go SDK. Its main functions are:
- Initialize and create Log Service resources, including the Project, MachineGroup, and operation Logstore
- Convert CRD resources to the corresponding Log Service resources (a one-to-many relationship)
- Wrap the SDK interfaces to handle network, server, and permission exceptions automatically
- Manage permissions, including automatically obtaining roles and refreshing STS tokens
ScheduledSyner: A background module that synchronizes periodically so that configuration changes and events are not lost during process or node failures. This ensures the eventual consistency of configuration management:
- Periodically refresh all checkpoints and AliyunLogConfig resources
- Check the mapping between checkpoints and AliyunLogConfig resources. If a checkpoint refers to a configuration whose AliyunLogConfig resource no longer exists, delete the corresponding resources.
Monitor: In addition to writing its local runtime logs to stdout, the alibaba-log-controller sends its logs directly to Log Service for remote troubleshooting. The collected log types are:
- Kubernetes API exception logs
- alibaba-log-controller runtime logs
- alibaba-log-controller internal exception data (automatically aggregated)
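The interplay of the EventHandler, checkpoints, and periodic resync can be sketched as a toy Python model. Everything here is hypothetical. Dictionaries stand in for the Kubernetes API server, the ConfigMap-backed checkpoints, and Log Service. `ToyLogController` only mirrors the listed steps: check the checkpoint, prefer the server version, apply the configuration, and record a checkpoint. The cleanup in `resync` deletes configurations whose checkpoint has no matching resource, which is our reading of the ScheduledSyner description.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LogConfigResource:
    """Hypothetical stand-in for an AliyunLogConfig object."""
    name: str
    resource_version: str
    spec: Dict[str, str] = field(default_factory=dict)


class ToyLogController:
    OK = 200

    def __init__(self, api_server: Dict[str, LogConfigResource]):
        self.api_server = api_server  # latest objects, standing in for the Kubernetes API server
        self.checkpoints: Dict[str, Tuple[str, int]] = {}  # name -> (version, status): ConfigMapManager role
        self.sls_configs: Dict[str, Dict[str, str]] = {}  # stands in for collection configs in Log Service

    def handle(self, event: str, obj: LogConfigResource) -> str:
        """EventHandler role: process one ADD, UPDATE, or DELETE event."""
        if event == "DELETE":
            self.sls_configs.pop(obj.name, None)
            self.checkpoints.pop(obj.name, None)
            return "deleted"
        # 1) Skip events that were already processed successfully for this version.
        if self.checkpoints.get(obj.name) == (obj.resource_version, self.OK):
            return "skipped"
        # 2) Prefer the latest server state so that stale events cannot overwrite newer ones.
        latest = self.api_server.get(obj.name)
        if latest is None:
            return "gone"  # deleted in the meantime; a DELETE event or resync cleans up
        if latest.resource_version != obj.resource_version:
            obj = latest
            if self.checkpoints.get(obj.name) == (obj.resource_version, self.OK):
                return "skipped"
        # 3) Create or update the collection config, then record the checkpoint.
        self.sls_configs[obj.name] = dict(obj.spec)
        self.checkpoints[obj.name] = (obj.resource_version, self.OK)
        return "applied"

    def resync(self) -> None:
        """ScheduledSyner role: periodic full sync for eventual consistency."""
        for obj in list(self.api_server.values()):
            self.handle("UPDATE", obj)  # re-applies anything whose checkpoint is missing or stale
        for name in list(self.checkpoints):
            if name not in self.api_server:  # checkpoint without a resource: a DELETE was missed
                self.handle("DELETE", LogConfigResource(name, ""))
```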

Environment Variable Configuration Method

Environment variable configuration is relatively simple. When configuring a pod, users only need to add environment variables that start with the special prefix aliyun_logs to define the configuration and enable data collection. Logtail implements this method as follows:

Logtail obtains the list of all containers from the container runtime (Docker Engine/containerd).
For each running container, Logtail checks whether it has environment variables that start with aliyun_logs.
Logtail maps each such environment variable to an SLS Logtail collection configuration and calls the SLS API to create that configuration.
Logtail then obtains the collection configurations from the server and starts working.
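The mapping step can be sketched in Python as follows. `configs_from_env` is hypothetical. Several details are assumptions based on the naming convention in SLS examples, not a specification:
- The `aliyun_logs_{key}` form, where the value `stdout` means standard output and any other value is a file path.
- The optional `aliyun_logs_{key}_logstore` variable.
- The rule that `{key}` itself contains no underscore.
- The output dictionary shape.

As the comparison table notes, this method only creates configurations. It does not modify or delete them.

```python
from typing import Dict, List

PREFIX = "aliyun_logs_"


def configs_from_env(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Map one container's aliyun_logs_* variables to hypothetical config dicts."""
    primary: Dict[str, str] = {}             # key -> "stdout" or a file path
    options: Dict[str, Dict[str, str]] = {}  # key -> {option: value}
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable, e.g. PATH
        key, sep, option = name[len(PREFIX):].partition("_")
        if sep:
            options.setdefault(key, {})[option] = value  # e.g. aliyun_logs_{key}_logstore (assumed)
        else:
            primary[key] = value
    configs = []
    for key, value in sorted(primary.items()):
        # By assumption, the key names the configuration and, by default, the Logstore.
        config = {"config_name": key, "logstore": options.get(key, {}).get("logstore", key)}
        if value == "stdout":
            config["input"] = "stdout"
        else:
            config.update(input="file", path=value)
        configs.append(config)
    return configs
```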

Recommended Methods

Kubernetes log collection can be implemented in various ways, with different levels of complexity and effectiveness. Generally, you need to choose both a collection method and a configuration method. We recommend the following:

Collection Methods
- DaemonSet: Suitable for clusters with few functions or little business, where the total number of collection configurations in the cluster does not exceed 500. Beyond that, Logtail consumes a large amount of resources.
- Sidecar: Recommended for large Kubernetes clusters or clusters that serve multiple business parties as a PaaS platform. A typical criterion is that the cluster has more than 500 collection configurations in total.
- Mixed method: Use DaemonSet mode for container stdout, system logs, and some business logs, and Sidecar mode for pods that require highly reliable log collection.
Configuration Methods
- Users with high requirements for automated deployment and O&M through CI/CD should adopt the CRD configuration method.
- In scenarios where releases are infrequent and log collection policies rarely change, we recommend the console configuration method.
- Users with strong development capabilities can use the API for custom configuration.
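The collection-method recommendations can be condensed into a rough decision helper. `recommend_mode` is a hypothetical simplification. The 500-configuration threshold comes from this article. Stdout and system logs stay on DaemonSet because a Sidecar collects only files from a shared volume.

```python
CONFIG_THRESHOLD = 500  # rule of thumb from this article


def recommend_mode(log_kind: str, total_configs: int, multi_tenant_paas: bool,
                   needs_high_reliability: bool) -> str:
    """Return "DaemonSet" or "Sidecar" for one log source (simplified heuristic)."""
    # A Sidecar reads logs from a shared volume, so stdout and host/system logs stay on DaemonSet.
    if log_kind in ("stdout", "system"):
        return "DaemonSet"
    # Large or multi-tenant clusters, and pods needing highly reliable collection, use Sidecar.
    if total_configs > CONFIG_THRESHOLD or multi_tenant_paas or needs_high_reliability:
        return "Sidecar"
    return "DaemonSet"
```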
