This document explains how to use Alibaba Cloud Simple Log Service (SLS) and its collector, LoongCollector, to efficiently collect, process, and analyze Kubernetes container logs. It covers core principles, key processes, selection guidance, and best practices. It also provides links to more detailed operational documentation.
Features
SLS provides the following core capabilities for Kubernetes container log collection:
Multiple log sources
Log types: Standard output (stdout), standard error (stderr), and container text file logs.
Fine-grained container filtering
You can include or exclude containers for collection by namespace, pod name, container name, container label, or environment variable (see the configuration sketch at the end of this list).
Complex log processing
Collect multi-line logs: Recognize log entries that span multiple lines, such as Java exception stack traces, as a single log event. This prevents logs from being incorrectly split by line feeds.
Pre-process logs: Use plugins such as the filter plugin to filter invalid data at the collector. Use the log masking and field extraction plugins to prevent raw logs from being exposed.
Parse and structure fields: Use parsing plugins for regular expressions, JSON, or separators to parse raw logs before storage.
Intelligent metadata association
When reporting container logs, LoongCollector automatically associates metadata such as container name, image, pod, namespace, and environment variables.
Reliability assurance
The checkpoint mechanism records the current collection position to ensure log integrity.
SLS provides different strategies for handling logs when a container stops, depending on the container runtime.
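To make the fine-grained container filtering above concrete, the following is a minimal sketch of a filter configuration. The ContainerFilters field names are assumptions based on the open-source iLogtail/LoongCollector configuration; verify them against the collection configuration reference.

```yaml
# Sketch: collect stdout only from the "production" namespace and skip
# containers that set COLLECT_LOGS=false. Field names are assumptions
# based on the open-source iLogtail/LoongCollector configuration.
inputs:
  - Type: input_container_stdio
    ContainerFilters:
      K8sNamespaceRegex: ^production$     # include by namespace
      ExcludeEnv:
        COLLECT_LOGS: "false"             # exclude by environment variable
```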
Limits
Container runtime: Only Docker and Containerd are supported.
Docker:
Requires access permissions for docker.sock.
Standard output collection supports only the JSON logging driver.
Only the overlay and overlay2 storage drivers are supported. For other storage drivers, you must mount the log directory as a volume.
Containerd:
Requires access permissions for containerd.sock.
Multi-line log limits:
To prevent a multi-line log entry from being split by output latency, the last collected log line is cached for a period. The default cache time is 3 seconds. You can change this time using the BeginLineTimeoutMs parameter. The value must be at least 1000 milliseconds to prevent incorrect splitting.
Maximum size for a single standard output log:
Default: 524288 bytes (512 KB). Maximum: 8388608 bytes (8 MB). If a single log entry exceeds 524288 bytes, you can change the limit by adding the max_read_buffer_size environment variable to the LoongCollector container.
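For reference, a minimal sketch of setting such an environment variable on the LoongCollector container follows. This is not a complete manifest, and the container name is an assumption; match it to your installation.

```yaml
# Illustrative snippet (not a complete manifest): setting the
# max_read_buffer_size environment variable on the LoongCollector container.
# The container name is an assumption; match it to your installation.
spec:
  template:
    spec:
      containers:
        - name: loongcollector
          env:
            - name: max_read_buffer_size
              value: "1048576"   # bytes; must not exceed 8388608 (8 MB)
```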
Collection process overview
Log on to the cluster and prepare log sources: Prepare standard output logs or text file logs for collection.
Install the LoongCollector collector: Install LoongCollector, which is used by SLS to collect and transmit logs.
Configure collection rules and parsing plugins: Define the rules for log collection.
Query and analyze logs: Query the collected logs to analyze the status of your business.
Key process descriptions
Log source and mount target requirements (Important)
For standard output logs, LoongCollector automatically identifies the file path based on container metadata.
For container text file logs, LoongCollector mounts the host root directory to its own /logtail_host directory by default, so manual mounting is not required. If you use a custom mount target, it must meet additional requirements.
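For reference, the default mount described above corresponds to the following sketch. The volume and container names are illustrative; the actual manifest is managed by the installation component.

```yaml
# Sketch of the default mount described above: the host root file system is
# mounted into the collector at /logtail_host. Names are illustrative; the
# actual manifest is managed by the installation component.
volumes:
  - name: host-root
    hostPath:
      path: /
containers:
  - name: loongcollector
    volumeMounts:
      - name: host-root
        mountPath: /logtail_host
        readOnly: true
```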
Install the collector
Select a deployment mode based on your scenario:
Deployment mode: SLS supports installing LoongCollector in DaemonSet or Sidecar mode.
DaemonSet deployment mode: Configure once to automatically deploy a LoongCollector on each node in the cluster. This is the recommended mode for most scenarios.
If you use DaemonSet mode, select a deployment method based on the relationship between your cluster and SLS.
If you use an ACK cluster, the loongcollector-ds component is already integrated. To complete the installation, enable the component in the ACK console. By default, this method binds to the Alibaba Cloud account that owns the ACK cluster, and the logs are stored in the SLS instance of that account. For more information, see LoongCollector Installation and Configuration (Kubernetes).
If you use an ACK cluster but need to collect its logs into an SLS project that belongs to a different Alibaba Cloud account for reasons such as organizational structure, permission isolation, or unified monitoring, you must manually install the LoongCollector component. Then, configure the component with the destination account's ID or access credential (AccessKey) to establish the association. For more information, see LoongCollector Installation and Configuration (Kubernetes).
If you use a self-managed cluster, you must manually install the LoongCollector component. Then, configure the component with the destination account's ID or access credential (AccessKey) to establish the association. For more information, see LoongCollector Installation and Configuration (Kubernetes).
Installing LoongCollector is a prerequisite for log collection. For the complete collection process, including LoongCollector installation, see Collect container logs from a Kubernetes cluster using a CRD (standard output/file).
Sidecar deployment mode: A LoongCollector Sidecar container is injected into each pod alongside the application container. This mode involves more complex deployment and O&M. Use this mode for serverless container log collection, when the data volume of pods on a single node exceeds the DaemonSet collection limit, or for log collection from Kubernetes with secure container runtimes. For more information, see Collect text logs from a cluster (Sidecar).
Configure collection rules
SLS provides two ways to define collection configuration rules:
| Configuration method | Features | Scenarios | Notes |
| --- | --- | --- | --- |
| CRD (AliyunPipelineConfig) | Declarative configuration managed as a Kubernetes resource; fits GitOps and IaC workflows | Production clusters and scenarios that support CI/CD automation | |
| SLS console | Visual configuration in the SLS console | Small clusters, temporary debugging, and non-production environments | Configurations must be associated one by one |
Core concepts
Kubernetes: Kubernetes (K8s) is an open-source container orchestration platform. It automates the deployment, scaling, and management of containerized applications. It is the core infrastructure for modern cloud-native application development and O&M.
Standard output, standard error, and text file logs: Standard output (stdout) is the information printed by a program during normal operation, such as business logs and operation records. It is output to the terminal by default and captured by the container engine for storage. Standard error (stderr) is the error or warning information from a program, such as exception stack traces and startup failure reasons. It is also captured by the container engine and can be mixed with stdout. Text file logs are logs that an application writes to a file, such as Nginx's access.log or custom log files. These logs are written directly to the container's internal file system and are destroyed when the container is destroyed. You can use a volume to make them persistent.
Checkpoint mechanism: A checkpoint records the specific position in a file up to which SLS has collected logs. By default, the checkpoint is saved in /tmp/logtail_checkpoint. This mechanism ensures the reliability of log collection if LoongCollector restarts or a node goes down.
LoongCollector (Logtail): A high-performance log collector developed by Alibaba Cloud. It supports DaemonSet and Sidecar deployment modes in Kubernetes. LoongCollector is an upgraded version of Logtail and is compatible with all Logtail features.
Kubernetes CRD: A CustomResourceDefinition (CRD) is a Kubernetes mechanism that allows users to define custom resources and create instances for configuration. The custom resource type provided by SLS is AliyunPipelineConfig.
Collection configuration: Defines the rules for log collection, such as the log type, collection path, filtering of valid logs, parsing of log content, and storage location in SLS. For more information, see What is a LoongCollector collection configuration?.
Parsing plugin: Used in the processor plugin configuration of a collection configuration. SLS provides many processing units to structure, split, filter, and mask log content. It supports various processing modes, such as regular expression, separator, JSON, and multi-line.
How log collection works
A user creates a custom resource (CR) using kubectl to define collection rules.
The loongcollector-operator continuously listens for changes to CRs in the cluster.
When a CR change is detected, the operator converts it into a specific configuration and submits it to SLS.
LoongCollector periodically sends heartbeats to SLS to retrieve configuration updates. It pulls the latest collection configuration and hot-reloads it.
loongcollector-ds collects logs based on the latest configuration and sends them to SLS through the configured endpoint.
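The following is a minimal sketch of such a CR. The document names the resource type AliyunPipelineConfig; the apiVersion, field names, and values below are assumptions for illustration, so consult the CRD reference for the authoritative schema.

```yaml
# Minimal sketch of a collection-rule CR. The apiVersion and field names are
# assumptions for illustration; consult the AliyunPipelineConfig reference
# for the authoritative schema.
apiVersion: telemetry.alibabacloud.com/v1alpha1   # assumption
kind: AliyunPipelineConfig
metadata:
  name: example-stdout-config
spec:
  project:
    name: my-sls-project              # assumption: your SLS project
  logstores:
    - name: my-logstore               # assumption: destination Logstore
  config:
    inputs:
      - Type: input_container_stdio   # collect container stdout/stderr
    flushers:
      - Type: flusher_sls
        Logstore: my-logstore
```

You would apply such a CR with kubectl apply -f, after which the operator detects it and submits the converted configuration to SLS, as described in the steps above.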
How DaemonSet mode works
A LoongCollector is deployed on each node of the cluster to collect logs from all containers on that node. This mode features simple O&M, low resource consumption, and flexible configuration. However, it provides weak isolation.
How Sidecar mode works
A LoongCollector Sidecar container is injected into each pod alongside the application container. The log directory of the application container is mounted as a shared volume using a Kubernetes volume mechanism, such as emptyDir, hostPath, or a persistent volume (PV). This way, the log files appear in the mount paths of both the application container and the Sidecar container, which allows LoongCollector to read them directly. This mode features good multi-tenant data isolation and performance. However, it consumes more resources and is more complex to configure and maintain.
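As a sketch of the shared-volume pattern just described, a pod spec might look like the following. The image names and paths are illustrative, not the component's actual manifest.

```yaml
# Sketch of the Sidecar pattern described above: the application and the
# collector share the log directory through an emptyDir volume.
# Image and path names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  volumes:
    - name: app-logs
      emptyDir: {}
  containers:
    - name: app
      image: my-app:latest            # assumption: your application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app     # the app writes log files here
    - name: loongcollector
      image: loongcollector:latest    # assumption: use the image from the docs
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app     # the same files are visible here
```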
How container discovery works
For a LoongCollector container to collect logs from other containers, it must discover and identify which containers are running. This process is called container discovery.
During the container discovery phase, the LoongCollector container does not communicate with the Kubernetes cluster's kube-apiserver. Instead, it communicates directly with the container runtime daemon on the node to retrieve information about all containers on that node. This avoids putting pressure on the cluster's kube-apiserver.
LoongCollector can retrieve container context information by accessing the sock file of the container runtime, such as Docker Engine or Containerd, on the host. It supports specifying or excluding containers for log collection based on conditions such as namespace name, pod name, pod label, and container environment variables.
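As context for the sock-based discovery described above, the runtime socket is typically made visible to the collector through a hostPath mount like the following sketch. Paths and names are illustrative; the actual manifest is managed by the installation component.

```yaml
# Sketch: exposing the containerd socket to the collector so it can query
# the runtime directly instead of kube-apiserver. Paths and names are
# illustrative; the actual manifest is managed by the installation component.
volumes:
  - name: containerd-sock
    hostPath:
      path: /run/containerd/containerd.sock
containers:
  - name: loongcollector
    volumeMounts:
      - name: containerd-sock
        mountPath: /run/containerd/containerd.sock
        readOnly: true
```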
Standard output collection
LoongCollector automatically identifies the API or logging driver of different container runtimes, such as Docker and Containerd, based on container metadata. No manual configuration is required. It directly reads the standard output stream of all containers without accessing their internal file systems.
When collecting a container's standard output, LoongCollector periodically saves the collection point information to a checkpoint file. If LoongCollector stops and then restarts, it resumes collection from the last saved point.
Container text file log collection
Kubernetes isolates container file systems, so a collector cannot directly access files in other containers. However, container file systems are mounted from the host file system. You can mount the host's root file system to the LoongCollector container to access any file on the host. This lets you indirectly collect files from the application container's file system.
By default, LoongCollector mounts the file system of the host's root directory to its own /logtail_host directory, so manual mounting is not required. For example, if the path of a log file inside the container is /log/app.log, and its mapped path on the host is /var/lib/docker/containers/<container-id>/log/app.log, then the actual path that LoongCollector collects from is /logtail_host/var/lib/docker/containers/<container-id>/log/app.log.
How multi-line log recognition works
Each log line is matched against a custom regular expression that defines the start of a line.
If a match is found, the line is treated as the start of a new log entry, and a new log entry begins to be constructed.
If no match is found, the line is appended to the end of the current log entry.
When another line that matches the start-of-line regular expression is found, the construction of the current log entry is complete, and the construction of the next log entry begins.
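As a sketch, a start-of-line expression for logs that begin with a date might be configured as follows. The Multiline fields are assumptions based on the open-source iLogtail/LoongCollector file input; verify them against the collection configuration reference.

```yaml
# Sketch: lines starting with "YYYY-MM-DD" begin a new log entry, so a Java
# stack trace stays in one event. The Multiline fields are assumptions based
# on the open-source iLogtail file input; verify them against the reference.
inputs:
  - Type: input_file
    FilePaths:
      - /log/app.log                        # illustrative path
    Multiline:
      StartPattern: '\d{4}-\d{2}-\d{2}.*'   # start-of-line regular expression
```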
Log handling when a container stops
| Runtime | Container destruction latency risk | Log integrity | Optimization suggestion |
| --- | --- | --- | --- |
| Docker | When a container is stopped, LoongCollector immediately releases the container's file handles, allowing the container to exit normally. | If collection lags before the container stops, for example because of network latency or high resource usage, some logs written shortly before the stop may be lost. | Increase the log sending frequency (decrease the value of the relevant send-interval parameter). |
| Containerd | If collection lags because of network latency or high resource usage, the application container may not be destroyed promptly. | When a container is stopped, LoongCollector continues to hold the handles of the log files inside the container (keeps them open) until all log file content has been sent. | Configure the relevant parameter to limit how long file handles are held after a container stops. |
How container metadata is retrieved
To retrieve container metadata, LoongCollector interacts directly with the container runtime based on the standard Container Runtime Interface (CRI) API. This allows LoongCollector to retrieve various types of metadata in Kubernetes and non-intrusively implement the Kubernetes metadata AutoTagging feature during collection. Interacting directly with the runtime makes metadata retrieval more timely and gives LoongCollector better visibility into container status.
Docker: LoongCollector uses the Docker Client to communicate with the Docker Daemon and directly retrieve container metadata. This method enables in-depth monitoring and management of containers. The main interfaces used are as follows:
ContainerList: Retrieves a list of currently running containers to quickly identify which containers are running on the current node.
ContainerInspect: Provides detailed information for each container, including key information such as configuration and status.
Events: Listens for container change events in real time to dynamically track the container lifecycle and promptly update the relevant processing logic.
When you retrieve container metadata through the Docker Client, the following information is important:
LogPath: This is the storage path of the container's standard output log file on the host. It facilitates log collection and analysis.
GraphDriver.Data: This provides the path of the container's rootfs on the host node. This path is key to understanding the storage method of the container's file system and helps with fault diagnosis and performance optimization.
Containerd: Through the CRI, LoongCollector fully supports various scenarios in containerd and cri-o runtime environments. It can efficiently collect and retrieve container metadata regardless of whether the underlying runtime is runc or Kata Containers. This ensures accurate and unified log data collection regardless of the environment in which the container is running, which helps you monitor and analyze log data in real time.
The container metadata provided by the CRI includes only the path of the container's standard output log file on the host node; the container's rootfs path cannot be obtained directly. To work around this, LoongCollector can use one of the following solutions:
File path search: Traverse the file directories on the host and use the container's unique identifier, such as the container ID, to locate the container's rootfs path. This dynamic search compensates for the missing path information and supports subsequent log collection and monitoring.
Bypass the CRI and interact directly with containerd: Communicate with containerd at a lower level to retrieve more comprehensive and accurate container information. In this way, LoongCollector bypasses the limitations of the CRI to obtain the container's rootfs path and other important metadata.
Best practices
Unified query and analysis for logs from multiple clusters or environments
For example, to uniformly query and analyze logs from clusters in different environments, such as testing and production, you can use one of the following three methods:
When collecting data, store it in the same Logstore. We recommend that you add tags to distinguish between environments using the method described in Collect container logs from a Kubernetes cluster using the console (standard output/file). When you need to perform a unified query, you can directly query and analyze the logs in that Logstore.
When collecting data, collect it into different Logstores or even projects in different regions. When you need to perform a unified query and analysis, you can create a StoreView virtual resource to associate multiple Logstores for querying. This method does not add extra storage costs, but you can only query, not modify, the data. It also does not support setting alerts for monitoring. When you use this method, you can use the tag field to determine which Logstore the log came from.
(Recommended) When collecting data, collect it into different Logstores or even projects in different regions. When you need to perform a unified query and analysis, you can use data transformation to copy the selected data and store it in a specified Logstore. This method lets you parse and process the selected data before you store it and supports setting alerts for monitoring. However, this feature incurs additional charges.
Collect logs from different sources with a single configuration
A single collection configuration does not currently support multiple sources. To collect logs from different sources, you must configure multiple collection configurations.
Fine-grained collection and multitenancy isolation
In a multitenancy scenario, you can configure different collection configurations to collect data into different projects for isolation. Data cannot be directly accessed between different projects. You can also configure different access permissions for different projects to meet security isolation requirements.
Automated O&M and CI/CD integration
You can use the CRD method to incorporate collection configurations into GitOps or Infrastructure as Code (IaC) workflows. This enables batch, automated, and traceable management of log collection.