This guide explains how to use Alibaba Cloud Simple Log Service (SLS) and its collector, LoongCollector, to efficiently collect, process, and analyze logs from your Kubernetes containers. It covers core principles, key workflows, selection guidance, and best practices, serving as a foundation for specific operational guides.
Features
Simple Log Service provides the following core capabilities for Kubernetes container log collection:
- Multi-source log support: Supports various log types, including standard output (stdout), standard error (stderr), and text file logs from containers.
- Granular container filtering: Include or exclude containers based on namespace name, pod name, container name, container labels, or environment variables.
- Advanced log processing:
  - Collect multi-line logs: Combine log entries that span multiple lines, such as Java exception stack traces, into a single log event to prevent incorrect splitting.
  - Pre-process logs: Use a filter plugin to drop invalid data at the source. Use log desensitization or field extraction plugins to mask or structure sensitive data in raw logs.
  - Parse fields into a structured format: Use regular expression, JSON, or separator-based parsing plugins to structure raw logs before storage.
- Intelligent metadata association: Automatically associate metadata with container logs, such as container name, image, pod, namespace, and environment variables.
- Reliability:
  - A checkpoint mechanism records the current collection position to ensure log integrity.
  - Different strategies for handling logs when a container stops are provided for the supported container runtimes.
-
Limitations
- Container runtime: Only Docker and Containerd are supported.
  - Docker:
    - Requires permissions to access `docker.sock`.
    - For standard output collection, only the `json-file` logging driver is supported.
    - Only the `overlay` and `overlay2` storage drivers are supported. For other storage drivers, you must mount the log directory by using a volume.
  - Containerd:
    - Requires permissions to access `containerd.sock`.
- Multi-line log limit: To prevent a multi-line log from being split due to output delays, the last collected line is buffered for a short period. The default buffer time is 3 seconds. You can adjust this by using the `BeginLineTimeoutMs` parameter. The value must be at least 1,000 milliseconds to avoid incorrect parsing.
- Standard output: The default maximum size for a single log entry is 524,288 bytes (512 KB), and the upper limit is 8,388,608 bytes (8 MB). If a single log entry exceeds 524,288 bytes, you can increase this limit by setting the `max_read_buffer_size` environment variable for the LoongCollector container.

  Important: We recommend that you do not enable standard output and standard error collection simultaneously, as this may cause log entries to be interleaved incorrectly.
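As a sketch, the `max_read_buffer_size` environment variable can be set on the LoongCollector container in its DaemonSet manifest. The container name and the chosen value below are illustrative assumptions; check them against your actual deployment:

```yaml
# Fragment of a LoongCollector DaemonSet spec (illustrative).
# The container name "loongcollector" and the value are assumptions.
spec:
  template:
    spec:
      containers:
        - name: loongcollector
          env:
            # Raise the per-entry stdout limit from the 512 KB default
            # to 2 MB (in bytes); must not exceed the 8 MB upper limit.
            - name: max_read_buffer_size
              value: "2097152"
```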
Collection workflow overview
1. Log on to the cluster and prepare the log source: Prepare standard output logs or text file logs for collection.
2. Install the LoongCollector collector: Use LoongCollector to collect and transmit logs to Simple Log Service.
3. Configure collection rules and parsing plugins: Define the rules for log collection.
4. Query and analyze logs: Query the collected logs to monitor your services.
Key processes
Log source and mount point requirements
- For standard output logs, LoongCollector automatically discovers the log file path based on container metadata.
- For container text file logs, LoongCollector mounts the host root directory to its own `/logtail_host` directory by default, so manual mounting is typically not required. If you need to use a custom mount point, ensure that the mounted path still gives LoongCollector access to the container log files on the host.
Install the collector
Simple Log Service supports installing LoongCollector in DaemonSet or sidecar mode. Select a deployment mode based on your use case:

- DaemonSet deployment mode: After a one-time configuration, a LoongCollector agent is automatically deployed on each node in the cluster. This mode is suitable for most scenarios. In DaemonSet mode, the appropriate deployment method depends on the relationship between your cluster and Simple Log Service:
  - For Container Service for Kubernetes (ACK) clusters, the `loongcollector-ds` component is pre-integrated. You only need to enable the component in the ACK console to complete the installation. By default, this method binds the collector to the Alibaba Cloud account that owns the ACK cluster, and logs are stored in the Simple Log Service Project of that account. For more information, see Installation and configuration.
  - If you use an ACK cluster but need to collect its logs into a Simple Log Service Project under a different Alibaba Cloud account, for example, due to organizational structure, permission isolation, or centralized monitoring, you must manually install the LoongCollector component and configure it with the Alibaba Cloud account ID or AccessKey of the destination account. For more information, see Installation and configuration.
  - If you use a self-managed cluster, you must manually install the LoongCollector component and configure it with the Alibaba Cloud account ID or AccessKey of the destination account. For more information, see Installation and configuration.

  Installing LoongCollector is a prerequisite for log collection. For a complete walkthrough that includes installation steps, see Collect container logs from a Kubernetes cluster by using a CRD (standard output/file).
- Sidecar deployment mode: A LoongCollector sidecar container is injected into each application pod. This mode is more complex to deploy and maintain. Use this mode for serverless container log collection, for nodes where the pod data volume significantly exceeds the collection capacity of a DaemonSet, or for log collection in Kubernetes environments that use secure container runtimes. For more information, see Collect Kubernetes pod text logs (sidecar mode).
Collection rules
Simple Log Service provides the following two methods for defining collection rules:
| Configuration method | Features | Use cases | Notes |
| --- | --- | --- | --- |
| CRD (AliyunPipelineConfig) | Collection rules are defined declaratively as Kubernetes custom resources, which enables version control and automated, traceable management. | Recommended for production clusters and environments that use CI/CD automation. | |
| Console | Collection rules are created and managed interactively in the Simple Log Service console. | Suitable for small-scale clusters, temporary debugging, or non-production environments because configurations must be associated individually in large-scale clusters. | |
Key concepts
- Kubernetes: Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It is the core infrastructure for modern cloud-native application development and operations.
- Standard output, standard error, and text file logs: Standard output (stdout) is information printed during normal program execution, such as business logs and operational records. By default, it is directed to the terminal and captured by the container engine. Standard error (stderr) is information about errors or warnings, such as exception stack traces or startup failures. It is also captured by the container engine and can be mixed with stdout. A text file log is written to a file by an application, such as an Nginx `access.log` or a custom log file. These logs are written directly to the container's file system and are deleted when the container is destroyed, but they can be persisted by using a volume.
- Checkpoint mechanism: A checkpoint records the exact position in a file from which Simple Log Service has collected logs. By default, the checkpoint is saved in `/tmp/logtail_checkpoint`. This ensures reliable log collection after disruptions, such as a LoongCollector restart or node failure.
- LoongCollector (Logtail): LoongCollector is a high-performance log collector developed by Alibaba Cloud. It supports both DaemonSet and sidecar deployment modes in Kubernetes. LoongCollector is the successor to Logtail and is fully backward-compatible.
- Kubernetes CRD: A CRD (CustomResourceDefinition) is a Kubernetes mechanism that allows you to define custom resource types and create instances of them for configuration. The custom resource type provided by Simple Log Service is AliyunPipelineConfig.
- Collection configuration: A collection configuration defines rules for the type of logs to collect, the collection path, log filtering, log content parsing, and the storage location in Simple Log Service. For more information, see What is a collection configuration?.
- Parsing plugin: A parsing plugin is a processing unit used within the processing plugin configuration of a collection configuration. It is used for structuring, splitting, filtering, or desensitizing log content. Simple Log Service supports multiple processing modes, including regular expression, separator, JSON, and multi-line.
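For illustration, a regular expression parsing step inside the processors section of a collection configuration might look like the following sketch. The plugin name `processor_parse_regex_native` and its field names follow common LoongCollector/iLogtail conventions but should be verified against your version:

```yaml
# Illustrative processors fragment: split a raw "content" field such as
# "2024-05-01 ERROR something failed" into structured fields.
processors:
  - Type: processor_parse_regex_native
    SourceKey: content
    Regex: (\S+)\s+(\S+)\s+(.*)
    Keys:
      - time
      - level
      - message
```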
How it works
1. A user creates a Custom Resource (CR) by using `kubectl` to define a collection rule.
2. The `loongcollector-operator` continuously watches for changes to CRs in the cluster.
3. When a change is detected, the operator converts the CR into a LoongCollector configuration and applies it to Simple Log Service.
4. The LoongCollector agent periodically sends heartbeats to Simple Log Service, pulls the latest collection configuration, and applies it dynamically.
5. The `loongcollector-ds` agent collects logs according to the new configuration and sends them to SLS through the configured endpoint.
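This flow can be sketched with a minimal AliyunPipelineConfig CR. The `apiVersion`, plugin type names, and resource names below are a hedged sketch; confirm them against the Simple Log Service CRD documentation for your version:

```yaml
apiVersion: telemetry.alibabacloud.com/v1alpha1   # assumed group/version
kind: AliyunPipelineConfig
metadata:
  name: example-stdout-config
spec:
  project:
    name: my-sls-project          # destination Project (assumed name)
  config:
    inputs:
      # Collect container standard output and standard error.
      - Type: input_container_stdio
    flushers:
      # Send the collected logs to a Logstore in the Project above.
      - Type: flusher_sls
        Logstore: my-logstore     # destination Logstore (assumed name)
```

Applying such a CR with `kubectl apply -f` is what triggers the operator to generate and distribute the corresponding collection configuration.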
DaemonSet mode
A LoongCollector agent is deployed on each node in the cluster to collect logs from all containers on that node. This mode features simple operations, low resource consumption, and flexible configuration, but it offers weaker isolation between tenants.
Sidecar mode
In each pod, a LoongCollector sidecar container is injected alongside the application container. The application container's log directory is mounted as a shared volume by using a Kubernetes volume mechanism, such as emptyDir, hostPath, or a PVC. This makes the log files accessible to both the application container and the sidecar container, allowing LoongCollector to read them directly. This mode provides strong multi-tenant isolation and high performance but consumes more resources and is more complex to configure and maintain.
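A minimal sketch of this layout, assuming an application that writes logs to /var/log/app and an illustrative collector image name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
    - name: app
      image: my-app:latest                  # your application image (assumed)
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app           # the app writes log files here
    - name: loongcollector
      image: my-registry/loongcollector     # collector image (assumed)
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app           # same files, visible to the sidecar
  volumes:
    - name: app-logs
      emptyDir: {}                          # shared volume for log files
```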
Container discovery
To collect logs from other containers, the LoongCollector container must first identify which containers are running on the node. This process is called container discovery.
- During container discovery, the LoongCollector container communicates directly with the container runtime daemon on the node instead of the Kubernetes cluster's kube-apiserver. This allows it to get information about all containers on the current node without putting additional load on the kube-apiserver.
- LoongCollector accesses the socket of the container runtime (Docker or Containerd) on the host to obtain container context. It supports including or excluding containers from log collection based on criteria such as namespace name, pod name, pod labels, and container environment variables.
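As an illustration, such include/exclude conditions can appear in the input section of a collection configuration. The field names below follow common LoongCollector container-filter conventions and should be verified against your version:

```yaml
# Illustrative input fragment with container filters.
inputs:
  - Type: input_file
    FilePaths:
      - /var/log/app/*.log
    EnableContainerDiscovery: true
    ContainerFilters:
      K8sNamespaceRegex: ^(default|prod)$   # include only these namespaces
      K8sContainerRegex: ^app$              # include only containers named "app"
      ExcludeEnv:
        COLLECT_LOGS: "false"               # skip containers that opt out
```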
Standard output collection
LoongCollector automatically identifies the correct API or logging driver for different container runtimes, such as Docker and Containerd, based on container metadata. No manual configuration is required. It reads the standard output stream of all containers directly, without accessing the file systems inside the containers.
When collecting standard output from a container, LoongCollector periodically saves its collection progress to a checkpoint file. If LoongCollector stops and restarts, it resumes collection from the last saved position.
Container text file log collection
- Because Kubernetes container file systems are isolated, a collector cannot directly access files in other containers. However, because container file systems reside on the host's file system, LoongCollector can mount the host's root file system. This allows it to access any file on the host, thereby indirectly accessing files in the application container's file system.
- By default, LoongCollector mounts the host's root file system to its own `/logtail_host` directory, so manual mounting is typically not required. For example, if a log file's path inside a container is `/log/app.log` and its mapped path on the host is `/var/lib/docker/containers/<container-id>/log/app.log`, the actual collection path used by LoongCollector is `/logtail_host/var/lib/docker/containers/<container-id>/log/app.log`.
Multi-line log parsing
LoongCollector uses a user-defined regular expression to match the beginning of a log line.
- Successful match: The line is treated as the start of a new log entry.
- Failed match: The line is appended to the current log entry.

When a subsequent line matches the start-of-line regular expression, the current log entry is completed and a new entry begins.
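For example, for Java logs whose entries start with a timestamp, a start-of-line pattern can be set in the file input. The `Multiline` field names below are a sketch based on common LoongCollector file-input conventions; verify them for your version:

```yaml
# Illustrative multi-line setting: a line beginning with a timestamp
# such as "2024-05-01 12:00:00" starts a new entry; stack-trace lines
# that do not match are appended to the current entry.
inputs:
  - Type: input_file
    FilePaths:
      - /var/log/app/*.log
    Multiline:
      Mode: custom
      StartPattern: \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.*
```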
Log processing when a container stops
| Runtime | Destruction latency risk | Log integrity | Optimization |
| --- | --- | --- | --- |
| Docker | When a container is stopped, LoongCollector immediately releases the container's file handle, allowing the container to exit normally. | If collection is delayed before the container stops, for example, due to network latency or high resource usage, some logs generated just before the stop may be lost. | Increase the log sending frequency (decrease the sending interval) so that logs are collected before the container stops. |
| Containerd | If collection is delayed, for example, due to network latency or high resource usage, the application container may not be destroyed promptly. | When a container is stopped, LoongCollector continues to hold the file handles for the container, keeping the log files open until all content has been sent. | Configure the collector so that buffered logs are sent promptly and file handles are released sooner. |
Container metadata retrieval
LoongCollector interacts directly with the container runtime to retrieve Kubernetes metadata, by using the Docker API for Docker or the standard CRI (Container Runtime Interface) API for Containerd. This enables non-intrusive, automatic tagging of Kubernetes metadata during collection, and the direct interaction allows for real-time data retrieval.

- Docker: LoongCollector uses the Docker Client to communicate with the Docker daemon to retrieve container metadata directly. This allows for in-depth monitoring and management. The main APIs used include:
  - ContainerList: Retrieves a list of running containers, providing an overview of active containers on the node.
  - ContainerInspect: Provides detailed information for each container, including configuration and status.
  - Events: Listens for container events in real time, allowing for dynamic tracking of the container lifecycle.

  When retrieving container metadata through the Docker Client, the following information is particularly important:
  - LogPath: The path on the host where the container's standard output log file is stored. This is essential for log collection and analysis.
  - GraphDriver.Data: The path of the container's rootfs on the host. This path is critical for understanding the container's file system storage and helps with troubleshooting and performance optimization.
- Containerd: By using the CRI, LoongCollector supports `containerd` and CRI-O environments. This allows it to collect metadata from various underlying runtimes, such as `runc` or Kata Containers, ensuring consistent data collection. The container metadata provided by the CRI includes the path of the container's standard output log file on the host but does not directly provide the container's rootfs path. LoongCollector uses the following methods to find this path:
  - File path search: LoongCollector searches the host's file system for the container's rootfs path by using a unique identifier, such as the container ID.
  - Direct interaction with containerd: To get more comprehensive information, LoongCollector can bypass the CRI and communicate directly with containerd at a lower level. This allows it to obtain the container's rootfs path and other important metadata that is not available through the CRI.
Best practices
Unified query across environments
To query and analyze logs from different environments, such as testing and production, in a unified manner, you can use one of the following three methods:
- Store data from all environments in the same LogStore. We recommend adding tags to distinguish between environments by following the steps in Collect container logs from a cluster by using the console (standard output/file). You can then query and analyze all logs directly within that LogStore.
- Collect data into different LogStores, or even into Projects in different regions, and create a StoreView to query across multiple LogStores when you need a unified query. This method does not incur additional storage costs, but it is read-only and does not support alerting. You can use a `tag` field to identify the source LogStore of each log entry.
- (Recommended) Collect data into different LogStores or Projects. When you need to perform a unified analysis, use data transformation to copy and forward selected data to a centralized LogStore. This method allows you to parse and process data before storage and supports alerting, but it is a paid feature.
Collecting logs from multiple sources
A single collection configuration can target only one source. To collect logs from multiple sources, you must create a separate collection configuration for each.
Granular collection and multi-tenant isolation
In a multi-tenant environment, you can use different collection configurations to send data to separate Projects for isolation. Data in different Projects is isolated and cannot be accessed across Projects. You can configure different access permissions for each Project to meet security and isolation requirements.
Automated operations and CI/CD integration
By using the CRD method, you can incorporate collection configurations into your GitOps or Infrastructure as Code (IaC) workflows. This enables batch, automated, and traceable management of log collection.