This guide explains how to use Alibaba Cloud Simple Log Service (SLS) and its collector, LoongCollector, to efficiently collect, process, and analyze logs from your Kubernetes containers. It covers core principles, key workflows, selection guidance, and best practices, serving as a foundation for specific operational guides.
Features
Simple Log Service provides the following core capabilities for Kubernetes container log collection:
- Multi-source log support: Supports various log types, including standard output (stdout), standard error (stderr), and text file logs from containers.
- Granular container filtering: Include or exclude containers based on namespace name, pod name, container name, container labels, or environment variables.
- Advanced log processing:
  - Collect multi-line logs: Combine log entries that span multiple lines, such as Java exception stack traces, into a single log event to prevent incorrect splitting.
  - Pre-process logs: Use a filter plugin to drop invalid data at the source. Use log desensitization or field extraction plugins to mask or structure sensitive data in raw logs.
  - Parse fields into a structured format: Use regular expression, JSON, or separator-based parsing plugins to structure raw logs before storage.
- Intelligent metadata association: Automatically associate metadata with container logs, such as container name, image, pod, namespace, and environment variables.
- Reliability:
  - A checkpoint mechanism records the current collection position to ensure log integrity.
  - Different strategies for handling logs when a container stops are provided for the supported container runtimes.
-
Limitations
- Container runtime: Only Docker and Containerd are supported.
  - Docker:
    - Requires permissions to access `docker.sock`.
    - For standard output collection, only the `json-file` logging driver is supported.
    - Only the `overlay` and `overlay2` storage drivers are supported. For other storage drivers, you must mount the log directory by using a volume.
  - Containerd:
    - Requires permissions to access `containerd.sock`.
- Multi-line log limit: To prevent a multi-line log from being split due to output delays, the last collected line is buffered for a short period. The default buffer time is 3 seconds. You can adjust this by using the `BeginLineTimeoutMs` parameter. The value must be at least 1,000 milliseconds to avoid incorrect parsing.
- Standard output: The default maximum size for a single log entry is 524,288 bytes (512 KB), and the upper limit is 8,388,608 bytes (8 MB). If a single log entry exceeds 524,288 bytes, you can increase this limit by setting the `max_read_buffer_size` environment variable for the LoongCollector container.

  Important: We recommend that you do not enable standard output and standard error collection simultaneously, as this may cause log entries to be interleaved incorrectly.
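As a sketch, the `max_read_buffer_size` environment variable can be set on the LoongCollector container in its DaemonSet manifest. The container name and the chosen value below are illustrative assumptions; check them against your actual deployment:

```yaml
# Fragment of a LoongCollector DaemonSet spec (illustrative).
# The container name "loongcollector" and the value are assumptions.
spec:
  template:
    spec:
      containers:
        - name: loongcollector
          env:
            # Raise the per-entry stdout limit from the 512 KB default
            # to 2 MB (in bytes); must not exceed the 8 MB upper limit.
            - name: max_read_buffer_size
              value: "2097152"
```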
Collection workflow overview
1. Log on to the cluster and prepare the log source: Prepare standard output logs or text file logs for collection.
2. Install the LoongCollector collector: Use LoongCollector to collect and transmit logs to Simple Log Service.
3. Configure collection rules and parsing plugins: Define the rules for log collection.
4. Query and analyze logs: Query the collected logs to monitor your services.
Key processes
Log source and mount point requirements
- For standard output logs, LoongCollector automatically discovers the log file path based on container metadata.
- For container text file logs, LoongCollector mounts the host root directory to its own `/logtail_host` directory by default, so manual mounting is typically not required. If you need to use a custom mount point, ensure that the mounted path still gives LoongCollector access to the container log files on the host.
Install the collector
Simple Log Service supports installing LoongCollector in DaemonSet or sidecar mode. Select a deployment mode based on your use case:

- DaemonSet deployment mode: After a one-time configuration, a LoongCollector agent is automatically deployed on each node in the cluster. This mode is suitable for most scenarios. In DaemonSet mode, the appropriate deployment method depends on the relationship between your cluster and Simple Log Service:
  - For Container Service for Kubernetes (ACK) clusters, the `loongcollector-ds` component is pre-integrated. You only need to enable the component in the ACK console to complete the installation. By default, this method binds the collector to the Alibaba Cloud account that owns the ACK cluster, and logs are stored in the Simple Log Service Project of that account. For more information, see Installation and configuration.
  - If you use an ACK cluster but need to collect its logs into a Simple Log Service Project under a different Alibaba Cloud account, for example, due to organizational structure, permission isolation, or centralized monitoring, you must manually install the LoongCollector component and configure it with the Alibaba Cloud account ID or AccessKey of the destination account. For more information, see Installation and configuration.
  - If you use a self-managed cluster, you must manually install the LoongCollector component and configure it with the Alibaba Cloud account ID or AccessKey of the destination account. For more information, see Installation and configuration.

  Installing LoongCollector is a prerequisite for log collection. For a complete walkthrough that includes installation steps, see Collect container logs from a Kubernetes cluster by using a CRD (standard output/file).
- Sidecar deployment mode: A LoongCollector sidecar container is injected into each application pod. This mode is more complex to deploy and maintain. Use this mode for serverless container log collection, for nodes where the pod data volume significantly exceeds the collection capacity of a DaemonSet, or for log collection in Kubernetes environments that use secure container runtimes. For more information, see Collect Kubernetes pod text logs (sidecar mode).
Collection rules
Simple Log Service provides the following two methods for defining collection rules:
| Configuration method | Features | Use cases | Notes |
| --- | --- | --- | --- |
| CRD (AliyunPipelineConfig) | Collection rules are defined declaratively as Kubernetes custom resources, which enables version control and automated, traceable management. | Recommended for production clusters and environments that use CI/CD automation. | |
| Console | Collection rules are created and managed interactively in the Simple Log Service console. | Suitable for small-scale clusters, temporary debugging, or non-production environments because configurations must be associated individually in large-scale clusters. | |
Key concepts
- Kubernetes: Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It is the core infrastructure for modern cloud-native application development and operations.
- Standard output, standard error, and text file logs: Standard output (stdout) is information printed during normal program execution, such as business logs and operational records. By default, it is directed to the terminal and captured by the container engine. Standard error (stderr) is information about errors or warnings, such as exception stack traces or startup failures. It is also captured by the container engine and can be mixed with stdout. A text file log is written to a file by an application, such as an Nginx `access.log` or a custom log file. These logs are written directly to the container's file system and are deleted when the container is destroyed, but they can be persisted by using a volume.
- Checkpoint mechanism: A checkpoint records the exact position in a file from which Simple Log Service has collected logs. By default, the checkpoint is saved in `/tmp/logtail_checkpoint`. This ensures reliable log collection after disruptions, such as a LoongCollector restart or node failure.
- LoongCollector (Logtail): LoongCollector is a high-performance log collector developed by Alibaba Cloud. It supports both DaemonSet and sidecar deployment modes in Kubernetes. LoongCollector is the successor to Logtail and is fully backward-compatible.
- Kubernetes CRD: A CRD (CustomResourceDefinition) is a Kubernetes mechanism that allows you to define custom resource types and create instances of them for configuration. The custom resource type provided by Simple Log Service is AliyunPipelineConfig.
- Collection configuration: A collection configuration defines rules for the type of logs to collect, the collection path, log filtering, log content parsing, and the storage location in Simple Log Service. For more information, see What is a collection configuration?.
- Parsing plugin: A parsing plugin is a processing unit used within the processing plugin configuration of a collection configuration. It is used for structuring, splitting, filtering, or desensitizing log content. Simple Log Service supports multiple processing modes, including regular expression, separator, JSON, and multi-line.
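For illustration, a regular expression parsing step inside the processors section of a collection configuration might look like the following sketch. The plugin name `processor_parse_regex_native` and its field names follow common LoongCollector/iLogtail conventions but should be verified against your version:

```yaml
# Illustrative processors fragment: split a raw "content" field such as
# "2024-05-01 ERROR something failed" into structured fields.
processors:
  - Type: processor_parse_regex_native
    SourceKey: content
    Regex: (\S+)\s+(\S+)\s+(.*)
    Keys:
      - time
      - level
      - message
```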
How it works
1. A user creates a Custom Resource (CR) by using `kubectl` to define a collection rule.
2. The `loongcollector-operator` continuously watches for changes to CRs in the cluster.
3. When a change is detected, the operator converts the CR into a LoongCollector configuration and applies it to Simple Log Service.
4. The LoongCollector agent periodically sends heartbeats to Simple Log Service, pulls the latest collection configuration, and applies it dynamically.
5. The `loongcollector-ds` agent collects logs according to the new configuration and sends them to SLS through the configured endpoint.
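This flow can be sketched with a minimal AliyunPipelineConfig CR. The `apiVersion`, plugin type names, and resource names below are a hedged sketch; confirm them against the Simple Log Service CRD documentation for your version:

```yaml
apiVersion: telemetry.alibabacloud.com/v1alpha1   # assumed group/version
kind: AliyunPipelineConfig
metadata:
  name: example-stdout-config
spec:
  project:
    name: my-sls-project          # destination Project (assumed name)
  config:
    inputs:
      # Collect container standard output and standard error.
      - Type: input_container_stdio
    flushers:
      # Send the collected logs to a Logstore in the Project above.
      - Type: flusher_sls
        Logstore: my-logstore     # destination Logstore (assumed name)
```

Applying such a CR with `kubectl apply -f` is what triggers the operator to generate and distribute the corresponding collection configuration.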
DaemonSet mode
A LoongCollector agent is deployed on each node in the cluster to collect logs from all containers on that node. This mode features simple operations, low resource consumption, and flexible configuration, but it offers weaker isolation between tenants.
Sidecar mode
In each pod, a LoongCollector sidecar container is injected alongside the application container. The application container's log directory is mounted as a shared volume by using a Kubernetes volume mechanism, such as emptyDir, hostPath, or a PVC. This makes the log files accessible to both the application container and the sidecar container, allowing LoongCollector to read them directly. This mode provides strong multi-tenant isolation and high performance but consumes more resources and is more complex to configure and maintain.
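A minimal sketch of this layout, assuming an application that writes logs to /var/log/app and an illustrative collector image name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
    - name: app
      image: my-app:latest                  # your application image (assumed)
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app           # the app writes log files here
    - name: loongcollector
      image: my-registry/loongcollector     # collector image (assumed)
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app           # same files, visible to the sidecar
  volumes:
    - name: app-logs
      emptyDir: {}                          # shared volume for log files
```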
Container discovery
To collect logs from other containers, the LoongCollector container must first identify which containers are running on the node. This process is called container discovery.
- During container discovery, the LoongCollector container communicates directly with the container runtime daemon on the node instead of the Kubernetes cluster's kube-apiserver. This allows it to get information about all containers on the current node without putting additional load on the kube-apiserver.
- LoongCollector accesses the socket of the container runtime (Docker or Containerd) on the host to obtain container context. It supports including or excluding containers from log collection based on criteria such as namespace name, pod name, pod labels, and container environment variables.
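As an illustration, such include/exclude conditions can appear in the input section of a collection configuration. The field names below follow common LoongCollector container-filter conventions and should be verified against your version:

```yaml
# Illustrative input fragment with container filters.
inputs:
  - Type: input_file
    FilePaths:
      - /var/log/app/*.log
    EnableContainerDiscovery: true
    ContainerFilters:
      K8sNamespaceRegex: ^(default|prod)$   # include only these namespaces
      K8sContainerRegex: ^app$              # include only containers named "app"
      ExcludeEnv:
        COLLECT_LOGS: "false"               # skip containers that opt out
```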
Standard output collection
LoongCollector automatically identifies the correct API or logging driver for different container runtimes, such as Docker and Containerd, based on container metadata. No manual configuration is required. It reads the standard output stream of all containers directly, without accessing the file systems inside the containers.
When collecting standard output from a container, LoongCollector periodically saves its collection progress to a checkpoint file. If LoongCollector stops and restarts, it resumes collection from the last saved position.
Container text file log collection
- Because Kubernetes container file systems are isolated, a collector cannot directly access files in other containers. However, because container file systems reside on the host's file system, LoongCollector can mount the host's root file system. This allows it to access any file on the host, thereby indirectly accessing files in the application container's file system.
- By default, LoongCollector mounts the host's root file system to its own `/logtail_host` directory, so manual mounting is typically not required. For example, if a log file's path inside a container is `/log/app.log` and its mapped path on the host is `/var/lib/docker/containers/<container-id>/log/app.log`, the actual collection path used by LoongCollector is `/logtail_host/var/lib/docker/containers/<container-id>/log/app.log`.
Multi-line log parsing
LoongCollector uses a user-defined regular expression to match the beginning of a log line.
- Successful match: The line is treated as the start of a new log entry.
- Failed match: The line is appended to the current log entry.

When a subsequent line matches the start-of-line regular expression, the current log entry is completed and a new entry begins.
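For example, for Java logs whose entries start with a timestamp, a start-of-line pattern can be set in the file input. The `Multiline` field names below are a sketch based on common LoongCollector file-input conventions; verify them for your version:

```yaml
# Illustrative multi-line setting: a line beginning with a timestamp
# such as "2024-05-01 12:00:00" starts a new entry; stack-trace lines
# that do not match are appended to the current entry.
inputs:
  - Type: input_file
    FilePaths:
      - /var/log/app/*.log
    Multiline:
      Mode: custom
      StartPattern: \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.*
```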
Log processing when a container stops
| Runtime | Destruction latency risk | Log integrity | Optimization |
| --- | --- | --- | --- |
| Docker | When a container is stopped, LoongCollector immediately releases the container's file handle, allowing the container to exit normally. | If collection is delayed before the container stops, for example, due to network latency or high resource usage, some logs generated just before the stop may be lost. | Increase the log sending frequency (decrease the sending interval) so that logs are collected before the container stops. |
| Containerd | If collection is delayed, for example, due to network latency or high resource usage, the application container may not be destroyed promptly. | When a container is stopped, LoongCollector continues to hold the file handles for the container, keeping the log files open until all content has been sent. | Configure the collector so that buffered logs are sent promptly and file handles are released sooner. |
Container metadata retrieval
LoongCollector interacts directly with the container runtime to retrieve Kubernetes metadata, by using the Docker API for Docker or the standard CRI (Container Runtime Interface) API for Containerd. This enables non-intrusive, automatic tagging of Kubernetes metadata during collection, and the direct interaction allows for real-time data retrieval.

- Docker: LoongCollector uses the Docker Client to communicate with the Docker daemon to retrieve container metadata directly. This allows for in-depth monitoring and management. The main APIs used include:
  - ContainerList: Retrieves a list of running containers, providing an overview of active containers on the node.
  - ContainerInspect: Provides detailed information for each container, including configuration and status.
  - Events: Listens for container events in real time, allowing for dynamic tracking of the container lifecycle.

  When retrieving container metadata through the Docker Client, the following information is particularly important:
  - LogPath: The path on the host where the container's standard output log file is stored. This is essential for log collection and analysis.
  - GraphDriver.Data: The path of the container's rootfs on the host. This path is critical for understanding the container's file system storage and helps with troubleshooting and performance optimization.
- Containerd: By using the CRI, LoongCollector supports `containerd` and CRI-O environments. This allows it to collect metadata from various underlying runtimes, such as `runc` or Kata Containers, ensuring consistent data collection. The container metadata provided by the CRI includes the path of the container's standard output log file on the host but does not directly provide the container's rootfs path. LoongCollector uses the following methods to find this path:
  - File path search: LoongCollector searches the host's file system for the container's rootfs path by using a unique identifier, such as the container ID.
  - Direct interaction with containerd: To get more comprehensive information, LoongCollector can bypass the CRI and communicate directly with containerd at a lower level. This allows it to obtain the container's rootfs path and other important metadata that is not available through the CRI.
Best practices
Unified query across environments
To query and analyze logs from different environments, such as testing and production, in a unified manner, you can use one of the following three methods:
- Store data from all environments in the same LogStore. We recommend adding tags to distinguish between environments by following the steps in Collect container logs from a cluster by using the console (standard output/file). You can then query and analyze all logs directly within that LogStore.
- Collect data into different LogStores, or even into Projects in different regions, and create a StoreView to query across multiple LogStores when you need a unified query. This method does not incur additional storage costs, but it is read-only and does not support alerting. You can use a `tag` field to identify the source LogStore of each log entry.
- (Recommended) Collect data into different LogStores or Projects. When you need to perform a unified analysis, use data transformation to copy and forward selected data to a centralized LogStore. This method allows you to parse and process data before storage and supports alerting, but it is a paid feature.
Collecting logs from multiple sources
A single collection configuration can target only one source. To collect logs from multiple sources, you must create a separate collection configuration for each.
Granular collection and multi-tenant isolation
In a multi-tenant environment, you can use different collection configurations to send data to separate Projects for isolation. Data in different Projects is isolated and cannot be accessed across Projects. You can configure different access permissions for each Project to meet security and isolation requirements.
Automated operations and CI/CD integration
By using the CRD method, you can incorporate collection configurations into your GitOps or Infrastructure as Code (IaC) workflows. This enables batch, automated, and traceable management of log collection.