In a Kubernetes environment, the Sidecar mode is an ideal log collection solution for fine-grained management of application logs, multi-tenant data isolation, or ensuring log collection is strictly bound to the application lifecycle. This mode works by injecting a separate LoongCollector (Logtail) container into your application pod. This setup enables dedicated log collection for that pod and offers powerful flexibility and isolation.
How it works
In Sidecar mode, an application container and a LoongCollector (Logtail) log collection container run side-by-side within your application pod. They work together using shared volumes and lifecycle synchronization mechanisms.
Log sharing: The application container writes its log files to a shared volume, typically an emptyDir. The LoongCollector (Logtail) container mounts the same shared volume, which allows it to read and collect these log files in real time.
Configuration association: Each LoongCollector (Logtail) Sidecar container declares its identity by setting a unique custom identifier. In the Simple Log Service (SLS) console, you must create a machine group that uses the same identifier. All Sidecar instances with that identifier then automatically apply the collection configurations of that machine group.
Lifecycle synchronization: To prevent log loss when a pod terminates, the application container and the LoongCollector (Logtail) container communicate using signal files (cornerstone and tombstone) in a shared volume. This mechanism works in conjunction with the pod's graceful termination period (terminationGracePeriodSeconds) to ensure a graceful shutdown: the application container stops writing first, LoongCollector finishes sending all remaining logs, and then both containers exit together.
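The cornerstone/tombstone handshake can be simulated locally with two shell processes that share a directory. This is a minimal sketch for illustration only; the signal file names match the ones used later in this topic, everything else is hypothetical:

```shell
#!/bin/bash
# Simulate the Sidecar handshake: two processes coordinate through signal files.
tasksite=$(mktemp -d)

# "Sidecar" process: announce readiness, then wait for the app to finish.
(
  touch "$tasksite/cornerstone"                      # collector is ready
  until [[ -f "$tasksite/tombstone" ]]; do sleep 0.1; done
  echo "collector: app finished, draining remaining logs before exit"
) &

# "Application" process: wait for readiness, do work, announce completion.
until [[ -f "$tasksite/cornerstone" ]]; do sleep 0.1; done
echo "app: collector ready, running business logic"
touch "$tasksite/tombstone"                          # app is done
wait                                                 # both sides exit together
```

The same ordering guarantees apply in the pod: the application never starts before the collector is ready, and the collector never stops before the application has finished writing.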
Preparations
Before you collect logs, you must plan and create a project and a logstore to manage and store the logs. If you already have these resources, skip this step and proceed to Step 1: Inject the LoongCollector Sidecar container.
Project: A resource management unit in SLS that is used to isolate and manage logs for different projects or services.
Logstore: A log storage unit that is used to store logs.
Create a project
Create a logstore
Step 1: Inject the LoongCollector Sidecar container
Inject a LoongCollector Sidecar container into the application pod and configure shared volumes to enable log collection. If you have not deployed the application or are just testing, use the Appendix: YAML example to quickly validate the process.
1. Modify the application pod YAML configuration
Define shared volumes
In spec.template.spec.volumes, add three shared volumes at the same level as containers:

volumes:
  # Shared log directory (written by the application container, read by the Sidecar)
  - name: ${shared_volume_name}   # <-- The name must match the name in volumeMounts
    emptyDir: {}
  # Signal directory for inter-container communication (for graceful shutdown)
  - name: tasksite
    emptyDir:
      medium: Memory      # Use memory as the medium for better performance
      sizeLimit: "50Mi"
  # Shared host timezone configuration: Synchronizes the timezone for all containers in the pod
  - name: tz-config       # <-- The name must match the name in volumeMounts
    hostPath:
      path: /usr/share/zoneinfo/Asia/Shanghai   # Modify the timezone as needed

Configure application container mounts
In the volumeMounts section of your application container, such as your-business-app-container, add the following mount items. Ensure that the application container writes logs to the ${shared_volume_path} directory to enable log collection by LoongCollector.

volumeMounts:
  # Mount the shared log volume to the application log output directory
  - name: ${shared_volume_name}
    mountPath: ${shared_volume_path}   # Example: /var/log/app
  # Mount the communication directory
  - name: tasksite
    mountPath: /tasksite               # Shared directory for communication with the LoongCollector container
  # Mount the timezone file
  - name: tz-config
    mountPath: /etc/localtime
    readOnly: true

Inject the LoongCollector Sidecar container
In the spec.template.spec.containers array, append the following Sidecar container definition:

- name: loongcollector
  image: aliyun-observability-release-registry.cn-shenzhen.cr.aliyuncs.com/loongcollector/loongcollector:v3.1.1.0-20fa5eb-aliyun
  command: ["/bin/bash", "-c"]
  args:
    - |
      echo "[$(date)] LoongCollector: Starting initialization"
      # Start the LoongCollector service
      /etc/init.d/loongcollectord start
      # Wait for the configuration to download and the service to be ready
      sleep 15
      # Verify the service status
      if /etc/init.d/loongcollectord status; then
        echo "[$(date)] LoongCollector: Service started successfully"
        touch /tasksite/cornerstone
      else
        echo "[$(date)] LoongCollector: Failed to start service"
        exit 1
      fi
      # Wait for the application container to complete (via the tombstone file signal)
      echo "[$(date)] LoongCollector: Waiting for business container to complete"
      until [[ -f /tasksite/tombstone ]]; do
        sleep 2
      done
      # Allow time to upload remaining logs
      echo "[$(date)] LoongCollector: Business completed, waiting for log transmission"
      sleep 30
      # Stop the service
      echo "[$(date)] LoongCollector: Stopping service"
      /etc/init.d/loongcollectord stop
      echo "[$(date)] LoongCollector: Shutdown complete"
  # Health check
  livenessProbe:
    exec:
      command: ["/etc/init.d/loongcollectord", "status"]
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
  # Resource configuration
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "2000m"
      memory: "2048Mi"
  # Environment variable configuration
  env:
    - name: ALIYUN_LOGTAIL_USER_ID
      value: "${your_aliyun_user_id}"
    - name: ALIYUN_LOGTAIL_USER_DEFINED_ID
      value: "${your_machine_group_user_defined_id}"
    - name: ALIYUN_LOGTAIL_CONFIG
      value: "/etc/ilogtail/conf/${your_region_config}/ilogtail_config.json"
    # Enable full drain mode to ensure all logs are sent before the pod terminates
    - name: enable_full_drain_mode
      value: "true"
    # Append pod environment information as log tags
    - name: ALIYUN_LOG_ENV_TAGS
      value: "_pod_name_|_pod_ip_|_namespace_|_node_name_|_node_ip_"
    # Automatically inject pod and node metadata as log tags
    - name: "_pod_name_"
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: "_pod_ip_"
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: "_namespace_"
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: "_node_name_"
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: "_node_ip_"
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  # Volume mounts (shared with the application container)
  volumeMounts:
    # Read-only mount for the application log directory
    - name: ${shared_volume_name}            # <-- Shared log directory name
      mountPath: ${dir_containing_your_files} # <-- Path to the shared directory in the sidecar
      readOnly: true
    # Mount the communication directory
    - name: tasksite
      mountPath: /tasksite
    # Mount the timezone
    - name: tz-config
      mountPath: /etc/localtime
      readOnly: true
2. Modify the application container's lifecycle logic
Depending on the workload type, you must modify the application container to support a coordinated exit with the Sidecar.
Short-lived tasks (Job/CronJob)
# 1. Wait for LoongCollector to be ready
echo "[$(date)] Business: Waiting for LoongCollector to be ready..."
until [[ -f /tasksite/cornerstone ]]; do
  sleep 1
done
echo "[$(date)] Business: LoongCollector is ready, starting business logic"
# 2. Execute core business logic (ensure logs are written to the shared directory)
echo "Hello, World!" >> /app/logs/business.log
# 3. Save the exit code
retcode=$?
echo "[$(date)] Business: Task completed with exit code: $retcode"
# 4. Notify LoongCollector that the business task is complete
touch /tasksite/tombstone
echo "[$(date)] Business: Tombstone created, exiting"
exit $retcode

Long-lived services (Deployment/StatefulSet)
# Define the signal handler function
_term_handler() {
  echo "[$(date)] [nginx-demo] Caught SIGTERM, starting graceful shutdown..."
  # Send a QUIT signal to Nginx for a graceful stop
  if [ -n "$NGINX_PID" ]; then
    kill -QUIT "$NGINX_PID" 2>/dev/null || true
    echo "[$(date)] [nginx-demo] Sent SIGQUIT to Nginx PID: $NGINX_PID"
    # Wait for Nginx to stop gracefully
    wait "$NGINX_PID"
    EXIT_CODE=$?
    echo "[$(date)] [nginx-demo] Nginx stopped with exit code: $EXIT_CODE"
  fi
  # Notify LoongCollector that the application container has stopped
  echo "[$(date)] [nginx-demo] Writing tombstone file"
  touch /tasksite/tombstone
  exit $EXIT_CODE
}
# Register the signal handler
trap _term_handler SIGTERM SIGINT SIGQUIT
# Wait for LoongCollector to be ready
echo "[$(date)] [nginx-demo] Waiting for LoongCollector to be ready..."
until [[ -f /tasksite/cornerstone ]]; do
  sleep 1
done
echo "[$(date)] [nginx-demo] LoongCollector is ready, starting business logic"
# Start Nginx
echo "[$(date)] [nginx-demo] Starting Nginx..."
nginx -g 'daemon off;' &
NGINX_PID=$!
echo "[$(date)] [nginx-demo] Nginx started with PID: $NGINX_PID"
# Wait for the Nginx process
wait $NGINX_PID
EXIT_CODE=$?
# Also notify LoongCollector if the exit was not caused by a signal
if [ ! -f /tasksite/tombstone ]; then
  echo "[$(date)] [nginx-demo] Unexpected exit, writing tombstone"
  touch /tasksite/tombstone
fi
exit $EXIT_CODE

3. Set the graceful termination period
In spec.template.spec, set a sufficient termination grace period to ensure LoongCollector has enough time to upload the remaining logs.
spec:
  # ... Your other existing spec configurations ...
  template:
    spec:
      terminationGracePeriodSeconds: 600   # 10-minute graceful shutdown period

4. Variable descriptions
Variable | Description
${your_aliyun_user_id} | Set this to the ID of your Alibaba Cloud account. For more information, see Configure user identifiers.
${your_machine_group_user_defined_id} | Set a custom ID for the machine group. This ID is used to create a custom machine group. Important: Ensure that this ID is unique within the region of your project.
${your_region_config} | Specify the configuration based on the region of your SLS project and the network type used for access. For information about regions, see Service regions. Example: if your project is in the China (Hangzhou) region, use cn-hangzhou for internal network access or cn-hangzhou-internet for public network access.
${shared_volume_name} | Set a custom name for the volume. Important: the name must match the volume name referenced in the volumeMounts sections of both containers.
${shared_volume_path} | Set the mount path. This is the directory in the container where the text logs to be collected are located.
5. Apply the configuration and verify the result
Run the following command to deploy the changes:
kubectl apply -f <YOUR-YAML>

Check the pod status to confirm that the LoongCollector container was injected successfully:

kubectl describe pod <YOUR-POD-NAME>

If both containers (the application container and loongcollector) are in the Running state, the injection is successful.
Step 2: Create a machine group with a custom ID
This step registers the LoongCollector Sidecar instances with SLS. This lets you centrally manage and deliver collection configurations.
Procedure
Create a machine group
In the target project, choose Resources > Machine Groups in the left-side navigation pane. On the Machine Groups page, click Create Machine Group.
Configure the machine group
Configure the following parameters and click OK:
Name: The name of the machine group. This cannot be changed after creation. The naming conventions are as follows:
Can contain only lowercase letters, digits, hyphens (-), and underscores (_).
Must start and end with a lowercase letter or a digit.
Must be 2 to 128 characters in length.
Machine Group Identifier: Select Custom Identifier.
Custom Identifier: Enter the value of the ALIYUN_LOGTAIL_USER_DEFINED_ID environment variable that you set for the LoongCollector container in the YAML file in Step 1. The value must be an exact match; otherwise, the association fails.
Check the machine group heartbeat status
After the machine group is created, click its name and check the heartbeat status in the machine group status area.
OK: Indicates that LoongCollector has successfully connected to SLS and the machine group is registered.
FAIL:
The configuration may not have taken effect. It takes about 2 minutes for the configuration to become effective. Refresh the page and try again later.
If the status is still FAIL after 2 minutes, see Troubleshoot Logtail machine group issues to diagnose the problem.
Each pod corresponds to a separate LoongCollector instance. We recommend that you use different custom IDs for different services or environments to facilitate fine-grained management.
Step 3: Create a collection configuration
Define which log files LoongCollector collects, how it parses the log structure, and how it filters content. Then, bind the configuration to the machine group.
Procedure
On the Logstores page, click the > icon before the name of the target logstore to expand it. Click the + icon next to Data Collection. In the Quick Data Import dialog box, find the Kubernetes - File card and click Integrate Now.

Configure the machine group, and then click Next:
Scenario: Select Kubernetes Clusters.
Deployment Method: Select Sidecar.
Select Machine Group: In the Source Machine Group list, select the machine group with the custom ID that you created in Step 2, and click > to add it to the Applied Machine Groups list.
On the Logtail Configuration page, configure the Logtail collection rule as follows.
1. Global and input configurations
Define the name of the collection configuration, the log source, and the collection scope.
Global Configurations:
Configuration Name: A custom name for the collection configuration. This name must be unique within the project and cannot be changed after it is created. Naming conventions:
Can contain only lowercase letters, digits, hyphens (-), and underscores (_).
Must start and end with a lowercase letter or a digit.
Input Configurations:
Type: Text Log Collection.
Logtail Deployment Mode: Select Sidecar.
File Path Type:
Path in Container: Collect log files from within the container.
Host Path: Collect logs from local services on the host.
File Path: The path from which to collect logs.
Linux: The path must start with a forward slash (/). For example, /data/mylogs/**/*.log specifies all files with the .log extension in the /data/mylogs directory.
Windows: The path must start with a drive letter. For example, C:\Program Files\Intel\**\*.Log.
Maximum Directory Monitoring Depth: The maximum directory depth that the wildcard character ** in the File Path can match. The default is 0, which means only the current directory is monitored.
2. Log processing and structuring
Configure log processing rules to transform raw, unstructured logs into structured, searchable data. This improves the efficiency of log queries and analysis. We recommend that you first add a log sample:
In the Processor Configurations section of the Logtail Configuration page, click Add Sample Log and enter the log content to be collected. The system identifies the log format based on the sample and helps generate regular expressions and parsing rules, which simplifies the configuration.
Use case 1: Process multiline logs (such as Java stack logs)
Because logs such as Java exception stacks and JSON objects often span multiple lines, the default collection mode splits them into multiple incomplete records, which causes a loss of context. To prevent this, enable multiline mode and configure a Regex to Match First Line to merge consecutive lines of the same log into a single, complete log.
Example: in the default collection mode, each line of a raw log is treated as a separate log entry, which breaks a stack trace into incomplete records and loses context. With multiline mode enabled, a Regex to Match First Line identifies the start of each record, so the complete log is preserved with its full semantic structure.
Procedure: In the Processor Configurations section of the Logtail Configuration page, enable Multi-line Mode:
For Type, select Custom or Multi-line JSON.
Custom: For raw logs with a variable format, configure a Regex to Match First Line to identify the starting line of each log.
Regex to Match First Line: Automatically generate or manually enter a regular expression that matches the first line of a complete log. For example, the regular expression for the preceding example is \[\d+-\d+-\w+:\d+:\d+,\d+]\s\[\w+]\s.*.
Automatic generation: Click Generate. Then, in the Log Sample text box, select the log content that you want to extract and click Automatically Generate.
Manual entry: Click Manually Enter Regular Expression. After you enter the expression, click Validate.
Multi-line JSON: SLS automatically handles line breaks within a single raw log if the log is in standard JSON format.
Processing Method If Splitting Fails:
Discard: Discards a text segment if it does not match the start-of-line rule.
Retain Single Line: Retains unmatched text on separate lines.
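Before saving the configuration, you can sanity-check a first-line regex locally: each match should mark the start of exactly one logical record. This sketch uses grep -cE with a simplified, POSIX-compatible variant of the pattern; the sample log and the pattern are illustrative assumptions, not the exact regex from the console:

```shell
# Two logical records; the second spans three physical lines (a stack trace).
cat > /tmp/sample.log <<'EOF'
[2024-01-01 10:00:00,000] [INFO] service started
[2024-01-01 10:00:01,000] [ERROR] request failed
java.lang.RuntimeException: boom
    at com.example.Handler.run(Handler.java:42)
EOF

# Count lines that match the first-line pattern: expect 2 (records), not 4 (lines).
grep -cE '^\[[0-9-]+ [0-9:,]+\] \[[A-Z]+\]' /tmp/sample.log
```

If the count equals the number of physical lines rather than the number of records, the regex is too loose and continuation lines will be split off.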
Use case 2: Structured logs
When raw logs are unstructured or semi-structured text, such as NGINX access logs or application output logs, direct querying and analysis are often inefficient. SLS provides various data parsing plugins that can automatically convert raw logs of different formats into structured data. This provides a solid data foundation for subsequent analysis, monitoring, and alerting.
Example: a raw NGINX access log line is a single unstructured string. After parsing, it becomes discrete, queryable fields such as remote_addr, request, status, and body_bytes_sent.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page:
Add a parsing plugin: Click Add Processor and configure regular expression parsing, separator parsing, and JSON parsing plugins according to the log format. For example, to collect NGINX logs, select Data Parsing (NGINX Mode).
NGINX Log Configuration: Copy the entire log_format definition from your NGINX server's configuration file (nginx.conf) and paste it into this text box. Example:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$request_time $request_length '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent"';

Important: The format definition here must be exactly the same as the format that generates the logs on the server. Otherwise, log parsing will fail.
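To see what structuring does, here is a rough local approximation of the field mapping using awk and a whitespace split. The sample line follows the log_format above; this is only an illustration, not the actual NGINX parsing plugin, which handles quoted and bracketed fields properly:

```shell
line='192.168.1.1 - - [01/Jan/2024:10:00:00 +0800] "GET /index.html HTTP/1.1" 0.003 512 200 612 "-" "curl/8.0"'

# With the log_format above, a naive whitespace split puts status in $11
# and body_bytes_sent in $12.
echo "$line" | awk '{print "remote_addr=" $1, "status=" $11, "body_bytes_sent=" $12}'
```

The parsing plugin produces the same kind of field mapping, but keyed by the variable names in log_format rather than by positional splitting.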
General configuration parameter descriptions: The following parameters appear in multiple data parsing plugins, and their functions and usage are consistent.
Original Field: Specifies the source field to be parsed. The default is content, which is the entire collected log entry.
Retain Original Field if Parsing Fails: We recommend that you enable this option. If a log cannot be parsed by the plugin (for example, due to a format mismatch), this option ensures that the raw log content is not lost and is fully retained in the specified raw field.
Retain Original Field if Parsing Succeeds: If selected, the raw log content is retained even if the log is parsed successfully.
3. Log filtering
During log collection, indiscriminately collecting a large volume of low-value or irrelevant logs (such as DEBUG/INFO level logs) not only wastes storage resources and increases costs but also affects query efficiency and poses data breach risks. Implement fine-grained filtering policies for efficient and secure log collection.
Reduce costs by filtering content
Filter fields based on log content, such as collecting only logs where the level is WARNING or ERROR.
Example: from a log stream that contains DEBUG, INFO, WARNING, and ERROR entries, collect only the entries whose level is WARNING or ERROR.
Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Data Filtering:
Field Name: The log field to use for filtering.
Field Value: The regular expression used for filtering. Only full matches are supported, not partial keyword matches.
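The full-match behavior can be reproduced locally with grep -x, which also matches whole lines only (a local illustration, not the plugin itself; the level values are example data):

```shell
printf '%s\n' DEBUG INFO WARNING ERROR > /tmp/levels.txt

# Full matches only: the pattern WARNING|ERROR keeps exactly those two levels.
grep -xE 'WARNING|ERROR' /tmp/levels.txt

# A partial keyword such as WARN matches nothing in full-match mode.
grep -xE 'WARN' /tmp/levels.txt || echo "no match: WARN is only a partial keyword"
```

To match entries that merely contain a keyword, the regular expression must cover the whole value, for example .*WARN.*.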
Control the collection scope with a blacklist
Use a blacklist to exclude specified directories or files, which prevents irrelevant or sensitive logs from being uploaded.
Procedure: In the Input Configurations section of the Logtail Configuration page, enable Collection Blacklist and click Add.
Full and wildcard matching are supported for directories and filenames. The only supported wildcard characters are the asterisk (*) and the question mark (?).
File Path Blacklist: Specifies the file paths to exclude. Examples:
/home/admin/private*.log: Ignores all files in the /home/admin/ directory whose names start with private and end with .log.
/home/admin/private*/*_inner.log: Ignores files that end with _inner.log within directories that start with private under the /home/admin/ directory.
File Blacklist: A list of filenames to ignore during collection. Example:
app_inner.log: Ignores all files named app_inner.log during collection.
Directory Blacklist: Directory paths cannot end with a forward slash (/). Examples:
/home/admin/dir1/: The directory blacklist does not take effect because the path ends with a forward slash.
/home/admin/dir*: Ignores files in all subdirectories whose names start with dir under the /home/admin/ directory.
/home/admin/*/dir: Ignores all files in subdirectories named dir at the second level of the /home/admin/ directory. For example, files in /home/admin/a/dir are ignored, but files in /home/admin/a/b/dir are collected.
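Wildcard semantics of this kind can be tried out with a bash case pattern, where * matches any run of characters and ? matches exactly one. This is a local illustration of the pattern style, not SLS code, and shell glob details (such as * matching across /) may differ from the blacklist's exact behavior:

```shell
matches() {  # exit 0 if path $1 matches wildcard pattern $2
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

pattern='/home/admin/private*.log'
matches '/home/admin/private_debug.log' "$pattern" && echo "excluded by blacklist"
matches '/home/admin/app.log' "$pattern" || echo "collected normally"
```

When in doubt, test a blacklist entry on a small set of representative paths before applying it to production collection.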
Container filtering
Set collection conditions based on container metadata (such as environment variables, pod labels, namespaces, and container names) to precisely control which containers' logs are collected.
Configuration steps: On the Logtail Configuration page, in the Processor Configurations area, enable Container Filtering and click Add.
Multiple conditions have an "AND" relationship. All regular expression matching is based on Go's RE2 regular expression engine, which has some limitations compared to engines such as PCRE. Follow the guidelines in Appendix: Regular expression limits (Container filtering) when you write regular expressions.
Environment Variable Blacklist/Whitelist: Specify conditions for the environment variables of the containers to be collected.
K8s Pod Label Blacklist/Whitelist: Specify conditions for the labels of the pods where the containers to be collected are located.
K8s Pod Name Regex Match: Specify the containers to be collected by pod name.
K8s Namespace Regex Match: Specify the containers to be collected by namespace name.
K8s Container Name Regex Match: Specify the containers to be collected by container name.
Container Label Blacklist/Whitelist: Collect containers whose container labels meet the conditions. Used for Docker use cases. Not recommended for Kubernetes use cases.
4. Log classification
In use cases where multiple applications or instances share the same log format, it is difficult to distinguish the log source. This leads to a lack of context during queries and reduces analysis efficiency. To solve this, configure topics and log tags to achieve automated context association and logical classification.
Configure topics
When the logs of multiple applications or instances have the same format but different paths (such as /apps/app-A/run.log and /apps/app-B/run.log), it is difficult to distinguish the source of the collected logs. Generate topics based on machine groups, custom names, or file path extraction to flexibly distinguish logs from different services or paths.
Procedure: Select a method for generating topics. The following three types are supported:
Machine Group Topic: When a collection configuration is applied to multiple machine groups, LoongCollector automatically uses the name of the server's machine group as the __topic__ field for upload. This is suitable for use cases where logs are divided by host.
Custom: Uses the format customized://<custom_topic_name>, such as customized://app-login. This format is suitable for static topic use cases with fixed business identifiers.
File Path Extraction: Extract key information from the full path of the log file to dynamically mark the log source. This is suitable for situations where multiple users or applications share the same log filename but have different paths. For example, when multiple users or services write logs to different top-level directories but the sub-paths and filenames are identical, the source cannot be distinguished by filename alone:

/data/logs
├── userA
│   └── serviceA
│       └── service.log
├── userB
│   └── serviceA
│       └── service.log
└── userC
    └── serviceA
        └── service.log

Configure File Path Extraction and use a regular expression to extract key information from the full path. The matched result is then uploaded to the logstore as the topic.
File path extraction rule: Based on regular expression capturing groups
When you configure a regular expression, the system automatically determines the output field format based on the number and naming of capturing groups. The rules are as follows:
In the regular expression for a file path, you must escape the forward slash (/).
Capturing group type | Use case | Generated field | Regex example | Matching path example | Generated field example
Single capturing group (only one (.*?)) | Only one dimension is needed to distinguish the source (such as username or environment) | Generates the __topic__ field | \/logs\/(.*?)\/app\.log | /logs/userA/app.log | __topic__: userA
Multiple unnamed capturing groups (multiple (.*?)) | Multiple dimensions are needed to distinguish the source, but no semantic tags are required | Generates tag fields __tag__:__topic_{i}__, where {i} is the ordinal number of the capturing group | \/logs\/(.*?)\/(.*?)\/app\.log | /logs/userA/svcA/app.log | __tag__:__topic_1__: userA; __tag__:__topic_2__: svcA
Multiple named capturing groups (using (?P<name>.*?)) | Multiple dimensions are needed to distinguish the source, and the field meanings should be clear for easy querying and analysis | Generates tag fields __tag__:{name} | \/logs\/(?P<user>.*?)\/(?P<service>.*?)\/app\.log | /logs/userA/svcA/app.log | __tag__:user: userA; __tag__:service: svcA
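A path-extraction regex can be spot-checked locally with any compatible regex tool before you save it. This sed sketch mirrors the two-group example from the table; note that sed -E uses numbered groups and needs no escaped forward slashes, so the pattern differs slightly from the console form:

```shell
path='/logs/userA/svcA/app.log'

# Extract the two path segments that would become the user and service tags.
echo "$path" | sed -E 's|^/logs/([^/]+)/([^/]+)/app\.log$|user=\1 service=\2|'
```

If the substitution leaves the path unchanged, the pattern does not match the full path and the topic would not be generated.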
Log tagging
Enable the log tag enrichment feature to extract key information from container environment variables or Kubernetes pod labels and append it as tags. This allows for fine-grained grouping of logs.
Configuration steps: In the Input Configurations section of the Logtail Configuration page, enable Log Tag Enrichment and click Add.
Environment Variables: Configure the environment variable name and tag name. The environment variable value will be stored in the tag name.
Environment Variable Name: Specify the name of the environment variable to extract.
Tag Name: The name of the environment variable tag.
Pod Labels: Configure the pod label name and tag name. The pod label value will be stored in the tag name.
Pod Label Name: The name of the Kubernetes pod label to extract.
Tag Name: The name of the tag.
Step 4: Configure query and analysis settings
After you configure log processing and plugins, click Next to go to the Query and Analysis Configurations page:
Full-text index is enabled by default, which supports keyword searches on raw log content.
For precise queries by field, wait for the Preview Data to load, and then click Automatic Index Generation. SLS generates a field index based on the first entry in the preview data.
After the configuration is complete, click Next to finish setting up the entire collection process.
Step 5: View uploaded logs
After you create a collection configuration and apply it to a machine group, the system automatically delivers the configuration and starts collecting incremental logs.
Confirm that new content is added to the log file: LoongCollector only collects incremental logs. Run tail -f /path/to/your/log/file and trigger a business operation to ensure that new logs are being written.
Query logs: Go to the query and analysis page of the target logstore and click Search & Analyze (the default time range is the last 15 minutes) to check whether new logs are flowing in. The default fields for container text logs are as follows:
Field | Description
__tag__:__hostname__ | The name of the container's host.
__tag__:__path__ | The path of the log file in the container.
__tag__:_container_ip_ | The IP address of the container.
__tag__:_image_name_ | The name of the image used by the container. Note: if multiple images have the same hash but different names or tags, the collection configuration selects one of the names based on the hash. The selected name is not guaranteed to match the one defined in the YAML file.
__tag__:_pod_name_ | The name of the pod.
__tag__:_namespace_ | The namespace to which the pod belongs.
__tag__:_pod_uid_ | The unique identifier (UID) of the pod.
Key configuration notes for log collection integrity
Ensuring the integrity of log collection is a core goal of a LoongCollector Sidecar deployment. The following configuration parameters directly affect the integrity and reliability of log data.
LoongCollector resource configuration
In high-data-volume use cases, a reasonable resource configuration is fundamental to ensuring collection performance on the client side. The key configuration parameters are as follows:
# Configure CPU and memory resources based on the log generation rate
resources:
  limits:
    cpu: "2000m"
    memory: "2Gi"
# Parameters that affect collection performance
env:
  - name: cpu_usage_limit
    value: "2"
  - name: mem_usage_limit
    value: "2048"
  - name: max_bytes_per_sec
    value: "209715200"
  - name: process_thread_count
    value: "8"
  - name: send_request_concurrency
    value: "20"
For more information about the relationship between specific data volumes and corresponding configurations, see Logtail network types, startup parameters, and configuration files.
Server-side quota configuration
Server-side quota limits or network anomalies can obstruct data sending on the client side, which creates backpressure on the file collection side and affects log integrity. We recommend that you use CloudLens for SLS to monitor project resource quotas.
Initial collection configuration optimization
The initial file collection policy at pod startup directly affects data integrity, especially in high-speed data writing use cases.
By configuring the initial collection size, specify the starting position for the first collection from a new file. The default initial collection size is 1,024 KB.
During the first collection, if the file is smaller than 1,024 KB, collection starts from the beginning of the file content.
During the first collection, if the file is larger than 1,024 KB, collection starts from the position 1,024 KB from the end of the file.
The initial collection size can range from 0 to 10,485,760 KB.
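The starting position described above can be sketched as a small calculation. This is illustrative only; sizes are in bytes, and initial_kb stands for the configured initial collection size:

```shell
initial_kb=1024   # configured initial collection size, in KB (default 1,024)

start_offset() {  # print the byte offset where the first collection starts
  local size_bytes=$1
  local limit=$(( initial_kb * 1024 ))
  if [ "$size_bytes" -le "$limit" ]; then
    echo 0                           # small file: read from the beginning
  else
    echo $(( size_bytes - limit ))   # large file: start 1,024 KB from the end
  fi
}

start_offset 500000    # 500 KB file: prints 0
start_offset 2097152   # 2 MiB file: prints 1048576
```

In high-throughput scenarios, increase the initial collection size so that less of a fast-growing file is skipped on first collection.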
enable_full_drain_mode
This is a key parameter for ensuring data integrity. It guarantees that LoongCollector completes all data collection and sending when it receives a SIGTERM signal.
# Parameter that affects collection integrity
env:
  - name: enable_full_drain_mode
    value: "true"   # Enable full drain mode
What to do next
Data visualization: Use visualization dashboards to monitor key metric trends.
Automated alerting for data anomalies: Set up alert policies to detect system anomalies in real time.
SLS only collects incremental logs. To collect historical logs, see Import historical log files.
Appendix: YAML example
This example shows a complete Kubernetes Deployment configuration that includes an application container (Nginx) and a LoongCollector Sidecar container. It is suitable for collecting container logs using the Sidecar mode.
Before you use it, make the following three key replacements:
Replace ${your_aliyun_user_id} with the UID of your Alibaba Cloud account.
Replace ${your_machine_group_user_defined_id} with the custom ID of the machine group that you created in Step 2. The value must be an exact match.
Replace ${your_region_config} with the configuration name that matches the region and network type of your SLS project. Example: if your project is in China (Hangzhou) and you use internal network access, set the value to cn-hangzhou. If you use public network access, set the value to cn-hangzhou-internet.
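The naming convention for ${your_region_config} can be expressed as a small helper (a sketch based on the examples above; region IDs other than cn-hangzhou are assumed to follow the same pattern):

```shell
region_config() {  # print the ${your_region_config} value for a region and network type
  local region=$1 network=$2
  if [ "$network" = "internet" ]; then
    echo "${region}-internet"   # public network access
  else
    echo "$region"              # internal network access
  fi
}

region_config cn-hangzhou intranet   # prints cn-hangzhou
region_config cn-hangzhou internet   # prints cn-hangzhou-internet
```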
Short-lived tasks (Job/CronJob)
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 3600
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      terminationGracePeriodSeconds: 300
      containers:
        # Application container
        - name: demo-job
          image: debian:bookworm-slim
          command: ["/bin/bash", "-c"]
          args:
            - |
              # Wait for LoongCollector to be ready
              echo "[$(date)] Business: Waiting for LoongCollector to be ready..."
              until [[ -f /tasksite/cornerstone ]]; do
                sleep 1
              done
              echo "[$(date)] Business: LoongCollector is ready, starting business logic"
              # Execute business logic
              echo "Hello, World!" >> /app/logs/business.log
              # Save the exit code
              retcode=$?
              echo "[$(date)] Business: Task completed with exit code: $retcode"
              # Notify LoongCollector that the business task is complete
              touch /tasksite/tombstone
              echo "[$(date)] Business: Tombstone created, exiting"
              exit $retcode
          # Resource limits
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Volume mounts
          volumeMounts:
            - name: app-logs
              mountPath: /app/logs
            - name: tasksite
              mountPath: /tasksite
        # LoongCollector Sidecar container
        - name: loongcollector
          image: aliyun-observability-release-registry.cn-hongkong.cr.aliyuncs.com/loongcollector/loongcollector:v3.1.1.0-20fa5eb-aliyun
          command: ["/bin/bash", "-c"]
          args:
            - |
              echo "[$(date)] LoongCollector: Starting initialization"
              # Start the LoongCollector service
              /etc/init.d/loongcollectord start
              # Wait for the configuration to download and the service to be ready
              sleep 15
              # Verify the service status
              if /etc/init.d/loongcollectord status; then
                echo "[$(date)] LoongCollector: Service started successfully"
                touch /tasksite/cornerstone
              else
                echo "[$(date)] LoongCollector: Failed to start service"
                exit 1
              fi
              # Wait for the application container to complete
              echo "[$(date)] LoongCollector: Waiting for business container to complete"
              until [[ -f /tasksite/tombstone ]]; do
                sleep 2
              done
              echo "[$(date)] LoongCollector: Business completed, waiting for log transmission"
              # Allow enough time to transmit remaining logs
              sleep 30
              echo "[$(date)] LoongCollector: Stopping service"
              /etc/init.d/loongcollectord stop
              echo "[$(date)] LoongCollector: Shutdown complete"
          # Health check
          livenessProbe:
            exec:
              command: ["/etc/init.d/loongcollectord", "status"]
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          # Resource configuration
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Environment variable configuration
          env:
            - name: ALIYUN_LOGTAIL_USER_ID
              value: "${your_aliyun_user_id}"
            - name: ALIYUN_LOGTAIL_USER_DEFINED_ID
              value: "${your_machine_group_user_defined_id}"
            - name: ALIYUN_LOGTAIL_CONFIG
              value: "/etc/ilogtail/conf/${your_region_config}/ilogtail_config.json"
            - name: ALIYUN_LOG_ENV_TAGS
              value: "_pod_name_|_pod_ip_|_namespace_|_node_name_"
            # Pod information injection
            - name: "_pod_name_"
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: "_pod_ip_"
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: "_namespace_"
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: "_node_name_"
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          # Volume mounts
          volumeMounts:
            - name: app-logs
              mountPath: /app/logs
              readOnly: true
            - name: tasksite
              mountPath: /tasksite
            - name: tz-config
              mountPath: /etc/localtime
              readOnly: true
      # Volume definitions
      volumes:
        - name: app-logs
          emptyDir: {}
        - name: tasksite
          emptyDir:
            medium: Memory
            sizeLimit: "10Mi"
        - name: tz-config
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
```
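The cornerstone/tombstone handshake that both containers perform above can be sketched language-agnostically. The following Python sketch mirrors the shell loops in the YAML; the signal-file names match the example, but the helper functions are hypothetical:

```python
import time
from pathlib import Path

def wait_for(path: Path, poll_seconds: float = 1.0, timeout: float = 300.0) -> bool:
    """Block until a signal file appears, mirroring `until [[ -f ... ]]; do sleep ...; done`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_seconds)
    return False

def app_container(tasksite: Path = Path("/tasksite")) -> None:
    """What the application container does around its business logic."""
    # 1. Wait for the collector's readiness signal before producing logs.
    wait_for(tasksite / "cornerstone")
    # ... business logic runs here and writes logs to the shared volume ...
    # 2. Signal completion so the collector can drain remaining logs and stop.
    (tasksite / "tombstone").touch()
```

The LoongCollector side is symmetric: it touches `cornerstone` once its service is up, then waits for `tombstone` before draining and stopping.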
Long-lived services (Deployment/StatefulSet)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
  namespace: production
  labels:
    app: nginx-demo
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
        version: v1.0.0
    spec:
      terminationGracePeriodSeconds: 600 # 10-minute graceful shutdown period
      containers:
        # Application container - Web application
        - name: nginx-demo
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          # Startup command and signal handling (POSIX sh syntax)
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Define the signal handler function
              _term_handler() {
                echo "[$(date)] [nginx-demo] Caught SIGTERM, starting graceful shutdown..."
                # Send a QUIT signal to Nginx for a graceful stop
                if [ -n "$NGINX_PID" ]; then
                  kill -QUIT "$NGINX_PID" 2>/dev/null || true
                  echo "[$(date)] [nginx-demo] Sent SIGQUIT to Nginx PID: $NGINX_PID"
                  # Wait for Nginx to stop gracefully
                  wait "$NGINX_PID"
                  EXIT_CODE=$?
                  echo "[$(date)] [nginx-demo] Nginx stopped with exit code: $EXIT_CODE"
                fi
                # Notify LoongCollector that the application container has stopped
                echo "[$(date)] [nginx-demo] Writing tombstone file"
                touch /tasksite/tombstone
                exit $EXIT_CODE
              }
              # Register the signal handler
              trap _term_handler TERM INT QUIT
              # Wait for LoongCollector to be ready
              echo "[$(date)] [nginx-demo] Waiting for LoongCollector to be ready..."
              until [ -f /tasksite/cornerstone ]; do
                sleep 1
              done
              echo "[$(date)] [nginx-demo] LoongCollector is ready, starting business logic"
              # Start Nginx
              echo "[$(date)] [nginx-demo] Starting Nginx..."
              nginx -g 'daemon off;' &
              NGINX_PID=$!
              echo "[$(date)] [nginx-demo] Nginx started with PID: $NGINX_PID"
              # Wait for the Nginx process
              wait $NGINX_PID
              EXIT_CODE=$?
              # Also notify LoongCollector if the exit was not caused by a signal
              if [ ! -f /tasksite/tombstone ]; then
                echo "[$(date)] [nginx-demo] Unexpected exit, writing tombstone"
                touch /tasksite/tombstone
              fi
              exit $EXIT_CODE
          # Resource configuration
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          # Volume mounts
          volumeMounts:
            - name: nginx-logs
              mountPath: /var/log/nginx
            - name: tasksite
              mountPath: /tasksite
            - name: tz-config
              mountPath: /etc/localtime
              readOnly: true
        # LoongCollector Sidecar container
        - name: loongcollector
          image: aliyun-observability-release-registry.cn-shenzhen.cr.aliyuncs.com/loongcollector/loongcollector:v3.1.1.0-20fa5eb-aliyun
          command: ["/bin/bash", "-c"]
          args:
            - |
              echo "[$(date)] LoongCollector: Starting initialization"
              # Start the LoongCollector service
              /etc/init.d/loongcollectord start
              # Wait for the configuration to download and the service to be ready
              sleep 15
              # Verify the service status
              if /etc/init.d/loongcollectord status; then
                echo "[$(date)] LoongCollector: Service started successfully"
                touch /tasksite/cornerstone
              else
                echo "[$(date)] LoongCollector: Failed to start service"
                exit 1
              fi
              # Wait for the application container to complete
              echo "[$(date)] LoongCollector: Waiting for business container to complete"
              until [[ -f /tasksite/tombstone ]]; do
                sleep 2
              done
              echo "[$(date)] LoongCollector: Business completed, waiting for log transmission"
              # Allow enough time to transmit remaining logs
              sleep 30
              echo "[$(date)] LoongCollector: Stopping service"
              /etc/init.d/loongcollectord stop
              echo "[$(date)] LoongCollector: Shutdown complete"
          # Health check
          livenessProbe:
            exec:
              command: ["/etc/init.d/loongcollectord", "status"]
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          # Resource configuration
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "2000m"
              memory: "2048Mi"
          # Environment variable configuration
          env:
            - name: ALIYUN_LOGTAIL_USER_ID
              value: "${your_aliyun_user_id}"
            - name: ALIYUN_LOGTAIL_USER_DEFINED_ID
              value: "${your_machine_group_user_defined_id}"
            - name: ALIYUN_LOGTAIL_CONFIG
              value: "/etc/ilogtail/conf/${your_region_config}/ilogtail_config.json"
            # Enable full drain mode to ensure all logs are sent when the pod stops
            - name: enable_full_drain_mode
              value: "true"
            # Append pod environment information as log tags
            - name: ALIYUN_LOG_ENV_TAGS
              value: "_pod_name_|_pod_ip_|_namespace_|_node_name_|_node_ip_"
            # Get pod and node information
            - name: "_pod_name_"
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: "_pod_ip_"
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: "_namespace_"
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: "_node_name_"
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: "_node_ip_"
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          # Volume mounts
          volumeMounts:
            - name: nginx-logs
              mountPath: /var/log/nginx
              readOnly: true
            - name: tasksite
              mountPath: /tasksite
            - name: tz-config
              mountPath: /etc/localtime
              readOnly: true
      # Volume definitions
      volumes:
        - name: nginx-logs
          emptyDir: {}
        - name: tasksite
          emptyDir:
            medium: Memory
            sizeLimit: "50Mi"
        - name: tz-config
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
```
Appendix: Native parsing plugin details
In the Processor Configurations section of the Logtail Configuration page, add processors to structure raw logs. To add a processing plugin to an existing collection configuration, follow these steps:
1. In the navigation pane on the left, choose Logstores and find the target logstore. Click the expand icon before its name to expand the logstore.
2. Click Logtail Configuration. In the configuration list, find the target Logtail configuration and click Manage Logtail Configuration in the Actions column.
3. On the Logtail configuration page, click Edit.
This section introduces only commonly used processing plugins that cover common log processing use cases. For more features, see Extended processors.
Rules for combining plugins (for LoongCollector / Logtail 2.0 and later):
Native and extended processors can be used independently or combined as needed.
Prioritize native processors because they offer better performance and higher stability.
When native features cannot meet your business needs, add extended processors after the configured native ones for supplementary processing.
Order constraint:
All plugins are executed sequentially in the order they are configured, which forms a processing chain. Note: All native processors must precede any extended processors. After you add an extended processor, you cannot add more native processors.
Regular expression parsing
Extract log fields using a regular expression and parse the log into key-value pairs. Each field can be independently queried and analyzed.
Example:
Example: a raw, unprocessed log line is parsed into independently queryable key-value fields by the regular expression parsing plugin.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the regular expression parsing plugin:
Regular Expression: Used to match logs. Generate it automatically or enter it manually:
- Automatic generation: Click Generate, select the log content to be extracted in the Log Sample, and then click Generate Regular Expression.
- Manual entry: Enter the regular expression manually based on the log format.
After configuration, click Validate to test whether the regular expression can correctly parse the log content.
Extracted Field: Set the corresponding field name (Key) for the extracted log content (Value).
For more information about the other parameters, see the general configuration parameter descriptions in Use case 2: Structured logs.
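As a rough sketch of what the plugin produces, the following Python snippet extracts fields from a hypothetical Nginx-style access log line using a named-group regular expression. Python's `re` engine is not RE2, but this particular pattern is valid in both:

```python
import re

# Hypothetical Nginx-style access log line
line = '127.0.0.1 - - [16/Aug/2024:14:37:52 +0800] "GET /index.html HTTP/1.1" 200 612'

# Named groups use the (?P<name>...) syntax, which RE2 also accepts
pattern = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]+)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)

match = pattern.match(line)
fields = match.groupdict()  # each field becomes an independent key-value pair
print(fields["remote_addr"], fields["status"])  # 127.0.0.1 200
```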
Delimiter-based parsing
Structure log content using a delimiter, parsing it into multiple key-value pairs. Both single-character and multi-character delimiters are supported.
Example:
Example: a raw, unprocessed log line is split into fields by the specified delimiter.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the delimiter-based parsing plugin:
Delimiter: Specify the character used to split the log content.
Example: For a CSV file, select Custom and enter a comma (,).
Quote: When a field value contains the delimiter, you must specify a quote character to wrap the field to avoid incorrect splitting.
Extracted Field: Set a field name (Key) for each column in the order that they are separated. The following rules apply:
The field name can contain only letters, digits, and underscores (_).
Must start with a letter or an underscore (_).
Maximum length: 128 bytes.
For more information about the other parameters, see the general configuration parameter descriptions in Use case 2: Structured logs.
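The splitting behavior, including the quote character, can be approximated with Python's `csv` module. The log line and field names below are invented for illustration:

```python
import csv
import io

# A CSV-style log line where one field value contains the delimiter,
# so it is wrapped in the quote character (")
line = '2024-08-16,200,"GET /index.html?a=1,b=2",192.168.1.10'

# Split on the delimiter while honoring the quote character
row = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# Assign a field name (key) to each column in order
keys = ["time", "status", "request", "client_ip"]
record = dict(zip(keys, row))
print(record["request"])  # GET /index.html?a=1,b=2
```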
Standard JSON parsing
Structure an Object-type JSON log by parsing it into key-value pairs.
Example:
Example: standard JSON key-value pairs are extracted automatically from a raw, unprocessed log.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the JSON parsing plugin:
Original Field: The default value is `content`. This field is used to store the raw log content to be parsed.
For more information about the other parameters, see the general configuration parameter descriptions in Use case 2: Structured logs.
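The effect on an Object-type log is equivalent to a plain JSON parse. For example, in Python (sample log invented):

```python
import json

# Hypothetical Object-type JSON log stored in the original field
raw = '{"url": "POST /PutData", "status": 200, "latency": "18204"}'

# Each top-level member becomes a key-value pair
record = json.loads(raw)
print(record["status"])  # 200
```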
Nested JSON parsing
Parse a nested JSON log into key-value pairs by specifying the expansion depth.
Example:
Example: a raw, unprocessed log compared with the results at expansion depth 0 and expansion depth 1, each using the expansion depth as a field-name prefix.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the nested JSON parsing plugin.
- Original Field: The name of the raw field to be expanded, for example, `content`.
- JSON Expansion Depth: The expansion level of the JSON object. 0 means fully expanded (default), 1 means the current level, and so on.
- Character to Concatenate Expanded Keys: The connector for field names during JSON expansion. The default is an underscore (_).
- Name Prefix of Expanded Keys: Specify a prefix for the field names after JSON expansion.
- Expand Array: Enable this to expand arrays into key-value pairs with indexes. Example: `{"k":["a","b"]}` is expanded to `{"k[0]":"a","k[1]":"b"}`.

To rename the expanded fields (for example, from prefix_s_key_k1 to new_field_name), add a Rename Fields plugin to complete the mapping.
For more information about the other parameters, see the general configuration parameter descriptions in Use case 2: Structured logs.
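A rough Python sketch of the expansion-depth semantics described above; the plugin's exact output (including the prefix option) may differ, and the function is hypothetical:

```python
def expand_json(obj, depth=0, connector="_", prefix="", level=1):
    """Flatten a nested dict into key-value pairs.

    depth=0 expands fully; depth=N stops expanding below level N.
    Key names are joined with `connector`.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{connector}{key}" if prefix else key
        stop = depth != 0 and level >= depth  # reached the configured depth
        if isinstance(value, dict) and not stop:
            flat.update(expand_json(value, depth, connector, name, level + 1))
        else:
            flat[name] = value
    return flat

log = {"s_key": {"k1": {"k2": "v"}}}
print(expand_json(log, depth=0))  # {'s_key_k1_k2': 'v'}
print(expand_json(log, depth=1))  # {'s_key': {'k1': {'k2': 'v'}}}
```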
JSON array parsing
Use the json_extract function to extract JSON objects from a JSON array.
Example:
Example: JSON objects are extracted from a JSON array in a raw, unprocessed log.
Procedure: In the Processor Configurations section of the Logtail Configuration page, switch the Processing Mode to SPL, configure the SPL Statement, and use the json_extract function to extract JSON objects from the JSON array.
Example: Extract elements from the JSON array in the log field content and store the results in new fields json1 and json2.

```
* | extend json1 = json_extract(content, '$[0]'), json2 = json_extract(content, '$[1]')
```

Apache log parsing
Structure log content based on the definitions in the Apache log configuration file, parsing it into multiple key-value pairs.
Example:
Example: a raw, unprocessed Apache access log is parsed into key-value pairs.
Configuration steps: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the Apache log parsing plugin.
Log Format: combined
APACHE LogFormat Configuration: The system automatically fills in the configuration based on the Log Format.
Important: Be sure to check the auto-filled content to ensure it is exactly the same as the LogFormat defined in the Apache configuration file on the server (usually located at /etc/apache2/apache2.conf).
For more information about the other parameters, see the general configuration parameter descriptions in Use case 2: Structured logs.
Data masking
Mask sensitive data in logs.
Example:
Example: sensitive content in a raw, unprocessed log is masked.
Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the data masking plugin:
Original Field: The field that contains the log content before parsing.
Data Masking Method:
const: Replaces sensitive content with a constant string.
md5: Replaces sensitive content with its MD5 hash.
Replacement String: If Data Masking Method is set to const, enter a string to replace the sensitive content.
Content Expression that Precedes Replaced Content: The expression used to find sensitive content, which is configured using RE2 syntax.
Content Expression to Match Replaced Content: The regular expression used to match sensitive content. The expression must be written in RE2 syntax.
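The const masking method behaves roughly like a regex substitution that keeps the preceding context and replaces only the matched sensitive value. The sample log and patterns below are invented for illustration:

```python
import re

log = "account: 1234, password: qwerty123, ip: 10.0.0.1"

# Expression that precedes the sensitive content, and the expression
# that matches the sensitive content itself (RE2-compatible syntax)
preceded_by = r"password: "
sensitive = r"[^,]+"

# const method: keep the preceding context, replace the value with a constant
masked = re.sub(f"({preceded_by})(?:{sensitive})", r"\1********", log)
print(masked)  # account: 1234, password: ********, ip: 10.0.0.1
```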
Time parsing
Parse the time field in the log and set the parsing result as the log's __time__ field.
Example:
Example: the time field in a raw, unprocessed log is parsed and set as the log time.
Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select the time parsing plugin:
Original Field: The field that contains the log content before parsing.
Time Format: Set the time format that corresponds to the timestamps in the log.
Time Zone: Select the time zone for the log time field. By default, this is the time zone of the environment where the LoongCollector (Logtail) process is running.
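The parsing result can be reproduced in Python: parse the raw value with the configured time format, attach the configured time zone, and convert it to the Unix timestamp that `__time__` stores. The sample value is invented and UTC+8 is assumed:

```python
from datetime import datetime, timezone, timedelta

raw_time = "2024-08-16 14:37:52"  # hypothetical time field from a log

# Parse with the configured time format, then attach the configured
# time zone (UTC+8 here) and convert to a Unix timestamp
tz = timezone(timedelta(hours=8))
parsed = datetime.strptime(raw_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
log_time = int(parsed.timestamp())
print(log_time)
```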
Appendix: Regular expression limits (Container filtering)
The regular expressions used for container filtering are based on Go's RE2 engine, which has some syntax limitations compared to other engines such as PCRE. Note the following when you write regular expressions:
1. Named group syntax differences
Go uses the (?P<name>...) syntax to define named groups and does not support the (?<name>...) syntax from PCRE.
- Correct example: `(?P<year>\d{4})`
- Incorrect syntax: `(?<year>\d{4})`
2. Unsupported regular expression features
The following common but complex regular expression features are not available in RE2. Avoid using them:
- Lookaround assertions: `(?=...)`, `(?!...)`, `(?<=...)`, or `(?<!...)`
- Conditional expressions: `(?(condition)true|false)`
- Recursive matching: `(?R)` or `(?0)`
- Subroutine references: `(?&name)` or `(?P>name)`
- Atomic groups: `(?>...)`
3. Recommendations
We recommend that you use tools such as Regex101 to debug regular expressions. Select the Golang (RE2) mode for validation to ensure compatibility. If you use any of the unsupported syntax mentioned above, the plugin will not parse or match correctly.
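Python's `re` engine is not RE2, but it accepts the same `(?P<name>...)` named-group syntax and likewise rejects the PCRE-style `(?<name>...)` form, so it can sanity-check that particular rule (full RE2 validation is best done in Golang/RE2 mode on a tool such as Regex101):

```python
import re

# RE2-compatible named group: (?P<name>...)
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-08-16")
print(m.group("year"))  # 2024

# The PCRE-style (?<name>...) form is rejected by Python's re as well
try:
    re.compile(r"(?<year>\d{4})")
except re.error as e:
    print("invalid syntax:", e)
```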