Simple Log Service: Collect host text logs in a single operation

Last Updated:Mar 25, 2026

Traditional incremental log collection is a poor fit for a one-time import of existing static files, as needed for historical file collection, data migration, or batch data processing. The Host Text One-time Collection feature lets you deploy collection configurations in bulk to a machine group through the console or an API. The resulting task collects the content of the specified static files once and then terminates automatically.

Scope

  • LoongCollector version 3.3 or later.

  • Supports host collection on Linux and Windows, but not container collection.

Collection configuration workflow

  1. Preparation: Create a Project and a Logstore. A Project is a resource management unit used to isolate logs from different services, and a Logstore stores logs.

  2. Configure a machine group (install LoongCollector): Install LoongCollector based on your server type and add it to a machine group. Use the machine group to manage collection nodes, distribute configurations, and monitor server health.

  3. Create and configure a one-time file collection rule:

    1. Global and input configuration: Define the collection configuration name and specify the source and scope of log collection.

    2. Log processing and structuring: Configure processing settings based on your log format.

      • Multiline logs: This feature handles log entries that span multiple lines, such as Java stack traces or Python tracebacks. Use a start-of-line regex to identify the start of each log entry, merging subsequent lines into a single record.

      • Structured parsing: Configure a parser plugin (such as regex, delimiter, or NGINX mode) to parse raw strings into structured key-value pairs. This allows you to query and analyze each field independently.

    3. Log filtering: Configure collection blocklists and content filtering rules to filter for relevant log content, reducing redundant data transmission and storage.

    4. Log categorization: Configure log topics to flexibly categorize logs from different services, servers, or source paths.

  4. Query and analysis configuration: Full-text indexing is enabled by default and supports keyword searches. We recommend enabling a field index on structured fields to improve search efficiency and allow for precise queries and analysis.

  5. Verify collection results: After you complete the configuration, verify that logs are collected successfully. If you encounter issues such as logs not being collected, heartbeat failures, or parsing errors, see the FAQ.

Prerequisites

Before you can collect logs, create a Project and a LogStore. If you already have them, skip this step and proceed to Configure a Machine Group (Install LoongCollector).

Create Project

  1. Log in to the Log Service console.

  2. Click Create Project and configure the following settings:

    • Region: Select the region where your log sources are located. This setting cannot be changed after the Project is created.

    • Project name: Enter a globally unique name for the Project within Alibaba Cloud. The name cannot be changed after creation.

    • Leave the other settings as default and click Create. For more information on other parameters, see Create a Project.

Create LogStore

  1. In the Project list, click your target Project.

  2. In the left-side navigation pane, choose Log Storage and click +.

  3. On the Create LogStore page, configure the following settings:

    • LogStore name: Enter a name that is unique within the Project. This name cannot be changed after creation.

    • LogStore type: Select Standard or Query based on your feature requirements.

    • Billing method:

      • pay-by-feature: This method bills you independently for resources like storage, indexing, and read/write operations. This method is ideal for small-scale scenarios or when your feature usage is uncertain.

      • pay-by-ingested-data: This method bills you only for the volume of ingested raw data. This method includes 30 days of free storage and free features such as data processing and shipping. It is ideal for business scenarios with a retention period of approximately 30 days or with complex data processing pipelines.

    • Data retention period: Specify the number of days to retain logs, from 1 to 3,650. A value of 3,650 signifies permanent retention. The default is 30 days.

    • Leave the other settings as default and click OK. For more information on other settings, see Manage a LogStore.

Step 1: Configure a machine group (install LoongCollector)

After you complete the prerequisites, install LoongCollector on the servers and add them to a machine group.

Note

These installation steps apply only if the log source is an ECS instance in the same account and region as the Log Service Project.

If your ECS instance and Project are not in the same account or region, or if your log source is an on-premises server, see Install and configure LoongCollector.

Procedure

  1. On the Logstores page, click the icon next to the target Logstore name to expand it.

  2. Click Data Import > Logtail Configurations. On the One-time Logtail Configuration tab, click Add Logtail Configuration.

  3. In the Quick Data Import dialog box, click Import Now on the One-time File Collection - Host card.

  4. On the Machine Group Configuration page, configure the following parameters:

    • Use Case: Host Scenario

    • Installation Environment: ECS

    • Configure Machine Group: The next action depends on whether LoongCollector is installed and a machine group exists.

      • If LoongCollector is installed and added to a machine group, select the machine group from the Source Machine Groups list and add it to the Applied Machine Groups list. A new machine group is not required.

      • If LoongCollector is not installed, click Create Machine Group.

        These steps guide you through automatically installing LoongCollector and creating a machine group.
        1. The system automatically lists ECS instances that are in the same region as your Project. Select one or more instances from which you want to collect logs.

        2. Click Install and Create as Machine Group. The system automatically installs LoongCollector on the selected ECS instances.

        3. Enter a Name for the machine group and click OK.

        Note

        If the installation fails or remains in a pending state, verify that the ECS instance is in the same region as the Project.

      • To add a server with LoongCollector already installed to an existing machine group, see Add a server to a machine group in the FAQ.

  5. Check heartbeat status: Click Next. The Machine Group Heartbeat Status section appears. Verify that the Heartbeat status is OK, which indicates a successful connection. Then, click Next to continue to the Logtail configuration.

    If the status is FAIL, the initial heartbeat may still be establishing. Wait about two minutes, then refresh the status. If the status remains FAIL, see Machine group heartbeat is FAIL in the FAQ for troubleshooting.

Step 2: Configure one-time file collection

After you complete the LoongCollector installation and machine group configuration, go to the Logtail configuration page to define log collection and processing rules.

1. Global and input configuration

Define the collection configuration's name and specify the log collection source and scope.

Global configuration:

  • Configuration name: Enter a custom name for the collection configuration. The name must be unique within the Project and cannot be modified after creation. The naming rules are as follows:

    • Can contain only lowercase letters, digits, hyphens (-), and underscores (_).

    • Must start and end with a lowercase letter or a digit.

  • Execution timeout: The default is 600 seconds (10 minutes), and the valid range is from 600 to 604,800 seconds (10 minutes to 7 days). If a task exceeds this timeout, the system stops it and does not collect any remaining data.

    Important

    When you update a configuration, its effective period is reset. To avoid duplicate tasks or unexpected data reporting, ensure that the machine group scope is correct and that the previous task execution did not exceed the specified execution timeout.

  • Force task rerun on update: This option is turned off by default.

    • Off:

      • When you update any collection parameters other than execution timeout or the input configuration, the system resumes the current collection progress instead of restarting the task. This ensures collection continuity.

      • If you modify the execution timeout or the input configuration, the system reruns the collection task.

    • On: When you update the collection configuration, the collection task reruns from the beginning. This ensures all data is processed and reported using the new settings. Note: Previously collected data is not deleted. To clear previously collected data, see Log Service soft delete.
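The naming rules and the execution timeout range above can be checked before a configuration is submitted. The following is a minimal sketch, not part of Log Service: the function names are hypothetical, and because this document does not state a length limit for the configuration name, none is enforced.

```python
import re

# Lowercase letters, digits, hyphens, and underscores only;
# must start and end with a lowercase letter or a digit.
NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")

MIN_TIMEOUT_S = 600       # 10 minutes (also the default)
MAX_TIMEOUT_S = 604_800   # 7 days


def is_valid_config_name(name: str) -> bool:
    """Check a configuration name against the naming rules listed above."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_timeout(seconds: int) -> bool:
    """Check that an execution timeout lies within the documented range."""
    return MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S
```

For example, `is_valid_config_name("nginx-history_2025")` passes, while a name that starts with a hyphen or contains uppercase letters does not.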

Input configuration:

  • Type: one-time file collection (available in LoongCollector 3.3 and later).

  • File path: The path from which logs are collected.

    Note

    LoongCollector determines the list of files and their sizes for collection when it retrieves the configuration. It does not collect new files or content appended after this point.

    • Linux: The path must start with a forward slash (/). For example, /data/mylogs/**/*.log collects all files with the .log extension in all subdirectories of /data/mylogs.

    • Windows: The path must start with a drive letter. For example, C:\Program Files\Intel\**\*.Log.

  • Maximum directory monitoring depth: The maximum directory depth that the ** wildcard in the file path can match. The default is 0, which means it monitors only the current directory.
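The snapshot behavior described in the note above (the file list and sizes are fixed when the configuration is retrieved) can be illustrated with a short sketch. This is not LoongCollector's implementation: `snapshot_files` is a hypothetical helper, and the way it counts directory depth is an assumption about how the maximum depth setting behaves.

```python
import fnmatch
import os


def snapshot_files(base_dir, name_pattern, max_depth=0):
    """Return {path: size_in_bytes} for matching files under base_dir.

    max_depth=0 looks only at base_dir itself; each extra level allows
    one more level of subdirectories (an assumed interpretation).
    """
    base_dir = os.path.abspath(base_dir)
    snapshot = {}
    for root, dirs, files in os.walk(base_dir):
        rel = os.path.relpath(root, base_dir)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirs[:] = []  # stop os.walk from descending any further
        for name in files:
            if fnmatch.fnmatch(name, name_pattern):
                path = os.path.join(root, name)
                snapshot[path] = os.path.getsize(path)
    return snapshot
```

A one-time task that works from such a snapshot reads each file only up to the recorded size, which is why content appended later, or files created later, are not collected.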


2. Log processing and parsing

Configure log processing rules to convert raw, unstructured logs into structured, searchable data, which improves query and analysis efficiency. We recommend that you first add a log sample.

In the processor configuration section of the Logtail configuration page, click Add Log Sample and enter your log content. The system uses this sample to identify the log format and helps generate regular expressions and parsing rules, simplifying the configuration process.

Scenario 1: Processing multi-line logs

Logs such as Java exception stack traces and JSON objects often span multiple lines. In the default collection mode, these logs are split into multiple incomplete records, which causes a loss of context. To prevent this, you can enable multiline mode. By configuring a line-beginning regular expression, you can merge consecutive lines that belong to the same log entry into a single, complete log.

Example:

  • Unprocessed raw log: [image]

  • Default mode: The collector treats each line as a separate log, which breaks up the stack trace and causes context loss. [image]

  • Multiline mode enabled: A line-beginning regular expression identifies complete log entries, preserving the full semantic structure. [image]

Procedure: In the processor configuration section of the Logtail configuration page, enable Multiline Mode.

  • Type: Select Custom or Multiline JSON.

    • Custom: Use this option for logs with a non-fixed format. You must configure a line-beginning regular expression to identify the start of each log entry.

      • Line-beginning regular expression: You can generate this expression automatically or enter it manually. The regular expression must match a complete line of data. For example, the expression for the sample log is \[\d+-\d+-\w+:\d+:\d+,\d+]\s\[\w+]\s.*.

        • To automatically generate the expression, click Auto-generate Regex, select the desired log content in the Log Sample text box, and then click Generate Regex.

        • To manually enter the expression, click Manually Enter Regex, provide your expression, and then click Validate.

    • Multiline JSON: When your raw logs are in standard JSON format, Log Service automatically handles newlines within a single JSON log entry.

  • Action on split failure:

    • Discard: If a block of text does not match the line-beginning rule, it is discarded.

    • Keep as Single Lines: Keeps any text that does not match the rule as individual single-line logs.
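The merging logic described above can be sketched as follows. This is an illustration only, not the collector's code: `merge_multiline` and its `on_failure` parameter are hypothetical names, and the sample lines in the usage note are invented to fit the line-beginning expression shown earlier.

```python
import re


def merge_multiline(lines, start_pattern, on_failure="keep"):
    """Merge lines into records using a line-beginning regular expression.

    A line that fully matches start_pattern starts a new record; following
    lines are appended to it. Lines seen before any start line are either
    kept as single-line records ("keep") or dropped ("discard").
    """
    start = re.compile(start_pattern)
    records = []
    current = None
    for line in lines:
        if start.fullmatch(line):
            if current is not None:
                records.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        elif on_failure == "keep":
            records.append(line)
        # on_failure == "discard": the unmatched line is dropped
    if current is not None:
        records.append("\n".join(current))
    return records
```

With the expression `\[\d+-\d+-\w+:\d+:\d+,\d+]\s\[\w+]\s.*`, a line such as `[2023-10-01T10:30:01,000] [INFO] java.lang.Exception: boom` starts a record, and the indented `at ...` lines that follow are merged into it.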

Scenario 2: Processing structured logs

Querying and analyzing unstructured or semi-structured text, such as NGINX access logs or application output, can be inefficient. Log Service provides data processors to automatically convert raw logs of different formats into structured data. This provides a solid foundation for subsequent analysis, monitoring, and alerting.

Example:

Unprocessed raw log:

    192.168.*.* - - [15/Apr/2025:16:40:00 +0800] "GET /nginx-logo.png HTTP/1.1" 0.000 514 200 368 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.*.* Safari/537.36"

Log after structured parsing:

    body_bytes_sent: 368
    http_referer: -
    http_user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.*.* Safari/537.36
    remote_addr: 192.168.*.*
    remote_user: -
    request_length: 514
    request_method: GET
    request_time: 0.000
    request_uri: /nginx-logo.png
    status: 200
    time_local: 15/Apr/2025:16:40:00

Procedure: In the processor configuration section of the Logtail configuration page:

  1. Add a processor: Click Add Processing Plugin and configure a regex, delimiter, or JSON processor based on your log format. This example uses NGINX logs and selects native processor > NGINX-mode processor.

  2. NGINX log configuration: Copy the complete log_format definition from your NGINX server's configuration file (nginx.conf) and paste it into the text box.

    Example:

    log_format main  '$remote_addr - $remote_user [$time_local] "$request" '
                     '$request_time $request_length '
                     '$status $body_bytes_sent "$http_referer" '
                     '"$http_user_agent"';

    Important

    The format definition here must exactly match the format used to generate the logs on your server. Otherwise, parsing will fail.

  3. Common parameters: The following parameters are common across multiple data processors and serve the same purpose.

    • Source field: Specifies the source field to be parsed. The default is content, which refers to the entire collected log entry.

    • Keep source field on parse failure: Recommended. If the processor fails to parse a log (for example, due to a format mismatch), this option retains the original log content in the specified source field.

    • Keep source field on parse success: If selected, this option retains the original log content after successful parsing.
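To see why the log_format definition must match the logs exactly, it helps to view the definition as a template for a regular expression. The sketch below is an assumption about the general idea, not the NGINX-mode processor's actual implementation: each `$variable` becomes a named capture group and everything else must match literally. The sample line is the one above with a shortened user agent, and unlike the processor output shown earlier, the sketch leaves `$request` as a single field.

```python
import re

# The log_format from the example above, with the quoted pieces joined.
LOG_FORMAT = (
    '$remote_addr - $remote_user [$time_local] "$request" '
    '$request_time $request_length '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent"'
)


def log_format_to_regex(fmt):
    """Turn an NGINX log_format string into a compiled regular expression."""
    parts = re.split(r"\$(\w+)", fmt)  # literal, name, literal, name, ...
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))        # literal text
        else:
            pieces.append(f"(?P<{part}>.*?)")     # one field per variable
    return re.compile("".join(pieces))


def parse_line(line, fmt=LOG_FORMAT):
    """Return the parsed fields, or None when the line does not fit fmt."""
    match = log_format_to_regex(fmt).fullmatch(line)
    return match.groupdict() if match else None


SAMPLE = (
    '192.168.*.* - - [15/Apr/2025:16:40:00 +0800] '
    '"GET /nginx-logo.png HTTP/1.1" 0.000 514 200 368 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
)
fields = parse_line(SAMPLE)
```

If the server used a different variable order or different literal separators, `parse_line` would return `None`, which corresponds to a parse failure in the processor.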


3. Log filtering

Indiscriminately collecting large volumes of low-value or irrelevant logs, such as DEBUG or INFO level entries, can waste storage resources, increase costs, hinder query performance, and pose data leakage risks. To address these issues, you can use fine-grained filtering to ensure efficient and secure log collection.

Filter by content

Filter logs based on field content, for example, by collecting only logs where the level is WARNING or ERROR.

Example:

Unprocessed raw logs:

    {"level":"WARNING","timestamp":"2025-09-23T19:11:40+0800","cluster":"yilu-cluster-0728","message":"Disk space is running low","freeSpace":"15%"}
    {"level":"ERROR","timestamp":"2025-09-23T19:11:42+0800","cluster":"yilu-cluster-0728","message":"Failed to connect to database","errorCode":5003}
    {"level":"INFO","timestamp":"2025-09-23T19:11:47+0800","cluster":"yilu-cluster-0728","message":"User logged in successfully","userId":"user-123"}

Collect only WARNING or ERROR logs:

    {"level":"WARNING","timestamp":"2025-09-23T19:11:40+0800","cluster":"yilu-cluster-0728","message":"Disk space is running low","freeSpace":"15%"}
    {"level":"ERROR","timestamp":"2025-09-23T19:11:42+0800","cluster":"yilu-cluster-0728","message":"Failed to connect to database","errorCode":5003}

Procedure: In the processor configuration section of the Logtail configuration page:

Click Add Processing Plugin and select native processor > Filter Processing.

  • Field name: The log field to filter by.

  • Field value: The regular expression used for filtering. The expression must match the entire field value; partial keyword matching is not supported.
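The full-match requirement can be illustrated with a small sketch. The helper name `keep_log` is hypothetical and the code only mirrors the rule as described here; it is not the filter processor itself. The sample logs are shortened versions of the ones in the example.

```python
import json
import re


def keep_log(log, field, value_regex):
    """Keep a log only if the whole field value matches value_regex."""
    value = log.get(field)
    if value is None:
        return False
    return re.fullmatch(value_regex, str(value)) is not None


raw_lines = [
    '{"level":"WARNING","message":"Disk space is running low"}',
    '{"level":"ERROR","message":"Failed to connect to database"}',
    '{"level":"INFO","message":"User logged in successfully"}',
]
logs = [json.loads(line) for line in raw_lines]
kept = [log for log in logs if keep_log(log, "level", "WARNING|ERROR")]
```

Because the match must cover the entire value, an expression such as `WARN` would not keep the `WARNING` entry; `WARN.*` would.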

Filter with a blacklist

Use a collection blacklist to exclude specified directories or files and prevent the upload of irrelevant or sensitive logs.

Procedure: In the Logtail configuration page, go to the input configuration > Other Input Configurations section, enable collection blacklist, and click Add.

Supports exact matching and wildcard matching for directories and filenames. The only supported wildcards are the asterisk (*) and the question mark (?).
  • File path blacklist: The file paths to ignore. Examples:

    • /home/admin/private*.log: Ignores all files in the /home/admin/ directory that start with private and end with .log.

    • /home/admin/private*/*_inner.log: Ignores all files that end with _inner.log in directories that start with private under the /home/admin/ directory.

  • File blacklist: The filenames to ignore during collection. Example:

    • app_inner.log: Ignores all files named app_inner.log.

  • Directory blacklist: Directory paths cannot end with a forward slash (/). Examples:

    • /home/admin/dir1/: The directory blacklist will have no effect.

    • /home/admin/dir*: Ignores all files in subdirectories under the /home/admin/ directory that start with dir.

    • /home/admin/*/dir: Ignores all files in second-level subdirectories named dir under the /home/admin/ directory. For example, files in the /home/admin/a/dir directory are ignored, but files in the /home/admin/a/b/dir directory are collected.
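The examples above imply that a wildcard never matches across a path separator (otherwise /home/admin/*/dir would also match /home/admin/a/b/dir). The sketch below encodes that reading for Linux-style paths. The helper names are hypothetical, and matching a directory pattern only against the file's immediate parent directory is an assumption, since this document does not say how nested subdirectories are treated.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into regex pieces that do not cross '/'."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append("[^/]*")
        elif ch == "?":
            pieces.append("[^/]")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def file_path_ignored(path, patterns):
    """File path blacklist: the pattern must match the full file path."""
    return any(wildcard_to_regex(p).fullmatch(path) for p in patterns)


def file_name_ignored(path, names):
    """File blacklist: the pattern must match the filename alone."""
    filename = path.rsplit("/", 1)[-1]
    return any(wildcard_to_regex(n).fullmatch(filename) for n in names)


def directory_ignored(path, dir_patterns):
    """Directory blacklist: the pattern must match the parent directory."""
    directory = path.rsplit("/", 1)[0]
    return any(wildcard_to_regex(p).fullmatch(directory) for p in dir_patterns)
```

Under this reading, a directory pattern with a trailing slash such as /home/admin/dir1/ can never equal a directory path, which is consistent with the statement that it has no effect.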

4. Log categorization

When logs from multiple applications or instances share the same format but have different paths (for example, /apps/app-A/run.log and /apps/app-B/run.log), it can be difficult to distinguish their sources after collection. By configuring a log topic, you can logically differentiate logs from various applications, services, or paths, enabling efficient categorization and precise querying within a unified storage destination.

Procedure: In the Global Configuration > Other Global Configurations > Log Topic Type section, select how to generate the topic. You can use one of the following three methods:

  • Machine group topic: When you apply a collection configuration to multiple machine groups, LoongCollector automatically uses the name of the server's machine group as the value for the __topic__ field. This is useful for scenarios where you want to categorize logs by host.

  • Custom: Use the format customized://<your-topic-name>, for example, customized://app-login. This method is suitable for static topics with fixed business identifiers.

  • File path-based extraction: Extracts key information from the full path of a log file to dynamically tag the log source. This is useful when multiple users or applications share the same log filename but reside in different paths. For example, when multiple users or services write logs to different top-level directories but use the same subdirectory and filename, you cannot distinguish the source by filename alone.

    /data/logs
    ├── userA
    │   └── serviceA
    │       └── service.log
    ├── userB
    │   └── serviceA
    │       └── service.log
    └── userC
        └── serviceA
            └── service.log

    In this case, you can configure file path-based extraction and use a regular expression to extract key information from the full path. Log Service then uploads the matched result to the Logstore as the log topic.

    File path-based extraction rules: Based on regular expression capture groups

    When you configure the regular expression, the system automatically determines the output field format based on the number and names of the capture groups. The rules are as follows:

    In a regular expression for a file path, you must escape forward slashes (/).

    • Single capture group (one (.*?))

      • Use case: When you need only one dimension to distinguish sources (for example, username or environment).

      • Generated field: Generates the __topic__ field.

      • Regex example: \/logs\/(.*?)\/app\.log

      • Sample matched path: /logs/userA/app.log

      • Sample generated field: __topic__:userA

    • Multiple unnamed capture groups (multiple (.*?))

      • Use case: When you need multiple dimensions to distinguish sources but do not require semantic labels.

      • Generated field: Generates tag fields where the key is formatted as __topic_{i}__, where {i} is the capture group index.

      • Regex example: \/logs\/(.*?)\/(.*?)\/app\.log

      • Sample matched path: /logs/userA/svcA/app.log

      • Sample generated fields: __tag__:__topic_1__:userA; __tag__:__topic_2__:svcA

    • Multiple named capture groups (using (?P<name>.*?))

      • Use case: When you need multiple dimensions to distinguish sources and want clear, meaningful field names for easy querying and analysis.

      • Generated field: Generates tag fields where the key is the specified capture group name.

      • Regex example: \/logs\/(?P<user>.*?)\/(?P<service>.*?)\/app\.log

      • Sample matched path: /logs/userA/svcA/app.log

      • Sample generated fields: __tag__:user:userA; __tag__:service:svcA
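The capture-group rules above can be tried out locally before saving the configuration. The following sketch reproduces the three documented cases with Python's `re` module, which uses the same `(?P<name>...)` syntax. The function name is hypothetical; requiring the expression to match the whole path, and treating a single named group the same way as multiple named groups, are assumptions.

```python
import re


def extract_topic_fields(path, pattern):
    """Map a file path to topic or tag fields based on capture groups."""
    regex = re.compile(pattern)
    match = regex.fullmatch(path)
    if match is None:
        return {}
    if regex.groupindex:
        # Named groups: the group name becomes the tag key.
        return {f"__tag__:{name}": match.group(name) for name in regex.groupindex}
    if regex.groups == 1:
        # A single unnamed group becomes the log topic.
        return {"__topic__": match.group(1)}
    # Several unnamed groups: tags keyed by the group index.
    return {
        f"__tag__:__topic_{i}__": match.group(i)
        for i in range(1, regex.groups + 1)
    }
```

For example, `extract_topic_fields("/logs/userA/svcA/app.log", r"\/logs\/(?P<user>.*?)\/(?P<service>.*?)\/app\.log")` yields the `user` and `service` tags from the named-group case above.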

Step 3: Query and analysis

After completing log processing and plugin configuration, click Next to go to the query and analysis configuration page.

  • The full-text index is enabled by default and supports keyword searches on raw log content.

  • To query precisely by field, click Automatic Index Generation after the preview data loads. Log Service then generates a field index based on the first entry in the preview data.

After completing the configuration, click Next to finish the collection process.

Step 4: Verify collection results

After the configuration is applied, click Query / Analysis on the Query and Analysis page of the target Logstore to view the collected log data.

FAQ

Lifecycle of a one-time collection configuration

After a one-time collection configuration is created, it follows the lifecycle below:

  • Configuration distribution window: LoongCollector can pull the configuration within 5 minutes of its creation. After 5 minutes, new LoongCollector instances cannot obtain this configuration.

  • Automatic configuration deletion: The configuration is automatically deleted 7 days after creation.

  • Task execution: The collection task must be completed within the execution timeout. If the task exceeds this time, the system forcibly stops it.
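The time limits above can be turned into concrete deadlines when planning a rollout. This is a minimal sketch with hypothetical names; it assumes that both windows are measured from the configuration's creation time and does not model the reset that occurs when a configuration is updated.

```python
from datetime import datetime, timedelta

DISTRIBUTION_WINDOW = timedelta(minutes=5)  # agents can pull the configuration
AUTO_DELETE_AFTER = timedelta(days=7)       # configuration is deleted


def lifecycle_deadlines(created_at):
    """Return the key deadlines for a one-time collection configuration."""
    return {
        "distribution_closes": created_at + DISTRIBUTION_WINDOW,
        "config_deleted": created_at + AUTO_DELETE_AFTER,
    }


def can_receive_config(created_at, agent_ready_at):
    """True if an agent that becomes ready at agent_ready_at is in time."""
    return created_at <= agent_ready_at <= created_at + DISTRIBUTION_WINDOW
```

A server that joins the machine group four minutes after creation is still in time; one that joins after six minutes is not.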

One-time vs. legacy file collection

The legacy historical file collection method is no longer recommended. Use the new one-time file collection feature for importing historical data. Compared to the legacy method, which required manually creating configuration files, the new feature significantly improves configuration efficiency, reliability, and observability. The following table provides a detailed comparison:

  • Configuration method

    • Legacy method: Create a local_event.json file on each host individually.

    • One-time collection: Create a configuration in the console or by API and deploy it to a machine group in batches.

  • File matching

    • Legacy method: Manually enter file paths and filenames.

    • One-time collection: Provides a streamlined configuration similar to input_file mode and supports blacklist filtering.

  • Progress monitoring

    • Legacy method: No status reporting or local logs.

    • One-time collection: Uses a checkpoint to track the collection progress, with granularity down to the current offset of each file.

  • Reliability

    • Legacy method: Low. It runs as a separate process with no resource controls or checkpoint mechanism.

    • One-time collection: High. It uses standard pipeline-level resource management, supports flow control to avoid impacting other collection tasks, and enables resumable transfer.

  • Flexibility

    • Legacy method: Low. You must use an existing collection configuration.

    • One-time collection: High. You can customize the collection configuration and even modify it during the task.

Machine group heartbeat is FAIL

  1. Check the user ID. If your server is not an ECS instance, or if your ECS instance and Project belong to different Alibaba Cloud accounts, check whether the correct user ID file exists in the specified directory. If it does not exist, create it manually by using one of the following commands.

    • Linux: Run the cd /etc/ilogtail/users/ && touch <uid> command to create a user ID file.

    • Windows: Go to the C:\LogtailData\users\ directory and create an empty file named <uid>.

  2. Check the machine group identifier. If you used a user-defined identifier when creating the machine group, check whether a file named user_defined_id exists in the specified directory. If it exists, verify that its content matches the user-defined identifier configured for the machine group.

    • Linux:

      # Configure the user-defined identifier. If the directory does not exist, create it manually.
      echo "user-defined-1" > /etc/ilogtail/user_defined_id

    • Windows: Create a file named user_defined_id in the C:\LogtailData directory and write the user-defined identifier to it. If the directory does not exist, create it manually.

  3. If both the user ID and machine group identifier are correct, see Troubleshoot LoongCollector (Logtail) machine group issues for further investigation.

Add a server to a machine group

To add a new server to an existing machine group, such as a newly deployed ECS instance or a self-managed server, follow these steps to associate it with the group and apply its collection configuration.

Important

If a server is added to a machine group more than 5 minutes after the one-time collection configuration is created, it will not receive the configuration. You can view the countdown timer at the top of the collection configuration page for the exact remaining time.

Prerequisites

  • An existing machine group.

  • LoongCollector is installed on the new server.

Procedure

  1. View the machine group identifier of the target machine group.

    1. In the target Project, click Resource > Machine Groups in the left-side navigation pane.

    2. On the Machine Groups page, click the name of the target machine group.

    3. On the machine group configuration page, view the machine group identifier.

  2. Perform one of the following actions based on the identifier type.

    Note

    A single machine group cannot contain both Linux and Windows servers. Do not configure the same user-defined identifier on both Linux and Windows servers. You can configure multiple user-defined identifiers on a single server by separating them with a line break.

    • Type 1: The machine group identifier is an IP address.

      1. On the server, run the following command to open the app_info.json file and view the value of the ip parameter.

        cat /usr/local/ilogtail/app_info.json

      2. On the configuration page of the target machine group, click Modify and enter the IP address of the server. If you have multiple IP addresses, separate them with line breaks.

      3. After you complete the configuration, click Save and check the heartbeat status. If the status is OK, the system automatically applies the machine group's collection configuration to the server.

        If the heartbeat status is FAIL, see Machine group heartbeat is FAIL for further troubleshooting.

    • Type 2: The machine group identifier is a user-defined identifier.

      Based on the operating system, write the user-defined identifier string that matches the target machine group to the specified file.

      If the directory does not exist, create it manually. The file path and name are fixed and cannot be customized.

      • Linux: Write the user-defined identifier to the /etc/ilogtail/user_defined_id file.

      • Windows: Write the user-defined identifier to the C:\LogtailData\user_defined_id file.
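When a heartbeat stays at FAIL, it can be useful to confirm what the identifier file actually contains. The sketch below is a hypothetical local check, not a Log Service tool: it reads the file at a path you pass in and tests whether the machine group's identifier is one of its lines, following the rule that multiple identifiers are separated by line breaks.

```python
from pathlib import Path


def server_identifiers(id_file):
    """Return the non-empty lines of a user_defined_id file, if it exists."""
    path = Path(id_file)
    if not path.is_file():
        return []
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def matches_group(id_file, group_identifier):
    """True if the file lists the machine group's user-defined identifier."""
    return group_identifier in server_identifiers(id_file)
```

On Linux, a call such as `matches_group("/etc/ilogtail/user_defined_id", "user-defined-1")` checks the file written by the echo command shown in the heartbeat troubleshooting steps.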

Appendix: Native processing plugins

Regular expression parsing

Extract log fields using a regular expression and parse the log into key-value pairs. Each field can be independently queried and analyzed.

Example:

Raw log without any processing:

    127.0.0.1 - - [16/Aug/2024:14:37:52 +0800] "GET /wp-admin/admin-ajax.php?action=rest-nonce HTTP/1.1" 200 41 "http://www.example.com/wp-admin/post-new.php?post_type=page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"

Using the regular expression parsing plugin:

    body_bytes_sent: 41
    http_referer: http://www.example.com/wp-admin/post-new.php?post_type=page
    http_user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0
    remote_addr: 127.0.0.1
    remote_user: -
    request_method: GET
    request_protocol: HTTP/1.1
    request_uri: /wp-admin/admin-ajax.php?action=rest-nonce
    status: 200
    time_local: 16/Aug/2024:14:37:52 +0800

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Parsing (Regex Mode):

  • Regular Expression: The expression used to match logs. Generate it automatically or enter it manually:

    • Automatic generation:

      • Click Generate.

      • In the Log Sample, select the log content to extract.

      • Click Generate Regular Expression.

        image

    • Manual entry: Click Manually Enter Regular Expression and enter an expression based on the log format.

    After configuration, click Validate to test whether the regular expression can correctly parse the log content.

  • Extracted Field: The field name (Key) that corresponds to the extracted log content (Value).

  • For other parameters, see the common parameters described in Scenario 2: Processing structured logs.
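Before pasting an expression into the console, you can test it locally. The sketch below uses one possible expression for the sample log above; it is written for this illustration and is not the expression the console would generate. Named groups stand in for the Extracted Field names.

```python
import re

# One possible expression for the sample access log (an assumption).
ACCESS_LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<request_uri>\S+) (?P<request_protocol>\S+)" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

SAMPLE_LOG = (
    '127.0.0.1 - - [16/Aug/2024:14:37:52 +0800] '
    '"GET /wp-admin/admin-ajax.php?action=rest-nonce HTTP/1.1" 200 41 '
    '"http://www.example.com/wp-admin/post-new.php?post_type=page" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"'
)


def parse_access_log(line):
    """Return extracted fields, or None if the line does not fully match."""
    match = ACCESS_LOG_RE.fullmatch(line)
    return match.groupdict() if match else None


parsed = parse_access_log(SAMPLE_LOG)
```

Using a full match mirrors the Validate step: an expression that covers only part of the line is treated as a failure here.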

Delimiter parsing

Structure log content using a separator to parse it into multiple key-value pairs. Both single-character and multi-character separators are supported.

Example:

Raw log without any processing:

    05/May/2025:13:30:28,10.10.*.*,"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1",200,18204,aliyun-sdk-java

Fields split by the specified character (,):

    ip: 10.10.*.*
    request: POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1
    size: 18204
    status: 200
    time: 05/May/2025:13:30:28
    user_agent: aliyun-sdk-java

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Parsing (Delimiter Mode):

  • Delimiter: Specifies the character used to split log content.

    Example: For a CSV file, select Custom and enter a comma (,).

  • Quote: If a field value contains the separator, you must enclose the field value in quotes to prevent incorrect splitting.

  • Extracted Field: Specify the field name (Key) for each column in the order that they appear. The rules are as follows:

    • Field names can contain only letters, digits, and underscores (_).

    • Must start with a letter or an underscore (_).

    • Maximum length: 128 bytes.

  • For other parameters, see the common parameters described in Scenario 2: Processing structured logs.
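The interaction between the delimiter, the quote character, and the ordered field names can be sketched with Python's csv module. This is an illustration under assumptions, not the processor's code: the helper names are hypothetical, the sample line in the usage note is invented so that a quoted field contains the separator, and csv only supports single-character delimiters, unlike the processor.

```python
import csv
import re

# Letters, digits, and underscores; must start with a letter or underscore.
FIELD_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def valid_field_name(name):
    """Check an extracted field name against the rules listed above."""
    return (
        FIELD_NAME_RE.fullmatch(name) is not None
        and len(name.encode("utf-8")) <= 128
    )


def split_log(line, keys, delimiter=",", quote='"'):
    """Split one log line and pair the columns with keys, in order."""
    values = next(csv.reader([line], delimiter=delimiter, quotechar=quote))
    return dict(zip(keys, values))
```

For a line such as `05/May/2025:13:30:28,10.10.*.*,"POST /PutData?Topic=raw,extra HTTP/1.1",200,18204,aliyun-sdk-java` with the keys time, ip, request, status, size, and user_agent, the quoted request stays in one field even though it contains a comma.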

JSON parsing

Structure an Object-type JSON log by parsing it into key-value pairs.

Example:

Raw log without any processing:

    {"url": "POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=U0Ujpek********&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1", "ip": "10.200.98.220", "user-agent": "aliyun-sdk-java", "request": {"status": "200", "latency": "18204"}, "time": "05/Jan/2025:13:30:28"}

Automatic extraction of standard JSON key-value pairs:

    ip: 10.200.98.220
    request: {"status": "200", "latency": "18204"}
    time: 05/Jan/2025:13:30:28
    url: POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=U0Ujpek********&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1
    user-agent: aliyun-sdk-java

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Parsing (JSON Mode):

  • Original Field: The field that contains the raw log to be parsed. The default value is content.

  • For other parameters, see the description of common configuration parameters in Use case 2: Structured logs.
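As a rough illustration of first-level JSON parsing, the sketch below uses Python's json module to turn an object into key-value pairs while leaving nested values serialized, as in the example output above. The helper name parse_json_top_level is hypothetical, and the url value is abbreviated for readability.

```python
import json

# Abbreviated variant of the sample log above (the url value is shortened).
raw = (
    '{"url": "POST /PutData HTTP/1.1", "ip": "10.200.98.220", '
    '"user-agent": "aliyun-sdk-java", '
    '"request": {"status": "200", "latency": "18204"}, '
    '"time": "05/Jan/2025:13:30:28"}'
)


def parse_json_top_level(content):
    """Expand only the first level of a JSON object into string fields.

    Hypothetical helper for illustration; nested objects stay as JSON text.
    """
    obj = json.loads(content)
    if not isinstance(obj, dict):
        raise ValueError("expected an Object-type JSON log")
    fields = {}
    for key, value in obj.items():
        fields[key] = value if isinstance(value, str) else json.dumps(value)
    return fields


fields = parse_json_top_level(raw)
for key in sorted(fields):
    print(f"{key}: {fields[key]}")
```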

Nested JSON parsing

Parse a nested JSON log into key-value pairs by specifying the expansion depth.

Example:

Raw log without any processing:

{"s_key":{"k1":{"k2":{"k3":{"k4":{"k51":"51","k52":"52"},"k41":"41"}}}}}

Expansion depth: 0, using expansion depth as a prefix:

0_s_key_k1_k2_k3_k41:41
0_s_key_k1_k2_k3_k4_k51:51
0_s_key_k1_k2_k3_k4_k52:52

Expansion depth: 1, using expansion depth as a prefix:

1_s_key:{"k1":{"k2":{"k3":{"k4":{"k51":"51","k52":"52"},"k41":"41"}}}}

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Extended Processor > Expand JSON Field:

  • Original Field: Specifies the name of the source field to expand, such as content.

  • JSON Expansion Depth: The expansion depth of the JSON object, where 0 (the default) indicates full expansion, 1 indicates expansion of the current level, and so on.

  • Character to Concatenate Expanded Keys: The separator for field names when a JSON object is expanded. The default value is an underscore (_).

  • Name Prefix of Expanded Keys: The prefix for field names after JSON expansion.

  • Expand Array: Expands an array into key-value pairs with indexes.

    Example: {"k":["a","b"]} is expanded to {"k[0]":"a","k[1]":"b"}.

    To rename the expanded fields (for example, from prefix_s_key_k1 to new_field_name), add a rename fields plugin afterward to complete the mapping.
  • For other parameters, see the description of common configuration parameters in Use case 2: Structured logs.
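The interaction between expansion depth, key separator, prefix, and array expansion can be sketched in Python as follows. This is an approximation written for illustration under the semantics described above (0 means full expansion), not the plugin's source code. The function expand_json and its parameter names are assumptions, and the prefix is passed explicitly as "0_" or "1_" to reproduce the sample output.

```python
import json


def expand_json(value, depth=0, sep="_", prefix="", expand_array=False):
    """Flatten a parsed JSON value into {key: string} pairs.

    depth=0 expands every level; depth=n stops after n levels.
    Hypothetical helper for illustration only.
    """
    out = {}

    def walk(node, key, level):
        is_container = isinstance(node, dict) or (
            expand_array and isinstance(node, list)
        )
        at_limit = depth != 0 and level >= depth
        if not is_container or at_limit:
            # Leaf value, or depth limit reached: store it as text.
            text = node if isinstance(node, str) else json.dumps(
                node, separators=(",", ":")
            )
            out[prefix + key] = text
            return
        if isinstance(node, dict):
            for child_key, child in node.items():
                walk(child, f"{key}{sep}{child_key}" if key else child_key, level + 1)
        else:
            for index, child in enumerate(node):
                walk(child, f"{key}[{index}]", level + 1)

    walk(value, "", 0)
    return out


raw = '{"s_key":{"k1":{"k2":{"k3":{"k4":{"k51":"51","k52":"52"},"k41":"41"}}}}}'
doc = json.loads(raw)

full = expand_json(doc, depth=0, prefix="0_")
one_level = expand_json(doc, depth=1, prefix="1_")
arrays = expand_json({"k": ["a", "b"]}, expand_array=True)
```

With depth=1 only the top-level key is produced and its value remains a JSON string, which is why the second sample result contains a single field.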

JSON array parsing

Use the json_extract function to extract JSON objects from a JSON array.

Example:

Raw log without any processing:

[{"key1":"value1"},{"key2":"value2"}]

Extract JSON array structure:

json1:{"key1":"value1"}
json2:{"key2":"value2"}

Procedure: In the Processor Configurations section of the Logtail Configuration page, switch the Processing Mode to SPL, configure the SPL Statement, and use the json_extract function to extract JSON objects from the JSON array.

Example: Extract elements from the JSON array in the log field content and store the results in new fields json1 and json2.

* | extend json1 = json_extract(content, '$[0]'), json2 = json_extract(content, '$[1]')
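To check locally what the two JSON paths in the statement select, the following Python sketch performs an equivalent element lookup. It only mirrors the '$[0]' and '$[1]' lookups from the statement above and does not reproduce SPL itself. The helper extract_element is hypothetical, and returning None for an out-of-range index is a choice made for this sketch.

```python
import json


def extract_element(content, index):
    """Return the array element at `index` as compact JSON text, or None.

    Hypothetical helper that mimics a '$[index]' lookup for illustration.
    """
    items = json.loads(content)
    if not isinstance(items, list) or index >= len(items):
        return None
    return json.dumps(items[index], separators=(",", ":"))


content = '[{"key1":"value1"},{"key2":"value2"}]'
json1 = extract_element(content, 0)
json2 = extract_element(content, 1)
print(f"json1:{json1}")
print(f"json2:{json2}")
```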

Apache log parsing

Structure the log content into multiple key-value pairs based on the definition in the Apache log configuration file.

Example:

Raw log without any processing:

192.168.1.10 - - [08/May/2024:15:30:28 +0800] "GET /index.html HTTP/1.1" 200 1234 "https://www.example.com/referrer" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.X.X Safari/537.36"

Apache Common Log Format combined parsing:

http_referer:https://www.example.com/referrer
http_user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.X.X Safari/537.36
remote_addr:192.168.1.10
remote_ident:-
remote_user:-
request_method:GET
request_protocol:HTTP/1.1
request_uri:/index.html
response_size_bytes:1234
status:200
time_local:[08/May/2024:15:30:28 +0800]

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Parsing (Apache Mode):

  • Log Format: Select combined.

  • APACHE LogFormat Configuration: This field is automatically populated based on the selected Log Format.

    Important

    Make sure to verify the auto-filled content to ensure it is exactly the same as the LogFormat defined in your server's Apache configuration file (usually located at /etc/apache2/apache2.conf).

  • For other parameters, see the description of common configuration parameters in Use case 2: Structured logs.
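For readers who want to see how a combined-format line maps to the fields listed in the example, here is a minimal Python sketch based on a regular expression. The pattern is an assumption written to match the sample line and the field names shown above. It is not the expression that the Apache mode processor uses internally.

```python
import re

# Assumed pattern for the combined format; group names follow the sample output.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) (?P<remote_ident>\S+) (?P<remote_user>\S+) '
    r'(?P<time_local>\[[^\]]+\]) '
    r'"(?P<request_method>\S+) (?P<request_uri>\S+) (?P<request_protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<response_size_bytes>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return the named fields of one combined-format line, or None if no match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


line = (
    '192.168.1.10 - - [08/May/2024:15:30:28 +0800] '
    '"GET /index.html HTTP/1.1" 200 1234 '
    '"https://www.example.com/referrer" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/123.0.X.X Safari/537.36"'
)

fields = parse_combined(line)
for key in sorted(fields):
    print(f"{key}:{fields[key]}")
```

A line that does not follow the configured LogFormat returns None here, which illustrates why the auto-filled configuration has to match the server's actual LogFormat.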

IIS log parsing

Structure the log content into multiple key-value pairs based on the IIS log format definition.

Comparison example:

Raw log:

#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken

Parsed result (Microsoft IIS server-specific format):

c-ip: cs-username
cs-bytes: sc-substatus
cs-method: cs-method
cs-uri-query: cs-uri-query
cs-uri-stem: cs-uri-stem
cs-username: s-port
date: #Fields:
s-computername: s-sitename
s-ip: s-ip
s-sitename: time
sc-bytes: sc-status
sc-status: c-ip
sc-win32-status: cs (User-Agent)
time: date
time-taken: sc-win32-status

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Parsing (IIS Mode):

  • Log Format: Select the log format for your IIS server.

    • IIS: The log file format for Microsoft Internet Information Services.

    • NCSA: Common Log Format.

    • W3C: The W3C Extended Log File Format.

  • IIS Configuration Fields: When you select IIS or NCSA, SLS uses the default IIS configuration fields. When you select W3C, you must set the fields to the value of the logExtFileFlags parameter in your IIS configuration file. For example:

    logExtFileFlags="Date, Time, ClientIP, UserName, SiteName, ComputerName, ServerIP, Method, UriStem, UriQuery, HttpStatus, Win32Status, BytesSent, BytesRecv, TimeTaken, ServerPort, UserAgent, Cookie, Referer, ProtocolVersion, Host, HttpSubStatus"
  • For other parameters, see the description of common configuration parameters in Use case 2: Structured logs.
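In the W3C Extended Log File Format, the #Fields directive names the space-separated columns of each data line. The sketch below pairs a data line with those names in Python. The data line is hypothetical and invented purely for illustration, the helper parse_w3c is not an SLS API, and the sketch does not reproduce the IIS mode processor.

```python
# #Fields directive taken from the raw log in the example above.
fields_directive = (
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus "
    "sc-win32-status sc-bytes cs-bytes time-taken"
)

# Hypothetical data line, invented for illustration only.
line = (
    "2025-05-05 13:30:28 W3SVC1 10.0.0.1 GET /index.html - 80 - "
    "10.0.0.2 Mozilla/5.0 200 0 0 1234 567 15"
)


def parse_w3c(directive, data_line):
    """Pair the space-separated values of a data line with the #Fields names."""
    names = directive.split()[1:]  # drop the leading "#Fields:" token
    values = data_line.split(" ")
    if len(values) != len(names):
        raise ValueError("column count does not match the #Fields directive")
    return dict(zip(names, values))


record = parse_w3c(fields_directive, line)
for name in sorted(record):
    print(f"{name}: {record[name]}")
```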

Data masking

Mask sensitive data in logs.

Example:

Raw log without any processing:

[{'account':'1812213231432969','password':'04a23f38'}, {'account':'1812213685634','password':'123a'}]

Masking result:

[{'account':'1812213231432969','password':'********'}, {'account':'1812213685634','password':'********'}]

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Data Masking:

  • Original Field: The field that contains the log content before parsing.

  • Data Masking Method:

    • const: Replaces sensitive content with a constant string.

    • md5: Replaces sensitive content with its MD5 hash.

  • Replacement String: If Data Masking Method is set to const, enter a string to replace the sensitive content.

  • Content Expression that Precedes Replaced Content: The regular expression that matches the text immediately before the sensitive content. The expression must be written in RE2 syntax.

  • Content Expression to Match Replaced Content: The regular expression used to match sensitive content. The expression must be written in RE2 syntax.
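The way the two expressions work together can be sketched in Python: one expression anchors on the text before the sensitive value, and the other matches the value to replace. This sketch uses Python's re module rather than RE2, and the function mask and its parameter names are assumptions for illustration. The expressions 'password':' and [^']* are example values chosen to reproduce the masking result shown above.

```python
import hashlib
import re


def mask(text, prefix_expr, match_expr, method="const", replacement="********"):
    """Replace whatever `match_expr` matches directly after `prefix_expr`.

    Hypothetical helper: method "const" writes `replacement`, "md5" writes
    the MD5 hex digest of the matched content.
    """
    pattern = re.compile(f"(?P<prefix>{prefix_expr})(?P<secret>{match_expr})")

    def substitute(match):
        if method == "md5":
            secret = match.group("secret").encode("utf-8")
            masked = hashlib.md5(secret).hexdigest()
        else:
            masked = replacement
        # Keep the preceding text and swap only the sensitive part.
        return match.group("prefix") + masked

    return pattern.sub(substitute, text)


raw = (
    "[{'account':'1812213231432969','password':'04a23f38'}, "
    "{'account':'1812213685634','password':'123a'}]"
)

masked_const = mask(raw, r"'password':'", r"[^']*")
masked_md5 = mask(raw, r"'password':'", r"[^']*", method="md5")
print(masked_const)
```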

Time parsing

Parse the time field in the log and set the parsing result as the log's __time__ field.

Example:

Raw log without any processing:

{"level":"INFO","timestamp":"2025-09-23T19:11:47+0800","cluster":"yilu-cluster-0728","message":"User logged in successfully","userId":"user-123"}

Time parsing:

[Image: time parsing result]

Procedure: In the Processor Configurations section of the Logtail Configuration page, click Add Processor and select Native Processor > Time Parsing:

  • Original Field: The field that contains the log content before parsing.

  • Time Format: Set the time format that corresponds to the timestamps in the log.

  • Time Zone: Select the time zone for the log time field. By default, this is the time zone of the environment where the LoongCollector (Logtail) process is running.
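To verify a time format against a sample timestamp before saving the configuration, a quick local check can help. The sketch below parses the timestamp from the example with Python's datetime module and converts it to a Unix timestamp in seconds. The format string %Y-%m-%dT%H:%M:%S%z is an assumption derived from the sample value, and parse_log_time is a hypothetical helper, not the processor's implementation.

```python
import json
from datetime import datetime

raw = (
    '{"level":"INFO","timestamp":"2025-09-23T19:11:47+0800",'
    '"cluster":"yilu-cluster-0728","message":"User logged in successfully",'
    '"userId":"user-123"}'
)


def parse_log_time(value, time_format="%Y-%m-%dT%H:%M:%S%z"):
    """Parse a timestamp string and return Unix seconds.

    The value carries its own UTC offset (+0800), so the result does not
    depend on the time zone of the machine that runs this code.
    """
    return int(datetime.strptime(value, time_format).timestamp())


log_time = parse_log_time(json.loads(raw)["timestamp"])
print(log_time)
```

When a timestamp carries no offset, the Time Zone setting determines how it is interpreted, so the same string can map to different points in time on hosts in different time zones.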