This topic describes how Logtail collects logs. The log collection process consists of the following steps: monitor log files, read log files, process logs, filter logs, aggregate logs, and send logs.

Procedure

Logtail performs the following steps to collect log data:

  1. Monitor log files
  2. Read log files
  3. Process logs
  4. Filter logs
  5. Aggregate logs
  6. Send logs

Monitor log files

After you install Logtail on servers and create a Logtail configuration in the Log Service console, the configuration is synchronized to the servers in real time. Logtail monitors log files of the servers based on the configuration. Logtail scans log directories and files based on the log file path and the maximum directory depth that you specify for monitoring in the configuration.
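The scan can be sketched as a depth-limited directory walk. This is a minimal illustration, not Logtail's implementation: it assumes that the log file path is split into a base directory and a wildcard file name pattern (for example, *.log), and that a maximum directory depth of 0 means that only the base directory itself is searched. The function name scan_log_files is hypothetical.

```python
import fnmatch
import os


def scan_log_files(base_dir: str, file_pattern: str, max_depth: int) -> list[str]:
    """Return files under base_dir whose names match file_pattern,
    descending at most max_depth directory levels below base_dir."""
    matches = []
    base_depth = base_dir.rstrip(os.sep).count(os.sep)
    for root, dirs, files in os.walk(base_dir):
        depth = root.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # Clearing dirs in place stops os.walk from descending further.
            dirs[:] = []
        for name in files:
            if fnmatch.fnmatch(name, file_pattern):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

With a maximum depth of 1, files in the base directory and in its immediate subdirectories are returned, but files two levels down are not.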

If the log files on the servers in a machine group are not updated after the Logtail configuration is applied to the machine group, the files are considered historical log files, and Logtail does not collect them. When a log file is updated, Logtail reads the file and sends the collected logs to Log Service. For more information about how to collect historical log files, see Import historical logs.
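The rule for historical log files can be illustrated with a simple timestamp comparison. This sketch assumes that a file counts as updated when its modification time is later than the moment the configuration was applied. Logtail actually detects updates through its file monitoring mechanism, so files_to_collect and config_applied_at are hypothetical names used only for illustration.

```python
import os


def files_to_collect(paths: list[str], config_applied_at: float) -> list[str]:
    """Keep only files modified after the configuration was applied.

    Files last modified at or before config_applied_at are treated as
    historical log files and skipped.
    """
    return [p for p in paths if os.path.getmtime(p) > config_applied_at]
```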

Logtail registers event listeners on the directories from which log files are collected and also polls the log files in those directories on a regular basis. Combining event listening with polling helps ensure that logs are collected promptly and reliably. On Linux servers, Logtail uses inotify to listen for directory events.
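The polling half of this approach can be sketched by comparing file snapshots taken at two successive polls. The sketch assumes that a change in file size or modification time means that the file was updated. The names snapshot and changed_files are hypothetical, and the inotify-based event path is not shown.

```python
import os


def snapshot(paths: list[str]) -> dict[str, tuple[int, int]]:
    """Record (size, mtime in nanoseconds) for each file that still exists."""
    state = {}
    for p in paths:
        try:
            st = os.stat(p)
        except FileNotFoundError:
            continue  # the file was removed between listing and stat
        state[p] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(previous: dict, current: dict) -> list[str]:
    """Return files that are new or whose size or mtime changed since the last poll."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)
```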

Read log files

After Logtail detects updated log files, Logtail reads the log files.
  • The first time Logtail reads a log file, Logtail can read up to 1,024 KB of data in the log file by default.
    • If the file size is less than 1,024 KB, Logtail reads data from the beginning of the file.
    • If the file size is greater than 1,024 KB, Logtail reads the last 1,024 KB of data in the file.
    Note Log Service allows you to specify the data size that Logtail can read from a log file the first time Logtail reads the file.
    • Console mode: Modify the First Collection Size parameter in the Advanced Options section on the Logtail Config page. For more information, see Advanced Options.
    • API mode: Modify the tail_size_kb parameter in the Logtail configuration. For more information, see advanced.
  • If Logtail has read a log file before, Logtail resumes reading the file from the previous checkpoint.
  • Logtail can read up to 512 KB of data at a time. Make sure that the size of each log in a log file does not exceed 512 KB.
Note If you change the system time on the server, you must restart Logtail. Otherwise, the log time becomes incorrect and logs are dropped.
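The offset rules above can be sketched as follows. This is an illustration under stated assumptions: a checkpoint is modeled as a plain byte offset (real checkpoints may carry more state), 1,024 KB and 512 KB are treated as 1024 * 1024 and 512 * 1024 bytes, and start_offset and read_chunks are hypothetical names.

```python
from typing import Iterator, Optional, Tuple

FIRST_COLLECTION_SIZE = 1024 * 1024  # default first-read size (1,024 KB)
MAX_READ_SIZE = 512 * 1024  # at most 512 KB per read


def start_offset(file_size: int, checkpoint: Optional[int],
                 first_collection_size: int = FIRST_COLLECTION_SIZE) -> int:
    """Choose the byte offset at which to start reading a log file."""
    if checkpoint is not None:
        return checkpoint  # file was read before: resume from the checkpoint
    # First read: the whole file if it is small, otherwise only its tail.
    return max(0, file_size - first_collection_size)


def read_chunks(path: str, offset: int,
                max_read: int = MAX_READ_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (new_offset, data) pairs, reading at most max_read bytes at a time."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            data = f.read(max_read)
            if not data:
                break
            offset += len(data)
            yield offset, data  # new_offset could be stored as the next checkpoint
```

For a 3,000 KB file that has never been read, start_offset returns the offset of the last 1,024 KB; for a file that has a checkpoint, it returns the checkpoint unchanged.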

Process logs

When Logtail reads a log file, Logtail splits the data into individual logs, parses each log, and then sets the time of each log.
  • Split data into logs

    If you specify a regular expression to match the first line of a log, Logtail uses the expression to find where each log begins, so one log can span multiple lines. If you do not specify a regular expression, each line is processed as a separate log.

  • Parse logs
    Logtail parses each log based on the collection mode that you specify in the Logtail configuration.
    Note If you specify complex regular expressions, Logtail may consume an excessive amount of CPU resources. We recommend that you specify regular expressions that allow Logtail to parse logs in an efficient manner.
    If Logtail fails to parse a log, Logtail handles the failure based on the setting of the Drop Failed to Parse Logs parameter in the Logtail configuration.
    • If you turn on Drop Failed to Parse Logs, Logtail drops the log and reports an error.
    • If you turn off Drop Failed to Parse Logs, Logtail uploads the log with a single field whose key is raw_log and whose value is the original log content.
  • Configure the time field for a log
    • If you do not configure the time field for a log, the log time is the time when the log is parsed.
    • If you configure the time field for a log, the manner in which the log is processed varies in the following scenarios:
      • If the difference between the time when the log is generated and the current time is within 12 hours, the log time is extracted from the parsed log fields.
      • If the difference between the time when the log is generated and the current time is greater than 12 hours, the log is dropped and an error is reported.
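The three processing steps can be sketched together in Python. The sketch assumes that the first-line regular expression must match the whole first line, that the parsing regular expression uses capture groups that map to field names in order, and that the time field is a local-time string in a known strptime format. The functions split_logs, parse_log, and log_time are hypothetical; Logtail's real parsers and error reporting are not modeled.

```python
import re
import time
from typing import Optional

MAX_TIME_SKEW_SECONDS = 12 * 3600  # logs more than 12 hours from now are dropped


def split_logs(text: str, first_line_regex: Optional[str]) -> list[str]:
    """Group lines into logs; a line that fully matches first_line_regex starts a new log."""
    lines = text.splitlines()
    if first_line_regex is None:
        return lines  # no first-line regex: every line is its own log
    pattern = re.compile(first_line_regex)
    logs: list[str] = []
    for line in lines:
        if pattern.fullmatch(line) or not logs:
            # New log. Leading lines before the first match also start a log here.
            logs.append(line)
        else:
            logs[-1] += "\n" + line  # continuation line of the current log
    return logs


def parse_log(log: str, regex: str, keys: list[str],
              drop_failed: bool) -> Optional[dict[str, str]]:
    """Extract fields with a regex; on failure drop the log or keep it as raw_log."""
    match = re.fullmatch(regex, log, re.DOTALL)
    if match is None:
        if drop_failed:
            return None  # Drop Failed to Parse Logs is on: discard the log
        return {"raw_log": log}  # switch is off: upload the original content
    return dict(zip(keys, match.groups()))


def log_time(fields: dict[str, str], time_key: Optional[str], time_format: str,
             now: Optional[float] = None) -> Optional[int]:
    """Return the log time in seconds, or None if the log should be dropped."""
    now = time.time() if now is None else now
    if time_key is None:
        return int(now)  # no time field configured: use the parse time
    parsed = time.mktime(time.strptime(fields[time_key], time_format))  # local time
    if abs(now - parsed) > MAX_TIME_SKEW_SECONDS:
        return None  # more than 12 hours from the current time: drop
    return int(parsed)
```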

Filter logs

After logs are processed, Logtail filters the logs based on the specified filter conditions.

  • If you do not specify filter conditions in the Filter Configuration field, the logs are not filtered.
  • If you specify filter conditions in the Filter Configuration field, Logtail checks the fields of each log against the conditions and collects only the logs that meet them.
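Filtering can be sketched as follows, assuming that each filter condition pairs a field name with a regular expression that must match the whole field value, that a log must satisfy every condition to be collected, and that a log missing a filtered field is excluded. These semantics and the name passes_filters are assumptions made for illustration.

```python
import re


def passes_filters(log: dict[str, str], filters: dict[str, str]) -> bool:
    """Return True if the log should be collected.

    An empty filter set collects every log. Otherwise, every configured field
    must exist in the log and its value must fully match the field's regex.
    """
    for key, regex in filters.items():
        value = log.get(key)
        if value is None or re.fullmatch(regex, value) is None:
            return False
    return True
```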

Aggregate logs

To reduce the number of network requests, Logtail caches processed and filtered logs for a short period, aggregates them, and then sends them to Log Service in batches. Logtail sends the aggregated logs as soon as any of the following conditions is met:

  • The aggregation duration exceeds 3 seconds.
  • The number of aggregated logs exceeds 4,096.
  • The total size of aggregated logs exceeds 512 KB.
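The flush rules can be sketched as a small buffer class. Aggregator, tick, and the injectable clock are hypothetical names. The sketch checks the count and size limits each time a log is added and relies on the caller to invoke tick periodically for the 3-second limit. It follows the wording above literally, so a batch is sent once a limit is exceeded rather than just before.

```python
import time
from typing import Callable, Optional

MAX_AGE_SECONDS = 3.0
MAX_COUNT = 4096
MAX_BYTES = 512 * 1024


class Aggregator:
    def __init__(self, send: Callable[[list[bytes]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.send = send  # callback that ships one batch to Log Service
        self.clock = clock
        self.buffer: list[bytes] = []
        self.size = 0
        self.started: Optional[float] = None  # when the current batch began

    def add(self, log: bytes) -> None:
        if self.started is None:
            self.started = self.clock()
        self.buffer.append(log)
        self.size += len(log)
        if len(self.buffer) > MAX_COUNT or self.size > MAX_BYTES:
            self.flush()

    def tick(self) -> None:
        """Call periodically so that a batch older than 3 seconds is sent."""
        if self.started is not None and self.clock() - self.started > MAX_AGE_SECONDS:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(self.buffer)
        self.buffer, self.size, self.started = [], 0, None
```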

Send logs

Logtail sends aggregated logs to Log Service. If a request fails, Logtail either retries or drops the log packets, depending on the HTTP status code.

HTTP status code | Description | Handling method
--- | --- | ---
401 | The current account does not have the permissions to collect data. You must grant the account the permissions to access data. For more information, see Configure the permission assistant feature. | Logtail drops the log packets.
404 | The project or Logstore that is specified in the Logtail configuration does not exist. | Logtail drops the log packets.
403 | The shard quota is exhausted. | Logtail tries again 3 seconds later.
500 | A server exception occurs. | Logtail tries again 3 seconds later.
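The retry policy for these status codes can be expressed as a lookup from status code to action. The sketch covers only 401, 404, 403, and 500, plus 2xx success; Action, handle_response, and the treatment of unlisted codes are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SENT = "sent"
    DROP = "drop"
    RETRY = "retry"


RETRY_DELAY_SECONDS = 3


def handle_response(status: int) -> tuple[Action, int]:
    """Map a Log Service HTTP status code to (action, seconds to wait before retrying)."""
    if 200 <= status < 300:
        return Action.SENT, 0
    if status in (401, 404):
        return Action.DROP, 0  # missing permissions, or project/Logstore not found
    if status in (403, 500):
        return Action.RETRY, RETRY_DELAY_SECONDS  # shard quota exhausted, or server error
    raise ValueError(f"status {status} is not covered by this sketch")
```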
Note If you want to change the data transmission rate and the maximum number of concurrent connections, you can modify the max_bytes_per_sec and send_request_concurrency parameters in the Logtail startup configuration file. For more information, see Configure the startup parameters of Logtail.