This topic describes how Logtail collects logs. The log collection process includes monitoring, reading, processing, filtering, aggregating, and sending logs.

Monitor log files

After you install Logtail on servers and create Logtail configurations for log collection in the Log Service console, the configurations are synchronized to the servers in real time. Logtail then monitors the log files on the servers: it scans log directories and files based on the log path and the maximum depth of monitored directories that are specified in the configurations.
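To make the scanning rule concrete, the following Python sketch shows how a directory scan bounded by a log path, a file name pattern, and a maximum depth might work. It is an illustration only, not Logtail code; the path /var/log/app and the pattern *.log are placeholder values.

```python
import fnmatch
import os

def scan_log_files(log_path, file_pattern, max_depth):
    """Collect files under log_path that match file_pattern, descending at
    most max_depth directory levels below log_path."""
    matches = []
    base_depth = log_path.rstrip(os.sep).count(os.sep)
    for root, dirs, files in os.walk(log_path):
        depth = root.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            dirs[:] = []          # do not descend any further
        for name in files:
            if fnmatch.fnmatch(name, file_pattern):
                matches.append(os.path.join(root, name))
    return matches

# Example: scan /var/log/app up to 3 levels deep for *.log files (placeholder values).
print(scan_log_files("/var/log/app", "*.log", max_depth=3))
```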

If the log files on the servers in a machine group are not updated after the Logtail configurations for log collection are applied to the machine group, the files are considered historical log files, and Logtail does not collect them. If a log file is updated, Logtail reads the file and collects the logs to Log Service. For more information about how to collect historical log files, see Import historical logs.
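As a rough illustration of this rule, and not of Logtail's actual bookkeeping, a file whose last modification time predates the time at which the configuration was applied can be treated as historical and skipped, while files modified afterwards are collected.

```python
import os
import time

def is_historical(path, config_applied_at):
    """A file that has not been modified since the configuration was applied
    is treated as a historical file and is not collected."""
    return os.path.getmtime(path) <= config_applied_at

config_applied_at = time.time()   # when the configuration reached the server
# is_historical("/var/log/app/app.log", config_applied_at)  # placeholder path
```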

Logtail registers event listeners to monitor the directories from which log files are collected. The event listeners also poll the log files in these directories on a regular basis to ensure the timeliness and stability of log collection. On Linux servers, Inotify is used to monitor the directories, in combination with the polling of log files.
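The sketch below shows only the polling half of this mechanism in simplified form: it remembers each file's size and modification time and reports files that changed since the previous pass. It is not the actual listener implementation, and the paths are placeholders.

```python
import os
import time

def poll_for_updates(paths, interval=1.0):
    """Yield paths whose size or modification time changed since the previous poll."""
    last_seen = {}
    while True:
        for path in paths:
            try:
                stat = os.stat(path)
            except FileNotFoundError:
                continue
            signature = (stat.st_size, stat.st_mtime)
            if last_seen.get(path) != signature:
                last_seen[path] = signature
                yield path
        time.sleep(interval)

# Example usage with a placeholder path:
# for changed in poll_for_updates(["/var/log/app/app.log"]):
#     print("updated:", changed)
```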

Read log files

After Logtail detects that log files are updated, it reads the log files.
  • When Logtail reads a log file for the first time, it checks the size of the file.
    • If the file size is less than 1 MB, Logtail reads the file from the beginning of the file.
    • If the file size is greater than 1 MB, Logtail reads from the last 1 MB of data in the file.
  • If Logtail has read the log file before, it continues reading from the last checkpoint.
  • Logtail reads up to 512 KB of data at a time. Make sure that the size of each log entry in a log file does not exceed 512 KB. A minimal reading sketch is provided after the note below.
Note If you change the system time on the server, you must restart Logtail. Otherwise, the log time is incorrect and logs are dropped.
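The following Python sketch mirrors the reading rules above: on the first read of a file larger than 1 MB it starts from the last 1 MB, otherwise it resumes from the stored checkpoint, and it reads at most 512 KB per pass. The checkpoint dictionary is a simplified stand-in for Logtail's internal bookkeeping.

```python
import os

ONE_MB = 1024 * 1024
MAX_READ = 512 * 1024            # Logtail reads at most 512 KB at a time

def read_chunk(path, checkpoints):
    """Read up to 512 KB from path, resuming from the stored checkpoint."""
    size = os.path.getsize(path)
    if path not in checkpoints:
        # First read: start at the beginning for small files,
        # otherwise from the last 1 MB of the file.
        start = 0 if size < ONE_MB else size - ONE_MB
    else:
        start = checkpoints[path]
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(MAX_READ)
    checkpoints[path] = start + len(data)   # new checkpoint
    return data

checkpoints = {}
# data = read_chunk("/var/log/app/app.log", checkpoints)  # placeholder path
```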

Process log entries

When Logtail reads data from a log file, it splits the data into log entries (an entry can span multiple lines), parses each entry, and sets the time field for the entry.
  • Split log data into entries

    If you specify a regular expression to match the first line of a log entry, Logtail uses the expression to determine where each entry starts, so an entry can span multiple lines. If no such regular expression is specified, each line is processed as a separate log entry (see the first sketch after this list).

  • Parse log entries
    Logtail parses every log entry based on the collection mode that is specified in the Logtail configurations for log collection.
    Note If you configure complex regular expressions, Logtail may consume excessive CPU resources. Therefore, we recommend that you configure efficient regular expressions.
    If a log entry fails to be parsed, Logtail handles the log entry based on whether the Drop Failed to Parse Logs switch is turned on in the Logtail configurations for log collection:
    • If the Drop Failed to Parse Logs switch is turned on, Logtail drops the log entry and reports an error.
    • If the Drop Failed to Parse Logs switch is turned off, Logtail uploads the log entry. The key of the log entry is set to raw_log and the value is set to the log content.
  • Set the time field for a log entry
    • If you do not set the time field for a log entry, the log time is the time when the log entry is parsed.
    • If you set the time field for a log entry:
      • If the difference between the time extracted from the parsed log fields and the current time is less than 12 hours, the extracted time is used as the log time.
      • If the difference is greater than 12 hours, the log entry is dropped and an error is reported (see the second sketch after this list).
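To make the splitting and parsing rules concrete, the first sketch below groups raw lines into entries with a first-line regular expression and then parses each entry with a field-extraction regular expression. Entries that fail to parse are either dropped or uploaded under a raw_log key, mirroring the Drop Failed to Parse Logs switch. The regular expressions and the sample log lines are placeholders, not a prescribed format.

```python
import re

FIRST_LINE = re.compile(r"^\[\d{4}-\d{2}-\d{2}")          # placeholder first-line pattern
PARSER = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<level>\w+)\s+(?P<message>.*)$", re.S
)

def split_entries(lines):
    """Group lines into entries; a new entry starts at each first-line match."""
    entry = []
    for line in lines:
        if FIRST_LINE.match(line) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)

def parse_entry(entry, drop_failed=False):
    """Parse one entry; fall back to raw_log when parsing fails."""
    match = PARSER.match(entry)
    if match:
        return match.groupdict()
    if drop_failed:
        return None               # the entry is dropped and an error is reported
    return {"raw_log": entry}     # upload the raw content instead

lines = [
    "[2024-01-01 12:00:00] INFO service started\n",
    "[2024-01-01 12:00:01] ERROR stack trace line 1\n",
    "  stack trace line 2\n",
]
for entry in split_entries(lines):
    print(parse_entry(entry))
```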

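The second sketch summarizes the 12-hour rule for the time field. It is illustrative only; the field name time and the timestamp format are placeholder assumptions.

```python
import time

TWELVE_HOURS = 12 * 3600

def resolve_log_time(parsed_fields, time_key="time", fmt="%Y-%m-%d %H:%M:%S"):
    """Use the extracted time if it is within 12 hours of now; otherwise signal
    that the entry should be dropped. Without a time field, use the parse time."""
    now = time.time()
    if time_key not in parsed_fields:
        return now                                   # the parse time becomes the log time
    log_time = time.mktime(time.strptime(parsed_fields[time_key], fmt))
    if abs(now - log_time) < TWELVE_HOURS:
        return log_time                              # the extracted time is used
    raise ValueError("log time differs from the current time by more than 12 hours")
```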
Filter log entries

After Logtail processes log entries, it filters them based on the specified filter conditions.

  • If you do not specify filter conditions in the Filter Configuration field, the log entries are not filtered.
  • If you specify filter conditions in the Filter Configuration field, the fields in every log entry are traversed, and only the log entries that meet the filter conditions are collected. A minimal filtering sketch follows this list.
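Filter conditions pair a field name with a regular expression. The sketch below illustrates the behavior described above under the assumption that an entry is collected only if every configured field is present and matches its expression; the field names and patterns are placeholders.

```python
import re

def passes_filter(entry, conditions):
    """Return True only if every configured field exists and matches its pattern."""
    for field, pattern in conditions.items():
        value = entry.get(field)
        if value is None or not re.search(pattern, value):
            return False
    return True

conditions = {"level": "WARNING|ERROR"}          # placeholder filter condition
entries = [
    {"level": "INFO", "message": "started"},
    {"level": "ERROR", "message": "disk full"},
]
print([e for e in entries if passes_filter(e, conditions)])  # keeps only the ERROR entry
```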

Aggregate log entries

To reduce the number of network requests, Logtail caches the processed and filtered log entries for a period of time. Then, Logtail aggregates these log entries and sends them to Log Service.

If one of the following conditions is met during the caching process, log entries are aggregated and sent to Log Service:
  • The aggregation duration exceeds 3 seconds.
  • The number of aggregated log entries exceeds 4,096.
  • The total size of aggregated log entries exceeds 512 KB.
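The three flush conditions can be expressed as a simple buffer check, as in the sketch below. The thresholds come from the list above; the buffering structure itself is a simplification, not Logtail's internal implementation.

```python
import time

MAX_WAIT_SECONDS = 3
MAX_ENTRIES = 4096
MAX_BYTES = 512 * 1024

class LogBuffer:
    """Cache log entries and report when they should be flushed to Log Service."""

    def __init__(self):
        self.entries = []
        self.total_bytes = 0
        self.started_at = time.time()

    def add(self, entry):
        """Add one log entry (a string in this sketch) to the cache."""
        self.entries.append(entry)
        self.total_bytes += len(entry.encode("utf-8"))

    def should_flush(self):
        return (
            time.time() - self.started_at > MAX_WAIT_SECONDS
            or len(self.entries) > MAX_ENTRIES
            or self.total_bytes > MAX_BYTES
        )
```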

Send log entries

Logtail sends the aggregated log entries to Log Service. You can set the max_bytes_per_sec and send_request_concurrency parameters in the Logtail startup configuration file to specify the maximum transfer rate of log data and the maximum number of concurrent send requests. For more information, see Configure Logtail startup parameters.
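As an illustration of how these two parameters might be adjusted, the sketch below edits the startup configuration file with Python. The file path is an assumption based on a typical Linux installation, and the values are examples only; see Configure Logtail startup parameters for the authoritative settings.

```python
import json

# Assumed default path of the Logtail startup configuration file on Linux;
# adjust it to match your installation.
CONFIG_PATH = "/usr/local/ilogtail/ilogtail_config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Example values only: cap uploads at 2 MB/s with up to 4 concurrent send requests.
config["max_bytes_per_sec"] = 2 * 1024 * 1024
config["send_request_concurrency"] = 4

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=4)

# Restart Logtail afterwards so that the new startup parameters take effect.
```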

If a log entry fails to be sent, Logtail retries or stops sending the log entry based on the error code.
  • Error code 401: Logtail is not authorized to collect the data. Handling method: Logtail drops the log data.
  • Error code 404: The project or Logstore that is specified in the Logtail configurations for log collection does not exist. Handling method: Logtail drops the log data.
  • Error code 403: The shard quota is exhausted. Handling method: Logtail retries after 3 seconds.
  • Error code 500: A server exception occurs. Handling method: Logtail retries after 3 seconds.
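The handling rules above can be summarized as a small dispatch function, sketched below. The function and its structure are illustrative; Logtail's real sending logic is more involved.

```python
import time

DROP_CODES = {401, 404}          # the log data is dropped
RETRY_CODES = {403, 500}         # the send is retried after 3 seconds

def handle_send_error(status_code, resend):
    """Drop or retry a failed batch based on the server's error code."""
    if status_code in DROP_CODES:
        return "dropped"
    if status_code in RETRY_CODES:
        time.sleep(3)            # wait 3 seconds before retrying
        return resend()
    return "unhandled"

# Example usage with a placeholder resend callback:
# handle_send_error(403, resend=lambda: "sent on retry")
```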