This topic describes how Logtail collects logs. The log collection process includes monitoring, reading, processing, filtering, aggregating, and uploading.

Procedure

Logtail uses the following procedure to collect data:

  1. Monitor log files
  2. Read log files
  3. Process log entries
  4. Filter log entries
  5. Aggregate log entries
  6. Send logs

Monitor log files

After you install Logtail on servers and create Logtail configurations for log collection in the Log Service console, the configurations are synchronized to the servers in real time. Logtail monitors log files of the servers based on the configurations. Logtail scans log directories and files based on the log path and maximum depth of monitored directories that are specified in the configurations.

If the log files of the servers in a machine group are not updated after Logtail configurations for log collection are applied to the machine group, the log files are considered historical log files, and Logtail does not collect them. If a log file is updated, Logtail reads the updated content and sends it to Log Service. For more information about how to collect historical log files, see Import historical logs.

Logtail registers event listeners to monitor the directories from which log files are collected, and it also polls the log files in those directories on a regular basis to ensure the timeliness and stability of log collection. For Linux servers, Inotify is used to monitor the directories in combination with polling.
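
The following Python sketch illustrates how a log path and a maximum directory depth bound this kind of scan. It is only an illustration of the idea, not Logtail's implementation; the path, file pattern, and depth values are hypothetical.

import fnmatch
import os

def scan_log_files(log_path, file_pattern="*.log", max_depth=3):
    """Return log files under log_path, descending at most max_depth directory levels."""
    base_depth = log_path.rstrip(os.sep).count(os.sep)
    matches = []
    for root, dirs, files in os.walk(log_path):
        depth = root.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            dirs[:] = []  # stop descending below the maximum depth
        for name in files:
            if fnmatch.fnmatch(name, file_pattern):
                matches.append(os.path.join(root, name))
    return matches

# Hypothetical log path and pattern, only for illustration.
print(scan_log_files("/var/log/app", "*.log", max_depth=2))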

Read log files

After Logtail detects that a log file is updated, Logtail reads the file based on the following rules. A sketch of these rules follows the note below.
  • The first time Logtail reads a log file, Logtail checks the size of the file.
    • If the file size is less than 1 MB, Logtail reads the file from the beginning of the file.
    • If the file size is greater than 1 MB, Logtail reads the file from the last 1 MB of data in the file.
  • If Logtail has read the log file before, Logtail resumes reading from the last checkpoint.
  • Logtail can read up to 512 KB of data at a time. Make sure that the size of every log entry in a log file does not exceed 512 KB.
Note If you change the system time on the server, you must restart Logtail. Otherwise, the log time becomes invalid and logs are dropped.
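
The following Python sketch shows the read rules described above under simplified assumptions. It is not Logtail source code; the in-memory checkpoint dictionary stands in for Logtail's checkpoint mechanism.

import os

MB = 1024 * 1024
MAX_READ = 512 * 1024            # read at most 512 KB per call
checkpoints = {}                 # file path -> offset of the next byte to read

def read_updates(path):
    size = os.path.getsize(path)
    if path not in checkpoints:
        # First read: start at the beginning of a small file,
        # or at the last 1 MB of a large file.
        checkpoints[path] = 0 if size < MB else size - MB
    with open(path, "rb") as f:
        f.seek(checkpoints[path])           # resume from the checkpoint
        chunk = f.read(MAX_READ)
    checkpoints[path] += len(chunk)
    return chunk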

Process log entries

When Logtail reads a log file, Logtail splits the content into log entries (a log entry can span multiple lines), parses each log entry, and then sets the time field for the log entry. These steps are illustrated in the sketch after this list.
  • Split a log entry into multiple lines

    If you specify a regular expression that matches the first line of a log entry, Logtail uses the regular expression to determine where each log entry starts, so a log entry can span multiple lines. If no regular expression is specified, each line is processed as a separate log entry.

  • Parse log entries
    Logtail parses each log entry based on the collection mode that is specified in the Logtail configurations for log collection.
    Note If you configure complex regular expressions, Logtail may consume excessive CPU resources. We recommend that you configure efficient regular expressions.
    If a log entry fails to be parsed, Logtail handles the log entry based on the setting of the Drop Failed to Parse Logs parameter in the Logtail configurations for log collection.
    • If the Drop Failed to Parse Logs switch is turned on, Logtail drops the log entry and reports an error.
    • If the Drop Failed to Parse Logs switch is turned off, Logtail uploads the log entry. The key of the log entry is set to raw_log and the value is set to the log content.
  • Specify the time field for a log entry
    • If you do not specify the time field for a log entry, the log time is the time when the log entry is parsed.
    • If you specify the time field for a log entry:
      • If the difference between the time when the log is generated and the current time is less than 12 hours, the log time is extracted from the parsed log fields.
      • If the difference between the time when the log is generated and the current time is greater than 12 hours, the log entry is dropped and an error is reported.
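
The following Python sketch combines the three processing steps above: splitting lines into entries with a first-line regular expression, parsing each entry, falling back to the raw_log key when parsing fails, and checking the 12-hour time window. The regular expressions, log format, and function names are assumptions made for illustration, not Logtail configuration keys.

import re
import time

# Assumed format such as "[2024-05-01 12:00:00] ERROR message\n  stack trace line".
first_line_regex = re.compile(r"^\[\d{4}-\d{2}-\d{2}")
parse_regex = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<level>\w+)\s+(?P<message>.*)", re.S)

def split_entries(lines):
    """Group raw lines into log entries: a new entry starts where the first-line regex matches."""
    entry = []
    for line in lines:
        if first_line_regex.match(line) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)

def parse_entry(entry, drop_failed=False):
    """Parse one entry; on failure, drop it or keep it under the raw_log key."""
    match = parse_regex.match(entry)
    if not match:
        return None if drop_failed else {"raw_log": entry}
    fields = match.groupdict()
    # Use the extracted time only if it is within 12 hours of the current time;
    # otherwise the log entry is dropped and an error is reported.
    log_time = time.mktime(time.strptime(fields["time"], "%Y-%m-%d %H:%M:%S"))
    if abs(log_time - time.time()) > 12 * 3600:
        return None
    return fields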

Filter log entries

After log entries are processed, Logtail filters the log entries based on the specified filter conditions.

  • If you do not specify filter conditions in the Filter Configuration field, the log entries are not filtered.
  • If you specify filter conditions in the Filter Configuration field, Logtail traverses the fields in each log entry and collects only the log entries that meet the filter conditions (see the sketch after this list).
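
The following Python sketch shows one plausible reading of field-based filtering: an entry is collected only if every configured key exists and its value matches the corresponding regular expression. The condition keys and patterns are hypothetical examples, not the exact semantics of the Filter Configuration field.

import re

# Hypothetical filter conditions: field name -> regular expression the value must match.
filter_conditions = {
    "level": re.compile(r"WARNING|ERROR"),
    "status": re.compile(r"^5\d\d$"),
}

def keep(entry, conditions=filter_conditions):
    """Collect an entry only if every configured field exists and matches its regular expression."""
    if not conditions:        # no filter conditions: nothing is filtered out
        return True
    return all(key in entry and regex.search(entry[key]) for key, regex in conditions.items())

print(keep({"level": "ERROR", "status": "502"}))    # True
print(keep({"level": "INFO", "status": "200"}))     # False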

Aggregate log entries

To reduce the number of network requests, Logtail caches the processed and filtered log entries for a period of time. Then, Logtail aggregates the log entries and sends them to Log Service. If one of the following conditions is met while data is cached, Logtail sends the aggregated logs to Log Service (see the sketch after this list).

  • The aggregation duration exceeds 3 seconds.
  • The number of aggregated logs exceeds 4,096.
  • The total size of aggregated logs exceeds 512 KB.
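
The following Python sketch illustrates these three flush conditions with a simple in-memory batch. The Aggregator class and its send callback are placeholders for illustration, not a Log Service API.

import time

MAX_WAIT_SECONDS = 3        # flush after 3 seconds of caching
MAX_COUNT = 4096            # flush after 4,096 cached log entries
MAX_BYTES = 512 * 1024      # flush after 512 KB of cached data

class Aggregator:
    def __init__(self, send):
        self.send = send                    # callback that ships one batch
        self.batch, self.size = [], 0
        self.started = time.monotonic()

    def add(self, entry: bytes):
        self.batch.append(entry)
        self.size += len(entry)
        if (time.monotonic() - self.started >= MAX_WAIT_SECONDS
                or len(self.batch) >= MAX_COUNT
                or self.size >= MAX_BYTES):
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
        self.batch, self.size = [], 0
        self.started = time.monotonic()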

Send logs

Logtail sends the aggregated log entries to Log Service. If a log entry fails to be sent, Logtail retries or drops the log entry based on the returned error code, as described in the following table and illustrated in the sketch below.

Error code | Description | Handling method
401 | Logtail is not authorized to collect data. | Logtail drops the log packets.
404 | The project or Logstore that is specified in the Logtail configurations for log collection does not exist. | Logtail drops the log packets.
403 | The shard quota is exhausted. | Logtail retries after 3 seconds.
500 | A server exception occurs. | Logtail retries after 3 seconds.
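
The following Python sketch shows the error handling in the table above. The post_logs callback and the maximum number of attempts are assumptions for illustration; they are not part of the Log Service API.

import time

DROP_CODES = {401, 404}     # not authorized / project or Logstore missing: drop the packets
RETRY_CODES = {403, 500}    # shard quota exhausted / server exception: retry after 3 seconds
RETRY_INTERVAL = 3

def send_with_retry(post_logs, batch, max_attempts=5):
    """Send one batch; retry or drop it depending on the returned status code."""
    for _ in range(max_attempts):
        status = post_logs(batch)
        if status == 200:
            return True
        if status in DROP_CODES:
            return False                    # drop the batch
        if status in RETRY_CODES:
            time.sleep(RETRY_INTERVAL)      # wait 3 seconds, then try again
            continue
        return False
    return False
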
Note If you want to modify the data sending rate and the maximum number of concurrent connections, you can modify the max_bytes_per_sec and send_request_concurrency parameters in the Logtail startup configuration file. For more information, see Set the startup parameters of Logtail.