This topic describes how Logtail collects logs. The log collection process includes monitoring, reading, processing, filtering, aggregating, and uploading.
Logtail uses the following procedure to collect data:
Monitor log files
After you install Logtail on servers and create Logtail configurations for log collection in the Log Service console, the configurations are synchronized to the servers in real time. Logtail monitors log files of the servers based on the configurations. Logtail scans log directories and files based on the log path and maximum depth of monitored directories that are specified in the configurations.
If the log files of the servers in a machine group are not updated after Logtail configurations for log collection are applied to the machine group, the log files are considered historical log files. Logtail does not collect historical log files. If log files are updated, Logtail reads and collects the files, and then sends the log files to Log Service. For more information about how to collect historical log files, see Import historical logs.
Logtail registers event listeners to monitor directories from which log files are collected. The event listeners poll the log files in the directories on a regular basis to ensure the timeliness and stability of log collection. For Linux-based servers, inotify is used to monitor the directories and poll log files.
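The directory scan and the historical-file rule described above can be illustrated with a minimal Python sketch. It is not Logtail's implementation: the function names, the `.log` suffix, the depth convention (0 means only the configured directory), and the use of file modification time to decide whether a file was updated are all assumptions made for illustration.

```python
import os
from typing import Iterator


def scan_log_files(root: str, max_depth: int, suffix: str = ".log") -> Iterator[str]:
    """Yield matching files under root, descending at most max_depth levels.

    Assumption: max_depth=0 means only the root directory itself is scanned.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # stop os.walk from descending any further
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def is_historical(path: str, config_applied_at: float) -> bool:
    """Treat a file as historical if it was not modified after the
    configuration was applied (assumption: mtime signals an update)."""
    return os.path.getmtime(path) <= config_applied_at
```

A collector built on this sketch would skip files for which `is_historical` returns true and pick them up once they are written to again.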
Read log files
- The first time Logtail reads a log file, Logtail checks the size of the file.
- If the file size is less than 1 MB, Logtail reads the file from the beginning of the file.
- If the file size is greater than 1 MB, Logtail reads the file from the last 1 MB of data in the file.
- If Logtail has already read the log file, Logtail resumes reading from the previous checkpoint.
- Logtail can read up to 512 KB of data at a time. Make sure that the size of every log entry in a log file does not exceed 512 KB.
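The read rules above can be summarized as a small Python sketch. The function names and the representation of a checkpoint as a byte offset are assumptions for illustration, not Logtail internals.

```python
from typing import Optional

ONE_MB = 1024 * 1024
MAX_READ_BYTES = 512 * 1024  # upper bound for a single read


def start_offset(file_size: int, checkpoint: Optional[int]) -> int:
    """Return the byte offset at which reading begins."""
    if checkpoint is not None:
        # The file was read before: resume from the saved checkpoint.
        return checkpoint
    if file_size < ONE_MB:
        # First read of a small file: start from the beginning.
        return 0
    # First read of a large file: start from the last 1 MB.
    # A file of exactly 1 MB also yields offset 0 here.
    return file_size - ONE_MB


def read_size(file_size: int, offset: int) -> int:
    """Return how many bytes a single read may cover from offset."""
    return max(0, min(MAX_READ_BYTES, file_size - offset))
```

For example, a 3 MB file that has never been read starts at the 2 MB mark and needs at least two reads of 512 KB each to reach the end of the file.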
Process log entries
- Split a log entry into multiple lines
If you specify a regular expression that matches the first line of a log entry, Logtail uses the regular expression to determine where each log entry begins, so that one log entry can span multiple lines. If the regular expression is not specified, each line is processed as a separate log entry.
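The following Python sketch illustrates splitting by a first-line regular expression. The function name is hypothetical, and the handling of lines that appear before the first match (they start an entry of their own) is an assumption that the text above does not specify.

```python
import re
from typing import Iterable, List, Optional


def split_entries(lines: Iterable[str],
                  first_line_pattern: Optional[str] = None) -> List[str]:
    """Group raw lines into log entries."""
    if first_line_pattern is None:
        # No regular expression: every line is its own log entry.
        return list(lines)
    first_line = re.compile(first_line_pattern)
    entries: List[str] = []
    for line in lines:
        if first_line.match(line) or not entries:
            # A matching line (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Any other line continues the current entry.
            entries[-1] += "\n" + line
    return entries
```

With a pattern such as `\d{4}-\d{2}-\d{2} `, a stack trace that follows a timestamped line stays attached to that line as one entry.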
- Parse log entries
Logtail parses each log entry based on the collection mode that is specified in the Logtail configurations for log collection.
Note: If you configure complex regular expressions, Logtail may consume excessive CPU resources. We recommend that you configure efficient regular expressions.
If a log entry fails to be parsed, Logtail handles the entry based on the setting of the Drop Failed to Parse Logs parameter in the Logtail configurations for log collection.
- If the Drop Failed to Parse Logs switch is turned on, Logtail drops the log entry and reports an error.
- If the Drop Failed to Parse Logs switch is turned off, Logtail uploads the log entry. The key of the log entry is set to raw_log and the value is set to the log content.
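The two outcomes for a log entry that fails to be parsed can be sketched in Python as follows. The sketch assumes a regex-based collection mode with named groups. The function name and the `drop_failed` flag are hypothetical stand-ins for the Drop Failed to Parse Logs switch.

```python
import logging
import re
from typing import Dict, Optional


def parse_entry(entry: str, pattern: re.Pattern,
                drop_failed: bool) -> Optional[Dict[str, str]]:
    """Parse one log entry; return None if the entry is dropped."""
    match = pattern.fullmatch(entry)
    if match:
        return match.groupdict()
    if drop_failed:
        # Switch turned on: drop the entry and report an error.
        logging.warning("failed to parse log entry, dropping it")
        return None
    # Switch turned off: upload the raw content under the raw_log key.
    return {"raw_log": entry}
```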
- Specify the time field for a log entry
- If you do not specify the time field for a log entry, the log time is the time when the log entry is parsed.
- If you specify the time field for a log entry:
- If the difference between the time when the log is generated and the current time is less than 12 hours, the log time is extracted from the parsed log fields.
- If the difference between the time when the log is generated and the current time is greater than 12 hours, the log entry is dropped and an error is reported.
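The time rules above can be expressed as a short Python sketch. The function name is hypothetical. Two details are assumptions because the text does not state them: the difference is treated as an absolute value, and a difference of exactly 12 hours is kept.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_DIFFERENCE = timedelta(hours=12)


def resolve_log_time(parsed_time: Optional[datetime],
                     now: datetime) -> Optional[datetime]:
    """Return the log time, or None if the entry must be dropped."""
    if parsed_time is None:
        # No time field specified: use the time of parsing.
        return now
    if abs(now - parsed_time) > MAX_DIFFERENCE:
        # More than 12 hours apart: the entry is dropped.
        return None
    # Within 12 hours: use the time extracted from the log fields.
    return parsed_time
```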
Filter log entries
After log entries are processed, Logtail filters the log entries based on the specified filter conditions.
- If you do not specify filter conditions in the Filter Configuration field, the log entries are not filtered.
- If you specify filter conditions in the Filter Configuration field, Logtail traverses the fields in each log entry and collects only the log entries that meet the filter conditions.
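As an illustration of the filtering step, the following Python sketch assumes that filter conditions map a field name to a regular expression and that an entry is kept only if every listed field exists and fully matches. The text above does not specify this representation, so the condition format and the function name are assumptions.

```python
import re
from typing import Dict, Optional


def passes_filter(entry: Dict[str, str],
                  conditions: Optional[Dict[str, str]]) -> bool:
    """Return True if the entry should be collected."""
    if not conditions:
        # No filter conditions: nothing is filtered out.
        return True
    for key, pattern in conditions.items():
        value = entry.get(key)
        # A missing field or a non-matching value rejects the entry.
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True
```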
Aggregate log entries
To reduce the number of network requests, Logtail caches the processed and filtered log entries for a period of time, aggregates them, and then sends them to Log Service. Logtail sends the aggregated logs as soon as one of the following conditions is met:
- The aggregation duration exceeds 3 seconds.
- The number of aggregated logs exceeds 4,096.
- The total size of aggregated logs exceeds 512 KB.
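The three flush conditions can be sketched as a small Python class. The class and method names are hypothetical, and the sketch only models the thresholds listed above, not how Logtail actually buffers data.

```python
import time
from typing import List, Optional

MAX_AGE_SECONDS = 3.0
MAX_ENTRIES = 4096
MAX_BYTES = 512 * 1024


class Aggregator:
    """Cache log entries until one of the flush thresholds is exceeded."""

    def __init__(self) -> None:
        self.entries: List[bytes] = []
        self.total_bytes = 0
        self.started_at: Optional[float] = None

    def add(self, entry: bytes, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if not self.entries:
            self.started_at = now  # the aggregation window starts here
        self.entries.append(entry)
        self.total_bytes += len(entry)

    def should_flush(self, now: Optional[float] = None) -> bool:
        if not self.entries:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.started_at > MAX_AGE_SECONDS
                or len(self.entries) > MAX_ENTRIES
                or self.total_bytes > MAX_BYTES)

    def flush(self) -> List[bytes]:
        """Return the cached batch and reset the cache."""
        batch, self.entries = self.entries, []
        self.total_bytes = 0
        self.started_at = None
        return batch
```

A caller would check `should_flush()` after each `add()` and on a timer, and send the result of `flush()` as one request.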
Upload log entries
Logtail sends the aggregated log entries to Log Service. If a log entry fails to be sent, Logtail retries or stops sending the log entry based on the error code.
| Error code | Description | Handling method |
| --- | --- | --- |
| 401 | Logtail is not authorized to collect data. | Logtail drops the log packets. |
| 404 | The project or Logstore that is specified in the Logtail configurations for log collection does not exist. | Logtail drops the log packets. |
| 403 | The shard quota is exhausted. | After 3 seconds, Logtail tries again. |
| 500 | A server exception occurs. | After 3 seconds, Logtail tries again. |
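The handling rules in the table can be sketched in Python as follows. The `send` callable, the use of status 200 to signal success, the `max_attempts` limit, and the error raised for unlisted codes are all assumptions for illustration. The table itself does not state a retry limit or how other codes are handled.

```python
import time
from enum import Enum
from typing import Callable


class Action(Enum):
    DROP = "drop"
    RETRY = "retry"


RETRY_DELAY_SECONDS = 3

# Handling method per error code, as listed in the table.
HANDLING = {
    401: Action.DROP,
    404: Action.DROP,
    403: Action.RETRY,
    500: Action.RETRY,
}


def send_with_retry(send: Callable[[bytes], int], packet: bytes,
                    sleep: Callable[[float], None] = time.sleep,
                    max_attempts: int = 5) -> bool:
    """Send a packet; return True if it was accepted, False if dropped."""
    for _ in range(max_attempts):
        status = send(packet)  # hypothetical: returns an HTTP-like status code
        if status == 200:
            return True
        action = HANDLING.get(status)
        if action is None:
            raise ValueError(f"no handling rule for status {status}")
        if action is Action.DROP:
            return False
        sleep(RETRY_DELAY_SECONDS)  # wait 3 seconds, then try again
    return False
```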