This topic describes how Logtail collects logs. The log collection process consists of the following steps: monitor log files, read log files, process logs, filter logs, aggregate logs, and send logs.
Logtail performs the following steps to collect log data:
Monitor log files
After you install Logtail on servers and create a Logtail configuration in the Log Service console, the configuration is synchronized to the servers in real time. Logtail monitors log files of the servers based on the configuration. Logtail scans log directories and files based on the log file path and the maximum directory depth that you specify for monitoring in the configuration.
If a log file on a server in a machine group is not updated after the Logtail configuration is applied to the machine group, the file is considered a historical log file. Logtail does not collect historical log files. If a log file is updated, Logtail reads the file and sends the collected logs to Log Service. For more information about how to collect historical log files, see Import historical logs.
Logtail registers event listeners to monitor the directories from which log files are collected, and also polls the log files in those directories on a regular basis. This combination ensures that logs are collected promptly and reliably. On Linux servers, Logtail uses inotify to monitor directories.
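The directory scan described above can be illustrated with a minimal sketch. It is not Logtail's implementation. The function name `find_log_files`, the wildcard file name pattern, and the convention that a depth of 0 means "the log path itself only" are assumptions made for this example.

```python
import fnmatch
import os


def find_log_files(root, pattern, max_depth):
    """Return files under root whose names match pattern.

    max_depth is the number of subdirectory levels to descend into
    (assumption for this sketch: 0 means only root itself is scanned).
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Clearing dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A periodic poller could call `find_log_files` on a timer and compare file sizes or modification times between runs to detect updated files.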
Read log files
- The first time Logtail reads a log file, Logtail can read up to 1,024 KB of data in the log file by default.
- If the file size is less than 1,024 KB, Logtail reads data from the beginning of the file.
- If the file size is greater than 1,024 KB, Logtail reads the last 1,024 KB of data in the file.
Note: Log Service allows you to specify the data size that Logtail can read from a log file the first time Logtail reads the file.
- If a log file has been read before, Logtail resumes reading the file from the previous checkpoint.
- Logtail can read up to 512 KB of data at a time. Make sure that the size of each log in a log file does not exceed 512 KB.
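The read rules above can be summarized in a short sketch. The function names `start_offset` and `read_plan` are hypothetical, and the sketch does not cover cases that this topic does not describe, such as file rotation or truncation.

```python
FIRST_READ_LIMIT = 1024 * 1024  # default first-read size: 1,024 KB
MAX_READ_CHUNK = 512 * 1024     # maximum data read at a time: 512 KB


def start_offset(file_size, checkpoint=None, first_read_limit=FIRST_READ_LIMIT):
    """Return the byte offset at which reading starts."""
    if checkpoint is not None:
        # The file was read before: resume from the previous checkpoint.
        return checkpoint
    # First read: start at the beginning of a small file, or at the
    # last first_read_limit bytes of a larger file.
    return max(0, file_size - first_read_limit)


def read_plan(file_size, checkpoint=None):
    """Yield (offset, length) pairs, each at most MAX_READ_CHUNK bytes."""
    offset = start_offset(file_size, checkpoint)
    while offset < file_size:
        length = min(MAX_READ_CHUNK, file_size - offset)
        yield offset, length
        offset += length
```

For example, a 3 MB file that is read for the first time starts at the 2 MB offset, whereas the same file with a checkpoint resumes from the checkpoint regardless of its size.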
Process logs
- Split a log into multiple lines
If you specify a regular expression to match the first line of a log, Logtail uses the regular expression to determine where each log starts, so that a single log can span multiple lines. If you do not specify a regular expression, each line is processed as a separate log.
- Parse logs
Logtail parses each log based on the collection mode that you specify in the Logtail configuration.
Note: If you specify complex regular expressions, Logtail may consume an excessive amount of CPU resources. We recommend that you specify regular expressions that allow Logtail to parse logs in an efficient manner.
If Logtail fails to parse a log, Logtail handles the failure based on the setting of the Drop Failed to Parse Logs parameter in the Logtail configuration.
- If you turn on Drop Failed to Parse Logs, Logtail drops the log and reports an error.
- If you turn off Drop Failed to Parse Logs, Logtail uploads the log. The key of the log is set to raw_log and the value is set to the log content.
- Configure the time field for a log
- If you do not configure the time field for a log, the log time is the time when the log is parsed.
- If you configure the time field for a log, the manner in which the log is processed varies in the following scenarios:
- If the difference between the time when the log is generated and the current time is within 12 hours, the log time is extracted from the parsed log fields.
- If the difference between the time when the log is generated and the current time is greater than 12 hours, the log is dropped and an error is reported.
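The three processing rules above (splitting by a first-line regular expression, handling parse failures, and checking the log time) can be sketched as follows. This is an illustration under assumptions, not Logtail's code: the function names are hypothetical, parsing is modeled as a regular expression with named groups, and timestamps are plain UNIX seconds.

```python
import re

MAX_TIME_DIFF_SECONDS = 12 * 60 * 60  # 12 hours


def split_logs(lines, first_line_pattern=None):
    """Group lines into logs; without a pattern, each line is one log."""
    if first_line_pattern is None:
        return list(lines)
    start = re.compile(first_line_pattern)
    logs = []
    for line in lines:
        # A line that matches the pattern starts a new log. Leading lines
        # that match nothing also start one, because there is no log to extend.
        if start.match(line) or not logs:
            logs.append(line)
        else:
            logs[-1] += "\n" + line
    return logs


def parse_log(raw, pattern, drop_failed):
    """Return parsed fields, None if the log is dropped, or a raw_log fallback."""
    match = re.fullmatch(pattern, raw, re.DOTALL)
    if match:
        return match.groupdict()
    if drop_failed:
        return None  # Drop Failed to Parse Logs is turned on
    return {"raw_log": raw}  # turned off: upload the original content


def resolve_log_time(log_time, now):
    """Return the timestamp to use, or None if the log must be dropped."""
    if log_time is None:
        return now  # no time field configured: use the parse time
    if abs(now - log_time) > MAX_TIME_DIFF_SECONDS:
        return None  # more than 12 hours from the current time
    return log_time
```

A real collector would also report an error whenever `parse_log` or `resolve_log_time` returns `None`. The sketch leaves that out to stay short.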
Filter logs
After logs are processed, Logtail filters the logs based on the specified filter conditions.
- If you do not specify filter conditions in the Filter Configuration field, the logs are not filtered.
- If you specify filter conditions in the Filter Configuration field, Logtail traverses the fields in each log and collects only the logs that meet the filter conditions.
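The following sketch illustrates one way such filtering could work. It assumes that filter conditions are pairs of a field name and a regular expression, and that a log is kept only if every listed field exists and fully matches its expression. This topic does not state these details, so treat them as assumptions. The function names are hypothetical.

```python
import re


def passes_filter(log, conditions):
    """Return True if every condition field exists in log and fully matches."""
    for field, pattern in conditions.items():
        value = log.get(field)
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True


def filter_logs(logs, conditions=None):
    """Keep all logs when no conditions are set; otherwise keep matching logs."""
    if not conditions:
        return list(logs)
    return [log for log in logs if passes_filter(log, conditions)]
```

For example, the conditions `{"level": "WARNING|ERROR"}` would keep only logs whose `level` field is exactly `WARNING` or `ERROR`.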
Aggregate logs
To reduce the number of network requests, Logtail caches the processed and filtered logs for a period of time, aggregates them, and then sends them to Log Service. Logtail sends the aggregated logs as soon as one of the following conditions is met:
- The aggregation duration exceeds 3 seconds.
- The number of aggregated logs exceeds 4,096.
- The total size of aggregated logs exceeds 512 KB.
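The three flush conditions can be modeled with a small buffer class. This is a simplified sketch, not Logtail's implementation. The class name `Aggregator`, its methods, and the use of caller-supplied timestamps are assumptions, and "exceeds" is read literally as "strictly greater than".

```python
class Aggregator:
    MAX_AGE_SECONDS = 3
    MAX_LOGS = 4096
    MAX_BYTES = 512 * 1024

    def __init__(self):
        self._logs = []
        self._bytes = 0
        self._first_at = None  # time at which the oldest buffered log arrived

    def add(self, log, now):
        """Buffer one log (bytes); return a batch if a threshold is exceeded."""
        if self._first_at is None:
            self._first_at = now
        self._logs.append(log)
        self._bytes += len(log)
        return self._flush_if_due(now)

    def tick(self, now):
        """Call periodically so that the time-based condition can fire."""
        return self._flush_if_due(now)

    def _flush_if_due(self, now):
        if not self._logs:
            return None
        due = (
            now - self._first_at > self.MAX_AGE_SECONDS
            or len(self._logs) > self.MAX_LOGS
            or self._bytes > self.MAX_BYTES
        )
        if not due:
            return None
        batch, self._logs = self._logs, []
        self._bytes = 0
        self._first_at = None
        return batch
```

Passing `now` explicitly instead of reading the clock inside the class keeps the sketch deterministic and easy to test.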
Send logs
Logtail sends aggregated logs to Log Service. If a log fails to be sent, Logtail either retries or drops the log based on the HTTP status code that is returned.
| HTTP status code | Description | Handling method |
| --- | --- | --- |
| 401 | Logtail is not authorized to collect data. | Logtail drops the log packets. |
| 404 | The project or Logstore that is specified in the Logtail configuration does not exist. | Logtail drops the log packets. |
| 403 | The shard quota is exhausted. | Logtail tries again 3 seconds later. |
| 500 | A server exception occurs. | Logtail tries again 3 seconds later. |
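The handling rules in the table can be sketched as a lookup plus a retry loop. The function names, the injected `send` callable, the treatment of status code 200 as success, and the `max_attempts` limit are all assumptions made for illustration. This topic does not state how many times Logtail retries or how it handles other status codes.

```python
import time

DROP_STATUSES = {401, 404}   # unauthorized; project or Logstore not found
RETRY_STATUSES = {403, 500}  # shard quota exhausted; server exception
RETRY_DELAY_SECONDS = 3


def handle_send_failure(status_code):
    """Return (action, delay_seconds) for a failed send."""
    if status_code in DROP_STATUSES:
        return "drop", None
    if status_code in RETRY_STATUSES:
        return "retry", RETRY_DELAY_SECONDS
    return "unspecified", None  # not covered by the table


def send_with_retry(send, packet, sleep=time.sleep, max_attempts=5):
    """Send packet via send(packet) -> status code; return True on success."""
    for _ in range(max_attempts):
        status = send(packet)
        if status == 200:  # assumption: 200 indicates success
            return True
        action, delay = handle_send_failure(status)
        if action != "retry":
            return False  # the packet is dropped
        sleep(delay)
    return False
```

Injecting `sleep` makes it possible to exercise the retry path in tests without waiting for the 3-second delay.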