Logtail performs the following steps to collect server logs.
- For more information about the Logtail collection principles, visit Yunqi community.
- After the Logtail collection configuration is applied to a server group, log files that do not contain modification events on the servers in the server group are considered historical files. Logtail does not collect historical files in normal running mode. If you want to collect historical logs, see Import historical log files.
After Logtail is installed on a server and a Logtail collection configuration is created for a data source, the collection configuration is delivered to Logtail on the server in real time. Logtail then starts to monitor files based on the collection configuration.
- Logtail scans log directories and files that comply with the specified file naming rules based on the configured log path and maximum monitoring directory depth.
To ensure log collection efficiency and stability, Logtail registers event monitoring on the collection directories and also polls them periodically. The event monitoring mechanism is Inotify on Linux and ReadDirectoryChangesW on Windows.
- If the log files that comply with the file naming rules in the specified directory are not modified after the configuration is applied, Logtail does not collect these files. If modification events are generated for the log files, Logtail triggers the collection process and reads the files.
- Logtail checks the size of a log file when it reads the file for the first time.
- If the file size is less than 1 MB, Logtail reads from the beginning of the file.
- If the file size is greater than 1 MB, Logtail reads from the last 1 MB of data in the file.
- If Logtail rereads a file, it continues to read it from the last checkpoint.
- Logtail reads up to 512 KB of data at a time. Therefore, make sure that the size of a single log does not exceed 512 KB.
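The first-read and reread offset rules above can be sketched as follows. This is a minimal illustration, not Logtail's actual implementation; the function name and the handling of a file that is exactly 1 MB are assumptions.

```python
from typing import Optional

ONE_MB = 1024 * 1024

def initial_read_offset(file_size: int, checkpoint: Optional[int] = None) -> int:
    """Return the byte offset reading would start from (sketch of the rules)."""
    if checkpoint is not None:
        # A reread continues from the last checkpoint.
        return checkpoint
    if file_size < ONE_MB:
        # Files smaller than 1 MB are read from the beginning.
        return 0
    # Larger files are read from the last 1 MB of data.
    return file_size - ONE_MB
```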
- Divide a log into lines:
If the Logtail collection configuration specifies a regular expression at the beginning of the line, the log data block read by Logtail at a time is divided into multiple lines based on the beginning configured for the line. If the beginning of the line is not set, each data block is processed as a log.
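The line-division behavior can be sketched as follows, assuming a simplified model in which the beginning-of-line regex marks where each new log starts; the function name is hypothetical.

```python
import re
from typing import List, Optional

def split_logs(block: str, line_begin_regex: Optional[str] = None) -> List[str]:
    """Divide a data block into logs using a beginning-of-line regex (sketch)."""
    if line_begin_regex is None:
        # No beginning-of-line regex configured: the whole block is one log.
        return [block]
    pattern = re.compile(line_begin_regex)
    logs, current = [], []
    for line in block.splitlines():
        if pattern.match(line) and current:
            # A new log starts here; flush the lines accumulated so far.
            logs.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        logs.append("\n".join(current))
    return logs
```

For example, a multi-line stack trace stays attached to the log line that precedes it as long as the continuation lines do not match the beginning-of-line regex.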
- Parse a log:
Logtail parses each log based on the collection configuration, such as regular expressions, delimiters, or JSON. Note: A complicated regular expression may cause high CPU utilization. Therefore, we recommend that you use an efficient regular expression.
- Handle failed parsing:
Logtail handles logs that fail to be parsed based on whether the feature that discards logs that fail to be parsed is enabled in the collection configuration.
- If this feature is enabled, Logtail discards the logs that fail to be parsed and reports a parsing error.
- If this feature is disabled, Logtail uploads the raw logs that fail to be parsed with the key set to raw_log and the value set to the log content.
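The parsing and failure-handling behavior can be sketched as follows, using a regular expression as the parsing mode. This is an illustrative model only; the function name and error-reporting details are assumptions.

```python
import re
from typing import List, Optional

def parse_log(raw: str, pattern: str, keys: List[str],
              discard_on_failure: bool) -> Optional[dict]:
    """Parse one log with a regex; handle failure per the discard setting (sketch)."""
    match = re.match(pattern, raw)
    if match:
        # Successful parse: map captured groups to the configured keys.
        return dict(zip(keys, match.groups()))
    if discard_on_failure:
        # Feature enabled: discard the log (error reporting omitted here).
        return None
    # Feature disabled: upload the raw content under the key "raw_log".
    return {"raw_log": raw}
```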
- Set the time field of a log:
- If the time field of the log is not specified, the log time is the current parsing time.
- If the time field of the log is specified:
- The log time is extracted from the parsed log fields if the difference between the recorded time of the log and the current time is less than 12 hours.
- The log is discarded and an error is reported if the difference between the recorded time of the log and the current time is greater than 12 hours.
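The 12-hour rule for the log time can be sketched as follows, assuming Unix timestamps in seconds; the function name is hypothetical and a return value of `None` stands in for "discard and report an error".

```python
import time
from typing import Optional

TWELVE_HOURS = 12 * 3600

def resolve_log_time(parsed_time: Optional[float] = None,
                     now: Optional[float] = None) -> Optional[float]:
    """Decide the log time per the 12-hour rule (sketch)."""
    now = time.time() if now is None else now
    if parsed_time is None:
        # No time field specified: use the current parsing time.
        return now
    if abs(now - parsed_time) < TWELVE_HOURS:
        # Within 12 hours of the current time: use the extracted time.
        return parsed_time
    # Difference exceeds 12 hours: discard the log and report an error.
    return None
```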
After processing logs, Logtail filters them based on Filter Configuration specified in the collection configuration.
- If Filter Configuration is not specified in the collection configuration, Logtail skips to the next step without filtering the logs.
- If Filter Configuration is specified in the collection configuration, Logtail traverses and verifies all the fields of each log.
- Logtail collects the logs that contain all the fields configured by the filter and comply with the settings.
- Logtail does not collect logs that do not comply with Filter Configuration.
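The filtering rules can be sketched as follows, modeling Filter Configuration as a mapping of field names to regular expressions; the function name and the regex-matching model are assumptions.

```python
import re

def passes_filter(log: dict, filters: dict) -> bool:
    """Check one log against Filter Configuration (sketch).

    A log is collected only if it contains every filtered field
    and each field value matches the corresponding expression.
    """
    if not filters:
        # No Filter Configuration: collect the log without filtering.
        return True
    for field, regex in filters.items():
        if field not in log or not re.match(regex, str(log[field])):
            return False
    return True
```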
After logs are filtered based on Filter Configuration, Logtail sends the logs that comply with the configuration to Log Service. To reduce the number of network requests for sending data, Logtail caches the processed and filtered logs for a period of time. Then, it aggregates and packages these logs before it sends them to Log Service. Logtail packages and sends the cached logs when any of the following conditions is met:
- Log aggregation lasts more than three seconds.
- The number of aggregated logs exceeds 4,096.
- The total size of aggregated logs exceeds 512 KB.
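The three packaging thresholds above can be sketched as a single trigger check; the function name is hypothetical.

```python
def should_flush(elapsed_seconds: float, log_count: int, total_bytes: int) -> bool:
    """Return True when any packaging threshold is exceeded (sketch)."""
    return (
        elapsed_seconds > 3           # aggregation has lasted more than 3 seconds
        or log_count > 4096           # more than 4,096 logs are aggregated
        or total_bytes > 512 * 1024   # total size of aggregated logs exceeds 512 KB
    )
```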
Logtail aggregates and sends the collected logs to Log Service. You can configure the send_request_concurrency parameter in the startup parameter settings to adjust the log data sending rate and the maximum concurrency. Logtail ensures that the sending rate and concurrency do not exceed the configured upper thresholds.
If logs fail to be sent, Logtail handles the failure based on the returned error.

|Error code|Description|Handling method|
|---|---|---|
|401|Logtail is not authorized to collect data.|Logtail discards the log packets.|
|404|The project or Logstore specified in the Logtail collection configuration does not exist.|Logtail discards the log packets.|
|403|The shard quota is exhausted.|Logtail waits for 3 seconds and retries.|
|500|A server exception occurred.|Logtail waits for 3 seconds and retries.|
|Network timeout|A network connection error occurred.|Logtail waits for 3 seconds and retries.|
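The error handling described in the table above can be sketched as follows; the function name and return labels are assumptions, and the fallback branch for other errors is not specified by the source.

```python
def handle_send_error(error) -> str:
    """Map a send failure to Logtail's handling behavior (sketch)."""
    if error in (401, 404):
        # Unauthorized, or the project/Logstore does not exist: drop the packets.
        return "discard"
    if error in (403, 500, "timeout"):
        # Quota exhausted, server exception, or network timeout: retry after 3 s.
        return "retry_after_3s"
    # Assumed fallback for other transient errors (not specified by the source).
    return "retry_after_3s"
```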