Logtail text file collection reference

Last Updated: Oct 21, 2017

Process of text-file log collection

Logtail collects text-file logs in the following steps:

Specify the file path > specify how log lines are separated > extract log fields > specify the log time

Log line separation method

Each access log entry (for example, an Nginx access log entry) occupies exactly one line, and entries are separated by line feeds. Two access log entries are shown below.

  10.1.1.1 - - [13/Mar/2016:10:00:10 +0800] "GET / HTTP/1.1" 0.011 180 404 570 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"
  10.1.1.1 - - [13/Mar/2016:10:00:11 +0800] "GET / HTTP/1.1" 0.011 180 404 570 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"
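Single-line separation can be sketched in a few lines of Python. This is only an illustration, assuming the file content is already in memory. split_single_line is a hypothetical helper, not part of Logtail, and skipping blank lines and stripping a trailing carriage return are choices made by this sketch.

```python
from typing import List


def split_single_line(text: str) -> List[str]:
    """Single-line mode sketch: every non-empty line is one log entry."""
    # Split on line feeds, drop a trailing "\r" (Windows line endings),
    # and skip empty lines such as the one after the final line feed.
    return [line.rstrip("\r") for line in text.split("\n") if line.strip()]


ACCESS_LOG = (
    '10.1.1.1 - - [13/Mar/2016:10:00:10 +0800] "GET / HTTP/1.1" 0.011 180 404 570 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"\n'
    '10.1.1.1 - - [13/Mar/2016:10:00:11 +0800] "GET / HTTP/1.1" 0.011 180 404 570 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"\n'
)

entries = split_single_line(ACCESS_LOG)
print(len(entries))  # 2
```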

In Java applications, a single program log entry often spans several lines, so line feeds alone cannot separate entries. Instead, each entry is recognized by a distinctive pattern at the start of its first line. A Java program log entry is shown below.

  [2016-03-18T14:16:16,000] [INFO] [SessionTracker] [SessionTrackerImpl.java:148] Expiring sessions
  0x152436b9a12aecf, 50000
  0x152436b9a12aed2, 50000
  0x152436b9a12aed1, 50000
  0x152436b9a12aed0, 50000

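Each Java log entry begins with a timestamp in a fixed format, so the following regular expression identifies the first line of an entry:

  \[\d+-\d+-\w+:\d+:\d+,\d+]\s.*

A line that matches this expression starts a new entry. Lines that do not match, such as the four hexadecimal lines above, belong to the entry that precedes them.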
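The Python sketch below shows one way a line-start expression can group physical lines into entries. It assumes the file is already split into lines. split_multi_line is a hypothetical name, and discarding lines that appear before the first matching line is an assumption of this sketch, not documented Logtail behavior.

```python
import re
from typing import List

# Line-start expression from the text: a matching line opens a new entry.
LINE_START = re.compile(r"\[\d+-\d+-\w+:\d+:\d+,\d+]\s.*")


def split_multi_line(lines: List[str], start=LINE_START) -> List[str]:
    """Multi-line mode sketch: group physical lines into log entries."""
    entries: List[List[str]] = []
    for line in lines:
        if start.match(line):
            # The line begins with the timestamp pattern: start a new entry.
            entries.append([line])
        elif entries:
            # Continuation line: append it to the entry being built.
            entries[-1].append(line)
        # Lines before the first match are dropped (assumption of the sketch).
    return ["\n".join(entry) for entry in entries]


JAVA_LOG = [
    "[2016-03-18T14:16:16,000] [INFO] [SessionTracker] [SessionTrackerImpl.java:148] Expiring sessions",
    "0x152436b9a12aecf, 50000",
    "0x152436b9a12aed2, 50000",
    "0x152436b9a12aed1, 50000",
    "0x152436b9a12aed0, 50000",
]

entries = split_multi_line(JAVA_LOG)
print(len(entries))            # 1
print(entries[0].count("\n"))  # 4
```

All five physical lines end up in one entry because only the first line matches the expression.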


Log field extraction

In the Log Service data model, a log consists of one or more key–value pairs. To extract specific fields for analysis, you must set a regular expression whose capture groups define the fields. If the log content is not processed, the entire log is treated as a single key–value pair. For the preceding access log:

  • When fields are extracted

    1. Regular expression: (\S+)\s-\s-\s\[(\S+)\s[^]]+]\s"(\w+).*
    2. Extracted content: 1) 10.1.1.1; 2) 13/Mar/2016:10:00:10; 3) GET
  • When fields are not extracted

    1. Regular expression: (.*)
    2. Extracted content: 1) 10.1.1.1 - - [13/Mar/2016:10:00:10 +0800] "GET / HTTP/1.1" 0.011 180 404 570 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"
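To make the two cases concrete, the Python sketch below applies both regular expressions to the sample access log with the standard re module. The key names ip, time, and method are hypothetical, because the text does not name the keys, and content is only an assumed name for the single key used when nothing is extracted. extract_fields is likewise an illustrative helper.

```python
import re

ACCESS_LINE = (
    '10.1.1.1 - - [13/Mar/2016:10:00:10 +0800] "GET / HTTP/1.1" 0.011 180 404 570 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 360se)"'
)

# Regular expression from the text; each capture group becomes one field.
FULL_REGEX = re.compile(r'(\S+)\s-\s-\s\[(\S+)\s[^]]+]\s"(\w+).*')
# Hypothetical key names: the text does not say what the keys are called.
KEYS = ["ip", "time", "method"]


def extract_fields(line, regex, keys):
    """Pair the capture groups of regex with keys; return None on no match."""
    m = regex.match(line)
    if m is None:
        return None
    return dict(zip(keys, m.groups()))


print(extract_fields(ACCESS_LINE, FULL_REGEX, KEYS))
# {'ip': '10.1.1.1', 'time': '13/Mar/2016:10:00:10', 'method': 'GET'}

# Without field extraction, (.*) captures the whole line as one value.
# "content" is an assumed key name for this case.
print(extract_fields(ACCESS_LINE, re.compile(r"(.*)"), ["content"]))
```

Note that the second group stops at the space before +0800, so the extracted time includes the seconds but not the time zone offset.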

Log time

In the Log Service data model, each log must have a time field in UNIX timestamp format. The log time can be either the system time at which Logtail captures the log or a time parsed from the log content. For the preceding access log:

  • Time in the log content

    1. Time: 13/Mar/2016:10:00:10
    2. Time expression: %d/%b/%Y:%H:%M:%S
  • Log capture time

    1. Time: Timestamp when the log is captured
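A rough Python equivalent of both options is shown below. The time expression uses strftime-style specifiers, which Python's datetime.strptime also accepts. Because the expression does not cover the +0800 part of the sample log, this sketch supplies the UTC offset as a separate, assumed parameter; how Logtail itself resolves the time zone is not described here. Both function names are hypothetical.

```python
import time
from datetime import datetime, timedelta, timezone

# Time expression from the text, in strptime/strftime notation.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"


def content_time_to_unix(value, fmt=TIME_FORMAT, utc_offset_hours=8):
    """Parse a time taken from the log content into a UNIX timestamp.

    The expression carries no time zone, so the offset is passed in.
    Defaulting to +08:00 (the offset in the sample log) is an assumption.
    """
    naive = datetime.strptime(value, fmt)
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return int(aware.timestamp())


def capture_time_to_unix():
    """Log capture time: the system clock when the log is read."""
    return int(time.time())


print(content_time_to_unix("13/Mar/2016:10:00:10"))  # 1457834410
```

Parsing the same string with a different offset yields a different timestamp, which is why the time zone has to be known when the time is taken from the log content.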