This topic describes how to use the Grok method in LOG domain specific language (DSL) to efficiently and quickly parse syslog messages in different formats.

Overview

Syslog is an industry standard protocol that can be used to record device logs. Syslog is widely used for message logging in UNIX-like operating systems. Syslog messages can be recorded in local files or sent to message receivers over the network, usually syslog servers. A server can centrally store syslog messages from multiple devices or parse the content of these messages. Syslog is commonly used in network management tools, security management systems, and log audit systems.

Problems

For a long time, syslog messages are random in format due to a lack of syslog standards. In some cases, syslog messages may fail to be parsed as they have no format. Instead, they are considered as character strings. Therefore, it is important to correctly parse syslog messages in different formats.

Syslog protocol standards

Two common syslog protocols exist in the industry: RFC 5424 issued in 2009 and RFC 3164 issued in 2001. The following section describes the differences between the two protocols. Understanding their differences will help you better use the Grok method to parse syslog messages.
  • RFC 5424
    The RFC 5424 protocol contains the following fields. For more information, see RFC 5424 - The Syslog Protocol.
    PRI VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID
    The following examples describe these fields.
    """
    Example1:
    <34>1 2019-07-11T22:14:15.003Z aliyun.example.com ali - ID47 - BOM'su root' failed for lonvick on /dev/pts/8
    """
    PRI -- 34
    VERSION -- 1
    TIMESTAMP -- 2019-07-11T22:14:15.003Z
    HOSTNAME -- aliyun.example.com
    APP-NAME -- ali
    PROCID -- None
    MSGID -- ID47
    MESSAGE -- 'su root' failed for lonvick on /dev/pts/8
    """
    Example2:
    <165>1 2019-07-11T22:14:15.000003-07:00 192.0.2.1 myproc 8710 - - %% It's time to make the do-nuts.
    """
    PRI -- 165
    VERSION -- 1
    TIMESTAMP -- 2019-07-11T05:14:15.000003-07:00
    HOSTNAME -- 192.0.2.1
    APP-NAME -- myproc
    PROCID -- 8710
    STRUCTURED-DATA -- "-"
    MSGID -- "-"
    MESSAGE -- "%% It's time to make the do-nuts."
    """
    Example3: - with STRUCTURED-DATA
    <165>1 2019-07-11T22:14:15.003Z aliyun.example.com
               evntslog - ID47 [exampleSDID@32473 iut="3" eventSource=
               "Application" eventID="1011"] BOMAn application
               event log entry...
    """
    PRI -- 165
    VERSION -- 1
    TIMESTAMP -- 2019-07-11T22:14:15.003Z
    HOSTNAME -- aliyun.example.com
    APP-NAME -- evntslog
    PROCID -- "-"
    MSGID -- ID47
    STRUCTURED-DATA -- [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]
    MESSAGE -- An application event log entry...
  • RFC 3164
    The RFC 3164 protocol contains the following fields. For more information, see RFC 3164 - The BSD Syslog Protocol.
    PRI HEADER[TIME HOSTNAME] MSG
    The following examples describe these fields.
    """
    <30>Oct 9 22:33:20 hlfedora auditd[1787]: The audit daemon is exiting.
    """
    PRI -- 30
    HEADER
    - TIME -- Oct 9 22:33:20
    - HOSTNAME -- hlfedora
    MSG
    - TAG -- auditd[1787]
    - Content --The audit daemon is exiting.

Solution

Parse syslog messages in common formats by using the Grok method

The following section describes how to use the Grok method to parse syslog messages in common formats. Syslog messages are collected in a Logstore in text format. For more information about Grok rules, see Grok patterns.
  • TraditionalFormat
    • Log content
      May  5 10:20:57 iZbp1a65x3r1vhpe94fi2qZ systemd: Started System Logging Service.
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: May  5 10:20:57 iZbp1a65x3r1vhpe94fi2qZ systemd: Started System Logging Service.
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: May  5 10:20:57 iZbp1a65x3r1vhpe94fi2qZ systemd: Started System Logging Service.
        timestamp: May  5 10:20:57
        logsource: iZbp1a65x3r1vhpe94fi2qZ
        program: systemd
        message: Started System Logging Service.
      """
      e_regex('content', grok('%{SYSLOGBASE} %{GREEDYDATA:message}'))
  • FileFormat
    • Log content
      2019-05-06T09:26:07.874593+08:00 iZbp1a65x3r1vhpe94fi2qZ root: 834753
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: 2019-05-06T09:26:07.874593+08:00 iZbp1a65x3r1vhpe94fi2qZ root: 834753
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: 2019-05-06T09:26:07.874593+08:00 iZbp1a65x3r1vhpe94fi2qZ root: 834753
        timestamp: 2019-05-06T09:26:07.874593+08:00
        hostname: iZbp1a65x3r1vhpe94fi2qZ
        program: root
        message: 834753
      """
      e_regex('content',grok('%{TIMESTAMP_ISO8601:timestamp} %{SYSLOGHOST:hostname} %{SYSLOGPROG} %{GREEDYDATA:message}'))
  • RSYSLOG_SyslogProtocol23Format
    • Log content
      <13>1 2019-05-06T11:50:16.015554+08:00 iZbp1a65x3r1vhpe94fi2qZ root - - - twish
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <13>1 2019-05-06T11:50:16.015554+08:00 iZbp1a65x3r1vhpe94fi2qZ root - - - twish
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <13>1 2019-05-06T11:50:16.015554+08:00 iZbp1a65x3r1vhpe94fi2qZ root - - - twish
        priority: 13
        version: 1
        timestamp: 2019-05-06T11:50:16.015554+08:00
        hostname: iZbp1a65x3r1vhpe94fi2qZ
        program: root
        message: twish
      """
      e_regex('content',grok('%{POSINT:priority}>%{NUMBER:version} %{TIMESTAMP_ISO8601:timestamp} %{syslogHOST:hostname} %{PROG:program} - - - %{GREEDYDATA:message}'))
  • RSYSLOG_DebugFormat
    • Log content
      2019-05-06T14:29:37.558854+08:00 iZbp1a65x3r1vhpe94fi2qZ root: environment
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: 2019-05-06T14:29:37.558854+08:00 iZbp1a65x3r1vhpe94fi2qZ root: environment
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: 2019-05-06T14:29:37.558854+08:00 iZbp1a65x3r1vhpe94fi2qZ root: environment
        timestamp: 2019-05-06T14:29:37.558854+08:00 
        hostname: iZbp1a65x3r1vhpe94fi2qZ
        program: root
        message: environment
      """
      e_regex('content',grok('%{TIMESTAMP_ISO8601:timestamp} %{SYSLOGHOST:hostname} %{SYSLOGPROG} %{GREEDYDATA:message}'))
The preceding examples describe how to parse common log formats of syslog configuration files. The following section describes two FluentSyslog formats.
  • FluentRFC5424
    • Log content
      <16>1 2013-02-28T12:00:00.003Z 192.168.0.1 fluentd 11111 ID24224 [exampleSDID@20224 iut='3' eventSource='Application' eventID='11211] Hi, from Fluentd!
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <16>1 2019-02-28T12:00:00.003Z 192.168.0.1 aliyun 11111 ID24224 [exampleSDID@20224 iut='3' eventSource='Application' eventID='11211] Hi, from Fluentd!
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <16>1 2019-02-28T12:00:00.003Z 192.168.0.1 aliyun 11111 ID24224 [exampleSDID@20224 iut='3' eventSource='Application' eventID='11211] Hi, from aliyun!
        priority: 16
        version: 1
        timestamp: 2019-02-28T12:00:00.003Z
        hostname: 192.168.0.1
        ident: aliyun
        pid: 1111
        msgid: ID24224
        extradata: [exampleSDID@20224 iut='3' eventSource='Application' eventID='11211]
        message: Hi, from aliyun!
      """
      e_regex('content',grok('%{POSINT:priority}>%{NUMBER:version} %{TIMESTAMP_ISO8601:timestamp} %{syslogHOST:hostname} %{WORD:ident} %{NUMBER:pid} %{USERNAME:msgid} (? P<extradata>(\[(. *)\]|[^ ])) %{GREEDYDATA:message}'))
  • FluentRFC3164
    • Log content
      <6>Feb 28 12:00:00 192.168.0.1 fluentd[11111]: [error] Syslog test
    • LOG DSL rules
      """
      The log before processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <6>Feb 28 12:00:00 192.168.0.1 aliyun[11111]: [error] Syslog test
      The log after processing is as follows:
        receive_time: 1558663265
        __topic__:
        content: <6>Feb 28 12:00:00 192.168.0.1 aliyun[11111]: [error] Syslog test
        priority: 6
        timestamp: Feb 28 12:00:00
        hostname: 192.168.0.1
        ident: aliyun
        pid: [1111]
        level: [error]
        message: Syslog test
      """
      e_regex('content', grok('%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{WORD:ident}(? P<pid>(\[(\d+)\]|[^:])): (? P<level>(\[(\w+)\]|[^ ])) %{GREEDYDATA:message}'))
You can further parse the priority field in the log content to obtain facility and severity information. For more information about log parsing using RFC 5424, see e_syslogrfc. Example:
"""
The log before processing is as follows:
  receive_time: 1558663265
  __topic__:
  content: <13>1 2019-05-06T11:50:16.015554+08:00 iZbp1a65x3r1vhpe94fi2qZ root - - - twish
  priority: 13
  version: 1
  timestamp: 2019-05-06T11:50:16.015554+08:00
  hostname: iZbp1a65x3r1vhpe94fi2qZ
  program: root
  message: twish
The log after processing is as follows:
  receive_time: 1558663265
  __topic__:
  content: <13>1 2019-05-06T11:50:16.015554+08:00 iZbp1a65x3r1vhpe94fi2qZ root - - - twish
  priority: 13
  version: 1
  timestamp: 2019-05-06T11:50:16.015554+08:00
  hostname: iZbp1a65x3r1vhpe94fi2qZ
  program: root
  message: twish
  _facility_: 1
  _severity_: 5
  _severitylabel_: Notice: normal but significant condition
  _facilitylabel_: user-level messages
"""
e_syslogrfc("priority","SYSLOGRFC5424")