This topic describes the syntax and parameters of the Grok function and provides examples of how to use the function.

Description

Regular expressions can be complex to write and maintain. We recommend that you use the Grok function to build them instead of writing them by hand. For more information about regular expression functions, see Regular expression functions. You can use the Grok function in the parameters of regular expression functions. Examples:
e_match("content", grok(r"\w+: (%{IP})"))  # Matches log data such as abc: 192.168.0.0 or xyz: 192.168.1.1.
e_match("content", grok(r"\w+: (%{IP})", escape=True)) # The special characters \, +, (, and ) are escaped, so the pattern does not match abc: 192.168.0.0. It matches the literal text \w+: (192.168.0.0) instead.
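The difference between the two calls can be reproduced with Python's `re` module. This is a minimal sketch under assumptions: the `build` helper is hypothetical, `%{IP}` is replaced by a simplified IPv4 expression, and `escape=True` is modeled by applying `re.escape` to the text around the Grok reference. In this sketch the parentheses are escaped too, so the escaped pattern matches only the literal text `\w+: (192.168.0.0)`.

```python
import re

# Simplified stand-in for %{IP}; the predefined pattern may differ.
IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def build(pattern, escape=False):
    """Hypothetical helper: expand %{IP} and optionally escape the literal text around it."""
    parts = pattern.split("%{IP}")
    if escape:
        # Escape every non-Grok piece so that \w, +, ( and ) lose their regex meaning.
        parts = [re.escape(p) for p in parts]
    return ("(?:" + IP + ")").join(parts)


plain = build(r"\w+: (%{IP})")
escaped = build(r"\w+: (%{IP})", escape=True)

print(re.search(plain, "abc: 192.168.0.0").group(1))
print(re.search(escaped, "abc: 192.168.0.0"))
print(re.search(escaped, r"\w+: (192.168.0.0)") is not None)
```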
The Grok function converts Grok patterns into a regular expression that is used to extract the specified values.
  • Syntax
    grok(pattern, escape=False, extend=None)
  • Grok syntax
    %{SYNTAX} 
    %{SYNTAX:NAME}
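    In the Grok syntax, SYNTAX is the name of a predefined regular expression, and NAME is the name of the capturing group that stores the matched value. If you omit NAME, the expression is used as a non-capturing group. Examples: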
    "%{IP}"               # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "%{IP:source_id}"     # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "(%{IP})"             # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    The Grok function supports the following grouping modes:
    • Capturing group mode
      Some Grok patterns already contain named capturing groups. For these patterns, you can use only the %{SYNTAX} syntax, and the predefined group names become the names of the extracted fields. These patterns are commonly used to parse logs in standard formats. For more information, see the "Log formats" section in Grok patterns. Examples:
      "%{SYSLOGBASE}"        
      "%{COMMONAPACHELOG}" 
      "%{COMBINEDAPACHELOG}"
      "%{HTTPD20_ERRORLOG}"
      "%{HTTPD24_ERRORLOG}"
      "%{HTTPD_ERRORLOG}"
      ...
    • Non-capturing group mode
      Some Grok patterns contain only non-capturing groups. To extract the value that such a pattern matches, name it by using the %{SYNTAX:NAME} syntax. Examples:
      "%{INT}"    
      "%{YEAR}"
      "%{HOUR}"
      ...
  • Parameters
    • pattern (String, required): The Grok syntax. For more information, see Grok patterns.
    • escape (Bool, optional): Specifies whether to escape the regular expression special characters in the non-Grok parts of the pattern. Default value: False.
    • extend (Dict, optional): Custom Grok patterns, specified as a dictionary that maps each pattern name to its expression, for example {'ID': '%{WORD}-%{INT}'}. Default value: None.
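To make the syntax and the three parameters concrete, the following Python sketch shows one way a Grok pattern could be expanded into a regular expression. It is not the implementation of the Grok function: the `grok_to_regex` helper, the `PREDEFINED` table, and its simplified pattern bodies are assumptions for illustration. In the sketch, `%{SYNTAX}` becomes a non-capturing group, `%{SYNTAX:NAME}` becomes a named group, `escape=True` escapes only the literal text around the Grok references, and `extend` adds custom patterns that can themselves reference predefined ones.

```python
import re

# Hypothetical, simplified table of predefined patterns. The real predefined
# Grok patterns are more complete.
PREDEFINED = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "WORD": r"\b\w+\b",
    "INT": r"(?:[+-]?(?:[0-9]+))",
}

# Matches %{SYNTAX} and %{SYNTAX:NAME}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern, escape=False, extend=None):
    """Expand Grok references into a Python regular expression (illustrative sketch)."""
    table = {**PREDEFINED, **(extend or {})}

    def expand(text, top_level, depth=0):
        if depth > 10:
            raise ValueError("Grok patterns are nested too deeply")
        out, pos = [], 0
        for m in GROK_REF.finditer(text):
            literal = text[pos:m.start()]
            # escape=True only affects the caller's own literal text, not pattern bodies.
            out.append(re.escape(literal) if escape and top_level else literal)
            body = expand(table[m.group(1)], top_level=False, depth=depth + 1)
            name = m.group(2)
            # %{SYNTAX:NAME} -> named group; %{SYNTAX} -> non-capturing group.
            out.append(f"(?P<{name}>{body})" if name else f"(?:{body})")
            pos = m.end()
        tail = text[pos:]
        out.append(re.escape(tail) if escape and top_level else tail)
        return "".join(out)

    return expand(pattern, top_level=True)


print(grok_to_regex("%{IP:source_id}"))
```

For example, `grok_to_regex("%{IP:source_id}")` returns the same named-group expression as the `%{IP:source_id}` line in the Grok syntax description, and `grok_to_regex("pid (%{INT:pid})", escape=True)` matches the literal text `pid (42)`.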

Examples

  • Example 1: Extract a date and a quoted string.
    • Raw log
      content: 2019 June 24 "I am iron man"
    • Transformation rule
      e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
    • Result
      content: 2019 June 24 "I am iron man"
      year: 2019
      month: June
      day: 24
      motto: "I am iron man"
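The rule in Example 1 can be approximated with Python's `re` module to show how each `%{SYNTAX:NAME}` becomes a named group and each group name becomes a field. The expressions below are simplified stand-ins for `%{YEAR}`, `%{MONTH}`, `%{MONTHDAY}`, and `%{QUOTEDSTRING}` (assumptions), not the predefined definitions, which are likely broader.

```python
import re

# Simplified stand-ins for the predefined patterns used in Example 1.
regex = (
    r"(?P<year>\d{4}) "
    r"(?P<month>January|February|March|April|May|June|July|"
    r"August|September|October|November|December) "
    r"(?P<day>\d{1,2}) "
    r'(?P<motto>"[^"]*")'  # keeps the double quotation marks, as in the result above
)

content = '2019 June 24 "I am iron man"'
fields = re.search(regex, content).groupdict()
print(fields)
```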
  • Example 2: Extract an HTTP request log.
    • Raw log
      content: 10.0.0.0 GET /index.html 15824 0.043
    • Transformation rule
      e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
    • Result
      content: 10.0.0.0 GET /index.html 15824 0.043
      client: 10.0.0.0
      method: GET
      request: /index.html
      bytes: 15824
      duration: 0.043
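A Python approximation of the Example 2 rule follows. The expressions for `%{IP}`, `%{WORD}`, `%{URIPATHPARAM}`, and `%{NUMBER}` are simplified assumptions. In this sketch every captured value is a string, so `bytes` and `duration` would need an explicit conversion before arithmetic.

```python
import re

# Simplified stand-ins for %{IP}, %{WORD}, %{URIPATHPARAM}, and %{NUMBER}.
regex = (
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>\w+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>[+-]?\d+(?:\.\d+)?) "
    r"(?P<duration>[+-]?\d+(?:\.\d+)?)"
)

fields = re.search(regex, "10.0.0.0 GET /index.html 15824 0.043").groupdict()
print(fields)
```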
  • Example 3: Extract an Apache log.
    • Raw log
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
    • Transformation rule
      e_regex('content',grok('%{COMBINEDAPACHELOG}'))
    • Result
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      clientip: 127.0.0.1
      ident: -
      auth: -
      timestamp: 13/Apr/2015:17:22:03 +0800
      verb: GET
      request: /router.php
      httpversion: 1.1
      response: 404
      bytes: 285
      referrer: "-"
      agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
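Written out field by field, a rough Python approximation of `%{COMBINEDAPACHELOG}` shows where each result field comes from. The expression is an assumption and is looser than the predefined pattern. Like the result above, it keeps the double quotation marks around `referrer` and `agent`.

```python
import re

# Rough, simplified approximation of %{COMBINEDAPACHELOG}.
regex = (
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")'
)

line = ('127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" '
        '"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 '
        'libidn/1.18 libssh2/1.4.2"')

fields = re.search(regex, line).groupdict()
for key, value in fields.items():
    print(f"{key}: {value}")
```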
  • Example 4: Extract a log in the default syslog format.
    • Raw log
      content: May 29 16:37:11 sadness logger: hello world
    • Transformation rule
      e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
    • Result
      content: May 29 16:37:11 sadness logger: hello world
      timestamp: May 29 16:37:11
      logsource: sadness
      program: logger
      message: hello world
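A simplified Python version of Example 4 follows. The `SYSLOG_BASE` expression is an assumption that covers only the timestamp, host, and program name. The sketch captures the message with a greedy `.*`, because in Python's `re` a lazy `.*?` at the very end of a pattern is satisfied by an empty match.

```python
import re

# Simplified approximation of %{SYSLOGBASE}: timestamp, host, and program name only.
SYSLOG_BASE = (
    r"(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logsource>\S+) "
    r"(?P<program>[^:\s]+):"
)

line = "May 29 16:37:11 sadness logger: hello world"

# Greedy: captures the rest of the line.
fields = re.search(SYSLOG_BASE + r" (?P<message>.*)", line).groupdict()

# Lazy at the end of the pattern: Python's re stops at an empty match.
lazy = re.search(SYSLOG_BASE + r" (?P<message>.*?)", line).group("message")
print(fields, repr(lazy))
```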
  • Example 5: Escape special characters.
    • Raw log
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
    • Transformation rule
      e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))

      The parentheses () in this rule are special characters in regular expressions, so the rule escapes them as \( and \). If you do not want to escape the parentheses manually, set the escape parameter to True. Example:

      e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
    • Result
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
      timestamp: Nov  1 21:14:23
      logsource: scorn
      program: expect
      pid: 84558
      uid: 30206
      signal: 3
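The two rules in Example 5 differ only in how the literal parentheses are protected. The Python sketch below leaves out the `%{SYSLOGBASE}` prefix and uses simplified stand-ins for `%{NUMBER}` and `%{WORD}` (assumptions). It builds one expression with hand-written `\(` and `\)`, and another by applying `re.escape` to every literal piece, which is similar in spirit to `escape=True`. Both extract the same fields.

```python
import re

NUMBER = r"(?:[+-]?\d+(?:\.\d+)?)"  # simplified stand-in for %{NUMBER}
WORD = r"\b\w+\b"                   # simplified stand-in for %{WORD}

tail = "pid 84558 (expect), uid 30206: exited on signal 3"

# Option 1: escape the parentheses by hand, as in the first rule.
manual = (rf"pid (?P<pid>{NUMBER}) \((?P<program>{WORD})\), "
          rf"uid (?P<uid>{NUMBER}): exited on signal (?P<signal>{NUMBER})")

# Option 2: escape every literal piece automatically.
pieces = ["pid ", " (", "), uid ", ": exited on signal ", ""]
groups = [f"(?P<pid>{NUMBER})", f"(?P<program>{WORD})",
          f"(?P<uid>{NUMBER})", f"(?P<signal>{NUMBER})", ""]
auto = "".join(re.escape(p) + g for p, g in zip(pieces, groups))

print(re.search(manual, tail).groupdict())
print(re.search(auto, tail).groupdict())
```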
  • Example 6: Extract a log by using a custom Grok expression.
    • Raw log
      content: Beijing-1104,gary 25 "never quit"
    • Transformation rule
      e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
    • Result
      content: Beijing-1104,gary 25 "never quit"
      user_id: Beijing-1104
      name: gary
      age: 25
      motto: "never quit"
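In terms of the resulting regular expression, a custom pattern passed through `extend` is inlined wherever it is referenced. The Python sketch below does this substitution by hand for Example 6. The bodies for `%{WORD}`, `%{INT}`, and `%{QUOTEDSTRING}` are simplified assumptions.

```python
import re

WORD = r"\b\w+\b"               # simplified stand-in for %{WORD}
INT = r"(?:[+-]?(?:[0-9]+))"    # simplified stand-in for %{INT}
ID = rf"(?:{WORD})-(?:{INT})"   # the custom ID pattern %{WORD}-%{INT}, inlined

regex = rf'(?P<user_id>{ID}),(?P<name>{WORD}) (?P<age>{INT}) (?P<motto>"[^"]*")'

fields = re.search(regex, 'Beijing-1104,gary 25 "never quit"').groupdict()
print(fields)
```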
  • Example 7: Match JSON data.
    • Raw log
      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
    • Transformation rule
      e_regex('content',grok('%{EXTRACTJSON}'))
    • Result
      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
      json: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
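This topic does not show the definition of `%{EXTRACTJSON}`. As an assumption, the Python sketch below treats it as "everything from the first `{` to the last `}` on the line" and then decodes the captured text with the standard `json` module. For brevity it uses a shortened copy of the raw log.

```python
import json
import re

# Shortened copy of the raw log in Example 7.
content = ('2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: '
           '{"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000"}')

# Assumption: the JSON object spans from the first "{" to the last "}".
match = re.search(r"(?P<json>\{.*\})", content)
raw = match.group("json")

# Decode the captured text to confirm that it is valid JSON.
data = json.loads(raw)
print(data["event"], data["durationMs"])
```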
  • Example 8: Parse a log in the World Wide Web Consortium (W3C) format.
    • Raw log
      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
    • Transformation rule

      In W3C logs, fields that have no value are recorded as hyphens (-). Therefore, the Grok pattern uses literal hyphens (-) to match these fields.

      e_regex("content",grok('%{DATE:date} %{TIME:time} %{WORD:s_sitename} %{WORD:s_computername} %{IP:s_ip} %{WORD:cs_method} %{NOTSPACE:cs_uri_stem} - %{NUMBER:s_port} - %{IP:c_ip} %{NOTSPACE:cs_version} - - - - %{NUMBER:sc_status} %{NUMBER:sc_substatus} %{NUMBER:sc_win32_status} %{NUMBER:sc_bytes} %{NUMBER:cs_bytes} %{NUMBER:time_taken}'))
    • Result
      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
      date: 18-12-26
      time: 00:00:00
      s_sitename: W3SVC2
      s_computername: application001
      s_ip: 192.168.0.0
      cs_method: HEAD
      cs_uri_stem: /
      s_port: 8000
      c_ip: 10.0.0.0
      cs_version: HTTP/1.0
      sc_status: 404
      sc_substatus: 0
      sc_win32_status: 64
      sc_bytes: 0
      cs_bytes: 19
      time_taken: 0
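The W3C rule can be sketched in Python as one entry per space-separated column, with `None` marking a column that is always logged as a hyphen. All expressions are simplified assumptions. For the date column the sketch uses an explicit YYYY-MM-DD expression instead of `%{DATE}`, so it captures `2018-12-26`. In the result of Example 8, `%{DATE}` matched only `18-12-26`.

```python
import re

NUMBER = r"[+-]?\d+(?:\.\d+)?"
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
NOTSPACE = r"\S+"
WORD = r"\w+"
DATE = r"\d{4}-\d{2}-\d{2}"   # full YYYY-MM-DD date (assumption, not %{DATE})
TIME = r"\d{2}:\d{2}:\d{2}"

# One entry per W3C column; None marks a column that is logged as "-".
columns = [
    ("date", DATE), ("time", TIME), ("s_sitename", WORD), ("s_computername", WORD),
    ("s_ip", IP), ("cs_method", WORD), ("cs_uri_stem", NOTSPACE), None,
    ("s_port", NUMBER), None, ("c_ip", IP), ("cs_version", NOTSPACE),
    None, None, None, None,
    ("sc_status", NUMBER), ("sc_substatus", NUMBER), ("sc_win32_status", NUMBER),
    ("sc_bytes", NUMBER), ("cs_bytes", NUMBER), ("time_taken", NUMBER),
]

# Hyphen columns become literal "-", other columns become named groups.
regex = " ".join("-" if col is None else f"(?P<{col[0]}>{col[1]})" for col in columns)

line = ("2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - "
        "10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0")
fields = re.search(regex, line).groupdict()
for key, value in fields.items():
    print(f"{key}: {value}")
```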