This topic describes the syntax and parameters of the Grok function and provides examples of how to use it.

We recommend that you use the Grok function instead of regular expression functions. You can also combine Grok patterns with regular expressions in the same pattern. Examples:
e_match("content", grok(r"\w+: (%{IP})")) # Matches log data such as abc: 192.168.0.0 or xyz: 192.168.1.1.
e_match("content", grok(r"\w+: (%{IP})", escape=True)) # Does not match abc: 192.168.0.0 because \w+ is treated as literal text. Matches the literal text \w+: 192.168.0.0.
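
The effect of escape=True can be illustrated with a minimal Python sketch. This is not the library's implementation: the helper expand is hypothetical, it handles only the %{IP} reference, and the pattern omits the capturing parentheses to keep the comparison simple. It assumes that escaping applies only to the text outside Grok references.

```python
import re

# The regex this topic lists as equivalent to %{IP}.
IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

def expand(pattern, escape=False):
    """Hypothetical helper: expand %{IP} and optionally escape the remaining text."""
    # Split around the Grok reference so that only the text outside it is escaped.
    pieces = pattern.split("%{IP}")
    if escape:
        pieces = [re.escape(piece) for piece in pieces]
    return f"(?:{IP})".join(pieces)

plain = expand(r"\w+: %{IP}")
escaped = expand(r"\w+: %{IP}", escape=True)

# Without escaping, \w+ is a regex token and matches "abc".
print(re.search(plain, "abc: 192.168.0.0") is not None)
# With escaping, \w+ must appear literally in the text.
print(re.search(escaped, "abc: 192.168.0.0") is not None)
print(re.search(escaped, r"\w+: 192.168.0.0") is not None)
```

In this sketch, the first and third searches succeed and the second one fails, which mirrors the two e_match examples above.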

Description

The Grok function extracts specified values based on a regular expression.
  • Function format
    grok(pattern, escape=False, extend=None)
  • Grok syntax
    %{SYNTAX} 
    %{SYNTAX:NAME}
    In the Grok syntax, SYNTAX indicates a predefined regular expression, and NAME indicates the name of the capturing group. Examples:
    "%{IP}"               # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "%{IP:source_id}"     # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    ("%{IP}")             # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    Some Grok patterns already contain predefined named groups. For these patterns, you can use only the %{SYNTAX} syntax. These patterns are commonly used to parse complete log statements. For more information, see the Log formats section in Grok patterns. Examples:
    "%{SYSLOGBASE}"        
    "%{COMMONAPACHELOG}" 
    "%{COMBINEDAPACHELOG}"
    "%{HTTPD20_ERRORLOG}"
    "%{HTTPD24_ERRORLOG}"
    "%{HTTPD_ERRORLOG}"
    ...
    Some Grok patterns contain non-capturing groups. Examples:
    "%{INT}"    
    "%{YEAR}"
    "%{HOUR}"
    ...
  • Parameters
    Parameter | Type    | Required | Description
    pattern   | String  | Yes      | The Grok syntax. For more information, see Grok patterns.
    escape    | Boolean | No       | Specifies whether to escape regular expression special characters that appear outside the Grok patterns. Default value: False.
    extend    | Dict    | No       | Custom Grok patterns, specified as a dictionary that maps each pattern name to its expression.
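
The expansion rules in the Grok syntax section and the role of the extend parameter can be sketched in plain Python. The function grok_sketch and the reduced pattern table below are hypothetical and exist only for illustration. The actual Grok function supports many more patterns, and its internals may differ.

```python
import re

# Hypothetical, heavily reduced pattern table (illustration only).
BASE_PATTERNS = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?\d+",
}

# Matches %{SYNTAX} and %{SYNTAX:NAME}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def grok_sketch(pattern, extend=None):
    """Expand Grok references into a plain regular expression string."""
    # Custom patterns are merged over the predefined ones.
    table = {**BASE_PATTERNS, **(extend or {})}

    def replace(ref):
        syntax, name = ref.group(1), ref.group(2)
        # A definition may reference other patterns, so expand it recursively.
        body = REFERENCE.sub(replace, table[syntax])
        # %{SYNTAX:NAME} becomes a named group, %{SYNTAX} a non-capturing group.
        return f"(?P<{name}>{body})" if name else f"(?:{body})"

    return REFERENCE.sub(replace, pattern)

print(grok_sketch("%{IP}"))            # (?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
print(grok_sketch("%{IP:source_id}"))  # (?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

# A custom ID pattern built from two predefined patterns.
regex = grok_sketch(
    "%{ID:user_id},%{WORD:name} %{INT:age}",
    extend={"ID": "%{WORD}-%{INT}"},
)
match = re.search(regex, 'Beijing-1104,gary 25 "never quit"')
print(match.groupdict())
```

In this sketch, the custom ID pattern is resolved through the same table as the predefined patterns, so a custom definition can reuse existing pattern names.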

Examples

  • Example 1: Extract the date and quoted content.
    • Raw log:
      content: 2019 June 24 "I am iron man"
    • Transformation rule:
      e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
    • Transformation result:
      content: 2019 June 24 "I am iron man"
      year: 2019
      month: June
      day: 24
      motto: "I am iron man"
  • Example 2: Extract an HTTP request log.
    • Raw log:
      content: 55.3.244.1 GET /index.html 15824 0.043
    • Transformation rule:
      e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
    • Transformation result:
      content: 55.3.244.1 GET /index.html 15824 0.043
      client: 55.3.244.1
      method: GET
      request: /index.html
      bytes: 15824
      duration: 0.043
  • Example 3: Extract an Apache log.
    • Raw log:
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
    • Transformation rule:
      e_regex('content',grok('%{COMBINEDAPACHELOG}'))
    • Transformation result:
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      clientip: 127.0.0.1
      ident: -
      auth: -
      timestamp: 13/Apr/2015:17:22:03 +0800
      verb: GET
      request: /router.php
      httpversion: 1.1
      response: 404
      bytes: 285
      referrer: "-"
      agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
  • Example 4: Parse a log in the default Syslog format.
    • Raw log:
      content: May 29 16:37:11 sadness logger: hello world
    • Transformation rule:
      e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
    • Transformation result:
      content: May 29 16:37:11 sadness logger: hello world
      timestamp: May 29 16:37:11
      logsource: sadness
      program: logger
      message: hello world
  • Example 5: Escape special characters.
    • Raw log:
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
    • Transformation rule:
      e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))

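      The log contains literal parentheses (), which are special characters in regular expressions. The preceding rule escapes them with backslashes. Alternatively, you can leave them unescaped and add the escape=True parameter, as shown in the following example: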

      e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
    • Transformation result:
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
      timestamp: Nov  1 21:14:23
      logsource: scorn
      program: expect
      pid: 84558
      uid: 30206
      signal: 3
  • Example 6: Customize a Grok expression.
    • Raw log:
      content: Beijing-1104,gary 25 "never quit"
    • Transformation rule:
      e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
    • Transformation result:
      content: Beijing-1104,gary 25 "never quit"
      user_id: Beijing-1104
      name: gary
      age: 25
      motto: "never quit"
  • Example 7: Match JSON data.
    • Raw log:
      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
    • Transformation rule:
      e_regex('content',grok('%{EXTRACTJSON}'))
    • Transformation result:
      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
      json: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
  • Example 8: Parse a W3C log.
    • Raw log:
      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
    • Transformation rule:

      Unavailable fields in the W3C log are displayed as hyphens (-). In Grok expressions, hyphens (-) are used to match these fields.

      e_regex("content",grok('%{DATE:data} %{TIME:time} %{WORD:s_sitename} %{WORD:s_computername} %{IP:s_ip} %{WORD:cs_method} %{NOTSPACE:cs_uri_stem} - %{NUMBER:s_port} - %{IP:c_ip} %{NOTSPACE:cs_version} - - - - %{NUMBER:sc_status} %{NUMBER:sc_substatus} %{NUMBER:sc_win32_status} %{NUMBER:sc_bytes} %{NUMBER:cs_bytes} %{NUMBER:time_taken}'))
    • Transformation result:
      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0 
      data: 18-12-26
      time: 00:00:00
      s_sitename: W3SVC2
      s_computername: application001
      s_ip: 192.168.0.0
      cs_method: HEAD 
      cs_uri_stem: /
      s_port: 8000
      c_ip: 10.0.0.0
      cs_version: HTTP/1.0
      sc_status: 404
      sc_substatus: 0
      sc_win32_status: 64 
      sc_bytes: 0 
      cs_bytes: 19 
      time_taken: 0
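
In Example 8, the data field contains 18-12-26 rather than 2018-12-26. The following sketch shows one plausible reason. It assumes that DATE follows the widely used Logstash-style definitions (a day-month-year or month-day-year form with a two-digit or four-digit year) and that the match is found by searching within the text. Both points are assumptions, and the library's own definitions may differ. ISO_DATE is a hypothetical custom pattern, not a predefined one.

```python
import re

# Assumed Logstash-style definitions, rewritten for Python's re module.
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
YEAR = r"(?:\d\d){1,2}"
DATE_US = rf"{MONTHNUM}[/-]{MONTHDAY}[/-]{YEAR}"
DATE_EU = rf"{MONTHDAY}[./-]{MONTHNUM}[./-]{YEAR}"
DATE = rf"(?:{DATE_US}|{DATE_EU})"

# Hypothetical custom pattern for year-month-day dates.
ISO_DATE = r"\d{4}-\d{2}-\d{2}"

line = "2018-12-26 00:00:00 W3SVC2 application001"

# No match is possible at "20...", so the search moves on and matches
# "18-12-26" as day 18, month 12, two-digit year 26.
print(re.search(DATE, line).group(0))      # 18-12-26

# A year-first pattern captures the complete date.
print(re.search(ISO_DATE, line).group(0))  # 2018-12-26
```

If the full date is needed, one option under these assumptions is to supply a year-first pattern through the extend parameter and reference it in place of DATE.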