This topic describes the syntax and parameters of the Grok function and provides examples.

We recommend that you use the Grok function instead of regular expression functions because regular expressions are more complex to write and maintain.
Note: You can also combine Grok patterns with regular expressions in the same pattern. Examples:
# The pattern matches log data such as abc: 192.168.1.1 or xyz: 192.168.2.2.
e_match("content", grok(r"\w+: (%{IP})"))
# With escape=True, \w+ is treated as literal text. The pattern matches \w+: 192.168.1.1, but does not match abc: 192.168.1.1.
e_match("content", grok(r"\w+: (%{IP})", escape=True))
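The effect of escape can be approximated with the Python standard re module. The following sketch is an illustration only, not the actual implementation of grok: the IP expression is the simplified one given in this topic, the use of re.escape for the non-Grok text is an assumption about what escape=True does, and the parentheses around %{IP} are omitted to keep the sketch short.

```python
import re

# Simplified expression for %{IP}, as given in this topic.
IP = r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

# escape=False: the text outside %{...} is a regular expression,
# so \w+ matches any word such as abc or xyz.
unescaped = re.compile(r"\w+: " + IP)

# escape=True (assumed behavior): the text outside %{...} is literal,
# so only the characters \w+: themselves match.
escaped = re.compile(re.escape(r"\w+: ") + IP)

print(bool(unescaped.search("abc: 192.168.1.1")))   # True
print(bool(escaped.search("abc: 192.168.1.1")))     # False
print(bool(escaped.search(r"\w+: 192.168.1.1")))    # True
```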
  • Feature

    This function extracts a specific value based on a regular expression.

  • Syntax
    grok(pattern, escape=False, extend=None)
  • Grok syntax
    %{SYNTAX} 
    %{SYNTAX:NAME}
    In the Grok syntax, SYNTAX is the name of a predefined regular expression, and NAME is the name of the capture group. Example:
    "%{IP}"               # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "%{IP:source_id}"     # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "(%{IP})"             # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    Some Grok patterns already contain predefined named groups, so you can use them only in the %{SYNTAX} form. Such patterns are commonly used to parse whole log statements. Example:
    "%{SYSLOGBASE}"        
    "%{COMMONAPACHELOG}" 
    "%{COMBINEDAPACHELOG}"
    "%{HTTPD20_ERRORLOG}"
    "%{HTTPD24_ERRORLOG}"
    "%{HTTPD_ERRORLOG}"
    ...
    For more information, see the Log formats section in Grok patterns.
    In addition, some Grok patterns contain non-capture groups. Example:
    "%{INT}"    
    "%{YEAR}"
    "%{HOUR}"
    ...
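The difference between %{SYNTAX} and %{SYNTAX:NAME} can be sketched with the Python standard re module. The helper expand_grok and the PATTERNS table below are hypothetical names introduced for illustration, and the expressions in the table are simplified rather than the exact built-in definitions. This is not how the grok function is necessarily implemented.

```python
import re

# Illustrative subset of predefined patterns; the expressions are simplified
# and are not the exact built-in definitions.
PATTERNS = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "INT": r"[+-]?\d+",
}

# Matches %{SYNTAX} and %{SYNTAX:NAME}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand_grok(pattern, patterns=PATTERNS):
    """Replace each Grok reference with a regular expression group."""
    def replace(match):
        syntax, name = match.group(1), match.group(2)
        body = patterns[syntax]
        if name:
            return "(?P<%s>%s)" % (name, body)  # named capture group
        return "(?:%s)" % body                  # non-capture group
    return GROK_REF.sub(replace, pattern)

text = "client 192.168.1.1 connected"
named = re.search(expand_grok("%{IP:source_id}"), text)
unnamed = re.search(expand_grok("%{IP}"), text)
print(named.groupdict())   # {'source_id': '192.168.1.1'}
print(unnamed.groups())    # ()
```

In this sketch, a reference without NAME still matches the text but captures nothing, which is why a name is needed to extract a field.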
  • Parameters
    Parameter | Type    | Required | Description
    pattern   | String  | Yes      | The Grok pattern. For more information, see Grok patterns.
    escape    | Boolean | No       | Specifies whether to escape regular expression special characters in the non-Grok parts of the pattern. Default value: False.
    extend    | Dict    | No       | The custom Grok patterns, specified as a dictionary that maps each pattern name to its expression.
  • Examples
    • Example 1: Extract the date and the quoted content.
      Raw log:
      content: 2019 June 24 "I am iron man"
      Transformation rule:
      e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
      Transformation result:
      content: 2019 June 24 "I am iron man"
      year: 2019
      month: June
      day: 24
      motto: "I am iron man"
    • Example 2: Extract an HTTP request log.
      Raw log:
      content: 55.3.244.1 GET /index.html 15824 0.043
      Transformation rule:
      e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
      Transformation result:
      content: 55.3.244.1 GET /index.html 15824 0.043
      client: 55.3.244.1
      method: GET
      request: /index.html
      bytes: 15824
      duration: 0.043
    • Example 3: Extract an Apache log.
      Raw log:
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      Transformation rule:
      e_regex('content',grok('%{COMBINEDAPACHELOG}'))
      Transformation result:
      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      clientip: 127.0.0.1
      ident: -
      auth: -
      timestamp: 13/Apr/2015:17:22:03 +0800
      verb: GET
      request: /router.php
      httpversion: 1.1
      response: 404
      bytes: 285
      referrer: "-"
      agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
    • Example 4: Parse a log in the default Syslog format.
      Raw log:
      content: May 29 16:37:11 sadness logger: hello world
      Transformation rule:
      e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
      Transformation result:
      content: May 29 16:37:11 sadness logger: hello world
      timestamp: May 29 16:37:11
      logsource: sadness
      program: logger
      message: hello world
    • Example 5: Escape special characters.
      Raw log:
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
      Transformation rule:
      e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))
      """Because the raw log contains parentheses, you can instead leave them unescaped in the pattern and set escape=True:"""
      e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
      Transformation result:
      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
      timestamp: Nov  1 21:14:23
      logsource: scorn
      program: expect
      pid: 84558
      uid: 30206
      signal: 3
    • Example 6: Customize a Grok expression.
      Raw log:
      content: Beijing-1104,gary 25 "never quit"
      Transformation rule:
      e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
      Transformation result:
      content: Beijing-1104,gary 25 "never quit"
      user_id: Beijing-1104
      name: gary
      age: 25
      motto: "never quit"
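
The way a custom pattern passed in extend can build on predefined patterns, as in Example 6, can be sketched with the Python standard re module. The helper expand_grok and the BUILTIN table are hypothetical names introduced for illustration, and the expressions for WORD, INT, and QUOTEDSTRING are simplified assumptions rather than the exact built-in definitions. The sketch does not claim to reproduce how the grok function works internally.

```python
import re

# Illustrative subset of predefined patterns with simplified expressions
# (assumed for this sketch, not the exact built-in definitions).
BUILTIN = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
    "QUOTEDSTRING": r'"[^"]*"',
}

# Matches %{SYNTAX} and %{SYNTAX:NAME}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand_grok(pattern, extend=None):
    """Expand Grok references; entries in extend add to or override BUILTIN."""
    table = dict(BUILTIN)
    table.update(extend or {})

    def replace(match):
        syntax, name = match.group(1), match.group(2)
        # A definition can itself contain %{...} references, so expand it too.
        body = GROK_REF.sub(replace, table[syntax])
        return "(?P<%s>%s)" % (name, body) if name else "(?:%s)" % body

    return GROK_REF.sub(replace, pattern)

regex = expand_grok(
    '%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',
    extend={'ID': '%{WORD}-%{INT}'},
)
fields = re.search(regex, 'Beijing-1104,gary 25 "never quit"').groupdict()
print(fields)
# {'user_id': 'Beijing-1104', 'name': 'gary', 'age': '25', 'motto': '"never quit"'}
```

All extracted values are strings in this sketch, which is consistent with the transformation results shown in the examples.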