This topic describes the syntax and parameters of the Grok function and provides several examples.
We recommend that you use the Grok function instead of regular expression functions. You can also use the Grok function together with regular expressions. Examples:
# The Grok pattern matches log data such as abc: 192.168.0.0 or xyz: 192.168.1.1.
e_match("content", grok(r"\w+: (%{IP})"))
# With escape=True, the Grok pattern does not match abc: 192.168.0.0. It matches the literal text \w+: 192.168.0.0.
e_match("content", grok(r"\w+: (%{IP})", escape=True))
Description
The Grok function extracts specified values based on a regular expression.
- Function format
grok(pattern, escape=False, extend=None)
- Grok syntax
%{SYNTAX}
%{SYNTAX:NAME}
In the Grok syntax, SYNTAX is the name of a predefined regular expression, and NAME is the name of the capturing group. Examples:
"%{IP}" # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
"%{IP:source_id}" # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
"(%{IP})" # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
Some Grok patterns already contain predefined named groups, so you can use them only in the %{SYNTAX} form. These Grok patterns are commonly used to parse entire log statements. For more information, see the Log formats section in Grok patterns. Examples:
"%{SYSLOGBASE}"
"%{COMMONAPACHELOG}"
"%{COMBINEDAPACHELOG}"
"%{HTTPD20_ERRORLOG}"
"%{HTTPD24_ERRORLOG}"
"%{HTTPD_ERRORLOG}"
...
Some Grok patterns contain non-capturing groups. Examples:
"%{INT}"
"%{YEAR}"
"%{HOUR}"
...
- Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pattern | String | Yes | The Grok syntax. For more information, see Grok patterns. |
| escape | Boolean | No | Specifies whether to escape regular expression special characters in the non-Grok parts of the pattern. Default value: False. |
| extend | Dict | No | The custom Grok expressions. |
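The way %{SYNTAX} and %{SYNTAX:NAME} map to regular expression groups can be approximated in plain Python. The following is a minimal sketch, not the actual implementation of the Grok function: the helper name expand_grok and the small PATTERNS table are assumptions made for illustration, and the real predefined patterns are stricter.

```python
import re

# Assumption: a tiny, simplified pattern table for illustration only.
PATTERNS = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
}

# Matches a Grok reference of the form %{SYNTAX} or %{SYNTAX:NAME}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand_grok(pattern):
    """Hypothetical helper: rewrite Grok references as plain regex groups."""

    def replace(ref):
        syntax, name = ref.group(1), ref.group(2)
        body = PATTERNS[syntax]
        if name:
            # %{SYNTAX:NAME} becomes a named capturing group.
            return "(?P<{}>{})".format(name, body)
        # %{SYNTAX} becomes a non-capturing group.
        return "(?:{})".format(body)

    return GROK_REF.sub(replace, pattern)


print(expand_grok("%{IP}"))            # (?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
print(expand_grok("%{IP:source_id}"))  # (?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

# Grok references can be mixed with ordinary regex syntax.
match = re.search(expand_grok(r"\w+: (%{IP})"), "abc: 192.168.0.0")
print(match.group(1))                  # 192.168.0.0
```

In this sketch, wrapping a reference in parentheses adds an unnamed capturing group around the non-capturing one, which is why group 1 holds the IP address in the last call.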
Examples
- Example 1: Extract the date and the quoted text.
- Raw log:
content: 2019 June 24 "I am iron man"
- Transformation rule:
e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
- Transformation result:
content: 2019 June 24 "I am iron man"
year: 2019
month: June
day: 24
motto: "I am iron man"
- Example 2: Extract an HTTP request log.
- Raw log:
content: 55.3.244.1 GET /index.html 15824 0.043
- Transformation rule:
e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
- Transformation result:
content: 55.3.244.1 GET /index.html 15824 0.043
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
- Example 3: Extract an Apache log.
- Raw log:
content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
- Transformation rule:
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
- Transformation result:
content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
clientip: 127.0.0.1
ident: -
auth: -
timestamp: 13/Apr/2015:17:22:03 +0800
verb: GET
request: /router.php
httpversion: 1.1
response: 404
bytes: 285
referrer: "-"
agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
- Example 4: Parse a log in the default Syslog format.
- Raw log:
content: May 29 16:37:11 sadness logger: hello world
- Transformation rule:
e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
- Transformation result:
content: May 29 16:37:11 sadness logger: hello world
timestamp: May 29 16:37:11
logsource: sadness
program: logger
message: hello world
- Example 5: Escape special characters.
- Raw log:
content: Nov 1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
- Transformation rule:
e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))
The raw log contains parentheses (), which are special characters in regular expressions. The preceding rule escapes them with backslashes. If you do not want to escape them manually, add the escape=True parameter, as shown in the following example:
e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
- Transformation result:
content: Nov 1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
timestamp: Nov 1 21:14:23
logsource: scorn
program: expect
pid: 84558
uid: 30206
signal: 3
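The effect of escaping can be reproduced with the standard re module. The following is a rough sketch that assumes escape=True behaves like re.escape applied to the text outside %{...}, which may differ from the actual implementation. The helper expand and the simplified PATTERNS table are hypothetical, and the %{SYSLOGBASE} prefix is left out to keep the example short.

```python
import re

# Assumption: simplified stand-ins for the predefined patterns.
PATTERNS = {"NUMBER": r"\d+(?:\.\d+)?", "WORD": r"\w+"}
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def expand(pattern, escape=False):
    """Hypothetical helper: expand %{SYNTAX:NAME}, optionally escaping literal text."""
    parts, last = [], 0
    for ref in GROK_REF.finditer(pattern):
        literal = pattern[last:ref.start()]
        parts.append(re.escape(literal) if escape else literal)
        parts.append("(?P<{}>{})".format(ref.group(2), PATTERNS[ref.group(1)]))
        last = ref.end()
    tail = pattern[last:]
    parts.append(re.escape(tail) if escape else tail)
    return "".join(parts)


line = "pid 84558 (expect), uid 30206: exited on signal 3"

# Parentheses escaped by hand, escape left at its default.
manual = expand(r"pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}")
# Parentheses written literally, escaped automatically.
automatic = expand("pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}", escape=True)
# Parentheses written literally and not escaped: they become a regex group.
unescaped = expand("pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}")

manual_fields = re.search(manual, line).groupdict()
automatic_fields = re.search(automatic, line).groupdict()
print(manual_fields == automatic_fields)  # True
print(re.search(unescaped, line))         # None
```

Without escaping, the parentheses are read as a capturing group, so the pattern no longer expects the literal ( and ) characters in the log and the match fails.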
- Example 6: Customize a Grok expression.
- Raw log:
content: Beijing-1104,gary 25 "never quit"
- Transformation rule:
e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
- Transformation result:
content: Beijing-1104,gary 25 "never quit"
user_id: Beijing-1104
name: gary
age: 25
motto: "never quit"
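A custom expression such as ID can itself refer to other Grok patterns. One way to picture this is a recursive expansion, sketched below in plain Python. The helper expand and the simplified BUILTIN table are assumptions for illustration and do not reflect how the extend parameter is implemented internally.

```python
import re

# Assumption: simplified stand-ins for the predefined patterns.
BUILTIN = {"WORD": r"\w+", "INT": r"[+-]?\d+", "QUOTEDSTRING": r'"[^"]*"'}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, extend=None):
    """Hypothetical helper: expand Grok references, resolving custom patterns."""
    # Custom patterns are merged over the built-in ones.
    table = dict(BUILTIN, **(extend or {}))

    def replace(ref):
        syntax, name = ref.group(1), ref.group(2)
        # A custom pattern may reference other patterns, so expand it first.
        body = GROK_REF.sub(replace, table[syntax])
        if name:
            return "(?P<{}>{})".format(name, body)
        return "(?:{})".format(body)

    return GROK_REF.sub(replace, pattern)


regex = expand(
    "%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}",
    extend={"ID": "%{WORD}-%{INT}"},
)
fields = re.match(regex, 'Beijing-1104,gary 25 "never quit"').groupdict()
print(fields["user_id"])  # Beijing-1104
print(fields["motto"])    # "never quit"
```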
- Example 7: Match JSON data.
- Raw log:
content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
- Transformation rule:
e_regex('content',grok('%{EXTRACTJSON}'))
- Transformation result:
content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
json: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
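The extracted json field is still a string. To show what it contains, the following sketch locates the JSON object with a plain regular expression and parses it with the standard json module. It assumes the extraction can be approximated as "first { through last }", which is not necessarily how %{EXTRACTJSON} is defined, and it uses a shortened copy of the raw log.

```python
import json
import re

# Shortened version of the raw log in this example, for readability.
content = (
    "2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: "
    '{"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000"}'
)

# Assumption: approximate the extraction as the first "{" through the last "}".
match = re.search(r"(?P<json>\{.*\})", content)
stats = json.loads(match.group("json"))

print(stats["event"])            # mediaStats
# The values in this log are JSON strings, so convert them explicitly.
print(int(stats["durationMs"]))  # 5000
```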
- Example 8: Parse a W3C log.
- Raw log:
content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
- Transformation rule:
Fields that have no value in the W3C log appear as hyphens (-). The Grok expression uses literal hyphens (-) to match these fields.
e_regex("content",grok('%{DATE:data} %{TIME:time} %{WORD:s_sitename} %{WORD:s_computername} %{IP:s_ip} %{WORD:cs_method} %{NOTSPACE:cs_uri_stem} - %{NUMBER:s_port} - %{IP:c_ip} %{NOTSPACE:cs_version} - - - - %{NUMBER:sc_status} %{NUMBER:sc_substatus} %{NUMBER:sc_win32_status} %{NUMBER:sc_bytes} %{NUMBER:cs_bytes} %{NUMBER:time_taken}'))
- Transformation result:
content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
data: 18-12-26
time: 00:00:00
s_sitename: W3SVC2
s_computername: application001
s_ip: 192.168.0.0
cs_method: HEAD
cs_uri_stem: /
s_port: 8000
c_ip: 10.0.0.0
cs_version: HTTP/1.0
sc_status: 404
sc_substatus: 0
sc_win32_status: 64
sc_bytes: 0
cs_bytes: 19
time_taken: 0