This topic describes how to extract dynamic key-value pairs from a string.
Functions
Extracting dynamic key-value pairs is a process that extracts and transforms keywords and values. Use the e_kv, e_kv_delimit, and e_regex functions to extract dynamic key-value pairs. The following table describes the functions in different use cases.
Function | Keyword extraction | Value extraction | Keyword transformation | Value transformation |
e_kv | Uses specific regular expressions. | Supports the default character set and specific delimiters such as commas (,) and double quotation marks ("). | Supports prefixes and suffixes. | Supports text escape. |
e_kv_delimit | Uses specific regular expressions. | Uses delimiters. | Supports prefixes and suffixes. | None (default). |
e_regex | Uses custom regular expressions and the default character set. | Custom. | Custom. | Custom. |
In most cases, use the e_kv function and configure specific parameters to extract key-value pairs, especially when you need to extract and escape enclosed characters or backslashes (\). In complicated or advanced use cases, use the e_regex function. When key-value pairs are separated by fixed delimiters and the values contain special characters that are not enclosed in quotation marks, use the e_kv_delimit function.
Extract keywords
Method
When you use the e_kv, e_kv_delimit, or e_regex function to extract keywords, the extracted keywords must comply with the extraction constraints. For more information, see Limits on field names for extraction.
Example 1
The following example shows how each function extracts keywords and values from the log k1: q=asd&a=1&b=2&__1__=3.
e_kv function
Raw log
k1: q=asd&a=1&b=2&__1__=3
Transformation rule
# By default, keywords are extracted by using a specific character set.
e_kv("k1")
Result
k1: q=asd&a=1&b=2
q: asd
a: 1
b: 2
Note: The keyword __1__ is not extracted because it does not comply with the extraction constraints. For more information, see Limits on field names for extraction.
e_kv_delimit function
Raw log
k1: q=asd&a=1&b=2&__1__=3
Transformation rule
# Split the string into key-value pairs on the ampersand (&), and then extract the keyword and value from each pair.
e_kv_delimit("k1", pair_sep=r"&")
Result
k1: q=asd&a=1&b=2
q: asd
a: 1
b: 2
e_regex function
Raw log
k1: q=asd&a=1&b=2&__1__=3
Transformation rule
# Keywords and values are extracted by using a custom character set.
e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"\1": r"\2"})
Result
k1: q=asd&a=1&b=2
q: asd
a: 1
b: 2
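To see what the e_regex rule above does outside the DSL, here is a minimal Python sketch that uses only the standard re module. It is not the actual implementation of e_regex: extract_pairs is a hypothetical helper, and the field-name check (a name must start with a letter) is an assumed simplification of the extraction constraints.

```python
import re

# Same pattern as in the e_regex rule: a word-character key and an alphanumeric value.
PAIR_PATTERN = re.compile(r"(\w+)=([a-zA-Z0-9]+)")

# Assumption: a valid field name starts with a letter. The real constraints are
# described in "Limits on field names for extraction" and may differ.
VALID_NAME = re.compile(r"[A-Za-z][\w.\-]*")


def extract_pairs(text):
    """Return {key: value} for every key=value pair that has a valid key."""
    pairs = {}
    for key, value in PAIR_PATTERN.findall(text):
        if VALID_NAME.fullmatch(key):
            pairs[key] = value
    return pairs


print(extract_pairs("q=asd&a=1&b=2&__1__=3"))
# {'q': 'asd', 'a': '1', 'b': '2'}
```

The pair __1__=3 matches the pattern but is dropped by the name check, which mirrors the behavior described in the note for the e_kv function.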
Example 2
The following example shows how each function uses regular expressions to extract keywords from the log content:k1=v1&k2=v2?k3:v3.
e_kv function
Raw log
content:k1=v1&k2=v2?k3:v3
Transformation rule
e_kv("content",sep="(?:=|:)")
Result
content:k1=v1&k2=v2?k3:v3
k1: v1
k2: v2
k3: v3
Note: When a character set is passed to the pair_sep, kv_sep, or sep parameter, you must use a regular expression that contains a non-capturing group in the (?:Character set) format.
e_kv_delimit function
Raw log
content:k1=v1&k2=v2?k3:v3
Transformation rule
e_kv_delimit("content",pair_sep=r"&?",kv_sep="(?:=|:)")
Result
content:k1=v1&k2=v2?k3:v3
k1: v1
k2: v2
k3: v3
e_regex function
Raw log
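content:k1=v1&k2=v2?k3:v3
Transformation rule
# Inside a character class, the vertical bar (|) is a literal character, so [=:] is used instead of [=|:].
e_regex("content",r"([a-zA-Z0-9]+)[=:]([a-zA-Z0-9]+)",{r"\1": r"\2"})
Result
content:k1=v1&k2=v2?k3:v3
k1: v1
k2: v2
k3: v3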
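The delimiter-based approach of Example 2 can be sketched in plain Python with re.split. This is only an illustration under stated assumptions: split_pairs is a hypothetical helper, not the implementation of e_kv_delimit, and the default patterns below are chosen to match this example's log.

```python
import re


def split_pairs(text, pair_sep=r"[&?]", kv_sep=r"(?:=|:)"):
    """Split text into pairs on pair_sep, then split each pair once on kv_sep."""
    result = {}
    for pair in re.split(pair_sep, text):
        parts = re.split(kv_sep, pair, maxsplit=1)
        # Keep only well-formed pairs: exactly one key and one value.
        if len(parts) == 2 and parts[0]:
            result[parts[0]] = parts[1]
    return result


print(split_pairs("k1=v1&k2=v2?k3:v3"))
# {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

# With a capturing group, re.split also returns the separator itself:
print(re.split(r"(=|:)", "k1=v1"))
# ['k1', '=', 'v1']
```

In Python, a capturing group in a split pattern inserts the matched separator into the result list, which breaks the key and value positions. This is one plausible reason why the separator parameters expect the non-capturing (?:...) form, although the document does not state the reason.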
Example 3
The following example shows how to use the e_regex function to extract keywords from a complex string.
Raw log
content :"key1:"value1,"key2:"value2
Transformation rule
If double quotation marks (") precede the keywords, you must use the e_regex function.
e_regex("content", r"(\w+):\"(\w+)", {r"\1": r"\2"})
Result
The log after DSL orchestration:
content :"key1:"value1,"key2:"value2
key1: value1
key2: value2
Extract values
The following example shows how to use the e_kv function to extract values from a log when clear identifiers exist between dynamic key-value pairs or between keywords and values, such as a=b and a="cxxx".
Raw log
content1: k1="helloworld",the change world, k2="good"
Transformation rule
In this case, the change world is not extracted.
e_kv("content1")
# The syntax of the e_kv_delimit function: k2 is preceded by a space. Therefore, k2 can be parsed only when the pair_sep parameter is set to r",\s".
e_kv_delimit("content1", kv_sep="=", pair_sep=r",\s")
# The syntax of the e_regex function: the opening quotation mark is matched outside the capturing group so that it is not included in the value.
e_regex("content1", r"(\w+)=\"(\w+)", {r"\1": r"\2"})
Result
The extracted log:
content1: k1="helloworld",the change world, k2="good"
k1: helloworld
k2: good
The following example shows how to use the e_kv function to extract values from a log that contains double quotation marks ("), such as content:k1="v1=1"&k2=v2?k3=v3.
Raw log
content:k1="v1=1"&k2=v2?k3=v3
Transformation rule
# The quote parameter must match the quotation mark that is used in the log.
e_kv("content", sep="=", quote='"')
Result
The extracted log:
content:k1="v1=1"&k2=v2?k3=v3
k1: v1=1
k2: v2
k3: v3
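Quote-aware extraction can be approximated with a single regular expression that tries a quoted value first and falls back to an unquoted one. The following Python sketch is an assumption-laden illustration, not the implementation of e_kv: extract_quoted and its pattern are hypothetical and handle only double quotation marks without escapes.

```python
import re

# key = either "quoted value" (group 2) or a bare value up to the next delimiter (group 3).
QUOTED_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^&?"]+))')


def extract_quoted(text):
    """Return {key: value}, stripping the enclosing double quotation marks."""
    result = {}
    for match in QUOTED_PAIR.finditer(text):
        key, quoted, bare = match.groups()
        result[key] = quoted if quoted is not None else bare
    return result


print(extract_quoted('k1="v1=1"&k2=v2?k3=v3'))
# {'k1': 'v1=1', 'k2': 'v2', 'k3': 'v3'}
```

Because the quoted alternative consumes everything up to the closing quotation mark, the equal sign inside "v1=1" is treated as part of the value and not as a separator.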
If you use the e_kv_delimit function to extract values and the syntax is e_kv_delimit("content", pair_sep=r"&?", kv_sep="="), only k2: v2 and k3: v3 can be parsed. The keyword k1="v1 in the first key-value pair is discarded because the keyword does not comply with the extraction constraints. For more information, see Limits on field names for extraction.
Some key-value pairs separated by delimiters contain special characters but are not enclosed in specific characters. Use the e_kv_delimit function to extract values from such key-value pairs. Example:
Raw log
content: rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
Transformation rule (recommended)
Use the e_kv_delimit function.
e_kv_delimit("content", pair_sep="|", kv_sep=" eat ")
Result (recommended)
The extracted log:
content: rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
kittens: fish, mice
chicks: bugs, rice
rats: rice, oil
Transformation rule (not recommended)
If you use the e_kv function, the values are only partially extracted.
e_kv("content", sep="eat")
Result (not recommended)
The extracted log:
content: rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
kittens: fish
chicks: bugs
rats: rice
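The recommended result above depends on splitting on literal delimiters instead of matching values against a character set. A minimal Python sketch of that idea follows; split_on_literals is a hypothetical helper that treats both separators as plain strings, which is a simplification of what e_kv_delimit accepts.

```python
def split_on_literals(text, pair_sep="|", kv_sep=" eat "):
    """Split on a literal pair separator, then on the first literal key-value separator."""
    result = {}
    for pair in text.split(pair_sep):
        key, found, value = pair.partition(kv_sep)
        # Skip empty fragments, such as the one after the trailing "|".
        if found and key:
            result[key] = value
    return result


log = "rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|"
print(split_on_literals(log))
# {'rats': 'rice, oil', 'chicks': 'bugs, rice', 'kittens': 'fish, mice'}
```

Everything between the key-value separator and the next pair separator is kept as the value, so commas and spaces inside the value survive.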
Transform keywords
The following example shows how to use the e_kv or e_kv_delimit function to transform keywords by configuring the prefix and suffix parameters in the format of prefix="", suffix="". The equivalent e_regex rule is included for comparison.
Raw log
k1: q=asd&a=1&b=2
Transformation rule
e_kv("k1", sep="=", quote='"', prefix="start_", suffix="_end")
e_kv_delimit("k1", pair_sep=r"&", kv_sep="=", prefix="start_", suffix="_end")
e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"start_\1_end": r"\2"})
Result
Transformed keywords:
k1: q=asd&a=1&b=2
start_q_end: asd
start_a_end: 1
start_b_end: 2
The following example shows how to use the e_regex function to transform keywords. The e_regex function is more flexible than the other two functions when transforming keywords.
Transformation rule
e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"\1_\1": r"\2"})
Result
Transformed keywords:
k1: q=asd&a=1&b=2
q_q: asd
a_a: 1
b_b: 2
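The key templates used in the e_regex rules, such as start_\1_end and \1_\1, behave like backreference templates in Python's re module. The sketch below uses Match.expand to show the idea. It is an illustration only: rename_keys is a hypothetical helper and is not how the DSL implements e_regex.

```python
import re

PAIR_PATTERN = re.compile(r"(\w+)=([a-zA-Z0-9]+)")


def rename_keys(text, key_template, value_template=r"\2"):
    """Build {key: value} by expanding backreference templates for each match."""
    return {
        match.expand(key_template): match.expand(value_template)
        for match in PAIR_PATTERN.finditer(text)
    }


print(rename_keys("q=asd&a=1&b=2", r"start_\1_end"))
# {'start_q_end': 'asd', 'start_a_end': '1', 'start_b_end': '2'}

print(rename_keys("q=asd&a=1&b=2", r"\1_\1"))
# {'q_q': 'asd', 'a_a': '1', 'b_b': '2'}
```

A template can reuse a captured group any number of times, which is why the regular expression approach covers prefixes, suffixes, and arbitrary rewrites with one mechanism.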
Transform values
The following example shows how to use the e_kv function to transform values if a log is in the k1:"v1\"abc" format and double quotation marks exist in the log content. It is difficult to use the other two functions to transform values in this use case.
Raw log
""" In this example, the backslash (\) character is not an escape character. """
content2: k1:"v1\"abc", k2:"v2", k3: "v3"
Transformation rule 1
e_kv("content2",sep=":", quote='"')
Result 1
The extracted log:
content2: k1:"v1\"abc", k2:"v2", k3: "v3"
k1: v1\
k2: v2
k3: v3
Transformation rule 2
Set the escape parameter of the e_kv function to True so that the backslash (\) is treated as an escape character. Example:
e_kv("content2",sep=":", quote='"',escape=True)
Result 2
The extracted log:
content2: k1:"v1\"abc", k2:"v2", k3: "v3"
k1: v1"abc
k2: v2
k3: v3
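The difference between Result 1 and Result 2 comes down to whether a backslash before a quotation mark ends the value. The following Python sketch reproduces both results with the re module. It rests on assumptions: extract_values is a hypothetical helper, and the patterns cover only the double-quoted k:"v" shape of this example, not the full behavior of e_kv.

```python
import re


def extract_values(text, escape=False):
    """Extract key:"value" pairs, optionally honoring backslash escapes."""
    if escape:
        # A value is any run of non-quote, non-backslash characters or escaped characters.
        body = r'((?:[^"\\]|\\.)*)'
    else:
        # A value ends at the first quotation mark, even if a backslash precedes it.
        body = r'([^"]*)'
    pattern = re.compile(r'(\w+):\s*"' + body + '"')

    result = {}
    for key, value in pattern.findall(text):
        if escape:
            # Drop the backslash and keep the character that it escapes.
            value = re.sub(r"\\(.)", r"\1", value)
        result[key] = value
    return result


log = r'k1:"v1\"abc", k2:"v2", k3: "v3"'
print(extract_values(log))               # k1 is cut off after the backslash
print(extract_values(log, escape=True))  # k1 keeps the embedded quotation mark
```

With escape=False the value of k1 is v1\ because the first quotation mark closes it. With escape=True the value is v1"abc.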
The following example shows how to use the e_kv function to transform values if a log is in the a='k1=k2\';k2=k3' format. It is difficult to use the other two functions to transform values in this use case.
Raw log
data: i=c10 a='k1=k2\';k2=k3'
Transformation rule 1
In the e_kv function, the value of the escape parameter is False by default.
e_kv("data", quote="'")
Result 1
The extracted log:
a: k1=k2\
i: c10
k2: k3
Transformation rule 2
Set the escape parameter of the e_kv function to True so that the backslash (\) is treated as an escape character. Example:
e_kv("data", quote="'", escape=True)
Result 2
The extracted log:
data: i=c10 a='k1=k2\';k2=k3'
i: c10
a: k1=k2';k2=k3
The following example shows how to use the e_regex function to transform complex key-value pairs.
Raw log
content: rats eat rice|chicks eat bugs|kittens eat fish|
Transformation rule
e_regex("content", r"\b(\w+) eat ([^\|]+)", {r"\1": r"\2 by \1"})
Result
The extracted log:
content: rats eat rice|chicks eat bugs|kittens eat fish|
kittens: fish by kittens
chicks: bugs by chicks
rats: rice by rats
Case studies
In this example, your company needs to extract the URL data from your website logs. Specify transformation rules as needed.
Initial transformation
Requirements
Requirement 1: Parse a log and extract the proto, domain, and param fields from the log.
Requirement 2: Expand the key-value pairs in the param field.
Raw log
__source__: 192.168.0.100
__tag__:__client_ip__: 192.168.0.200
__tag__:__receive_time__: 1563517113
__topic__:
request: https://example.com/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
Functions
General orchestration
# Parse the request field.
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
# Parse the uri_param field.
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
# Expand the key-value pairs.
e_kv("uri_query")
Specific orchestration and the transformation results
Use the Grok function to parse the request field.
You can also use regular expressions to parse the request field. For more information, see Grok function and Grok patterns.
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
Result:
uri_domain: example.com
uri_param: /video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
uri_proto: https
Use the Grok function to parse the uri_param field.
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
Result:
uri_path: /video/getlist/s
uri_query: ver=3.2.3&app_type=supplier&os=Android8.1.0
Use the e_kv function to expand the key-value pairs in the uri_query field.
e_kv("uri_query")
Result:
app_type: supplier
os: Android8.1.0
ver: 3.2.3
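For comparison, the same three steps (split the URL, separate the path from the query, expand the query) can be sketched in plain Python with the standard urllib.parse module. This is not part of the DSL. expand_request is a hypothetical helper, and the output field names are copied from the results above for readability.

```python
from urllib.parse import parse_qsl, urlsplit


def expand_request(url):
    """Split a URL into its parts and expand the query string into fields."""
    parts = urlsplit(url)
    fields = {
        "uri_proto": parts.scheme,
        "uri_domain": parts.netloc,
        "uri_path": parts.path,
        "uri_query": parts.query,
    }
    # parse_qsl returns a list of (key, value) tuples, one per query parameter.
    fields.update(parse_qsl(parts.query))
    return fields


request = "https://example.com/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0"
for name, value in expand_request(request).items():
    print(f"{name}: {value}")
```

Note that parse_qsl also decodes percent-encoded characters, so its values can differ from the raw text for encoded URLs.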
Result
Preview the transformed log:
__source__: 192.168.0.100
__tag__:__client_ip__: 192.168.0.200
__tag__:__receive_time__: 1563517113
__topic__:
request: https://example.com/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
uri_domain: example.com
uri_path: /video/getlist/s
uri_proto: https
uri_query: ver=3.2.3&app_type=supplier&os=Android8.1.0
app_type: supplier
os: Android8.1.0
ver: 3.2.3
If you only need the key-value pairs in the request field, apply the e_kv function directly to the field. Example:
e_kv("request")
Preview the transformed log:
__source__: 192.168.0.100
__tag__:__client_ip__: 192.168.0.200
__tag__:__receive_time__: 1563517113
__topic__:
request: https://example.com/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
app_type: supplier
os: Android8.1.0
ver: 3.2.3
Advanced transformation
If you want to extract dynamic fields, such as the ver, app_type, and os fields, use regular expressions or the e_kv_delimit function. Example:
Use regular expressions.
e_regex("request", r"\b(\w+)=([^=&]+)", {r"\1": r"\2"})
Use the e_kv_delimit function.
e_kv_delimit("request", pair_sep=r"?&")
Conclusion
Most URLs can be parsed by using the preceding functions. Use the e_kv function to parse URLs from raw logs.