This topic describes how to extract key-value pairs from a string.

In the following example, two methods are used to parse a log entry about a URL. The log entry is as follows:
request:  https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
Requirements:
  • Parse the proto, domain, and param fields from the log entry.
  • Expand the key-value pairs in the param field.
Raw log entry:
__source__:  10.43.xx.xx
__tag__:__client_ip__:  12.120.xx.xx
__tag__:__receive_time__:  1563517113
__topic__:  
request:  https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0

DSL orchestration

  1. Use Grok to parse the request field. You can also use regular expressions to parse this field. For more information, see Grok function and Grok patterns.
    e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
    Preview the transformed log:
    uri_domain:  yz.m.sm.cn
    uri_param:  /video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
    uri_proto:  https
  2. Use Grok to parse the uri_param field.
    e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
    Preview the transformed log:
    uri_path:  /video/getlist/s
    uri_query:  ver=3.2.3&app_type=supplier&os=Android8.1.0
  3. Expand the key-value pairs in the uri_query field:
    e_kv("uri_query")
    Preview the transformed log:
    app_type:  supplier
    os:  Android8.1.0
    ver:  3.2.3
  4. Configure the DSL orchestration rule as follows:
    # Parse the request field.
    e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
    # Parse the uri_param field.
    e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
    # Expand the key-value pairs.
    e_kv("uri_query")
    Preview the transformed log:
    __source__:  10.43.xx.xx
    __tag__:__client_ip__:  12.120.xx.xx
    __tag__:__receive_time__:  1563517113
    __topic__:  
    request:  https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
    uri_domain:  yz.m.sm.cn
    uri_path:  /video/getlist/s
    uri_proto:  https
    uri_query:  ver=3.2.3&app_type=supplier&os=Android8.1.0
    app_type:  supplier
    os:  Android8.1.0
    ver:  3.2.3
    If you only need to expand the key-value pairs in the param field, you can call the e_kv function directly on the request field:
    e_kv("request")
    Preview the transformed log:
    __source__:  10.43.xx.xx
    __tag__:__client_ip__:  12.120.xx.xx
    __tag__:__receive_time__:  1563517113
    __topic__:  
    request:  https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0
    app_type:  supplier
    os:  Android8.1.0
    ver:  3.2.3
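
The three steps above can be reproduced outside SLS as a quick sanity check on sample values. The following Python sketch uses only the standard library. It assumes that urllib.parse.urlsplit divides this URL the same way as the URIPROTO, URIHOST, and URIPATHPARAM Grok patterns do, so it approximates the result of the rule rather than showing how SLS evaluates it. The helper name parse_request is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_request(url: str) -> dict:
    """Approximate the DSL steps: split the URL, then expand the query string."""
    # Assumption: urlsplit mirrors the Grok split for simple URLs such as this one.
    parts = urlsplit(url)
    fields = {
        "uri_proto": parts.scheme,     # step 1: protocol, e.g. https
        "uri_domain": parts.hostname,  # step 1: host name without user info or port
        "uri_path": parts.path,        # step 2: path before the "?"
        "uri_query": parts.query,      # step 2: everything after the "?"
    }
    # Step 3: expand the key-value pairs of the query string, similar to e_kv.
    fields.update(parse_qsl(parts.query))
    return fields


log = {"request": "https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0"}
log.update(parse_request(log["request"]))
for key, value in log.items():
    print(f"{key}: {value}")
```

Note that parse_qsl percent-decodes values (for example, %20 becomes a space), so for encoded URLs compare the sketch against the SLS preview before relying on it.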

Other methods

The following examples describe other methods to extract the ver, app_type, and os fields from the url field, whose value is https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0.
  • Use the following regular expression to extract the fields:
    e_regex("url", r"\b(\w+)=([^=&]+)", {r"\1": r"\2"})
  • Use the e_kv_delimit function with the question mark (?) and the ampersand (&) as pair separators:
    e_kv_delimit("url", pair_sep=r"?&")
These methods can parse most URLs. For the example URL, however, we recommend the e_kv function because it is simpler.
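
The two alternatives differ in where the splitting logic lives. The regular expression matches each key=value pair directly, while the delimiter approach first splits the string at ? and & and then splits each piece once at =. The following Python sketch imitates both ideas with the standard re module to show that they yield the same three fields for this URL. It is an approximation only: the helper names extract_by_regex and extract_by_delimiter are hypothetical, pair_sep is assumed to act as a set of separator characters, and the real e_regex and e_kv_delimit functions also apply the field name extraction constraints, which this sketch does not model.

```python
import re

URL = "https://yz.m.sm.cn/video/getlist/s?ver=3.2.3&app_type=supplier&os=Android8.1.0"


def extract_by_regex(text: str) -> dict:
    """Regex approach: every match of key=value becomes one field."""
    return dict(re.findall(r"\b(\w+)=([^=&]+)", text))


def extract_by_delimiter(text: str, pair_sep: str = r"[?&]", kv_sep: str = "=") -> dict:
    """Delimiter approach: split into pairs, then split each pair once at kv_sep."""
    # Assumption: pair_sep is a character set, like pair_sep=r"?&" in e_kv_delimit.
    fields = {}
    for pair in re.split(pair_sep, text):
        key_value = re.split(kv_sep, pair, maxsplit=1)
        # Skip pieces without a separator, such as the scheme/host/path prefix.
        if len(key_value) == 2 and key_value[0]:
            fields[key_value[0]] = key_value[1]
    return fields


print(extract_by_regex(URL))
print(extract_by_delimiter(URL))
```

Because the delimiter version takes pair_sep and kv_sep as regular expressions, the same helper can also mimic other e_kv_delimit calls in this topic, for example pair_sep=r"\|" with kv_sep=" eat ".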

Method comparison

| Method | Keyword extraction | Value extraction | Keyword transformation | Value transformation |
| --- | --- | --- | --- | --- |
| e_kv | Specific regular expressions | Default character set, specific delimiters, or values enclosed in double quotation marks (") | Prefixes and suffixes | Text escape |
| e_kv_delimit | Specific regular expressions | Delimiters | Prefixes and suffixes | Not supported |
| e_regex | Custom regular expressions with default character set filtering | Custom | Arbitrary | Arbitrary |
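
The e_regex row is the most flexible because both the keyword and the value are built from capture groups. Python's Match.expand applies the same kind of \1-style template, so the following sketch approximates e_regex(field, pattern, {key_template: value_template}) for the e_regex examples in the list that follows. It assumes that the templates behave like Python backreference templates, and it does not model the field name extraction constraints or the default character set filtering, so a keyword such as __1__ is not dropped here. The helper name regex_extract is hypothetical.

```python
import re


def regex_extract(text: str, pattern: str, key_template: str, value_template: str) -> dict:
    """Build one field per match; both templates may use backreferences such as \\1."""
    fields = {}
    for match in re.finditer(pattern, text):
        # expand() substitutes the capture groups into each template.
        fields[match.expand(key_template)] = match.expand(value_template)
    return fields


k1 = "q=asd&a=1&b=2"
print(regex_extract(k1, r"(\w+)=([a-zA-Z0-9]+)", r"\1", r"\2"))
print(regex_extract(k1, r"(\w+)=([a-zA-Z0-9]+)", r"start_\1_end", r"\2"))
print(regex_extract("rats eat rice|chicks eat bugs|kittens eat fish|",
                    r"\b(\w+) eat ([^|]+)", r"\1", r"\2 by \1"))
```
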
  • Keyword extraction
    The e_kv, e_kv_delimit, and e_regex functions comply with field name extraction constraints during keyword extraction. For more information, see Constraints on field name extraction.
    • Example 1
      The following example shows three ways to extract keywords and values from the log entry k1: q=asd&a=1&b=2&__1__=3.
      # By default, keywords are extracted by using a specific character set.
      e_kv("k1")
      
      # Split the string into key-value pairs at each ampersand (&), then extract the keyword from each pair.
      e_kv_delimit("k1", pair_sep=r"&")
      
      # Keywords and values are extracted by using a custom character set.
      e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"\1": r"\2"})
      After the transformation, the log is as follows:
      k1: q=asd&a=1&b=2&__1__=3
      q: asd
      a: 1
      b: 2
      Note The __1__ keyword is not extracted because it does not comply with field name extraction constraints. For more information, see Constraints on field name extraction.
    • Example 2
      The following example shows three ways to extract keywords from the log entry content:k1=v1&k2=v2?k3:v3 by using specific regular expressions as separators:
      e_kv("content",sep="(?:=|:)")
      e_kv_delimit("content",pair_sep=r"&?",kv_sep="(?:=|:)")
      e_regex("content",r"([a-zA-Z0-9]+)[=|:]([a-zA-Z0-9]+)",{r"\1": r"\2"})
      Note When you pass a set of characters to the pair_sep, kv_sep, or sep parameter, express it as a regular expression that uses a non-capturing group, in the format (?:character set).
      After the transformation, the log is as follows:
      content:k1=v1&k2=v2?k3:v3
      k1: v1
      k2: v2
      k3: v3
    • Example 3
      In this example, the e_regex function is used to extract keywords from the following string:
      content: "ak_id:"LTAiscW,"ak_key:"rsd7r8f
      Because a double quotation mark (") precedes each keyword, you must use the e_regex function to extract these keywords. The quotation mark after each colon is matched outside the capture groups so that it is not included in the values.
      e_regex("content", r'(\w+):"(\w+)', {r"\1": r"\2"})
      After the transformation, the log is as follows:
      content: "ak_id:"LTAiscW,"ak_key:"rsd7r8f
      ak_id: LTAiscW
      ak_key: rsd7r8f
  • Value extraction
    The following examples assume that clear separators exist between key-value pairs and between each keyword and its value.
    • We recommend that you use the e_kv function to extract values if the key-value pairs are in the a=b or a="cxxx" format, for example:
      content1:  k="helloworld",the change world, k2="good"
      In this example, the text the change world is not a key-value pair and is not extracted.
      e_kv("content1")
      # e_kv_delimit syntax. A space follows the comma before k2, so k2 is parsed only when pair_sep is set to r",\s".
      e_kv_delimit("content1", kv_sep="=", pair_sep=r",\s")
      # e_regex syntax. The quotation marks are matched outside the capture groups so that they are not part of the values.
      e_regex("content1", r'(\w+)="(\w+)"', {r"\1": r"\2"})
      The extracted log entry is as follows:
      content1:  k="helloworld",the change world, k2="good"
      k: helloworld
      k2: good
    • We recommend that you use the e_kv function if values are enclosed in quotation marks and contain the key-value separator, as in content:k1='v1=1'&k2=v2?k3=v3. Use the quote parameter to specify the quotation mark, for example:
      e_kv("content",sep="=", quote="'")
      The extracted log entry is as follows:
      content: k1='v1=1'&k2=v2?k3=v3
      k1: v1=1
      k2: v2
      k3: v3

      If you use the e_kv_delimit function instead, for example e_kv_delimit("content", pair_sep=r"&?", kv_sep="="), only k2: v2 and k3: v3 are parsed. The keyword k1='v1 in the first key-value pair does not comply with field name extraction constraints and is dropped. For more information, see Constraints on field name extraction.

    • Some delimiter-separated key-value pairs contain special characters, such as commas, but are not enclosed in quotation marks. We recommend that you use the e_kv_delimit function to extract values from such key-value pairs, for example:
      content:  rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
      In this example, the e_kv_delimit function is used.
      e_kv_delimit("content", pair_sep="|", kv_sep=" eat ")
      The parsed log entry is as follows:
      content:  rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
      kittens:  fish, mice
      chicks:  bugs, rice
      rats:  rice, oil
      If you use the e_kv function instead, each value is cut off at the comma, so part of the data is not extracted:
      e_kv("content", sep="eat")
      The parsed log entry is as follows:
      content:  rats eat rice, oil|chicks eat bugs, rice|kittens eat fish, mice|
      kittens:  fish
      chicks:  bugs
      rats:  rice
  • Keyword transformation
    • You can use the e_kv and e_kv_delimit functions to add a prefix and a suffix to extracted keywords by setting the prefix and suffix parameters, in the format prefix="...", suffix="...". With the e_regex function, write the prefix and suffix directly into the keyword template.
      Raw log entry:
      k1: q=asd&a=1&b=2
      DSL orchestration:
      e_kv("k1", sep="=", quote='"', prefix="start_", suffix="_end")
      e_kv_delimit("k1", pair_sep=r"&", kv_sep="=", prefix="start_", suffix="_end")
      e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"start_\1_end": r"\2"})
      The transformed log is as follows:
      k1: q=asd&a=1&b=2
      start_q_end: asd
      start_a_end: 1
      start_b_end: 2
      The e_regex function can also apply other keyword templates. For example, the following rule repeats each keyword:
      e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"\1_\1": r"\2"})
      The transformed log is as follows:
      k1: q=asd&a=1&b=2
      q_q: asd
      a_a: 1
      b_b: 2
  • Value transformation
    • If a value is enclosed in double quotation marks and itself contains a double quotation mark, as in k1:"v1\"abc", only the e_kv function can extract the key-value pairs.
      Note In this example, the backslash (\) is a literal character in the log data, not an escape character.
      content2:  k1:"v1\"abc", k2:"v2", k3: "v3"
      The syntax of the e_kv function is as follows:
      e_kv("content2",sep=":", quote='"')
      The extracted log entry is as follows:
      content2:  k1:"v1\"abc", k2:"v2", k3: "v3"
      k1: v1\
      k2: v2
      k3: v3
      To treat the backslash (\) as an escape character, set the escape parameter of the e_kv function to True, for example:
      e_kv("content2",sep=":", quote='"',escape=True)
      The extracted log entry is as follows:
      content2:  k1:"v1\"abc", k2:"v2", k3: "v3"
      k1: v1"abc
      k2: v2
      k3: v3
    • If a value enclosed in single quotation marks contains an escaped single quotation mark, as in a='k1=k2\';k2=k3', only the e_kv function can extract the key-value pairs.
      data: i=c10 a='k1=k2\';k2=k3'
      In the e_kv function, the escape parameter is False by default. The syntax of the e_kv function is as follows:
      e_kv("data", quote="'")
      The extracted log entry is as follows:
      a:  k1=k2\
      i:  c10
      k2:  k3
      To treat the backslash (\) as an escape character, set the escape parameter of the e_kv function to True, for example:
      e_kv("data", quote="'", escape=True)
      The extracted log entry is as follows:
      data: i=c10 a='k1=k2\';k2=k3'
      i: c10
      a: k1=k2';k2=k3
    • Complicated transformation of key-value pairs
      Example log entry:
      content:  rats eat rice|chicks eat bugs|kittens eat fish|
      Transform the log entry by using the following e_regex function:
      e_regex("content", r"\b(\w+) eat ([^\|]+)", {r"\1": r"\2 by \1"})
      The transformed log entry is as follows:
      content:  rats eat rice|chicks eat bugs|kittens eat fish|
      kittens:  fish by kittens
      chicks:  bugs by chicks
      rats:  rice by rats
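
The Value extraction and Value transformation examples above depend on where a quoted value ends. The following Python sketch approximates that part of the e_kv behavior: without escape, a value enclosed in the quote character ends at the next quote character even if a backslash precedes it; with escape=True, a backslash escapes the next character and is then removed. The helper parse_kv, its set of characters that end an unquoted value, and its tolerance of spaces around sep are assumptions chosen to reproduce the outputs shown in this topic. The real e_kv function has its own default character set and further options, such as prefix and suffix.

```python
import re


def parse_kv(text: str, sep: str = "=", quote: str = '"', escape: bool = False) -> dict:
    """Extract key<sep>value pairs; values may be unquoted or enclosed in quote."""
    q = re.escape(quote)
    if escape:
        # A backslash escapes the next character, so \" does not end the value.
        quoted = rf"{q}((?:\\.|[^{q}\\])*){q}"
    else:
        # The value ends at the first quote character, even right after a backslash.
        quoted = rf"{q}([^{q}]*){q}"
    # sep is used as a regular expression, e.g. "=", ":", or "(?:=|:)".
    # Assumption: an unquoted value ends at whitespace, ",", "&", ";", "?", or a quote.
    pair = re.compile(rf"(\w+)\s*{sep}\s*(?:{quoted}|([^\s,&;?{q}]+))")
    fields = {}
    for match in pair.finditer(text):
        if match.group(2) is not None:
            value = match.group(2)
            if escape:
                value = re.sub(r"\\(.)", r"\1", value)  # drop each escaping backslash
        else:
            value = match.group(3)
        fields[match.group(1)] = value
    return fields


content2 = r'k1:"v1\"abc", k2:"v2", k3: "v3"'
data = r"i=c10 a='k1=k2\';k2=k3'"
print(parse_kv(content2, sep=":"))
print(parse_kv(content2, sep=":", escape=True))
print(parse_kv(data, quote="'"))
print(parse_kv(data, quote="'", escape=True))
```

In this sketch, parse_kv(content2, sep=":") keeps the backslash in k1 and stops at the escaped quotation mark, while escape=True yields v1"abc, which mirrors the two e_kv calls in the Value transformation examples.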

Conclusion

You can use the e_kv function to extract key-value pairs in most cases, especially when values are enclosed in quotation marks or contain backslashes (\) that need to be escaped. In complex scenarios, use the e_regex function. In some cases, such as values that contain special characters but are not enclosed in quotation marks, the e_kv_delimit function is simpler.