Simple Log Service: Instructions for extracting semi-structured data

Last Updated: Aug 18, 2025

This topic describes the SPL instructions that extract semi-structured data and provides examples for each instruction.

parse-regexp

Extracts information from a specified field by using the capturing groups of a regular expression.

Important
  • The data type of the extracted data is VARCHAR. If an extracted field has the same name as a field in the input data, see Value retention and overwriting for the value retention policy.

  • You cannot perform operations on the __time__ and __time_ns_part__ time fields. For more information, see Time fields.

Syntax

| parse-regexp <field>, <pattern> as <output>, ...

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | Field | Yes | The name of the source field to extract. The input data must contain this field. The field must be of the VARCHAR type and its value cannot be null. Otherwise, the extraction operation is not performed. |
| pattern | Regexp | Yes | The regular expression. The RE2 syntax is supported. |
| output | Field | No | The name of the field used to store the extraction result. |

Examples

  • Example 1: Extract fields step by step by applying one pattern after another.

    • SPL statement

      *
      | parse-regexp content, '(\S+)' as ip -- Generates the ip field: 10.0.0.0.
      | parse-regexp content, '\S+\s+(\w+)' as method -- Generates the method field: GET.
    • Input data

      content: '10.0.0.0 GET /index.html 15824 0.043'
    • Output

      content: '10.0.0.0 GET /index.html 15824 0.043'
      ip: '10.0.0.0'
      method: 'GET'
  • Example 2: Match the full pattern in a single step by using unnamed capturing groups.

    • SPL statement

      * | parse-regexp content, '(\S+)\s+(\w+)' as ip, method
    • Input data

      content: '10.0.0.0 GET /index.html 15824 0.043'
    • Output

      content: '10.0.0.0 GET /index.html 15824 0.043'
      ip: '10.0.0.0'
      method: 'GET'
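
The two examples above can be reproduced outside SLS with a short Python sketch. It is an illustration only, not the SLS implementation: the `parse_regexp` helper is hypothetical, it assumes the pattern may match anywhere in the value, and it uses Python's `re` module, whose syntax overlaps with RE2 but is not identical to it.

```python
import re

def parse_regexp(record, field, pattern, outputs):
    """Return a copy of record with capturing groups stored under the output names."""
    value = record.get(field)
    if not isinstance(value, str):
        return dict(record)  # missing or null source field: nothing is extracted
    match = re.search(pattern, value)
    if match is None:
        return dict(record)  # no match: the record is left unchanged
    result = dict(record)
    # Pair each unnamed capturing group with the output name in the same position.
    for name, captured in zip(outputs, match.groups()):
        result[name] = captured
    return result

record = {"content": "10.0.0.0 GET /index.html 15824 0.043"}

# Example 1: two successive extractions.
step1 = parse_regexp(record, "content", r"(\S+)", ["ip"])
step2 = parse_regexp(step1, "content", r"\S+\s+(\w+)", ["method"])

# Example 2: one pattern with two groups.
one_pass = parse_regexp(record, "content", r"(\S+)\s+(\w+)", ["ip", "method"])

print(step2)
print(step2 == one_pass)
```

Both approaches produce the same ip and method fields. The single-pattern form does the work in one instruction, while the step-by-step form is easier to build up when the log format is still being explored.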

parse-csv

Extracts data in CSV format from a specified field.

Important
  • The data type of the extracted data is VARCHAR. If an extracted field has the same name as a field in the input data, see Value retention and overwriting for the value retention policy.

  • You cannot perform operations on the __time__ and __time_ns_part__ time fields. For more information, see Time fields.

Syntax

| parse-csv -delim=<delim> -quote=<quote> -strict <field> as <output>, ...

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| delim | String | No | The separator for the data content. It can be one to three valid ASCII characters. You can use escape characters to represent special characters. For example, \t represents a tab character, \11 represents the ASCII character with the octal ordinal number 11, and \x09 represents the ASCII character with the hexadecimal ordinal number 09. You can also use a multi-character separator, such as $$$ or ^_^. The default value is a comma (,). |
| quote | Char | No | The quote character for the data content. It is a single valid ASCII character used when the data content contains the separator. Examples include a double quotation mark ("), a single quotation mark ('), and an invisible character (0x01). By default, no quote character is used. Important: This parameter takes effect only when the delim parameter is a single character. The value of this parameter cannot be the same as the value of the delim parameter. |
| strict | Bool | No | Specifies whether to enable strict matching when the number of values in the data content does not match the number of fields specified in output. False: non-strict matching, which uses the maximum matching policy. If there are more values than fields, the extra values are not output. If there are more fields than values, empty strings are output for the extra fields. True: strict matching. No fields are output. Strict matching is disabled by default. To enable it, add the -strict parameter. |
| field | Field | Yes | The name of the source field to parse. The input data must contain this field. The field must be of the VARCHAR type and its value cannot be null. Otherwise, the extraction operation is not performed. |
| output | Field | Yes | The name of the field used to store the parsed data content. |
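
The difference between strict and non-strict matching can be sketched in Python as follows. This is a minimal illustration of the rules described for the strict parameter, not the SLS implementation. The `assign_csv_values` helper is hypothetical and takes values that have already been split.

```python
def assign_csv_values(values, outputs, strict=False):
    """Map already-split CSV values onto output field names."""
    if len(values) != len(outputs):
        if strict:
            # Strict matching: a count mismatch produces no fields at all.
            return {}
        # Non-strict matching: drop extra values, pad missing ones with "".
        values = values[:len(outputs)] + [""] * (len(outputs) - len(values))
    return dict(zip(outputs, values))

print(assign_csv_values(["a", "b", "c"], ["x", "y"]))        # extra value dropped
print(assign_csv_values(["a"], ["x", "y"]))                  # missing value padded
print(assign_csv_values(["a"], ["x", "y"], strict=True))     # nothing is output
```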

Examples

  • Example 1: Simple data matching.

    • SPL statement

      * | parse-csv content as x, y, z
    • Input data

      content: 'a,b,c'
    • Output

      content: 'a,b,c'
      x: 'a'
      y: 'b'
      z: 'c'
  • Example 2: Use the default double quotation mark (") as the quote character to match content that contains special characters.

    • SPL statement

      * | parse-csv content as ip, time, host
    • Input data

      content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
    • Output

      content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
      ip: '192.168.0.100'
      time: '10/Jun/2019:11:32:16,127 +0800'
      host: 'example.aliyundoc.com'
  • Example 3: Use a multi-character separator.

    • SPL statement

      * | parse-csv -delim='||' content as time, ip, req
    • Input data

      content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
    • Output

      content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
      time: '05/May/2022:13:30:28'
      ip: '127.0.0.1'
      req: 'POST /put?a=1&b=2'
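
To see how the separator and quote character interact, the following Python sketch splits the sample values from Examples 2 and 3. It is an approximation under stated assumptions: the `split_csv` helper is hypothetical, the quote character is always passed explicitly, and quoting is applied only when the separator is a single character, as the quote parameter description requires.

```python
import csv

def split_csv(value, delim=",", quote=None):
    """Split one CSV-style value into a list of strings (illustrative only)."""
    if len(delim) == 1 and quote is not None:
        # Single-character separator with a quote character: quoted values
        # may contain the separator itself.
        return next(csv.reader([value], delimiter=delim, quotechar=quote))
    # Multi-character separator, or no quote character: plain split.
    return value.split(delim)

# The escape forms \t, \11 (octal), and \x09 (hexadecimal) from the delim
# description all denote the tab character; Python string escapes agree.
print("\t" == "\11" == "\x09")

quoted = '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
print(split_csv(quoted, quote='"'))

multi = "05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2"
print(split_csv(multi, delim="||"))
```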

parse-json

Extracts the first-level key-value pairs from the JSON content of a specified field.

Important
  • The data type of the extracted data is VARCHAR. If an extracted field has the same name as a field in the input data, see Value retention and overwriting for the value retention policy.

  • You cannot perform operations on the __time__ and __time_ns_part__ time fields. For more information, see Time fields.

Syntax

| parse-json -mode=<mode> -path=<path> -prefix=<prefix> <field>

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | Specifies which value is kept when an extracted field has the same name as a field in the input data. The default value is overwrite. |
| path | JSONPath | No | The JSON path that locates the content to extract within the field value. The default value is an empty string, which indicates that the full content of the specified field is extracted directly. |
| prefix | String | No | The prefix for the result fields after the JSON structure is expanded. The default value is an empty string. |
| field | Field | Yes | The name of the source field to parse. The input data must contain this field, its value cannot be null, and one of the following conditions must be met: the type is JSON, or the type is VARCHAR and the value is a valid JSON string. Otherwise, the extraction operation is not performed. |

Examples

  • Example 1: Extract all key-value pairs from the y field.

    • SPL statement

      * | parse-json y
    • Input data

      x: '0'
      y: '{"a": 1, "b": 2}'
    • Output

      x: '0'
      y: '{"a": 1, "b": 2}'
      a: '1'
      b: '2'
  • Example 2: Extract the value of the body key from the content field, and then extract all of its key-value pairs.

    • SPL statement

      * | parse-json -path='$.body' content
    • Input data

      content: '{"body": {"a": 1, "b": 2}}'
    • Output

      content: '{"body": {"a": 1, "b": 2}}'
      a: '1'
      b: '2'
  • Example 3: Set the field value output mode to preserve to retain the original values of existing fields.

    • SPL statement

      * | parse-json -mode='preserve' y
    • Input data

      a: 'xyz'
      x: '0'
      y: '{"a": 1, "b": 2}'
    • Output

      x: '0'
      y: '{"a": 1, "b": 2}'
      a: 'xyz'
      b: '2'
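
The three examples above can be approximated with the Python sketch below. It is not the SLS implementation. The `parse_json` helper is hypothetical, it handles only simple dotted paths such as `$.body` rather than full JSONPath, and it assumes that non-string values, including nested objects, are stored as their JSON text.

```python
import json

def parse_json(record, field, path="", prefix="", mode="overwrite"):
    """Expand the first-level keys of a JSON value into string fields."""
    raw = record.get(field)
    if raw is None:
        return dict(record)  # missing or null source field: nothing is extracted
    try:
        node = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return dict(record)  # not a valid JSON string: nothing is extracted
    # Walk a simple dotted path such as "$.body"; full JSONPath is not handled.
    for key in [part for part in path.lstrip("$").split(".") if part]:
        if not isinstance(node, dict) or key not in node:
            return dict(record)
        node = node[key]
    if not isinstance(node, dict):
        return dict(record)
    result = dict(record)
    for key, value in node.items():
        name = prefix + key
        if mode == "preserve" and name in result:
            continue  # preserve: keep the value already present in the input
        # Extracted values are strings (VARCHAR in the text above).
        result[name] = value if isinstance(value, str) else json.dumps(value)
    return result

print(parse_json({"x": "0", "y": '{"a": 1, "b": 2}'}, "y"))
print(parse_json({"content": '{"body": {"a": 1, "b": 2}}'}, "content", path="$.body"))
print(parse_json({"a": "xyz", "x": "0", "y": '{"a": 1, "b": 2}'}, "y", mode="preserve"))
```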

parse-kv

Extracts key-value pairs from a specified field.

Important
  • The data type of the extracted data is VARCHAR. If an extracted field has the same name as a field in the input data, see Value retention and overwriting for the value retention policy.

  • You cannot perform operations on the __time__ and __time_ns_part__ time fields. For more information, see Time fields.

Syntax

Extraction by separator

Extracts key-value pairs based on specified separators.

| parse-kv -mode=<mode> -prefix=<prefix> -greedy <field>, <delim>, <kv-sep>

Extraction by regular expression

Extracts key-value pairs based on a specified regular expression.

| parse-kv -regexp -mode=<mode> -prefix=<prefix> <field>, <pattern>

Parameters

Extraction by separator

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | String | No | The overwrite mode to use when a destination field already exists in the input data. The default value is overwrite. For more information, see Field extraction check and overwrite modes. |
| prefix | String | No | The prefix for the names of the output fields that contain the extraction results. The default value is an empty string. |
| greedy | Bool | No | Enables greedy matching for field values. Disabled: stops matching the field value when a delim is encountered. Enabled: matches all content before the next key-value pair as the field value. |
| field | Field | Yes | The name of the source field to parse. The entry is not processed in either of the following cases: (1) the field does not exist in the data entry or its value is null; (2) no key-value pairs are matched in the data content. |
| delim | Char | Yes | The separator between different key-value pairs. It can be one to five valid ASCII characters, such as ^_^. The value cannot be a substring of kv-sep. |
| kv-sep | Char | Yes | The separator that connects the key and the value within a key-value pair. It can be one to five valid ASCII characters, such as #$#. The value cannot be a substring of delim. |
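
The effect of the greedy parameter is easiest to see on a value that contains the separator inside field values. The Python sketch below is a simplified model of separator-based extraction, not the SLS implementation. The `parse_kv` helper is hypothetical, and it assumes that in non-greedy mode a fragment without a kv-sep is discarded.

```python
def parse_kv(value, delim, kv_sep, prefix="", greedy=False):
    """Extract key-value pairs from a delimited string (illustrative only)."""
    fields = {}
    last_key = None
    for token in value.split(delim):
        key, sep, val = token.partition(kv_sep)
        if sep:
            # The token contains kv-sep, so it starts a new key-value pair.
            last_key = prefix + key
            fields[last_key] = val
        elif greedy and last_key is not None:
            # Greedy: content before the next pair belongs to the previous value.
            fields[last_key] += delim + token
        # Non-greedy: the fragment is dropped (assumption of this sketch).
    return fields

line = "src=127.0.0.1 msg=connection refused body=this is test"
print(parse_kv(line, " ", "="))               # msg is cut at the first space
print(parse_kv(line, " ", "=", greedy=True))  # msg keeps 'connection refused'

labels = "cluster#$#sls-etl|hostname#$#iZbp17raa25u0xi4wifopeZ"
print(parse_kv(labels, "|", "#$#", prefix="__labels__."))
```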

Extraction by regular expression

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| regexp | Bool | Yes | Enables the regular expression extraction mode. |
| mode | String | No | The overwrite mode to use when a destination field already exists in the input data. The default value is overwrite. For more information, see Field extraction check and overwrite modes. |
| prefix | String | No | The prefix for the names of the output fields that contain the extraction results. The default value is an empty string. |
| field | Field | Yes | The name of the source field to extract. The input data must contain this field. The field must be of the VARCHAR type and its value cannot be null. Otherwise, the extraction operation is not performed. |
| pattern | Regexp | Yes | A regular expression that contains two capturing groups. The first capturing group extracts the field name, and the second capturing group extracts the field value. The RE2 syntax is supported. |

Examples

  • Example 1: Use multi-character separators to extract labels from SLS metric data as data fields.

    • SPL statement

      * | parse-kv -prefix='__labels__.' __labels__, '|', '#$#'
    • Input data

      __name__: 'net_in'
      __value__: '231461.57374215033'
      __time_nano__: '1717378679274117026'
      __labels__: 'cluster#$#sls-etl|hostname#$#iZbp17raa25u0xi4wifopeZ|interface#$#veth02cc91d2|ip#$#192.168.22.238'
    • Output

      __name__: 'net_in'
      __value__: '231461.57374215033'
      __time_nano__: '1717378679274117026'
      __labels__: 'cluster#$#sls-etl|hostname#$#iZbp17raa25u0xi4wifopeZ|interface#$#veth02cc91d2|ip#$#192.168.22.238'
      __labels__.cluster: 'sls-etl'
      __labels__.hostname: 'iZbp17raa25u0xi4wifopeZ'
      __labels__.interface: 'veth02cc91d2'
      __labels__.ip: '192.168.22.238'
  • Example 2: Enable greedy matching mode to extract key-value pairs from access logs.

    • SPL statement

      * | parse-kv -greedy content, ' ', '='
    • Input data

      content: 'src=127.0.0.1 dst=192.168.0.0 bytes=125 msg=connection refused body=this is test time=2024-05-21T00:00:00'
    • Output

      content: 'src=127.0.0.1 dst=192.168.0.0 bytes=125 msg=connection refused body=this is test time=2024-05-21T00:00:00'
      src: '127.0.0.1'
      dst: '192.168.0.0'
      bytes: '125'
      msg: 'connection refused'
      body: 'this is test'
      time: '2024-05-21T00:00:00'
  • Example 3: Use the regular expression extraction mode to handle complex key-value pair delimiters and key-value separators.

    • SPL statement

      * | parse-kv -regexp content, '([^&?]+)(?:=|:)([^&?]+)'
    • Input data

      content: 'k1=v1&k2=v2?k3:v3'
      k1: 'xyz'
    • Output

      content: 'k1=v1&k2=v2?k3:v3'
      k1: 'v1'
      k2: 'v2'
      k3: 'v3'
  • Example 4: Set the field value output mode to preserve to retain the original values of existing fields.

    • SPL statement

      * | parse-kv -regexp -mode='preserve' content, '([^&?]+)(?:=|:)([^&?]+)'
    • Input data

      content: 'k1=v1&k2=v2?k3:v3'
      k1: 'xyz'
    • Output

      content: 'k1=v1&k2=v2?k3:v3'
      k1: 'xyz'
      k2: 'v2'
      k3: 'v3'
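
Examples 3 and 4 differ only in the mode parameter, which the following Python sketch makes explicit. It is an illustration rather than the SLS implementation: the `parse_kv_regexp` helper is hypothetical, it requires a pattern with exactly two capturing groups, and it relies on Python's `re` module, which accepts this pattern but is not identical to RE2.

```python
import re

def parse_kv_regexp(record, field, pattern, prefix="", mode="overwrite"):
    """Extract key-value pairs with a two-group regular expression."""
    value = record.get(field)
    if not isinstance(value, str):
        return dict(record)  # missing or null source field: nothing is extracted
    result = dict(record)
    # With two capturing groups, findall yields (key, value) tuples.
    for key, val in re.findall(pattern, value):
        name = prefix + key
        if mode == "preserve" and name in result:
            continue  # preserve: keep the value already present in the input
        result[name] = val
    return result

record = {"content": "k1=v1&k2=v2?k3:v3", "k1": "xyz"}
pattern = r"([^&?]+)(?:=|:)([^&?]+)"

print(parse_kv_regexp(record, "content", pattern))                   # k1 becomes 'v1'
print(parse_kv_regexp(record, "content", pattern, mode="preserve"))  # k1 stays 'xyz'
```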