This topic describes the Simple Log Service Processing Language (SPL) instructions.
Parameter types
The table below describes the data types for parameters used in SPL instructions.
Parameter type | Description |
Bool | The parameter specifies a Boolean value. This type of parameter acts as a switch in SPL instructions. |
Char | The parameter specifies an ASCII character. You must use single quotation marks ('') to enclose the character. |
Integer | The parameter specifies an integer value. |
String | The parameter specifies a string. You must use single quotation marks ('') to enclose the string. |
RegExp | The parameter specifies a regular expression. The RE2 syntax is supported. You must use single quotation marks ('') to enclose the regular expression. For more information, see Syntax. |
JSONPath | The parameter specifies a JSON path. You must use single quotation marks ('') to enclose the JSON path. For more information, see JsonPath. |
Field | The parameter specifies a field name. If the field name contains characters other than letters, digits, and underscores, you must use double quotation marks ("") to enclose the field name. Note: For more information about the case sensitivity of field names, see SPL function definitions in different scenarios. |
FieldPattern | The parameter specifies a field name or a combination of a field name and a wildcard character. An asterisk (*) can be used as a wildcard character, which matches zero or more characters. You must use double quotation marks ("") to enclose the field name or the combination. Note: For more information about the case sensitivity of field names, see SPL function definitions in different scenarios. |
SPLExp | The parameter specifies an SPL expression. |
SQLExp | The parameter specifies an SQL expression. |
SPL instruction list
Instruction category | Instruction name | Description |
Control instructions | .let | Defines a named dataset. For more information about SPL datasets, see SPL datasets. |
Field processing instructions | project | Retains the fields that match the specified pattern and renames the specified fields. During instruction execution, all retain-related expressions are executed before rename-related expressions. |
 | project-away | Removes the fields that match the specified pattern and retains all other fields as they are. |
 | project-rename | Renames the specified fields and retains all other fields as they are. |
 | expand-values | Expands a first-level JSON object for the specified field and returns multiple result entries. |
SQL calculation instructions on structured data | extend | Creates fields based on the result of SQL expression-based data calculation. For more information about the supported SQL functions, see SQL functions supported by SPL. |
 | where | Filters data based on the result of SQL expression-based data calculation. Data that matches the specified SQL expression is retained. For more information about the SQL functions supported by the where instruction, see SQL functions supported by SPL. |
Semi-structured data extraction instructions | parse-regexp | Extracts the information that matches groups in the specified regular expression from the specified field. |
 | parse-csv | Extracts information in the CSV format from the specified field. |
 | parse-json | Extracts the first-layer JSON information from the specified field. |
 | parse-kv | Extracts the key-value pair information from the specified field. |
Control instructions
.let
Defines a named dataset as the input for subsequent SPL expressions. For detailed information on SPL datasets, see SPL datasets.
Syntax
.let <dataset>=<spl-expr>
Parameter description
Parameter | Type | Required | Description |
dataset | String | Yes | The name of the dataset. The name can contain letters, digits, and underscores, and must start with a letter. The name is case-sensitive. |
spl-expr | SPLExp | Yes | The SPL expression that is used to generate the dataset. |
Examples
Example 1: Filter and classify access logs by status codes before exporting them.
SPL statement
-- Define the processing result of SPL as a named dataset src, which is used as the input of subsequent SPL expressions
.let src = * | where status=cast(status as BIGINT);
-- Use the named dataset src as the input, filter the data whose status field is 5xx, and define the dataset err. The dataset is not exported
.let err = $src | where status >= 500 | extend msg='ERR';
-- Use the named dataset src as the input, filter the data whose status field is 2xx, and define the dataset ok. The dataset is not exported
.let ok = $src | where status >= 200 and status < 300 | extend msg='OK';
-- Export the named datasets err and ok
$err;
$ok;
Input data
# Entry 1
status: '200'
body: 'this is a test'
# Entry 2
status: '500'
body: 'internal error'
# Entry 3
status: '404'
body: 'not found'
Output data
# Entry 1: The dataset is err
status: '500'
body: 'internal error'
msg: 'ERR'
# Entry 2: The dataset is ok
status: '200'
body: 'this is a test'
msg: 'OK'
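For intuition only, the branching above can be sketched in Python (this is illustrative, not SPL and not the service implementation; the variable names are arbitrary): filter the input into two result sets and export both.

```python
# Illustrative Python sketch of the .let branching above (not SPL).
entries = [
    {"status": "200", "body": "this is a test"},
    {"status": "500", "body": "internal error"},
    {"status": "404", "body": "not found"},
]

# .let src = * | where status=cast(status as BIGINT)
src = [{**e, "status": int(e["status"])} for e in entries]

# .let err = $src | where status >= 500 | extend msg='ERR'
err = [{**e, "msg": "ERR"} for e in src if e["status"] >= 500]

# .let ok = $src | where status >= 200 and status < 300 | extend msg='OK'
ok = [{**e, "msg": "OK"} for e in src if 200 <= e["status"] < 300]

# $err; $ok -- export both named datasets
result = err + ok
```

Note that the 404 entry belongs to neither dataset and is dropped, matching the SPL output above.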
Field processing instructions
project
Retains fields that match a specified pattern and renames specified fields. All expressions related to retaining fields are executed before those related to renaming during the execution of the instruction.
By default, the fields __time__ and __time_ns_part__ are retained and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project -wildcard <field-pattern>, <output>=<field>, ...
Parameter description
Parameter | Type | Required | Description |
wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
field-pattern | FieldPattern | Yes | The name of the field to retain, or a combination of a field and a wildcard character. All matched fields are processed. |
output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. Important: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
field | Field | Yes | The original name of the field to rename. |
Examples
Example 1: Retain a field.
* | project level, err_msg
Example 2: Rename a field.
* | project log_level=level, err_msg
Example 3: Retain the field that exactly matches __tag__:*.
* | project "__tag__:*"
project-away
Removes fields that match a specified pattern, retaining all other fields unchanged.
By default, the fields __time__ and __time_ns_part__ are retained. For more information, see Time fields.
Syntax
| project-away -wildcard <field-pattern>, ...
Parameter description
Parameter | Type | Required | Description |
wildcard | Bool | No | Specifies whether to enable the wildcard match mode. By default, the exact match mode is used. If you want to enable the wildcard match mode, you must configure this parameter. |
field-pattern | FieldPattern | Yes | The name of the field to remove, or a combination of a field and a wildcard character. All matched fields are processed. |
project-rename
Renames specified fields while retaining all other fields unchanged.
By default, the fields __time__ and __time_ns_part__ are retained and cannot be renamed or overwritten. For more information, see Time fields.
Syntax
| project-rename <output>=<field>, ...
Parameter description
Parameter | Type | Required | Description |
output | Field | Yes | The new name of the field to rename. You cannot rename multiple fields to the same name. Important: If the new field name is the same as an existing field name in the input data, see Retention and overwrite of old and new values. |
field | Field | Yes | The original name of the field to rename. |
Example
Rename specified fields.
* | project-rename log_level=level, log_err_msg=err_msg
expand-values
Expands a first-level JSON object in a specified field, resulting in multiple output entries.
The output fields are of VARCHAR data type. If an output field's name matches an existing field name in the input data, refer to Retention and overwrite of old and new values.
You cannot manipulate the fields __time__ and __time_ns_part__. For additional details, see Time fields. This instruction is supported in scenarios such as data transformation of new versions. For information on SPL functions in different scenarios, see SPL function definitions in different scenarios.
Syntax
| expand-values -path=<path> -limit=<limit> -keep <field> as <output>
Parameter description
Parameter | Type | Required | Description |
path | JSONPath | No | The JSON path in the specified field. The JSON path is used to identify the information that you want to expand. This parameter is empty by default, which specifies that the complete data of the specified field is expanded. |
limit | Integer | No | The maximum number of entries that can be obtained after expanding a JSON object for the specified field. The value is an integer from 1 to 10. Default value: 10. |
keep | Bool | No | Specifies whether to retain the original field after the expand operation is performed. By default, the original field is not retained. If you want to retain the original field, you must configure this parameter. |
field | Field | Yes | The original name of the field to expand. The data type of the field must be VARCHAR. |
output | Field | No | The name of the new field that is obtained after the expand operation is performed. If you do not configure this parameter, the output data is exported to the original field. A first-level JSON object is expanded for a field based on the following logic: JSON array: expands the array by element. JSON dictionary: expands the dictionary by key-value pair. Other JSON types: returns the original value. Invalid JSON: returns null. |
Examples
Example 1: Expand an array to return multiple result entries.
SPL statement
* | expand-values y
Input data
x: 'abc'
y: '[0,1,2]'
Output data, including three entries
# Entry 1
x: 'abc'
y: '0'
# Entry 2
x: 'abc'
y: '1'
# Entry 3
x: 'abc'
y: '2'
Example 2: Expand a dictionary to return multiple result entries.
SPL statement
* | expand-values y
Input data
x: 'abc'
y: '{"a": 1, "b": 2}'
Output data, including two entries
# Entry 1
x: 'abc'
y: '{"a": 1}'
# Entry 2
x: 'abc'
y: '{"b": 2}'
Example 3: Expand content that matches a specified JSONPath expression and export to a new field.
SPL statement
* | expand-values -keep content -path='$.body' as body
Input data
content: '{"body": [0, {"a": 1, "b": 2}]}'
Output data, including two entries
# Entry 1
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '0'
# Entry 2
content: '{"body": [0, {"a": 1, "b": 2}]}'
body: '{"a": 1, "b": 2}'
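The expansion logic described above can be sketched in Python for intuition (illustrative only, not the service implementation; the function name expand_values is hypothetical):

```python
import json

def expand_values(value, limit=10):
    """Expand a first-level JSON value into up to `limit` string entries."""
    try:
        parsed = json.loads(value)
    except ValueError:
        return [None]                 # invalid JSON: returns null
    if isinstance(parsed, list):      # JSON array: expand by element
        out = [json.dumps(e) for e in parsed]
    elif isinstance(parsed, dict):    # JSON dictionary: expand by key-value pair
        out = [json.dumps({k: v}) for k, v in parsed.items()]
    else:                             # other JSON types: return the original value
        out = [value]
    return out[:limit]
```

For example, expand_values('[0,1,2]') yields the three entries of Example 1, and expand_values('{"a": 1, "b": 2}') yields the two single-pair dictionaries of Example 2.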
SQL calculation instructions on structured data
extend
Generates fields based on SQL expression-based data calculations. For a list of supported SQL functions, see SQL functions supported by SPL.
Syntax
| extend <output>=<sql-expr>, ...
Parameter description
Parameter | Type | Required | Description |
output | Field | Yes | The name of the field to create. You cannot use the same field name to store the results of multiple expressions. Important: If the new field name is the same as an existing field name in the input data, the new field overwrites the existing field based on the data type and value. |
sql-expr | SQLExp | Yes | The data processing expression. Important: For more information about null value processing, see Null value processing in SPL expressions. |
Examples
Example 1: Apply a computation expression.
* | extend Duration = EndTime - StartTime
Example 2: Utilize a regular expression.
* | extend server_protocol_version=regexp_extract(server_protocol, '\d+')
Example 3: Extract JSONPath content and convert a field's data type.
SPL statement
* | extend a=json_extract(content, '$.body.a'), b=json_extract(content, '$.body.b') | extend b=cast(b as BIGINT)
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output data
content: '{"body": {"a": 1, "b": 2}}'
a: '1'
b: 2
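For intuition only, Example 3 can be mirrored in Python (illustrative, not the service implementation): json_extract returns the matched content as JSON text, and the cast converts the text to a numeric type.

```python
import json

content = '{"body": {"a": 1, "b": 2}}'
doc = json.loads(content)

# extend a=json_extract(content, '$.body.a') -- result stays in JSON text form
a = json.dumps(doc["body"]["a"])

# extend b=json_extract(content, '$.body.b') | extend b=cast(b as BIGINT)
b = int(doc["body"]["b"])
```

This is why the output above shows a as the string '1' but b as the number 2.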
where
Filters data based on SQL expression-based calculations. Data matching the specified SQL expression is retained. For a list of SQL functions supported by the where instruction, see SQL functions supported by SPL.
Syntax
| where <sql-expr>
Parameter description
Parameter | Type | Required | Description |
sql-expr | SQLExp | Yes | The SQL expression. Data that matches this expression is retained. Important: For more information about null value processing in SQL expressions, see Null value processing in SPL expressions. |
Examples
Example 1: Filter data based on field content.
* | where userId='123'
Example 2: Filter data using a regular expression that matches based on a field name.
* | where regexp_like(server_protocol, '\d+')
Example 3: Convert a field's data type to match all server error data.
* | where cast(status as BIGINT) >= 500
Semi-structured data extraction instructions
parse-regexp
Extracts information matching groups in a specified regular expression from a field.
The output fields are of VARCHAR data type. If an output field's name matches an existing field name in the input data, refer to Retention and overwrite of old and new values.
Fields __time__ and __time_ns_part__ cannot be operated on. For more information, see Time fields.
Syntax
| parse-regexp <field>, <pattern> as <output>, ...
Parameter description
Parameter | Type | Required | Description |
field | Field | Yes | The original name of the field from which you want to extract information. The input data must include this field, and its data type must be VARCHAR. |
pattern | RegExp | Yes | The regular expression. The RE2 syntax is supported. |
output | Field | No | The name of the output field that stores the extraction result. |
Examples
Example 1: Use exploratory match mode.
SPL statement
* | parse-regexp content, '(\S+)' as ip -- Generate the ip: 10.0.0.0 field.
| parse-regexp content, '\S+\s+(\w+)' as method -- Generate the method: GET field.
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output data
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
Example 2: Use full pattern match mode with unnamed capturing groups in the regular expression.
SPL statement
* | parse-regexp content, '(\S+)\s+(\w+)' as ip, method
Input data
content: '10.0.0.0 GET /index.html 15824 0.043'
Output data
content: '10.0.0.0 GET /index.html 15824 0.043'
ip: '10.0.0.0'
method: 'GET'
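The full pattern match of Example 2 behaves like the following Python snippet (illustrative only; Python's re module stands in for RE2, which shares this basic syntax). Each unnamed capturing group maps to one output field, in order.

```python
import re

content = '10.0.0.0 GET /index.html 15824 0.043'

# Two unnamed capturing groups -> two output fields, in order: ip, method.
match = re.match(r'(\S+)\s+(\w+)', content)
ip, method = match.group(1), match.group(2)
```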
parse-csv
Extracts CSV-formatted information from a specified field.
The output fields are of VARCHAR data type. If an output field's name matches an existing field name in the input data, refer to Retention and overwrite of old and new values.
Fields __time__ and __time_ns_part__ cannot be operated on. For more information, see Time fields.
Syntax
| parse-csv -delim=<delim> -quote=<quote> -strict <field> as <output>, ...
Parameter description
Parameter | Type | Required | Description |
delim | String | No | The delimiter of the input data. You can specify one to three valid ASCII characters. You can use escape characters to indicate special characters. For example, \t indicates the tab character, \11 indicates the ASCII character whose serial number corresponds to the octal number 11, and \x09 indicates the ASCII character whose serial number corresponds to the hexadecimal number 09. You can also use a combination of multiple characters as the delimiter. Default value: comma (,). |
quote | Char | No | The quote of the input data. You can specify a single valid ASCII character. If the input data contains delimiters, you must specify a quote. For example, you can specify double quotation marks (""), single quotation marks (''), or an unprintable character (0x01). By default, quotes are not used. Important: This parameter takes effect only if you set the delim parameter to a single character. You must specify different values for the quote and delim parameters. |
strict | Bool | No | Specifies whether to enable strict pairing when the number of values in the input data differs from the number of fields specified in output. Default value: False. If you want to enable strict pairing, you must configure this parameter. |
field | Field | Yes | The name of the field to parse. The input data must include this field, and its data type must be VARCHAR. |
output | Field | Yes | The name of the field that you want to use to store the parsing result of the input data. |
Examples
Example 1: Match data in simple mode.
SPL statement
* | parse-csv content as x, y, z
Input data
content: 'a,b,c'
Output data
content: 'a,b,c'
x: 'a'
y: 'b'
z: 'c'
Example 2: Use double quotes as the quote character to match data containing special characters.
SPL statement
* | parse-csv -quote='"' content as ip, time, host
Input data
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
Output data
content: '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
ip: '192.168.0.100'
time: '10/Jun/2019:11:32:16,127 +0800'
host: 'example.aliyundoc.com'
Example 3: Use a combination of multiple characters as separators.
SPL statement
* | parse-csv -delim='||' content as time, ip, req
Input data
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
Output data
content: '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
time: '05/May/2022:13:30:28'
ip: '127.0.0.1'
req: 'POST /put?a=1&b=2'
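For intuition, Examples 2 and 3 map onto Python's csv module and str.split (illustrative only, not the service implementation; Python's csv supports only single-character delimiters, so the multi-character case falls back to a plain split):

```python
import csv

# Example 2: a quoted field that contains the delimiter.
line = '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",example.aliyundoc.com'
ip, time, host = next(csv.reader([line], delimiter=',', quotechar='"'))

# Example 3: a multi-character delimiter ('||').
raw = '05/May/2022:13:30:28||127.0.0.1||POST /put?a=1&b=2'
time2, ip2, req = raw.split('||')
```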
parse-json
Extracts first-layer JSON information from a specified field.
The output fields are of VARCHAR data type. If an output field's name matches an existing field name in the input data, refer to Retention and overwrite of old and new values.
Fields __time__ and __time_ns_part__ cannot be operated on. For more information, see Time fields.
Syntax
| parse-json -mode=<mode> -path=<path> -prefix=<prefix> <field>
Parameter description
Parameter | Type | Required | Description |
mode | String | No | The mode that is used to extract information when the name of the output field is the same as an existing field name in the input data. The default value is overwrite. |
path | JSONPath | No | The JSON path in the specified field. The JSON path is used to locate the information that you want to extract. The default value is an empty string. If you use the default value, the complete data of the specified field is extracted. |
prefix | String | No | The prefix of the fields that are generated by expanding a JSON structure. The default value is an empty string. |
field | Field | Yes | The name of the field that you want to parse. Make sure that this field is included in the input data and that the field value is a non-null valid JSON value. Otherwise, the extract operation is not performed. |
Examples
Example 1: Extract all keys and values from the y field.
SPL statement
* | parse-json y
Input data
x: '0'
y: '{"a": 1, "b": 2}'
Output data
x: '0'
y: '{"a": 1, "b": 2}'
a: '1'
b: '2'
Example 2: Extract the value of the body key from the content field as different fields.
SPL statement
* | parse-json -path='$.body' content
Input data
content: '{"body": {"a": 1, "b": 2}}'
Output data
content: '{"body": {"a": 1, "b": 2}}'
a: '1'
b: '2'
Example 3: Extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-json -mode='preserve' y
Input data
a: 'xyz'
x: '0'
y: '{"a": 1, "b": 2}'
Output data
x: '0'
y: '{"a": 1, "b": 2}'
a: 'xyz'
b: '2'
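The overwrite and preserve modes above can be sketched in Python for intuition (illustrative only; the function name parse_json is hypothetical, not the service implementation):

```python
import json

def parse_json(entry, field, mode='overwrite'):
    """Expand first-level keys of a JSON field into the entry as strings."""
    extracted = {k: json.dumps(v) if not isinstance(v, str) else v
                 for k, v in json.loads(entry[field]).items()}
    if mode == 'preserve':
        # preserve: fields already present keep their original values
        return {**extracted, **entry}
    # overwrite (default): extracted values replace existing fields
    return {**entry, **extracted}
```

With the Example 3 input, preserve mode keeps a as 'xyz', while the default overwrite mode would set a to '1'.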
parse-kv
Extracts key-value pair information from a specified field.
The output fields are of VARCHAR data type. If an output field's name matches an existing field name in the input data, refer to Retention and overwrite of old and new values.
Fields __time__ and __time_ns_part__ cannot be operated on. For more information, see Time fields.
Syntax
| parse-kv -mode=<mode> -prefix=<prefix> -regexp <field>, <pattern>
Parameter
Parameter | Type | Required | Description |
mode | String | No | If the output field name is the same as an existing field name in the input data, you can select an overwrite mode based on your business requirements. The default value is overwrite. For more information, see Field extraction check and overwrite mode. |
prefix | String | No | The prefix of the output fields. The default value is an empty string. |
regexp | Bool | Yes | Specifies whether to enable the regular extraction mode. |
field | Field | Yes | The original name of the field from which you want to extract information. The input data must include this field, and its data type must be VARCHAR. |
pattern | RegExp | Yes | The regular expression that contains two capturing groups. One capturing group extracts the field name, and the other capturing group extracts the field value. The RE2 syntax is supported. |
Examples
Example 1: Extract key-value pairs in regular extraction mode when delimiters between pairs and keys and values vary.
SPL statement
* | parse-kv -regexp content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'v1'
k2: 'v2'
k3: 'v3'
Example 2: Extract information in preserve mode, retaining the original value for existing fields.
SPL statement
* | parse-kv -regexp -mode='preserve' content, '([^&?]+)(?:=|:)([^&?]+)'
Input data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
Output data
content: 'k1=v1&k2=v2?k3:v3'
k1: 'xyz'
k2: 'v2'
k3: 'v3'
Example 3: Extract information from complex unstructured data in regular extraction mode, where values are digits or strings enclosed in double quotes.
SPL statement
* | parse-kv -regexp content, '([\w-]+)="?([^"]*)"?'
Input data
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200'
Output data
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+" audit-ID="" srcIP="192.168.123.45:40092" contentType="text/plain; charset=utf-8" resp=200'
verb: 'GET'
URI: '/healthz'
latency: '45.911µs'
userAgent: 'kube-probe/1.30+'
audit-ID: ''
srcIP: '192.168.123.45:40092'
contentType: 'text/plain; charset=utf-8'
resp: '200'
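The regular extraction mode of Example 1 behaves like the following Python snippet (illustrative only; Python's re stands in for RE2, which shares this basic syntax): each match of the two-group pattern contributes one key-value pair.

```python
import re

content = 'k1=v1&k2=v2?k3:v3'

# Group 1 captures the key and group 2 captures the value; '=' and ':' both
# act as key-value delimiters, while '&' and '?' separate the pairs.
pairs = dict(re.findall(r'([^&?]+)(?:=|:)([^&?]+)', content))
```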