This topic describes how to use Structured Process Language (SPL) in different scenarios.
SPL statement optimization
You can often write SPL statements in several ways to achieve the same data processing result. Concise and efficient SPL statements simplify maintenance and improve performance. The following table provides some suggestions; the Before and After columns show generic illustrative examples.

| Suggestion | Before optimization | After optimization |
| --- | --- | --- |
| Combine consecutive `where` clauses into one. | `* \| where a > 0 \| where b > 0` | `* \| where a > 0 and b > 0` |
| Combine consecutive `extend` clauses into one. | `* \| extend a = x + 1 \| extend b = x + 2` | `* \| extend a = x + 1, b = x + 2` |
| Replace `extend` + `project-away` with `project-rename`. | `* \| extend b = a \| project-away a` | `* \| project-rename b = a` |
| Modify field values in place instead of using `extend` to create new fields, unless necessary. | `* \| extend status_int = cast(status as BIGINT)` | `* \| extend status = cast(status as BIGINT)` |
Handle special fields
Time fields
During SPL execution, the time fields of a Simple Log Service log are always of the INTEGER or BIGINT type. These fields are the data timestamp field `__time__` and its nanosecond part `__time_ns_part__`.
To update the data timestamp, use the `extend` instruction and ensure that the new value is an INTEGER or BIGINT. Other instructions cannot modify time fields. Their behavior is as follows:
- `project`, `project-away`, and `project-rename`: These instructions retain time fields by default. You cannot rename or overwrite them.
- `parse-regexp` and `parse-json`: If the extraction result includes time fields, they are ignored.
Example

Extract time field values from an existing time string.

- SPL statement

  ```
  * | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms
    | extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S')
    | extend __time__=cast(to_unixtime(ts) as INTEGER)
    | extend __time_ns_part__=cast(ms as INTEGER) * 1000000
    | project-away ts, ms
  ```

- Input data

  ```
  time: '2023-11-11 01:23:45.678'
  ```

- Output result

  ```
  __time__: 1699637025
  __time_ns_part__: 678000000
  time: '2023-11-11 01:23:45.678'
  ```
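For intuition, the seconds/nanoseconds split that the statement above produces can be sketched in Python. The UTC+8 time zone is an assumption made here so that the numbers match the sample output; the time zone SPL actually applies depends on your configuration.

```python
from datetime import datetime, timedelta, timezone

# Split '2023-11-11 01:23:45.678' into epoch seconds (__time__) and a
# nanosecond part (__time_ns_part__), mirroring the SPL statement above.
raw = '2023-11-11 01:23:45.678'
ts, ms = raw.rsplit('.', 1)

# Assumption: the timestamp is in UTC+8 so the result matches the sample.
tz = timezone(timedelta(hours=8))
dt = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S').replace(tzinfo=tz)

time_sec = int(dt.timestamp())      # __time__: 1699637025
time_ns_part = int(ms) * 1_000_000  # __time_ns_part__: 678000000
```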
Field names with special characters
If a field name contains spaces or other special characters, enclose it in double quotation marks (") when you reference it. For example, if a field is named A B, which contains a space, you can reference it as "A B" in an SPL statement. The following example shows how to do this:
```
* | where "A B" like '%error%'
```
Case-insensitive field names
In SLS scan queries, field names referenced in SPL instructions are case-insensitive. For example, if a log contains a field named Method, you can reference it as method or METHOD in an SPL instruction.
This applies to the scan query feature of Simple Log Service. For more information, see Scan queries.
Example

Use a case-insensitive field name in a where clause.

- SPL statement

  ```
  * | where METHOD like 'Post%'
  ```

- Input data

  ```
  Method: 'PostLogstoreLogs'
  ```

- Output result

  ```
  Method: 'PostLogstoreLogs'
  ```
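The case-insensitive lookup can be sketched in Python. The helper below is hypothetical; the service's actual resolution logic is internal.

```python
def resolve_field(name, fields):
    # Case-insensitive field lookup, as in SLS scan queries (sketch).
    # If two field names differ only in case, which one wins here depends
    # on dict order, much like the random selection SPL performs.
    lowered = {key.lower(): value for key, value in fields.items()}
    return lowered.get(name.lower())

log = {'Method': 'PostLogstoreLogs'}
resolve_field('METHOD', log)  # 'PostLogstoreLogs'
```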
Handle field name conflicts
During log upload or SPL execution, case-sensitive processing can cause field name conflicts. For example, a raw log might contain both Method and method fields. SPL resolves these conflicts differently depending on the scenario.
To avoid these situations, you can standardize the field names in your raw logs.
Conflicts in input data
If a raw log contains fields with names that are the same when case is ignored, such as Status and status, SPL randomly selects one field for input and discards the other. For example:
- SPL statement

  ```
  * | extend status_cast = cast(status as bigint)
  ```

- Input data

  ```
  Status: '200'
  status: '404'
  ```

- Processing result

  - Possibility 1: The value of the Status field is retained.

    ```
    Status: '200'       -- The first field is retained, and the second field is discarded.
    status_cast: '200'
    ```

  - Possibility 2: The value of the status field is retained.

    ```
    status: '404'       -- The second field is retained, and the first field is discarded.
    status_cast: '404'
    ```
Conflicts in output results
Scenario 1: Raw data field conflicts
During SPL execution, fields with names that differ only in case may be generated. In this case, SPL randomly selects one of them for the output. For example, if a log field contains a JSON string, the parse-json instruction might create fields with conflicting names:
- SPL statement

  ```
  * | parse-json content
  ```

- Input data

  ```
  content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
  ```

- Output result

  - Possibility 1: The Method field is retained.

    ```
    content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
    Method: 'PostLogs'
    status: '200'
    ```

  - Possibility 2: The method field is retained.

    ```
    content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
    method: 'GetLogs'
    status: '200'
    ```
Scenario 2: Conflicts with newly generated data fields
To avoid ambiguity, SPL preserves the case of new field names that are explicitly generated by instructions. This applies to field names generated by the extend instruction and field names explicitly specified using as in parse-regexp and parse-csv instructions.
For example, if you use extend to create a new field Method, the resulting field name remains Method.
- SPL statement

  ```
  * | extend Method = 'Post'
  ```

- Input data

  ```
  Status: '200'
  ```

- Output result

  ```
  Status: '200'
  Method: 'Post'
  ```
Handle SLS reserved field conflicts
This applies to the real-time consumption and scan query features of Simple Log Service.
For a complete list of reserved fields, see Reserved fields. SPL reads data from the LogGroup structure as input. For more information about the LogGroup definition, see Data encoding. If raw data written to Simple Log Service is not encoded in the standard LogGroup format, some reserved fields may be in LogContent instead of their standard location. SPL handles these reserved fields as follows:
- For the `__source__`, `__topic__`, `__time__`, and `__time_ns_part__` fields, SPL reads their values from the standard LogGroup structure and ignores any fields with the same names in LogContent.
- For tag fields with the `__tag__:` prefix, SPL first attempts to read their values from the standard LogGroup structure. If a value is not found, SPL reads it from LogContent. For example, for the `__tag__:ip` field, SPL first attempts to read the field with the key `ip` from the LogTag list. If that field does not exist, SPL reads the log field with the key `__tag__:ip` from the custom log fields in LogContent.
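The lookup order for tag fields can be sketched in Python. The dict arguments are hypothetical simplifications of the LogGroup's LogTag list and a log's LogContent fields.

```python
def resolve_tag(tag_key, log_tags, log_contents):
    # Resolve a '__tag__:<key>' field: try the LogTag list first,
    # then fall back to the custom log fields in LogContent.
    bare_key = tag_key[len('__tag__:'):]   # '__tag__:ip' -> 'ip'
    if bare_key in log_tags:
        return log_tags[bare_key]
    return log_contents.get(tag_key)

resolve_tag('__tag__:ip', {'ip': '192.0.2.1'}, {})             # '192.0.2.1'
resolve_tag('__tag__:ip', {}, {'__tag__:ip': '198.51.100.2'})  # '198.51.100.2'
```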
The __line__ field for full-text search
This applies to the SLS scan query feature.
To filter raw logs in the console or when using the GetLogstoreLogs API operation, you can use the __line__ field.
Example
- Search for the keyword error in logs.

  ```
  * | where __line__ like '%error%'
  ```

- If a log contains a field named `__line__`, enclose the field name in backticks to reference it.

  ```
  * | where `__line__` = '20'
  ```
Value retention and overwrite policies
When an SPL instruction is executed, if an output field has the same name as an existing field in the input data, the policy for determining the field's value is as follows:
These retention and overwrite policies do not apply to the extend instruction: if a field name conflicts, extend always uses the new value.
Inconsistent data types between old and new values
The original value of the input field is retained.
Example

- Example 1: A renamed field from a project instruction has a conflicting name.

  - SPL statement

    ```
    * | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
      | project code=status                   -- The new value's type (BIGINT) differs from the old value's type (VARCHAR), so the old value is kept.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    ```

  - Output result

    ```
    code: 'Success'
    ```

- Example 2: An extracted field from a parse-json instruction has a conflicting name.

  - SPL statement

    ```
    * | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
      | parse-json content                    -- The old type of status is BIGINT and the new type is VARCHAR, so the old value is retained.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: 200
    body: 'this is test'
    ```
Consistent data types between old and new values
If the input value is null, the new value is used. Otherwise, the behavior is determined by the mode parameter in the instruction, as defined in the following table.
If the mode parameter is not specified in the instruction, its default value is overwrite.
| Mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and discards the new value. |
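The retention logic described above can be sketched in Python. The `(value, type)` tuples and the function itself are hypothetical simplifications; extend is excluded because it always takes the new value.

```python
def merge_field(old, new, mode='overwrite'):
    # old and new are (value, type) tuples; old is None when the input
    # field is null. Sketch of SPL's retention/overwrite policy.
    if old is None:
        return new       # null input value -> use the new value
    if old[1] != new[1]:
        return old       # inconsistent types -> keep the old value
    return new if mode == 'overwrite' else old

merge_field(('200', 'VARCHAR'), ('Success', 'VARCHAR'))                   # new value wins
merge_field(('200', 'VARCHAR'), ('Success', 'VARCHAR'), mode='preserve')  # old value kept
merge_field((200, 'BIGINT'), ('Success', 'VARCHAR'))                      # types differ -> old kept
```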
Example

- Example 1: A renamed field from a project instruction has a conflicting name and the same type. The default mode is overwrite.

  - SPL statement

    ```
    * | project code=status  -- The old and new types of code are both VARCHAR. The new value is used based on the overwrite mode.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    ```

  - Output result

    ```
    code: '200'
    ```

- Example 2: An extracted field from a parse-json instruction has a conflicting name and the same type. The default mode is overwrite.

  - SPL statement

    ```
    * | parse-json content  -- The old and new types of status are both VARCHAR. The new value is used based on the overwrite mode.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: 'Success'
    body: 'this is test'
    ```

- Example 3: An extracted field from a parse-json instruction has a conflicting name and the same type. The mode is set to preserve.

  - SPL statement

    ```
    * | parse-json -mode='preserve' content  -- The old and new types of status are both VARCHAR. The old value is retained based on the preserve mode.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: '200'
    body: 'this is test'
    ```
Data type conversion
Initial type
When SPL processes data, the initial data type of all input fields is VARCHAR, except for log time fields. If subsequent processing logic involves different data types, you must perform data type conversion.
Example
To filter access logs with a status code of 5xx, you must convert the status field to the BIGINT type before the comparison.
```
*  -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500  -- Convert the type of the status field to BIGINT, and then perform the comparison.
```
Type retention
During SPL data processing, after you use the extend instruction to convert the data type of a field, subsequent processing logic uses the converted data type.
Example
```
*  -- A Logstore is used as input data. Except for time fields, all fields are initially of the VARCHAR type.
| where __source__='127.0.0.1'          -- Filter on the __source__ field.
| extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project status, content
| where status>=500                     -- The type of the status field remains BIGINT, so it can be directly compared with the number 500.
```
Handle null values in SPL expressions
Generating null values
During SPL data processing, null values are generated in the following two scenarios:
- If a field used in an SPL expression does not exist in the input data, its value is treated as null during calculations.
- If an exception occurs during the calculation of an SPL expression, the result is null. For example, a cast type conversion fails or an array index is out of bounds.
Example

- If a field does not exist, its value is treated as null in calculations.

  - SPL statement

    ```
    * | extend withoutStatus=(status is null)
    ```

  - Input data

    ```
    # Entry 1
    status: '200'
    code: 'Success'

    # Entry 2
    code: 'Success'
    ```

  - Output result

    ```
    # Entry 1
    status: '200'
    code: 'Success'
    withoutStatus: false

    # Entry 2
    code: 'Success'
    withoutStatus: true
    ```

- If an exception occurs during calculation, the result is null.

  - SPL statement

    ```
    * | extend code=cast(code as BIGINT)            -- The code field cannot be converted to BIGINT.
      | extend values=json_parse(values)
      | extend values=cast(values as ARRAY(BIGINT))
      | extend last=values[10]                      -- The array index is out of bounds.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    values: '[1,2,3]'
    ```

  - Output result

    ```
    status: '200'
    code: null
    values: [1, 2, 3]
    last: null
    ```
Eliminating null values
To eliminate null values during calculations, you can use the COALESCE expression. This expression returns the first non-null value from a list of expressions. You can also set a default value to use if all expressions evaluate to null.
Example

Read the last element of an array. If the array is empty, the default value is 0.

- SPL statement

  ```
  * | extend values=json_parse(values)
    | extend values=cast(values as ARRAY(BIGINT))
    | extend last=COALESCE(values[3], values[2], values[1], 0)
  ```

- Input data

  ```
  # Entry 1
  values: '[1, 2, 3]'

  # Entry 2
  values: '[]'
  ```

- Output result

  ```
  # Entry 1
  values: [1, 2, 3]
  last: 3

  # Entry 2
  values: []
  last: 0
  ```
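The COALESCE fallback chain behaves like this Python sketch, where None stands in for null and the hypothetical helper `at` emulates 1-based array indexing with out-of-bounds reads yielding null:

```python
def at(arr, i):
    # 1-based indexing; an out-of-bounds read yields None (null), not an error.
    return arr[i - 1] if 1 <= i <= len(arr) else None

def coalesce(*args):
    # Return the first non-None argument, like the COALESCE expression.
    return next((a for a in args if a is not None), None)

values = [1, 2, 3]
coalesce(at(values, 3), at(values, 2), at(values, 1), 0)  # 3
coalesce(at([], 3), at([], 2), at([], 1), 0)              # 0 (all reads are null)
```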
Character escaping
Single quotation marks
Single quotation marks are used to enclose a string literal. If the string literal contains a single quotation mark, you must use an additional single quotation mark to escape it.
Example
- SPL statement

  ```
  * | extend user = 'Alice'
    | extend phone = 'Alice''s Phone'
  ```

- Output result

  ```
  user: Alice
  phone: Alice's Phone
  ```
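If you build SPL statements programmatically, the doubling rule can be applied with a small helper. This is a hypothetical utility, not part of any SDK.

```python
def spl_string_literal(s):
    # Quote s as an SPL single-quoted string literal, doubling any
    # embedded single quotation marks.
    return "'" + s.replace("'", "''") + "'"

spl_string_literal("Alice's Phone")  # "'Alice''s Phone'"
```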
Double quotation marks
Double quotation marks are used to enclose a field name. If the field name contains a double quotation mark, you must use an additional double quotation mark to escape it.
Example
-
SPL statement
* | extend user_name = 'Alice' | extend "user name" = 'Alice' | extend "user""name" = 'Alice' -
Output result
user_name: Alice user name: Alice user"name: Alice
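The same doubling rule for field names can be sketched with a hypothetical helper:

```python
def spl_field_ref(name):
    # Quote a field name for SPL, doubling embedded double quotation marks.
    return '"' + name.replace('"', '""') + '"'

spl_field_ref('user name')   # '"user name"'
spl_field_ref('user"name')   # '"user""name"'
```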
Other special characters
Example 1
In SPL, the backslash (\) is not an escape character and is therefore preserved as is.
- SPL statement

  ```
  * | extend a = 'foo\tbar'
    | extend b = 'foo\nbar'
  ```

- Output result

  ```
  a: foo\tbar
  b: foo\nbar
  ```
Example 2
If a string must include special characters, such as a tab character or a line feed, you can use the chr function to concatenate strings.
- SPL statement

  ```
  * | extend a = concat('foo', chr(9), 'bar')
    | extend b = concat('foo', chr(10), 'bar')
  ```

- Output result

  ```
  a: foo bar   -- foo and bar are separated by a tab character.
  b: foo
  bar          -- foo and bar are separated by a line feed.
  ```
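Python's built-in chr maps code points to characters in the same way, so the concatenation above can be checked locally:

```python
# chr(9) is a tab character, chr(10) is a line feed.
a = 'foo' + chr(9) + 'bar'
b = 'foo' + chr(10) + 'bar'
```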
Error handling
Syntax error
Syntax errors occur when an SPL statement is malformed, such as having an incorrect instruction name, a keyword reference error, or a type specification error. When a syntax error occurs, SPL does not process any data. You must correct the statement based on the error message.
Data error
A data error occurs when a function call or type conversion fails during SPL execution. Data errors do not stop the execution process: the SPL statement still returns results, but the value of the field where the error occurred is set to null. Because data errors can occur on any row, SPL randomly samples and returns only some of the error messages. You can ignore these errors or modify the SPL statement based on the data content.
Execution timeout
An SPL statement contains various instructions, and their running time varies depending on the data scenario. If the total running time of an SPL statement exceeds the default timeout period, the execution stops and a timeout error is returned. In this case, the execution result is empty. The default timeout period can vary for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, for example, by simplifying complex regular expressions or reducing the number of pipelines.
Memory limit exceeded
An SPL statement contains various instructions, and their memory consumption varies depending on the data scenario. SPL execution is limited to a specific memory quota. If this quota is exceeded, the execution fails and a memory limit exceeded error is returned. In this case, the execution result is empty. The default memory quota can vary for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, reduce the number of pipelines, or check if the raw data size is too large.