This topic describes how to use Structured Process Language (SPL) in different scenarios.
SPL statement optimization
You can often write SPL statements in several ways to achieve the same data processing result. Concise and efficient SPL statements simplify maintenance and improve performance. The following table provides some suggestions; the Before and After columns show generic illustrative examples.

| Suggestion | Before optimization | After optimization |
| --- | --- | --- |
| Combine consecutive `where` clauses into one. | `* \| where a > 0 \| where b > 0` | `* \| where a > 0 and b > 0` |
| Combine consecutive `extend` clauses into one. | `* \| extend a = x + 1 \| extend b = x + 2` | `* \| extend a = x + 1, b = x + 2` |
| Replace `extend` + `project-away` with `project-rename`. | `* \| extend b = a \| project-away a` | `* \| project-rename b = a` |
| Modify field values in place instead of using `extend` to create new fields, unless necessary. | `* \| extend status_int = cast(status as BIGINT)` | `* \| extend status = cast(status as BIGINT)` |
Handle special fields
Time fields
During SPL execution, the time fields of a Simple Log Service log are always of the INTEGER or BIGINT type. These fields are the data timestamp field `__time__` and its nanosecond part `__time_ns_part__`.
To update the data timestamp, use the `extend` instruction and ensure that the new value is an INTEGER or BIGINT. Other instructions cannot modify time fields. Their behavior is as follows:
- `project`, `project-away`, and `project-rename`: These instructions retain time fields by default. You cannot rename or overwrite them.
- `parse-regexp` and `parse-json`: If the extraction result includes time fields, they are ignored.
Example

Extract time field values from an existing time string.

- SPL statement

  ```
  * | parse-regexp time, '([\d\-\s:]+)\.(\d+)' as ts, ms
    | extend ts=date_parse(ts, '%Y-%m-%d %H:%i:%S')
    | extend __time__=cast(to_unixtime(ts) as INTEGER)
    | extend __time_ns_part__=cast(ms as INTEGER) * 1000000
    | project-away ts, ms
  ```

- Input data

  ```
  time: '2023-11-11 01:23:45.678'
  ```

- Output result

  ```
  __time__: 1699637025
  __time_ns_part__: 678000000
  time: '2023-11-11 01:23:45.678'
  ```
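For intuition, the seconds/nanoseconds split that the statement above produces can be sketched in Python. The UTC+8 time zone is an assumption made here so that the numbers match the sample output; the time zone SPL actually applies depends on your configuration.

```python
from datetime import datetime, timedelta, timezone

# Split '2023-11-11 01:23:45.678' into epoch seconds (__time__) and a
# nanosecond part (__time_ns_part__), mirroring the SPL statement above.
raw = '2023-11-11 01:23:45.678'
ts, ms = raw.rsplit('.', 1)

# Assumption: the timestamp is in UTC+8 so the result matches the sample.
tz = timezone(timedelta(hours=8))
dt = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S').replace(tzinfo=tz)

time_sec = int(dt.timestamp())      # __time__: 1699637025
time_ns_part = int(ms) * 1_000_000  # __time_ns_part__: 678000000
```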
Field names with special characters
If a field name contains spaces or other special characters, enclose it in double quotation marks (") when you reference it. For example, if a field is named A B, which contains a space, you can reference it as "A B" in an SPL statement. The following example shows how to do this:
```
* | where "A B" like '%error%'
```
Case-insensitive field names
In SLS scan queries, field names referenced in SPL instructions are case-insensitive. For example, if a log contains a field named Method, you can reference it as method or METHOD in an SPL instruction.
This applies to the scan query feature of Simple Log Service. For more information, see Scan queries.
Example

Use a case-insensitive field name in a where clause.

- SPL statement

  ```
  * | where METHOD like 'Post%'
  ```

- Input data

  ```
  Method: 'PostLogstoreLogs'
  ```

- Output result

  ```
  Method: 'PostLogstoreLogs'
  ```
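The case-insensitive lookup can be sketched in Python. The helper below is hypothetical; the service's actual resolution logic is internal.

```python
def resolve_field(name, fields):
    # Case-insensitive field lookup, as in SLS scan queries (sketch).
    # If two field names differ only in case, which one wins here depends
    # on dict order, much like the random selection SPL performs.
    lowered = {key.lower(): value for key, value in fields.items()}
    return lowered.get(name.lower())

log = {'Method': 'PostLogstoreLogs'}
resolve_field('METHOD', log)  # 'PostLogstoreLogs'
```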
Handle field name conflicts
During log upload or SPL execution, case-sensitive processing can cause field name conflicts. For example, a raw log might contain both Method and method fields. SPL resolves these conflicts differently depending on the scenario.
To avoid these situations, you can standardize the field names in your raw logs.
Conflicts in input data
If a raw log contains fields with names that are the same when case is ignored, such as Status and status, SPL randomly selects one field for input and discards the other. For example:
- SPL statement

  ```
  * | extend status_cast = cast(status as bigint)
  ```

- Input data

  ```
  Status: '200'
  status: '404'
  ```

- Processing result

  - Possibility 1: The value of the Status field is retained.

    ```
    Status: '200'       -- The first field is retained, and the second field is discarded.
    status_cast: '200'
    ```

  - Possibility 2: The value of the status field is retained.

    ```
    status: '404'       -- The second field is retained, and the first field is discarded.
    status_cast: '404'
    ```
Conflicts in output results
Scenario 1: Raw data field conflicts
During SPL execution, fields with names that differ only in case may be generated. In this case, SPL randomly selects one of them for the output. For example, if a log field contains a JSON string, the parse-json instruction might create fields with conflicting names:
- SPL statement

  ```
  * | parse-json content
  ```

- Input data

  ```
  content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
  ```

- Output result

  - Possibility 1: The Method field is retained.

    ```
    content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
    Method: 'PostLogs'
    status: '200'
    ```

  - Possibility 2: The method field is retained.

    ```
    content: '{"Method": "PostLogs", "method": "GetLogs", "status": "200"}'
    method: 'GetLogs'
    status: '200'
    ```
Scenario 2: Conflicts with newly generated data fields
To avoid ambiguity, SPL preserves the case of new field names that are explicitly generated by instructions. This applies to field names generated by the extend instruction and field names explicitly specified using as in parse-regexp and parse-csv instructions.
For example, if you use extend to create a new field Method, the resulting field name remains Method.
- SPL statement

  ```
  * | extend Method = 'Post'
  ```

- Input data

  ```
  Status: '200'
  ```

- Output result

  ```
  Status: '200'
  Method: 'Post'
  ```
Handle SLS reserved field conflicts
This applies to the real-time consumption and scan query features of Simple Log Service.
For a complete list of reserved fields, see Reserved fields. SPL reads data from the LogGroup structure as input. For more information about the LogGroup definition, see Data encoding. If raw data written to Simple Log Service is not encoded in the standard LogGroup format, some reserved fields may be in LogContent instead of their standard location. SPL handles these reserved fields as follows:
- For the `__source__`, `__topic__`, `__time__`, and `__time_ns_part__` fields, SPL reads their values from the standard LogGroup structure and ignores any fields with the same names in LogContent.
- For tag fields with the `__tag__:` prefix, SPL first attempts to read their values from the standard LogGroup structure. If a value is not found, SPL reads it from LogContent. For example, for the `__tag__:ip` field, SPL first attempts to read the field with the key `ip` from the LogTag list. If that field does not exist, SPL reads the log field with the key `__tag__:ip` from the custom log fields in LogContent.
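The lookup order for tag fields can be sketched in Python. The dict arguments are hypothetical simplifications of the LogGroup's LogTag list and a log's LogContent fields.

```python
def resolve_tag(tag_key, log_tags, log_contents):
    # Resolve a '__tag__:<key>' field: try the LogTag list first,
    # then fall back to the custom log fields in LogContent.
    bare_key = tag_key[len('__tag__:'):]   # '__tag__:ip' -> 'ip'
    if bare_key in log_tags:
        return log_tags[bare_key]
    return log_contents.get(tag_key)

resolve_tag('__tag__:ip', {'ip': '192.0.2.1'}, {})             # '192.0.2.1'
resolve_tag('__tag__:ip', {}, {'__tag__:ip': '198.51.100.2'})  # '198.51.100.2'
```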
The __line__ field for full-text search
This applies to the SLS scan query feature.
To filter raw logs in the console or when using the GetLogstoreLogs API operation, you can use the __line__ field.
Example
- Search for the keyword error in logs.

  ```
  * | where __line__ like '%error%'
  ```

- If a log contains a field named `__line__`, enclose the field name in backticks to reference it.

  ```
  * | where `__line__` = '20'
  ```
Value retention and overwrite policies
When an SPL instruction is executed, if an output field has the same name as an existing field in the input data, the policy for determining the field's value is as follows:
These retention and overwrite policies do not apply to the extend instruction: if a field name conflicts, extend always uses the new value.
Inconsistent data types between old and new values
The original value of the input field is retained.
Example

- Example 1: A renamed field from a project instruction has a conflicting name.

  - SPL statement

    ```
    * | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
      | project code=status                   -- The new value's type (BIGINT) differs from the old value's type (VARCHAR), so the old value is kept.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    ```

  - Output result

    ```
    code: 'Success'
    ```

- Example 2: An extracted field from a parse-json instruction has a conflicting name.

  - SPL statement

    ```
    * | extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
      | parse-json content                    -- The old type of status is BIGINT and the new type is VARCHAR, so the old value is retained.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: 200
    body: 'this is test'
    ```
Consistent data types between old and new values
If the input value is null, the new value is used. Otherwise, the behavior is determined by the mode parameter in the instruction, as defined in the following table.
If the mode parameter is not specified in the instruction, its default value is overwrite.
| Mode | Description |
| --- | --- |
| overwrite | Overwrites the old value with the new value. |
| preserve | Retains the old value and discards the new value. |
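The retention logic described above can be sketched in Python. The `(value, type)` tuples and the function itself are hypothetical simplifications; extend is excluded because it always takes the new value.

```python
def merge_field(old, new, mode='overwrite'):
    # old and new are (value, type) tuples; old is None when the input
    # field is null. Sketch of SPL's retention/overwrite policy.
    if old is None:
        return new       # null input value -> use the new value
    if old[1] != new[1]:
        return old       # inconsistent types -> keep the old value
    return new if mode == 'overwrite' else old

merge_field(('200', 'VARCHAR'), ('Success', 'VARCHAR'))                   # new value wins
merge_field(('200', 'VARCHAR'), ('Success', 'VARCHAR'), mode='preserve')  # old value kept
merge_field((200, 'BIGINT'), ('Success', 'VARCHAR'))                      # types differ -> old kept
```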
Example

- Example 1: A renamed field from a project instruction has a conflicting name and the same type. The default mode is overwrite.

  - SPL statement

    ```
    * | project code=status  -- The old and new types of code are both VARCHAR. The new value is used based on the overwrite mode.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    ```

  - Output result

    ```
    code: '200'
    ```

- Example 2: An extracted field from a parse-json instruction has a conflicting name and the same type. The default mode is overwrite.

  - SPL statement

    ```
    * | parse-json content  -- The old and new types of status are both VARCHAR. The new value is used based on the overwrite mode.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: 'Success'
    body: 'this is test'
    ```

- Example 3: An extracted field from a parse-json instruction has a conflicting name and the same type. The mode is set to preserve.

  - SPL statement

    ```
    * | parse-json -mode='preserve' content  -- The old and new types of status are both VARCHAR. The old value is retained based on the preserve mode.
    ```

  - Input data

    ```
    status: '200'
    content: '{"status": "Success", "body": "this is test"}'
    ```

  - Output result

    ```
    content: '{"status": "Success", "body": "this is test"}'
    status: '200'
    body: 'this is test'
    ```
Data type conversion
Initial type
When SPL processes data, the initial data type of all input fields is VARCHAR, except for log time fields. If subsequent processing logic involves different data types, you must perform data type conversion.
Example
To filter access logs with a status code of 5xx, you must convert the status field to the BIGINT type before the comparison.
```
*  -- The initial type of the status field is VARCHAR.
| where cast(status as BIGINT) >= 500  -- Convert the type of the status field to BIGINT, and then perform the comparison.
```
Type retention
During SPL data processing, after you use the extend instruction to convert the data type of a field, subsequent processing logic uses the converted data type.
Example
```
*  -- A Logstore is used as input data. Except for time fields, all fields are initially of the VARCHAR type.
| where __source__='127.0.0.1'          -- Filter on the __source__ field.
| extend status=cast(status as BIGINT)  -- Convert the type of the status field to BIGINT.
| project status, content
| where status>=500                     -- The type of the status field remains BIGINT, so it can be directly compared with the number 500.
```
Handle null values in SPL expressions
Generating null values
During SPL data processing, null values are generated in the following two scenarios:
- If a field used in an SPL expression does not exist in the input data, its value is treated as null during calculations.
- If an exception occurs during the calculation of an SPL expression, the result is null. For example, a cast type conversion fails or an array index is out of bounds.
Example

- If a field does not exist, its value is treated as null in calculations.

  - SPL statement

    ```
    * | extend withoutStatus=(status is null)
    ```

  - Input data

    ```
    # Entry 1
    status: '200'
    code: 'Success'

    # Entry 2
    code: 'Success'
    ```

  - Output result

    ```
    # Entry 1
    status: '200'
    code: 'Success'
    withoutStatus: false

    # Entry 2
    code: 'Success'
    withoutStatus: true
    ```

- If an exception occurs during calculation, the result is null.

  - SPL statement

    ```
    * | extend code=cast(code as BIGINT)            -- The code field cannot be converted to BIGINT.
      | extend values=json_parse(values)
      | extend values=cast(values as ARRAY(BIGINT))
      | extend last=values[10]                      -- The array index is out of bounds.
    ```

  - Input data

    ```
    status: '200'
    code: 'Success'
    values: '[1,2,3]'
    ```

  - Output result

    ```
    status: '200'
    code: null
    values: [1, 2, 3]
    last: null
    ```
Eliminating null values
To eliminate null values during calculations, you can use the COALESCE expression. This expression returns the first non-null value from a list of expressions. You can also set a default value to use if all expressions evaluate to null.
Example

Read the last element of an array. If the array is empty, the default value is 0.

- SPL statement

  ```
  * | extend values=json_parse(values)
    | extend values=cast(values as ARRAY(BIGINT))
    | extend last=COALESCE(values[3], values[2], values[1], 0)
  ```

- Input data

  ```
  # Entry 1
  values: '[1, 2, 3]'

  # Entry 2
  values: '[]'
  ```

- Output result

  ```
  # Entry 1
  values: [1, 2, 3]
  last: 3

  # Entry 2
  values: []
  last: 0
  ```
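The COALESCE fallback chain behaves like this Python sketch, where None stands in for null and the hypothetical helper `at` emulates 1-based array indexing with out-of-bounds reads yielding null:

```python
def at(arr, i):
    # 1-based indexing; an out-of-bounds read yields None (null), not an error.
    return arr[i - 1] if 1 <= i <= len(arr) else None

def coalesce(*args):
    # Return the first non-None argument, like the COALESCE expression.
    return next((a for a in args if a is not None), None)

values = [1, 2, 3]
coalesce(at(values, 3), at(values, 2), at(values, 1), 0)  # 3
coalesce(at([], 3), at([], 2), at([], 1), 0)              # 0 (all reads are null)
```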
Character escaping
Single quotation marks
Single quotation marks are used to enclose a string literal. If the string literal contains a single quotation mark, you must use an additional single quotation mark to escape it.
Example
- SPL statement

  ```
  * | extend user = 'Alice'
    | extend phone = 'Alice''s Phone'
  ```

- Output result

  ```
  user: Alice
  phone: Alice's Phone
  ```
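If you build SPL statements programmatically, the doubling rule can be applied with a small helper. This is a hypothetical utility, not part of any SDK.

```python
def spl_string_literal(s):
    # Quote s as an SPL single-quoted string literal, doubling any
    # embedded single quotation marks.
    return "'" + s.replace("'", "''") + "'"

spl_string_literal("Alice's Phone")  # "'Alice''s Phone'"
```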
Double quotation marks
Double quotation marks are used to enclose a field name. If the field name contains a double quotation mark, you must use an additional double quotation mark to escape it.
Example
-
SPL statement
* | extend user_name = 'Alice' | extend "user name" = 'Alice' | extend "user""name" = 'Alice' -
Output result
user_name: Alice user name: Alice user"name: Alice
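The same doubling rule for field names can be sketched with a hypothetical helper:

```python
def spl_field_ref(name):
    # Quote a field name for SPL, doubling embedded double quotation marks.
    return '"' + name.replace('"', '""') + '"'

spl_field_ref('user name')   # '"user name"'
spl_field_ref('user"name')   # '"user""name"'
```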
Other special characters
Example 1
In SPL, the backslash (\) is not an escape character and is therefore preserved as is.
- SPL statement

  ```
  * | extend a = 'foo\tbar'
    | extend b = 'foo\nbar'
  ```

- Output result

  ```
  a: foo\tbar
  b: foo\nbar
  ```
Example 2
If a string must include special characters, such as a tab character or a line feed, you can use the chr function to concatenate strings.
- SPL statement

  ```
  * | extend a = concat('foo', chr(9), 'bar')
    | extend b = concat('foo', chr(10), 'bar')
  ```

- Output result

  ```
  a: foo bar   -- foo and bar are separated by a tab character.
  b: foo
  bar          -- foo and bar are separated by a line feed.
  ```
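Python's built-in chr maps code points to characters in the same way, so the concatenation above can be checked locally:

```python
# chr(9) is a tab character, chr(10) is a line feed.
a = 'foo' + chr(9) + 'bar'
b = 'foo' + chr(10) + 'bar'
```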
Error handling
Syntax error
Syntax errors occur when an SPL statement is malformed, such as having an incorrect instruction name, a keyword reference error, or a type specification error. When a syntax error occurs, SPL does not process any data. You must correct the statement based on the error message.
Data error
A data error occurs when a function call or type conversion fails during SPL execution. Data errors do not stop the execution process: the SPL statement still returns results, but the value of the field where the error occurred is set to null. Because data errors can occur on any row, SPL randomly samples and returns only some of the error messages. You can ignore these errors or modify the SPL statement based on the data content.
Execution timeout
An SPL statement contains various instructions, and their running time varies depending on the data scenario. If the total running time of an SPL statement exceeds the default timeout period, the execution stops and a timeout error is returned. In this case, the execution result is empty. The default timeout period can vary for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, for example, by simplifying complex regular expressions or reducing the number of pipelines.
Memory limit exceeded
An SPL statement contains various instructions, and their memory consumption varies depending on the data scenario. SPL execution is limited to a specific memory quota. If this quota is exceeded, the execution fails and a memory limit exceeded error is returned. In this case, the execution result is empty. The default memory quota can vary for scan queries, real-time consumption, and Logtail collection.
If you encounter this error, you can adjust your SPL statement to reduce its complexity, reduce the number of pipelines, or check if the raw data size is too large.