You can use Search and Process Language (SPL) instructions and SQL functions to filter and clean large volumes of log data and standardize data formats. This topic describes common scenarios for data filtering and cleaning and the operations involved.
Scenario 1: Filter logs (where instruction)
You can use the where instruction to filter logs. The common rule is as follows:
where <bool-expression>The following are examples:
Sub-scenario 1: Filter data based on field content.
Raw logs
#Log 1 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214851 __topic__: app class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> #Log 2 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0>SPL statement
Discard logs in which the value of the __topic__ field is app.
* | where __topic__!='app'Output
__source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0>
Sub-scenario 2: Filter data using a regular expression to match field names.
Raw logs
#Log 1 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214851 __topic__: app class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: test #Log 2 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: 14861SPL statement
Retain logs in which the value of the server_protocol field is a number.
* | where regexp_like(server_protocol, '^\d+$')Output
__source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: 14861
Scenario 2: Assign values to empty log fields (extend and parse-regexp instructions)
You can use the extend and parse-regexp instructions to process logs. The following are examples:
Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.
* | extend <output>=<expression>, ...Input data
name:SPL statement: Assign a value to the name field
* | extend name='lily'Output
name:lily
Sub-scenario 2: Use a regular expression to extract structured content from a text field.
| parse-regexp -flags=<flags> <field>, <pattern> as <output>, ...Input data
content: '10.0.0.0 GET /index.html 15824 0.043'SPL statement
* | parse-regexp content, '(\S+)' as ip | parse-regexp content, '\S+\s+(\w+)' as methodOutput
content: '10.0.0.0 GET /index.html 15824 0.043' ip: '10.0.0.0' method: 'GET'
Sub-scenario 3: Assign values to multiple fields.
| extend <output>=<expression> | extend <output1>=<expression> | <output2>=<expression>Input data
__source__: 192.168.0.1 __topic__: __tag__: __receive_time__: id: 7990 test_string: <function test1 at 0x1020401e0>SPL statement
Assign values to the __topic__, __tag__, and __receive_time__ fields.
* | extend __topic__='app' | extend __tag__='stu' | extend __receive_time__='1597214851'Output data
__source__: 192.168.0.1 __topic__: app __tag__: stu __receive_time__: 1597214851 id: 7990 test_string: <function test1 at 0x1020401e0>
Scenario 3: Delete and rename fields (project-away and project-rename instructions)
You can use the project-away and project-rename instructions to delete and rename fields.
Sub-scenario 1: Delete specific fields.
| project-away -wildcard-off <field-pattern>, ...Input data
content: 123 age: 23 name: twissSPL statement
* | project-away age, nameOutput
content: 123
Sub-scenario 2: Rename specific fields.
| project-rename <output>=<field>, ...Input data
content: 123 age: 23 name: twissSPL statement
* | project-rename new_age=age, new_name=nameOutput
content: 123 new_age: 23 new_name: twiss
Scenario 4: Convert log parameter types
Sub-scenario 1: Use the concat function to concatenate strings.
Input data
x: 123 y: 100SPL statement
* | extend a=cast(x as bigint) + cast(y as bigint)| extend b=concat(x, y)Output
x: 123 y: 100 a: 223 b: 123100
Sub-scenario 2: Convert a string or datetime value to a standard time format. The following example shows how to use the to_unixtime function to convert the datetime value in the time1 field to a UNIX timestamp.
Raw log
time1: 2020-09-17 9:00:00Transformation rule
Convert the datetime value in the time1 field to a UNIX timestamp.
* | extend time1=cast(time1 as TIMESTAMP) | extend new_time=to_unixtime(time1)Transformation result
time1: 2020-09-17 9:00:00 time2: 1600333200.0
Scenario 5: Fill non-existent log fields with default values (COALESCE expression)
You can use the COALESCE expression to populate non-existent fields with default values.
Input data
server_protocol: 100SPL statement
If the server_protocol field exists, its value is assigned to y. If the server_protocol1 field does not exist, 200 is assigned to x.
* | extend x=COALESCE(server_protocol1, '200') | extend y=COALESCE(server_protocol, '200')Output
server_protocol: 100 x: 200 y: 100
Scenario 6: Evaluate logs and add fields (combined where and extend instructions)
You can use the where and extend instructions together.
* | where <bool-expression> | extend <output>=<expression> |...The following is an example:
Input data
status1: 200 status2: 404SPL statement
* | where status1='200'| extend status1_info='normal' | where status2='404'| extend status2_info='error'Output
status1: 200 status2: 404 status1_info: normal status2_info: error