You can filter and clean massive log data using Simple Log Service Processing Language (SPL) instructions and SQL functions to standardize data formats. This topic describes common scenarios and related operations for filtering and cleaning data.
Scenario 1: Filter logs (where instruction)
Use the where instruction to filter logs. Common transformation rules:
where <bool-expression>
Examples:
Sub-scenario 1: Filter data entries based on field content.
Raw logs
# Log 1 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214851 __topic__: app class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> # Log 2 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0>
SPL statement
Discard logs where the __topic__ field is app.
* | where __topic__!='app'
Output
__source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0>
Sub-scenario 2: Filter data entries using regular expressions that match field names.
Raw logs
# Log 1 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214851 __topic__: app class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: test # Log 2 __source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: 14861
SPL statement
Retain fields where server_protocol is a number.
* | where regexp_like(server_protocol, '\d+')
Output
__source__: 192.168.0.1 __tag__:__client_ip__: 192.168.0.2 __tag__:__receive_time__: 1597214861 __topic__: web class: test_case id: 7992 test_string: <function test1 at 0x1027401e0> server_protocol: 14861
Scenario 2: Assign values to missing fields in logs (extend and parse-regexp instructions)
Use the extend and parse-regexp instructions to filter logs. Examples:
Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.
* | extend <output>=<expression>, ...
Input
name:
SPL statement: Assign a value to the name field
* | extend name='lily'
Output
name:lily
Sub-scenario 2: Extract structured content from text fields using regular expressions.
* | parse-regexp -flags=<flags> <field>, <pattern> as <output>, ...
Input
content: '10.0.0.0 GET /index.html 15824 0.043'
SPL statement
* | parse-regexp content, '(\S+)' as ip | parse-regexp content, '\S+\s+(\w+)' as method
Output
content: '10.0.0.0 GET /index.html 15824 0.043' ip: '10.0.0.0' method: 'GET'
Sub-scenario 3: Assign values to multiple fields.
* | extend <output>=<expression> | extend <output1>=<expression> | <output2>=<expression>
Input
__source__: 192.168.0.1 __topic__: __tag__: __receive_time__: id: 7990 test_string: <function test1 at 0x1020401e0>
SPL statement
Assign values to the __topic__, __tag__, and __receive_time__ fields.
* | extend __topic__='app' | extend __tag__='stu' | extend __receive_time__='1597214851'
Output
__source__: 192.168.0.1 __topic__: app __tag__: stu __receive_time__: 1597214851 id: 7990 test_string: <function test1 at 0x1020401e0>
Scenario 3: Delete and rename fields (project-away and project-rename instructions)
Use the project-away and project-rename instructions to delete and rename fields.
Sub-scenario 1: Delete specific fields.
* | project-away -wildcard-off <field-pattern>, ...
Input
content: 123 age: 23 name: twiss
SPL statement
* | project-away age, name
Output
content: 123
Sub-scenario 2: Rename specific fields.
* | project-rename <output>=<field>, ...
Input
content: 123 age: 23 name: twiss
SPL statement
* | project-rename new_age=age, new_name=name
Output
content: 123 new_age: 23 new_name: twiss
Scenario 4: Convert log parameter types
Sub-scenario 1: Call the concat function to concatenate strings.
Input
x: 123 y: 100
SPL statement
* | extend a=cast(x as bigint) + cast(y as bigint)| extend b=concat(x, y)
Output
x: 123 y: 100 a: 223 b: 123100
Sub-scenario 2: Convert strings or datetime values to standard time.
The following example uses the to_unixtime function to convert the datetime value represented by time1 to a UNIX timestamp.
Input
time1: 2020-09-17 9:00:00
SPL statement
Convert the datetime value represented by time1 to a UNIX timestamp.
* | extend time1=cast(time1 as TIMESTAMP) | extend new_time=to_unixtime(time1)
Output
time1: 2020-09-17 9:00:00 time2: 1600333200.0
Scenario 5: Fill default values for non-existent fields in logs (COALESCE expression)
Use the COALESCE expression to fill default values for non-existent fields.
Input
server_protocol: 100
SPL statement
If server_protocol exists, y is assigned the value of server_protocol. If server_protocol1 does not exist, x is assigned the value 200.
* | extend x=COALESCE(server_protocol1, '200') | extend y=COALESCE(server_protocol, '200')
Output
server_protocol: 100 x: 200 y: 100
Scenario 6: Evaluate logs and add fields (where and extend combined instructions)
Use the where and extend combined instructions.
* | where <bool-expression> | extend <output>=<expression> |...
Examples:
Input
status1: 200 status2: 404
SPL statement
* | where status1='200'| extend status1_info='normal' | where status2='404'| extend status2_info='error'
Output
status1: 200 status2: 404 status1_info: normal status2_info: error