All Products
Search
Document Center

Simple Log Service:Data filtering and cleaning

Last Updated:Dec 09, 2025

Filter and clean massive log data and standardize data formats using Structured Process Language (SPL) instructions and SQL functions. This topic describes common data filtering and cleaning use cases and their corresponding operations.

Use case 1: Filter logs (where instruction)

Use the where instruction to filter logs. The basic syntax is as follows:

where <bool-expression>

The following are examples:

Example 1: Filter data entries based on field content.

  • Raw Logs

    #Log 1
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214851
    __topic__: app
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
    #Log 2
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214861
    __topic__: web
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
  • SPL statement

    Discard logs where the __topic__ field is app.

    * | where __topic__!='app'
  • Output

    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214861
    __topic__: web
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>

Example 2: Use a regular expression to filter data entries based on a field value.

  • Raw Logs

    #Log 1
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214851
    __topic__: app
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
    server_protocol: test
    #Log 2
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214861
    __topic__: web
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
    server_protocol: 14861
  • SPL statement

    Retain logs where the value of the server_protocol field is a number.

    * | where regexp_like(server_protocol, '^\d+$')
  • Output

    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214861
    __topic__: web
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
    server_protocol: 14861

Use case 2: Assign values to missing log fields (extend and parse-regexp instructions)

Use the extend or parse-regexp instructions to filter logs. For example:

Example 1: Assign a value to a field if the field does not exist or is empty.

* | extend <output>=<expression>, ...
  • Input data

    name:
  • SPL statement: Assign a value to the name field

    * | extend name='lily'
  • Output

    name:lily

Example 2: Use a regular expression to extract structured content from a text field.

| parse-regexp -flags=<flags> <field>, <pattern> as <output>, ...
  • Input data

    content: '10.0.0.0 GET /index.html 15824 0.043'
  • SPL statement

    * | parse-regexp content, '(\S+)' as ip | parse-regexp content, '\S+\s+(\w+)' as method
  • Output

    content: '10.0.0.0 GET /index.html 15824 0.043'
    ip: '10.0.0.0'
    method: 'GET'

Example 3: Assign values to multiple fields.

| extend <output>=<expression> | extend <output1>=<expression> | <output2>=<expression>
  • Input data

    __source__:  192.168.0.1
    __topic__:
    __tag__:
    __receive_time__:
    id:  7990
    test_string:  <function test1 at 0x1020401e0>
  • SPL statement

    Assign values to the __topic__, __tag__, and __receive_time__ fields.

    * | extend __topic__='app' | extend __tag__='stu' | extend __receive_time__='1597214851'
  • Output

    __source__:  192.168.0.1
    __topic__:  app
    __tag__:  stu
    __receive_time__:  1597214851
    id:  7990
    test_string:  <function test1 at 0x1020401e0>

Use case 3: Delete and rename fields (project-away and project-rename instructions)

Use the project-away and project-rename instructions to delete and rename fields.

Example 1: Delete specific fields.

| project-away -wildcard-off <field-pattern>, ...
  • Input data

    content: 123
    age: 23
    name: twiss
  • SPL statement

    * | project-away age, name
  • Output

    content: 123

Example 2: Rename specific fields.

| project-rename <output>=<field>, ...
  • Input data

    content: 123
    age: 23
    name: twiss
  • SPL statement

    * | project-rename new_age=age, new_name=name
  • Output

    content: 123
    new_age: 23
    new_name: twiss

Use case 4: Convert log parameter types

Example 1: Use the concat function to concatenate strings.

  • Input data

    x: 123
    y: 100
  • SPL statement

    * | extend a=cast(x as bigint) + cast(y as bigint)| extend b=concat(x, y)
  • Output

    x: 123
    y: 100
    a: 223
    b: 123100

Example 2: Convert a string or a datetime value to a standard time format. The following example shows how to use the to_unixtime function to convert the datetime value in the time1 field to a UNIX timestamp.

  • Raw Logs

    time1: 2020-09-17 9:00:00
  • Transformation rules

    Convert the datetime value in the time1 field to a UNIX timestamp.

    * | extend time1=cast(time1 as TIMESTAMP) | extend new_time=to_unixtime(time1)
  • Output

    time1:  2020-09-17 9:00:00
    time2:  1600333200.0

Use case 5: Fill non-existent log fields with default values (COALESCE expression)

Use the COALESCE expression to fill non-existent fields with default values.

  • Input data

    server_protocol: 100
  • SPL statement

    If the server_protocol field exists, its value is assigned to the y field. If the server_protocol1 field does not exist, 200 is assigned to the x field.

    * | extend x=COALESCE(server_protocol1, '200') | extend y=COALESCE(server_protocol, '200')
  • Output

    server_protocol: 100
    x: 200
    y: 100

Use case 6: Evaluate logs and add fields (where and extend combined instructions)

Use the where and extend instructions together.

* | where <bool-expression> | extend <output>=<expression> |...

The following is an example:

  • Input data

    status1: 200
    status2: 404
  • SPL statement

    * | where status1='200'| extend status1_info='normal' | where status2='404'| extend status2_info='error'
  • Output

    status1: 200
    status2: 404
    status1_info: normal
    status2_info: error