Simple Log Service: Cleanse data by using functions

Last Updated: Apr 17, 2024

You can use data transformation functions provided by Simple Log Service to cleanse large amounts of log data. This way, the formats of the log data are standardized. This topic describes how to use functions to cleanse data in various scenarios.

Scenario 1: Filter logs by using the e_keep and e_drop functions

You can filter logs by using the e_drop or e_keep function. You can also filter logs by combining the e_if function and the DROP parameter or combining the e_if_else function and the DROP parameter.

Common transformation rules:

  • e_keep(e_search(...)): Logs that meet the specified conditions are retained, whereas logs that do not meet the specified conditions are discarded.

  • e_drop(e_search(...)): Logs that meet the specified conditions are discarded, whereas logs that do not meet the specified conditions are retained.

  • e_if_else(e_search("..."), KEEP, DROP): Logs that meet the specified conditions are retained, whereas logs that do not meet the specified conditions are discarded.

  • e_if(e_search("not ..."), DROP): Logs that meet the specified conditions are discarded, whereas logs that do not meet the specified conditions are retained.

  • e_if(e_search("..."), KEEP): This transformation rule is invalid.

Example:

  • Raw log

    # Log 1
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214851
    __topic__: app 
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
    
    # Log 2
    __source__:  192.168.0.1
    class:  produce_case
    id:  7990
    test_string:  <function test1 at 0x1020401e0>
  • Transformation rule

    Discard the logs that lack the __topic__ field or the __tag__:__receive_time__ field.

    e_if(e_not_has("__topic__"),e_drop())
    e_if(e_not_has("__tag__:__receive_time__"),e_drop())
  • Result

    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.0.2
    __tag__:__receive_time__:  1597214851
    __topic__: app 
    class:  test_case
    id:  7992
    test_string:  <function test1 at 0x1027401e0>
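
The filtering rule above can also be pictured outside Simple Log Service. The following is a minimal plain-Python sketch, not the DSL runtime: logs are modeled as dictionaries, and the helper name drop_if_missing is hypothetical. It mimics the combined effect of the two e_if(e_not_has(...), e_drop()) rules.

```python
# Plain-Python model of the two rules above; this is not the Simple Log Service runtime.
# drop_if_missing is a hypothetical helper name used only for illustration.
REQUIRED_FIELDS = ("__topic__", "__tag__:__receive_time__")


def drop_if_missing(logs, required=REQUIRED_FIELDS):
    """Return the logs that contain every required field."""
    return [log for log in logs if all(field in log for field in required)]


log1 = {
    "__source__": "192.168.0.1",
    "__tag__:__client_ip__": "192.168.0.2",
    "__tag__:__receive_time__": "1597214851",
    "__topic__": "app",
    "class": "test_case",
    "id": "7992",
}
log2 = {"__source__": "192.168.0.1", "class": "produce_case", "id": "7990"}

kept = drop_if_missing([log1, log2])
print([log["id"] for log in kept])  # prints ['7992']
```

Log 2 is dropped because it lacks both required fields. A log that lacks only one of them would be dropped as well.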

Scenario 2: Assign values to empty fields in logs by using the e_set function

You can assign values to empty fields in logs by using the e_set function.

  • Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.

    e_set("result", "......value......", mode="fill")

    For more information about the mode parameter value, see Field extraction check and overwrite modes.

    Example:

    • Raw log

      name:
    • Transformation rule

      e_set("name", "aspara2.0", mode="fill")
    • Result

      name:  aspara2.0
  • Sub-scenario 2: Simplify a regular expression and extract field values by using the Grok function.

    Example:

    • Raw log

      content:"ip address: 192.168.1.1"
    • Transformation rule

      Capture and extract the IP address in the content field by using the Grok function.

      e_regex("content", grok(r"(%{IP})"),"addr")
    • Result

      addr:  192.168.1.1
      content:"ip address: 192.168.1.1"
  • Sub-scenario 3: Assign values to multiple fields.

    e_set("k1", "v1", "k2", "v2", "k3", "v3", ......)

    Example:

    • Raw log

      __source__:  192.168.0.1
      __topic__:
      __tag__:
      __receive_time__:
      id:  7990
      test_string:  <function test1 at 0x1020401e0>
    • Transformation rule

      Assign values to the __topic__, __tag__, and __receive_time__ fields.

      e_set("__topic__","app", "__tag__","stu","__receive_time__","1597214851")
    • Result

      __source__:  192.168.0.1
      __topic__:  app
      __tag__:  stu
      __receive_time__:  1597214851
      id:  7990
      test_string:  <function test1 at 0x1020401e0>
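
The fill behavior described in Sub-scenario 1 can be sketched in plain Python. This is only a model under the assumption that a log is a dictionary of string values. The helper name set_fill is hypothetical and is not part of the Simple Log Service DSL.

```python
# Plain-Python model of e_set(..., mode="fill"); not the Simple Log Service runtime.
# set_fill is a hypothetical helper name used only for illustration.
def set_fill(log, field, value):
    """Return a copy of log in which field is set only if it is missing or empty."""
    result = dict(log)
    if result.get(field, "") == "":
        result[field] = value
    return result


print(set_fill({"name": ""}, "name", "aspara2.0"))       # empty field is filled
print(set_fill({"name": "twiss"}, "name", "aspara2.0"))  # existing value is kept
```

The first call fills the empty name field, and the second call leaves the existing value untouched.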

Scenario 3: Delete a field and rename a field by using the e_search, e_rename, and e_compose functions

In most cases, we recommend that you evaluate data based on specified conditions by using a function such as e_if, and use the e_compose function to combine the operations that are performed based on the evaluation result.

Example:

  • Raw log

    content:123
    age:23
    name:twiss
  • Transformation rule

    If the value of the content field is 123, delete the age and name fields. Then, rename the content field to ctx.

    e_if(e_search("content==123"),e_compose(e_drop_fields("age|name"), e_rename("content", "ctx")))
  • Result

    ctx: 123
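
The logic of the preceding rule can be sketched in plain Python as follows. This is an illustration only, assuming that a log is a dictionary of string values. The function name delete_and_rename is hypothetical.

```python
# Plain-Python model of the rule above; not the Simple Log Service runtime.
# delete_and_rename is a hypothetical name used only for illustration.
def delete_and_rename(log):
    """If content equals "123", drop age and name, then rename content to ctx."""
    result = dict(log)
    if result.get("content") == "123":
        for field in ("age", "name"):
            result.pop(field, None)  # ignore fields that are already absent
        result["ctx"] = result.pop("content")
    return result


print(delete_and_rename({"content": "123", "age": "23", "name": "twiss"}))
# prints {'ctx': '123'}
```

A log whose content field has any other value is returned unchanged, which matches the behavior of a condition that is not met.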

Scenario 4: Convert the data types of fields in logs by using the v, ct_int, and dt_totimestamp functions

The fields and field values in logs are processed as strings during data transformation. Data of a non-string type is automatically converted to data of the string type. When you call a function, take note of the data types that are supported by the function. For more information, see Syntax overview.

  • Sub-scenario 1: Concatenate strings and sum up data by using the op_add function.

    The op_add function supports the string and numeric types. Therefore, data type conversion is not required.

    Example:

    • Raw log

      a : 1
      b : 2
    • Transformation rule

      e_set("d",op_add(v("a"), v("b")))
      e_set("e",op_add(ct_int(v("a")), ct_int(v("b"))))
    • Result

      a:1
      b:2
      d:12
      e:3
  • Sub-scenario 2: Convert data types by using field processing functions and the ct_int function, and call the op_mul function to multiply data.

    Example:

    • Raw log

      a:2
      b:5
    • Transformation rule

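      v("a") and v("b") are of the string type. The second parameter of the op_mul function must be of a numeric type. Therefore, you must convert the string to an integer by using the ct_int function before you pass it to the op_mul function. In the second rule, the first parameter remains a string, so the op_mul function repeats the string five times instead of multiplying two numbers.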

      e_set("c",op_mul(ct_int(v("a")), ct_int(v("b"))))
      e_set("d",op_mul(v("a"), ct_int(v("b"))))
    • Result

      a: 2
      b: 5
      c: 10
      d: 22222
  • Sub-scenario 3: Convert a string or datetime to a standard time by using the dt_parse and dt_parsetimestamp functions.

    The dt_totimestamp function accepts only datetime objects and does not accept strings. Therefore, you must first call the dt_parse function to convert the string value of time1 to a datetime object. Alternatively, you can use the dt_parsetimestamp function, which accepts both datetime objects and strings. For more information, see Date and time functions.

    Example:

    • Raw log

      time1: 2020-09-17 9:00:00
    • Transformation rule

      Overwrite time1 with a sample value, and then convert the datetime that is specified by time1 to a UNIX timestamp.

      e_set("time1", "2019-06-03 2:41:26")
      e_set("time2", dt_totimestamp(dt_parse(v("time1"))))
      # Alternatively, use the dt_parsetimestamp function:
      e_set("time2", dt_parsetimestamp(v("time1")))
    • Result

      time1:  2019-06-03 2:41:26 
      time2:  1559529686 
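
The results in this scenario follow the same typing rules as plain Python, which the following sketch illustrates. It is not DSL code: the + and * operators stand in for op_add and op_mul, int stands in for ct_int, and the function to_unix_timestamp is a hypothetical stand-in for dt_parsetimestamp that assumes the string is a UTC time.

```python
from datetime import datetime, timezone

# Sub-scenario 1: strings are concatenated, integers are summed.
a, b = "1", "2"
concatenated = a + b      # "12"
summed = int(a) + int(b)  # 3

# Sub-scenario 2: int * int multiplies, str * int repeats the string.
x, y = "2", "5"
product = int(x) * int(y)  # 10
repeated = x * int(y)      # "22222"


# Sub-scenario 3: parse a time string, then convert it to a UNIX timestamp.
# Assumption: the string carries no time zone and is interpreted as UTC.
def to_unix_timestamp(text):
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(concatenated, summed, product, repeated)
print(to_unix_timestamp("2019-06-03 2:41:26"))  # prints 1559529686
```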

Scenario 5: Pass default values to the fields that do not exist in logs by configuring the default parameter

Some expression functions provided by the domain-specific language (DSL) for Simple Log Service have specific requirements for input parameters. If the input parameters do not meet the requirements, the data transformation rules that use the functions return default values or report an error. If a required log field does not exist, you can pass a default value for the field by configuring the default parameter of the v function. In the following example, the default value is passed to the op_len function.

Important

If default values are passed to subsequent functions, errors may occur. We recommend that you handle the errors at the earliest opportunity.

  • Raw log

    data_len: 1024
  • Transformation rule

    e_set("data_len", op_len(v("data", default="")))
  • Result

    data: 0
    data_len: 0
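
The role of the default value can be sketched in plain Python. This is an illustration only, assuming that a log is a dictionary of string values. The helper name field_length is hypothetical: it stands in for op_len(v("data", default="")).

```python
# Plain-Python model of op_len(v(field, default="")); not the Simple Log Service runtime.
# field_length is a hypothetical helper name used only for illustration.
def field_length(log, field, default=""):
    """Return the length of a field value, or of the default if the field is absent."""
    return len(log.get(field, default))


log = {"data_len": "1024"}
print(field_length(log, "data"))  # prints 0, because data does not exist

# Without a default, the lookup yields None, and len(None) raises an error.
try:
    len(log.get("data"))
except TypeError as error:
    print("error:", error)
```

The sketch also shows why a default is useful only if the downstream function can handle it: an empty string is safe for a length check, but might be invalid input for a numeric function.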

Scenario 6: Evaluate logs based on specified conditions and add fields based on the evaluation result by using the e_if and e_switch functions

We recommend that you evaluate logs by using the e_if or e_switch function. For more information, see Flow control functions.

  • e_if function

    e_if(Condition 1, Operation 1, Condition 2, Operation 2, Condition 3, Operation 3, ....)
  • e_switch function

    When you use the e_switch function, you must specify condition-operation pairs. The e_switch function evaluates the conditions in sequence. If a condition is met, its paired operation is performed, the operation result is returned, and the remaining conditions are not evaluated. If a condition is not met, its paired operation is not performed and the next condition is evaluated. If no conditions are met and the default parameter is specified, the operation that is specified by default is performed and the operation result is returned.

    e_switch(Condition 1, Operation 1, Condition 2, Operation 2, Condition 3, Operation 3, ...., default=None)

Example:

  • Raw log

    status1: 200
    status2: 404
  • e_if function

    • Transformation rule

      e_if(e_match("status1", "200"), e_set("status1_info", "normal"),
           e_match("status2", "404"), e_set("status2_info", "error"))
    • Result

      status1: 200
      status2: 404
      status1_info: normal
      status2_info: error
  • e_switch function

    • Transformation rule

      e_switch(e_match("status1", "200"), e_set("status1_info", "normal"), 
               e_match("status2", "404"), e_set("status2_info", "error"))
    • Result

      The e_switch function evaluates the conditions in sequence. If a condition is met, the operation result is returned and no more conditions are evaluated.

      status1: 200
      status2: 404
      status1_info: normal

Scenario 7: Convert UNIX timestamps to log time values that are accurate to the nanosecond

In some data transformation scenarios, the timestamp of data must be accurate to the nanosecond. If a raw log contains a field whose value is a UNIX timestamp, you can use field processing functions to convert the field value into a log time that is accurate to the nanosecond.

  • Raw log

    {
      "__source__": "1.2.3.4",
      "__time__": 1704983810,
      "__topic__": "test",
      "log_time_nano":"1705043680630940602"
    }
  • Transformation rule

    e_set(
        "__time__", op_div_floor(ct_int(v("log_time_nano")), 1000000000),
    )
    e_set(
        "__time_ns_part__", op_mod(ct_int(v("log_time_nano")), 1000000000),
    )
  • Result

    {
      "__source__": "1.2.3.4",
      "__time__": 1705043680,
      "__time_ns_part__": 630940602,
      "__topic__": "test",
      "log_time_nano":"1705043680630940602"
    }
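
The arithmetic behind the preceding rule is an integer division and a remainder, which the following plain-Python sketch reproduces. The function name split_nanoseconds is hypothetical. Floor division and modulo stand in for op_div_floor and op_mod.

```python
# Plain-Python model of the rule above; not the Simple Log Service runtime.
# split_nanoseconds is a hypothetical name used only for illustration.
NANOSECONDS_PER_SECOND = 1_000_000_000


def split_nanoseconds(log_time_nano):
    """Split a nanosecond UNIX timestamp into whole seconds and a nanosecond part."""
    seconds, nanosecond_part = divmod(int(log_time_nano), NANOSECONDS_PER_SECOND)
    return seconds, nanosecond_part


print(split_nanoseconds("1705043680630940602"))  # prints (1705043680, 630940602)
```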

Scenario 8: Convert time strings that follow the ISO 8601 standard to log time values that are accurate to the microsecond

In some data transformation scenarios, high-precision timestamps are required. If a raw log contains a field whose value follows the ISO 8601 standard, you can use field processing functions to convert the field value into a log time that is accurate to the microsecond.

  • Raw log

    {
      "__source__": "1.2.3.4",
      "__time__": 1704983810,
      "__topic__": "test",
      "log_time":"2024-01-11 23:10:43.992847200"
    }
  • Transformation rule

    e_set(
        "__time__", dt_parsetimestamp(v("log_time"), tz="Asia/Shanghai"), mode="overwrite",
    )
    e_set("tmp_ms", dt_prop(v("log_time"), "microsecond"))
    e_set(
        "__time_ns_part__", op_mul(ct_int(v("tmp_ms")), 1000),
    )
  • Result

    {
      "__source__": "1.2.3.4",
      "__time__": 1704985843,
      "__time_ns_part__": 992847000,
      "__topic__": "test",
      "log_time": "2024-01-11 23:10:43.992847200",
      "tmp_ms": "992847"
    }
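
The conversion in this scenario can be checked with a plain-Python sketch. It is an illustration only and rests on two assumptions: Asia/Shanghai is modeled as a fixed UTC+8 offset, and digits beyond microsecond precision are truncated. The function name split_log_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modeled as a fixed UTC+8 offset.
SHANGHAI = timezone(timedelta(hours=8))


# split_log_time is a hypothetical name used only for illustration.
def split_log_time(text, tz=SHANGHAI):
    """Return (UNIX seconds, nanosecond part) with microsecond precision."""
    whole, _, fraction = text.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    # Keep six fractional digits (microseconds) and drop the rest.
    microseconds = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return int(parsed.timestamp()), microseconds * 1000


print(split_log_time("2024-01-11 23:10:43.992847200"))
# prints (1704985843, 992847000)
```

The trailing 200 nanoseconds of the source value are lost, which is why the result is accurate to the microsecond rather than the nanosecond.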