You can use Simple Log Service data transformation functions to cleanse large volumes of collected log data and standardize the data format. This topic describes common scenarios and related operations for cleansing data by calling functions.
Scenario 1: Filter logs (e_keep and e_drop functions)
You can use the e_drop or e_keep function to filter logs. You can also use the e_if or e_if_else function with the DROP parameter to filter logs.
Common rules are as follows:
e_keep(e_search(...)): Keeps the log if the condition is met. Drops the log if the condition is not met.
e_drop(e_search(...)): Drops the log if the condition is met. Keeps the log if the condition is not met.
e_if_else(e_search("..."), KEEP, DROP): Keeps the log if the condition is met. Drops the log if the condition is not met.
e_if(e_search("not ..."), DROP): Drops the log if the condition is met. Keeps the log if the condition is not met.
e_if(e_search("..."), KEEP): This is a meaningless transformation rule.
Example:
Raw logs
#Log 1
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
#Log 2
__source__: 192.168.0.1
class: produce_case
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Drop logs that do not have the __topic__ field or the __tag__:__receive_time__ field.
e_if(e_not_has("__topic__"), e_drop())
e_if(e_not_has("__tag__:__receive_time__"), e_drop())
Transformation result
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
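The outcome of this filter can be checked outside Simple Log Service. The following is a minimal plain-Python sketch, not SLS DSL: logs are modeled as dictionaries, and the keep_log helper is a hypothetical name introduced only for illustration.

```python
# Plain-Python illustration of the filter rule; this is not SLS DSL.
REQUIRED_FIELDS = ("__topic__", "__tag__:__receive_time__")

def keep_log(log):
    """Return True if the log contains every required field (hypothetical helper)."""
    return all(field in log for field in REQUIRED_FIELDS)

logs = [
    {
        "__source__": "192.168.0.1",
        "__tag__:__client_ip__": "192.168.0.2",
        "__tag__:__receive_time__": "1597214851",
        "__topic__": "app",
        "class": "test_case",
        "id": "7992",
    },
    {"__source__": "192.168.0.1", "class": "produce_case", "id": "7990"},
]

# Only logs that pass the check survive, which mirrors dropping the others.
kept = [log for log in logs if keep_log(log)]
print([log["id"] for log in kept])
```

Only the log with id 7992 remains, which matches the transformation result.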
Scenario 2: Assign values to empty log fields (e_set function)
You can use the e_set function to assign values to empty log fields.
Sub-scenario 1: Assign a value to a field if the original field does not exist or is empty.
e_set("result", "......value......", mode="fill")
For more information about the values of the mode parameter, see Field extraction check and overwrite modes.
Example:
Raw log
name:
Transformation rule
e_set("name", "aspara2.0", mode="fill")
Transformation result
name: aspara2.0
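The fill behavior described in this sub-scenario (assign only when the field is missing or empty) can be sketched in plain Python. This is an illustration under that stated assumption, not SLS DSL, and set_fill is a hypothetical helper name.

```python
def set_fill(log, field, value):
    """Assign value only if the field is missing or empty (hypothetical helper)."""
    if log.get(field) in (None, ""):
        log[field] = value
    return log

# An empty field is filled.
print(set_fill({"name": ""}, "name", "aspara2.0"))
# A field that already has a value is left untouched.
print(set_fill({"name": "existing"}, "name", "aspara2.0"))
```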
Sub-scenario 2: Use the GROK function to simplify regular expressions and extract field content.
Example:
Raw log
content: "ip address: 192.168.1.1"
Transformation rule
Use the GROK function to extract the IP address from the content field.
e_regex("content", grok(r"(%{IP})"), "addr")
Transformation result
addr: 192.168.1.1
content: "ip address: 192.168.1.1"
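To see what the GROK pattern saves you from writing, the same extraction can be approximated with a hand-written regular expression. The sketch below is plain Python, not SLS DSL; the IPV4 pattern is a simplified assumption that covers only dotted IPv4 addresses and does not validate octet ranges, and extract_addr is a hypothetical helper name.

```python
import re

# Simplified IPv4 pattern (assumption); it does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_addr(log):
    """Copy the first IPv4-looking substring of content into addr (hypothetical helper)."""
    match = IPV4.search(log.get("content", ""))
    if match:
        log["addr"] = match.group(0)
    return log

print(extract_addr({"content": "ip address: 192.168.1.1"}))
```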
Sub-scenario 3: Assign values to multiple fields.
e_set("k1", "v1", "k2", "v2", "k3", "v3", ......)
Example:
Raw log
__source__: 192.168.0.1
__topic__:
__tag__:
__receive_time__:
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Assign values to the __topic__, __tag__, and __receive_time__ fields.
e_set("__topic__", "app", "__tag__", "stu", "__receive_time__", "1597214851")
Transformation result
__source__: 192.168.0.1
__topic__: app
__tag__: stu
__receive_time__: 1597214851
id: 7990
test_string: <function test1 at 0x1020401e0>
Scenario 3: Conditionally delete and rename fields (e_search, e_rename, and e_compose functions)
To perform several operations under the same condition, combine them with the e_compose function instead of repeating the condition for each operation.
Example:
Raw log
content: 123
age: 23
name: twiss
Transformation rule
First, check if the value of the content field is 123. If it is, delete the age and name fields. Then, rename the content field to ctx.
e_if(e_search("content==123"), e_compose(e_drop_fields("age|name"), e_rename("content", "ctx")))
Transformation result
ctx: 123
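The effect of grouping two operations under one condition can be sketched in plain Python. This is an illustration rather than SLS DSL: the log is modeled as a dictionary of strings, and transform is a hypothetical helper name.

```python
def transform(log):
    """If content is "123", drop age and name, then rename content to ctx."""
    if log.get("content") == "123":
        # Both operations run under the single condition above.
        for field in ("age", "name"):
            log.pop(field, None)
        log["ctx"] = log.pop("content")
    return log

print(transform({"content": "123", "age": "23", "name": "twiss"}))
print(transform({"content": "456", "age": "23", "name": "twiss"}))
```

A log whose content is not 123 passes through unchanged.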
Scenario 4: Convert log parameter types (v, ct_int, and dt_totimestamp functions)
During data transformation, log fields and their values are always strings. Data of non-string types is automatically converted to the string type. Therefore, when you call a function, pay attention to the parameter types that the function accepts. For more information, see Syntax overview.
Sub-scenario 1: Call the op_add function to concatenate strings and add numbers.
The op_add function accepts both string and numeric types. Therefore, you do not need to convert parameter types.
Example:
Raw log
a: 1
b: 2
Transformation rule
e_set("d", op_add(v("a"), v("b")))
e_set("e", op_add(ct_int(v("a")), ct_int(v("b"))))
Transformation result
a: 1
b: 2
d: 12
e: 3
Sub-scenario 2: Use field operation functions and the ct_int function for type conversion, and then call the op_mul function to multiply numbers.
Example:
Raw log
a: 2
b: 5
Transformation rule
The v("a") and v("b") values are both strings. The second parameter of the op_mul function accepts only numeric types. Therefore, you must use the ct_int function to convert the strings to integers before you pass them to the op_mul function.
e_set("c", op_mul(ct_int(v("a")), ct_int(v("b"))))
e_set("d", op_mul(v("a"), ct_int(v("b"))))
Transformation result
a: 2
b: 5
c: 10
d: 22222
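The results in sub-scenarios 1 and 2 are easier to read next to the equivalent plain-Python operations. The sketch below uses Python's + and * operators as stand-ins for op_add and op_mul; that correspondence is an assumption based on the outputs shown above, not a statement about how the functions are implemented.

```python
# Field values arrive as strings, as described for data transformation.
a, b = "1", "2"
concatenated = a + b          # string + string joins the text
summed = int(a) + int(b)      # converting first gives numeric addition

x, y = "2", "5"
product = int(x) * int(y)     # both converted: numeric multiplication
repeated = x * int(y)         # string * integer repeats the string

print(concatenated, summed, product, repeated)
```

The last value shows why a missing conversion goes unnoticed: the call still succeeds but returns a repeated string instead of a product.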
Sub-scenario 3: Call the dt_parse and dt_totimestamp functions, or the dt_parsetimestamp function alone, to convert a time string to a UNIX timestamp.
The dt_totimestamp function accepts a datetime object, not a string. Therefore, you must call the dt_parse function to convert the string value of time1 to a datetime object. You can also directly use the dt_parsetimestamp function, which accepts both datetime objects and strings. For more information, see Date and time functions.
Example:
Raw log
time1: 2020-09-17 9:00:00
Transformation rule
Convert the datetime represented by time1 to a UNIX timestamp.
e_set("time1", "2019-06-03 2:41:26")
e_set("time2", dt_totimestamp(dt_parse(v("time1"))))
or
e_set("time2", dt_parsetimestamp(v("time1")))
Transformation result
time1: 2019-06-03 2:41:26
time2: 1559529686
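The two-step conversion (parse the string into a datetime object, then convert it to a timestamp) can be reproduced with the Python standard library. This sketch assumes the time string is interpreted as UTC, which is consistent with the result shown above; to_unix_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone

def to_unix_timestamp(text):
    """Parse 'YYYY-MM-DD H:MM:SS' as UTC and return whole seconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")   # step 1: string -> datetime
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())  # step 2: datetime -> timestamp

print(to_unix_timestamp("2019-06-03 2:41:26"))  # 1559529686
```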
Scenario 5: Fill non-existent log fields with default values (pass the default parameter)
Some SLS DSL expression functions have specific requirements for input parameters. If the requirements are not met, the data transformation window reports an error or returns a default value. If a field that a function requires may be missing from a log, you can pass a default value, for example through the default parameter of the v function inside an op_len call.
Passing a default value to a subsequent function may cause another error. You must handle the exceptions returned by the function promptly.
Raw log
data_len: 1024
Transformation rule
e_set("data_len", op_len(v("data", default="")))
Transformation result
data: 0
data_len: 0
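The role of the default value can be illustrated with a dictionary lookup in plain Python. This is a sketch, not SLS DSL: field_length is a hypothetical helper, and the dictionary stands in for a log that lacks the data field.

```python
def field_length(log, field, default=""):
    """Return the length of a field value, using default when the field is missing."""
    return len(log.get(field, default))

log = {"data_len": "1024"}

# With a default, the missing field yields length 0 instead of an error.
print(field_length(log, "data"))

# Without a default, the lookup returns None and len() raises TypeError.
try:
    len(log.get("data"))
except TypeError as error:
    print("error:", error)
```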
Scenario 6: Conditionally check logs and add fields (e_if and e_switch functions)
Use the e_if function or the e_switch function to check logs. For more information, see Flow control functions.
e_if function
e_if(condition1, operation1, condition2, operation2, condition3, operation3, ....)
e_switch function
The e_switch function is a combination of condition-operation pairs. The function checks conditions sequentially. If a condition is met, the corresponding operation is performed and the result is returned immediately. If a condition is not met, the function checks the next condition. If no conditions are met and the default parameter is configured, the operation configured for default is executed and its result is returned.
e_switch(condition1, operation1, condition2, operation2, condition3, operation3, ...., default=None)
Example:
Raw log
status1: 200
status2: 404
e_if function
Transformation rule
e_if(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Transformation result
status1: 200
status2: 404
status1_info: normal
status2_info: error
e_switch function
Transformation rule
e_switch(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Transformation result
If any condition is met, the result is returned, and no further conditions are checked.
status1: 200
status2: 404
status1_info: normal
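The difference between the two functions is whether evaluation stops at the first matching condition. The plain-Python sketch below imitates that control flow; it is not SLS DSL, and match, set_field, run_if, and run_switch are hypothetical helper names chosen for illustration.

```python
import re

def match(field, pattern):
    """Condition: the field exists and its whole value matches the pattern."""
    return lambda log: field in log and re.fullmatch(pattern, log[field]) is not None

def set_field(field, value):
    """Operation: write a field into the log."""
    return lambda log: log.__setitem__(field, value)

def run_if(log, *pairs):
    """Check every condition and run each matching operation (e_if-like)."""
    for condition, operation in zip(pairs[0::2], pairs[1::2]):
        if condition(log):
            operation(log)
    return log

def run_switch(log, *pairs, default=None):
    """Run only the first matching operation, else the default (e_switch-like)."""
    for condition, operation in zip(pairs[0::2], pairs[1::2]):
        if condition(log):
            operation(log)
            return log
    if default is not None:
        default(log)
    return log

rules = (
    match("status1", "200"), set_field("status1_info", "normal"),
    match("status2", "404"), set_field("status2_info", "error"),
)
raw = {"status1": "200", "status2": "404"}
print(run_if(dict(raw), *rules))      # both info fields are added
print(run_switch(dict(raw), *rules))  # only status1_info is added
```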
Scenario 7: Convert data to a nanosecond-level UNIX timestamp
In some scenarios, data transformation in Simple Log Service must support nanosecond-level timestamp precision. If a raw log contains a field in the UNIX timestamp format, you can use field operation functions to parse it into a log time with nanosecond precision.
Raw log
{
  "__source__": "1.2.3.4",
  "__time__": 1704983810,
  "__topic__": "test",
  "log_time_nano": "1705043680630940602"
}
Transformation rule
e_set(
    "__time__",
    op_div_floor(ct_int(v("log_time_nano")), 1000000000),
)
e_set(
    "__time_ns_part__",
    op_mod(ct_int(v("log_time_nano")), 1000000000),
)
Transformation result
{
  "__source__": "1.2.3.4",
  "__time__": 1705043680,
  "__time_ns_part__": 630940602,
  "__topic__": "test",
  "log_time_nano": "1705043680630940602"
}
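The rule splits one nanosecond timestamp into whole seconds and a nanosecond remainder. The arithmetic can be verified with Python's built-in divmod, shown here as a plain-Python sketch rather than SLS DSL.

```python
log_time_nano = "1705043680630940602"

# Floor division gives whole seconds; the remainder is the nanosecond part.
seconds, nanos = divmod(int(log_time_nano), 1_000_000_000)

print(seconds)  # 1705043680
print(nanos)    # 630940602
```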
Scenario 8: Convert data to a microsecond-level standard ISO 8601 timestamp
In some scenarios, data transformation in Simple Log Service must support high-precision timestamps. If a raw log contains a field in the standard ISO 8601 time format, you can use field operation functions to parse it into a log time with microsecond precision.
Raw log
{
  "__source__": "1.2.3.4",
  "__time__": 1704983810,
  "__topic__": "test",
  "log_time": "2024-01-11 23:10:43.992847200"
}
Transformation rule
e_set(
    "__time__",
    dt_parsetimestamp(v("log_time"), tz="Asia/Shanghai"),
    mode="overwrite",
)
e_set("tmp_ms", dt_prop(v("log_time"), "microsecond"))
e_set(
    "__time_ns_part__",
    op_mul(ct_int(v("tmp_ms")), 1000),
)
Transformation result
{
  "__source__": "1.2.3.4",
  "__time__": 1704985843,
  "__time_ns_part__": 992847000,
  "__topic__": "test",
  "log_time": "2024-01-11 23:10:43.992847200",
  "tmp_ms": "992847"
}
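The same split into whole seconds and a sub-second part can be checked with the Python standard library. This sketch makes two assumptions: Asia/Shanghai is modeled as a fixed UTC+8 offset, and fractional digits beyond microseconds are truncated, which matches the result above. split_log_time is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

SHANGHAI = timezone(timedelta(hours=8))  # fixed UTC+8 offset (assumption)

def split_log_time(text):
    """Return (unix_seconds, nanosecond_part) with microsecond precision."""
    base, _, fraction = text.partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%d %H:%M:%S").replace(tzinfo=SHANGHAI)
    # Keep the first six fractional digits (microseconds), padding if shorter.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return int(parsed.timestamp()), micros * 1000

print(split_log_time("2024-01-11 23:10:43.992847200"))  # (1704985843, 992847000)
```

The trailing 200 nanoseconds of the input are lost because the intermediate value only carries microseconds.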