The data transformation feature of Log Service allows you to cleanse raw data. You can use one or more functions to cleanse a large amount of data. This way, the logs collected to Log Service can be converted to a standard format. This topic describes how to use functions to cleanse data in various scenarios.
Scenario 1: Filter logs by using the e_keep function and e_drop function
You can use the e_drop or e_keep function to filter logs. You can also pass DROP as the operation of the e_if or e_if_else function to filter logs. The following list describes the common forms:
- e_keep(e_search(...)): The log entries that meet the conditions are retained. Otherwise, the log entries are dropped.
- e_drop(e_search(...)): The log entries that meet the conditions are dropped. Otherwise, the log entries are retained.
- e_if_else(e_search("..."), KEEP, DROP): The log entries that meet the conditions are retained. Otherwise, the log entries are dropped.
- e_if(e_search("not ..."), DROP): The log entries that meet the conditions are dropped. Otherwise, the log entries are retained.
- e_if(e_search("..."), KEEP): This transformation rule is invalid.
- Raw log entries
# Log entry 1
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
# Log entry 2
__source__: 192.168.0.1
class: produce_case
id: 7990
test_string: <function test1 at 0x1020401e0>
- Transformation rule
Drop the log entries that do not contain the __topic__ field or the __tag__:__receive_time__ field.
e_if(e_not_has("__topic__"),e_drop())
e_if(e_not_has("__tag__:__receive_time__"),e_drop())
- Result
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
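The effect of the rule above can be approximated in plain Python. This is only an illustrative sketch, not the Log Service implementation: the helper name filter_events and the modeling of log entries as dictionaries are assumptions made for this example.

```python
# Hypothetical local stand-in for the rule above; log entries are modeled as dicts.
REQUIRED_FIELDS = ("__topic__", "__tag__:__receive_time__")


def filter_events(events, required=REQUIRED_FIELDS):
    """Keep only the entries that contain every required field."""
    return [event for event in events if all(name in event for name in required)]


events = [
    {
        "__source__": "192.168.0.1",
        "__tag__:__client_ip__": "192.168.0.2",
        "__tag__:__receive_time__": "1597214851",
        "__topic__": "app",
        "class": "test_case",
        "id": "7992",
    },
    {"__source__": "192.168.0.1", "class": "produce_case", "id": "7990"},
]

kept = filter_events(events)
print([event["id"] for event in kept])  # ['7992']
```

Log entry 2 lacks both fields, so only log entry 1 remains, which matches the result shown above.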
Scenario 2: Assign values to empty fields in a log entry by using the e_set function
You can use the e_set function to assign values to empty fields in a log entry.
- Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.
e_set("result", "......value......", mode="fill")
For information about the mode parameter, see Field check and overwrite modes.
Example:
- Raw log entry
name:
- Transformation rule
e_set("name", "aspara2.0", mode="fill")
- Result
name: aspara2.0
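The behavior described for this sub-scenario (assign only when the field does not exist or is empty) can be sketched in plain Python. The helper set_fill is hypothetical and only mirrors the description above; it is not the e_set implementation.

```python
def set_fill(event, field, value):
    """Assumed reading of mode="fill": assign only if the field is missing or empty."""
    if event.get(field) in (None, ""):
        event[field] = value
    return event


filled = set_fill({"name": ""}, "name", "aspara2.0")
untouched = set_fill({"name": "existing"}, "name", "aspara2.0")
print(filled)     # {'name': 'aspara2.0'}
print(untouched)  # {'name': 'existing'}
```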
- Sub-scenario 2: Use the grok function to simplify a regular expression and extract field values.
Example:
- Raw log entry
content: "ip address: 192.168.1.1"
- Transformation rule
Use the Grok function to extract the IP address in the content field.
e_regex("content", grok(r"(%{IP})"),"addr")
- Result
addr: 192.168.1.1
content: "ip address: 192.168.1.1"
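For comparison, the same extraction can be written with a hand-made regular expression, which shows what the grok pattern saves you from typing. The sketch below uses Python's re module; the pattern is a simplified IPv4 matcher chosen for this example (it does not validate octet ranges and ignores IPv6), and the helper extract_addr is hypothetical.

```python
import re

# Simplified IPv4 pattern; an assumption for illustration, narrower than a full IP pattern.
IPV4 = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def extract_addr(event):
    """Copy the first IPv4-looking substring of the content field into addr."""
    match = IPV4.search(event.get("content", ""))
    if match:
        event["addr"] = match.group(0)
    return event


extracted = extract_addr({"content": "ip address: 192.168.1.1"})
print(extracted["addr"])  # 192.168.1.1
```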
- Sub-scenario 3: Assign values to multiple fields.
e_set("k1", "v1", "k2", "v2", "k3", "v3", ......)
Example:
- Raw log entry
__source__: 192.168.0.1
__topic__:
__tag__:
__receive_time__:
id: 7990
test_string: <function test1 at 0x1020401e0>
- Transformation rule
Assign values to the __topic__, __tag__, and __receive_time__ fields.
e_set("__topic__","app", "__tag__","stu","__receive_time__","1597214851")
- Result
__source__: 192.168.0.1
__topic__: app
__tag__: stu
__receive_time__: 1597214851
id: 7990
test_string: <function test1 at 0x1020401e0>
Scenario 3: Delete a field and rename a field by using the e_search, e_drop_fields, e_rename, and e_compose functions
We recommend that you use the e_if function with e_search to check whether the data meets a condition, and use the e_compose function to combine the operations that are performed when the condition is met.
- Raw log entry
content: 123
age: 23
name: twiss
- Transformation rule
If the value of the content field is 123, delete the age and name fields. Then, rename the content field to ctx.
e_if(e_search("content==123"),e_compose(e_drop_fields("age|name"), e_rename("content", "ctx")))
- Result
ctx: 123
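The control flow of this rule (one condition guarding a sequence of operations) can be sketched in plain Python. The function transform is a hypothetical stand-in written for this example and operates on a dictionary rather than on a Log Service event.

```python
def transform(event):
    """If content equals "123", drop age and name, then rename content to ctx."""
    if event.get("content") == "123":
        # Counterpart of e_drop_fields("age|name")
        for field in ("age", "name"):
            event.pop(field, None)
        # Counterpart of e_rename("content", "ctx")
        event["ctx"] = event.pop("content")
    return event


renamed = transform({"content": "123", "age": "23", "name": "twiss"})
print(renamed)  # {'ctx': '123'}
```

An entry whose content field has a different value is returned unchanged, because both operations sit behind the same condition.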
Scenario 4: Convert the type of fields in a log entry by using the v, ct_int, and dt_totimestamp functions
- Sub-scenario 1: Use the op_add function to concatenate strings or add numbers.
The op_add function can receive data of both the string and numeric types. Therefore, no field type needs to be converted.
Example:
- Raw log entry
a: 1
b: 2
- Transformation rule
e_set("d",op_add(v("a"), v("b")))
e_set("e",op_add(ct_int(v("a")), ct_int(v("b"))))
- Result
a: 1
b: 2
d: 12
e: 3
- Sub-scenario 2: Use the v function and the ct_int function to convert data types, and use the op_mul function to multiply data.
Example:
- Raw log entry
a: 2
b: 5
- Transformation rule
The values of both v("a") and v("b") are of the string type. However, the second parameter of the op_mul function can receive only numeric values. Therefore, you must use the ct_int function to convert a string to an integer, and then pass the value to the op_mul function. If only the second value is converted, as in field d, the string in the first parameter is repeated that number of times instead of being multiplied.
e_set("c",op_mul(ct_int(v("a")), ct_int(v("b"))))
e_set("d",op_mul(v("a"), ct_int(v("b"))))
- Result
a: 2
b: 5
c: 10
d: 22222
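The results in sub-scenarios 1 and 2 match how plain Python treats strings and integers. The sketch below reproduces them with Python's own + and * operators, on the assumption that op_add and op_mul behave like these operators for the inputs shown; it does not call any Log Service function.

```python
# Sub-scenario 1: field values arrive as strings.
a, b = "1", "2"
concatenated = a + b       # string concatenation
summed = int(a) + int(b)   # numeric addition after conversion

# Sub-scenario 2: multiplication needs at least the second operand to be numeric.
x, y = "2", "5"
product = int(x) * int(y)  # numeric multiplication
repeated = x * int(y)      # string repeated int(y) times

print(concatenated, summed, product, repeated)  # 12 3 10 22222
```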
- Sub-scenario 3: Use the dt_parse function or dt_parsetimestamp function to convert a string or datetime object to standard time.
The dt_totimestamp function receives only datetime objects. Therefore, you must use the dt_parse function to convert the string value of time1 to a datetime object. You can also use the dt_parsetimestamp function because it can receive both datetime objects and strings. For more information, see Date and time functions.
Example:
- Raw log entry
time1: 2020-09-17 9:00:00
- Transformation rule
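Convert the time indicated by time1 to a UNIX timestamp. In this example, the rule first overwrites time1 with 2019-06-03 2:41:26 and then converts that value.
e_set("time1", "2019-06-03 2:41:26")
e_set("time2", dt_totimestamp(dt_parse(v("time1"))))
or
e_set("time2", dt_parsetimestamp(v("time1")))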
- Result
time1: 2019-06-03 2:41:26
time2: 1559529686
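The two-step conversion (string to datetime object, then datetime object to timestamp) can be reproduced with Python's standard datetime module. The helper to_unix_timestamp is hypothetical, and treating the input as UTC is an assumption; under that assumption the output equals the time2 value shown above.

```python
from datetime import datetime, timezone


def to_unix_timestamp(text):
    """Parse a 'YYYY-MM-DD H:MM:SS' string as UTC and return a UNIX timestamp."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")  # string -> datetime
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())  # datetime -> timestamp


time2 = to_unix_timestamp("2019-06-03 2:41:26")
print(time2)  # 1559529686
```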
Scenario 5: Fill the default values in log fields that do not exist by specifying the default parameter
- Raw log entry
data_len: 1024
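- Transformation rule
The data field does not exist in the raw log entry. Therefore, v("data", default="") returns an empty string, and the op_len function returns 0.
e_set("data_len", op_len(v("data", default="")))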
- Result
data_len: 0
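The role of the default parameter resembles the default argument of dict.get in Python: a missing field yields the fallback value instead of an error. The sketch below is an analogy on a plain dictionary, not Log Service code, and the variable names are chosen for this example.

```python
event = {"data_len": "1024"}

# "data" is absent, so the fallback "" is used and its length is 0.
data_value = event.get("data", "")
event["data_len"] = str(len(data_value))

print(event["data_len"])  # 0
```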
Scenario 6: Add one or more fields based on conditions by using the e_if and e_switch functions
- e_if function syntax
e_if(condition 1, operation 1, condition 2, operation 2, condition 3, operation 3, ...)
- e_switch function syntax
When you use the e_switch function, you must specify condition-operation pairs. The conditions are checked in sequence, and an operation is performed only if its paired condition is met. After the first condition is met, the corresponding operation result is returned and no more conditions are checked. If a condition is not met, its paired operation is skipped and the next condition is checked. If no conditions are met and the default parameter is specified, the operation that is specified by default is performed and the corresponding result is returned.
e_switch(condition 1, operation 1, condition 2, operation 2, condition 3, operation 3, ..., default=None)
- Raw log entry
status1: 200
status2: 404
- e_if function
- Transformation rule
e_if(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
- Result
status1: 200
status2: 404
status1_info: normal
status2_info: error
- e_switch function
- Transformation rule
e_switch(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
- Result
The e_switch function checks the conditions in sequence. After a condition is met, the corresponding operation result is returned and no more conditions are checked.
status1: 200
status2: 404
status1_info: normal
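The difference between the two results can be modeled in plain Python: one loop applies every operation whose condition is met, and the other stops at the first match. All names below (run_if, run_switch, match, set_field) are hypothetical helpers written for this sketch, and the logic follows the behavior described in this scenario rather than the actual implementation.

```python
def run_if(event, *pairs):
    """Check every condition; apply each operation whose condition is met."""
    for condition, operation in zip(pairs[0::2], pairs[1::2]):
        if condition(event):
            operation(event)
    return event


def run_switch(event, *pairs, default=None):
    """Apply only the first operation whose condition is met, else the default."""
    for condition, operation in zip(pairs[0::2], pairs[1::2]):
        if condition(event):
            operation(event)
            return event
    if default is not None:
        default(event)
    return event


def match(field, value):
    return lambda event: event.get(field) == value


def set_field(field, value):
    return lambda event: event.__setitem__(field, value)


pairs = (
    match("status1", "200"), set_field("status1_info", "normal"),
    match("status2", "404"), set_field("status2_info", "error"),
)

if_result = run_if({"status1": "200", "status2": "404"}, *pairs)
switch_result = run_switch({"status1": "200", "status2": "404"}, *pairs)
print(sorted(if_result))      # both status1_info and status2_info are present
print(sorted(switch_result))  # only status1_info is present
```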