The data transformation feature of Simple Log Service allows you to cleanse raw data. You can use one or more functions to cleanse a large amount of data. This way, the logs collected to Simple Log Service can be converted to a standard format. This topic describes how to use functions to cleanse data in various scenarios.
Scenario 1: Filter logs by using the e_keep function and e_drop function
You can use the e_drop or e_keep function to filter logs. You can also specify the DROP parameter and use the e_if or e_if_else function to filter logs.
The following common transformation rules can be used:
e_keep(e_search(...)): The log entries that meet the conditions are retained. Otherwise, the log entries are dropped.
e_drop(e_search(...)): The log entries that meet the conditions are dropped. Otherwise, the log entries are retained.
e_if_else(e_search("..."), KEEP, DROP): The log entries that meet the conditions are retained. Otherwise, the log entries are dropped.
e_if(e_search("not ..."), DROP): The log entries that meet the conditions are dropped. Otherwise, the log entries are retained.
e_if(e_search("..."), KEEP): This transformation rule is invalid.
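The keep and drop semantics listed above can be modelled locally in plain Python. This is only a sketch, not the Simple Log Service DSL: keep_if, drop_if, and is_test_case are hypothetical helpers that treat a log entry as a dict and show which entries survive each kind of rule.

```python
# Hypothetical local model of the filter rules; this is NOT the Simple Log Service DSL.

def keep_if(entries, condition):
    """Model of e_keep(condition): retain only the entries that match."""
    return [entry for entry in entries if condition(entry)]


def drop_if(entries, condition):
    """Model of e_drop(condition): discard the entries that match."""
    return [entry for entry in entries if not condition(entry)]


def is_test_case(entry):
    """Illustrative condition: the class field equals test_case."""
    return entry.get("class") == "test_case"


logs = [
    {"class": "test_case", "id": "7992"},
    {"class": "produce_case", "id": "7990"},
]

kept = keep_if(logs, is_test_case)
remaining = drop_if(logs, is_test_case)
```

In this model, keep_if and drop_if with the same condition always return complementary sets of entries.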
Example:
Raw log entries
# Log entry 1
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
# Log entry 2
__source__: 192.168.0.1
class: produce_case
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Drop the log entries that do not contain the __topic__ field or the __tag__:__receive_time__ field.
e_if(e_not_has("__topic__"),e_drop())
e_if(e_not_has("__tag__:__receive_time__"),e_drop())
Result
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.0.2
__tag__:__receive_time__: 1597214851
__topic__: app
class: test_case
id: 7992
test_string: <function test1 at 0x1027401e0>
Scenario 2: Assign values to empty fields in a log entry by using the e_set function
You can use the e_set function to assign values to empty fields in a log entry.
Sub-scenario 1: Assign a value to a field if the field does not exist or is empty.
e_set("result", "......value......", mode="fill")
For information about the mode parameter, see Field extraction check and overwrite modes.
Example:
Raw log entry
name:
Transformation rule
e_set("name", "aspara2.0", mode="fill")
Result
name: aspara2.0
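The behavior of the fill mode in this example can be sketched in plain Python. The helper set_fill is hypothetical and only models the rule stated in this sub-scenario (assign a value when the field does not exist or is empty); it is not the implementation of e_set.

```python
# Hypothetical model of e_set(..., mode="fill"); not the Simple Log Service DSL.

def set_fill(entry, field, value):
    """Assign value only when the field is missing or empty."""
    result = dict(entry)  # work on a copy so the input entry is not modified
    if result.get(field) in (None, ""):
        result[field] = value
    return result


filled = set_fill({"name": ""}, "name", "aspara2.0")
unchanged = set_fill({"name": "existing"}, "name", "aspara2.0")
```

A field that already has a non-empty value is left as is, which is the difference between this mode and a plain overwrite.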
Sub-scenario 2: Use the grok function to simplify a regular expression and extract field values.
Example:
Raw log entry
content: "ip address: 192.168.1.1"
Transformation rule
Use the grok function to extract the IP address from the content field.
e_regex("content", grok(r"(%{IP})"),"addr")
Result
addr: 192.168.1.1
content: "ip address: 192.168.1.1"
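To illustrate what a Grok pattern saves you from writing, the sketch below performs a comparable extraction with Python's re module. The IPV4 pattern is a simplified assumption for illustration only (the real %{IP} pattern is more thorough and also covers IPv6), and extract_addr is a hypothetical helper, not part of the DSL.

```python
import re

# Simplified IPv4 pattern, assumed for illustration; not the actual Grok %{IP} definition.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = rf"(?:{OCTET}\.){{3}}{OCTET}"


def extract_addr(entry):
    """Copy the first IPv4 address found in the content field into addr."""
    result = dict(entry)
    match = re.search(IPV4, result.get("content", ""))
    if match:
        result["addr"] = match.group(0)
    return result


extracted = extract_addr({"content": "ip address: 192.168.1.1"})
```

Writing the octet ranges by hand is exactly the detail that a named pattern such as %{IP} hides.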
Sub-scenario 3: Assign values to multiple fields.
e_set("k1", "v1", "k2", "v2", "k3", "v3", ......)
Example:
Raw log entry
__source__: 192.168.0.1
__topic__:
__tag__:
__receive_time__:
id: 7990
test_string: <function test1 at 0x1020401e0>
Transformation rule
Assign values to the __topic__, __tag__, and __receive_time__ fields.
e_set("__topic__","app", "__tag__","stu","__receive_time__","1597214851")
Result
__source__: 192.168.0.1
__topic__: app
__tag__: stu
__receive_time__: 1597214851
id: 7990
test_string: <function test1 at 0x1020401e0>
Scenario 3: Delete a field and rename a field by using the e_search, e_rename, and e_compose functions
We recommend that you use the e_if function to check whether the data meets the conditions and the e_compose function to combine the operations that are performed based on the check result.
Example:
Raw log entry
content: 123
age: 23
name: twiss
Transformation rule
If the value of the content field is 123, delete the age and name fields. Then, rename the content field to ctx.
e_if(e_search("content==123"),e_compose(e_drop_fields("age|name"), e_rename("content", "ctx")))
Result
ctx: 123
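The logic of this rule (check a condition, then run several operations in order) can be sketched in plain Python. The function transform is a hypothetical stand-in that mirrors the rule above on a dict; it assumes field values are strings, as stated for the data transformation process.

```python
# Hypothetical model of the conditional drop-and-rename rule; not the Simple Log Service DSL.

def transform(entry):
    """If content is 123, delete age and name, then rename content to ctx."""
    result = dict(entry)
    if result.get("content") == "123":
        for field in ("age", "name"):
            result.pop(field, None)  # ignore fields that are already absent
        result["ctx"] = result.pop("content")
    return result


matched = transform({"content": "123", "age": "23", "name": "twiss"})
skipped = transform({"content": "456", "age": "23", "name": "twiss"})
```

An entry that does not meet the condition passes through unchanged, because both operations sit inside the same conditional branch.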
Scenario 4: Convert the type of fields in a log entry by using the v, ct_int, and dt_totimestamp functions
The fields and values in log entries are processed as strings during the data transformation process. Data of a non-string type is automatically converted to data of the string type. Therefore, you must be familiar with the types of fields that a function can receive when you invoke the function. For more information, see Syntax overview.
Sub-scenario 1: Use the op_add function to concatenate strings or add numbers.
The op_add function can receive data of both the string and numeric types. Therefore, no field type needs to be converted.
Example:
Raw log entry
a: 1
b: 2
Transformation rule
e_set("d",op_add(v("a"), v("b")))
e_set("e",op_add(ct_int(v("a")), ct_int(v("b"))))
Result
a: 1
b: 2
d: 12
e: 3
Sub-scenario 2: Use the v function and ct_int function to convert data types and use the op_mul function to multiply data.
Example:
Raw log entry
a: 2
b: 5
Transformation rule
The values of both v("a") and v("b") are of the string type. However, the second parameter of the op_mul function can receive only numeric values. Therefore, you must use the ct_int function to convert a string to an integer, and then pass the value to the op_mul function. If the first argument is left as a string, as in the rule for d, the string is repeated instead of multiplied.
e_set("c",op_mul(ct_int(v("a")), ct_int(v("b"))))
e_set("d",op_mul(v("a"), ct_int(v("b"))))
Result
a: 2
b: 5
c: 10
d: 22222
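The results of sub-scenarios 1 and 2 match how plain Python treats strings and integers, which makes the need for type conversion easy to see. The snippet below is a Python analogy only; it uses the built-in + and * operators and int(), not the op_add, op_mul, or ct_int functions.

```python
# Python analogy for string versus integer operands; not the Simple Log Service DSL.

a, b = "1", "2"
concatenated = a + b             # strings are concatenated
summed = int(a) + int(b)         # integers are added

x, y = "2", "5"
product = int(x) * int(y)        # integers are multiplied
repeated = x * int(y)            # a string times an integer repeats the string
```

Here concatenated and repeated are strings, while summed and product are integers, mirroring the d and e fields in sub-scenario 1 and the c and d fields in sub-scenario 2.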
Sub-scenario 3: Use the dt_parse function or dt_parsetimestamp function to convert a string or datetime object to standard time.
The dt_totimestamp function receives only datetime objects. Therefore, you must use the dt_parse function to convert the string value of time1 to a datetime object. You can also use the dt_parsetimestamp function because it can receive both datetime objects and strings. For more information, see Date and time functions.
Example:
Raw log entry
time1: 2020-09-17 9:00:00
Transformation rule
Convert the time indicated by time1 to a UNIX timestamp.
e_set("time1", "2019-06-03 2:41:26")
e_set("time2", dt_totimestamp(dt_parse(v("time1"))))
or
e_set("time2", dt_parsetimestamp(v("time1")))
Result
time1: 2019-06-03 2:41:26
time2: 1559529686
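The two-step conversion (parse the string into a datetime object, then convert it to a UNIX timestamp) can be reproduced with Python's standard datetime module. This sketch assumes that the time string is interpreted as UTC, which is consistent with the result shown above; to_unix_timestamp is a hypothetical helper and not a DSL function.

```python
from datetime import datetime, timezone


def to_unix_timestamp(text):
    """Parse 'YYYY-MM-DD H:MM:SS' as UTC (an assumption) and return epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")    # string -> datetime
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())  # datetime -> timestamp


time2 = to_unix_timestamp("2019-06-03 2:41:26")
```

If the string were interpreted in a different time zone, the resulting timestamp would shift by that zone's offset.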
Scenario 5: Fill the default values in log fields that do not exist by specifying the default parameter
Some expression functions that are used to transform data in Simple Log Service have specific requirements for input parameters. If the input parameters do not meet the requirements, the data transformation rule returns the default values or an error. If a required log field may be missing, you can specify the default parameter of the v function to supply a default value, so that a function such as op_len receives a valid input.
If default values are passed to subsequent functions, errors may occur. We recommend that you resolve the exceptions returned by the data transformation rules at the earliest opportunity.
Raw log entry
data_len: 1024
Transformation rule
e_set("data_len", op_len(v("data", default="")))
Result
data_len: 0
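Why a default value matters can be shown with a plain Python analogy: taking the length of a missing value fails, while taking the length of an empty-string default returns 0. The helper field_length is hypothetical and uses dict.get and len, not the v or op_len functions.

```python
# Python analogy for supplying a default to a missing field; not the Simple Log Service DSL.

def field_length(entry, field, default=""):
    """Length of a field value, using default when the field is missing."""
    return len(entry.get(field, default))


entry = {"data_len": "1024"}          # the data field does not exist
length = field_length(entry, "data")  # falls back to the empty string

try:
    len(entry.get("data"))            # no default: the lookup yields None
    missing_raises = False
except TypeError:
    missing_raises = True
```

The default only hides the missing field; it does not create it, so later steps that rely on the field still need to handle its absence.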
Scenario 6: Add one or more fields based on conditions by using the e_if and e_switch functions
We recommend that you use the e_if or e_switch function to add one or more fields to log entries based on specified conditions. For more information, see Flow control functions.
e_if function syntax
e_if(condition 1, operation 1, condition 2, operation 2, condition 3, operation 3, ...)
e_switch function syntax
When you use the e_switch function, you must specify condition-operation pairs. The conditions are checked in sequence. An operation is performed only if its paired condition is met. After a condition is met, the corresponding operation result is returned and no more conditions are checked. If a condition is not met, its paired operation is not performed and the next condition is checked. If no conditions are met and the default parameter is specified, the operation that is specified by default is performed and the corresponding result is returned.
e_switch(condition 1, operation 1, condition 2, operation 2, condition 3, operation 3, ..., default=None)
Example:
Raw log entry
status1: 200
status2: 404
e_if function
Transformation rule
e_if(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Result
status1: 200
status2: 404
status1_info: normal
status2_info: error
e_switch function
Transformation rule
e_switch(e_match("status1", "200"), e_set("status1_info", "normal"), e_match("status2", "404"), e_set("status2_info", "error"))
Result
The e_switch function checks the conditions in sequence. After a condition is met, the corresponding operation result is returned and no more conditions are checked.
status1: 200
status2: 404
status1_info: normal
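The difference between the two functions in this example (e_if applies every matching pair, e_switch stops at the first match) can be modelled in plain Python. The helpers run_if, run_switch, match, and setter are hypothetical and only mirror the behavior described in this scenario; they are not the DSL implementation.

```python
# Hypothetical model of e_if versus e_switch; not the Simple Log Service DSL.

def match(field, value):
    """Condition: the field equals the given value."""
    return lambda entry: entry.get(field) == value


def setter(field, value):
    """Operation: set the field to the given value."""
    return lambda entry: entry.__setitem__(field, value)


def run_if(entry, pairs):
    """Apply the operation of every pair whose condition is met."""
    result = dict(entry)
    for condition, operation in pairs:
        if condition(result):
            operation(result)
    return result


def run_switch(entry, pairs, default=None):
    """Apply only the first matching operation, or default if none match."""
    result = dict(entry)
    for condition, operation in pairs:
        if condition(result):
            operation(result)
            return result  # stop checking the remaining conditions
    if default is not None:
        default(result)
    return result


pairs = [
    (match("status1", "200"), setter("status1_info", "normal")),
    (match("status2", "404"), setter("status2_info", "error")),
]
log = {"status1": "200", "status2": "404"}

if_result = run_if(log, pairs)
switch_result = run_switch(log, pairs)
```

With the same pairs, only the early return in run_switch separates the two results.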