Data Transmission Service (DTS) provides the extract, transform, and load (ETL) feature to help you process streaming data in real time. In combination with the efficient data replication capabilities of DTS, ETL can be used to extract, transform, process, and load streaming data. This topic describes how to configure ETL in a DTS task. You can use the ETL feature to filter data, mask data, record the modification time of data, and audit data modifications.
Background
DTS is used for data migration and real-time data transmission between data sources. In some cases, you may need to transform or filter real-time data before the data is written to a database. To meet such requirements, DTS provides the ETL feature and allows you to use domain-specific language (DSL) statements to process data in a flexible manner. For more information about DSL, see the Overview of DSL section of this topic.
You can configure ETL by using one of the following methods:
You can configure ETL in both data migration and data synchronization tasks. In this example, ETL is configured in a data synchronization task. You can also follow the procedure to configure ETL in a data migration task.
Supported database services
The following table describes the source and destination databases that are supported by the ETL feature.
Source database | Destination database |
SQL Server |
|
MySQL |
|
Self-managed Oracle database |
|
PolarDB for MySQL |
|
PolarDB for PostgreSQL (Compatible with Oracle) |
|
PolarDB-X 1.0 |
|
PolarDB-X 2.0 |
|
Self-managed Db2 for LUW database | MySQL |
Self-managed Db2 for i database | MySQL |
PolarDB for PostgreSQL |
|
PostgreSQL |
|
TiDB |
|
MongoDB | Lindorm |
Configure ETL when you create a data synchronization task
Usage notes
If the ETL script that you configure contains the operation to add a column, you must manually add a column to the destination table. Otherwise, the ETL script does not take effect. For example, if you configure the ETL script
script:e_set(`new_column`, dt_now())
, you must manually add thenew_column
column to the destination table.DSL scripts are used to implement operations such as data transformation and data cleansing. However, you cannot create a database object by executing the DSL script.
The fields that are configured in the DSL script must exist in the source database and cannot be the fields that are filtered out by filter conditions. Otherwise, the task becomes abnormal.
The DSL script is case-sensitive. The names of the databases, tables, and fields that are configured in the DSL script must be the same as those in the source database.
The DSL script cannot contain multi-line expressions. You can use the
e_compose
function to combine multiple expressions into one expression.If a DML statement for the change of all tables in the source database is executed, the tables must have the same column upon DSL script processing. Otherwise, the task may fail. For example, if you use the
e_set
function in a DSL script to add a column, you must perform the INSERT, UPDATE, or DELETE operation on a source database to add the column to the destination table. For more information, see Record the time when data is modified.
Procedure
Create a data synchronization task. For more information, see Overview of data synchronization scenarios.
In the Advanced Configurations section, set the Configure ETL parameter to Yes.
In the code editor , enter data processing statements based on DSL syntax.
NoteIf you want to drop entries whose values of the id column are greater than 3 by using DSL, you can use the
script:e_if(op_gt(`id`, 3), e_drop())
statement. In this statement,op_gt
is an expression function used to determine whether an entry is greater than a specific value, andid
is a variable. This way, entries whose values of the id column are greater than 3 are filtered out.Complete the subsequent steps based on your business requirements.
Modify the ETL configurations of an existing data synchronization task
You can modify the ETL configurations in the following scenarios:
If the Configure ETL parameter is set to No for an existing data synchronization task, you can set this parameter to Yes and enter DSL statements.
If the Configure ETL parameter is set to Yes for an existing data synchronization task, you can modify the existing DSL statements or set this parameter to No.
ImportantBefore you modify the existing DSL statements, you must move the objects to be synchronized from the Selected Objects section to the Source Objects section and then add these objects to the Selected Objects section again.
You cannot modify the existing DSL statements in a data migration task.
Usage notes
For an existing data synchronization task, you cannot modify the destination table schema by modifying the ETL configurations of the task. You can modify the destination table schema only before the data synchronization task is started.
If you modify the ETL configurations, the data synchronization task may be interrupted. Proceed with caution.
The modification of the ETL configurations takes effect only on the incremental data generated after the modification.
DSL scripts are used to implement operations such as data transformation and data cleansing. However, you cannot create a database object by executing the DSL script.
The fields that are configured in the DSL script must exist in the source database and cannot be the fields that are filtered out by filter conditions. Otherwise, the task becomes abnormal.
The DSL script is case-sensitive. The names of the databases, tables, and fields that are configured in the DSL script must be the same as those in the source database.
The DSL script cannot contain multi-line expressions. You can use the
e_compose
function to combine multiple expressions into one expression.If a DML statement for the change of all tables in the source database is executed, the tables must have the same column upon DSL script processing. Otherwise, the task may fail. For example, if you use the
e_set
function in a DSL script to add a column, you must perform the INSERT, UPDATE, or DELETE operation on a source database to add the column to the destination table. For more information, see Record the time when data is modified.
Procedure
Go on to the Data Synchronization page of the new DTS console
Find the data synchronization whose ETL configurations you want to modify, click the
icon in the Actions column, and then click Modify ETL Configurations.
In the Advanced Configurations section, set the Configure ETL parameter to Yes.
In the code editor , enter data processing statements based on DSL syntax.
NoteIf you want to drop entries whose values of the id column are greater than 3 by using DSL, you can use the
script:e_if(op_gt(`id`, 3), e_drop())
statement. In this statement,op_gt
is an expression function used to determine whether an entry is greater than a specific value, andid
is a variable. This way, entries whose values of the id column are greater than 3 are filtered out.Complete the subsequent steps based on your business requirements.
Overview of DSL
DSL is a scripting language that is designed to process data in data synchronization scenarios. You can use conditional functions to process data of the string, date, and numeric types. DSL boasts the following characteristics that help you process data in a flexible manner:
Various functions: DSL provides a variety of functions and supports combined functions.
Simple syntax: DSL is easy to use. For more information about how to use DSL to filter, transform, and mask data, see the Typical scenarios section of this topic.
High efficiency: DSL has a minimal effect on the data synchronization performance because DSL uses the code generation mechanism.
In a DSL statement, the column name is enclosed in backticks (``) instead of single quotation marks ('').
The syntax of DSL for DTS has something in common with the syntax of DSL for Simple Log Service. DSL supports JSON functions and does not support functions for event splitting. For more information about the syntax of DSL for Simple Log Service, see Syntax overview.
Typical scenarios
Filter data
Filter out data by a numeric column: If an entry whose value of the id column is greater than 10000, drop this entry so that it is not synchronized to the destination database. Example: e_if(op_gt(`id`, 10000), e_drop()).
Filter out data by a specific string: If an entry whose value of the name column contains "hangzhou", drop this entry so that it is not synchronized to the destination database. Example: e_if(str_contains(`name`, "hangzhou"), e_drop()).
Filter out data by date: If an entry whose timestamp of the order column is earlier than a specific point in time, drop this entry so that it is not synchronized to the destination database. Example: e_if(op_lt(`order_timestamp`, "2015-02-23 23:54:55"), e_drop()).
Filter out data by multiple conditions:
If an entry whose value of the id column is greater than 1000 and value of the name column contains "hangzhou", drop this entry so that it is not synchronized to the destination database. Example: e_if(op_and(str_contains(`name`, "hangzhou"), op_gt(`id`, 1000)), e_drop()).
If an entry whose value of the id column is greater than 1000 or value of the name column contains "hangzhou", drop this entry so that it is not synchronized to the destination database. Example: e_if(op_or(str_contains(`name`, "hangzhou"), op_gt(`id`, 1000)), e_drop()).
Mask data
Mask the last four digits of a mobile phone number with four asterisks (*). Example: e_set(`phone`, str_mask(`phone`, 7, 10, '*')).
Record the time when data is modified
Add a column to all tables: If the value of the __OPERATION__ variable is INSERT, UPDATE, or DELETE, a column named dts_sync_time whose value is the same as the __COMMIT_TIMESTAMP__ variable of logs is added to all tables in the source database.
e_if(op_or(op_or( op_eq(__OPERATION__, __OP_INSERT__), op_eq(__OPERATION__, __OP_UPDATE__)), op_eq(__OPERATION__, __OP_DELETE__)), e_set(dts_sync_time, __COMMIT_TIMESTAMP__))
Add a column to a specific table: If the value of the __OPERATION__ variable is INSERT, UPDATE, or DELETE, a column named dts_sync_time whose value is the same as the __COMMIT_TIMESTAMP__ variable of logs is added to the dts_test_table table in the source database.
e_if(op_and( op_eq(__TB__,'dts_test_table'), op_or(op_or( op_eq(__OPERATION__,__OP_INSERT__), op_eq(__OPERATION__,__OP_UPDATE__)), op_eq(__OPERATION__,__OP_DELETE__))), e_set(dts_sync_time,__COMMIT_TIMESTAMP__))
NoteTo perform the preceding operations, you must add the dts_sync_time column to the corresponding tables in the destination database before the data synchronization task is started.
Audit data modifications
Record the type and time of the data modifications in tables: 1. Record the data modification type in the operation_type column of the destination database. 2. Record the time when data is modified in the updated column of the destination database.
e_compose( e_switch( op_eq(__OPERATION__,__OP_DELETE__), e_set(operation_type, 'DELETE'), op_eq(__OPERATION__,__OP_UPDATE__), e_set(operation_type, 'UPDATE'), op_eq(__OPERATION__,__OP_INSERT__), e_set(operation_type, 'INSERT')), e_set(updated, __COMMIT_TIMESTAMP__), e_set(__OPERATION__,__OP_INSERT__) )
NoteYou must add the operation_type and updated columns to the tables in the destination database before the data synchronization task is started.
DSL syntax
Constants and variables
Constants
Data type
Example
int
123
float
123.4
string
"hello1_world"
boolean
true or false
datetime
DATETIME('2021-01-01 10:10:01')
Variables
Variable
Description
Data type
Example
__TB__
The table name.
string
table
__DB__
The database name.
string
mydb
__OPERATION__
The operation type.
string
__OP_INSERT__,__OP_UPDATE__,__OP_DELETE__
__BEFORE__
The previous image value of the UPDATE operation, which is the column value before the UPDATE operation is performed.
NoteThe DELETE operation has only the previous image value.
Special mark, no type
v(`column_name`,__BEFORE__)
__AFTER__
The new image value of the UPDATE operation, which is the column value after the UPDATE operation is performed.
NoteThe INSERT operation has only the new image value.
Special mark, no type
v(`column_name`,__AFTER__)
__COMMIT_TIMESTAMP__
The time when the transaction was committed.
datetime
'2021-01-01 10:10:01'
`column`
The name of the column.
string
`id`, `name`
Expression functions
Numeric operations
Operation
Syntax
Valid value
Return value
Example
Addition
op_sum(value1, value2)
value1: an integer or a floating-point number.
value2: an integer or a floating-point number.
If value1 and value2 are integers, an integer is returned. Otherwise, a floating-point number is returned.
op_sum(`col1`, 1.0)
Subtraction
op_sub(value1, value2)
value1: an integer or a floating-point number.
value2: an integer or a floating-point number.
If value1 and value2 are integers, an integer is returned. Otherwise, a floating-point number is returned.
op_sub(`col1`, 1.0)
Multiplication
op_mul(value1, value2)
value1: an integer or a floating-point number.
value2: an integer or a floating-point number.
If value1 and value2 are integers, an integer is returned. Otherwise, a floating-point number is returned.
op_mul(`col1`, 1.0)
Division
op_div_true(value1, value2)
value1: an integer or a floating-point number.
value2: an integer or a floating-point number.
If value1 and value2 are integers, an integer is returned. Otherwise, a floating-point number is returned.
op_div_true(`col1`, 2.0). In this example, if the value of col1 is 15, 7.5 is returned.
Modulo
op_mod(value1, value2)
value1: an integer or a floating-point number.
value2: an integer or a floating-point number.
If value1 and value2 are integers, an integer is returned. Otherwise, a floating-point number is returned.
op_div_true(`col1`, 2.0). In this example, if the value of col1 is 15, 7.5 is returned.
Logical operations
Operation
Syntax
Valid value
Return value
Example
Equal to
op_eq(value1, value2)
value1: an integer, a floating-point number, or a string.
value2: an integer, a floating-point number, or a string.
true or false
op_eq(`col1`, 23)
Greater than
op_gt(value1, value2)
value1: an integer, a floating-point number, or a string.
value2: an integer, a floating-point number, or a string.
true or false
op_gt(`col1`, 1.0)
Less than
op_lt(value1, value2)
value1: an integer, a floating-point number, or a string.
value2: an integer, a floating-point number, or a string.
true or false
op_lt(`col1`, 1.0)
Greater than or equal to
op_ge(value1, value2)
value1: an integer, a floating-point number, or a string.
value2: an integer, a floating-point number, or a string.
true or false
op_ge(`col1`, 1.0)
Less than or equal to
op_le(value1, value2)
value1: an integer, a floating-point number, or a string.
value2: an integer, a floating-point number, or a string.
true or false
op_le(`col1`, 1.0)
AND
op_and(value1, value2)
value1: a Boolean value.
value2: a Boolean value.
true or false
op_and(`is_male`, `is_student`)
OR
op_or(value1, value2)
value1: a Boolean value.
value2: a Boolean value.
true or false
op_or(`is_male`, `is_student`)
IN
op_in(value, json_array)
value: an arbitrary value.
json_array: a JSON string.
true or false
op_in(`id`,json_array('["0","1","2","3","4","5","6","7","8"]'))
Determine whether the value is empty
op_is_null(value)
value: an arbitrary value.
true or false
op_is_null(`name`)
Determine whether the value is not empty
op_is_not_null(value)
value: an arbitrary value.
true or false
op_is_not_null(`name`)
String functions
Operation
Syntax
Valid value
Return value
Example
Append strings
op_add(str_1,str_2,...,str_n)
str_1: string.
str_2: string.
...
str_n: string.
The string after the append operation.
op_add(`col`,'hangzhou','dts')
Format strings and append strings
str_format(format, value1, value2, value3, ...)
format: a string. Braces ({}) are used as placeholders. Example: "part1: {}, part2: {}".
value1: an arbitrary value.
value2: an arbitrary value.
The string after the format operation.
str_format("part1: {}, part2: {}", `col1`, `col2`). In this example, if the value of col1 is ab and the value of col2 is 12, "part1: ab, part2: 12" is returned.
Replace strings
str_replace(original, oldStr, newStr, count)
original: the original string.
oldStr: the string to be replaced.
newStr: the string after the replacement.
count: an integer that indicates the maximum number of times that a string can be replaced. A value of -1 indicates that all oldStr is replaced with newStr.
The string after the replace operation.
Example 1: str_replace(`name`, "a", 'b', 1). In this example, if the name is aba, bba is returned. Example 2: str_replace(`name`, "a", 'b', -1). In this example, if the name is aba, bbb is returned.
Replace strings in the values of fields of all string types, such as the VARCHAR, TEXT, or CHAR type
tail_replace_string_field(search, replace, all)
search: the string to be replaced.
replace: the string after the replacement.
all: indicates whether to replace all matched strings. Only a value of true is supported.
NoteIf you do not need to replace all matched strings, use the
str_replace()
function.
The string after the replace operation.
tail_replace_string_field('\u000f','',true). In this example, all "\u000f" strings in the field values of the string type are replaced with spaces.
Remove specific characters at the start and end of a string
str_strip(string_val, charSet)
string_val: The original string
char_set: the set of the first characters and the last characters of the string.
The string after the remove operation.
str_strip(`name`, 'ab'). In this example, if the name is axbzb, xbz is returned.
Convert strings to lowercase letters
str_lower(value)
value: a column of the string type or a string constant.
The string after the convert operation.
str_lower(`str_col`)
Convert strings to uppercase letters
str_upper(value)
value: a column of the string type or a string constant.
The string after the convert operation.
str_upper(`str_col`)
Convert strings to numbers
cast_string_to_long(value)
value: a string.
The string after the convert operation.
cast_string_to_long(`col`)
Convert numbers to strings
cast_long_to_string(value)
value: an integer.
The string after the convert operation.
cast_long_to_string(`col`)
Count strings
str_count(str,pattern)
str: a column of the string type or a string constant.
pattern: the substring to query.
The number of times for which the substring appears.
str_count(`str_col`, 'abc'), if str_col="zabcyabcz", it returns 2.
Query strings
str_find(str, pattern)
str: a column of the string type or a string constant.
pattern: the substring to query.
The position in which the substring matches for the first time. If no match is found, -1 is returned.
str_find(`str_col`, 'abc'), if `str_col="xabcy"`, returns `1`.
Determine whether a string contains only letters
str_isalpha(str)
str: a column of the string type or a string constant.
true or false
str_isalpha(`str_col`)
Determine whether a string contains only digits
str_isdigit(str)
str: a column of the string type or a string constant.
true or false
str_isdigit(`str_col`)
Regular expression match
regex_match(str,regex)
str: a column of the string type or a string constant.
regex: a regular expression.
true or false
regex_match(__TB__,'user_\\d+')
Mask part of a string with specific characters. This operation can be used for data masking. For example, mask the last four digits of a mobile phone number with four asterisks (*).
str_mask(str, start, end, maskStr)
str: a column of the string type or a string constant.
start: an integer that indicates the start position of the masking. The minimum value is 0.
end: an integer that indicates the end position of the masking. The maximum value is the length of the string minus 1.
maskStr: a string. The length is 1. Example: #.
The string whose part from start to end is masked with the specified characters.
str_mask(`phone`, 7, 10, '#')
Truncate a string from the cond string to the last character
substring_after(str, cond)
str: the original string.
cond: a string.
The string after the truncate operation.
NoteThe return value does not contain the cond string.
substring_after(`col`, 'abc')
Extract the part before the string cond
substring_before(str, cond)
str: the original string.
cond: a string.
The string after the truncate operation.
NoteThe return value does not contain the cond string.
substring_before(`col`, 'efg')
Extract the substring between cond1 and cond2
substring_between(str, cond1, cond2)
str: the original string.
cond1: a string.
cond2: a string.
The string after the truncate operation.
NoteThe return value does not contain the cond1 and cond2 strings.
substring_between(`col`, 'abc','efg')
Determine whether it is a string type
is_string_value(value)
value: a string or a column name.
true or false
is_string_value(`col1`)
String type field content replacement; reverse order starting from the end
tail_replace_string_field(search, replace, all)
search: the string to be replaced.
replace: the string after the replacement.
all: indicates whether to replace all matched strings. Valid values: true and false.
The string after the replace operation.
In the following example, all "\u000f" strings in the field values of the string type are replaced with spaces:
tail_replace_string_field('\u000f','',true)
Get the value of a field in MongoDB
bson_value("field1","field2","field3",…)
field1: the name of the level -1 field.
field2: the name of the level -2 field.
The value of the corresponding field in the document.
e_set(`user_id`, bson_value("id"))
e_set(`user_name`, bson_value("person","name"))
Time functions
Operation
Syntax
Valid value
Return value
Example
Query the current time
dt_now()
N/A
A value of the DATETIME data type that is accurate to seconds.
dts_now()
dt_now_millis()
N/A
A value of the DATETIME data type that is accurate to milliseconds.
dt_now_millis()
Convert a UTC timestamp in seconds to a DATETIME value
dt_fromtimestamp(value,[timezone])
value: an integer.
timezone: the time zone. This parameter is optional.
A value of the DATETIME data type that is accurate to seconds.
dt_fromtimestamp(1626837629)
dt_fromtimestamp(1626837629,'GMT+08')
Convert a UTC timestamp in milliseconds to a DATETIME value
dt_fromtimestamp_millis(value,[timezone])
value: an integer.
timezone: the time zone. This parameter is optional.
A value of the DATETIME data type that is accurate to milliseconds.
dt_fromtimestamp_millis(1626837629123);
dt_fromtimestamp_millis(1626837629123,'GMT+08')
Convert a DATETIME value to a UTC timestamp in seconds
dt_parsetimestamp(value,[timezone])
value: a value of the DATETIME data type.
timezone: the time zone. This parameter is optional.
The integer after the convert operation.
dt_parsetimestamp(`datetime_col`)
dt_parsetimestamp(`datetime_col`,'GMT+08')
Convert a DATETIME value to a UTC timestamp in milliseconds
dt_parsetimestamp_millis(value,[timezone])
value: a value of the DATETIME data type.
timezone: the time zone. This parameter is optional.
The integer after the convert operation.
dt_parsetimestamp_millis(`datetime_col`)
dt_parsetimestamp_millis(`datetime_col`,'GMT+08')
Convert a DATETIME value to a string
dt_str(value, format)
value: a value of the DATETIME data type.
format: a string that indicates the time format. Example: yyyy-MM-dd HH:mm:ss.
The string after the convert operation.
dt_str(`col1`, 'yyyy-MM-dd HH:mm:ss')
Convert a string to a DATETIME value
dt_strptime(value,format)
value: a string.
format: a string that indicates the time format. Example: yyyy-MM-dd HH:mm:ss.
The DATETIME value after the convert operation.
dt_strptime('2021-07-21 03:20:29', 'yyyy-MM-dd hh:mm:ss')
Change the time by increasing or decreasing the values of one or more time granularities, such as year, month, day, hour, minute, and second
dt_add(value, [years=intVal],
[months=intVal],
[days=intVal],
[hours=intVal],
[minutes=intVal]
)
value: a value of the DATETIME data type.
intVal: an integer.
- Note
A minus sign (-) indicates that the value of the specified time granularity is decreased. .
The DATETIME value after the change operation.
dt_add(datetime_col,years=-1)
dt_add(datetime_col,years=1,months=1)
Conditional expression
Operation
Syntax
Valid value
Return value
Example
Return a value based on the result of the condition. This expression is similar to the ternary operator (
? :
) in the C language.(cond ? val_1 : val_2)
cond: a field or an expression whose value is a Boolean value.
val_1: return value 1
val_2: return value 2
NoteThe values of val_1 and val_2 must be of the same data type.
If the value of cond is true, val_1 is returned. Otherwise, val_2 is returned.
(id>1000? 1 : 0)
JSON functions
NoteData type of values represents any field type in a database.
Operation
Syntax
Valid value
Return value
Example
Convert a JSON array string to a set
json_array(arrayText)
NoteThis operation is supported only when a Boolean expression is returned.
arrayText: string types. A JSON array string to be converted.
Set
In this example, if
op_in(`id`,json_array('["0","1","2","3"]'))
is executed, the set["0","1","2","3"]
is returned.Create a JSON array that contains specified data
json_array2(item...)
item...: a data type of values.
JSON array
In this example, if
json_array2("0","1","2","3")
is executed,["0","1","2","3"]
is returned.Create a JSON object that contains specified data
json_object(item...)
item...: the key-value pairs of a JSON object. It consists of key name (string) and key value (value type), separated by a comma (,).
JSON
In this example, if
json_object('name','ZhangSan','age',32, 'loginId',100)
is executed,{"name":"ZhangSan","age":32,"loginId":100]}
.is returned.Insert data in the specified array of a JSON object
json_array_insert(json, kvPairs...)
json: string type, JSON object to be operated.
kvPairs...: data to be inserted. It consists of jsonPath string and specific data types of values, separated by a comma (,).
JSON
NoteIf the insertion position is not specified, the JSON object to be operated is returned.
If the element of the insertion position is not specified, the data is added to the end in the destination array.
In this example, if
json_array_insert('{"Address":["City",1]}','$.Address[3]',100)
is executed,{"Address":["City",1,100]}
is returned.Insert data at a specified position of a JSON object
json_insert(json, kvPairs...)
json: string type, JSON object to be operated.
kvPairs...: data to be inserted. It consists of jsonPath string and specific data types of values, separated by a comma (,).
JSON
NoteIf the insertion position is not specified, the JSON object to be operated is returned.
If the element of the insertion position is not specified, the data is added to the end in the destination array.
In this example, if
json_insert('{"Address":["City","Xian","Number",1]}','$.ID',100)
is executed,{"Address":["City","Xian","Number",1],"ID":100}
is returned.Insert or update data at a specified position of a JSON object
json_set(json, kvPairs...)
json: string type, JSON object to be operated.
kvPairs...: data to be inserted. It consists of jsonPath string and specific data types of values, separated by a comma (,).
Data type of values
NoteIf a specified position exists, the data stored in the position is updated.
If a specified position does not exist, the data is add to the JSON object to be operated.
In this example, if
json_set('{"ID":1,"Address":["City","Xian","Number",1]}',"$.IP",100)
is executed,{"ID":1,"Address":["City","Xian","Number",1], "IP":100}
is returned.Insert or update key-value pairs in a JSON object
json_put(json, key, value)
json: string type, JSON object to be operated.
key: string type, key name to be inserted.
value: data type of values, key value to be inserted.
JSON
NoteIf the JSON parameter is not JSON object, null is returned.
If the specified key name exists, the key value that corresponds to the key name is updated.
If the specified key name does not exists, the data is add to the JSON object to be operated.
In this example, if
json_put('{"loginId":100}','loginTime','2024-10-10')
is executed,{"loginId":100, 'loginTime':'2024-10-10'}
is returned.Replace data at a specified position of a JSON object
json_replace(json, kvPairs...)
json: string type, JSON object to be operated.
kvPairs...: data to be replaced. It consists of jsonPath string and specific data types of values, separated by a comma (,).
Data type of values
NoteIf a specified position does not exist, the JSON object to be operated is returned.
In this example, if
json_replace('{"ID":1,"Address":["City","Xian","Number",1]}',"$.IP",100)
is executed,{"ID":1,"Address":["City","Xian","Number",1]}
is returned.Query whether specified data exists at a specified position in a JSON object
json_contains(json, jsonPath, item)
json: string type, JSON object to be operated.
jsonPath: string type, specified position in the JSON object.
item: data type of values, specific data to be queried.
The Boolean type. Valid values: true or false
In this example, if
json_contains('{"ID":1,"Address":["City","Xian","Number",1]}','$.ID',1)
is executed,true
is returned.Query whether a specified position exists in a JSON object
json_contains_path(json, jsonPath)
json: string type, JSON object to be operated.
jsonPath: string type, position information to be queried.
The Boolean type. Valid values: true or false
In this example, if
json_contains_path('{"ID":1,"Address":["City","Xian","Number",1]}','$.ID')
is executed,true
is returned.Obtain data at a specified position in a JSON object
json_extract(json, jsonPath)
json: string type, JSON object to be operated.
jsonPath: string type, the specified position in the JSON object.
Data type of values
In this example, if
json_extract('{"ID":1,"Address":["City","Xian","Number",1]}','$.ID')
is executed,1
is returned.Get the key value of a specified Key in a JSON object
json_get(json, key)
json: string type, JSON object to be operated.
key: string type, key name (Key) of the data to be obtained.
Data type of values
In this example, if
json_get('{"ID":1,"Address":["City","Xian","Number",1]}','ID')
is executed,1
is returned.Get all keys at a specified position in a JSON object
json_keys(json, jsonPath)
json: string type, JSON object to be operated.
jsonPath: string type, the specified position in the JSON object.
JSON array
In this example, if
json_keys('{"ID":1,"Address":["City","Xian","Number",1]}','$')
is executed,["ID","Address"]
is returned.Query the number of keys at a specified position in a JSON object
json_length(json, jsonPath)
json: string type, JSON object to be operated.
jsonPath: string type, specified position in the JSON object.
NoteIf the value of jsonPath is
"$"
, it is equivalent tojson_length(json)
.
integer
In this example, if
json_length('{"ID":1,"Address":["City","Xian","Number",1]}','$')
is executed,2
is returned.Query the number of keys of the root node of a JSON object
json_length(json)
json: string type, JSON object to be operated.
integer
In this example, if
json_length('{"ID":1,"Address":["City","Xian","Number",1]}')
is executed,2
is returned.Convert JSON string to JSON object
json_parse(json)
json:string type, JSON object to be operated.
Data type of values
In this example, if
json_parse('{"ID":1,"Address":["City","Xian","Number",1]}')
is executed,{"ID":1,"Address":["City","Xian","Number",1]}
is returned.Remove data at a specified position in a JSON object
json_remove(json, jsonPath)
json: string type, JSON object to be operated.
jsonPath: string type, specified position in the JSON object.
JSON
In this example, if
json_remove('{"loginId":100, 'loginTime':'2024-10-10'}','$.loginTime')
is executed,{"loginId":100}
is returned.
Global functions
Flow control functions
Operation
Syntax
Description
Example
if statement
e_if(bool_expr, func_invoke)
bool_expr: a Boolean constant or a function. A Boolean constant: true or false. A function: op_gt(`id`, 10).
func_invoke: a function. e_drop,e_keep,e_set,e_if,e_compose
e_if(op_gt(`id`, 10), e_drop()). In this example, if an entry whose value of the id column is greater than 10, this entry is dropped.
if-else statement
e_if_else(bool_expr, func_invoke1, func_invoke2)
bool_expr: a Boolean constant or a function. A Boolean constant: true or false. A function: op_gt(`id`, 10).
func_invoke1: a function. Invoke this function if the condition is true.
func_invoke2: a function. Invoke this function if the condition is false.
e_if_else(op_gt(`id`, 10), e_set(`tag`, 'large'), e_set(`tag`, 'small')). In this example, if an entry whose value of the id column is greater than 10, the tag column is set to large. Otherwise, the tag column is set to small.
SWITCH statement that contains multiple conditions and a default operation
s_switch(condition1, func1, condition2, func2, ..., default = default_func)
condition1: a Boolean constant or a function. A Boolean constant: true or false. A function: op_gt(`id`, 10).
func_invoke: a function. If condition1 is true, invoke this function and finish the statement. Otherwise, proceed to the next condition.
default_func: a function. If the preceding conditions are false, invoke this function.
e_switch(op_gt(`id`, 100), e_set(`str_col`, '>100'), op_gt(`id`, 90), e_set(`str_col`, '>90'), default=e_set(`str_col`, '<=90')).
Combination of multiple operations
e_compose(func1, func2, func3, ...)
func1: a function. Valid values: e_set, e_drop, and e_if.
func2: a function. Valid values: e_set, e_drop, and e_if.
e_compose(e_set(`str_col`, 'test'), e_set(`dt_col`, dt_now())). In this example, the value of the str_col column is set to test, and the value of the dt_col column is set to the current time.
Data manipulation functions
Operation
Syntax
Description
Example
Drop an entry so that it is not synchronized to the destination database
e_drop()
N/A
e_if(op_gt(`id`, 10), e_drop()). This example discards records with an ID greater than 10.
Retain an entry so that it is synchronized to the destination database
e_keep(condition)
condition: a Boolean expression.
e_keep(op_gt(id, 1)). This example synchronizes only data with an ID greater than 1.
Specify the value of a column
e_set(`col`, val, NEW)
col: the name of the column.
val: a constant or a function. The data type of the value of val must match that of col.
NEW: a data type the value of val that is converted from that of col. This is an optional field.
ImportantIf NEW is not specified, do not specify the comma (,) before NEW. Make sure that data types are converted in a compatible way. Otherwise, an error occurs.
e_set(`dt_col`, dt_now()). In this example, the value of the dt_col column is set to the current time.
e_set(`col1`, `col2` + 1). In this example, the value of the col1 column is set to the value of the col2 column plus 1.
e_set(`col1`, 1, NEW). In this example, the col1 column is converted to a numeric type and the value is set to 1.
Retain fields, drop fields, and use the field name mapping feature during data migration or synchronization between MongoDB databases
e_expand_bson_value('*', 'fieldA',{"fieldB":"fieldC"})
*: the name of the field to be retained. * indicates all fields.
fieldA: the name of the field to be dropped.
{"fieldB":"fieldC"}: the field name mapping. fieldB indicates the name of the field in the source database, and fieldC indicates the name of the field in the destination database.
NoteThe field name mapping is an optional expression.
e_expand_bson_value("*", "_id,name"). In this example, the fields other than _id and name are written to the destination database.