The AnalyticDB for MySQL 3.0 data source provides a bidirectional channel to read data from and write data to AnalyticDB for MySQL 3.0. This topic describes the data synchronization capabilities of DataWorks for AnalyticDB for MySQL 3.0.
Limits
Sync tasks for AnalyticDB for MySQL Data Lakehouse Edition data sources cannot be configured or run in a public resource group.
If you switch an instance from Data Warehouse Edition to Data Lakehouse Edition, sync tasks that use the data source and run in a public resource group will fail. Before you switch, check for sync tasks that run in the public resource group and move them to a Serverless resource group (recommended) or an exclusive resource group for Data Integration.
Offline sync supports reading data from views.
Supported field types
Offline read
The following table lists the data type mappings for AnalyticDB for MySQL 3.0 Reader.
Type category | AnalyticDB for MySQL 3.0 type |
Integer | INT, INTEGER, TINYINT, SMALLINT, and BIGINT |
Floating-point | FLOAT, DOUBLE, and DECIMAL |
String | VARCHAR |
Date and time | DATE, DATETIME, TIMESTAMP, and TIME |
Boolean | BOOLEAN |
Offline write
The following table lists the data type mappings for AnalyticDB for MySQL 3.0 Writer.
Type category | AnalyticDB for MySQL 3.0 type |
Integer | INT, INTEGER, TINYINT, SMALLINT, and BIGINT |
Floating-point | FLOAT, DOUBLE, and DECIMAL |
String | VARCHAR |
Date and time | DATE, DATETIME, TIMESTAMP, and TIME |
Boolean | BOOLEAN |
Add a data source
Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data source management. When you add the data source, you can refer to the parameter descriptions in the DataWorks console.
Develop a data synchronization task
For the entry points and procedures for configuring synchronization tasks, see the following configuration guides.
Configure an offline sync task for a single table
For the procedure, see Configure in the codeless UI and Configure in the code editor.
For a full list of parameters and a sample script for the code editor, see Appendix: Sample script and parameter descriptions.
Configure a real-time sync task for a single table or an entire database
For the procedure, see Configure a real-time sync task in DataStudio.
Configure database-level sync tasks
For the procedure, see Configure a real-time sync task for an entire database.
Appendix: Sample script and parameter descriptions
Configure a batch synchronization task by using the code editor
If you configure a batch synchronization task by using the code editor, you must write the parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following sections describe the data source parameters that you must configure for this type of task.
Reader script demo
{
"type": "job",
"steps": [
{
"stepType": "analyticdb_for_mysql", // The plug-in name.
"parameter": {
"column": [ // The column names.
"id",
"value",
"table"
],
"connection": [
{
"datasource": "xxx", // The data source.
"table": [ // The table name.
"xxx"
]
}
],
"where": "", // The filter condition.
"splitPk": "", // The split key.
"encoding": "UTF-8" // The encoding format.
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0" // The maximum number of error records allowed during synchronization.
},
"speed": {
"throttle":true,// If throttle is set to false, the mbps parameter does not take effect and the data rate is not limited. If throttle is set to true, the data rate is limited.
"concurrent":1, // The number of concurrent jobs.
"mbps":"12"// The maximum data rate. 1 mbps = 1 MB/s.
}
}
}
Reader script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The value must be the same as the name of the data source that you added. | Yes | None |
table | The table that you want to synchronize. | Yes | None |
column | The columns in the source table that you want to synchronize. Use a JSON array to specify the columns. By default, all columns are synchronized. Example: [ * ]. | Yes | None |
splitPk | The field used to split the data. If you specify splitPk, AnalyticDB for MySQL 3.0 Reader splits the source data by this field and starts concurrent subtasks to read the data, which improves synchronization efficiency. | No | None |
where | The filter condition. In many business scenarios, you may want to synchronize only the data generated on the current day. To do this, set the where parameter to a filter condition on a date or time column that matches the current day. | No | None |
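For example, the following reader step combines splitPk and where to read only the current day's data with concurrent subtasks. The data source, table, and column names are placeholders, and the snippet assumes a scheduling parameter named bizdate is defined for the task; adapt all of these to your environment.

```json
{
    "stepType": "analyticdb_for_mysql",
    "parameter": {
        "column": [
            "id",
            "buyer_id",
            "gmt_create"
        ],
        "connection": [
            {
                "datasource": "my_adb_source",
                "table": [
                    "orders"
                ]
            }
        ],
        "where": "gmt_create >= '${bizdate}'",
        "splitPk": "id",
        "encoding": "UTF-8"
    },
    "name": "Reader",
    "category": "reader"
}
```

Choosing the primary key as splitPk usually gives the most even shards, because the values are unique and densely distributed.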
Writer script demo
{
"type": "job",
"steps": [
{
"stepType": "stream",
"parameter": {},
"name": "Reader",
"category": "reader"
},
{
"stepType": "analyticdb_for_mysql", // The plug-in name.
"parameter": {
"postSql": [], // The SQL statement to execute after the import.
"tableType": null, // A reserved field. The default value is empty.
"datasource": "hangzhou_ads", // The name of the data source.
"column": [ // The columns to synchronize.
"id",
"value"
],
"guid": null,
"writeMode": "insert", // The write mode. For more information, see the description of the writeMode parameter.
"batchSize": 2048, // The number of records in each batch. For more information, see the description of the batchSize parameter.
"encoding": "UTF-8", // The encoding format.
"table": "t5", // The destination table.
"preSql": [] // The SQL statement to execute before the import.
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0", // The version number of the configuration file format.
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0" // The number of error records.
},
"speed": {
"throttle":true,// If throttle is set to false, the mbps parameter does not take effect and the data rate is not limited. If throttle is set to true, the data rate is limited.
"concurrent":2, // The number of concurrent jobs.
"mbps":"12"// The maximum data rate. 1 mbps = 1 MB/s.
}
}
}
Writer script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The value must be the same as the name of the data source that you added. | Yes | None |
table | The destination table to which you want to write data. | Yes | None |
writeMode | The write mode. Valid values: insert, replace, and update. | No | insert |
column | The columns in the destination table to which you want to write data. Separate multiple column names with commas. Note If a field name contains select, enclose the name in backticks. For example, write item_select_no as `item_select_no`. | Yes | None |
preSql | The SQL statement to execute before the data sync task runs. In the codeless UI, you can execute only one SQL statement. In the code editor, you can execute multiple SQL statements, for example, to purge old data. Note Transactions are not supported for multiple SQL statements. | No | None |
postSql | The SQL statement to execute after the data sync task runs. In the codeless UI, you can execute only one SQL statement. In the code editor, you can execute multiple SQL statements, for example, to add a timestamp. Note Transactions are not supported for multiple SQL statements. | No | None |
batchSize | The number of records to submit in a single batch. A larger value reduces the number of network interactions between the data sync process and AnalyticDB for MySQL 3.0 and improves overall throughput. However, an excessively large value may cause an out-of-memory (OOM) error in the data sync process. | No | 1024 |
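As an illustration, the following writer step clears the destination table with preSql before each run, writes in replace mode, and raises batchSize to reduce network round trips. The data source, table, and column names are placeholders, and this is a sketch rather than a drop-in configuration; verify the preSql statement against your own retention requirements before use.

```json
{
    "stepType": "analyticdb_for_mysql",
    "parameter": {
        "datasource": "my_adb_sink",
        "table": "orders_copy",
        "column": [
            "id",
            "buyer_id",
            "gmt_create"
        ],
        "preSql": [
            "TRUNCATE TABLE orders_copy"
        ],
        "postSql": [],
        "writeMode": "replace",
        "batchSize": 4096,
        "encoding": "UTF-8"
    },
    "name": "Writer",
    "category": "writer"
}
```

Because replace mode overwrites rows that have the same primary key, it is a common choice when the task is rerun for the same business date and duplicate inserts must be avoided.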