DataWorks provides HybridDB for MySQL Reader and HybridDB for MySQL Writer to synchronize data between HybridDB for MySQL and other data sources.
Limits
-
HybridDB for MySQL Reader can read tables and views. When reading a table, you can select all or a subset of columns, reorder columns, add constant fields, and use HybridDB for MySQL functions such as
now(). -
Views are supported in batch synchronization.
Data type mappings
HybridDB for MySQL Reader and Writer map data types as follows.
| Category | HybridDB for MySQL data types | Notes |
|---|---|---|
| Integer | INT, TINYINT, SMALLINT, MEDIUMINT, BIGINT, YEAR | TINYINT(1) is treated as an integer data type |
| Floating point | FLOAT, DOUBLE, DECIMAL | |
| String | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | |
| Date and time | DATE, DATETIME, TIMESTAMP, TIME | |
| Boolean | BOOL | |
| Binary | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, VARBINARY |
Data types not listed above are not supported.
Add a data source
Add the HybridDB for MySQL data source to DataWorks before creating a synchronization task. For instructions, see Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Develop a data synchronization task
Synchronize data from a single table
Use either of the following methods:
For the full list of parameters and sample code for the code editor, see Appendix: Code and parameters.
Synchronize all data from a database
See Configure a real-time full-database synchronization task.
Appendix: Code and parameters
HybridDB for MySQL Reader
Sample code
{
"type": "job",
"steps": [
{
"parameter": {
"datasource": "px_aliyun_hymysql",
"column": [
"id",
"name",
"sex",
"salary",
"age",
"pt"
],
"where": "id=10001",
"splitPk": "id",
"table": "person"
},
"name": "Reader",
"category": "reader"
},
{
"parameter": {}
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 7,
"throttle": true,
"mbps": 1
}
}
}
Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
datasource |
Name of the data source. Must match the name configured in DataWorks. | Yes | None |
table |
Name of the source table. Each synchronization task reads from one table. | Yes | None |
column |
Columns to read, specified as a JSON array. Use ["*"] to read all columns. Supports reordering, constant values, and expressions — for example, ["id", "table", "1", "'mingya.wmy'", "'null'", "to_char(a+1)", "2.3", "true"]. |
Yes | None |
splitPk |
Column used to shard data for parallel reads. Set to a primary key column for the most even distribution. Only integer columns are supported — string, floating-point, and date columns are ignored, and the reader falls back to a single thread. | No | None |
where |
WHERE clause to filter rows, for example gmt_create > $bizdate to read only today's data. Leave blank to read all rows. Do not use LIMIT clauses. |
No | None |
querySql |
Custom SQL for advanced filtering or multi-table joins. Overrides table, column, where, and splitPk when set. |
No | None |
singleOrMulti |
Required for sharded tables. Set to multi. |
Yes (sharded tables only) | multi |
When configuring a sharded table in the code editor directly (without switching from the codeless UI), "singleOrMulti": "multi" is not generated automatically. Add it manually. Without this parameter, the reader reads data only from the first shard.
WhenquerySqlis set, the reader ignorestable,column,where, andsplitPk. Connection credentials are still read fromdatasource.
HybridDB for MySQL Writer
Sample code
{
"type": "job",
"steps": [
{
"parameter": {}
},
{
"parameter": {
"postSql": [],
"datasource": "px_aliyun_hy***",
"column": [
"id",
"name",
"sex",
"salary",
"age",
"pt"
],
"writeMode": "insert",
"batchSize": 256,
"encoding": "UTF-8",
"table": "person_copy",
"preSql": []
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 7,
"throttle": true,
"mbps": 1
}
}
}
Write modes
| Mode | Behavior when no conflict | Behavior when primary key or unique index conflict occurs |
|---|---|---|
insert (default) |
Writes data directly | Cannot write to conflicting rows; conflicting data is treated as dirty data |
replace |
Writes data directly | Updates the specified fields in the conflicting rows |
Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
datasource |
Name of the data source. Must match the name configured in DataWorks. | Yes | None |
table |
Name of the destination table. | Yes | None |
writeMode |
Write mode. See Write modes above. | No | insert |
column |
Columns to write, specified as a JSON array — for example, ["id", "name", "age"]. Use ["*"] to write to all columns. |
Yes | None |
preSql |
SQL statement to run before the synchronization task, for example to delete outdated data. The codeless UI supports one statement; the code editor supports multiple statements. | No | None |
postSql |
SQL statement to run after the synchronization task, for example to add a timestamp. The codeless UI supports one statement; the code editor supports multiple statements. | No | None |
batchSize |
Number of records to write per batch. Tune based on your workload. A value that is too large may cause out of memory (OOM) errors. | No | 1,024 |