A DRDS (PolarDB-X 1.0) data source lets you read data from and write data to DRDS (PolarDB-X 1.0). This topic describes the data synchronization capabilities of DataWorks for this data source.
Get started
To start synchronizing data with DRDS (PolarDB-X 1.0), complete these steps in order:
1. Create a database account with the `replace into` permission. See Prerequisites.
2. Add DRDS (PolarDB-X 1.0) as a data source in DataWorks. See Add a data source.
3. Configure and run a synchronization task. See Develop a data synchronization task.
Prerequisites
Before you begin, ensure that you have:
- A DRDS (PolarDB-X 1.0) database account with the `replace into` permission. For setup instructions, see Create an account.
Limitations
Offline read and write
- The DRDS (PolarDB-X 1.0) plugin is compatible only with the MySQL engine. DRDS (PolarDB-X 1.0) is a distributed MySQL database, and most of its communication protocols follow MySQL standards.
- MySQL 8.0 in DRDS (PolarDB-X 1.0) supports Serverless resource groups (recommended) and exclusive resource groups for Data Integration.
- DRDS (PolarDB-X 1.0) Writer connects to the proxy of a remote DRDS (PolarDB-X 1.0) database through Java Database Connectivity (JDBC) and runs `replace into` statements to write data. The destination table must have a primary key or a unique index to prevent duplicate writes.
- DRDS (PolarDB-X 1.0) Writer retrieves data from a Reader through the data synchronization framework, then writes it using `replace into` statements. The Writer accumulates data and commits it to the DRDS (PolarDB-X 1.0) proxy, which determines whether to route the data to one or more tables:
  - If no primary key or unique index conflict occurs, the behavior is equivalent to `insert into`.
  - If a conflict occurs, the new row replaces all fields of the existing row.

  Note: The task requires at least the `replace into` permission. Additional permissions depend on the SQL statements specified in `preSql` and `postSql`.
- Reading from views is supported.
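The `replace into` conflict behavior described above can be illustrated with a short Python sketch. This simulates the semantics only (a conflicting row is fully replaced, including fields the new row does not supply); it is not the actual Writer implementation:

```python
# Simulate the semantics of MySQL's "replace into" on a table with a
# primary key: an illustration of the behavior, not the DRDS Writer code.

def replace_into(table: dict, row: dict, pk: str = "id") -> None:
    """Insert `row` into `table` (a dict keyed by primary key).

    If a row with the same primary key already exists, ALL of its fields
    are replaced by the new row - fields absent from the new row are lost.
    """
    table[row[pk]] = dict(row)  # replace the entire row on conflict

table = {}
replace_into(table, {"id": 1, "name": "alice", "age": 30})  # plain insert
replace_into(table, {"id": 1, "name": "bob"})               # conflict: full replace
print(table[1])  # {'id': 1, 'name': 'bob'} - the 'age' field is gone
```

This is why the destination table needs a primary key or unique index: without one, no conflict can occur, and re-running a task would insert duplicate rows.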
Supported field types
DRDS (PolarDB-X 1.0) Reader and Writer support most data types. Verify that your data types are in the following list before configuring a synchronization task.
| Type category | DRDS (PolarDB-X 1.0) data types |
|---|---|
| Integer types | INT, TINYINT, SMALLINT, MEDIUMINT, BIGINT |
| Floating-point types | FLOAT, DOUBLE, DECIMAL |
| String types | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT |
| Date and time types | DATE, DATETIME, TIMESTAMP, TIME, YEAR |
| Boolean types | BIT, BOOL |
| Binary types | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, VARBINARY |
Add a data source
Before developing a synchronization task in DataWorks, add DRDS (PolarDB-X 1.0) as a data source. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Develop a data synchronization task
Configure an offline sync task for a single table
- Configure using the codeless UI or the code editor. See Configure a task in the codeless UI and Configure a task in the code editor.
- For a full parameter reference and script examples, see Appendix: Script demo and parameters.
Configure an offline sync task for an entire database
FAQ
Why can't DRDS (PolarDB-X 1.0) Reader guarantee data consistency across shards?
DRDS (PolarDB-X 1.0) is a distributed database. When the Reader extracts data from different underlying sharded tables, it captures snapshots at different points in time — not from a single consistent time slice. Strong consistency across sharded databases and tables cannot be guaranteed.
How does encoding affect data synchronization?
DRDS (PolarDB-X 1.0) supports encoding settings at the field, table, database, and instance levels. The priority of encoding settings, from highest to lowest, is field, table, database, and then instance. Set the encoding to UTF-8 at the database level to avoid issues.
DRDS (PolarDB-X 1.0) Reader uses JDBC for data extraction, which handles encoding conversion automatically. However, if the encoding used when writing data to the underlying layer differs from the declared encoding, the Reader cannot detect the mismatch and the synchronization result may contain garbled characters.
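The garbling described above is easy to reproduce outside the database. In this sketch, text is physically stored as GBK bytes but read back under a declared UTF-8 encoding (the encodings are illustrative; any mismatch between written and declared encodings behaves the same way):

```python
# Demonstrate encoding-mismatch garbling: bytes written with one encoding
# cannot be decoded correctly under a different declared encoding.

original = "数据同步"                      # "data synchronization"
stored_bytes = original.encode("gbk")      # data physically written as GBK

# A reader that trusts a declared UTF-8 encoding cannot recover the text:
garbled = stored_bytes.decode("utf-8", errors="replace")
print(garbled)                             # replacement characters, not the original

# Decoding with the encoding actually used when writing round-trips cleanly:
print(stored_bytes.decode("gbk"))          # 数据同步
```

JDBC performs the conversion mechanically, so it cannot flag this situation; keeping the declared and actual encodings consistent (ideally UTF-8 end to end) is the only reliable fix.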
How do I configure incremental data synchronization?
DRDS (PolarDB-X 1.0) Reader uses JDBC SELECT statements for data extraction. Use a WHERE clause to filter for only the records added or changed since the last sync:
- Timestamp-based: If your application writes a `modify` field with a change timestamp for each record (covering additions, updates, or logical deletions), add a `WHERE` clause using the timestamp of the last synchronization run.
- Auto-increment ID-based: For append-only data streams, add a `WHERE` clause using the maximum auto-increment ID from the previous sync.
If your data has no field that distinguishes new or modified records from existing ones, the Reader cannot support incremental extraction — only full data synchronization is available.
Filter conditions based on physical table names are not supported in the WHERE clause.
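The two incremental strategies above amount to generating a filter string for the Reader's `where` parameter. A minimal sketch, assuming illustrative column names (`gmt_modified`, `id`) and a checkpoint value saved from the previous run:

```python
# Build incremental-sync filter strings for the Reader's `where` parameter.
# Column names and checkpoint handling are illustrative assumptions.

def timestamp_where(last_sync: str, column: str = "gmt_modified") -> str:
    """Filter for rows added or changed since the previous run."""
    return f"{column} >= '{last_sync}'"

def auto_increment_where(last_max_id: int, column: str = "id") -> str:
    """Filter for an append-only stream using the last synced max ID."""
    return f"{column} > {last_max_id}"

print(timestamp_where("2024-05-01 00:00:00"))
# gmt_modified >= '2024-05-01 00:00:00'
print(auto_increment_where(10000))
# id > 10000
```

The resulting string is what you would paste into the `where` field of the Reader configuration; the checkpoint itself (last timestamp or max ID) must be persisted by your scheduling logic between runs.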
Appendix: Script demo and parameters
To configure a batch synchronization task using the code editor, follow the unified script format described in Configure a task in the code editor. The following sections describe the parameters and provide script examples.
Reader script demo
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "drds",
      "parameter": {
        "datasource": "",
        "column": [
          "id",
          "name"
        ],
        "where": "",
        "table": "",
        "splitPk": ""
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Reader script parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
| `datasource` | The name of the data source. The value must match the name of the data source added in DataWorks. | Yes | None |
| `table` | The table from which to synchronize data. | Yes | None |
| `column` | The columns to synchronize, specified as a JSON array. Use `["*"]` to select all columns. Supports column pruning, reordering, constants, and MySQL function expressions. For example: `["id", "table", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"]`. | Yes | None |
| `where` | The filter condition used to construct the SELECT statement. If left blank, the entire table is synchronized. Supports incremental synchronization via date or ID conditions. For example: `STRTODATE('${bdp.system.bizdate}','%Y%m%d') <= today AND today < DATEADD(STRTODATE('${bdp.system.bizdate}', '%Y%m%d'), interval 1 day)`. | No | None |
| `splitPk` | The shard key used for parallel reads. | No | None |
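The effect of `splitPk` can be sketched as follows: the shard key's value range is divided into contiguous chunks, and each chunk becomes an independent SELECT that can run concurrently. This is a simplified illustration of range splitting, not the Reader's internal algorithm; table and column names are assumptions:

```python
# Sketch of parallel reads via a shard key: split the key's value range
# into non-overlapping chunks, one concurrent SELECT per chunk.

def split_ranges(min_pk: int, max_pk: int, chunks: int):
    """Split [min_pk, max_pk] into up to `chunks` contiguous ranges."""
    step = (max_pk - min_pk + 1 + chunks - 1) // chunks  # ceiling division
    ranges = []
    lo = min_pk
    while lo <= max_pk:
        hi = min(lo + step - 1, max_pk)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

for lo, hi in split_ranges(1, 100, 4):
    print(f"SELECT id, name FROM t WHERE id BETWEEN {lo} AND {hi}")
```

An evenly distributed integer key (typically the primary key) yields chunks of similar size, which is what makes the parallelism effective.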
Writer script demo
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "drds",
      "parameter": {
        "postSql": [],
        "datasource": "",
        "column": [
          "id"
        ],
        "writeMode": "insert ignore",
        "batchSize": "1024",
        "table": "test",
        "preSql": []
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
Writer script parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
| `datasource` | The name of the data source. The value must match the name of the data source added in DataWorks. | Yes | None |
| `table` | The destination table to which data is written. | Yes | None |
| `writeMode` | The write mode. Valid values: `insert ignore` (ignore rows that violate primary key or unique constraints) and `replace into` (replace existing rows on conflict). | No | insert ignore |
| `column` | The destination columns to write data to, specified as a JSON array. Example: `["id", "name", "age"]`. Use `["*"]` to write to all columns in order. | Yes | None |
| `preSql` | SQL statements to run before the synchronization task starts. In the codeless UI, only one statement is supported. In the code editor, multiple statements are supported. Example: `delete from table_name;`. | No | None |
| `postSql` | SQL statements to run after the synchronization task completes. In the codeless UI, only one statement is supported. In the code editor, multiple statements are supported. Example: `delete from table_name where xx=xx;`. | No | None |
| `batchSize` | The number of records to commit per batch. Larger values reduce network round trips and improve throughput, but very large values may cause out-of-memory (OOM) errors. | No | 1024 |