DataWorks Data Integration supports PolarDB-X 2.0 as both a source and destination for offline (batch) synchronization tasks. This page covers supported capabilities, prerequisites, and script parameter reference for PolarDB-X 2.0 Reader and Writer.
Setup overview
To synchronize data between PolarDB-X 2.0 and other systems, complete these steps:
- Confirm you are using PolarDB-X 2.0 (not PolarDB-X 1.0).
- Grant the required permissions to the database account DataWorks will use.
- Add the PolarDB-X 2.0 data source in DataWorks.
- Configure and run an offline synchronization task.
Supported versions
Offline read and write: PolarDB-X 2.0. Offline synchronization can also read data from views.
Limits
PolarDB-X 2.0 data sources support serverless resource groups (recommended) and exclusive resource groups for Data Integration.
Supported field types
For a complete list of PolarDB-X 2.0 field types, see Data types. The table below lists the major field types and their support status.
| Field type | Offline read (PolarDB-X 2.0 Reader) | Offline write (PolarDB-X 2.0 Writer) |
|---|---|---|
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INTEGER | Supported | Supported |
| BIGINT | Supported | Supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL/NUMERIC | Supported | Supported |
| REAL | Not supported | Not supported |
| VARCHAR | Supported | Supported |
| JSON | Supported | Supported |
| TEXT | Supported | Supported |
| MEDIUMTEXT | Supported | Supported |
| LONGTEXT | Supported | Supported |
| VARBINARY | Supported | Supported |
| BINARY | Supported | Supported |
| TINYBLOB | Supported | Supported |
| MEDIUMBLOB | Supported | Supported |
| LONGBLOB | Supported | Supported |
| ENUM | Supported | Supported |
| SET | Supported | Supported |
| BOOLEAN | Supported | Supported |
| BIT | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIME | Supported | Supported |
| YEAR | Supported | Supported |
| LINESTRING | Not supported | Not supported |
| POLYGON | Not supported | Not supported |
| MULTIPOINT | Not supported | Not supported |
| MULTILINESTRING | Not supported | Not supported |
| MULTIPOLYGON | Not supported | Not supported |
| GEOMETRYCOLLECTION | Not supported | Not supported |
Prerequisites
Before you begin, ensure that you have:
- Confirmed you are running PolarDB-X 2.0. For PolarDB-X 1.0, use the DRDS data source instead.
- A PolarDB-X 2.0 account with the permissions described below.
Grant account permissions
Create a dedicated PolarDB-X 2.0 account for DataWorks access, then grant the appropriate permissions based on your synchronization scenario.
Offline read (SELECT permission on source table)
The account must have the SELECT permission on the source table.
Offline write (write permissions on destination table)
The account must have INSERT, DELETE, and UPDATE permissions on the destination table.
Real-time synchronization — full database (binary logging access)
- Privileged account: Can read binary logging (binlog) data by default.
- Standard account: Grant SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions using a privileged account:
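-- Optional: create a dedicated sync account that can log in from any host (% matches any host).
-- Uncomment the next line and replace 'password' if the account does not exist yet.
-- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password';
-- Grant the permissions required for real-time (CDC) synchronization.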
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync_account'@'%';
Add a data source
Add the PolarDB-X 2.0 data source to DataWorks before configuring any synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Configure an offline synchronization task
For the entry point and configuration procedure, see Configure an offline sync task in the code editor.
For the script format and all available parameters, see Appendix: Script demo and parameter descriptions below.
Appendix: Script demo and parameter descriptions
Use the code editor to configure batch synchronization tasks in JSON format. For the unified script format requirements, see Configure a task in the code editor.
All examples use "type": "job" and "version": "2.0" at the top level.
Reader script demo
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "polardbx20",
      "parameter": {
        "connection": [
          {
            "datasource": "",
            "table": [
              "t1"
            ]
          }
        ],
        "column": [
          "c1",
          "c2",
          "'const'"
        ],
        "where": "",
        "splitPk": "",
        "checkSlave": "true",
        "slaveDelayLimit": "300"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Reader script parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The data source name. Must match the name configured on the Data Source Management page. | Yes | None |
| table | The table to synchronize. Only a single table is supported per connection block. | Yes | None |
| column | The columns to synchronize, as a JSON array. Use ["*"] to include all columns. Cannot be blank. Supports column pruning (select specific columns), column reordering (order need not match the table schema), and constants following PolarDB-X 2.0 SQL syntax. Example: ["id", "table", "1", "'mingya.wmy'", "'null'", "to_char(a+1)", "2.3", "true"]. | Yes | None |
| splitPk | The column to use for data partitioning, which enables concurrent reads. Set it to the primary key for balanced shards. Only integer-type columns are supported; string, floating-point, and date columns are ignored, and data falls back to a single channel. If blank or omitted, data is read through a single channel. | No | None |
| where | A SQL WHERE filter condition, typically used for incremental synchronization. For example, gmt_create>$bizdate synchronizes only the current day's data. A LIMIT clause such as LIMIT 10 is not a valid WHERE condition and cannot be used. If omitted, all data is synchronized. | No | None |
| checkSlave | When the data source is a read-only instance, checks replication lag before the task starts to prevent data loss. | No | true |
| slaveDelayLimit | The maximum allowed replication lag in seconds. If the actual lag exceeds this value, the task fails. | No | 30 |
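As an illustration of how these parameters combine, the fragment below sketches a Reader parameter block for an incremental, concurrent read. The data source name example_polardbx_source, the table orders, and the columns id, amount, and gmt_create are hypothetical. The where value reuses the gmt_create>$bizdate filter described above, and id is assumed to be an integer primary key so that splitPk can shard on it. JSON does not allow comments, so these assumptions are stated here rather than in the fragment.

```json
{
  "parameter": {
    "connection": [
      {
        "datasource": "example_polardbx_source",
        "table": ["orders"]
      }
    ],
    "column": ["id", "amount", "gmt_create", "'const'"],
    "where": "gmt_create>$bizdate",
    "splitPk": "id",
    "checkSlave": "true",
    "slaveDelayLimit": "300"
  }
}
```

The column list here prunes the table to three columns and appends one string constant, in the same way as the 'const' entry in the Reader script demo.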
Writer script demo
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "polardbx20",
      "parameter": {
        "postSql": [],
        "datasource": "",
        "column": [
          "id",
          "value"
        ],
        "writeMode": "insert",
        "batchSize": 1024,
        "table": "",
        "preSql": [
          "delete from XXX;"
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Writer script parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The data source name. Must match the name configured on the Data Source Management page. | Yes | None |
| table | The destination table name. | Yes | None |
| column | The destination columns to write to, as a JSON array. Example: ["id", "name", "age"]. Use ["*"] to write to all columns in schema order. | Yes | None |
| writeMode | The write conflict mode. Set to insert (insert into) or replace (replace into). See Write modes below. | No | insert |
| preSql | SQL statements to run before the task starts, for example truncate table tablename;. In the codeless UI, only one statement is allowed. In the code editor, multiple statements are supported. Transactions are not supported for multiple statements. | No | None |
| postSql | SQL statements to run after the task completes, for example adding a timestamp column. In the codeless UI, only one statement is allowed. In the code editor, multiple statements are supported. Transactions are not supported for multiple statements. | No | None |
| batchSize | The number of records submitted per batch. A larger value reduces network round trips and improves throughput, but may cause memory overflow if set too high. | No | 256 |
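A common use of preSql is to make a daily load safe to rerun by clearing the target range before writing. The fragment below is a minimal sketch of that idea, not a prescribed configuration: the data source name example_polardbx_target, the table orders_copy, and its columns are hypothetical, and it assumes the $bizdate placeholder is substituted inside preSql in the same way as in the Reader where parameter. writeMode is omitted, so the default insert applies.

```json
{
  "parameter": {
    "datasource": "example_polardbx_target",
    "table": "orders_copy",
    "column": ["id", "amount", "gmt_create"],
    "preSql": [
      "delete from orders_copy where gmt_create>$bizdate;"
    ],
    "postSql": [],
    "batchSize": 1024
  }
}
```

Because multiple statements are not run in a transaction, keeping preSql to a single statement like this avoids leaving the table in a partially prepared state if a later statement fails.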
Write modes
| Mode | Script value | Behavior on conflict |
|---|---|---|
| insert into | insert | If a primary key or unique index conflict occurs, the conflicting row is skipped and recorded as dirty data. |
| replace into | replace | If no conflict occurs, behaves the same as insert into. If a conflict occurs, the existing row is deleted and the new row is inserted, replacing all fields. |
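To make the difference concrete, here is a sketch of a Writer parameter block that selects replace mode. The data source name, the table orders_copy, and the column names are hypothetical and stand in for your own objects.

```json
{
  "parameter": {
    "datasource": "example_polardbx_target",
    "table": "orders_copy",
    "column": ["id", "amount", "gmt_create"],
    "writeMode": "replace"
  }
}
```

With this setting, an incoming row whose primary key or unique index value already exists in the destination overwrites the existing row in full. Changing writeMode to insert would instead skip that incoming row and count it as dirty data, which then counts toward errorLimit.record.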
Job-level settings
| Parameter | Description | Default |
|---|---|---|
| errorLimit.record | The number of error records allowed before the task fails. | "0" |
| speed.throttle | Whether to apply a rate limit. Set to true to enable rate limiting. When set to false, the rate limit is disabled and the mbps parameter has no effect. | true |
| speed.concurrent | The number of concurrent channels. | 1 |
| speed.mbps | The maximum synchronization rate in Mbps. Controls read/write pressure on the source and destination. Takes effect only when throttle is true. | "12" |
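For comparison with the throttled settings used in the script demos, the following sketch shows an unthrottled variant. The specific values (four channels, ten tolerated error records) are illustrative assumptions, not recommendations.

```json
{
  "setting": {
    "errorLimit": {
      "record": "10"
    },
    "speed": {
      "throttle": false,
      "concurrent": 4
    }
  }
}
```

The mbps key is left out because it has no effect when throttle is false. Raising concurrent only helps a PolarDB-X 2.0 Reader when splitPk is set to an integer column, since otherwise data is read through a single channel.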