Synchronize data from tables in sharded databases
DataWorks offline sync tasks can read from multiple sharded databases and merge all records into a single destination table. This topic walks through three methods for configuring this, using MySQL sharded databases as the example.
Prerequisites
Before you begin, ensure that you have:
-
The sharded databases registered as data sources in DataWorks. See Supported data sources and read/write plugins
-
Access to the DataWorks console with permissions to create and publish sync nodes
Choose a method
| Sharded data source + codeless UI | Standard data source + codeless UI | Code editor | |
|---|---|---|---|
| When to use | Regex-based table matching; schemas may differ across shards; more than 50 data sources; need to add shards without republishing the task | Standard MySQL/PolarDB/AnalyticDB sources already configured; fewer than 50 shards; identical schemas across all shards | Broad database support (SQL Server, Oracle, PostgreSQL, DM, DB2); custom JSON configuration required |
| Codeless UI | Yes | Yes | No |
| Table name configuration | Regular expression — DataWorks searches for matching tables at runtime | Not supported | Numeric range, e.g. tb_[1-10]; range must be continuous; each child table must exist |
| Schema requirement | Schemas can differ; configure a missing field policy to set missing fields to NULL | All shards must have identical schemas | Manually configured in code |
| Field mapping reference | First matching table in the Meta Data Source | First table of the first data source | Defined manually in the code editor |
| Max data sources | 5,000 | 50 | — |
| Adding a shard | No task republish needed; changes take effect on new instances | Must modify and republish the task | — |
| Supported databases | MySQL, PolarDB, PolarDB-O, OceanBase | MySQL, PolarDB, AnalyticDB, OceanBase | MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, PostgreSQL, DM, DB2, OceanBase |
Method 1: Sharded data source + codeless UI (recommended)
This method supports regex-based table matching and scales to 5,000 data sources. Adding a new shard does not require republishing the task.
Supported databases: MySQL, PolarDB, PolarDB-O, OceanBase
-
Go to the Data Integration page. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the target workspace from the drop-down list and click Go to Data Integration.
-
In the left navigation pane, click Data Source, then click Add Data Source. Select a data source type. This step supports MySQL, PolarDB, and PolarDB-O. Create each sharded database as a standard data source. This topic uses MySQL as an example. For details, see Data Source Configuration.

-
Create a Sharding data source to merge the individual data sources into a single logical source.
-
Click Add Data Source and select Sharding.
-
Select a sharding data source type and configure the parameters. This topic uses MySQL (Sharding) as an example.

Key parameters:
Parameter Description Data Source Name A name for the sharding data source Select Data Source The standard data sources you created for each sharded database Meta Data Source The data source used as the schema template. DataWorks pulls the default table schema from this source during task configuration. ImportantIf the table schemas across data sources are not identical, the sync task fails.
-
-
In the codeless UI, set Data Source to MySQL (Sharding) and select the sharding data source you created. For full configuration details, see Codeless UI configuration.

-
Click Next.
-
Select the tables to synchronize, click Save and Publish, and complete the remaining steps.

Method 2: Standard data source + codeless UI
Use this method when your shards are already registered as standard data sources and all shards have identical table schemas.
Supported databases: MySQL, PolarDB, AnalyticDB, OceanBase
This method supports a maximum of 50 data sources per node. All shards must have identical table schemas.
-
Create each sharded database as a standard data source. For details, see Data Source Configuration.

-
In the codeless UI, go to the Source section and click + Edit Source to add multiple data sources. For details, see Codeless UI configuration.

-
Add the standard data sources to the Selected Data Sources list and click OK.

-
Click Next.
-
For each data source, select the tables to synchronize. Click Save and Publish, and complete the remaining steps.
ImportantBy default, Data Sources for Sharded Databases Use Same Account And Password is selected, which means all sharded data sources use the account and password of the first data source. If your sharded databases use different credentials, deselect this option so that each data source uses its own account and password.

Method 3: Code editor
Use this method when you need to configure sharding across databases that include SQL Server, Oracle, PostgreSQL, DM, or DB2, or when you need custom JSON configuration.
Supported databases: MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, PostgreSQL, DM, DB2, OceanBase
-
Switch to the code editor and configure the sync task. For details, see Code editor configuration. The following example configures sharding across two data sources. Add a
connectionentry for each sharded database.ImportantDelete all inline comments before running the task. The JSON parser does not support
//comments.Parameter Description connectionArray of shard configurations. Add one entry per data source. datasourceName of the data source as registered in DataWorks tableList of table names in this data source to include in the sync useSpecialSecretSet to trueso each data source uses its own credentialssplitPkThe column used to split reads across concurrent tasks. Use the primary key for best performance. encodingCharacter encoding of the source data { "type": "job", "version": "2.0", "steps": [ { "stepType": "mysql", "parameter": { "envType": 0, "column": [ "id", "name" ], "socketTimeout": 3600000, "tableComment": "", "connection": [ { "datasource": "datasourceName1", "table": [ "tb1" ] }, { "datasource": "datasourceName2", "table": [ "tb2", "tb3" ] } ], "useSpecialSecret": true, "where": "", "splitPk": "id", "encoding": "UTF-8" }, "name": "Reader", "category": "reader" }, { "stepType": "odps", "parameter": { "partition": "pt=${bizdate}", "truncate": true, "datasource": "odpsname", "envType": 0, "isSupportThreeModel": false, "column": [ "id", "name" ], "emptyAsNull": false, "tableComment": "", "table": "t1", "consistencyCommit": false }, "name": "Writer", "category": "writer" } ], "setting": { "executeMode": null, "errorLimit": { "record": "" }, "speed": { "concurrent": 2, "throttle": false } }, "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] } }Key parameters for the Reader configuration: