All Products
Search
Document Center

DataWorks:Configure a batch sharding synchronization task

Last Updated:Mar 26, 2026

Synchronize data from tables in sharded databases

DataWorks offline sync tasks can read from multiple sharded databases and merge all records into a single destination table. This topic walks through three methods for configuring this, using MySQL sharded databases as the example.

Prerequisites

Before you begin, ensure that you have:

Choose a method

Sharded data source + codeless UI Standard data source + codeless UI Code editor
When to use Regex-based table matching; schemas may differ across shards; more than 50 data sources; need to add shards without republishing the task Standard MySQL/PolarDB/AnalyticDB sources already configured; fewer than 50 shards; identical schemas across all shards Broad database support (SQL Server, Oracle, PostgreSQL, DM, DB2); custom JSON configuration required
Codeless UI Yes Yes No
Table name configuration Regular expression — DataWorks searches for matching tables at runtime Not supported Numeric range, e.g. tb_[1-10]; range must be continuous; each child table must exist
Schema requirement Schemas can differ; configure a missing field policy to set missing fields to NULL All shards must have identical schemas Manually configured in code
Field mapping reference First matching table in the Meta Data Source First table of the first data source Defined manually in the code editor
Max data sources 5,000 50
Adding a shard No task republish needed; changes take effect on new instances Must modify and republish the task
Supported databases MySQL, PolarDB, PolarDB-O, OceanBase MySQL, PolarDB, AnalyticDB, OceanBase MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, PostgreSQL, DM, DB2, OceanBase

Method 1: Sharded data source + codeless UI (recommended)

This method supports regex-based table matching and scales to 5,000 data sources. Adding a new shard does not require republishing the task.

Supported databases: MySQL, PolarDB, PolarDB-O, OceanBase

  1. Go to the Data Integration page. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the target workspace from the drop-down list and click Go to Data Integration.

  2. In the left navigation pane, click Data Source, then click Add Data Source. Select a data source type. This step supports MySQL, PolarDB, and PolarDB-O. Create each sharded database as a standard data source. This topic uses MySQL as an example. For details, see Data Source Configuration.

    image

  3. Create a Sharding data source to merge the individual data sources into a single logical source.

    1. Click Add Data Source and select Sharding.

    2. Select a sharding data source type and configure the parameters. This topic uses MySQL (Sharding) as an example.

      image

      Key parameters:

      Parameter Description
      Data Source Name A name for the sharding data source
      Select Data Source The standard data sources you created for each sharded database
      Meta Data Source The data source used as the schema template. DataWorks pulls the default table schema from this source during task configuration.
      Important

      If the table schemas across data sources are not identical, the sync task fails.

  4. Create an offline sync node.

  5. In the codeless UI, set Data Source to MySQL (Sharding) and select the sharding data source you created. For full configuration details, see Codeless UI configuration.

    image

  6. Click Next.

  7. Select the tables to synchronize, click Save and Publish, and complete the remaining steps.

    image

Method 2: Standard data source + codeless UI

Use this method when your shards are already registered as standard data sources and all shards have identical table schemas.

Supported databases: MySQL, PolarDB, AnalyticDB, OceanBase

Important

This method supports a maximum of 50 data sources per node. All shards must have identical table schemas.

  1. Create each sharded database as a standard data source. For details, see Data Source Configuration.

    image

  2. Create an offline sync node.

  3. In the codeless UI, go to the Source section and click + Edit Source to add multiple data sources. For details, see Codeless UI configuration.

    image

  4. Add the standard data sources to the Selected Data Sources list and click OK.

    image

  5. Click Next.

  6. For each data source, select the tables to synchronize. Click Save and Publish, and complete the remaining steps.

    Important

    By default, Data Sources for Sharded Databases Use Same Account And Password is selected, which means all sharded data sources use the account and password of the first data source. If your sharded databases use different credentials, deselect this option so that each data source uses its own account and password.

    image

Method 3: Code editor

Use this method when you need to configure sharding across databases that include SQL Server, Oracle, PostgreSQL, DM, or DB2, or when you need custom JSON configuration.

Supported databases: MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, PostgreSQL, DM, DB2, OceanBase

  1. Create an offline sync node.

  2. Switch to the code editor and configure the sync task. For details, see Code editor configuration. The following example configures sharding across two data sources. Add a connection entry for each sharded database.

    Important

    Delete all inline comments before running the task. The JSON parser does not support // comments.

    Parameter Description
    connection Array of shard configurations. Add one entry per data source.
    datasource Name of the data source as registered in DataWorks
    table List of table names in this data source to include in the sync
    useSpecialSecret Set to true so each data source uses its own credentials
    splitPk The column used to split reads across concurrent tasks. Use the primary key for best performance.
    encoding Character encoding of the source data
    {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "mysql",
                "parameter": {
                    "envType": 0,
                    "column": [
                        "id",
                        "name"
                    ],
                    "socketTimeout": 3600000,
                    "tableComment": "",
                    "connection": [
                        {
                            "datasource": "datasourceName1",
                            "table": [
                                "tb1"
                            ]
                        },
                        {
                            "datasource": "datasourceName2",
                            "table": [
                                "tb2",
                                "tb3"
                            ]
                        }
                    ],
                    "useSpecialSecret": true,
                    "where": "",
                    "splitPk": "id",
                    "encoding": "UTF-8"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "pt=${bizdate}",
                    "truncate": true,
                    "datasource": "odpsname",
                    "envType": 0,
                    "isSupportThreeModel": false,
                    "column": [
                        "id",
                        "name"
                    ],
                    "emptyAsNull": false,
                    "tableComment": "",
                    "table": "t1",
                    "consistencyCommit": false
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "setting": {
            "executeMode": null,
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "concurrent": 2,
                "throttle": false
            }
        },
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        }
    }

    Key parameters for the Reader configuration:

What's next