DataWorks: Configure an offline sharding synchronization task

Last Updated: Jan 13, 2026

In DataWorks, you can configure offline sync tasks in the codeless UI or code editor to synchronize data from sharded databases and tables to a single destination table. This topic uses an example to demonstrate data synchronization from sharded MySQL databases and tables.

Prerequisites

Prepare the data sources that you want to synchronize. For more information, see Supported data sources and read/write plugins.

Overview

You can use one of the following methods to create an offline sync task for sharded databases and tables. The following table compares these methods.

| Difference | Sharded data source + codeless UI (recommended) | Standard data source + codeless UI | Code editor |
| --- | --- | --- | --- |
| Codeless UI support | Yes | Yes | No |
| Table name configuration by rule | You can configure source table names with regular expressions. At runtime, the task searches for and synchronizes matching tables based on the regular expression. | Not supported. | You can configure table names with a numeric range, such as tb_[1-10]. The range must be continuous, and each child table must exist. |
| Requires identical table schemas | You can configure a missing field policy to allow some tables to have missing fields. These fields are then set to NULL. | Requires identical table schemas. | Requires identical table schemas. |
| Field mapping reference | The first matching table in the metadata source of the sharded data source. | The first table of the first data source. | Manually configured in the code editor. |
| Number of supported data sources | A sharded data source can reference a maximum of 5,000 data sources. | You can configure a maximum of 50 data sources for a single node. | You can configure a maximum of 50 data sources for a single node. |
| Requires node modification and publishing to add a data source | You do not need to modify the node. After you modify the data source, the changes take effect on new instances. | You must modify the task to add the data source and configure its table name. | You must modify the task to add the data source and configure its table name. |
| Supported data source types | MySQL, PolarDB, PolarDB-O, OceanBase | MySQL, PolarDB, AnalyticDB, OceanBase | MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, PostgreSQL, DM, DB2, OceanBase |
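
The two rule-based naming options behave differently at run time. The following Python sketch illustrates the idea outside DataWorks: a regular expression selects whichever existing tables match, whereas a numeric range such as tb_[1-10] expands to a fixed list in which every table must exist. The function names and sample table names are hypothetical. The sketch assumes full-name matching and a range that is inclusive at both ends; DataWorks may resolve names differently.

```python
import re


def match_tables(pattern, existing_tables):
    """Return the existing tables whose full name matches the pattern."""
    compiled = re.compile(pattern)
    return [name for name in existing_tables if compiled.fullmatch(name)]


def expand_numeric_range(spec):
    """Expand a spec such as 'tb_[1-3]' into ['tb_1', 'tb_2', 'tb_3'].

    Assumption: the range is inclusive at both ends.
    A spec without a range is returned unchanged as a one-item list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if match is None:
        return [spec]
    prefix, start, end, suffix = match.groups()
    first, last = int(start), int(end)
    if first > last:
        raise ValueError(f"empty range in {spec!r}")
    return [f"{prefix}{number}{suffix}" for number in range(first, last + 1)]


def missing_tables(spec, existing_tables):
    """List the tables that the range requires but that do not exist."""
    existing = set(existing_tables)
    return [name for name in expand_numeric_range(spec) if name not in existing]


# Hypothetical table names in one sharded database.
existing = ["tb_1", "tb_2", "tb_3", "tb_5", "orders"]

print(match_tables(r"tb_\d+", existing))     # ['tb_1', 'tb_2', 'tb_3', 'tb_5']
print(missing_tables("tb_[1-5]", existing))  # ['tb_4']
```

In this sample, the regular expression tolerates the gap at tb_4, while the numeric range reports it as a missing table. This mirrors the requirement that a range be continuous.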

Sharded data source + codeless UI

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the navigation pane on the left, click Data Source and then click Add Data Source. Select a data source type. This method supports MySQL, PolarDB, and PolarDB-O.

  3. Create each sharded database as a standard data source. This topic uses MySQL as an example. For more information, see Data Source Configuration.

  4. Create a sharded data source that merges the standard data sources into a single source for data synchronization.

    1. Click Add Data Source and select Sharding.

    2. Select a sharded data source type and configure the parameters. This topic uses MySQL (Sharding) as an example.

      Key parameter descriptions:

      • Data Source Name: Enter a custom name for the sharded data source.

      • Select Data Source: Select the standard data sources that you created for each sharded database.

      • Meta Data Source: For sharded database and table synchronization, the schemas of all databases and tables across all data sources must be identical. Select a data source to use as the template for the default metadata. This allows the sync task to pull the default database and table schemas during configuration.

        Important

        If the database and table schemas within the data sources are not identical, the sync task fails.

  5. Create an offline sync node.

  6. Configure the sharded database and table sync task.

    In the codeless UI, set the Data Source parameter to MySQL (Sharding) and select an existing sharded data source. For more information about how to configure the task, see Codeless UI Configuration.

    Note

    This topic uses MySQL (Sharding) as an example. You can select a data source type as needed.

  7. Click Next.

  8. Select the tables that you want to synchronize, click Save and Publish, and then complete the subsequent steps.
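
Because the sharded tables are expected to share one schema, it can help to compare each shard against the metadata template before you publish the task. The following Python sketch shows one way to do this, assuming that you have already exported each table's columns as a dictionary that maps column names to types. The function name, shard names, and columns are hypothetical and are not part of DataWorks.

```python
def schema_differences(template, shards):
    """Compare each shard's columns with the template schema.

    template: dict of column name -> type (the metadata template).
    shards:   dict of shard table name -> dict of column name -> type.
    Returns a dict that contains only the shards that differ.
    """
    report = {}
    for table, columns in shards.items():
        missing = sorted(set(template) - set(columns))
        extra = sorted(set(columns) - set(template))
        mismatched = sorted(
            name
            for name in set(template) & set(columns)
            if template[name] != columns[name]
        )
        if missing or extra or mismatched:
            report[table] = {
                "missing": missing,
                "extra": extra,
                "type_mismatch": mismatched,
            }
    return report


# Hypothetical schemas exported from three sharded tables.
template = {"id": "bigint", "name": "varchar(64)"}
shards = {
    "db1.tb1": {"id": "bigint", "name": "varchar(64)"},
    "db2.tb2": {"id": "bigint"},
    "db2.tb3": {"id": "int", "name": "varchar(64)", "note": "text"},
}

for table, diff in schema_differences(template, shards).items():
    print(table, diff)
```

In this sample, db1.tb1 matches the template and is not reported, db2.tb2 is missing the name column, and db2.tb3 has an extra column and a type mismatch on id.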

Standard data source + codeless UI

  1. Create each sharded database as a standard data source. This topic uses MySQL as an example. For more information, see Data Source Configuration.

  2. Create an offline sync node.

  3. Configure the sharded database and table sync task.

    In the Source section of the codeless UI, click + Edit Source to add multiple data sources. For more information about how to configure the task, see Configure in codeless UI.

  4. Add the standard data sources to the Selected Data Sources list and click OK.

  5. Click Next.

  6. For each data source, select the tables that you want to synchronize. Click Save and Publish, and then complete the subsequent steps.

    Important

    By default, the Data Sources for Sharded Databases Use Same Account And Password option is selected, so all data sources access their databases with the account and password that are configured for the first data source. If your sharded databases use different accounts and passwords, deselect this option. Each data source then uses the account and password that are configured in that data source.
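
The effect of this option can be pictured with a small Python sketch. The DataSource class and the effective_credentials function are hypothetical and only model the behavior described above; they are not DataWorks APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a standard data source."""
    name: str
    username: str
    password: str


def effective_credentials(sources, use_same_account=True):
    """Return {data source name: (username, password)} used at run time.

    With use_same_account=True, every data source reuses the credentials
    of the first data source in the list. Otherwise, each data source
    keeps its own credentials.
    """
    if not sources:
        return {}
    if use_same_account:
        first = sources[0]
        return {s.name: (first.username, first.password) for s in sources}
    return {s.name: (s.username, s.password) for s in sources}


# Hypothetical data sources with different accounts.
sources = [
    DataSource("shard_db_1", "sync_user_a", "secret-a"),
    DataSource("shard_db_2", "sync_user_b", "secret-b"),
]

print(effective_credentials(sources)["shard_db_2"][0])         # sync_user_a
print(effective_credentials(sources, False)["shard_db_2"][0])  # sync_user_b
```

In the first call, shard_db_2 would be accessed with the account of shard_db_1. If that account does not exist in the second database, this is the situation in which you deselect the option.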

Code editor

  1. Create an offline sync node.

  2. Configure the sharded database and table sync task.

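    The following example script configures a sync task for sharded databases and tables in the code editor. The reader lists one connection entry for each sharded database, and the writer targets a single MaxCompute (odps) table. For more information, see Code editor configuration.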

    Important

    Standard JSON does not allow comments. Before you run the script, delete every // comment.

    {
        "type":"job",
        "version":"2.0",
        "steps":[
            {
                "stepType":"mysql",
                "parameter":{
                    "envType":0,
                    "column":[
                        "id",
                        "name"
                    ],
                    "socketTimeout":3600000,
                    "tableComment":"",
                    "connection":[    // Configure the connection based on the number of sharded databases.
                        {
                            "datasource":"datasourceName1",  // Data source 1 for sharding
                            "table":[           // Table list 1 for sharding
                                "tb1"
                            ]
                        },
                        {
                            "datasource":"datasourceName2", // Data source 2 for sharding
                            "table":[          // Table list 2 for sharding
                                "tb2",
                                "tb3"
                            ]
                        }
                    ],
                    "useSpecialSecret":true,// Each data source uses its own password.
                    "where":"",
                    "splitPk":"id",
                    "encoding":"UTF-8"
                },
                "name":"Reader",
                "category":"reader"
            },
            {
                "stepType":"odps",
                "parameter":{
                    "partition":"pt=${bizdate}",
                    "truncate":true,
                    "datasource":"odpsname",
                    "envType":0,
                    "isSupportThreeModel":false,
                    "column":[
                        "id",
                        "name"
                    ],
                    "emptyAsNull":false,
                    "tableComment":"",
                    "table":"t1",
                    "consistencyCommit":false
                },
                "name":"Writer",
                "category":"writer"
            }
        ],
        "setting":{
            "executeMode":null,
            "errorLimit":{
                "record":""
            },
            "speed":{
                "concurrent":2,
                "throttle":false
            }
        },
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        }
    }
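
Deleting the comments by hand is error-prone for long scripts. The following Python sketch is one possible helper, not a DataWorks tool: it removes // comments that appear outside string literals and then checks that the result parses as JSON. The function name strip_line_comments and the shortened sample script are assumptions for illustration. The where value in the sample is made up to show that // inside a string is preserved.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line and keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Shortened, hypothetical excerpt in the style of the script above.
sample = """
{
    "connection":[    // Configure the connection based on the number of sharded databases.
        {
            "datasource":"datasourceName1",  // Data source 1 for sharding
            "table":["tb1"]
        }
    ],
    "useSpecialSecret":true,// Each data source uses its own password.
    "where":"path = 'a//b'"
}
"""

cleaned = strip_line_comments(sample)
parsed = json.loads(cleaned)  # raises json.JSONDecodeError if still invalid
print(parsed["connection"][0]["datasource"])  # datasourceName1
print(parsed["where"])                        # path = 'a//b'
```

A plain regular expression that deletes everything after // would also truncate string values that contain //, which is why the sketch tracks whether it is inside a string.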