All Products
Search
Document Center

DataWorks:Scenario: Configure a batch synchronization node to synchronize data from tables in sharded databases

Last Updated:Dec 12, 2023

DataWorks allows you to configure a batch synchronization node by using the codeless user interface (UI) or code editor. You can configure a batch synchronization node to synchronize data from tables in sharded databases to a single table. In this example, a batch synchronization node is configured to synchronize data from tables in sharded MySQL databases.

Prerequisites

The data sources from which you want to read data and to which you want to write data are prepared. For more information, see Supported data source types, Reader plug-ins, and Writer plug-ins.

Background information

  • Synchronization of data from tables in sharded databases requires that all source tables have the same table schema. When you configure a batch synchronization node to synchronize data from tables in sharded databases, mappings between the source fields and destination fields are displayed based on the first source table.

  • You can configure a batch synchronization node to synchronize data from tables in sharded databases of data source types, such as MySQL, PolarDB, AnalyticDB, SQL Server, Oracle, and PostgreSQL. You can use the codeless UI to configure a batch synchronization node to synchronize data from tables in sharded databases in a MySQL, PolarDB, or AnalyticDB data source.

Procedure

  1. Create a batch synchronization node to synchronize data from tables in sharded MySQL databases.

  2. Configure the batch synchronization node.

    • If you configure the batch synchronization node by using the codeless UI, click Add sub database and table to add databases and tables when you configure the source. For information about how to configure a batch synchronization node by using the codeless UI, see Configure a batch synchronization node by using the codeless UI.

      Note

      You can use the codeless UI to configure a batch synchronization node to synchronize data from tables in only sharded MySQL databases. If you want to configure a batch synchronization node to synchronize data from tables in other types of sharded databases, use the code editor.

    • If you want to configure the batch synchronization node by using the code editor, refer to the following sample code. For more information about how to configure a batch synchronization node by using the code editor, see Configure a batch synchronization node by using the code editor.

      Important

      Delete the comments from the following sample code before you run the code.

      {
          "type":"job",
          "version":"2.0",
          "steps":[
              {
                  "stepType":"mysql",
                  "parameter":{
                      "envType":0,
                      "column":[
                          "id",
                          "name"
                      ],
                      "socketTimeout":3600000,
                      "tableComment":"",
                      "connection":[    // The name of the source.
                          {
                              "datasource": "datasourceName1"  // The name of the first sharded MySQL database.
                              "table":[           // The names of the source tables in the first sharded MySQL database.
                                  "tb1"
                              ]
                          },
                          {
                              "datasource": "datasourceName2" // The name of the second sharded MySQL database.
                              "table":[          // The names of the source tables in the second sharded MySQL database.
                                  "tb2",
                                  "tb3"
                              ]
                          }
                      ],
                      "useSpecialSecret": true, // Indicates whether to use the password dedicated to each MySQL database.
                      "where":"",
                      "splitPk":"id",
                      "encoding":"UTF-8"
                      },
                  "name":"Reader",
                  "category":"reader"
                  },
              {
                  "stepType":"odps",
                  "parameter":{
                      "partition":"pt=${bizdate}",
                      "truncate":true,
                      "datasource":"odpsname",
                      "envType":0,
                      "isSupportThreeModel":false,
                      "column":[
                          "id",
                          "name"
                      ],
                      "emptyAsNull":false,
                      "tableComment":"",
                      "table":"t1",
                      "consistencyCommit":false
                      },
                  "name":"Writer",
                  "category":"writer"
                  }
          ],
          "setting":{
              "executeMode":null,
              "errorLimit":{
                  "record":""
                  },
              "speed":{
                  "concurrent":2,
                  "throttle":false
                  }
          },
          "order":{
              "hops":[
                  {
                      "from":"Reader",
                      "to":"Writer"
                      }
              ]
          }
      }
  3. Commit and deploy the batch synchronization node.