After you configure network environments and resources and add data sources to DataWorks, you can create and run a real-time synchronization node to synchronize data between the data sources. This topic describes how to create a real-time synchronization node to synchronize incremental data from a single table and view the status of the node.
Prerequisites
- The required data sources are configured. Before you configure a real-time synchronization node, you must configure the data sources that you want to use. This way, you can select the data sources when you configure the real-time synchronization node. For more information about the data source types, readers, and writers that are supported by real-time synchronization, see Data source types that support real-time synchronization.
Note: For more information about the configurations of data sources, see Overview.
- An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
- The data sources are connected to the exclusive resource group for Data Integration. For more information, see Establish a network connection between a resource group and a data source.
- The settings that are required to prepare databases are configured. You must create an account that can be used to access the source database and an account that can be used to access the destination database. You must also grant the accounts the permissions required to perform specific operations on the databases based on your configurations for data synchronization. For more information, see Overview.
Background information
You can create a real-time synchronization node to synchronize data to only a single table. If you want to synchronize data to multiple tables, you can select one of the following solutions based on your business requirements:
- If you want to filter data, replace strings, or mask data during data synchronization, you can create multiple real-time synchronization nodes and use each of the nodes to synchronize data from a single table in real time.
- If you want to synchronize data from multiple source tables to multiple destination tables, you can create multiple real-time synchronization nodes. For specific data sources, you can also create a real-time synchronization node to synchronize all incremental data from a database. For more information, see Configure a real-time synchronization node to synchronize all incremental data from a database.
- If you want to synchronize full data and then synchronize incremental data to the destination, you can select a synchronization solution based on your business requirements. For more information, see Synchronization solutions.
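The idea of a full synchronization followed by incremental synchronization can be illustrated with a minimal sketch. It is not how DataWorks implements synchronization. The change-record shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions made for illustration only.

```python
from typing import Any

# Hypothetical change-record shape, assumed for this sketch only:
#   {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}


def apply_changes(snapshot: dict[Any, dict], changes: list[dict]) -> dict[Any, dict]:
    """Return a copy of the full-load snapshot with incremental changes applied in order."""
    # Copy the snapshot so that the original full load is left untouched.
    table = {key: dict(row) for key, row in snapshot.items()}
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            table[key] = dict(change["row"])
        elif op == "delete":
            table.pop(key, None)  # Deleting a missing key is treated as a no-op.
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


# Full data that is synchronized first.
full_load = {1: {"name": "alice"}, 2: {"name": "bob"}}

# Incremental changes that arrive after the full synchronization.
incremental = [
    {"op": "update", "key": 2, "row": {"name": "robert"}},
    {"op": "insert", "key": 3, "row": {"name": "carol"}},
    {"op": "delete", "key": 1},
]

destination = apply_changes(full_load, incremental)
```

In this sketch, the destination ends up with rows 2 and 3 only: row 2 carries the updated value, and row 1 is removed by the delete record.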
Go to the DataStudio page
You must go to the DataStudio page to create and configure a real-time synchronization node.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region in which the desired workspace resides. On the Workspaces page, find the workspace and click DataStudio in the Actions column.
Procedure
Step 1: Create a real-time synchronization node
- Create a workflow. For more information, see Manage workflows.
- Create a real-time synchronization node.
Step 2: Configure a resource group
You can use only exclusive resource groups for Data Integration to run real-time synchronization nodes. To configure a resource group, perform the following operations: Double-click the name of the created node. In the right-side navigation pane of the configuration tab of the node, click the Basic Configuration tab. On the Basic Configuration tab, select the exclusive resource group for Data Integration that is connected to the data source from the Resource Group drop-down list.

Note: We recommend that you run real-time synchronization nodes and batch synchronization nodes on different resource groups. If a real-time synchronization node and a batch synchronization node run on the same resource group, the two nodes compete for CPU, memory, and network resources. As a result, the batch synchronization node may slow down, or the real-time synchronization node may be delayed. In the worst case, out-of-memory (OOM) errors may occur due to insufficient resources.
Step 3: Configure the real-time synchronization node
- Configure the source.
- Optional: Configure a data conversion method.
If you want to convert data types during synchronization, you can configure a data conversion method.
- Configure the destination.
- Connect the source to the destination.
After the source and destination are added, you can connect them by drawing lines. This way, data can be synchronized between the data sources based on the configurations.
- Scenario 1: Synchronize data in real time but do not convert data types.
Note: In this scenario, the source is connected directly to the destination. Data is synchronized from a MySQL data source to a MaxCompute data source.
- Scenario 2: Synchronize data in real time and convert data types during data synchronization.
You can drag the required data conversion component from the Conversion section to the canvas and place it between the source and the destination. Then, connect the data conversion component to the source and the destination by drawing lines.
- Example 1: Data in the MySQL data source is filtered and then synchronized to the MaxCompute data source.
- Example 2: Data in the MySQL data source is masked and then synchronized to the MaxCompute data source.
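The effect of the filtering and masking conversions in Example 1 and Example 2 can be sketched in a few lines of code. The sketch only illustrates the idea and does not reflect the configuration or internals of the conversion components in DataWorks. The record fields (`id`, `status`, `phone`) and the function names `filter_records` and `mask_field` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, object]


def filter_records(
    records: Iterable[Record], predicate: Callable[[Record], bool]
) -> Iterator[Record]:
    """Pass through only the records that satisfy the predicate (Example 1)."""
    return (record for record in records if predicate(record))


def mask_field(records: Iterable[Record], field: str, keep: int = 4) -> Iterator[Record]:
    """Replace all but the last `keep` characters of a field with '*' (Example 2)."""
    for record in records:
        if field not in record:
            yield dict(record)  # Nothing to mask; pass a copy through unchanged.
            continue
        value = str(record[field])
        hidden = max(len(value) - keep, 0)
        yield {**record, field: "*" * hidden + value[hidden:]}


# Hypothetical rows read from the source table.
source_rows = [
    {"id": 1, "status": "paid", "phone": "13800001234"},
    {"id": 2, "status": "cancelled", "phone": "13900005678"},
]

# Example 1: keep only the rows whose status is "paid".
filtered = list(filter_records(source_rows, lambda row: row["status"] == "paid"))

# Example 2: mask the phone number in every row.
masked = list(mask_field(source_rows, "phone"))
```

Both functions return new records instead of modifying the input, which mirrors a conversion component that sits between the source and the destination and leaves the source data untouched.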
Step 4: Commit and deploy the real-time synchronization node
- Click the save icon in the top toolbar to save the node.
- Click the commit icon in the top toolbar to commit the node.
- In the Commit Node dialog box, configure the Change description parameter.
- Click Confirm.
If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. On the left side of the top navigation bar, click Deploy. For more information, see Deploy nodes.