After you configure network environments and resources and add data sources to DataWorks, you can create and run a real-time synchronization node to synchronize data between the data sources. This topic describes how to create a real-time synchronization node to synchronize incremental data from a single table and view the status of the node.

Prerequisites

  1. The required data sources are configured. Before you configure a real-time synchronization node, you must configure the data sources that you want to use. This way, you can select the data sources when you configure the real-time synchronization node. For more information about the data source types, readers, and writers that are supported by real-time synchronization, see Data source types that support real-time synchronization.
    Note: For more information about the configurations of data sources, see Overview.
  2. An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
  3. The data sources are connected to the exclusive resource group for Data Integration. For more information, see Establish a network connection between a resource group and a data source.
  4. The settings that are required to prepare databases are configured. You must create an account that can be used to access the source database and an account that can be used to access the destination database. You must also grant the accounts the permissions required to perform specific operations on the databases based on your configurations for data synchronization. For more information, see Overview.

Background information

A real-time synchronization node that is created by following this topic synchronizes data to only a single table. If you want to synchronize data to multiple tables, select one of the following solutions based on your business requirements:
  • If you want to filter data, replace strings, or mask data during data synchronization, you can create multiple real-time synchronization nodes and use each of the nodes to synchronize data from a single table in real time.
  • If you want to synchronize data from multiple source tables to multiple destination tables, you can create multiple real-time synchronization nodes. For specific data sources, you can also create a real-time synchronization node to synchronize all incremental data from a database. For more information, see Configure a real-time synchronization node to synchronize all incremental data from a database.
  • If you want to synchronize full data and then synchronize incremental data to the destination, you can select a synchronization solution based on your business requirements. For more information, see Synchronization solutions.
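The choice among the preceding solutions can be summarized as a small decision function. The following Python sketch is illustrative only: the function name, the parameter names, the returned labels, and the order in which the conditions are checked are assumptions made for this example and are not part of DataWorks.

```python
def suggest_sync_approach(needs_conversion: bool,
                          full_then_incremental: bool,
                          source_supports_database_sync: bool) -> str:
    """Return a short label for one of the solutions listed above.

    The order of the checks is an assumption made for this sketch.
    """
    if full_then_incremental:
        # Full data first, then incremental data: use a synchronization solution.
        return "synchronization solution"
    if needs_conversion:
        # Filtering, string replacement, or masking: one single-table node per table.
        return "one single-table node per table"
    if source_supports_database_sync:
        # Only specific data sources support database-level incremental synchronization.
        return "database-level incremental node"
    # Default: create one single-table real-time node for each table.
    return "one single-table node per table"


if __name__ == "__main__":
    print(suggest_sync_approach(needs_conversion=True,
                                full_then_incremental=False,
                                source_supports_database_sync=True))
```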

Go to the DataStudio page

You must go to the DataStudio page to create and configure a real-time synchronization node.

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region in which the desired workspace resides. On the Workspaces page, find the workspace and click DataStudio in the Actions column.

Procedure

  1. Step 1: Create a real-time synchronization node
  2. Step 2: Configure a resource group
  3. Step 3: Configure the real-time synchronization node
  4. Step 4: Commit and deploy the real-time synchronization node

Step 1: Create a real-time synchronization node

  1. Create a workflow. For more information, see Manage workflows.
  2. Create a real-time synchronization node.
    1. You can use one of the following methods to create a real-time synchronization node:
      • Method 1: In the Scheduled Workflow pane of the DataStudio page, find the desired workflow in the Business Flow section and click the name of the workflow. Then, right-click Data Integration and choose Create Node > Real-time synchronization.
      • Method 2: In the Scheduled Workflow pane of the DataStudio page, find the desired workflow in the Business Flow section and double-click the name of the workflow. In the Data Integration section of the workflow configuration tab that appears, drag Real-time synchronization to the canvas on the right.
    2. In the Create Node dialog box, select End-to-end ETL from the Sync Method drop-down list and configure other parameters based on your business requirements.

Step 2: Configure a resource group

You can use only exclusive resource groups for Data Integration to run real-time synchronization nodes. To configure a resource group, perform the following operations:

  1. Double-click the name of the created node.
  2. In the right-side navigation pane of the configuration tab of the node, click the Basic Configuration tab.
  3. On the Basic Configuration tab, select the exclusive resource group for Data Integration that is connected to the data source from the Resource Group drop-down list.
Note: We recommend that you run real-time synchronization nodes and batch synchronization nodes on different resource groups. If a real-time synchronization node and a batch synchronization node run on the same resource group, the two nodes compete for CPU, memory, and network resources. In this case, the batch synchronization node may slow down, or the real-time synchronization node may be delayed. In the worst case, out-of-memory (OOM) errors may occur due to insufficient resources.
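To review whether this recommendation is followed, you can list your nodes together with their resource groups and look for groups that run both node types. The following Python sketch assumes that you maintain such a list yourself. The tuple layout, the node type labels "realtime" and "batch", and the sample names are hypothetical and do not come from a DataWorks API.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# (node name, node type, resource group name); all values are hypothetical.
NodeAssignment = Tuple[str, str, str]


def shared_resource_groups(nodes: Iterable[NodeAssignment]) -> List[str]:
    """Return the resource groups that run both real-time and batch nodes."""
    types_by_group = defaultdict(set)
    for _name, node_type, group in nodes:
        types_by_group[group].add(node_type)
    # A group is flagged only if it hosts both node types.
    return sorted(group for group, types in types_by_group.items()
                  if {"realtime", "batch"} <= types)


if __name__ == "__main__":
    assignments = [
        ("orders_rt", "realtime", "group_a"),
        ("orders_daily", "batch", "group_a"),
        ("users_rt", "realtime", "group_b"),
    ]
    print(shared_resource_groups(assignments))
```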

Step 3: Configure the real-time synchronization node

  1. Configure the source.
    1. In the Input section of the configuration tab of the real-time synchronization node, drag the source to the canvas on the right.
    2. Click the source and configure the parameters in the panel that appears.
      The following topics describe the data sources that can be used as the sources for real-time single-table synchronization and the configurations required for these data sources:
  2. Optional: Configure a data conversion method.
    If you want to process data during synchronization, for example, to filter data, replace strings, or mask data, you can configure a data conversion method.
    1. In the Conversion section of the configuration tab of the real-time synchronization node, drag the required data conversion component to the canvas on the right.
    2. Click the component and configure the parameters in the panel that appears.
      The following topics describe the supported data conversion methods and the configurations required for these methods:
  3. Configure the destination.
    1. In the Output section of the configuration tab of the real-time synchronization node, drag the destination to the canvas on the right.
    2. Click the destination and configure the parameters in the panel that appears.
      The following topics describe the data sources that can be used as the destinations for real-time single-table synchronization and the configurations required for these data sources:
  4. Connect the source to the destination.
    After the source and destination are added, you can connect them by drawing lines. This way, data can be synchronized between the data sources based on the configurations.
    • Scenario 1: Synchronize data in real time but do not convert data types.
      Note: In this scenario, the source is connected directly to the destination. For example, data is synchronized from a MySQL data source to a MaxCompute data source.
    • Scenario 2: Synchronize data in real time and convert data types during data synchronization.
      Drag the required data conversion component from the Conversion section to the canvas and place it between the source and the destination. Then, draw lines to connect the source to the data conversion component and the data conversion component to the destination.
      • Example 1: Data in the MySQL data source is filtered and then synchronized to the MaxCompute data source.
      • Example 2: Data in the MySQL data source is masked and then synchronized to the MaxCompute data source.
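The source, conversion, and destination components form a pipeline: each change record flows from the source through zero or more conversion steps and then reaches the destination. The following Python sketch models this flow conceptually. It is not DataWorks code. The record layout, the helper names (keep_if, mask_field, run_pipeline), and the masking rule are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# A conversion step returns the (possibly changed) record, or None to drop it.
Conversion = Callable[[Record], Optional[Record]]


def keep_if(predicate: Callable[[Record], bool]) -> Conversion:
    """Build a filter step that keeps only the records matching the predicate."""
    def step(record: Record) -> Optional[Record]:
        return record if predicate(record) else None
    return step


def mask_field(field: str, visible: int = 2) -> Conversion:
    """Build a masking step that hides all but the first `visible` characters."""
    def step(record: Record) -> Optional[Record]:
        value = str(record[field])
        masked = value[:visible] + "*" * max(len(value) - visible, 0)
        return {**record, field: masked}
    return step


def run_pipeline(source: Iterable[Record],
                 conversions: List[Conversion],
                 destination: List[Record]) -> None:
    """Pass each source record through the conversions and write it to the destination."""
    for record in source:
        current: Optional[Record] = record
        for convert in conversions:
            current = convert(current)
            if current is None:
                break  # The record was filtered out.
        if current is not None:
            destination.append(current)


if __name__ == "__main__":
    changes = [
        {"id": 1, "phone": "13800001111", "amount": 50},
        {"id": 2, "phone": "13900002222", "amount": 5},
    ]
    written: List[Record] = []
    run_pipeline(changes,
                 [keep_if(lambda r: r["amount"] >= 10), mask_field("phone")],
                 written)
    print(written)
```

An empty list of conversion steps corresponds to Scenario 1, and one or more steps placed between the source and the destination correspond to Scenario 2.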

Step 4: Commit and deploy the real-time synchronization node

  1. Click the Save icon in the top toolbar to save the node.
  2. Click the Submit icon in the top toolbar to commit the node.
  3. In the Commit Node dialog box, configure the Change description parameter.
  4. Click Confirm.
    If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. On the left side of the top navigation bar, click Deploy. For more information, see Deploy nodes.

What to do next

After the real-time synchronization node is configured, you can start and manage the node on the Real Time DI page in Operation Center. To go to the Real Time DI page, perform the following operations:

  1. Log on to the DataWorks console and go to the Operation Center page.
  2. In the left-side navigation pane of the Operation Center page, choose RealTime Task > RealTime DI.

For more information, see Operations for real-time synchronization nodes.