After you configure network environments and resources and add data sources to DataWorks, you can create and run a real-time data synchronization node to synchronize data between the data sources. This topic describes how to create a real-time data synchronization node and view the running status of the node.

Prerequisites

Before you create a real-time data synchronization node, make sure that the following operations are performed:

Limits

  • Real-time data synchronization nodes support the following data sources and data conversion methods:
    • Sources: MySQL Binlog, DataHub, LogHub, Kafka, PolarDB, and ApsaraDB RDS for SQL Server
    • Destinations: MaxCompute, Hologres, Elasticsearch, DataHub, and Kafka
    • Data conversion methods: data filtering and string replacement
  • The following rules must be observed when you configure a real-time data synchronization node:
    • You can synchronize the data from one or more source tables to a single destination table. If you want to synchronize data to multiple destination tables, you must create a real-time data synchronization node for each destination table.
    • You can synchronize data from multiple tables to a destination table only if you use MySQL Binlog or ApsaraDB RDS for SQL Server as the source. The types and schemas of the source tables must be the same. For example, all the source tables are MySQL Binlog tables.
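
The rules above can be sketched as a small pre-check, shown here only as an illustration. The `SourceTable` class, the `check_node_plan` function, and the representation of a schema as a tuple of (column, type) pairs are hypothetical names introduced for this sketch. They are not part of DataWorks, which enforces these rules itself in the console.

```python
from dataclasses import dataclass

# Values copied from the Limits list in this topic.
SUPPORTED_SOURCES = {
    "MySQL Binlog", "DataHub", "LogHub", "Kafka", "PolarDB",
    "ApsaraDB RDS for SQL Server",
}
SUPPORTED_DESTINATIONS = {"MaxCompute", "Hologres", "Elasticsearch", "DataHub", "Kafka"}
# Only these source types allow several source tables per node.
MULTI_TABLE_SOURCES = {"MySQL Binlog", "ApsaraDB RDS for SQL Server"}


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical description of one source table."""
    source_type: str
    name: str
    schema: tuple  # assumption: ordered (column, type) pairs


def check_node_plan(sources, destination_type):
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if not sources:
        problems.append("at least one source table is required")
    if destination_type not in SUPPORTED_DESTINATIONS:
        problems.append(f"unsupported destination: {destination_type}")
    for table in sources:
        if table.source_type not in SUPPORTED_SOURCES:
            problems.append(f"unsupported source: {table.source_type}")
    if len(sources) > 1:
        types = {t.source_type for t in sources}
        schemas = {t.schema for t in sources}
        if len(types) > 1:
            problems.append("source tables must all have the same type")
        elif not types <= MULTI_TABLE_SOURCES:
            problems.append(
                "multi-table sync requires MySQL Binlog or ApsaraDB RDS for SQL Server"
            )
        if len(schemas) > 1:
            problems.append("source tables must all have the same schema")
    return problems
```

For example, two MySQL Binlog tables with identical schemas pass the check for a MaxCompute destination, whereas two Kafka sources in one node are reported as a violation. A plan that needs two destination tables is modeled as two separate calls, one per node.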

Create a real-time data synchronization node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Select the region in which the required workspace resides, find the workspace, and then click Data Analytics.
  4. Create a workflow.
    If you have a workflow, skip this step.
    1. Move the pointer over the Create icon and select Workflow.
    2. In the Create Workflow dialog box, specify Workflow Name.
    3. Click Create.
  5. Create a real-time data synchronization node.
    1. In the Create Node dialog box, configure the parameters.
      Parameter: Description
      Node Type: The type of the node. Default value: Real-time synchronization.
      Sync Method: Set this parameter to End-to-end ETL. This method is used to synchronize data from one or more source tables to a destination table in real time. You can convert data during synchronization.
        Note
        • You can synchronize the data from one or more source tables to a single destination table. If you want to synchronize data to multiple destination tables, you must create a real-time data synchronization node for each destination table.
        • You can synchronize data from multiple tables to a destination table only if you use MySQL Binlog or ApsaraDB RDS for SQL Server as the source. The types and schemas of the source tables must be the same. For example, all the source tables are MySQL Binlog tables.
      Node Name: The name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
      Location: The directory in which the real-time data synchronization node is stored.
  6. Configure a data source (source).
    1. From the left-side Input section of the editing tab of the real-time data synchronization node, drag the required source component to the configuration canvas on the right.
    2. Click the component and configure the parameters in the panel that appears.
      The following topics describe the data sources that can be used as the sources for real-time single-table synchronization and the configurations required for these data sources:
  7. Optional: Configure a data conversion method.
    If you want to filter data or replace strings during synchronization, you can configure a data conversion method.
    1. From the left-side Conversion section of the editing tab of the real-time data synchronization node, drag the required data conversion method component to the configuration canvas on the right.
    2. Click the component and configure the parameters in the panel that appears.
      The following topics describe the supported data conversion methods and the configurations required for these methods:
  8. Configure a data source (destination).
    1. From the left-side Output section of the editing tab of the real-time data synchronization node, drag the required destination component to the configuration canvas on the right.
    2. Click the component and configure the parameters in the panel that appears.
      The following topics describe the data sources that can be used as the destinations for real-time single-table synchronization and the configurations required for these data sources:
  9. Connect the source component to the destination component.
    After you add the source and destination components, connect them by drawing lines between them. Data is synchronized in the direction of the connections.
    • Example 1: The following figure shows how data is synchronized from MySQL Binlog to MaxCompute in real time. During the synchronization, no data conversion method is applied.
      • Source: MySQL Binlog, which is the source component.
      • Destination: MaxCompute, which is the destination component.
      • Direction of data synchronization: The source component is connected to the destination component. Data is synchronized from MySQL Binlog to MaxCompute.
    • Example 2: The following figure shows how data is synchronized from MySQL Binlog to MaxCompute in real time. During the synchronization, the data is converted by using the data filtering method.
      • Source: MySQL Binlog, which is the source component.
      • Data conversion method: data filtering. The data in the source is filtered by using this method.
      • Destination: MaxCompute, which is the destination component.
      • Direction of data synchronization: The source component is connected to the data filtering component, and the data filtering component is connected to the destination component. Data read from MySQL Binlog is filtered. Then, the filtered data is synchronized to MaxCompute.
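
The two examples above can be modeled as a chain of components, sketched here only to show how the connections determine the data flow. The helpers `data_filter`, `string_replace`, and `run_pipeline` are hypothetical names for this illustration, and records are assumed to be plain dictionaries. The actual components are configured on the canvas, not in code.

```python
def data_filter(predicate):
    """Conversion component: pass a record on only if predicate(record) is true."""
    return lambda record: record if predicate(record) else None


def string_replace(field, old, new):
    """Conversion component: replace a substring in one string field."""
    def apply(record):
        value = record.get(field)
        if isinstance(value, str):
            # Copy the record so that the source data is not modified.
            return {**record, field: value.replace(old, new)}
        return record
    return apply


def run_pipeline(source, conversions, destination):
    """Send each source record through the conversions, in order, to the destination.

    A conversion returns a record to pass on, or None to drop it.
    Returns the number of records written to the destination.
    """
    written = 0
    for record in source:
        current = record
        for convert in conversions:
            current = convert(current)
            if current is None:
                break  # the record was filtered out
        if current is not None:
            destination.append(current)
            written += 1
    return written
```

Under these assumptions, Example 1 corresponds to `run_pipeline(rows, [], sink)`, where the source is connected directly to the destination. Example 2 corresponds to `run_pipeline(rows, [data_filter(lambda r: r["status"] == "paid")], sink)`, where the field name `status` is made up for the illustration.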

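The Node Name constraint in the Create Node dialog box (1 to 128 characters; letters, digits, underscores, and periods) can be checked before you open the console, for example when node names are generated by a script. The following is a minimal sketch. The function name `is_valid_node_name` is hypothetical, and the pattern assumes that "letters" means ASCII letters, which this topic does not state.

```python
import re

# 1 to 128 characters from the allowed set.
# Assumption: "letters" is read as ASCII letters A-Z and a-z.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the whole name matches the documented constraint."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None
```

With this pattern, a name such as `mysql_to_maxcompute.v1` is accepted, whereas an empty name, a name that contains a space or hyphen, or a name longer than 128 characters is rejected.
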
Start the real-time data synchronization node

  1. Go back to the previous page and click Start in the Operation column that corresponds to your desired node.
  2. In the Start dialog box, configure the parameters.
    Parameter: Description
    Whether to reset the site: Specifies whether to set the time point for the next startup. If this parameter is selected, the Start time point and Time zone parameters are required.
    Start time point: The date and time for starting the real-time data synchronization node.
    Time zone: The time zone in which the real-time data synchronization node is run. You can select a time zone from the Time Zone drop-down list.
    Failover: The maximum number of failovers allowed within the specified time range.
      Note: If this parameter is not specified, the system automatically stops the node if the number of failovers exceeds 100 within 5 minutes. This avoids excessive resource consumption caused by the frequent starting of the node.
    Dirty data policy: The policy for handling dirty data. Valid values:
      • Zero tolerance, not allowed: The real-time data synchronization node is automatically stopped if dirty data is detected.
      • No limit: The real-time data synchronization node continues to run regardless of whether dirty data is detected.
      • Limited control: The real-time data synchronization node is automatically stopped if the amount of dirty data exceeds a specified value.
  3. Click Confirm.
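
The Failover and Dirty data policy parameters both describe conditions under which the node is stopped. The following sketch illustrates one way to reason about them. It is not how DataWorks implements these checks: `FailoverLimiter`, `DirtyDataPolicy`, and `should_stop_for_dirty_data` are hypothetical names, the sliding-window interpretation of "within 5 minutes" is an assumption, and timestamps are assumed to be seconds.

```python
from collections import deque
from enum import Enum


class FailoverLimiter:
    """Sliding-window counter for failovers.

    The defaults mirror the documented fallback: stop the node if the
    number of failovers exceeds 100 within 5 minutes (300 seconds).
    """

    def __init__(self, max_failovers=100, window_seconds=300):
        self.max_failovers = max_failovers
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recent failovers

    def record_failover(self, now):
        """Record a failover at time `now`; return True if the node should stop."""
        self._events.append(now)
        # Discard failovers that fall outside the window.
        while self._events and self._events[0] <= now - self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_failovers


class DirtyDataPolicy(Enum):
    ZERO_TOLERANCE = "Zero tolerance, not allowed"
    NO_LIMIT = "No limit"
    LIMITED = "Limited control"


def should_stop_for_dirty_data(policy, dirty_count, limit=0):
    """Return True if the node should stop given the dirty data seen so far."""
    if policy is DirtyDataPolicy.ZERO_TOLERANCE:
        return dirty_count > 0
    if policy is DirtyDataPolicy.LIMITED:
        return dirty_count > limit  # `limit` is the specified value
    return False  # NO_LIMIT: dirty data never stops the node
```

In this sketch, `FailoverLimiter()` with its default values reports that the node should stop on the 101st failover that occurs within a 300-second window, and `should_stop_for_dirty_data(DirtyDataPolicy.LIMITED, 6, limit=5)` returns `True`.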