After you prepare data sources, network environments, and resources, you can create a real-time data synchronization node to synchronize data to DataHub. This topic describes how to create a real-time data synchronization node and view the running status of the node.

Prerequisites

Before you create a data synchronization node, make sure that the required data sources, network environment, and resources are prepared.

Limits

  • Real-time data synchronization nodes can be used to synchronize data only from PolarDB, ApsaraDB for OceanBase, or MySQL to DataHub.

Create a real-time data synchronization node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region where the required workspace resides, find the workspace and click Data Analytics.
  4. Create a real-time data synchronization node.
    1. In the Create Node dialog box, configure the parameters.
      Parameter | Description
      Node Type | The type of the node. Default value: Real-time synchronization.
      Sync Method | Set the value to Migration to Datahub. In this case, some or all tables in the source database are migrated to DataHub.
      Node Name | The name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
      Location | The directory in which the real-time data synchronization node is stored.
  5. Select a data source as the source and configure synchronization rules.
    1. In the Data source section, specify the Type and Data source parameters.
      Note You can set Type only to MySQL, ApsaraDB for OceanBase, or PolarDB.
    2. In the Select the source table for synchronization section, select the tables whose data you want to synchronize from the Source Table list, and click the icon to move the tables to the Selected Source table list.
      The Source Table list displays all the tables in the source. You can select some or all of the tables.
    3. In the Conversion Rule for Table Name section, click Add rule to select a rule.
      Supported options include Conversion Rule for Table Name and Rule for Destination Topic.
      • Conversion Rule for Table Name: the rule for converting the names of source tables to those of destination topics.
      • Rule for Destination Topic: the rule for adding prefixes and suffixes to destination topics.
    4. Click Next Step.
  6. Select a data source as the destination and configure the formats for the destination topics.
    1. In the Set Destination Topic step, specify Target DataHub data source, Datahub write mode, and Sharding Strategy.
      If you want to synchronize a source table that does not contain a primary key, you can select Source tables without primary keys can be synchronized.
    2. Click Refresh source table and DataHub Topic mapping to configure the mappings between the source tables and destination DataHub topics.
    3. View the mapping progress, source tables, and mapped destination topics.
      Serial number Description
      1
      2
      • If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during the synchronization.
      • If the tables in the source database do not contain primary keys, the result depends on the setting in the Set Destination Topic step:
        • If you select Source tables without primary keys can be synchronized in the Set Destination Topic step, tables without primary keys can be synchronized. You can click the Edit icon to define custom primary keys by using one non-primary-key field or a combination of several such fields. The system then removes duplicate data based on the custom primary keys during the synchronization.
        • If you do not select Source tables without primary keys can be synchronized in the Set Destination Topic step, errors occur when you synchronize tables without primary keys. In this case, you must delete these tables before the data synchronization can continue.
      3 The method used to create a destination topic. The message that appears in the DataHub Topic column varies based on the method that you select.
      • If you set Topic creation method to Create Topic, the name of the topic that is automatically created appears in the DataHub Topic column. You can click the topic name to modify the topic information.
      • If you set Topic creation method to Use existing Topic, you must select a topic from the drop-down list in the DataHub Topic column.
    4. Click Next Step.
      If you set Topic creation method to Create Topic, you must click Start table building in the Create tables automatically dialog box to create DataHub topics.
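
Two of the rules in the preceding steps can be sketched in code. The following Python sketch is illustrative only and is not part of DataWorks. is_valid_node_name checks the Node Name rule (1 to 128 characters; letters, digits, underscores, and periods). dedupe_by_key shows how a custom key built from one or more non-primary-key fields can identify duplicate rows. The function names, the sample field names, the restriction to ASCII letters, and the keep-the-latest-row behavior are assumptions; the source does not describe how the system implements either check.

```python
import re

# Assumption: only ASCII letters are accepted. The source says "letters"
# without stating whether non-ASCII letters are allowed.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if name has 1 to 128 characters drawn from
    letters, digits, underscores (_), and periods (.)."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


def dedupe_by_key(rows, key_fields):
    """Remove duplicate rows that share the same values in key_fields.

    Assumption: when two rows share a key, the later row replaces the
    earlier one. The source only states that duplicates are removed.
    """
    latest = {}
    for row in rows:
        # Build a composite key from one or more fields.
        key = tuple(row[field] for field in key_fields)
        latest[key] = row
    return list(latest.values())
```

For example, with the hypothetical fields order_id and line_no as a custom key, dedupe_by_key(rows, ["order_id", "line_no"]) keeps one row for each distinct pair of values.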

Start the real-time data synchronization node

  1. Go back to the previous page and click Start in the Operation column that corresponds to your desired node.
  2. In the Start dialog box, configure the parameters.
    Parameter | Description
    Whether to reset the site | Specifies whether to set the time point for the next startup. If you select this option, the Start time point and Time zone parameters are required.
    Start time point | The date and time for starting the real-time data synchronization node.
    Time zone | The time zone in which the real-time data synchronization node is run. You can select a time zone from the Time Zone drop-down list.
    Failover | The maximum number of failovers allowed within the specified time range.
      Note If this parameter is not specified, the system automatically stops the node if the number of failovers exceeds 100 within 5 minutes. This prevents excessive resource consumption caused by frequent restarts of the node.
    Dirty data policy | The policy for handling dirty data. Valid values:
      • Zero tolerance, not allowed: The real-time data synchronization node is automatically stopped if dirty data is detected.
      • No limit: The real-time data synchronization node continues to run regardless of whether dirty data is detected.
      • Limited control: The real-time data synchronization node is automatically stopped if the amount of dirty data exceeds a specified value.
  3. Click Confirm.
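
The stop conditions described for the Failover and Dirty data policy parameters can be sketched as follows. This Python sketch is illustrative only and is not the DataWorks implementation. The function names, the policy identifiers ("zero_tolerance", "no_limit", "limited"), the use of a sliding time window, and the strict greater-than comparison for the dirty data limit are assumptions. The default values of 100 failovers within 5 minutes come from the note on the Failover parameter.

```python
from typing import Optional, Sequence

DEFAULT_MAX_FAILOVERS = 100      # from the Failover note: more than 100 ...
DEFAULT_WINDOW_SECONDS = 5 * 60  # ... within 5 minutes stops the node


def exceeds_failover_limit(
    failover_times: Sequence[float],
    now: float,
    max_failovers: int = DEFAULT_MAX_FAILOVERS,
    window_seconds: float = DEFAULT_WINDOW_SECONDS,
) -> bool:
    """Return True if more than max_failovers occurred in the last window.

    failover_times and now are timestamps in seconds.
    Assumption: the window slides and ends at now.
    """
    recent = [t for t in failover_times if now - window_seconds < t <= now]
    return len(recent) > max_failovers


def should_stop_for_dirty_data(
    policy: str, dirty_count: int, limit: Optional[int] = None
) -> bool:
    """Return True if the node should stop under the given policy.

    The policy names are hypothetical identifiers for the three options.
    """
    if policy == "zero_tolerance":
        # Any dirty data stops the node.
        return dirty_count > 0
    if policy == "no_limit":
        # Dirty data never stops the node.
        return False
    if policy == "limited":
        if limit is None:
            raise ValueError("limit is required for the 'limited' policy")
        # Stop only when the amount of dirty data exceeds the limit.
        return dirty_count > limit
    raise ValueError(f"unknown policy: {policy!r}")
```

Under these assumptions, exactly 100 failovers within the window does not stop the node, because the note says the node stops only if the number exceeds 100.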