After you prepare data sources, network environments, and resources, you can create a real-time synchronization node to synchronize data to DataHub. This topic describes how to create a real-time synchronization node and view the status of the node.

Prerequisites

  1. The data sources that you want to use are prepared. Before you configure a data synchronization node, prepare the data sources from which you want to read data and to which you want to write data, so that you can select them during configuration. For information about the data source types, readers, and writers that are supported by real-time synchronization, see Data source types that support real-time synchronization.
    Note For information about the items that you need to understand before you prepare a data source, see Overview.
  2. An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
  3. Network connections are established between the exclusive resource group for Data Integration and the data sources. For more information, see Establish a network connection between a resource group and a data source.
  4. The data source environments are prepared. Create an account that can access a database in the source and an account that can access a database in the destination, and grant the accounts the permissions that your data synchronization configuration requires on those databases. For more information, see Overview.

Precautions

  • You can use only exclusive resource groups for Data Integration to run real-time synchronization nodes.

  • You can use a real-time synchronization node to synchronize data to a DataHub data source only from a PolarDB, OceanBase, MySQL, or Oracle data source.
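
The two precautions above can be expressed as a simple pre-flight check. The following is a minimal sketch, not part of any DataWorks API: the function name `preflight` and the string labels for source types and resource group kinds are hypothetical and chosen only for illustration.

```python
# Source types that can be synchronized to DataHub in real time,
# as listed in the precautions above.
SUPPORTED_SOURCES = {"polardb", "oceanbase", "mysql", "oracle"}


def preflight(source_type: str, resource_group_kind: str) -> list[str]:
    """Return a list of problems; an empty list means both precautions hold.

    The labels ("exclusive", "mysql", ...) are illustrative assumptions,
    not values defined by DataWorks.
    """
    problems = []
    if source_type.lower() not in SUPPORTED_SOURCES:
        problems.append(
            f"source type {source_type!r} cannot be synchronized to DataHub in real time"
        )
    if resource_group_kind.lower() != "exclusive":
        problems.append(
            "real-time synchronization nodes require an exclusive resource group"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight("PostgreSQL", "shared"):
        print(problem)
```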

Usage notes

For information about which topic types support synchronization of the data changes that result from operations on a source table, the sharding strategies for each topic type, data formats, and sample messages, see Appendix: DataHub message formats.

Create a real-time synchronization node

  1. Create a real-time synchronization node to synchronize all data in a database.
  2. Configure an exclusive resource group for Data Integration.
  3. Configure the source and mapping rules.
    1. In the Data Source section of the Configure Source and Synchronization Rules step, configure the Type and Data source parameters.
    2. Select the tables from which you want to read data.
      In the Source Table section, all tables in the selected data source are displayed in the Source Table list. You can select all or some tables from the Source Table list and click the icon to move the tables to the Selected Source Table list.
      Important If a selected table does not have a primary key, the table cannot be synchronized in real time.
    3. In the Conversion Rule for Table Name section, click Add Rule, select a mapping rule type, and then configure a mapping rule of the selected type.
      By default, data in a source table is written to a DataHub topic that has the same name as the source table. You can specify a destination topic name in a mapping rule to write data in multiple source tables to the same DataHub topic. You can also specify prefixes in a mapping rule to write data in source tables whose names start with a specified prefix to DataHub topics whose names start with another specified prefix. Data Integration allows you to use a regular expression to configure a mapping rule to specify the names of the destination topics to which you want to write data. You can also concatenate built-in variables to specify the names of the destination topics. For more information about the configuration logic, see Configure the source and mapping rules.
  4. Configure the destination topics.
  5. Select a data source as the destination and configure formats for the destination topics.
    1. In the Set Destination Topic step, configure the Destination, DataHub write mode, and Sharding Strategy parameters.
      If you want to synchronize source tables that do not have primary keys, you can select Source tables without primary keys can be synchronized.
    2. Refresh mappings between source tables and destination DataHub topics.
      Click Refresh source table and DataHub Topic mapping to map the source tables and destination DataHub topics based on the mapping rules that you configured in the Conversion Rule for Table Name section. If no mapping rule is configured in the Conversion Rule for Table Name section, data in the source tables is written to the DataHub topics that have the same names as the source tables. If no such destination DataHub topic exists in the destination, the system automatically creates the topics in the destination. You can modify the topic generation method and add additional fields to the destination DataHub topics.
      Operation: Synchronize a source table that does not have a primary key
      Description: If Source tables without primary keys can be synchronized is not selected, errors may occur when you synchronize data from source tables that do not have primary keys. If you want to synchronize data from a source table that does not have a primary key, you must click the Edit icon in the Synchronized Primary Key column of the source table to specify a primary key for the source table.
      Operation: Select a topic generation method
      Description: You can select Create Topic or Use Existing Topic from the drop-down list in the Topic creation method column.
      • If you select Use Existing Topic, you can select a destination topic from the drop-down list in the DataHub Topic column.
      • If you select Create Topic, the name of the topic that is automatically created appears in the DataHub Topic column.
      Operation: Add additional fields to a destination DataHub topic and assign values to the fields
      Description: You can click Edit additional fields in the Actions column of a destination DataHub topic to add additional fields to the topic and assign values to the fields. You can manually assign constants and variables to the additional fields as values.
      Note You can add additional fields to a destination DataHub topic only if you select Create Topic from the drop-down list in the Topic creation method column of the topic.
      Operation: Edit the destination topic schema
      Description: By default, the lifecycle of DataHub topics that are created by using the Create Topic method is seven days, and field type conversion may occur. For example, if the data types of the fields in a destination topic are different from the data types of the fields in a source table, the data synchronization solution converts the data types of the fields in the source table to the data types that are supported by the destination topic. You can click the name of a destination DataHub topic in the DataHub Topic column to modify the lifecycle or field types of the topic.
      Note You can edit the schema of a destination DataHub topic only if you select Create Topic from the drop-down list in the Topic creation method column of the topic.
    3. Click Next.
      If you select Create Topic from the drop-down list in the Topic creation method column of a destination DataHub topic, you must click Start table building in the Create Table dialog box to create destination DataHub topics.
  6. Configure the resources required to run the data synchronization node.
    1. In the Configure Resources step, configure the parameters.
      Parameter: Maximum number of connections supported by source read
      Description: The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Configure this parameter based on the resources of the source database. Default value: 15.
      Parameter: Maximum number of parallel threads allowed to read by destination
      Description: The maximum number of parallel threads that the synchronization node uses to read data from the source table or write data to the destination. Maximum value: 32. Specify an appropriate number based on the specifications of the exclusive resource group for Data Integration and the data write capabilities of the destination.
    2. Click Complete Configuration.
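
To make the table-name mapping rules in the Conversion Rule for Table Name section more concrete, the following sketch shows how such rules could resolve a destination topic name. It is an illustration under stated assumptions, not the Data Integration implementation: the `MappingRule` class, the `resolve_topic` function, the sample table names, and the "first matching rule wins" order are all hypothetical. Only the default behavior (a topic takes the name of its source table when no rule applies) comes from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class MappingRule:
    """Hypothetical rule that rewrites a source table name into a topic name."""
    pattern: str      # regular expression matched against the whole table name
    replacement: str  # topic name template; may reference groups such as \1


def resolve_topic(table: str, rules: list[MappingRule]) -> str:
    """Return the destination topic name for a source table.

    Assumption: the first rule whose pattern matches the whole table name
    is applied. If no rule matches, the topic keeps the table name, which
    mirrors the default behavior described in this topic.
    """
    for rule in rules:
        match = re.fullmatch(rule.pattern, table)
        if match:
            return match.expand(rule.replacement)
    return table


rules = [
    # Many-to-one: sharded tables order_00, order_01, ... share one topic.
    MappingRule(r"order_\d+", "orders"),
    # Prefix swap: tables that start with "src_" map to topics that start with "dst_".
    MappingRule(r"src_(.+)", r"dst_\1"),
]

if __name__ == "__main__":
    for table in ["order_00", "order_17", "src_users", "inventory"]:
        print(table, "->", resolve_topic(table, rules))
```

In this sketch, the first rule covers the case of writing multiple source tables to the same topic, and the second covers the prefix-to-prefix case. A table that matches neither rule falls through to a topic of the same name.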

Commit and deploy the real-time synchronization node

  1. Click the Save icon in the top toolbar to save the node.
  2. Click the Submit icon in the top toolbar to commit the node.
  3. In the Commit Node dialog box, configure the Change description parameter.
  4. Click Confirm.
    If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. On the left side of the top navigation bar, click Deploy. For more information, see Deploy nodes.

What to do next

After the real-time synchronization node is configured, you can start and manage the node on the Real Time DI page in Operation Center. To go to the Real Time DI page, perform the following operations: Log on to the DataWorks console and go to the Operation Center page. In the left-side navigation pane of the Operation Center page, choose RealTime Task > RealTime DI. For more information, see O&M for real-time synchronization nodes.