After you configure data sources, network environments, and resource groups, you can
create and run a real-time sync solution to synchronize all data in a database. This
topic describes how to create a real-time sync solution to synchronize data in some
or all tables in a database to DataHub in batch mode and then synchronize incremental
data in the database to DataHub in real time. This topic also describes how to view
the statuses of the nodes generated by the real-time synchronization solution.
Prerequisites
Before you create a data sync solution, make sure that the following operations are
performed:
Background information
DataWorks provides a sync solution that synchronizes all data in a database to DataHub in real time. The solution first synchronizes the full data in the database to DataHub in batch mode and then synchronizes incremental data to DataHub in real time. You can view the details of the sync solution, the statuses of the nodes generated by the solution, and data updates in the database in real time. This facilitates subsequent data searches, analysis, and development.
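The batch-then-real-time flow described above can be pictured with the following minimal Python sketch. It is not DataWorks code: the ChangeEvent shape, the operation names, and the full_then_incremental helper are hypothetical. The sketch loads a full snapshot first and then replays incremental change events in order. Every row is keyed by its primary key, which can be a single field or a combination of fields, so a later change for the same key replaces the earlier row instead of creating a duplicate.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


# Hypothetical change-event shape; the actual DataWorks/DataHub record format differs.
@dataclass
class ChangeEvent:
    op: str   # "INSERT", "UPDATE", or "DELETE"
    row: Row  # row image; for DELETE only the primary key fields are needed


def row_key(row: Row, key_fields: Tuple[str, ...]) -> Key:
    """Build the primary key of a row from one field or a combination of fields."""
    return tuple(row[f] for f in key_fields)


def full_then_incremental(
    snapshot: Iterable[Row],
    events: Iterable[ChangeEvent],
    key_fields: Tuple[str, ...],
) -> Dict[Key, Row]:
    """Model of a destination after a full batch load followed by incremental replay."""
    # Phase 1: batch-load the full data. Rows that share a primary key collapse
    # into one entry (the last one wins), so no duplicates are kept.
    state: Dict[Key, Row] = {row_key(r, key_fields): dict(r) for r in snapshot}
    # Phase 2: replay incremental changes in the order in which they occurred.
    for ev in events:
        key = row_key(ev.row, key_fields)
        if ev.op == "DELETE":
            state.pop(key, None)
        elif ev.op in ("INSERT", "UPDATE"):
            state[key] = dict(ev.row)  # upsert: a repeated key overwrites, not duplicates
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return state


if __name__ == "__main__":
    snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    events = [
        ChangeEvent("UPDATE", {"id": 1, "name": "a2"}),
        ChangeEvent("DELETE", {"id": 2}),
        ChangeEvent("INSERT", {"id": 3, "name": "c"}),
    ]
    print(full_then_incremental(snapshot, events, ("id",)))
```

Keying on a tuple lets the same code handle the single-field and multi-field primary keys that you can define when you map source tables to DataHub topics.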
Real-time sync solutions that synchronize all data in a database provide the following benefits:
- Synchronization of the full data of a database
You do not need to create multiple batch synchronization nodes to synchronize source tables one by one. Instead, you can create a single sync solution to synchronize some or all tables in a database at a time.
- Flexible synchronization rules
  - You can configure synchronization rules for different data definition language (DDL) messages based on your business requirements. For example, if you select Ignore for the DDL message that drops a table in the source, the system ignores the message when it receives it and does not drop the corresponding table in the destination.
  - You can add or remove source tables for a sync solution that is running.
  - You can configure synchronization rules for destination DataHub topics to determine whether the incremental data in source tables is synchronized to them. After the incremental data is synchronized, it can be searched in the destination DataHub topics.
- Simple configuration
You do not need to perform complex operations such as creating synchronization nodes, databases, and tables, configuring dependencies between nodes, or configuring mappings between sources and destinations. You only need to configure a sync solution in a configuration wizard.
- Real-time updates of large amounts of data
Large amounts of data can be updated in real time, which improves the efficiency of automated O&M.
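The per-message DDL rules in the list above amount to a lookup from a DDL message type to an action. The following Python sketch is only a model under stated assumptions: the message type names (CREATE_TABLE, ADD_COLUMN, DROP_TABLE), the APPLY action, the handle_ddl helper, and the choice to ignore message types that have no rule are all hypothetical. Only the Ignore action comes from this topic.

```python
from enum import Enum
from typing import Dict, List


class DdlAction(Enum):
    # Hypothetical action names: this topic names only "Ignore".
    APPLY = "apply"    # forward the DDL change to the destination
    IGNORE = "ignore"  # discard the message; the destination is left unchanged


# Hypothetical rules keyed by hypothetical DDL message types.
RULES: Dict[str, DdlAction] = {
    "CREATE_TABLE": DdlAction.APPLY,
    "ADD_COLUMN": DdlAction.APPLY,
    "DROP_TABLE": DdlAction.IGNORE,  # the example given in this topic
}


def handle_ddl(message_type: str, rules: Dict[str, DdlAction],
               forwarded: List[str]) -> DdlAction:
    """Look up the rule for one DDL message and record it only if it is forwarded."""
    # Assumption: message types without a rule are ignored rather than applied.
    action = rules.get(message_type, DdlAction.IGNORE)
    if action is DdlAction.APPLY:
        forwarded.append(message_type)
    return action


if __name__ == "__main__":
    forwarded: List[str] = []
    for msg in ["ADD_COLUMN", "DROP_TABLE"]:
        print(msg, handle_ddl(msg, RULES, forwarded).value)
    print(forwarded)
```

In this sketch, a DROP_TABLE message from the source never reaches the destination, which mirrors the Ignore example above.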
Scenarios
If you want the system to monitor data updates in business databases in real time,
you can use real-time sync solutions to synchronize all data in the databases. This
way, upper-layer applications can search for, analyze, and develop data in real time.
Create a real-time sync solution to synchronize all data in a database
- Go to the Data Integration page and choose to go to the Task list page.
- On the Task list page, click New task in the upper-right corner.
- On the Create Data Synchronization Solution page, click One-click real-time synchronization to DataHub.
- In the Set synchronization sources and rules step, configure basic information, such as the solution name, for the data synchronization solution.
In the Basic configuration section, configure the following parameters.

Parameter | Description
Scheme name | The name of the data synchronization solution. The name can be a maximum of 50 characters in length.
Description | The description of the data synchronization solution. The description can be a maximum of 50 characters in length.
Destination task storage location | The Automatically establish workflow check box is selected by default. This indicates that DataWorks automatically creates a workflow named in the format of clone_database_Source data source name+to+Destination data source name in the Data Integration directory. All synchronization nodes generated by the data synchronization solution are placed in the directory of this workflow. If you clear the Automatically establish workflow check box, select a directory from the Select Location drop-down list. All synchronization nodes generated by the data synchronization solution are placed in the specified directory.
- Select a source and configure synchronization rules.
- In the Data Source section, specify the Type and Data source parameters.
Note
A real-time sync solution that is used to synchronize all data in a database can synchronize
data only from MySQL, PolarDB, or Oracle to DataHub.
- In the Source Table section, select the tables whose data you want to synchronize from the Source Table list. Then, click the icon to move the tables to the Selected Source Table list.
The Source Table list displays all tables in the selected source. You can choose to
synchronize data in some or all tables in the source.
- In the Conversion Rule for Table Name section, click Add rule to select a rule. Supported options include Conversion Rule for Table Name and Rule for Destination Topic.
- Conversion Rule for Table Name: the rule for converting the names of source tables to those of destination topics.
- Rule for Destination Topic: the rule for adding prefixes and suffixes to destination topics.
- Click Next Step.
- Select a data source as the destination and configure the destination topics.
- In the Set Destination Topic step, select the destination DataHub data source.
- Click Refresh source table and DataHub Topic mapping to configure the mappings between the source tables and destination DataHub topics.
- View the mapping progress, source tables, and mapped destination DataHub topics.

No. | Description
① | The progress of mapping the source tables to destination DataHub topics. Note: The mapping may take a long time if you synchronize data from a large number of tables.
② | If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during synchronization. If the tables in the source database do not contain primary keys, you can click the icon to customize primary keys. You can use one field or a combination of several fields as the primary key of a table. The system then removes duplicate data based on these primary keys during synchronization.
③ | The method used to create each destination DataHub topic. If you set Topic creation method to Create Topic for a destination DataHub topic, the DataHub topic is automatically created and its name is displayed in the DataHub Topic column. You can click the name of the topic to modify its configurations. If you set Topic creation method to Use Existing Topic for a destination DataHub topic, select the topic that you want to use from the drop-down list in the DataHub Topic column.
If you set Topic creation method to Create Topic for a destination DataHub topic, you can click the name of the DataHub topic to modify the configurations of the topic based on your business requirements.

- Create Topic in Production Environment: indicates whether to create the topic in the production environment. This option is displayed for a DataWorks workspace in standard mode and is selected by default.
- Life cycle: the lifecycle of the topic. Unit: days. Default value: 7.
- Data field structure: the fields and their data types in the topic.
Note If you do not change the values of the parameters related to a topic after the topic
is created, the system synchronizes data based on the default values of the parameters.
- Click Next.
- Configure the resources required by the data sync solution.
In the Set Resources for Solution Running step, set the following parameters.

- Offline Full synchronization
Parameter | Description
Offline task name rules | The rule for naming the batch sync node that is used to synchronize the full data of the source. After a data sync solution is created, DataWorks first generates a batch sync node to synchronize full data, and then generates real-time sync nodes to synchronize incremental data.
Resource Groups for Full Batch Sync Nodes | The exclusive resource group for Data Integration that is used to run the batch sync node. Only exclusive resource groups for Data Integration can be used to run sync solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources. Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
- Full Batch Scheduling
Parameter | Description
Select scheduling Resource Group | The resource group for scheduling that is used to run the nodes. Only exclusive resource groups for Data Integration can be used to run sync solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources. Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
- Real-time Incremental synchronization
Parameter | Description
Select an exclusive resource group for real-time tasks | The exclusive resource group that is used to run the real-time sync nodes. Only exclusive resource groups for Data Integration can be used to run sync solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources. Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
- Channel Settings
Parameter | Description
Maximum number of connections supported by source read | The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. Default value: 15.
- Click Complete Configuration. The real-time sync solution used to synchronize all data in a database is created.
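The two rule types in the Conversion Rule for Table Name section, which convert source table names and then add a prefix and suffix to destination topic names, can be modeled with the following Python sketch. It is a hypothetical illustration, not the DataWorks rule format: representing conversion rules as regular-expression (pattern, replacement) pairs applied in order, and the TopicNamingRules class, are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TopicNamingRules:
    """Hypothetical model of the two rule types; not the DataWorks configuration format."""
    # Conversion Rule for Table Name: (regex pattern, replacement) pairs, applied in order.
    conversions: List[Tuple[str, str]] = field(default_factory=list)
    # Rule for Destination Topic: text added before and after the converted name.
    prefix: str = ""
    suffix: str = ""

    def topic_name(self, source_table: str) -> str:
        name = source_table
        for pattern, replacement in self.conversions:
            name = re.sub(pattern, replacement, name)
        return f"{self.prefix}{name}{self.suffix}"


if __name__ == "__main__":
    rules = TopicNamingRules(conversions=[(r"^tb_", ""), (r"_\d+$", "")],
                             prefix="ods_", suffix="_rt")
    for table in ["tb_orders_01", "tb_users"]:
        print(table, "->", rules.topic_name(table))
```

With these example rules, tb_orders_01 becomes ods_orders_rt. Because a conversion can map several source tables (for example, tb_orders_01 and tb_orders_02) to the same name, it is worth checking the resulting source table and DataHub topic mappings before you complete the configuration.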
Run the real-time sync solution
On the Tasks page, find the newly created data sync solution and choose More > Submit and Run in the Operation column to run the data sync solution.
View the statuses and running results of the sync nodes
- On the Tasks page, find the solution that is run and click Execution details in the Operation column. Then, you can view the execution details of all nodes generated
by the sync solution.

- Find a node whose execution details you want to view and click Execution details in the Status column. Then, you can click the link provided in the dialog box that
appears to go to the DataStudio page.
Manage the real-time sync solution
- View the configurations of the sync solution.
- Modify the sync solution.
On the Tasks page, find the newly created sync solution and choose . Then, you can modify the configurations of the sync solution.
For a sync solution that has run successfully, you can choose to add or remove source tables. Procedure:
In the Source Table section of the Set Synchronization Sources and Rules step, add or remove source tables for the sync solution. Then, save the modification
and run the sync solution.
- Change the priority of the sync solution.
Find the newly created sync solution and choose in the Operation column. In the Change Priority dialog box, enter the desired priority and click Confirm. You can set the priority to an integer from 1 to 8. A larger value indicates a higher priority.
Note If multiple sync solutions have the same priority, the system runs them in the order in which they were committed.
- Delete the sync solution.
Find the sync solution that you want to delete and choose in the Operation column. In the Delete message, click OK.
Note After you click OK, only the configuration record of the sync solution is deleted. The synchronization nodes generated by the solution and the data tables generated by the synchronization nodes are not affected.
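The priority rules above (an integer from 1 to 8, a larger value runs first, and ties are broken by commit order) can be expressed as a two-part sort key. The following Python sketch is illustrative only: the Solution class, the commit_seq counter, and the run_order helper are hypothetical and do not represent how DataWorks schedules solutions internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    name: str
    priority: int    # integer from 1 to 8; a larger value means a higher priority
    commit_seq: int  # hypothetical: increases each time a solution is committed

    def __post_init__(self) -> None:
        if not isinstance(self.priority, int) or not 1 <= self.priority <= 8:
            raise ValueError("priority must be an integer from 1 to 8")


def run_order(solutions: List[Solution]) -> List[str]:
    """Higher priority first; solutions with equal priority run in commit order."""
    ordered = sorted(solutions, key=lambda s: (-s.priority, s.commit_seq))
    return [s.name for s in ordered]


if __name__ == "__main__":
    print(run_order([Solution("a", 3, 1), Solution("b", 8, 2), Solution("c", 3, 0)]))
```

Negating the priority turns "larger runs first" into an ascending sort, so the commit sequence can break ties in its natural ascending order.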