Data Integration allows you to synchronize data from a single DataHub topic, Hologres table, Kafka topic, or LogHub Logstore to Hologres in real time. When a real-time synchronization task synchronizes data from a single Hologres table to another Hologres data source, the task can create a destination table in that data source based on the schema of the source table, and then synchronizes data from the source table to the destination table. This topic describes how to create a real-time ETL synchronization task that synchronizes data from a single Hologres table to another Hologres table.
Limits
The version of your Hologres instance must be later than V2.1.
Incremental synchronization of data from a Hologres partitioned table is not supported.
DDL changes on the source Hologres table cannot be synchronized.
Incremental data of the following data types can be synchronized from Hologres:
INTEGER, BIGINT, TEXT, CHAR(n), VARCHAR(n), REAL, JSON, SERIAL, OID, INT4[], INT8[], FLOAT8[], BOOLEAN[], and TEXT[].
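Before you create the task, you can check whether the columns of the source table use only the data types listed above. The following Python sketch assumes that you already have each column's type name as written in the table's DDL (for example, VARCHAR(64)). Catalog views can report other spellings, such as character varying, which this sketch does not normalize. The function names are hypothetical and are not part of any DataWorks or Hologres API.

```python
# Types listed in the Limits section as supported for incremental sync.
# CHAR(n) and VARCHAR(n) are stored without the length modifier.
SUPPORTED_INCREMENTAL_TYPES = {
    "integer", "bigint", "text", "char", "varchar", "real", "json",
    "serial", "oid", "int4[]", "int8[]", "float8[]", "boolean[]", "text[]",
}


def normalize_type(type_name: str) -> str:
    """Lowercase a DDL type name and drop a length modifier such as (64)."""
    t = type_name.strip().lower()
    if t.endswith(")") and "(" in t:
        t = t[: t.index("(")].strip()
    return t


def unsupported_columns(columns: dict[str, str]) -> list[str]:
    """Return the names of columns whose types are not in the supported list."""
    return [
        name
        for name, type_name in columns.items()
        if normalize_type(type_name) not in SUPPORTED_INCREMENTAL_TYPES
    ]


# Hypothetical source table schema: column name -> DDL type.
source_columns = {"id": "BIGINT", "name": "VARCHAR(64)", "created_at": "TIMESTAMPTZ"}
print(unsupported_columns(source_columns))  # ['created_at']
```

A non-empty result means that incremental data in those columns is outside the documented type list.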
Prerequisites
A serverless resource group is purchased.
Two Hologres data sources are added to DataWorks. One is used as the source, and the other is used as the destination. For more information, see Add and manage data sources in Data Integration.
Network connections between the serverless resource group and the data sources are established. For more information, see Network connectivity solutions.
To synchronize data from a single Hologres table to another Hologres table in real time, you must enable binary logging for the source Hologres table. For more information, see Subscribe to Hologres binary logs.
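Binary logging is enabled with SQL on the source Hologres instance. The following Python sketch only builds the statements so that you can review them and run them in a SQL client such as HoloWeb or psql. It assumes the set_table_property procedure with the binlog.level and binlog.ttl properties that the Hologres binary log documentation describes for existing tables; confirm the exact syntax for your instance version. The table name public.orders and the one-day TTL are placeholders.

```python
def binlog_enable_sql(table: str, ttl_seconds: int = 86400) -> list[str]:
    """Build SQL that enables binary logging on an existing Hologres table.

    Assumption: Hologres' set_table_property procedure with the
    'binlog.level' and 'binlog.ttl' properties; verify the syntax for
    your instance version before running the statements.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be a positive number of seconds")
    # Double any single quotes so the name is a valid SQL string literal.
    name = table.replace("'", "''")
    return [
        "BEGIN;",
        f"CALL set_table_property('{name}', 'binlog.level', 'replica');",
        f"CALL set_table_property('{name}', 'binlog.ttl', '{ttl_seconds}');",
        "COMMIT;",
    ]


# Placeholder table name; replace it with your source table.
for statement in binlog_enable_sql("public.orders"):
    print(statement)
```

Generating the statements instead of executing them keeps the sketch free of any database driver, so you can paste the output into the client you already use.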
Procedure
1. Select a synchronization type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane of the Data Integration page, click Synchronization Task. In the upper part of the Synchronization Task page, select a source type from the Source drop-down list and a destination type from the Destination drop-down list, and click Create Synchronization Task. On the page that appears, configure basic information for the synchronization task.
Source And Destination: Select Hologres for both the source and destination types.
New Node Name: Specify a name for the synchronization task based on your business requirements.
Synchronization Method: Select Single table real-time synchronization.
Synchronization Mode: Select Full initialization.
2. Configure network settings and a resource group
In the Network and Resource Configuration section, select a resource group that you want to use to run the synchronization task, and configure the Task Resource Usage parameter.
Select the added Hologres data sources as the source in the Source section and as the destination in the Destination section, and click Test Connectivity.
If the network connectivity test is successful, click Next.
3. Configure the data synchronization link
1. Configure the source
In the wizard in the upper part of the configuration page, click Hologres to configure the source.

In the Holo source information section, configure the Schema and Table parameters.
Click Data Sampling in the upper-right corner of the Holo source information section.
In the Preview Data Output dialog box, configure the Sampled Data Records parameter and click Start Collection. The system samples data from the specified Hologres table so that you can preview it. The sampled data is used as the input for data preview and visualization configuration of data processing nodes.
2. Configure a data processing node
You can click the icon to add data processing methods. The following data processing methods are supported: Data Masking, Replace String, Data filtering, JSON Parsing, and Edit Field and Assign Value. You can arrange the data processing methods in the order that your business requires. When the synchronization task runs, data is processed in the order that you specify.
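The effect of the processing order can be illustrated with a small model. In the following Python sketch, each data processing method is a function that takes a record and returns a processed record, or None to drop it. The functions mask_field, replace_string, filter_rows, and run_pipeline are hypothetical stand-ins for the Data Masking, Replace String, and Data filtering methods; they are not how DataWorks implements them.

```python
from typing import Callable, Optional

Record = dict[str, object]
Processor = Callable[[Record], Optional[Record]]


def mask_field(field: str, keep: int = 3) -> Processor:
    """Data Masking stand-in: keep the first `keep` characters, mask the rest."""
    def step(rec: Record) -> Optional[Record]:
        value = str(rec[field])
        masked = value[:keep] + "*" * max(len(value) - keep, 0)
        return {**rec, field: masked}
    return step


def replace_string(field: str, old: str, new: str) -> Processor:
    """Replace String stand-in: substitute `old` with `new` in one field."""
    def step(rec: Record) -> Optional[Record]:
        return {**rec, field: str(rec[field]).replace(old, new)}
    return step


def filter_rows(predicate: Callable[[Record], bool]) -> Processor:
    """Data filtering stand-in: drop records that fail the predicate."""
    def step(rec: Record) -> Optional[Record]:
        return rec if predicate(rec) else None
    return step


def run_pipeline(steps: list[Processor], rec: Record) -> Optional[Record]:
    """Apply the steps in the configured order; stop once a record is dropped."""
    current: Optional[Record] = rec
    for step in steps:
        if current is None:
            break
        current = step(current)
    return current


row = {"phone": "138-0000-1111", "amount": 5}
keep_positive = filter_rows(lambda r: r["amount"] > 0)
print(run_pipeline([keep_positive, replace_string("phone", "-", ""), mask_field("phone")], row))
print(run_pipeline([keep_positive, mask_field("phone"), replace_string("phone", "-", "")], row))
print(run_pipeline([keep_positive], {"phone": "139", "amount": 0}))  # None
```

With the replace step first, the hyphens are removed before masking, so the phone value becomes 138 followed by eight asterisks. With the mask step first, the hyphens are already masked, so the result is 138 followed by ten asterisks. The same methods in a different order produce different output.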

After you configure a data processing node, you can click Preview Data Output in the upper-right corner of the section. In the Preview Data Output dialog box, you can click Re-obtain Output of Ancestor Node to enable the data processing node to process the data that is sampled from the specified Hologres table and preview the processing result.
Before you can preview the result that a data processing node generates from the input data, you must configure data sampling settings for the Hologres source.
3. Configure the destination
In the wizard in the upper part of the configuration page, click Hologres to configure the destination.

In the Destination Information section, configure the Schema and Destination Table parameters. The valid values of the Destination Table parameter are Create tables automatically and Use Existing Table.
If you set the Destination Table parameter to Create tables automatically, the system automatically creates a table that has the same name as the source table in the destination. You can manually change the name of the created table.
If you set the Destination Table parameter to Use Existing Table, you can select a table from the Table Name drop-down list.
(Optional) Modify the schema of a destination table.
If you select Create tables automatically for the Destination Table parameter, click Edit Table Schema. In the dialog box that appears, edit the schema of the destination table that will be automatically created. You can also click Re-generate Table Schema Based on Output Column of Ancestor Node to re-generate a schema based on the output columns of an ancestor node. You can select a column from the generated schema and configure the column as the primary key.
Note: The destination table must have a primary key. Otherwise, the configuration cannot be saved.
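When you select Create tables automatically, the Schema Migration tab shows the DDL statement that creates the destination table. The following Python sketch approximates that step under simplifying assumptions: it builds a plain PostgreSQL-style CREATE TABLE statement from an edited schema and refuses to build one without a primary key, mirroring the rule in the note above. The DDL that DataWorks actually generates may include Hologres table properties that this sketch omits, and the Column and build_create_table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str
    primary_key: bool = False


def quote_ident(name: str) -> str:
    """Quote an identifier PostgreSQL-style (Hologres is PostgreSQL-compatible)."""
    return '"' + name.replace('"', '""') + '"'


def build_create_table(schema: str, table: str, columns: list[Column]) -> str:
    """Build a simplified CREATE TABLE statement for the destination table."""
    pk = [c.name for c in columns if c.primary_key]
    if not pk:
        # Same rule as the console: no primary key, no saved configuration.
        raise ValueError("the destination table must have a primary key")
    lines = [
        f"  {quote_ident(c.name)} {c.data_type}" + (" NOT NULL" if c.primary_key else "")
        for c in columns
    ]
    lines.append("  PRIMARY KEY (" + ", ".join(quote_ident(n) for n in pk) + ")")
    return (
        f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} (\n"
        + ",\n".join(lines)
        + "\n);"
    )


# Hypothetical schema; the source table name is reused, as with
# Create tables automatically.
print(build_create_table("public", "orders", [
    Column("id", "BIGINT", primary_key=True),
    Column("name", "TEXT"),
]))
```

Selecting a column as the primary key in the Edit Table Schema dialog box corresponds to setting primary_key=True in this model.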
Configure the Job Type and Write Conflict Policy parameters.
Valid values of the Job Type parameter:
Replay (Replay Operation Log to Restore Data): Each operation that is performed on the source is performed in the same way on the destination. For example, an INSERT, UPDATE, or DELETE operation on the source is also performed on the destination.
Insert (Archived Storage): The destination is used as streaming data storage, and all data that is synchronized from the source is inserted into the destination.
Write Conflict Policy: The policy that is used when data written to the destination conflicts with existing data, such as a row that has the same primary key as an existing row. Valid values: Cover (Overwrite) and Ignore (Ignore).
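The difference between the two job types, and the role of Write Conflict Policy, can be modeled on an in-memory table. The following Python sketch is a simplified model, not the DataWorks implementation. It assumes that the destination table is keyed by primary key, that the conflict policy applies only when an INSERT record arrives for a key that already exists, and, for Insert mode, that each change record is appended together with a hypothetical _op column that records the operation type.

```python
from typing import Any

# A change record: (operation, primary key value, row values).
Change = tuple[str, Any, dict[str, Any]]


def apply_replay(table: dict[Any, dict[str, Any]], change: Change, policy: str = "Cover") -> None:
    """Replay mode: repeat the source operation on a table keyed by primary key."""
    op, key, row = change
    if op == "DELETE":
        table.pop(key, None)
    elif op == "UPDATE":
        table[key] = row
    elif op == "INSERT":
        if key in table and policy == "Ignore":
            return  # Ignore: keep the row that is already there.
        table[key] = row  # Cover: overwrite any existing row.
    else:
        raise ValueError(f"unknown operation: {op}")


def apply_insert(log: list[dict[str, Any]], change: Change) -> None:
    """Insert mode: append every change as a new row (archived storage)."""
    op, _key, row = change
    log.append({**row, "_op": op})  # "_op" is a hypothetical column name.


changes: list[Change] = [
    ("INSERT", 1, {"id": 1, "status": "new"}),
    ("INSERT", 2, {"id": 2, "status": "new"}),
    ("UPDATE", 1, {"id": 1, "status": "paid"}),
    ("INSERT", 1, {"id": 1, "status": "retry"}),  # same key as an existing row
    ("DELETE", 2, {"id": 2}),
]

for policy in ("Cover", "Ignore"):
    replayed: dict[Any, dict[str, Any]] = {}
    for change in changes:
        apply_replay(replayed, change, policy)
    print(policy, replayed)

archive: list[dict[str, Any]] = []
for change in changes:
    apply_insert(archive, change)
print(len(archive))  # 5
```

In this model, Replay keeps one current row per key, so the conflicting INSERT either overwrites the paid row (Cover) or is skipped (Ignore). Insert mode keeps all five change records, including the DELETE.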
Configure mappings between fields in the source and fields in the destination.
After you complete the preceding configuration, the system automatically maps each source field to the destination field that has the same name. You can modify the mappings based on your business requirements. A source field can be mapped to multiple destination fields, but multiple source fields cannot be mapped to the same destination field. Data in a source field that is not mapped to any destination field is not synchronized.
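The mapping rules above can be expressed as a small validator. The following Python sketch assumes that same-name mapping means an exact, case-sensitive name match; auto_map, validate_mappings, and project are hypothetical helpers, not DataWorks APIs.

```python
def auto_map(source_fields: list[str], dest_fields: list[str]) -> list[tuple[str, str]]:
    """Same-name mapping: pair each source field with the identically named destination field."""
    dest = set(dest_fields)
    return [(f, f) for f in source_fields if f in dest]


def validate_mappings(mappings: list[tuple[str, str]]) -> None:
    """Reject mappings in which two source fields target one destination field."""
    source_of: dict[str, str] = {}
    for src, dst in mappings:
        if dst in source_of and source_of[dst] != src:
            raise ValueError(f"{dst!r} is mapped from both {source_of[dst]!r} and {src!r}")
        source_of[dst] = src


def project(record: dict, mappings: list[tuple[str, str]]) -> dict:
    """Build a destination row; source fields without a mapping are dropped."""
    validate_mappings(mappings)
    return {dst: record[src] for src, dst in mappings}


mappings = auto_map(["id", "name", "note"], ["id", "name", "name_copy"])
mappings.append(("name", "name_copy"))  # one source field, two destination fields
print(project({"id": 1, "name": "a", "note": "x"}, mappings))
# {'id': 1, 'name': 'a', 'name_copy': 'a'}; 'note' is not synchronized
```

Mapping a second source field to name_copy would make validate_mappings raise an error, which corresponds to the many-to-one restriction described above.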
4. Configure alert rules
To prevent failures of the synchronization task from delaying business data synchronization, you can configure different alert rules for the synchronization task.
In the upper-right corner of the page, click Configure Alert Rule to go to the Configure Alert Rule panel.
In the Configure Alert Rule panel, click Add Alert Rule. In the Add Alert Rule dialog box, configure the parameters to configure an alert rule.
Note: The alert rules that you configure in this step take effect on the real-time synchronization subtask that the synchronization task generates. After the synchronization task is configured, you can modify these alert rules on the Real-time Synchronization Task page. For more information, see Manage real-time synchronization tasks.
Manage alert rules.
You can enable or disable alert rules that are created. You can also specify different alert recipients based on the severity levels of alerts.
5. Configure advanced parameters
DataWorks allows you to modify the configurations of specific parameters. You can change the values of these parameters based on your business requirements.
To prevent unexpected errors or data quality issues, we recommend that you understand the meanings of the parameters before you change the values of the parameters.
In the upper-right corner of the configuration page, click Configure Advanced Parameters.
In the Configure Advanced Parameters panel, change the values of the desired parameters.
6. Configure resource groups
You can click Configure Resource Group in the upper-right corner of the page to view and change the resource groups that are used to run the current synchronization task.
7. Perform a test on the synchronization task
After the preceding configuration is complete, you can click Perform Simulated Running in the upper-right corner of the configuration page to synchronize the sampled data to the destination table, and then view the result in the destination table. If a configuration is invalid, an exception occurs during the test run, or dirty data is generated, the system reports an error in real time. This helps you check the configuration of the synchronization task and confirm early whether it produces the expected results.
In the dialog box that appears, configure the parameters for data sampling from the specified table, including the Start At and Sampled Data Records parameters.
Click Start Collection to enable the synchronization task to sample data from the source.
Click Preview to enable the synchronization task to synchronize the sampled data to the destination.
8. Run the synchronization task
After the configuration of the synchronization task is complete, click Complete in the lower part of the page.
In the Tasks section of the Synchronization Task page, find the created synchronization task and click Start in the Operation column.
Click the name or ID of the synchronization task in the Tasks section and view the detailed running process of the synchronization task.
Perform O&M operations on the synchronization task
View the running status of the synchronization task
After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks that are created in the workspace and the basic information of each synchronization task.

You can click Start or Stop in the Operation column to start or stop the synchronization task. You can also click More in the Operation column and select Edit or View to modify the synchronization task or view its details.
You can view the basic running information of the synchronization task in the Execution Overview column. You can also click different sections on the execution details page of the synchronization task to view the related information.

The synchronization task is divided into three stages:
Schema Migration: This tab displays information such as whether the destination table is an automatically created table or an existing table. For an automatically created table, the DDL statement that is used to create the table is displayed.
Full Data Initialization: If you set the Synchronization Mode parameter to Full initialization when you configure the synchronization task, this tab displays the progress of full synchronization.
Real-time Data Synchronization: This tab displays statistics about real-time synchronization, including real-time read and write traffic, dirty data information, failovers, and operation logs.
Rerun the synchronization task
If you want to modify the fields to synchronize, the fields in the destination table, or the table name, you can click Rerun in the Operation column of the synchronization task. The system then synchronizes the changes to the destination. Data in tables that have already been synchronized and are not modified is not synchronized again.
To rerun the synchronization task without changing its configuration, click Rerun directly.
To apply configuration changes, modify the configuration of the synchronization task and click Complete. Then, click Apply Updates in the Operation column of the synchronization task to rerun the task with the latest configuration.