Data Integration supports real-time synchronization of data from a single table in data sources such as DataHub and Hologres to Kafka. A real-time ETL synchronization task initializes a topic in Kafka based on the schema of the source Hologres table and synchronizes data from the Hologres table to Kafka in real time for consumption. This topic describes how to configure real-time synchronization from a single Hologres table to Kafka.
Limits
The version of the Kafka data source must range from 0.10.2 to 3.6.0.
The version of the Hologres data source must be V2.1 or later.
Incremental synchronization of data from a Hologres partitioned table is not supported.
Messages for DDL changes on a Hologres table cannot be synchronized.
Incremental data of the following data types can be synchronized from Hologres: INTEGER, BIGINT, TEXT, CHAR(n), VARCHAR(n), REAL, JSON, SERIAL, OID, INT4[], INT8[], FLOAT8[], BOOLEAN[], TEXT[], and JSONB.
You must enable binary logging for the Hologres table in the source Hologres database. For more information, see Subscribe to Hologres binary logs.
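The limits above can be checked before you configure a task. The following is a minimal sketch and not part of DataWorks: the function names and the inputs (version strings, a column-to-type mapping, and two flags that you supply yourself) are assumptions made for illustration.

```python
# Supported incremental data types, copied from the limits above.
SUPPORTED_TYPES = {
    "INTEGER", "BIGINT", "TEXT", "CHAR", "VARCHAR", "REAL", "JSON", "SERIAL",
    "OID", "INT4[]", "INT8[]", "FLOAT8[]", "BOOLEAN[]", "TEXT[]", "JSONB",
}
KAFKA_MIN, KAFKA_MAX = (0, 10, 2), (3, 6, 0)
HOLOGRES_MIN = (2, 1)


def parse_version(text):
    """Turn '2.8.1' or 'V2.1' into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def normalize_type(type_name):
    """Drop a length modifier: 'varchar(32)' becomes 'VARCHAR'."""
    return type_name.strip().upper().split("(")[0]


def check_limits(kafka_version, hologres_version, columns,
                 partitioned=False, binlog_enabled=True):
    """Return a list of human-readable problems; an empty list means no limit is hit."""
    problems = []
    if not KAFKA_MIN <= parse_version(kafka_version) <= KAFKA_MAX:
        problems.append(f"Kafka {kafka_version} is outside 0.10.2 to 3.6.0")
    if parse_version(hologres_version) < HOLOGRES_MIN:
        problems.append(f"Hologres {hologres_version} is earlier than V2.1")
    if partitioned:
        problems.append("partitioned tables do not support incremental synchronization")
    if not binlog_enabled:
        problems.append("binary logging is not enabled for the source table")
    for name, type_name in columns.items():
        if normalize_type(type_name) not in SUPPORTED_TYPES:
            problems.append(f"column {name} has unsupported type {type_name}")
    return problems
```

For example, `check_limits("2.8.1", "V2.1", {"id": "bigint", "name": "varchar(32)"})` returns an empty list, while a table with a `timestamptz` column is reported because that type is not in the list above.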
Prerequisites
A Serverless resource group is purchased.
Hologres and Kafka data sources are created. For more information, see Create a data source for Data Integration.
Network connectivity between the resource group and the data sources is established. For more information, see Network connectivity solutions.
Procedure
1. Select a synchronization task type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Synchronization Task. Then, click Create Synchronization Task at the top of the page to go to the page for creating a synchronization task. Configure the following basic information:
Data Source And Destination: Hologres → Kafka
New Task Name: Customize a name for the synchronization task.
Synchronization Type: Single Table Real-time.
Synchronization Step: Select Full Synchronization.
2. Configure network and resources
In the Network And Resource Configuration section, select a Resource Group for the synchronization task. You can allocate Task Resource Usage in CUs for the task.
For Source Data Source, select the added Hologres data source. For Destination Data Source, select the added Kafka data source. Then, click Test Connectivity.
After you make sure that both the source and destination data sources are connected, click Next.
3. Configure the synchronization link
a. Configure the Hologres source
At the top of the page, click the Hologres data source and edit Holo Source Information.

In the Holo Source Information section, select the schema that contains the Hologres table from which you want to read data and the source table.
Click Data Sampling in the upper-right corner.
In the Data Output Preview dialog box, specify Number Of Samples and click Start Collection. The system samples data from the specified Hologres table so that you can preview it. The sampled data also serves as input for data preview and visual configuration in subsequent data processing nodes.
b. Configure the Kafka destination
At the top of the page, click the Kafka destination and edit Kafka Destination Information.

In the Kafka Destination Information section, select the Kafka topic to which you want to write data.
Set Merge Source Binlog Update Messages as needed. If you enable this option, the two update messages that correspond to an update operation in the source binary logs are merged into one message before they are written to Kafka.
Set Output Format, Key Column, and Kafka Producer Parameters.
Output Format: Select the format of the value in records that are written to Kafka. Valid values: Canal CDC and JSON. For more information, see Appendix: Description of output formats.
Key Column: Select source columns. The values of the selected columns are serialized into strings and concatenated with commas to form the key of records that are written to the Kafka topic.
Note: The serialization rules for column values are the same as the JSON serialization rules for column data types in Hologres.
The key values in the Kafka topic determine the partitions to which data is written. Data with the same key value is written to the same partition. To ensure that a consumer can consume data in the Kafka topic in sequence, we recommend that you use the primary key columns of the Hologres table as the key columns.
If no source column is used as the key column, the key values in the Kafka topic are null. In this case, data is written to random partitions in the Kafka topic.
Kafka Producer Parameters: These parameters affect the consistency, stability, and exception handling behavior of write operations. In most cases, you can use the default configurations. If you have custom requirements, you can specify specific parameters. For information about the producer parameters that are supported by different versions of Kafka, see the Kafka documentation.
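The Key Column behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the task's actual implementation: the function names are hypothetical, string values are assumed to be used as-is while other values use their JSON text, and CRC32 stands in for the hash of the Kafka default partitioner (which uses murmur2 on the key bytes).

```python
import json
import random
import zlib


def build_record_key(row, key_columns):
    """Serialize the selected column values and join them with commas."""
    if not key_columns:
        return None  # no key column selected: the record key is null
    parts = []
    for column in key_columns:
        value = row[column]
        # Assumption: strings are kept as-is; other values use their JSON text.
        parts.append(value if isinstance(value, str) else json.dumps(value))
    return ",".join(parts)


def choose_partition(key, num_partitions):
    """Map a key to a partition; equal keys always map to the same partition."""
    if key is None:
        # Null keys are spread over random partitions, as described above.
        return random.randrange(num_partitions)
    # Stand-in hash for illustration only (Kafka itself uses murmur2).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical source row whose primary key is (id, region).
row = {"id": 42, "region": "cn", "amount": 9.5}
key = build_record_key(row, ["id", "region"])
partition = choose_partition(key, 6)
```

Because every change to the same primary key produces the same record key, all of those changes land in one partition, and a consumer reads them in the order in which they were written.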
4. Configure alert rules
To keep a task failure from delaying business data synchronization, you can configure alert rules for the synchronization task.
In the upper-right corner of the page, click Configure Alert Rule to go to the Configure Alert Rule panel.
In the Configure Alert Rule panel, click Add Alert Rule. In the Add Alert Rule dialog box, set the parameters for the alert rule.
Note: The alert rules that you configure in this step take effect for the real-time synchronization subtask that will be generated by the synchronization task. After the configuration of the synchronization task is complete, you can refer to Manage real-time synchronization tasks to go to the Real-time Synchronization Task page and modify alert rules configured for the real-time synchronization subtask.
Manage alert rules.
You can enable or disable alert rules that are created. You can also specify different alert recipients based on the severity levels of alerts.
5. Configure advanced parameters
DataWorks allows you to modify the configurations of specific parameters. You can change the values of these parameters based on your business requirements.
To prevent unexpected errors or data quality issues, we recommend that you understand the meanings of the parameters before you change the values of the parameters.
In the upper-right corner of the configuration page, click Configure Advanced Parameters.
In the Configure Advanced Parameters panel, change the values of the desired parameters.
6. Configure resource groups
You can click Configure Resource Group in the upper-right corner of the page to view and change the resource groups that are used to run the current synchronization task.
7. Execute the synchronization task
After the configuration of the synchronization task is complete, click Complete in the lower part of the page.
In the Tasks section of the Synchronization Task page, find the created synchronization task and click Start in the Operation column.
Click the name or ID of the synchronization task in the Tasks section and view the detailed running process of the synchronization task.
Perform O&M operations on the synchronization task
View the status of the synchronization task
After the synchronization task is created, you can go to the Tasks page to view all synchronization tasks that are created in the workspace and the basic information of each task.

You can Start or Stop a synchronization task in the Actions column. You can also Edit or View a synchronization task from the More drop-down list.
For a started task, you can view the basic status of the task in Execution Overview. You can also click the corresponding overview area to view execution details.

A real-time synchronization task from a Hologres table to Kafka consists of the following three steps:
Structure Migration: includes the creation method of the destination table (existing table or automatic table creation). If you select automatic table creation, the data definition language (DDL) statement for creating the table is displayed.
Full Initialization: If you select Full Synchronization for Synchronization Step of your task, the progress of full initialization is displayed here.
Real-time Data Synchronization: includes statistics information about real-time synchronization, such as real-time read and write traffic, dirty data, failover, and operation logs.
Rerun the synchronization task
In special cases, such as when you change the fields to synchronize, the fields in the destination table, or the table name, you can click Rerun in the Operation column of the synchronization task so that the changes are synchronized to the destination. Data in tables that are already synchronized and are not modified is not synchronized again. You can rerun the task in either of the following ways:
To rerun the task with its current configurations, click Rerun without modifying the configurations of the synchronization task.
To rerun the task with new configurations, modify the configurations of the synchronization task and click Complete. Then, click Apply Updates in the Operation column of the synchronization task for the latest configurations to take effect.
Appendix: Description of output formats
Canal CDC
Canal CDC is a CDC data format defined by Alibaba Canal.
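A consumer might read such records as sketched below. This topic does not list the fields of the record value, so the sketch assumes the flat JSON layout of open-source Canal, with fields such as type, data, and old. The function name and the sample record are hypothetical.

```python
import json


def summarize_canal_message(raw_value):
    """Pair each entry in `data` with the entry at the same index in `old`, if any."""
    message = json.loads(raw_value)
    operation = message.get("type")      # for example INSERT, UPDATE, or DELETE
    rows = message.get("data") or []     # row images carried by the message
    old_rows = message.get("old") or []  # previous values, present for updates
    changes = []
    for index, row in enumerate(rows):
        old = old_rows[index] if index < len(old_rows) else None
        changes.append({"op": operation, "row": row, "old": old})
    return changes


# Hypothetical record value, shaped like an open-source Canal update message.
sample = (
    b'{"type": "UPDATE", "table": "orders", '
    b'"data": [{"id": "1", "status": "paid"}], '
    b'"old": [{"status": "new"}]}'
)
changes = summarize_canal_message(sample)
```

Inspect a real record from your topic before you rely on any of these field names.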
JSON
JSON is a format that uses field names in the Hologres binary logs as keys and serializes the data content of the fields into strings as values. The keys and values are then organized into a JSON-formatted string and written to the Kafka topic.
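Because field values arrive as strings in this format, a consumer usually converts them back to typed values. The following is a minimal sketch: the function name, the casts mapping, and the sample record are hypothetical and only illustrate the shape described above.

```python
import json


def decode_json_record(raw_value, casts=None):
    """Parse a record value and convert selected string fields to typed values."""
    record = json.loads(raw_value)
    casts = casts or {}
    decoded = {}
    for field, text in record.items():
        convert = casts.get(field)
        # Leave null values and fields without a cast unchanged.
        decoded[field] = convert(text) if convert and text is not None else text
    return decoded


# Hypothetical record value: every field value is a string.
sample = b'{"id": "1001", "name": "alice", "score": "9.5"}'
decoded = decode_json_record(sample, {"id": int, "score": float})
```

Keeping the casts in a caller-supplied mapping lets the same function serve tables with different schemas.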
