After you prepare data sources, network environments, and resources, you can create a real-time synchronization node to synchronize data to Kafka. This topic describes how to create a real-time synchronization node and view the status of the node.
Prerequisites
- The data sources that you want to use are prepared. Before you configure a data synchronization node, you must prepare the source from which data is read and the destination to which data is written. This way, you can select the data sources when you configure the node. For information about the data source types, readers, and writers that real-time synchronization supports, see Data source types that support real-time synchronization.
  Note: For information about the items that you must understand before you prepare a data source, see Overview.
- An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
- Network connections are established between the exclusive resource group for Data Integration and the data sources. For more information, see Establish a network connection between a resource group and a data source.
- The data source environments are prepared. You must create an account that can be used to access a database in the source and an account that can be used to access a database in the destination. You must also grant the accounts the permissions required to perform specific operations on the databases based on your configurations for data synchronization. For more information, see Overview.
Limits
- You can use only exclusive resource groups for Data Integration to run real-time synchronization nodes.
- You can use a real-time synchronization node to synchronize data only from a MySQL, Oracle, or PolarDB data source to Kafka.
Precautions
- If a source table has a primary key, the primary key values are used as the keys of Kafka records. This ensures that changes to rows that have the same primary key value in the source table are written in order to the same Kafka partition.
- If you select source tables that do not have a primary key for synchronization when you configure the destination, empty strings are used as the keys of Kafka records during data synchronization. In this case, to ensure that data changes in a source table are written to Kafka in order, make sure that the destination Kafka topic contains only one partition. Alternatively, you can specify a custom primary key for a source table that does not have a primary key when you configure the destination table. A field or a combination of fields in the source table is then used as the primary key, and its values are used as the keys of Kafka records during data synchronization.
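The partitioning behavior described above can be sketched in a few lines. This is an illustrative stand-in, not DataWorks or Kafka client code: Kafka's default partitioner hashes the record key (with murmur2) and takes it modulo the partition count, so every record with the same key lands in the same partition; md5 is used here only to make the demo deterministic and self-contained.

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Stable stand-in for Kafka's key-hash partitioner.

    Kafka itself uses murmur2 on the serialized key; any stable hash
    illustrates the same property: equal keys map to equal partitions.
    """
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return digest % num_partitions

# Changes that carry the same primary-key value always map to the same
# partition, which is what preserves their relative order.
p1 = choose_partition(b"order_id=42", 12)
p2 = choose_partition(b"order_id=42", 12)
assert p1 == p2

# A table without a primary key falls back to an empty key; ordering is
# then only guaranteed if the topic has a single partition.
assert choose_partition(b"", 1) == 0
```

This is why a topic that receives records with empty keys must have exactly one partition: with a single partition, every record trivially maps to partition 0 and arrival order is preserved.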
- To ensure that changes to rows that have the same primary key value are still written to the same Kafka partition in order even if a response exception occurs on the Kafka data source, you must add the following configurations to the extended parameters when you add the Kafka data source to DataWorks:
```json
{
  "max.in.flight.requests.per.connection": 1,
  "buffer.memory": 100554432
}
```
Important: After you add these configurations to the extended parameters of the Kafka data source, data synchronization performance is significantly degraded. You must balance performance against the ordering of write operations.
- For more information about the format of Kafka messages, the format of heartbeat messages that are generated by a synchronization node, and the format of Kafka messages for data changes in the source, see Appendix: Message formats.
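Both keys in the extended parameters are standard Kafka producer settings. The sketch below is illustrative only (it is not DataWorks internals); it shows the config as a plain dictionary with comments explaining why each setting matters for ordering.

```python
# Illustrative view of the extended parameters. These are standard
# Kafka producer settings, shown here as a plain dict.
ordering_safe_config = {
    # Allow at most one unacknowledged request per connection. A retried
    # batch can then never overtake a later batch, so per-key ordering
    # survives retries after a response exception.
    "max.in.flight.requests.per.connection": 1,
    # Producer send-buffer size in bytes. A larger buffer partially
    # offsets the throughput lost by serializing in-flight requests.
    "buffer.memory": 100554432,
}

assert ordering_safe_config["max.in.flight.requests.per.connection"] == 1
```

The trade-off is inherent: limiting in-flight requests to one serializes sends on each connection, which is exactly why the note above warns about degraded performance.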
Create a real-time synchronization node
- Create a real-time synchronization node to synchronize all data in a database. Perform the following steps:
  1. Configure a resource group.
  2. Configure the source and synchronization rules.
  3. Configure the destination topics.
  4. Configure the resources required to run the real-time synchronization node.
Commit and deploy the real-time synchronization node
Perform the following steps to commit and deploy the node:
1. Click the save icon in the top toolbar to save the node.
2. Click the commit icon in the top toolbar to commit the node.
3. In the Commit Node dialog box, configure the Change description parameter.
4. Click OK.
If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. On the left side of the top navigation bar, click Deploy. For more information, see Deploy nodes.
What to do next
- After the real-time synchronization node is configured, you can start and manage the node on the Real Time DI page in Operation Center. To go to the Real Time DI page, perform the following operations: Log on to the DataWorks console and go to the Operation Center page. In the left-side navigation pane of the Operation Center page, choose . For more information, see Perform operations for a real-time synchronization node.
- The data in the source is written to the destination Kafka topics in the JSON format. For more information about the formats of Kafka messages that indicate data changes and status of data in the data source, see Appendix: Message formats.
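Since the destination receives JSON-encoded message values, a downstream consumer typically deserializes each value before processing it. The snippet below is a hypothetical illustration only: the actual field names and schema are defined in Appendix: Message formats, and the payload here is made up for the example.

```python
import json

# Hypothetical message value; the real schema is defined in
# "Appendix: Message formats".
raw_value = b'{"schema": "demo_db", "table": "orders", "type": "UPDATE"}'

# A consumer would receive raw_value from a Kafka topic and decode it.
record = json.loads(raw_value)
assert record["table"] == "orders"
assert record["type"] == "UPDATE"
```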