Kafka is a distributed message queue that provides high throughput and high scalability. It is widely used in big data scenarios such as log collection, data aggregation, stream processing, and online and offline analysis, and it plays an important role in the big data ecosystem. This topic describes how to synchronize data from a PolarDB for MySQL cluster to a user-created Kafka cluster by using Data Transmission Service (DTS). The data synchronization feature allows you to extend your message processing capabilities.
Prerequisites
- A Kafka cluster is created and the Kafka version is 0.10.1.0 to 1.0.2.
- The binary logging feature is enabled for the PolarDB for MySQL cluster. For more information, see Enable binary logging. A sketch that verifies both prerequisites follows this list.
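The following minimal sketch, written in Java, checks both prerequisites before you configure the task: it confirms that binary logging is enabled on the source cluster and that the Kafka cluster is reachable. The endpoint polardb-endpoint:3306, the credentials, and the bootstrap address kafka-host:9092 are placeholder assumptions, and the sketch assumes that the MySQL JDBC driver and the kafka-clients library are on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class PrerequisiteCheck {
    public static void main(String[] args) throws Exception {
        // Verify that binary logging is enabled on the PolarDB for MySQL cluster.
        // The endpoint and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://polardb-endpoint:3306/", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW VARIABLES LIKE 'log_bin'")) {
            if (rs.next()) {
                System.out.println("log_bin = " + rs.getString(2)); // expect ON
            }
        }

        // Verify that the user-created Kafka cluster is reachable.
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-host:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println("Kafka brokers: " + admin.describeCluster().nodes().get());
        }
    }
}
```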
Precautions
The tables to be synchronized in the source database must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination Kafka cluster may contain duplicate data records.
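As a quick check, the following sketch lists the base tables in a schema that have neither a PRIMARY KEY nor a UNIQUE constraint by querying information_schema. The endpoint, credentials, and the schema name mydb are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ConstraintCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; replace with your PolarDB values.
        String url = "jdbc:mysql://polardb-endpoint:3306/information_schema";
        String sql =
            "SELECT t.table_name " +
            "FROM information_schema.tables t " +
            "LEFT JOIN information_schema.table_constraints c " +
            "  ON t.table_schema = c.table_schema " +
            " AND t.table_name = c.table_name " +
            " AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE') " +
            "WHERE t.table_schema = ? AND t.table_type = 'BASE TABLE' " +
            "  AND c.constraint_name IS NULL";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "mydb"); // placeholder schema name
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // These tables may produce duplicate records in Kafka.
                    System.out.println("No PK/UNIQUE constraint: " + rs.getString(1));
                }
            }
        }
    }
}
```

Any table that this query reports should be given a primary key or unique constraint before you add it to the synchronization task.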
Limits
- You can select only tables as the objects to be synchronized.
- DTS does not automatically update the objects to be synchronized if their names change.
Note: If a table is renamed and the new table name is not included in the objects to be synchronized, DTS does not synchronize the data in that table to the destination Kafka cluster. To synchronize the data in a renamed table, you must modify the objects to be synchronized. For more information, see Add an object to a data synchronization task.
Data format
The data that is synchronized to the Kafka cluster is stored in the Avro format, so consumers must parse each message based on the DTS Avro schema. For more information, see DTS Avro schema.
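As an illustration only, the following consumer sketch polls the destination topic and decodes each message value with Avro's GenericDatumReader. It assumes that the DTS Avro schema has been saved locally as Record.avsc, that each message value is a single binary-encoded Avro record, and that the bootstrap address, topic name, and consumer group ID shown are placeholders to replace with your own values.

```java
import java.io.File;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class DtsAvroConsumer {
    public static void main(String[] args) throws Exception {
        // Parse the DTS Avro schema; the local file name is a placeholder.
        Schema schema = new Schema.Parser().parse(new File("Record.avsc"));
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);

        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-host:9092"); // placeholder address
        props.put("group.id", "dts-demo");                 // placeholder group ID
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("dts-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Decode one Avro-encoded DTS record per message value.
                    BinaryDecoder decoder =
                        DecoderFactory.get().binaryDecoder(record.value(), null);
                    GenericRecord dtsRecord = reader.read(null, decoder);
                    System.out.println(dtsRecord);
                }
            }
        }
    }
}
```

For production use, you can generate specific record classes from the schema file with the Avro build-tool plugins instead of working with generic records.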