This topic describes how to synchronize data from a user-created MySQL database to Message Queue for Apache Kafka by using Data Transmission Service (DTS). The data synchronization feature allows you to extend message processing capabilities.
Prerequisites
- A MySQL database of version 5.1, 5.5, 5.6, 5.7, or 8.0 is created.
- A Kafka instance of version 0.10.1.0 to 1.0.2 is created.
- In the destination Kafka instance, a topic is created to receive the synchronized data. For more information, see Create a topic.
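The version prerequisites above can be checked before you configure a task. The following is a minimal sketch, not part of DTS: the function names are hypothetical, and it assumes that the Kafka range 0.10.1.0 to 1.0.2 is inclusive at both ends and that MySQL reports its version in a form such as `5.7.33-log`.

```python
import re

# Supported ranges taken from the prerequisites above.
KAFKA_MIN = (0, 10, 1, 0)
KAFKA_MAX = (1, 0, 2, 0)   # 1.0.2, padded to four components
MYSQL_SERIES = {(5, 1), (5, 5), (5, 6), (5, 7), (8, 0)}


def _pad(version, length=4):
    """Pad a version tuple with zeros so that 1.0.2 compares as 1.0.2.0."""
    return version + (0,) * (length - len(version))


def kafka_version_supported(text):
    """Return True if a dotted Kafka version lies in the supported range.

    Assumption: both ends of the range are inclusive.
    """
    version = _pad(tuple(int(part) for part in text.strip().split(".")))
    return KAFKA_MIN <= version <= KAFKA_MAX


def mysql_version_supported(text):
    """Return True if the major.minor series of a MySQL version is supported."""
    match = re.match(r"(\d+)\.(\d+)", text.strip())
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) in MYSQL_SERIES


if __name__ == "__main__":
    print(kafka_version_supported("0.11.0.3"))
    print(mysql_version_supported("5.7.33-log"))
```

Comparing integer tuples instead of strings avoids the pitfall where `"0.9"` sorts after `"0.10"` in a plain string comparison.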
Background information
Message Queue for Apache Kafka is a distributed, high-throughput, scalable message queue service provided by Alibaba Cloud. It is a fully managed service for open source Apache Kafka that addresses long-standing shortcomings of the open source product, so you can focus on business development instead of deployment and O&M. Message Queue for Apache Kafka is used in big data scenarios such as log collection, monitoring data aggregation, streaming data processing, and online and offline analysis, and it is an important part of the big data ecosystem.
Precautions
- During initial full data synchronization, DTS uses read and write resources of the source and destination databases, which may increase the database load. If the database performance is poor, the specification is low, or the data volume is large, database services may become unavailable. For example, DTS occupies a large amount of read and write resources when many slow SQL queries run on the source database, when tables have no primary keys, or when a deadlock occurs in the destination database. Before you synchronize data, evaluate the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours, for example, when the CPU usage of the source and destination databases is less than 30%.
- The tables in the source database must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination database may contain duplicate data records.
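To find tables that lack a PRIMARY KEY or UNIQUE constraint before synchronization, you can query MySQL's `information_schema`. The following is a sketch under stated assumptions: the query uses the standard `information_schema.TABLES` and `information_schema.TABLE_CONSTRAINTS` views, the `%s` placeholder assumes a driver that uses that parameter style, and `tables_without_keys` is a hypothetical helper that applies the same check to rows you have already fetched.

```python
# Lists base tables in one schema that have neither a PRIMARY KEY nor a
# UNIQUE constraint. Pass the schema name as the single query parameter.
KEYLESS_TABLES_SQL = """
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME = t.TABLE_NAME
 AND c.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.TABLE_SCHEMA = %s
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME
HAVING COUNT(c.CONSTRAINT_NAME) = 0
"""

KEY_CONSTRAINT_TYPES = ("PRIMARY KEY", "UNIQUE")


def tables_without_keys(tables, constraints):
    """Return the sorted names of tables that have no key constraint.

    tables:      iterable of table names
    constraints: iterable of (table_name, constraint_type) pairs, as found
                 in information_schema.TABLE_CONSTRAINTS
    """
    keyed = {name for name, ctype in constraints if ctype in KEY_CONSTRAINT_TYPES}
    return sorted(set(tables) - keyed)


if __name__ == "__main__":
    sample_tables = ["orders", "users", "audit_log"]
    sample_constraints = [
        ("orders", "PRIMARY KEY"),
        ("users", "UNIQUE"),
        ("audit_log", "FOREIGN KEY"),  # a foreign key alone does not count
    ]
    print(tables_without_keys(sample_tables, sample_constraints))
```

Any table that this check reports is a candidate for duplicate records in the destination, so add a key to it or exclude it from the objects to be synchronized.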
Limits
- You can select only tables as the objects to be synchronized.
- If a source table is renamed and the new table name is not included in the objects to be synchronized, DTS does not synchronize the data in the renamed table to the destination Kafka cluster. To synchronize this data, you must reselect the objects to be synchronized. For more information, see Add an object to a data synchronization task.
Data format
The data that is synchronized to the Kafka cluster is stored in the Avro format. For more information, see DTS Avro schema.