This topic describes how to synchronize data from a self-managed MySQL database to a Message Queue for Apache Kafka instance by using Data Transmission Service (DTS). The data synchronization feature allows you to extend message processing capabilities.
Prerequisites
- The version of the self-managed MySQL database is 5.1, 5.5, 5.6, 5.7, or 8.0.
- The version of the destination Message Queue for Apache Kafka instance ranges from 0.10.1.0 to 2.x.
- In the destination Message Queue for Apache Kafka instance, a topic is created to receive the synchronized data. For more information, see Create a topic.
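The version prerequisites above can be checked before you configure a task. The following is a minimal sketch, not part of DTS: the function names are hypothetical, and it assumes version strings in the usual dotted form (for example, `5.7.36-log` for MySQL or `2.2.0` for Kafka).

```python
# Supported versions, taken from the prerequisites in this topic.
SUPPORTED_MYSQL_SERIES = {"5.1", "5.5", "5.6", "5.7", "8.0"}
KAFKA_MIN = (0, 10, 1, 0)   # lowest supported Kafka version
KAFKA_MAX_MAJOR = 2         # "2.x" is the highest supported series


def mysql_version_supported(version: str) -> bool:
    """Return True if the major.minor series of a MySQL version is supported."""
    parts = version.split(".")
    if len(parts) < 2:
        return False
    # "5.7.36-log" -> "5.7"
    return ".".join(parts[:2]) in SUPPORTED_MYSQL_SERIES


def kafka_version_supported(version: str) -> bool:
    """Return True if a Kafka version lies between 0.10.1.0 and 2.x."""
    try:
        parts = tuple(int(p) for p in version.split("."))
    except ValueError:
        return False
    # Pad short versions such as "2.2" to four components before comparing.
    padded = parts + (0,) * (4 - len(parts))
    return padded >= KAFKA_MIN and parts[0] <= KAFKA_MAX_MAJOR


if __name__ == "__main__":
    print(mysql_version_supported("5.7.36-log"))
    print(kafka_version_supported("0.10.0.1"))
```

On the source database, `SELECT VERSION();` returns the version string that you can pass to the first function.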
Background information
Message Queue for Apache Kafka is a distributed, high-throughput, and scalable message queue service that is provided by Alibaba Cloud. It is a fully managed service built on open source Apache Kafka, so you can focus on business development instead of spending time on deployment and O&M. Message Queue for Apache Kafka is used in big data scenarios such as log collection, monitoring data aggregation, streaming data processing, and online and offline analysis. It is an important part of the big data ecosystem.
Precautions
- DTS uses read and write resources of the source and destination databases during initial full data synchronization. This may increase the loads of the database servers. If the databases perform poorly, have low specifications, or hold a large volume of data, database services may become unavailable. For example, DTS occupies a large amount of read and write resources in the following cases: a large number of slow SQL queries are performed on the source database, the tables have no primary keys, or a deadlock occurs in the destination database. Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours, for example, when the CPU utilization of both the source and destination databases is less than 30%.
- The tables to be synchronized from the source database must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination may contain duplicate data records.
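To act on the precaution about PRIMARY KEY or UNIQUE constraints, you can list the tables that lack both before you start the task. The following is a minimal sketch under stated assumptions: the query reads the standard MySQL `information_schema` views, the `%s` placeholder follows the style of common Python MySQL drivers, and the helper name `tables_missing_unique_key` is hypothetical. The helper works on rows that were already fetched, so it does not depend on any particular driver.

```python
from typing import Iterable, List, Tuple

# Query sketch: base tables in one schema that have neither a PRIMARY KEY
# nor a UNIQUE constraint. Run it with the MySQL client or driver of your choice.
TABLES_WITHOUT_KEY_SQL = """
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema = %s
  AND c.constraint_name IS NULL
"""

KEY_TYPES = {"PRIMARY KEY", "UNIQUE"}


def tables_missing_unique_key(
    tables: Iterable[str],
    constraints: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the tables that have no PRIMARY KEY or UNIQUE constraint.

    `constraints` holds (table_name, constraint_type) pairs, for example
    rows read from information_schema.table_constraints.
    """
    keyed = {name for name, ctype in constraints if ctype.upper() in KEY_TYPES}
    return sorted(t for t in tables if t not in keyed)


if __name__ == "__main__":
    tables = ["orders", "audit_log", "users"]
    constraints = [
        ("orders", "PRIMARY KEY"),
        ("users", "UNIQUE"),
        ("audit_log", "FOREIGN KEY"),  # does not guarantee uniqueness
    ]
    print(tables_missing_unique_key(tables, constraints))
```

Any table that this check reports is a candidate for duplicate records in the destination and is worth fixing or excluding before synchronization.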
Limits
- Only tables can be selected as the objects to synchronize.
- If a table is renamed and the new table name is not included in the objects to synchronize, DTS does not synchronize the data in the renamed table to the destination Kafka cluster. To synchronize that data, you must reselect the objects to be synchronized. For more information, see Add an object to a data synchronization task.
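The rename limit can be expressed as a simple membership check. The following sketch is illustrative only and does not call DTS: the function name `tables_to_reselect` and the table names are hypothetical. Given the selected objects and a list of planned renames, it returns the new table names that would have to be added to the task.

```python
from typing import Iterable, List, Tuple


def tables_to_reselect(
    selected: Iterable[str],
    renames: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return new table names that are not covered by the selected objects.

    `renames` holds (old_name, new_name) pairs. A rename matters only if the
    old table was being synchronized and the new name is not selected.
    """
    selected_set = set(selected)
    missing = []
    for old_name, new_name in renames:
        if old_name in selected_set and new_name not in selected_set:
            missing.append(new_name)
    return missing


if __name__ == "__main__":
    selected = {"orders", "users"}
    renames = [("orders", "orders_2024"), ("tmp", "scratch")]
    print(tables_to_reselect(selected, renames))
```

Running such a check before a schema change shows which tables need to be reselected so that their data keeps flowing to the destination Kafka cluster.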