Kafka is a distributed message queue service that features high throughput and high scalability. Kafka is widely used in big data scenarios such as log collection, data aggregation, stream processing, and online and offline analysis, and it plays an important role in the big data ecosystem. This topic describes how to use Data Transmission Service (DTS) to synchronize data from a user-created MySQL database hosted on an Elastic Compute Service (ECS) instance to a user-created Kafka cluster. Synchronizing data to Kafka allows you to extend your message processing capabilities.
Prerequisites
- A user-created MySQL database of version 5.1, 5.5, 5.6, 5.7, or 8.0 is hosted on an ECS instance.
- A user-created Kafka cluster is available, and its version is from 0.10.1.0 to 1.0.2.
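The version prerequisites above can be checked before you create the task. The following is a minimal sketch, not a DTS tool: the helper names `mysql_version_supported` and `kafka_version_supported` are hypothetical, and it assumes you already have the version strings (for example, the output of `SELECT VERSION();` on MySQL and the version of your Kafka brokers).

```python
# Hypothetical pre-check for the version prerequisites listed above.
SUPPORTED_MYSQL = {(5, 1), (5, 5), (5, 6), (5, 7), (8, 0)}
KAFKA_MIN = (0, 10, 1, 0)
KAFKA_MAX = (1, 0, 2, 0)


def parse_version(text: str, width: int = 0) -> tuple:
    """Turn '5.7.44-log' or '0.10.2.1' into a tuple of ints.

    Any suffix after '-' is ignored. If width is given, the tuple is
    padded with zeros so that '1.0.2' compares equal to '1.0.2.0'.
    """
    core = text.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def mysql_version_supported(version: str) -> bool:
    # Only the major.minor pair matters for the MySQL prerequisite.
    return parse_version(version)[:2] in SUPPORTED_MYSQL


def kafka_version_supported(version: str) -> bool:
    # Inclusive range check on 4-component versions.
    return KAFKA_MIN <= parse_version(version, width=4) <= KAFKA_MAX
```

Padding both sides to four components avoids a subtle tuple-comparison pitfall: without it, `(1, 0, 2, 0)` would compare greater than `(1, 0, 2)` and the upper bound would be rejected.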
Precautions
- During initial full data synchronization, DTS uses read and write resources of the source and destination databases, which may increase the database load. If the databases perform poorly, have low specifications, or store a large volume of data, database services may become unavailable. For example, DTS occupies a large amount of read and write resources if a large number of slow SQL queries are running on the source database, the tables have no primary keys, or a deadlock occurs in the destination database. Before you synchronize data, evaluate the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours, for example, when the CPU usage of both the source and destination databases is less than 30%.
- The tables to be synchronized in the source database must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination may contain duplicate data records.
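To find tables that would violate the key requirement, you can inspect `information_schema` on the source database. The sketch below is an assumption-based illustration: the query joins the standard `information_schema.TABLES` and `information_schema.TABLE_CONSTRAINTS` views, and the hypothetical helper `tables_without_keys` processes the returned rows. It only detects missing PRIMARY KEY or UNIQUE constraints; it does not verify the other conditions DTS may check.

```python
# Hypothetical pre-check: list source tables with no PRIMARY KEY or UNIQUE constraint.
# QUERY is a suggested statement to run on the source MySQL database; each
# returned row is (TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_TYPE or NULL).
QUERY = """
SELECT t.TABLE_SCHEMA, t.TABLE_NAME, c.CONSTRAINT_TYPE
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA AND c.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
"""

KEY_TYPES = {"PRIMARY KEY", "UNIQUE"}


def tables_without_keys(rows):
    """Return sorted (schema, table) pairs that have no PRIMARY KEY or UNIQUE constraint."""
    seen, keyed = set(), set()
    for schema, table, constraint_type in rows:
        seen.add((schema, table))
        if constraint_type in KEY_TYPES:
            keyed.add((schema, table))
    return sorted(seen - keyed)
```

A table that has only a FOREIGN KEY constraint, or no constraint at all (a NULL from the LEFT JOIN), ends up in the result.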
Limits
- You can select only tables as the objects to be synchronized.
- If you rename a table and the new table name is not included in the objects to be synchronized, DTS does not synchronize the data in the renamed table to the destination Kafka cluster. To synchronize the data in the renamed table, modify the objects to be synchronized. For more information, see Add an object to a data synchronization task.
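The rename rule can be turned into a simple bookkeeping check. This is an illustrative sketch, not DTS logic: `renames_needing_update` is a hypothetical helper, and it assumes you track the selected table names and your planned renames yourself.

```python
# Hypothetical helper: which renamed tables must be added to the
# objects to be synchronized so that their data keeps reaching Kafka?
def renames_needing_update(selected, renames):
    """selected: set of table names currently selected for synchronization.
    renames: dict mapping old table name -> new table name.
    Returns the sorted new names that are missing from `selected`.
    """
    return sorted(
        new for old, new in renames.items()
        if old in selected and new not in selected
    )
```

For example, renaming a selected table `orders` to `orders_v2` yields `["orders_v2"]` until `orders_v2` is added to the task; renaming an unselected table yields nothing.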
Data format
The data that is synchronized to the Kafka cluster is stored in the Avro format. For more information, see DTS Avro schema.
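Consumers must decode these messages according to the DTS Avro schema, typically with an Avro library such as the Apache Avro Python package or fastavro. To show what such a library does underneath, the sketch below decodes two Avro binary primitives as defined by the Avro specification: `int`/`long` values (zigzag-encoded variable-length integers) and `string` values (a `long` length followed by UTF-8 bytes). It does not implement the DTS Avro schema itself, and the function names are illustrative.

```python
# Minimal decoders for two Avro binary-encoding primitives (per the Avro spec).
def read_long(buf: bytes, pos: int):
    """Decode an Avro int/long at buf[pos]. Returns (value, next_pos)."""
    shift = 0
    accum = 0
    while True:
        byte = buf[pos]
        pos += 1
        accum |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:              # high bit clear: last byte
            break
        shift += 7
    # Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (accum >> 1) ^ -(accum & 1), pos


def read_string(buf: bytes, pos: int):
    """Decode an Avro string: a long byte length, then UTF-8 bytes."""
    length, pos = read_long(buf, pos)
    end = pos + length
    return buf[pos:end].decode("utf-8"), end
```

For example, the bytes `06 66 6f 6f` decode to the string `"foo"`, because `06` is the zigzag encoding of the length 3.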
Supported synchronization topologies
- One-way one-to-one synchronization
- One-way one-to-many synchronization
- One-way many-to-one synchronization
- One-way cascade synchronization
Preparations
Before you configure a synchronization task, you must create an account and configure binary logging for the source instance. For more information, see Create an account for a user-created MySQL database and configure binary logging.
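After you configure binary logging, you can verify the settings by running `SHOW VARIABLES` on the source database. The sketch below checks a dictionary of those variables. The helper `binlog_problems` is hypothetical, and its rules are assumptions drawn from common DTS requirements for MySQL sources: binary logging on, row-based format, and full row images. The `server_id` rule here only requires a non-zero value; see the topic referenced above for the exact requirements.

```python
# Hypothetical check on SHOW VARIABLES output (variable name -> value string).
def binlog_problems(variables: dict) -> list:
    problems = []
    if variables.get("log_bin", "").upper() != "ON":
        problems.append("log_bin must be ON")
    if variables.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format must be ROW")
    # binlog_row_image exists only in MySQL 5.6 and later, so skip it if absent.
    image = variables.get("binlog_row_image")
    if image is not None and image.upper() != "FULL":
        problems.append("binlog_row_image must be FULL")
    # Assumption: any non-zero server_id is acceptable for this sketch.
    if variables.get("server_id", "0") == "0":
        problems.append("server_id must be set to a non-zero value")
    return problems
```

An empty list means that none of these assumed rules is violated.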