This topic describes how to synchronize data from a self-managed TiDB database to an AnalyticDB for MySQL cluster by using Data Transmission Service (DTS). In this example, Pump, Drainer, and a Kafka cluster are deployed.
Prerequisites
- An AnalyticDB for MySQL cluster is created. For more information, see Create an AnalyticDB for MySQL cluster.
- The destination AnalyticDB for MySQL cluster has sufficient storage space.
Background information

The binary log format and implementation mechanism of a TiDB database are different from those of a MySQL database. To synchronize data and minimize modifications to the source TiDB database, you must deploy Pump, Drainer, and a Kafka cluster.
Pump records the binary log files that are generated in TiDB in real time, and sends the binary log files to Drainer. Drainer writes the binary log files to the downstream Kafka cluster. During incremental data synchronization, DTS retrieves data from the Kafka cluster and synchronizes the data to the destination database in real time. For example, DTS can synchronize data to an AnalyticDB for MySQL cluster.
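The Drainer-to-Kafka hop described above is set in Drainer's configuration file. The following is a minimal sketch that renders that section as text. The key names (`db-type`, `kafka-addrs`, `kafka-version`, `topic-name`) follow the TiDB Binlog sample configuration as best recalled, so verify them against the Drainer version you deploy. The function name, broker addresses, topic name, and Kafka version are hypothetical placeholders.

```python
def render_drainer_kafka_section(kafka_addrs, topic_name, kafka_version):
    """Return a TOML fragment that points Drainer at a Kafka cluster.

    Assumption: key names follow the TiDB Binlog sample drainer.toml.
    Check them against the documentation for your Drainer version.
    """
    if not kafka_addrs:
        raise ValueError("at least one Kafka broker address is required")
    if not topic_name:
        raise ValueError("a topic name is required")

    # Drainer takes the broker list as one comma-separated string.
    joined_addrs = ",".join(kafka_addrs)
    lines = [
        "[syncer]",
        'db-type = "kafka"',
        "",
        "[syncer.to]",
        f'kafka-addrs = "{joined_addrs}"',
        f'kafka-version = "{kafka_version}"',
        f'topic-name = "{topic_name}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(render_drainer_kafka_section(
        ["10.0.0.1:9092", "10.0.0.2:9092"], "tidb_binlog", "2.5.0"))
```

The topic name must match the topic that DTS later reads from, so it helps to keep both values in one place.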
Precautions
- DTS uses read and write resources of the source and destination databases during initial full data synchronization, which may increase the load on the database servers. If the databases perform poorly, have low specifications, or hold a large volume of data, database services may become unavailable. For example, DTS occupies a large amount of read and write resources in the following cases: a large number of slow SQL queries are performed on the source database, the tables have no primary keys, or a deadlock occurs in the destination database. Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours, for example, when the CPU utilization of the source and destination databases is less than 30%.
- We recommend that you do not use gh-ost or pt-online-schema-change to perform DDL operations on the required objects during data synchronization. Otherwise, data may fail to be synchronized.
- Due to the limits of AnalyticDB for MySQL, if the disk space usage of the nodes in an AnalyticDB for MySQL cluster reaches 80%, the cluster is locked. We recommend that you estimate the required disk space based on the objects to be synchronized. You must make sure that the destination cluster has sufficient storage space.
- Prefix indexes cannot be synchronized. If the source database contains prefix indexes, data may fail to be synchronized.
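The disk space precaution above can be turned into a rough pre-flight check. The following sketch assumes that you already have three estimates for a node: its total disk capacity, the space currently in use, and the space the synchronized objects are expected to occupy. The function names are hypothetical, and the size of the data in the destination cluster may differ from its size in TiDB, so treat the result as an approximation.

```python
LOCK_THRESHOLD = 0.80  # disk usage ratio at which the cluster is locked


def projected_usage(total_gb, used_gb, incoming_gb):
    """Return the projected disk usage ratio after synchronization."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if used_gb < 0 or incoming_gb < 0:
        raise ValueError("sizes must not be negative")
    return (used_gb + incoming_gb) / total_gb


def fits_below_lock_threshold(total_gb, used_gb, incoming_gb,
                              threshold=LOCK_THRESHOLD):
    """Return True if projected usage stays strictly below the threshold."""
    return projected_usage(total_gb, used_gb, incoming_gb) < threshold


if __name__ == "__main__":
    # Hypothetical node: 1000 GB total, 300 GB used, 400 GB incoming.
    print(fits_below_lock_threshold(1000, 300, 400))
```

Because the limit applies to the nodes of the cluster, run the check with the figures of the most heavily used node rather than with cluster-wide totals.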
Billing
Synchronization type | Task configuration fee |
---|---|
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
SQL operations that can be synchronized
- DDL operations: CREATE TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE, ADD COLUMN, and DROP COLUMN
- DML operations: INSERT, UPDATE, and DELETE
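To check a change script against the list above before you run it on the source database, a simple classifier can help. The following is a minimal sketch that matches statements by keyword only. It is not a SQL parser, and the function name and patterns are illustrative assumptions. In particular, it does not recognize shorthand forms such as ALTER TABLE ... ADD without the COLUMN keyword.

```python
import re

# Operation name -> pattern for the start of a single SQL statement.
_SUPPORTED_PATTERNS = [
    ("CREATE TABLE", r"^CREATE\s+TABLE\b"),
    ("DROP TABLE", r"^DROP\s+TABLE\b"),
    ("RENAME TABLE", r"^RENAME\s+TABLE\b"),
    ("TRUNCATE TABLE", r"^TRUNCATE\s+TABLE\b"),
    ("ADD COLUMN", r"^ALTER\s+TABLE\b.*\bADD\s+COLUMN\b"),
    ("DROP COLUMN", r"^ALTER\s+TABLE\b.*\bDROP\s+COLUMN\b"),
    ("INSERT", r"^INSERT\b"),
    ("UPDATE", r"^UPDATE\b"),
    ("DELETE", r"^DELETE\b"),
]


def classify(statement):
    """Return the supported operation name, or None if unsupported."""
    text = statement.strip()
    for name, pattern in _SUPPORTED_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE | re.DOTALL):
            return name
    return None


if __name__ == "__main__":
    for sql in ("ALTER TABLE t ADD COLUMN c INT",
                "CREATE INDEX idx ON t (c)"):
        print(sql, "->", classify(sql))
```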
Preparations
- Deploy Pump and Drainer. For more information, see TiDB Binlog Cluster Deployment.
- Modify the configuration file of Drainer and specify a Kafka cluster to receive data from Drainer. For more information, see Binlog Slave Client User Guide.
- Deploy a Kafka cluster by using one of the following methods:
  - Deploy a self-managed Kafka cluster. For more information, visit the Apache Kafka official website.
    Warning We recommend that you set the message.max.bytes and replica.fetch.max.bytes parameters for the Kafka broker to greater values. We also recommend that you set the fetch.message.max.bytes parameter for the Kafka consumer to a greater value. These settings ensure that the Kafka cluster can receive the binary log files that are generated in TiDB. For more information, see Kafka 2.5 Documentation.
  - Purchase and deploy a Message Queue for Apache Kafka instance. For more information, see Quick start of Message Queue for Apache Kafka.
    Note The Message Queue for Apache Kafka instance must be deployed in the same virtual private cloud (VPC) as the source database server. This ensures reliable data transmission and minimizes the impact of network latency on data synchronization.
- Create a topic in the self-managed Kafka cluster or the Message Queue for Apache Kafka instance.
- Add the CIDR blocks of DTS servers to a whitelist of the TiDB database. For more information, see Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
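For a self-managed Kafka cluster, the three size parameters named in the warning above need to be consistent with each other, not just large. The following sketch checks the usual relationship: the broker limit must cover the largest message you expect, and the replica and consumer fetch sizes must be at least as large as the broker limit. The function name and the `largest_message_bytes` estimate are hypothetical. How large a TiDB binary log message can become depends on your workload.

```python
def check_kafka_size_settings(message_max_bytes, replica_fetch_max_bytes,
                              fetch_message_max_bytes, largest_message_bytes):
    """Return a list of problems found in the size settings.

    An empty list means the settings are mutually consistent for the
    given estimate of the largest message.
    """
    problems = []
    if message_max_bytes < largest_message_bytes:
        problems.append(
            "message.max.bytes is smaller than the largest expected message")
    if replica_fetch_max_bytes < message_max_bytes:
        problems.append(
            "replica.fetch.max.bytes is smaller than message.max.bytes")
    if fetch_message_max_bytes < message_max_bytes:
        problems.append(
            "fetch.message.max.bytes is smaller than message.max.bytes")
    return problems


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Hypothetical estimate: the largest binlog message is 64 MiB.
    print(check_kafka_size_settings(one_gib, one_gib, one_gib,
                                    64 * 1024 ** 2))
```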
Procedure
- Purchase a data synchronization instance. For more information, see Purchase a DTS instance. Note On the buy page, set Source Instance to TiDB and set Destination Instance to AnalyticDB for MySQL.
- Log on to the DTS console. Note If you are redirected to the Data Management (DMS) console, you can click the icon in the lower-right corner to go to the previous version of the DTS console.
- In the left-side navigation pane, click Data Synchronization.
- In the upper part of the Data Synchronization Tasks page, select the region where the data synchronization instance resides.
- Find the data synchronization instance and click Configure Task in the Actions column.
- Configure the source and destination instances.
- In the lower-right corner of the page, click Set Whitelist and Next.
  Note
  - You do not need to modify the security settings for ApsaraDB instances (such as ApsaraDB RDS for MySQL and ApsaraDB for MongoDB) and ECS-hosted databases. DTS automatically adds the CIDR blocks of DTS servers to the whitelists of ApsaraDB instances or the security group rules of Elastic Compute Service (ECS) instances. For more information, see Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
  - After data synchronization is complete, we recommend that you remove the CIDR blocks of DTS servers from the whitelists or security groups.
- Select the synchronization policy and the objects to synchronize.
  - Select the initial synchronization types: You must select both Initial Schema Synchronization and Initial Full Data Synchronization in most cases. After the precheck is complete, DTS synchronizes the schemas and data of required objects from the source instance to the destination cluster. The schemas and data are the basis for subsequent incremental synchronization.
  - Select the processing mode of conflicting tables:
    - Precheck and Report Errors: checks whether the destination database contains tables that have the same names as tables in the source database. If the destination database does not contain tables that have the same names as tables in the source database, the precheck is passed. Otherwise, an error is returned during precheck and the data synchronization task cannot be started.
      Note You can use the object name mapping feature to rename the tables that are synchronized to the destination database. If the source and destination databases contain identical table names and the tables in the destination database cannot be deleted or renamed, you can use this feature. For more information, see Rename an object to be synchronized.
    - Ignore Errors and Proceed: skips the precheck for identical table names in the source and destination databases.
      Warning If you select Ignore Errors and Proceed, data inconsistency may occur and your business may be exposed to potential risks.
      - If the source and destination databases have the same schema, DTS does not synchronize data records that have the same primary keys as data records in the destination database.
      - If the source and destination databases have different schemas, initial data synchronization may fail. In this case, only specific columns are synchronized, or the data synchronization task fails.
  - Specify whether to merge tables:
    - If you select Yes, DTS adds the __dts_data_source column to each table to store data sources. In this case, DDL operations cannot be synchronized.
    - No is selected by default. In this case, DDL operations can be synchronized.
    Note If you set this parameter to Yes, all the selected source tables in the task are merged into the destination table. To merge only the data source columns of specific tables, you can create two data synchronization tasks.
  - Select the operation types to synchronize: Select the types of operations that you want to synchronize based on your business requirements. All operation types are selected by default. For more information, see SQL operations that can be synchronized.
  - Select the objects to synchronize: Select one or more objects from the Available section and click the icon to add the objects to the Selected section. You can select tables or databases as the objects to synchronize.
    Note
    - If you select a database as the object to synchronize, all schema changes in the database are synchronized to the destination database.
    - If you select a table as the object to synchronize, only the ADD COLUMN operations that are performed on the table are synchronized to the destination database.
    - By default, after an object is synchronized to the destination cluster, the name of the object remains unchanged. You can use the object name mapping feature to rename the objects that are synchronized to the destination cluster. For more information, see Rename an object to be synchronized.
  - Rename Databases and Tables: You can use the object name mapping feature to rename the objects that are synchronized to the destination instance. For more information, see Object name mapping.
  - Replicate Temporary Tables When DMS Performs DDL Operations: If you use Data Management (DMS) to perform online DDL operations on the source database, you can specify whether to synchronize temporary tables generated by online DDL operations.
    - Yes: DTS synchronizes the data of temporary tables generated by online DDL operations.
      Note If online DDL operations generate a large amount of data, the data synchronization task may be delayed.
    - No: DTS does not synchronize the data of temporary tables generated by online DDL operations. Only the original DDL data of the source database is synchronized.
      Note If you select No, the tables in the destination database may be locked.
  - Retry Time for Failed Connections: By default, if DTS fails to connect to the source or destination database, DTS retries within the next 720 minutes (12 hours). You can specify the retry time based on your needs. If DTS reconnects to the source and destination databases within the specified time, DTS resumes the data synchronization task. Otherwise, the data synchronization task fails.
    Note When DTS retries a connection, you are charged for the DTS instance. We recommend that you specify the retry time based on your business needs. You can also release the DTS instance at your earliest opportunity after the source and destination instances are released.
- In the lower-right corner of the page, click Next.
- Specify a type for the tables that you want to synchronize to the destination database. Note After you select Initial Schema Synchronization, you must specify the type, primary key column, and partition key column for the tables that you want to synchronize to the destination AnalyticDB for MySQL cluster. For more information, see CREATE TABLE.
- In the lower-right corner of the page, click Precheck.
  Note
  - Before you can start the data synchronization task, DTS performs a precheck. You can start the data synchronization task only after the task passes the precheck.
  - If the task fails to pass the precheck, click the icon next to each failed item to view details.
    - After you troubleshoot the issues based on the causes, run a precheck again.
    - If you do not need to troubleshoot the issues, ignore failed items and run a precheck again.
- Close the Precheck dialog box after the following message is displayed: Precheck Passed. Then, the data synchronization task starts.
- Wait until initial synchronization is complete and the data synchronization task enters the Synchronizing state. You can view the status of the data synchronization task on the Synchronization Tasks page.
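The Precheck and Report Errors mode described in the procedure reports an error when a table to be synchronized has the same name as a table that already exists in the destination database. The following sketch illustrates that rule so that you can look for conflicts before you configure the task. It is not how DTS implements the precheck. The function name and the optional name mapping are hypothetical, and table names are compared exactly as given.

```python
def conflicting_tables(source_tables, destination_tables, name_mapping=None):
    """Return the sorted destination names that would collide.

    name_mapping optionally maps a source table name to the name it
    should have in the destination (object name mapping).
    """
    mapping = name_mapping or {}
    # Apply the mapping first, then look for names present on both sides.
    mapped_names = {mapping.get(name, name) for name in source_tables}
    return sorted(mapped_names & set(destination_tables))


if __name__ == "__main__":
    # Hypothetical table names for illustration only.
    source = ["orders", "users"]
    destination = ["users", "audit_log"]
    print(conflicting_tables(source, destination))
    print(conflicting_tables(source, destination,
                             {"users": "users_from_tidb"}))
```

An empty result corresponds to the case in which the precheck for identical table names is passed. A non-empty result lists the tables to rename by using the object name mapping feature, or to remove from the destination.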