Migrate incremental data from a self-managed TiDB database to an ApsaraDB RDS for MySQL instance

This topic describes how to migrate incremental data from a self-managed TiDB database to an ApsaraDB RDS for MySQL instance by using Data Transmission Service (DTS). Incremental data migration allows you to ensure service continuity when you migrate data to Alibaba Cloud. The example in this topic uses Pump, Drainer, and a Kafka cluster.

Prerequisites

Note

Before you migrate incremental data, you can migrate historical data from the self-managed TiDB database to the ApsaraDB RDS for MySQL instance. For more information, see Migrate full data from a self-managed TiDB database to an ApsaraDB RDS for MySQL instance.

Important

The destination ApsaraDB RDS for MySQL instance must reside in the China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Shenzhen), China (Zhangjiakou), China (Hong Kong), Singapore (Singapore), US (Silicon Valley), or US (Virginia) region.
The available storage space of the destination ApsaraDB RDS for MySQL instance must be larger than the total size of the data in the self-managed TiDB database.

Background information

Migrate incremental data from TiDB

The binary log format and implementation mechanism of a TiDB database are different from those of a MySQL database. To migrate incremental data and minimize modifications to the source TiDB database, you must deploy Pump, Drainer, and a Kafka cluster.

Pump records the binary log files that TiDB generates in real time and sends them to Drainer. Drainer writes the binary log files to the downstream Kafka cluster. During incremental data migration, DTS retrieves the data from the Kafka cluster and migrates it in real time to the destination database, which is an ApsaraDB RDS for MySQL instance in this topic.
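In TiDB Binlog, Drainer collects binary logs from every Pump instance and merges them into a single stream ordered by transaction commit timestamp before it writes them to Kafka. The following Python sketch models only that merge step. It assumes that each Pump yields records that are already sorted by commit timestamp; the `BinlogEvent` type and its fields are hypothetical and do not reflect the real TiDB Binlog data format.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BinlogEvent:
    """Hypothetical, simplified binlog record that keeps only what ordering needs."""
    commit_ts: int  # commit timestamp of the TiDB transaction
    pump_id: str    # the Pump instance that recorded the event
    payload: str    # stand-in for the row changes that a real binlog carries


def merge_pump_streams(streams: Iterable[Iterable[BinlogEvent]]) -> Iterator[BinlogEvent]:
    """Lazily merge per-Pump streams, each sorted by commit_ts, into one ordered stream."""
    return heapq.merge(*streams, key=lambda event: event.commit_ts)


if __name__ == "__main__":
    pump_a = [BinlogEvent(100, "pump-a", "INSERT"), BinlogEvent(130, "pump-a", "UPDATE")]
    pump_b = [BinlogEvent(110, "pump-b", "DELETE")]
    for event in merge_pump_streams([pump_a, pump_b]):
        print(event.commit_ts, event.pump_id, event.payload)
```

Because `heapq.merge` consumes its inputs lazily, the sketch also works with generators that never end, which is closer to how a long-running binlog stream behaves.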

Limits

DTS uses read and write resources of the source and destination databases during full data migration, which may increase the load on the database servers. If the databases have poor performance or low specifications, or if the data volume is large, database services may become unavailable. For example, DTS occupies a large amount of read and write resources when a large number of slow SQL queries are run on the source database, when the tables have no primary keys, or when a deadlock occurs in the destination database. Before you migrate data, evaluate the impact of data migration on the performance of the source and destination databases. We recommend that you migrate data during off-peak hours, for example, when the CPU utilization of the source and destination databases is less than 30%.
The tables to be migrated in the source database must have PRIMARY KEY or UNIQUE constraints, and the constrained fields must contain unique values. Otherwise, the destination database may contain duplicate data records.
DTS uses the ROUND(COLUMN,PRECISION) function to retrieve values from columns of the FLOAT or DOUBLE data type. If you do not specify a precision, DTS sets the precision for the FLOAT data type to 38 digits and the precision for the DOUBLE data type to 308 digits. You must check whether the precision settings meet your business requirements.
DTS automatically creates the destination database in the ApsaraDB RDS for MySQL instance. However, if the name of the source database does not comply with the naming conventions of ApsaraDB RDS for MySQL, you must manually create a database in the ApsaraDB RDS for MySQL instance before you configure the data migration task.
Note
For more information about the database naming conventions of ApsaraDB RDS for MySQL databases and how to create a database, see Manage databases.
If a data migration task fails, DTS automatically resumes the task. Before you switch your workloads to the destination instance, stop or release the data migration task. Otherwise, the data in the source database overwrites the data in the destination instance after the task is resumed.
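The PRIMARY KEY/UNIQUE requirement and the database naming requirement above can be checked before you configure the task. The following Python sketch shows one way to do this under two assumptions: the SQL query relies on `information_schema.TABLE_CONSTRAINTS`, which you should confirm that your TiDB version populates, and `RDS_DB_NAME_PATTERN` only approximates the ApsaraDB RDS for MySQL naming conventions. It is not the authoritative rule; see Manage databases for the current conventions.

```python
import re
from typing import Iterable, List, Tuple

# Lists base tables in one schema that have neither a PRIMARY KEY nor a UNIQUE
# constraint. Run it with a MySQL-compatible driver that uses the %s paramstyle
# (an assumption; adjust the placeholder for your driver).
TABLES_WITHOUT_PK_OR_UK_SQL = """
SELECT t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME = t.TABLE_NAME
 AND c.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.TABLE_SCHEMA = %s
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL
"""

# ASSUMPTION: approximates the RDS for MySQL rule as "starts with a lowercase
# letter, ends with a lowercase letter or digit, contains only lowercase
# letters, digits, underscores, and hyphens, and is at most 64 characters".
RDS_DB_NAME_PATTERN = re.compile(r"[a-z](?:[a-z0-9_-]{0,62}[a-z0-9])?")


def tables_without_pk_or_uk(tables: Iterable[str],
                            constraints: Iterable[Tuple[str, str]]) -> List[str]:
    """Offline variant of the query: constraints holds (table_name, constraint_type) pairs."""
    keyed = {name for name, ctype in constraints if ctype in ("PRIMARY KEY", "UNIQUE")}
    return sorted(table for table in tables if table not in keyed)


def needs_manual_database(source_db_name: str) -> bool:
    """True if the name fails the assumed rule, so the database should be created manually."""
    return RDS_DB_NAME_PATTERN.fullmatch(source_db_name) is None


if __name__ == "__main__":
    print(tables_without_pk_or_uk(["orders", "audit_log"], [("orders", "PRIMARY KEY")]))  # ['audit_log']
    print(needs_manual_database("Orders"))  # True
```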

Billing rules

Migration type	Task configuration fee	Internet traffic fee
Schema migration and full data migration	Free of charge.	Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Billing overview.
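Incremental data migration	Charged. For more information, see Billing overview.	Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Billing overview.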

Migration types

Migration type	Description
Schema migration	DTS migrates the schemas of required objects to the destination database. DTS supports schema migration for views, tables, and databases. Warning TiDB and MySQL are heterogeneous databases. DTS does not ensure that the schemas of the source and destination databases are consistent after schema migration. We recommend that you evaluate the impact of data type conversion on your business. For more information, see Data type mappings between heterogeneous databases.
Full data migration	DTS migrates the historical data of required objects to the destination database. Note During full data migration, concurrent INSERT operations cause fragmentation in the tables of the destination database. As a result, after full data migration is complete, the used tablespace of the destination database is larger than that of the source database.
Incremental data migration	DTS retrieves the binary log files that are generated in TiDB from the Kafka cluster and migrates incremental data to the destination database in real time. During incremental data migration, the following SQL operations can be synchronized: DML operations: INSERT, UPDATE, and DELETE; DDL operations: CREATE TABLE, DROP TABLE, ALTER TABLE, RENAME TABLE, TRUNCATE TABLE, CREATE VIEW, DROP VIEW, and ALTER VIEW. Incremental data migration allows you to ensure service continuity when you migrate data from a self-managed TiDB database to Alibaba Cloud.
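If you want to audit the statements that your applications run against the list of operations above, a keyword match is often enough for a first pass. The following Python sketch encodes the documented DML and DDL operations. It is a hypothetical helper that does not parse SQL, so it misses statements that are preceded by comments and variants such as `TRUNCATE t` without the `TABLE` keyword. Treat a `None` result as a prompt for manual review.

```python
from typing import Optional

# Operations that this topic lists for incremental data migration.
SUPPORTED_DML = ("INSERT", "UPDATE", "DELETE")
SUPPORTED_DDL = (
    "CREATE TABLE", "DROP TABLE", "ALTER TABLE", "RENAME TABLE",
    "TRUNCATE TABLE", "CREATE VIEW", "DROP VIEW", "ALTER VIEW",
)


def classify_statement(sql: str) -> Optional[str]:
    """Return "DML" or "DDL" if the statement starts with a listed operation, else None.

    Keyword matching only: leading comments, CREATE OR REPLACE VIEW, and
    TRUNCATE without the TABLE keyword are not recognized.
    """
    # Collapse runs of whitespace and upper-case so prefixes compare reliably.
    normalized = " ".join(sql.split()).upper()
    if any(normalized == op or normalized.startswith(op + " ") for op in SUPPORTED_DDL):
        return "DDL"
    first_word = normalized.split(" ", 1)[0]
    return "DML" if first_word in SUPPORTED_DML else None


if __name__ == "__main__":
    for stmt in ("insert into t values (1)", "CREATE INDEX idx ON t (c)"):
        print(classify_statement(stmt), "<-", stmt)
```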

Preparations

Note

The server on which the source database is deployed must be in the same internal network as the servers on which Pump, Drainer, and the Kafka cluster are deployed. This minimizes the impact of network latency on the incremental data migration task.

Deploy Pump and Drainer. For more information, see TiDB Binlog Cluster Deployment.
Modify the configuration file of Drainer and specify a Kafka cluster to receive data from Drainer. For more information, see Binlog Slave Client User Guide.
Deploy a Kafka cluster by using one of the following methods:
- Deploy a self-managed Kafka cluster. For more information, visit the Apache Kafka official website.
  Warning
  We recommend that you increase the values of the message.max.bytes and replica.fetch.max.bytes parameters of the Kafka broker and the fetch.message.max.bytes parameter of the Kafka consumer. This ensures that the Kafka cluster can receive large binary log files that are generated in TiDB. For more information, see Kafka 2.5 Documentation.
- Purchase and deploy a Message Queue for Apache Kafka instance. For more information, see Quick start of Message Queue for Apache Kafka.
  Note
  The Message Queue for Apache Kafka instance must be deployed in the same virtual private cloud (VPC) as the source database server. This ensures reliable data transmission and minimizes the impact of network latency on incremental data migration.
Create a topic in the self-managed Kafka cluster or the Message Queue for Apache Kafka instance.
Add the CIDR blocks of DTS servers to a whitelist of the TiDB database. For more information, see Add the CIDR blocks of DTS servers.
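The following Python sketch renders the two configuration fragments that the preparation steps mention: the Kafka section of the Drainer configuration file (drainer.toml) and the Kafka size limits from the warning above. The drainer.toml key names follow the TiDB Binlog configuration file and should be verified against your TiDB Binlog version. The 1 GiB limit, the broker address, the Kafka version, and the topic name in the usage example are placeholder values, not recommendations from this topic.

```python
from typing import Dict, List

# Placeholder limit (1 GiB). Size it to the largest binary log entry, which
# depends on your largest TiDB transaction.
DEFAULT_LIMIT_BYTES = 1 << 30


def kafka_size_overrides(limit_bytes: int = DEFAULT_LIMIT_BYTES) -> Dict[str, Dict[str, int]]:
    """The three Kafka parameters named in this topic, grouped by where they apply."""
    return {
        "broker": {"message.max.bytes": limit_bytes, "replica.fetch.max.bytes": limit_bytes},
        "consumer": {"fetch.message.max.bytes": limit_bytes},
    }


def size_problems(broker: Dict[str, int], consumer: Dict[str, int]) -> List[str]:
    """Flag fetch limits that are lower than the largest message the broker accepts."""
    max_message = broker.get("message.max.bytes", 0)
    problems = []
    if broker.get("replica.fetch.max.bytes", 0) < max_message:
        problems.append("replica.fetch.max.bytes < message.max.bytes")
    if consumer.get("fetch.message.max.bytes", 0) < max_message:
        problems.append("fetch.message.max.bytes < message.max.bytes")
    return problems


def to_properties(settings: Dict[str, int]) -> str:
    """Render settings as Java .properties lines, such as those in server.properties."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


def drainer_kafka_section(kafka_addrs: str, kafka_version: str, topic_name: str) -> str:
    """Render the Kafka output section of drainer.toml (verify key names for your version)."""
    return (
        "[syncer]\n"
        'db-type = "kafka"\n'
        "\n"
        "[syncer.to]\n"
        f'kafka-addrs = "{kafka_addrs}"\n'
        f'kafka-version = "{kafka_version}"\n'
        f'topic-name = "{topic_name}"\n'
    )


if __name__ == "__main__":
    overrides = kafka_size_overrides()
    print(to_properties(overrides["broker"]))
    print(size_problems(overrides["broker"], overrides["consumer"]))  # []
    print(drainer_kafka_section("192.168.0.10:9092", "2.5.0", "tidb_binlog"))
```

The check only compares the values against each other; whether a given limit is large enough still depends on the size of your largest TiDB transactions.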

Procedure

Log on to the DTS console.
Note
If you are redirected to the Data Management (DMS) console, click the icon that switches to the previous version of the DTS console.
In the left-side navigation pane, click Data Migration.
At the top of the Migration Tasks page, select the region where the destination instance resides.
In the upper-right corner of the page, click Create Migration Task.

Configure the source and destination databases.

Configure the task name and source database.

Configure the task name and source database

Parameter	Description
Task Name	DTS automatically generates a task name. We recommend that you specify a descriptive name that makes the task easy to identify. The task name does not need to be unique.
Instance Type	The access method of the source database. In this example, User-Created Database in ECS Instance is selected. Note If you select other instance types, you must deploy the network environment for the self-managed database. For more information, see Preparation overview.
Instance Region	The region of the Elastic Compute Service (ECS) instance on which the source TiDB database is deployed.
Database Type	Select TiDB.
Port Number	The service port number of the source TiDB database. Default value: 4000.
Database Account	The account of the source TiDB database. The account must have the SELECT permission on the objects to migrate and the SHOW VIEW permission.
Database Password	The password of the database account. Important After you specify the information about the source database, you can click Test Connectivity next to Database Password to check whether the information is valid. If the information is valid, the Passed message appears. If the Failed message appears, click Check next to Failed. Then, modify the information based on the check results.
Incremental migration or not	Specifies whether to perform incremental data migration. In this example, Yes is selected. For more information about how to perform only full data migration, see Migrate full data from a self-managed TiDB database to an ApsaraDB RDS for MySQL instance.
Kafka Cluster Type	The access method of the Kafka cluster. In this example, User-Created Database in ECS Instance is selected. If the Kafka cluster is accessed by using another method, you must deploy the network environment for the Kafka cluster. For more information, see Preparation overview. Note You cannot select Message Queue for Apache Kafka for the Kafka Cluster Type parameter. If you deploy a Message Queue for Apache Kafka instance, you must select User-Created Database Connected over Express Connect, VPN Gateway, or Smart Access Gateway, and then select the VPC to which the Message Queue for Apache Kafka instance belongs.
Instance Region	The value of this parameter is the same as the region of the source database and cannot be changed.
ECS Instance ID	The ID of the ECS instance that hosts the self-managed Kafka cluster.
Kafka Port Number	The service port number of the self-managed Kafka cluster. Default value: 9092.
Kafka Cluster Account	The username that is used to log on to the Kafka cluster. If no authentication is enabled for the Kafka cluster, you do not need to enter the username.
Kafka Cluster Password	The password that corresponds to the username. If no authentication is enabled for the Kafka cluster, you do not need to enter the password.
Topic	Click Get Topic List and select a topic name from the drop-down list.
Kafka version	The version of the self-managed Kafka cluster.
Kafka Cluster Encryption	Select Non-encrypted or SCRAM-SHA-256 based on your business and security requirements.
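The Database Account parameter requires the SELECT permission on the objects to migrate and the SHOW VIEW permission. If you manage permissions with SQL, the following Python sketch builds a GRANT statement that covers both permissions at the database level. It assumes that the account already exists and that your TiDB version supports the SHOW VIEW privilege; the account name `dts_reader` and the database name `shop` are hypothetical. Names are restricted to a conservative character set so that they can be embedded in the statement without escaping.

```python
import re

# Conservative character sets: anything else is rejected instead of escaped.
_NAME = re.compile(r"[A-Za-z0-9_]{1,64}")
_HOST = re.compile(r"[A-Za-z0-9_.%/-]{1,255}")


def source_grant_statement(database: str, user: str, host: str = "%") -> str:
    """Build a GRANT for SELECT and SHOW VIEW on every object in one database."""
    if not _NAME.fullmatch(database):
        raise ValueError(f"unsupported database name: {database!r}")
    if not _NAME.fullmatch(user):
        raise ValueError(f"unsupported user name: {user!r}")
    if not _HOST.fullmatch(host):
        raise ValueError(f"unsupported host pattern: {host!r}")
    return f"GRANT SELECT, SHOW VIEW ON `{database}`.* TO '{user}'@'{host}';"


if __name__ == "__main__":
    print(source_grant_statement("shop", "dts_reader"))
```

Where your setup allows it, a narrower host pattern than `%` limits which clients can use the account.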

Configure the destination database.

Configure the destination database

Parameter	Description
Instance Type	Select RDS Instance.
Instance Region	The region where the destination ApsaraDB RDS for MySQL instance resides.
Database Account	The database account of the destination ApsaraDB RDS for MySQL instance. The account must have read and write permissions on the destination database. For more information about how to create and authorize a database account, see Create an account and Modify account permissions.
Database Password	The password of the database account. Important After you specify the information about the destination database, you can click Test Connectivity next to Database Password to check whether the information is valid. If the information is valid, the Passed message appears. If the Failed message appears, click Check next to Failed. Then, modify the information based on the check results.
Encryption	Select Non-encrypted or SSL-encrypted based on your needs. If you select SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS instance before you configure the data migration task. For more information, see Use a cloud certificate to enable SSL encryption. Important The Encryption parameter is available only for regions in the Chinese mainland and the China (Hong Kong) region.
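Test Connectivity checks the connection from DTS. Before you fill in the form, you may also want to confirm from a host in the same network that the endpoints accept TCP connections. The following Python sketch does only that: it does not authenticate, and it does not prove that DTS servers can reach the endpoints. The addresses in the usage example are hypothetical; 4000 and 9092 are the default TiDB and Kafka ports listed in this topic.

```python
import socket
from typing import Dict, Tuple


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_endpoints(endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Map each label to whether its (host, port) accepted a TCP connection."""
    return {label: port_reachable(host, port, timeout) for label, (host, port) in endpoints.items()}


if __name__ == "__main__":
    print(check_endpoints({
        "source TiDB": ("192.168.0.5", 4000),  # hypothetical address, default TiDB port
        "Kafka": ("192.168.0.6", 9092),        # hypothetical address, default Kafka port
    }))
```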

In the lower-right corner of the page, click Set Whitelist and Next.
Warning
Adding the CIDR blocks of DTS servers to the whitelist of a database or instance, or to ECS security group rules, either automatically or manually, may introduce security risks. Before you use DTS to migrate data, you must understand and acknowledge these risks and take preventive measures, including but not limited to the following: use strong usernames and passwords, limit the exposed ports, authenticate API calls, regularly check the whitelist or ECS security group rules and remove unauthorized CIDR blocks, and connect the database to DTS by using Express Connect, VPN Gateway, or Smart Access Gateway.
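One of the measures above is to regularly check the whitelist and remove unauthorized CIDR blocks. The following Python sketch flags whitelist entries that are not covered by an approved list of networks. You maintain the approved list yourself, for example, the DTS CIDR blocks from Add the CIDR blocks of DTS servers plus your own application networks. The addresses in the usage example come from documentation-only ranges and are placeholders.

```python
import ipaddress
from typing import Iterable, List


def unauthorized_entries(whitelist: Iterable[str], approved: Iterable[str]) -> List[str]:
    """Return whitelist entries that are not contained in any approved network."""
    approved_nets = [ipaddress.ip_network(cidr, strict=False) for cidr in approved]
    flagged = []
    for entry in whitelist:
        # A bare IP address becomes a /32 (IPv4) or /128 (IPv6) network.
        net = ipaddress.ip_network(entry.strip(), strict=False)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if not any(net.version == allowed.version and net.subnet_of(allowed)
                   for allowed in approved_nets):
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    approved = ["192.0.2.0/24", "198.51.100.0/24"]  # placeholder approved CIDR blocks
    current = ["192.0.2.10", "198.51.100.0/25", "0.0.0.0/0"]
    print(unauthorized_entries(current, approved))  # ['0.0.0.0/0']
```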

Select the migration types and objects to migrate.

Select the migration types and objects to migrate

Setting	Description
Select the migration types	If you want to perform only full data migration, select Schema Migration and Full Data Migration. If you want to ensure service continuity during data migration, select Schema Migration, Full Data Migration, and Incremental Data Migration. In this example, all three migration types are selected.
Select the objects that you want to migrate	Select one or more objects from the Available section and click the icon to add the objects to the Selected section. Note You can select columns, tables, or databases as the objects to be migrated. If you select tables or columns as the objects to be migrated, DTS does not migrate other objects such as views, triggers, or stored procedures to the destination database. By default, after an object is migrated to the destination database, the name of the object remains unchanged. You can use the object name mapping feature to rename the objects that are migrated to the destination database. For more information, see Object name mapping. If you use the object name mapping feature to rename an object, other objects that are dependent on the object may fail to be migrated.
Specify whether to rename objects	You can use the object name mapping feature to rename the objects that are migrated to the ApsaraDB RDS instance. For more information, see Object name mapping.
Specify the retry time range for failed connections to the source or destination database	By default, if DTS fails to connect to the source or destination database, DTS retries the connection for up to 12 hours. You can specify the retry time range based on your business requirements. If DTS reconnects to the source and destination databases within the specified time range, DTS resumes the data migration task. Otherwise, the data migration task fails. Note You are charged for the DTS instance while DTS retries a connection. We recommend that you specify the retry time range based on your business needs. After the source and destination instances are released, release the DTS instance as soon as possible.
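The retry behavior described above can be pictured as a loop bounded by a deadline. The following Python sketch is a model for reasoning about the retry time range, not the DTS implementation: `connect` is a hypothetical callable that returns True on success, the 60-second interval is an arbitrary placeholder, and the clock and sleep functions are injectable so that the model can be exercised without waiting 12 hours.

```python
import time
from typing import Callable

TWELVE_HOURS = 12 * 60 * 60  # default retry time range described in this topic, in seconds


def retry_until_deadline(connect: Callable[[], bool],
                         window_seconds: float = TWELVE_HOURS,
                         interval_seconds: float = 60.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Model of a bounded retry: True if connect() succeeds before the window closes."""
    deadline = clock() + window_seconds
    while True:
        if connect():
            return True   # reconnected within the window: the task would resume
        if clock() + interval_seconds > deadline:
            return False  # the next attempt would fall outside the window: the task would fail
        sleep(interval_seconds)
```

In this model, a shorter `window_seconds` gives up sooner on long outages, which also shortens the period during which a failing task keeps the DTS instance, and its charges, running.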

In the lower-right corner of the page, click Precheck.
Note
- Before you can start the data migration task, DTS performs a precheck. You can start the data migration task only after the task passes the precheck.
- If the task fails to pass the precheck, you can click the icon next to each failed item to view details.
  You can troubleshoot the issues based on the causes and run a precheck again.
  If you do not need to troubleshoot the issues, you can ignore failed items and run a precheck again.
After the task passes the precheck, click Next.
In the Confirm Settings dialog box, specify the Channel Specification parameter and select Data Transmission Service (Pay-As-You-Go) Service Terms.
Click Buy and Start to start the data migration task.
- Schema migration and full data migration
  We recommend that you do not manually stop the task during full data migration. Otherwise, the data migrated to the destination database may be incomplete. You can wait until the data migration task automatically stops.
- Schema migration, full data migration, and incremental data migration
  An incremental data migration task does not automatically stop. You must manually stop the task.
  Important
  We recommend that you select an appropriate time to manually stop the data migration task. For example, you can stop the task during off-peak hours or before you switch your workloads to the destination instance.
  1. Wait until Incremental Data Migration and The migration task is not delayed appear in the progress bar of the migration task. Then, stop writing data to the source database for a few minutes. During this period, the progress bar may display the latency of incremental data migration.
  2. Wait until the status of incremental data migration changes to The migration task is not delayed again. Then, manually stop the migration task.
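The cutover steps above can be scripted if you can read the migration latency programmatically. The following Python sketch assumes three hypothetical callables that you supply: `pause_writes` stops writes to the source database, `get_latency_seconds` returns the current incremental data migration latency however you obtain it, and `stop_task` stops the data migration task. None of these names refer to an existing DTS SDK, and the polling interval and limit are placeholders.

```python
import time
from typing import Callable


def wait_until_caught_up(get_latency_seconds: Callable[[], float],
                         poll_seconds: float = 30.0,
                         max_polls: int = 120,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the reported latency is zero; give up after max_polls attempts."""
    for _ in range(max_polls):
        if get_latency_seconds() <= 0:
            return True
        sleep(poll_seconds)
    return False


def cut_over(pause_writes: Callable[[], None],
             get_latency_seconds: Callable[[], float],
             stop_task: Callable[[], None],
             **wait_options) -> bool:
    """Pause source writes, wait for the migration to catch up, then stop the task."""
    pause_writes()
    if not wait_until_caught_up(get_latency_seconds, **wait_options):
        return False  # still delayed: investigate before you stop the task or switch workloads
    stop_task()
    return True
```

If `cut_over` returns False, the sketch leaves the task running, because stopping it while it is still delayed would leave the destination database behind the source database.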