Use Data Transmission Service (DTS) to migrate data from a self-managed TiDB database to a PolarDB-X 2.0 instance. DTS supports schema migration, full data migration, and incremental data migration, so you can migrate with minimal downtime.
Migration process overview
The end-to-end migration has the following stages:
- Set up incremental data collection (required only for incremental data migration): deploy Apache Kafka and connect TiDB Binlog or TiCDC to stream changes to Kafka.
- Grant the required permissions: on both the source TiDB database and the destination PolarDB-X 2.0 instance.
- Create a DTS migration task: configure source, destination, and migration type, then run a precheck.
- Purchase and start the instance: select an instance class and start the migration.
Prerequisites
Before you begin, ensure that you have:
- A PolarDB-X 2.0 instance with more storage space than the source TiDB database. To create one, see Create PolarDB-X instances.
- (Required for incremental data migration) A Kafka cluster and TiDB Binlog or TiCDC deployed and configured. See Set up incremental data collection.
Permissions required
Grant the following permissions before creating the DTS task.
| Database | Required permissions | Reference |
|---|---|---|
| TiDB (source) | SHOW VIEW and SELECT on the objects to be migrated | Permission Management |
| PolarDB-X 2.0 (destination) | Read and write on the destination database | Manage database accounts |
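As a sketch, the source-side grant can be written out for review before applying it. The account name `dts_user` and schema name `migrate_db` below are placeholders, not values from this guide:

```shell
# Write the GRANT statement for the DTS account to a file for review.
# 'dts_user' and 'migrate_db' are hypothetical names -- substitute your own.
cat > grant_dts.sql <<'EOF'
-- Source TiDB: read-only access for DTS on the objects to migrate
GRANT SHOW VIEW, SELECT ON migrate_db.* TO 'dts_user'@'%';
EOF
# Apply with any MySQL-compatible client, for example:
#   mysql -h <tidb-host> -P 4000 -u root -p < grant_dts.sql
cat grant_dts.sql
```

Grant the destination account read and write permissions through the PolarDB-X console as described in Manage database accounts.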
Supported SQL operations for incremental migration
| Type | Operations |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | CREATE TABLE, DROP TABLE, ALTER TABLE, RENAME TABLE, TRUNCATE TABLE, CREATE VIEW, ALTER VIEW |
Billing
| Migration type | Instance configuration fee | Internet traffic fee |
|---|---|---|
| Schema migration and full data migration | Free of charge | Charged when Access Method is set to Public IP Address. See Billing overview. |
| Incremental data migration | Charged. See Billing overview. | — |
Limitations
Review the following limitations before starting the migration.
Source database
- The source database server must have enough outbound bandwidth. Insufficient bandwidth reduces migration speed.
- Each table to be migrated must have a PRIMARY KEY or UNIQUE constraint, and all of its fields must be unique. Otherwise, the destination database may contain duplicate records.
- If you select individual tables as migration objects and need to rename tables or columns in the destination database, a single task can migrate up to 1,000 tables. For more than 1,000 tables, configure multiple tasks or migrate the entire database.
- To migrate incremental data from the source TiDB database, you must deploy a Kafka cluster and the related TiDB components that collect the incremental data.
- TiDB does not store prefix index lengths in its metadata. Migrating tables with prefix indexes loses those lengths in the destination instance, which may cause the instance to fail. Manually fix prefix index lengths after migration.
- Do not run DDL operations on the source database during schema migration or full data migration. DDL changes during this phase cause the migration task to fail.
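For the prefix-index limitation above, the post-migration fix can be sketched as a pair of ALTER statements. The table `t`, index `idx_name`, and prefix length 10 are illustrative only:

```shell
# Recreate a prefix index whose length was lost during migration.
# Table, index, and prefix length below are examples -- use your own.
cat > fix_prefix_indexes.sql <<'EOF'
ALTER TABLE migrate_db.t DROP INDEX idx_name;
ALTER TABLE migrate_db.t ADD INDEX idx_name (name(10));
EOF
cat fix_prefix_indexes.sql
```

Run the statements on the destination PolarDB-X 2.0 instance after full data migration completes.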
Incremental data migration
- DTS reads data only from the partition with ID 0 in the Kafka topic.
- After the DTS instance is created, immediately run UPDATE or INSERT operations on test data in the source database to advance the DTS instance offset. Without such writes, high latency may cause the instance to fail.
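One way to generate such test writes is a small heartbeat script. The table `migrate_db.dts_heartbeat` is a made-up name for illustration; any writable table in the migrated schema works:

```shell
# Periodic INSERTs on any migrated table advance the DTS offset.
# 'migrate_db.dts_heartbeat' is a hypothetical table name.
cat > dts_heartbeat.sql <<'EOF'
CREATE TABLE IF NOT EXISTS migrate_db.dts_heartbeat (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  ts DATETIME
);
INSERT INTO migrate_db.dts_heartbeat (ts) VALUES (NOW());
EOF
cat dts_heartbeat.sql
```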
General
- Full data migration uses concurrent INSERT operations, which causes table fragmentation in the destination database. After full data migration, the storage used in the destination is larger than in the source.
- Full data migration consumes read and write resources on both databases. Run the migration during off-peak hours when CPU load is below 30%.
- Do not write data from sources other than DTS to the destination database while the DTS instance is running. Doing so can cause data inconsistency and may cause the instance to fail.
- For FLOAT and DOUBLE columns, DTS reads values using ROUND(COLUMN, PRECISION). The default precision is 38 for FLOAT and 308 for DOUBLE. Confirm that this precision meets your requirements.
- DTS attempts to resume failed instances for up to 7 days. Before switching traffic to the destination, end or release the DTS instance, or revoke write permissions for the DTS database account, to prevent a resumed instance from overwriting destination data.
- If a DTS task fails, DTS technical support will attempt to restore it within 8 hours. The task may be restarted, and task parameters (not database parameters) may be modified during restoration. For the list of parameters that may be modified, see Modify instance parameters.
Set up incremental data collection
Skip this section if you only need full data migration.
To migrate incremental data from TiDB, you must route changes through Apache Kafka. DTS then reads from Kafka. Choose one of two methods: TiDB Binlog or TiCDC.
Deploy the source database server, Pump, Drainer (for TiDB Binlog) or TiCDC (for TiCDC), and the Kafka cluster on the same internal network. This minimizes network latency during incremental data migration.
Step 1: Prepare a Kafka cluster
Both methods require a Kafka cluster. Set the following parameters to values large enough to accommodate the binary log data volume from TiDB. For reference values, see CONFIGURATION.
| Parameter | Where to set | Why |
|---|---|---|
| message.max.bytes | Kafka broker | Allows the broker to receive larger binary log payloads from TiDB |
| replica.fetch.max.bytes | Kafka broker | Allows replicas to fetch larger messages |
| fetch.message.max.bytes | Kafka consumer | Allows the consumer to fetch larger messages |
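For example, the broker-side settings might be raised like this. The 1 GB value is an arbitrary illustration, not a recommendation; size it against your binlog volume:

```shell
# Append larger message limits to the broker configuration.
# 1073741824 bytes (1 GB) is illustrative only.
cat >> server.properties <<'EOF'
message.max.bytes=1073741824
replica.fetch.max.bytes=1073741824
EOF
# fetch.message.max.bytes is set on the consuming client, not the broker.
cat server.properties
```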
Use one of the following options to create a Kafka cluster:
- Self-managed Apache Kafka cluster: deploy Kafka on your own infrastructure. See the Apache Kafka official website.
- ApsaraMQ for Kafka instance: create a managed Kafka instance. See Getting started overview. The ApsaraMQ for Kafka instance must be in the same virtual private cloud (VPC) as the source database server.
Step 2: Create a topic
Create a topic in the Kafka cluster.
The topic must contain exactly one partition. This ensures incremental data is replicated to partition ID 0, which is the only partition DTS reads from.
Step 3: Configure your change capture method
Choose Option A (TiDB Binlog) or Option B (TiCDC) based on your environment.
Use TiDB Binlog
- Deploy Pump and Drainer. See TiDB Binlog cluster deployment.
- Configure Drainer to forward data to your Kafka cluster. See Binlog Consumer Client User Guide.
- Verify that the TiDB database server can connect to the Kafka cluster.
- Add the CIDR blocks of DTS servers to the TiDB database whitelist. See Add the CIDR blocks of DTS servers.
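The Drainer-to-Kafka step above can be sketched as a minimal drainer.toml sink section. The address, version, and topic name are placeholders; check the Binlog Consumer Client User Guide for the full option list:

```shell
# Minimal Drainer sink that writes binlogs to Kafka (illustrative values).
cat > drainer.toml <<'EOF'
[syncer]
db-type = "kafka"

[syncer.to]
kafka-addrs = "<kafka-host>:9092"
kafka-version = "1.0.0"
topic-name = "tidb-incr"
EOF
cat drainer.toml
```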
Use TiCDC
- Install TiCDC. Use TiUP to add a new TiCDC node or scale out an existing TiCDC node in the TiDB cluster. See Deploy and Maintain TiCDC.
- Create a changefeed that replicates incremental data from TiDB to Kafka by running the tiup cdc cli changefeed create command. See Replicate Data to Kafka.
- Verify that the TiDB database server can connect to the Kafka cluster.
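A sketch of the changefeed command, using the Canal-JSON protocol that matches the Kafka Data Source Component option later in this guide. Host names, port, and topic are placeholders, and the flag spelling varies by TiCDC version (newer releases use --server, older ones --pd), so verify it against Replicate Data to Kafka:

```shell
# Create a changefeed that streams changes to Kafka in Canal-JSON format.
# All addresses and the topic name below are placeholders.
cat > create_changefeed.sh <<'EOF'
tiup cdc cli changefeed create \
  --server=http://<cdc-host>:8300 \
  --sink-uri="kafka://<kafka-host>:9092/tidb-incr?protocol=canal-json&partition-num=1"
EOF
cat create_changefeed.sh
```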
Create a migration task
Step 1: Go to the Data Migration page
Use one of the following consoles:
DTS console
- Log on to the DTS console.
- In the left-side navigation pane, click Data Migration.
- In the upper-left corner, select the region where the migration instance will reside.
DMS console
The actual operation may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console.
- Log on to the DMS console.
- In the top navigation bar, move the pointer over Data + AI > DTS (DTS) > Data Migration.
- From the drop-down list to the right of Data Migration Tasks, select the region.
Step 2: Configure source and destination databases
Click Create Task and configure the following parameters.
Source database (TiDB)
| Parameter | Description |
|---|---|
| Task Name | A descriptive name for the DTS task. DTS generates a name automatically. No uniqueness required. |
| Select Existing Connection | If the TiDB instance is registered with DTS, select it from the list. DTS pre-fills the following fields. Otherwise, configure them manually. |
| Database Type | Select TiDB. |
| Access Method | Select the connection type based on where the TiDB database is deployed. This example uses Self-managed Database on ECS. For other connection types, complete the relevant preparations. |
| Instance Region | The region of the ECS instance hosting the TiDB database. |
| ECS Instance ID | The ID of the ECS instance hosting the TiDB database. |
| Port Number | The service port of the TiDB database. Default: 4000. |
| Database Account | The database account for the TiDB database. |
| Database Password | The password for the database account. |
| Migrate Incremental Data | Select Yes to enable incremental data migration. You must then configure the Kafka cluster parameters below. |
Kafka cluster (required when Migrate Incremental Data is Yes)
| Parameter | Description |
|---|---|
| Kafka Cluster Type | The deployment location of the Kafka cluster. This example uses Self-managed Database on ECS. If you select Express Connect, VPN Gateway, or Smart Access Gateway, also select a VPC from Connected VPC and specify Domain Name or IP. |
| Kafka Data Source Component | Select Use the default binlog format of the TiDB database (TiDB Binlog) or Use the TiCDC Canal-JSON format (TiCDC), based on your setup. |
| ECS Instance ID | The ID of the ECS instance where the Kafka cluster is deployed. |
| Port Number | The service port of the Kafka cluster. |
| Kafka Cluster Account | The username for the Kafka cluster. Leave blank if authentication is not enabled. |
| Kafka Cluster Password | The password for the Kafka cluster. Leave blank if authentication is not enabled. |
| Kafka Version | The Kafka cluster version. Select 1.0 if the version is 1.0 or later. |
| Encryption | Select Non-encrypted or SCRAM-SHA-256 based on your security requirements. |
| Topic | The topic that receives incremental data. |
Destination database (PolarDB-X 2.0)
| Parameter | Description |
|---|---|
| Select Existing Connection | If the PolarDB-X 2.0 instance is registered with DTS, select it from the list. Otherwise, configure the fields below. |
| Database Type | Select PolarDB-X 2.0. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region of the destination PolarDB-X 2.0 instance. |
| Instance ID | The ID of the destination PolarDB-X 2.0 instance. |
| Database Account | The database account for the destination instance. |
| Database Password | The password for the database account. |
Step 3: Test connectivity
In the lower part of the page, click Test Connectivity and Proceed. In the CIDR Blocks of DTS Servers dialog box, click Test Connectivity.
Make sure DTS server CIDR blocks are added to the security settings of both source and destination databases. See Add the CIDR blocks of DTS servers.
Step 4: Configure migration objects
On the Configure Objects page, configure the following settings.
Migration types
| Goal | Selection |
|---|---|
| Full migration only | Schema Migration + Full Data Migration |
| Migration with minimal downtime | Schema Migration + Full Data Migration + Incremental Data Migration |
If you skip Schema Migration, create the target database and tables in the destination before starting. Enable object name mapping in Selected Objects.
If you skip Incremental Data Migration, avoid writing to the source database during migration to maintain data consistency.
Processing mode of conflicting tables
| Option | Behavior |
|---|---|
| Precheck and Report Errors | DTS checks for tables with identical names in source and destination. The precheck fails if conflicts exist, blocking the task. To resolve conflicts, use object name mapping to rename destination tables. |
| Ignore Errors and Proceed | DTS skips the conflict check. During full data migration, conflicting records are not overwritten; the destination record is retained. During incremental data migration, conflicting records overwrite the destination record. If schemas differ, only specific columns may be migrated or the task may fail. Use with caution. |
Capitalization of object names in destination instance: controls the capitalization of database names, table names, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names.
Source Objects: select objects to migrate at the database or table level, then click the arrow icon to move them to Selected Objects.
Selected Objects:
- To rename an object or specify the destination object, right-click it and use object name mapping.
- To filter rows using a WHERE clause, right-click the table and set a filter condition.
- To remove an object, click it and then click the remove icon.
Object name mapping may cause dependent objects to fail migration.
Step 5: Configure advanced settings
Click Next: Advanced Settings and configure the following.
| Parameter | Description |
|---|---|
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720. Set to greater than 30 minutes. If DTS reconnects within this period, the task resumes; otherwise, it fails. Note that DTS charges for the instance during retries. If multiple tasks share a source or destination database, the most recently set retry time applies. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10. Set to greater than 10 minutes. This value must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Migration | Throttle full data migration to reduce load on source and destination. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Migration is selected. |
| Enable Throttling for Incremental Data Migration | Throttle incremental data migration. Configure RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s). Available only when Incremental Data Migration is selected. |
| Environment Tag | An optional tag to identify the instance. |
| Configure ETL | Enable extract, transform, and load (ETL) to transform data during migration. Select Yes to enter data processing statements. See What is ETL? and Configure ETL in a data migration or data synchronization task. |
| Monitoring and Alerting | Configure alerts for task failures or latency exceeding a threshold. Select Yes to configure the alert threshold and notification settings. See Configure monitoring and alerting. |
Step 6: Run a precheck
Click Next: Save Task Settings and Precheck.
To preview API parameters, move the pointer over the button and click Preview OpenAPI parameters before clicking through.
DTS runs a precheck before starting the task. The task can only start after passing the precheck.
If the precheck fails, click View Details next to each failed item, resolve the issues, then click Precheck Again.
If an alert is triggered: for non-ignorable alerts, resolve the issue and rerun the precheck. For ignorable alerts, click Confirm Alert Details > Ignore > OK > Precheck Again. Ignoring alerts may cause data inconsistency.
Step 7: Purchase and start the instance
- Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
- On the Purchase Instance page, configure the following parameters:

  | Parameter | Description |
  |---|---|
  | Resource Group | The resource group for the instance. Default: default resource group. See What is Resource Management? |
  | Instance Class | The instance class, which determines migration speed. See Instance classes of data migration instances. |

- Read and accept Data Transmission Service (Pay-as-you-go) Service Terms.
- Click Buy and Start, then click OK in the confirmation dialog.
Monitor the migration task
After the task starts, view progress on the Data Migration page.
- Full migration only: the task stops automatically when complete. The status changes to Completed.
- With incremental migration: the task runs continuously and does not stop automatically. The status shows Running.
Before switching your business traffic to the destination, end or release the DTS instance, or revoke write permissions for the DTS database account. This prevents a resumed instance from overwriting destination data.
What's next
- After migration is complete, verify data integrity between source and destination databases.
- Switch your business connections to the PolarDB-X 2.0 instance.
- Release or end the DTS instance to avoid unnecessary charges.