Use Data Transmission Service (DTS) to stream MySQL change events into a DataHub project with near-zero downtime. This is the recommended approach for building real-time analytics pipelines or Change Data Capture (CDC) workflows on top of an ApsaraDB RDS for MySQL source.
Prerequisites
Before you begin, make sure you have:
- An ApsaraDB RDS for MySQL instance. See Create an ApsaraDB RDS for MySQL instance and Overview of data migration scenarios for supported versions.
- A DataHub project to receive the migrated data. See Get started with DataHub and Manage projects.
Migration types
DTS supports the following migration types for this scenario. Select the combination that matches your goal.
| Type | What it does | When to use |
|---|---|---|
| Schema migration | Copies table schemas from source to destination | Required as a first step when using incremental migration |
| Incremental data migration (recommended) | Continuously replicates changes (INSERT, UPDATE, DELETE) after schema migration | CDC pipelines, near-zero-downtime cutover, ongoing replication |
Full data migration is not available for the DataHub destination. Select schema migration, incremental data migration, or both.
Why use incremental data migration? It replicates changes in real time without interrupting the source application, making it the right choice for any workload that cannot tolerate downtime.
SQL operations supported for incremental migration
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN |
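As a concrete illustration of this scope, the statements below (on a hypothetical `orders` table) show what incremental migration does and does not replicate:

```sql
-- Replicated: DML on a migrated table.
INSERT INTO orders (id, amount) VALUES (1, 99.50);
UPDATE orders SET amount = 120.00 WHERE id = 1;
DELETE FROM orders WHERE id = 1;

-- Replicated: the only supported DDL form.
ALTER TABLE orders ADD COLUMN note VARCHAR(255);

-- Not replicated: any other DDL, for example dropping a column.
-- ALTER TABLE orders DROP COLUMN note;
```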
Required permissions
| Database | Required permission |
|---|---|
| Source ApsaraDB RDS for MySQL instance | Read access on objects to migrate |
Billing
| Migration type | Task configuration fee | Internet traffic fee |
|---|---|---|
| Schema migration | Free | Charged only when data leaves Alibaba Cloud over the Internet. See Billing overview. |
Limitations
DTS does not migrate foreign keys. Cascade and delete operations defined in the source database are not replicated to the destination.
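To see which constraints will be left behind, you can list the foreign keys on the source before you start. The query below is a sketch; replace `your_database` with the source schema name:

```sql
-- Foreign-key constraints are not migrated, so cascading updates and
-- deletes must be handled downstream if the destination needs them.
SELECT CONSTRAINT_NAME, TABLE_NAME, REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = 'your_database';  -- placeholder schema name
```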
Source database requirements
- The source server must have enough outbound bandwidth. Low bandwidth reduces migration speed.
- Tables must have a `PRIMARY KEY` or `UNIQUE` constraint, and all fields must be unique. Without this, the destination may contain duplicate records. The query after this list shows how to find tables that lack such a constraint.
- If you select tables as objects to migrate and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, split them across multiple tasks or migrate the entire database instead.
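To find tables that do not meet the primary key or unique constraint requirement before the task starts, a query along these lines can help (`your_database` is a placeholder):

```sql
-- List base tables that have neither a PRIMARY KEY nor a UNIQUE
-- constraint; migrating them may produce duplicate records.
SELECT t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS c
  ON  c.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND c.TABLE_NAME   = t.TABLE_NAME
  AND c.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.TABLE_SCHEMA = 'your_database'   -- placeholder schema name
  AND t.TABLE_TYPE   = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL;
```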
For incremental data migration, enable binary logging in the ApsaraDB RDS console and set the following parameters. If either value is incorrect, the precheck fails and the task cannot start.
| Parameter | Required value | Reference |
|---|---|---|
| `binlog_format` | row | Modify the parameters of an ApsaraDB RDS for MySQL instance |
| `binlog_row_image` | full | Same as above |
For self-managed MySQL deployed in a dual-primary cluster, set log_slave_updates to ON so DTS can capture all binary logs. See Create an account for a self-managed MySQL database and configure binary logging.
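You can verify these values on the source before running the precheck. A minimal check looks like this:

```sql
SHOW VARIABLES LIKE 'binlog_format';      -- expected: ROW
SHOW VARIABLES LIKE 'binlog_row_image';   -- expected: FULL
-- Self-managed dual-primary clusters only:
SHOW VARIABLES LIKE 'log_slave_updates';  -- expected: ON
```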
Binary log retention requirements (if DTS cannot read a required binary log, the task fails and data may be lost or inconsistent):
| Migration scope | Minimum retention period |
|---|---|
| Incremental data migration only | More than 24 hours |
| Schema migration + incremental data migration | At least 7 days (reducible to 24 hours after schema migration completes) |
See View and delete binary log files of an ApsaraDB RDS for MySQL instance.
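To confirm that the binary logs DTS needs are still on the server, you can inspect the source directly. The retention variables below apply to self-managed MySQL; on ApsaraDB RDS, retention is configured in the console as described in the reference above:

```sql
-- List the binary log files currently available on the source.
SHOW BINARY LOGS;

-- Retention settings on self-managed MySQL:
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';  -- MySQL 8.0
SHOW VARIABLES LIKE 'expire_logs_days';            -- MySQL 5.7 and earlier
```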
During schema migration: Do not run DDL operations that change database or table schemas. Doing so causes the task to fail.
Other limitations
- Only tables can be selected as objects to migrate.
- Do not use tools such as `pt-online-schema-change` for DDL operations during migration. Doing so may cause the task to fail.
- If only DTS writes to the destination, Data Management (DMS) can perform online DDL on source tables. See Perform lock-free DDL operations.
  Warning: If other tools also write to the destination, do not use DMS for online DDL. Data loss in the destination may occur.
- DTS uses `ROUND(COLUMN, PRECISION)` to read `FLOAT` and `DOUBLE` columns. Default precision: 38 digits for `FLOAT`, 308 digits for `DOUBLE`. Verify that these defaults meet your requirements before starting.
- DTS retries failed tasks for up to 7 days. Before switching your workload to the destination, stop or release the task, or run `REVOKE` to remove DTS write permissions (see the sketch after this list). Otherwise, source data may overwrite the destination after a retry.
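A hedged sketch of the `REVOKE` approach mentioned above, assuming DTS connects with an account named `dts_user` and the migrated objects live in `your_database` (both names are placeholders):

```sql
-- Remove write privileges from the DTS account so a delayed retry
-- cannot overwrite data after you cut over.
REVOKE INSERT, UPDATE, DELETE ON your_database.* FROM 'dts_user'@'%';
FLUSH PRIVILEGES;
```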
Special cases for self-managed MySQL sources
- A primary/secondary failover during migration causes the task to fail.
- If no DML operations occur on the source for an extended period, migration latency readings may be inaccurate. Run a DML operation on the source to refresh the latency value. If you select an entire database as the migration object, create a heartbeat table that receives data every second (a sketch follows this list).
- DTS periodically runs ``CREATE DATABASE IF NOT EXISTS `test` `` on the source to advance the binary log position.
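A minimal heartbeat-table sketch, with all names as placeholders: a single row is refreshed every second so the binary log always carries recent DML and the latency reading stays meaningful. This assumes the event scheduler is enabled on the source:

```sql
CREATE TABLE IF NOT EXISTS your_database.dts_heartbeat (
  id INT PRIMARY KEY,       -- single fixed row
  ts DATETIME(3) NOT NULL   -- last heartbeat time
);

-- Requires: SET GLOBAL event_scheduler = ON;
CREATE EVENT IF NOT EXISTS your_database.dts_heartbeat_tick
ON SCHEDULE EVERY 1 SECOND
DO
  REPLACE INTO your_database.dts_heartbeat (id, ts) VALUES (1, NOW(3));
```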
Create a data migration task
Step 1: Go to the Data Migration Tasks page
- Log on to the Data Management (DMS) console.
- In the top navigation bar, click DTS.
- In the left-side navigation pane, choose DTS (DTS) > Data Migration.
Console layout may vary. See Simple mode and Configure the DMS console based on your business requirements. Alternatively, go directly to the Data Migration Tasks page of the new DTS console.
Step 2: Select a region
From the drop-down list next to Data Migration Tasks, select the region where your data migration instance resides.
In the new DTS console, select the region in the upper-left corner.
Step 3: Configure source and destination databases
Click Create Task. Read the limits displayed at the top of the configuration page before proceeding.
Ignoring the displayed limits may cause task failure or data inconsistency.
Source database parameters:
| Parameter | Description |
|---|---|
| Task Name | DTS auto-assigns a name. Specify a descriptive name to identify the task easily. The name does not need to be unique. |
| Select an existing DMS database instance | Optional. If you select an existing instance, DTS applies its settings automatically. |
| Database Type | Select MySQL. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source ApsaraDB RDS for MySQL instance resides. |
| Cross-account | Select No for same-account migration. |
| RDS Instance ID | The ID of the source ApsaraDB RDS for MySQL instance. |
| Database Account | An account with read permissions on the objects to migrate. |
| Database Password | The password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. For SSL encryption, enable it on the RDS instance first. See Configure SSL encryption for an ApsaraDB RDS for MySQL instance. |
Destination database parameters:
| Parameter | Description |
|---|---|
| Select an existing DMS database instance | Optional. If you select an existing instance, DTS applies its settings automatically. |
| Database Type | Select DataHub. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the DataHub project resides. |
| Project | The destination DataHub project. |
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its CIDR blocks to the IP address whitelist of Alibaba Cloud database instances or the security group rules of Elastic Compute Service (ECS) instances. For self-managed databases hosted in data centers or on third-party clouds, manually add the DTS CIDR blocks to the database IP address whitelist. See Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Before proceeding, take appropriate measures: strengthen account and password security, restrict exposed ports, authenticate API calls, audit the IP address whitelist and ECS security group rules regularly, and consider connecting the database to DTS through Express Connect, VPN Gateway, or Smart Access Gateway (SAG).
Step 5: Select migration objects
Configure the following settings:
| Parameter | Description |
|---|---|
| Synchronization Type | Select Schema Migration and/or Incremental Data Migration. Full Data Migration is not available for DataHub. If you do not select incremental data migration, avoid writing to the source during migration to maintain data consistency. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks for tables with identical names in source and destination. The task cannot start if conflicts exist. Use object name mapping to resolve conflicts without deleting destination tables. Ignore Errors and Proceed: skips the conflict check. During incremental migration, conflicting records overwrite existing destination records. If schemas differ, the task may fail or migrate only partial columns. |
| Apply New Naming Rules of Additional Columns | DTS adds system columns to the destination DataHub topic. If these column names conflict with existing topic columns, the task fails. Select Yes or No based on your topic structure. Check for conflicts before setting this parameter. See Modify the naming rules for additional columns. |
| Source Objects | Select one or more objects to migrate and click the rightwards arrow to add them to the Selected Objects section. |
| Selected Objects | Right-click an object to rename it, specify WHERE filter conditions (see the example after this table), or select the DML/DDL operations to migrate. Click Batch Edit to rename multiple objects at once. Renaming an object may break dependent objects. |
If you select Ignore Errors and Proceed, data consistency is not guaranteed and your data may be at risk.
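For reference, a WHERE filter is entered as a bare boolean expression, not a full statement. A hypothetical example that migrates only recently modified rows of one table (`gmt_modified` is a placeholder column name):

```sql
gmt_modified >= '2024-01-01 00:00:00'
```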
Step 6: Configure advanced settings
Click Next: Advanced Settings and configure:
| Parameter | Description |
|---|---|
| Set Alerts | Select Yes to receive notifications when the task fails or migration latency exceeds the threshold. Set the threshold and alert contacts. See Configure monitoring and alerting. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Specify the retry time range for failed connections | The duration DTS retries failed connections after the task starts. Valid values: 10–1,440 minutes. Default: 720 minutes. Set this to more than 30 minutes. If DTS reconnects within this period, the task resumes. Otherwise, the task fails. If multiple tasks share the same source or destination, the shortest retry time range applies to all. DTS charges continue during retries. |
Step 7: Run the precheck
Click Next: Save Task Settings and Precheck. To preview the OpenAPI parameters before proceeding, hover over the button and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. The task can start only after passing all checks.
- If a check fails, click View Details next to the failed item, fix the issue, then click Precheck Again.
- If a check returns an alert that can be ignored, click Confirm Alert Details > Ignore > OK, then click Precheck Again.
Ignoring alert items may result in data inconsistency.
Step 8: Purchase the instance
Wait until the success rate reaches 100%, then click Next: Purchase Instance. On the Purchase Instance page, configure:
| Parameter | Description |
|---|---|
| Resource Group | The resource group for the migration instance. Default: default resource group. See What is Resource Management?. |
| Instance Class | Controls migration speed. Select based on your data volume and latency requirements. See Specifications of data migration instances. |
Step 9: Start the task
Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start.
The task appears in the task list. Monitor migration progress there.
What's next
After the migration completes and you have verified data consistency, switch your application workload to the destination DataHub project. Stop or release the data migration task before the switch to prevent DTS from overwriting destination data if the task retries.