Use Data Transmission Service (DTS) to stream MySQL change events into a DataHub project with near-zero downtime. This is the recommended approach for building real-time analytics pipelines or Change Data Capture (CDC) workflows on top of an ApsaraDB RDS for MySQL source.
Prerequisites
Before you begin, make sure you have:
- An ApsaraDB RDS for MySQL instance. See Create an ApsaraDB RDS for MySQL instance and Overview of data migration scenarios for supported versions.
- A DataHub project to receive the migrated data. See Get started with DataHub and Manage projects.
Migration types
DTS supports the following migration types for this scenario. Select the combination that matches your goal.
| Type | What it does | When to use |
|---|---|---|
| Schema migration | Copies table schemas from source to destination | Required as a first step when using incremental migration |
| Incremental data migration (recommended) | Continuously replicates changes (INSERT, UPDATE, DELETE) after schema migration | CDC pipelines, near-zero-downtime cutover, ongoing replication |
Full data migration is not available for the DataHub destination. Select schema migration, incremental data migration, or both.
Why use incremental data migration? It replicates changes in real time without interrupting the source application, making it the right choice for any workload that cannot tolerate downtime.
SQL operations supported for incremental migration
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN |
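As a concrete illustration of this scope, the statements below (on a hypothetical `orders` table) show what incremental migration does and does not replicate:

```sql
-- Replicated: DML on a migrated table.
INSERT INTO orders (id, amount) VALUES (1, 99.50);
UPDATE orders SET amount = 120.00 WHERE id = 1;
DELETE FROM orders WHERE id = 1;

-- Replicated: the only supported DDL form.
ALTER TABLE orders ADD COLUMN note VARCHAR(255);

-- Not replicated: any other DDL, for example dropping a column.
-- ALTER TABLE orders DROP COLUMN note;
```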
Required permissions
| Database | Required permission |
|---|---|
| Source ApsaraDB RDS for MySQL instance | Read access on objects to migrate |
Billing
| Migration type | Task configuration fee | Internet traffic fee |
|---|---|---|
| Schema migration | Free | Charged only when data leaves Alibaba Cloud over the Internet. See Billing overview. |
Limitations
DTS does not migrate foreign keys. Cascade and delete operations defined in the source database are not replicated to the destination.
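To see which constraints will be left behind, you can list the foreign keys on the source before you start. The query below is a sketch; replace `your_database` with the source schema name:

```sql
-- Foreign-key constraints are not migrated, so cascading updates and
-- deletes must be handled downstream if the destination needs them.
SELECT CONSTRAINT_NAME, TABLE_NAME, REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = 'your_database';  -- placeholder schema name
```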
Source database requirements
- The source server must have enough outbound bandwidth. Low bandwidth reduces migration speed.
- Tables must have a `PRIMARY KEY` or `UNIQUE` constraint, and all fields must be unique. Without this, the destination may contain duplicate records. The query after this list shows how to find tables that lack such a constraint.
- If you select tables as objects to migrate and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, split them across multiple tasks or migrate the entire database instead.
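To find tables that do not meet the primary key or unique constraint requirement before the task starts, a query along these lines can help (`your_database` is a placeholder):

```sql
-- List base tables that have neither a PRIMARY KEY nor a UNIQUE
-- constraint; migrating them may produce duplicate records.
SELECT t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS c
  ON  c.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND c.TABLE_NAME   = t.TABLE_NAME
  AND c.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.TABLE_SCHEMA = 'your_database'   -- placeholder schema name
  AND t.TABLE_TYPE   = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL;
```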
For incremental data migration, enable binary logging in the ApsaraDB RDS console and set the following parameters. If either value is incorrect, the precheck fails and the task cannot start.
| Parameter | Required value | Reference |
|---|---|---|
| `binlog_format` | row | Modify the parameters of an ApsaraDB RDS for MySQL instance |
| `binlog_row_image` | full | Same as above |
For self-managed MySQL deployed in a dual-primary cluster, set log_slave_updates to ON so DTS can capture all binary logs. See Create an account for a self-managed MySQL database and configure binary logging.
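You can verify these values on the source before running the precheck. A minimal check looks like this:

```sql
SHOW VARIABLES LIKE 'binlog_format';      -- expected: ROW
SHOW VARIABLES LIKE 'binlog_row_image';   -- expected: FULL
-- Self-managed dual-primary clusters only:
SHOW VARIABLES LIKE 'log_slave_updates';  -- expected: ON
```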
Binary log retention requirements (if DTS cannot read a required binary log, the task fails and data may be lost or inconsistent):
| Migration scope | Minimum retention period |
|---|---|
| Incremental data migration only | More than 24 hours |
| Schema migration + incremental data migration | At least 7 days (reducible to 24 hours after schema migration completes) |
See View and delete binary log files of an ApsaraDB RDS for MySQL instance.
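To confirm that the binary logs DTS needs are still on the server, you can inspect the source directly. The retention variables below apply to self-managed MySQL; on ApsaraDB RDS, retention is configured in the console as described in the reference above:

```sql
-- List the binary log files currently available on the source.
SHOW BINARY LOGS;

-- Retention settings on self-managed MySQL:
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';  -- MySQL 8.0
SHOW VARIABLES LIKE 'expire_logs_days';            -- MySQL 5.7 and earlier
```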
During schema migration: Do not run DDL operations that change database or table schemas. Doing so causes the task to fail.
Other limitations
- Only tables can be selected as objects to migrate.
- Do not use tools such as `pt-online-schema-change` for DDL operations during migration. Doing so may cause the task to fail.
- If only DTS writes to the destination, Data Management (DMS) can perform online DDL on source tables. See Perform lock-free DDL operations.
  Warning: If other tools also write to the destination, do not use DMS for online DDL. Data loss in the destination may occur.
- DTS uses `ROUND(COLUMN, PRECISION)` to read `FLOAT` and `DOUBLE` columns. Default precision: 38 digits for `FLOAT`, 308 digits for `DOUBLE`. Verify that these defaults meet your requirements before starting.
- DTS retries failed tasks for up to 7 days. Before switching your workload to the destination, stop or release the task, or run `REVOKE` to remove DTS write permissions (see the sketch after this list). Otherwise, source data may overwrite the destination after a retry.
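A hedged sketch of the `REVOKE` approach mentioned above, assuming DTS connects with an account named `dts_user` and the migrated objects live in `your_database` (both names are placeholders):

```sql
-- Remove write privileges from the DTS account so a delayed retry
-- cannot overwrite data after you cut over.
REVOKE INSERT, UPDATE, DELETE ON your_database.* FROM 'dts_user'@'%';
FLUSH PRIVILEGES;
```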
Special cases for self-managed MySQL sources
- A primary/secondary failover during migration causes the task to fail.
- If no DML operations occur on the source for an extended period, migration latency readings may be inaccurate. Run a DML operation on the source to refresh the latency value. If you select an entire database as the migration object, create a heartbeat table that receives data every second (a sketch follows this list).
- DTS periodically runs ``CREATE DATABASE IF NOT EXISTS `test` `` on the source to advance the binary log position.
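A minimal heartbeat-table sketch, with all names as placeholders: a single row is refreshed every second so the binary log always carries recent DML and the latency reading stays meaningful. This assumes the event scheduler is enabled on the source:

```sql
CREATE TABLE IF NOT EXISTS your_database.dts_heartbeat (
  id INT PRIMARY KEY,       -- single fixed row
  ts DATETIME(3) NOT NULL   -- last heartbeat time
);

-- Requires: SET GLOBAL event_scheduler = ON;
CREATE EVENT IF NOT EXISTS your_database.dts_heartbeat_tick
ON SCHEDULE EVERY 1 SECOND
DO
  REPLACE INTO your_database.dts_heartbeat (id, ts) VALUES (1, NOW(3));
```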
Create a data migration task
Step 1: Go to the Data Migration Tasks page
- Log on to the Data Management (DMS) console.
- In the top navigation bar, click DTS.
- In the left-side navigation pane, choose DTS (DTS) > Data Migration.
Console layout may vary. See Simple mode and Configure the DMS console based on your business requirements. Alternatively, go directly to the Data Migration Tasks page of the new DTS console.
Step 2: Select a region
From the drop-down list next to Data Migration Tasks, select the region where your data migration instance resides.
In the new DTS console, select the region in the upper-left corner.
Step 3: Configure source and destination databases
Click Create Task. Read the limits displayed at the top of the configuration page before proceeding.
Ignoring the displayed limits may cause task failure or data inconsistency.
Source database parameters:
| Parameter | Description |
|---|---|
| Task Name | DTS auto-assigns a name. Specify a descriptive name to identify the task easily. The name does not need to be unique. |
| Select an existing DMS database instance | Optional. If you select an existing instance, DTS applies its settings automatically. |
| Database Type | Select MySQL. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source ApsaraDB RDS for MySQL instance resides. |
| Cross-account | Select No for same-account migration. |
| RDS Instance ID | The ID of the source ApsaraDB RDS for MySQL instance. |
| Database Account | An account with read permissions on the objects to migrate. |
| Database Password | The password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. For SSL encryption, enable it on the RDS instance first. See Configure SSL encryption for an ApsaraDB RDS for MySQL instance. |
Destination database parameters:
| Parameter | Description |
|---|---|
| Select an existing DMS database instance | Optional. If you select an existing instance, DTS applies its settings automatically. |
| Database Type | Select DataHub. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the DataHub project resides. |
| Project | The destination DataHub project. |
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its CIDR blocks to the IP address whitelist of Alibaba Cloud database instances or the security group rules of Elastic Compute Service (ECS) instances. For self-managed databases hosted in data centers or on third-party clouds, manually add the DTS CIDR blocks to the database IP address whitelist. See Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Before proceeding, take appropriate measures: strengthen account and password security, restrict exposed ports, authenticate API calls, audit the IP address whitelist and ECS security group rules regularly, and consider connecting the database to DTS through Express Connect, VPN Gateway, or Smart Access Gateway (SAG).
Step 5: Select migration objects
Configure the following settings:
| Parameter | Description |
|---|---|
| Synchronization Type | Select Schema Migration and/or Incremental Data Migration. Full Data Migration is not available for DataHub. If you do not select incremental data migration, avoid writing to the source during migration to maintain data consistency. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks for tables with identical names in source and destination. The task cannot start if conflicts exist. Use object name mapping to resolve conflicts without deleting destination tables. Ignore Errors and Proceed: skips the conflict check. During incremental migration, conflicting records overwrite existing destination records. If schemas differ, the task may fail or migrate only partial columns. |
| Apply New Naming Rules of Additional Columns | DTS adds system columns to the destination DataHub topic. If these column names conflict with existing topic columns, the task fails. Select Yes or No based on your topic structure. Check for conflicts before setting this parameter. See Modify the naming rules for additional columns. |
| Source Objects | Select one or more objects to migrate and click the rightwards arrow to add them to the Selected Objects section. |
| Selected Objects | Right-click an object to rename it, specify WHERE filter conditions (see the example after this table), or select the DML/DDL operations to migrate. Click Batch Edit to rename multiple objects at once. Renaming an object may break dependent objects. |
If you select Ignore Errors and Proceed, data consistency is not guaranteed and your data may be at risk.
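For reference, a WHERE filter is entered as a bare boolean expression, not a full statement. A hypothetical example that migrates only recently modified rows of one table (`gmt_modified` is a placeholder column name):

```sql
gmt_modified >= '2024-01-01 00:00:00'
```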
Step 6: Configure advanced settings
Click Next: Advanced Settings and configure:
| Parameter | Description |
|---|---|
| Set Alerts | Select Yes to receive notifications when the task fails or migration latency exceeds the threshold. Set the threshold and alert contacts. See Configure monitoring and alerting. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Specify the retry time range for failed connections | The duration DTS retries failed connections after the task starts. Valid values: 10–1,440 minutes. Default: 720 minutes. Set this to more than 30 minutes. If DTS reconnects within this period, the task resumes. Otherwise, the task fails. If multiple tasks share the same source or destination, the shortest retry time range applies to all. DTS charges continue during retries. |
Step 7: Run the precheck
Click Next: Save Task Settings and Precheck. To preview the OpenAPI parameters before proceeding, hover over the button and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. The task can start only after passing all checks.
- If a check fails, click View Details next to the failed item, fix the issue, then click Precheck Again.
- If a check returns an alert that can be ignored, click Confirm Alert Details > Ignore > OK, then click Precheck Again.
Ignoring alert items may result in data inconsistency.
Step 8: Purchase the instance
Wait until the success rate reaches 100%, then click Next: Purchase Instance. On the Purchase Instance page, configure:
| Parameter | Description |
|---|---|
| Resource Group | The resource group for the migration instance. Default: default resource group. See What is Resource Management?. |
| Instance Class | Controls migration speed. Select based on your data volume and latency requirements. See Specifications of data migration instances. |
Step 9: Start the task
Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start.
The task appears in the task list. Monitor migration progress there.
What's next
After the migration completes and you have verified data consistency, switch your application workload to the destination DataHub project. Stop or release the data migration task before the switch to prevent DTS from overwriting destination data if the task retries.