Data Transmission Service (DTS) lets you synchronize data continuously from a PolarDB for MySQL cluster to an ApsaraDB RDS for MySQL instance or a self-managed MySQL database. This covers schema synchronization, full data synchronization, and incremental data synchronization.
Supported destination databases
The following destination database types are supported. This topic uses an ApsaraDB RDS for MySQL instance as the destination. The same procedure applies to other destination types listed below.
ApsaraDB RDS for MySQL instance
Self-managed database hosted on Elastic Compute Service (ECS)
Self-managed database connected over Express Connect, VPN Gateway, or Smart Access Gateway
Self-managed database connected over Database Gateway
Self-managed database connected over Cloud Enterprise Network (CEN)
Potential impacts
Before you start a synchronization task, be aware of the following impacts on your source and destination databases:
Full data synchronization reads all data from the source database once, which increases load on the source. Schedule full data synchronization during off-peak hours to reduce the impact on production workloads.
Concurrent INSERT operations during initial full data synchronization cause table fragmentation in the destination database. After the initial synchronization completes, the used tablespace in the destination is larger than in the source.
DTS adds the CIDR blocks of DTS servers to the whitelist of Alibaba Cloud database instances automatically. For self-managed databases, add the DTS CIDR blocks manually before starting the task.
Adding DTS CIDR blocks to your database whitelist or security group rules introduces security exposure. Take preventive measures, including but not limited to: strengthening credential security, restricting exposed ports, authenticating API calls, and regularly reviewing whitelist rules. Alternatively, connect the database to DTS over Express Connect, VPN Gateway, or Smart Access Gateway.
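When auditing your whitelist, you can check programmatically whether a connecting IP address falls inside the DTS CIDR blocks you added. A minimal sketch using Python's standard `ipaddress` module; the CIDR blocks below are hypothetical placeholders, not real DTS ranges (the actual blocks vary by region and are listed in the DTS documentation).

```python
import ipaddress

# Hypothetical placeholder CIDR blocks; look up the real DTS server
# CIDR blocks for your region in the DTS documentation.
DTS_CIDR_BLOCKS = ["10.151.12.0/24", "100.104.0.0/16"]

def is_allowed(client_ip: str, cidr_blocks=DTS_CIDR_BLOCKS) -> bool:
    """Return True if client_ip falls inside any whitelisted CIDR block."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(block) for block in cidr_blocks)

print(is_allowed("100.104.5.9"))   # inside 100.104.0.0/16
print(is_allowed("203.0.113.7"))   # outside every listed block
```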
Prerequisites
Before you begin, make sure that:
A PolarDB for MySQL cluster is created. For more information, see Purchase a subscription cluster and Purchase a pay-as-you-go cluster.
The destination ApsaraDB RDS for MySQL instance is created. For more information, see Create an ApsaraDB RDS for MySQL instance.
The available storage space of the destination instance is larger than the total data size of the source cluster.
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free of charge |
| Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ALTER TABLE, ALTER VIEW, CREATE FUNCTION, CREATE INDEX, CREATE PROCEDURE, CREATE TABLE, CREATE VIEW, DROP INDEX, DROP TABLE, RENAME TABLE, TRUNCATE TABLE |
The RENAME TABLE operation may cause data inconsistency. If you select a table as the synchronization object and rename it during synchronization, that table's data stops syncing to the destination. To avoid this, select the database (not individual tables) as the synchronization object and make sure both the source and destination databases of the RENAME TABLE operation are included in the selected objects.
Limitations
Source database requirements
Tables to be synchronized must have a PRIMARY KEY or a UNIQUE constraint, and the constrained columns must contain unique values. Otherwise, duplicate records may appear in the destination database. If tables have neither, enable the Exactly-Once write feature when configuring a two-way synchronization instance. For more information, see Synchronize tables without primary keys or UNIQUE constraints.
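The requirement above exists because replication may re-deliver the same change. A toy sketch (not DTS internals, just the general principle): applying a batch keyed by a primary key is idempotent, while an append-only apply duplicates rows on redelivery.

```python
def apply_with_key(dest: dict, rows):
    """Upsert rows keyed by primary key: re-applying a batch is idempotent."""
    for pk, value in rows:
        dest[pk] = value

def apply_without_key(dest: list, rows):
    """Append-only apply: re-delivered rows become duplicates."""
    dest.extend(rows)

batch = [(1, "a"), (2, "b")]

keyed = {}
apply_with_key(keyed, batch)
apply_with_key(keyed, batch)      # simulate a retry / redelivery
print(len(keyed))                 # 2: duplicates collapse onto the key

unkeyed = []
apply_without_key(unkeyed, batch)
apply_without_key(unkeyed, batch)
print(len(unkeyed))               # 4: the whole batch is duplicated
```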
If you select tables as synchronization objects and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, configure multiple tasks or select the entire database as the synchronization object.
For incremental data synchronization, binary logging must be enabled and the `loose_polar_log_bin` parameter must be set to `on`. For more information, see Enable binary logging and Modify parameters.
Do not execute DDL statements that change database or table schemas during schema synchronization or full data synchronization. Doing so causes the task to fail.
Enabling binary logging on a PolarDB for MySQL cluster incurs storage charges for binary log files. Set the binary log retention period based on your task type: at least 24 hours for incremental-only synchronization, or at least 7 days for full and incremental synchronization. After full data synchronization completes, you can adjust the retention period to more than 24 hours. Insufficient retention periods may cause DTS to fail to obtain binary logs, which can result in task failure. In exceptional circumstances, data inconsistency or loss may occur. Make sure that you configure the retention period of binary logs based on the preceding requirements. Otherwise, the service reliability or performance stated in the Service Level Agreement (SLA) of DTS may not be guaranteed.
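The retention rules above can be expressed as a small helper. This is a sketch of the documented minimums, not a DTS API; the `task_type` labels are illustrative names.

```python
def min_binlog_retention_hours(task_type: str) -> int:
    """Minimum binary log retention per the rules above.

    task_type: 'incremental' for incremental-only synchronization,
               'full_and_incremental' for full plus incremental synchronization.
    """
    if task_type == "incremental":
        return 24            # at least 24 hours
    if task_type == "full_and_incremental":
        return 7 * 24        # at least 7 days
    raise ValueError(f"unknown task type: {task_type}")

print(min_binlog_retention_hours("incremental"))           # 24
print(min_binlog_retention_hours("full_and_incremental"))  # 168
```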
Synchronization object limits
DTS does not synchronize read-only nodes of the source PolarDB for MySQL cluster.
DTS does not synchronize Object Storage Service (OSS) external tables from the source cluster.
If synchronization objects include data with four-byte characters (such as rare characters or emojis), the destination tables must use the utf8mb4 character set. If you use schema synchronization, set the `character_set_server` parameter in the destination database to utf8mb4.
Column names in MySQL databases are not case-sensitive. If multiple columns in the source database have names that differ only in capitalization, their data is written to the same column in the destination, which can produce unexpected results.
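Four-byte characters are exactly the code points outside the Basic Multilingual Plane, which MySQL's legacy three-byte utf8 (utf8mb3) cannot store. A quick way to check whether your data needs utf8mb4:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text contains characters outside the BMP, which take
    four bytes in UTF-8 and cannot be stored in MySQL's 3-byte utf8."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_utf8mb4("plain ascii"))            # False
print(needs_utf8mb4("emoji: \U0001F600"))      # True: U+1F600 is outside the BMP
print(len("\U0001F600".encode("utf-8")))       # 4 bytes in UTF-8
```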
DDL operation limits
Do not use pt-online-schema-change to perform DDL operations on source tables during synchronization. Doing so causes the task to fail.
If no external data sources write to the destination database during synchronization, use Data Management (DMS) to perform online DDL operations on source tables. For more information, see Perform lock-free DDL operations.
If external data sources write to the destination database during synchronization and you also run online DDL statements, data loss may occur in the destination.
If DDL statements fail to execute in the destination, the task continues running. Review failed DDL statements in the task logs. For more information, see View task logs.
Two-way synchronization rules
A two-way synchronization instance contains a forward task and a reverse task. When an object is included in both tasks:
Only one task can synchronize both full data and incremental data. The other task synchronizes incremental data only.
Data synced by the current task is not used as source data for the other task.
Other notes
After synchronization completes (the Status changes to Completed), run `ANALYZE TABLE <table name>` to verify data integrity. In cases such as a high-availability (HA) switchover in the source database, data may have been written only to memory, causing data loss.
If a DTS task fails, DTS technical support attempts to restore the task within 8 hours. During restoration, the task may be restarted and task parameters may be modified. Database parameters are not modified.
During schema synchronization, DTS synchronizes foreign keys from the source database to the destination. During full data synchronization and incremental data synchronization, DTS temporarily disables foreign key constraint checks and cascade operations at the session level. Performing cascade update or delete operations on the source during synchronization may cause data inconsistency.
DTS executes ``CREATE DATABASE IF NOT EXISTS `test` `` on the source database on a scheduled basis to advance the binary log file position.
Configure a data synchronization task
Step 1: Go to the Data Synchronization Tasks page
Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.
Operations may vary based on the DMS console mode and layout. For more information, see Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page of the new DTS console.
Step 2: Select the region
On the right side of Data Synchronization Tasks, select the region where the data synchronization instance resides.
In the new DTS console, select the region in the top navigation bar.
Step 3: Create a task
Click Create Task to open the task configuration page.
Optional: Click New Configuration Page in the upper-right corner.
Skip this step if Back to Previous Version is displayed in the upper-right corner. The new configuration page is recommended, as some parameters differ between versions.
Step 4: Configure source and destination databases
After configuring the source and destination databases, read the Limits displayed on the page before proceeding. Skipping this step may cause the task to fail or result in data inconsistency.
Configure the following parameters:
Source database
| Parameter | Description |
|---|---|
| Task Name | The name of the DTS task. DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique. |
| Select a DMS database instance. | Select an existing database instance to auto-populate parameters, or configure the database manually. To register a database, click Add DMS Database Instance in the DMS console, or register on the Database Connections page in the DTS console. For more information, see Register an Alibaba Cloud database instance, Register a database hosted on a third-party cloud service or a self-managed database, and Manage database connections. |
| Database Type | Select PolarDB for MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source PolarDB for MySQL cluster resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if using a database in the current Alibaba Cloud account. |
| PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster. |
| Database Account | The account for the source cluster. The account must have read permissions on the objects to be synchronized. |
| Database Password | The password for the database account. |
| Encryption | Whether to encrypt the connection. For more information about SSL encryption, see Configure SSL encryption. |
Destination database
| Parameter | Description |
|---|---|
| Select a DMS database instance. | Select an existing database instance to auto-populate parameters, or configure the database manually. To register a database, click Add DMS Database Instance in the DMS console, or register on the Database Connections page in the DTS console. For more information, see Register an Alibaba Cloud database instance, Register a database hosted on a third-party cloud service or a self-managed database, and Manage database connections. |
| Database Type | Select MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the destination ApsaraDB RDS for MySQL instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No if using a database in the current Alibaba Cloud account. |
| RDS Instance ID | The ID of the destination ApsaraDB RDS for MySQL instance. |
| Database Account | The account for the destination instance. The account must have read and write permissions on the destination database. |
| Database Password | The password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. To use SSL encryption, enable it on the ApsaraDB RDS for MySQL instance before configuring the DTS task. For more information, see Use a cloud certificate to enable SSL encryption. |
Step 5: Test connectivity and configure objects
Click Test Connectivity and Proceed at the bottom of the page.
DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances. For self-managed databases hosted on ECS, DTS also adds the CIDR blocks to the ECS security group rules. Make sure that the ECS instance can reach the database, and manually add the CIDR blocks to each ECS instance if the database is deployed across multiple ECS instances. For on-premises databases or databases hosted on third-party clouds, manually add the DTS CIDR blocks to the database whitelist. For more information, see the CIDR blocks of DTS servers section.
After connectivity passes, configure the objects to synchronize:
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. After the precheck, DTS synchronizes historical data from the source to the destination as the baseline for incremental synchronization. |
| Method to Migrate Triggers in Source Database | (Available only when Schema Synchronization is selected) The method to synchronize triggers. Configure based on your requirements. For more information, see Synchronize or migrate triggers from the source database. |
| Synchronization Topology | Select One-way Synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks whether the destination has tables with the same names as the source. If identical names exist, an error is returned and the task cannot start. Use object name mapping to rename conflicting tables if they cannot be deleted or renamed. For more information, see Map object names. Ignore Errors and Proceed: skips the name conflict check. Warning This option may cause data inconsistency. During full data synchronization, records with matching primary or unique keys in the destination are retained. During incremental synchronization, they are overwritten. If schemas differ, initialization may partially fail or the task may fail. |
| Capitalization of Object Names in Destination Instance | The case policy for database, table, and column names in the destination. Defaults to DTS default policy. For more information, see Specify the capitalization of object names in the destination instance. |
| Source Objects | Select one or more objects from the Source Objects section and click the icon to move them to the Selected Objects section. |
| Selected Objects | Right-click an object to rename it or set filter conditions. Click Batch Edit to rename multiple objects at once. For more information, see Map object names and Specify filter conditions. Renaming an object with object name mapping may cause dependent objects to fail synchronization. |
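The Ignore Errors and Proceed semantics described in the Processing Mode of Conflicting Tables row can be sketched as follows. This is an illustrative model of the retain-versus-overwrite behavior, not DTS internals.

```python
def full_sync_apply(dest: dict, row_id, row):
    """Full data synchronization: a destination row with the same
    primary key is retained; the source row is skipped."""
    dest.setdefault(row_id, row)

def incremental_apply(dest: dict, row_id, row):
    """Incremental synchronization: a destination row with the same
    primary key is overwritten by the source row."""
    dest[row_id] = row

dest = {1: "existing destination row"}
full_sync_apply(dest, 1, "source row")
print(dest[1])        # existing row retained during full sync

incremental_apply(dest, 1, "source row")
print(dest[1])        # source row overwrites during incremental sync
```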
Step 6: Configure advanced settings
Click Next: Advanced Settings and configure the following:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules tasks to the shared cluster. For improved task stability, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster. |
| Copy the temporary table of the Online DDL tool that is generated in the source table to the destination database. | Controls how DTS handles temporary tables from online DDL operations (DMS or gh-ost). Important pt-online-schema-change is not supported. Yes: synchronizes temporary tables from online DDL operations. This may extend the synchronization duration if online DDL generates large amounts of data. No, Adapt to DMS Online DDL: skips temporary tables; only syncs the original DDL from DMS. Tables in the destination may be locked. No, Adapt to gh-ost: skips temporary tables; only syncs original DDL from gh-ost. You can use default or custom regular expressions to filter shadow tables of the gh-ost tool and tables that are not required. Tables in the destination may be locked. |
| Retry Time for Failed Connections | How long DTS retries failed connections after the task starts. Valid values: 10–1440 minutes. Default: 720 minutes. Set to more than 30 minutes. If DTS reconnects within the retry window, the task resumes; otherwise, the task fails. When multiple tasks share the same source or destination database, the shortest retry time takes precedence. Note: DTS instance charges apply during retries. |
| Retry Time for Other Issues | How long DTS retries failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10 minutes. Set to more than 10 minutes. Must be less than the Retry Time for Failed Connections value. |
| Enable Throttling for Full Data Migration | (Available only when Full Data Synchronization is selected) Throttles full synchronization by configuring Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce load on the destination. |
| Enable Throttling for Incremental Data Synchronization | Throttles incremental synchronization by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | An optional tag to identify the DTS instance. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat SQL statements to the source database. In this case, latency may be displayed for the DTS instance. No: DTS writes heartbeat SQL statements to the source database, which may affect the physical backup and cloning of the source database. |
| Configure ETL | Whether to enable extract, transform, and load (ETL). Yes: enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. No: ETL is not enabled. For more information, see What is ETL? |
| Monitoring and Alerting | No: alerting is not enabled. Yes: configure alert thresholds and notification settings. Alerts are sent when the task fails or synchronization latency exceeds the threshold. For more information, see the Configure monitoring and alerting when you create a DTS task section. |
Step 7: Configure data verification (optional)
Click Next Step: Data Verification to set up data verification. For more information, see Configure a data verification task.
Step 8: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
The task must pass the precheck before it can start.
If the precheck fails, click View Details next to each failed item, resolve the issues, and rerun the precheck.
If an alert is triggered: for alerts that cannot be ignored, resolve the issue and rerun the precheck. For alerts that can be ignored, click Confirm Alert Details, then click Ignore in the dialog box, confirm, and click Precheck Again.
Step 9: Purchase a data synchronization instance
Wait until the Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters:
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: pay upfront; more cost-effective for long-term use. Pay-as-you-go: billed hourly; suitable for short-term use. Release the instance when it's no longer needed to stop charges. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. For more information, see What is Resource Management? | |
| Instance Class | DTS provides instance classes at different synchronization speeds. Select based on your requirements. For more information, see Instance classes of data synchronization instances. | |
| Subscription Duration | (Available only for the Subscription billing method) The subscription duration: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start. In the confirmation dialog box, click OK.
The task appears in the task list. You can monitor its progress there.