Use Data Transmission Service (DTS) to replicate data from a PolarDB for MySQL cluster into an AnalyticDB for PostgreSQL instance for analytics workloads. DTS handles schema creation, full data load, and ongoing incremental replication in a single task.
Prerequisites
Before you begin, ensure that you have:
A destination AnalyticDB for PostgreSQL instance with available storage space larger than the storage space used by the source PolarDB for MySQL cluster. See Create an instance.
A database created in the destination AnalyticDB for PostgreSQL instance to store the synchronized data. See CREATE DATABASE.
Binary logging enabled on the source PolarDB for MySQL cluster, with `loose_polar_log_bin` set to `ON`. See Enable binary logging and Set cluster and node parameters.
Binary log retention set to at least 3 days on the source cluster (7 days recommended). See Modify the retention period.
Database accounts with the required permissions. See Permissions required for database accounts.
Enabling binary logging on a PolarDB for MySQL cluster consumes storage space and incurs fees.
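As a quick sanity check, you can verify these prerequisites from a SQL client. This is a hedged sketch: `dts_dest_db` is an example database name, not one required by DTS.

```sql
-- On the source PolarDB for MySQL cluster: confirm that binary logging
-- is active (loose_polar_log_bin takes effect as log_bin).
SHOW VARIABLES LIKE 'log_bin';   -- should report ON

-- On the destination AnalyticDB for PostgreSQL instance: create the
-- database that will receive the synchronized data.
CREATE DATABASE dts_dest_db;
```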
Billing
| Synchronization type | Pricing |
|---|---|
| Schema synchronization and full data synchronization | Free of charge |
| Incremental data synchronization | Charged. See Billing overview. |
Permissions required for database accounts
| Database | Required permissions | How to create and authorize |
|---|---|---|
| Source PolarDB for MySQL cluster | Read and write permissions for the objects to sync | Create and manage database accounts and manage database account passwords |
| Destination AnalyticDB for PostgreSQL instance | Read and write permissions for the destination database | Create and manage users and manage user permissions. You can also use the initial account or an account with RDS_SUPERUSER permissions. |
Supported SQL operations for incremental synchronization
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE. DTS converts UPDATE statements to REPLACE INTO when writing to the destination. For UPDATE operations on primary keys, DTS converts them to DELETE and INSERT statements. |
| DDL | CREATE TABLE, ALTER TABLE, TRUNCATE TABLE, DROP TABLE |
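To illustrate the DML conversions described above, consider a hypothetical source table `orders` with primary key `id`. This is a sketch of the rewrite semantics, not the literal statements DTS emits.

```sql
-- Source statement: an UPDATE that does not touch the primary key.
UPDATE orders SET status = 'shipped' WHERE id = 42;
-- DTS applies it to the destination as a REPLACE-style upsert:
--   REPLACE INTO orders (id, status) VALUES (42, 'shipped');

-- Source statement: an UPDATE that changes the primary key.
UPDATE orders SET id = 43 WHERE id = 42;
-- DTS splits it into a DELETE followed by an INSERT:
--   DELETE FROM orders WHERE id = 42;
--   INSERT INTO orders (id, status) VALUES (43, 'shipped');
```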
Limitations
Source database
Tables to synchronize must have a primary key or a UNIQUE constraint with unique fields. Without this, duplicate data may appear in the destination database.
For table-level synchronization where you need to edit the tables (such as mapping column names), a single task supports a maximum of 1,000 tables. If you exceed this limit, split the tables across multiple tasks or configure a task to synchronize the entire database.
Binary log retention must be at least 3 days. If DTS cannot obtain the binary logs, the task may fail—in extreme cases causing data inconsistency or data loss. Issues from a retention period shorter than required are not covered by the DTS Service-Level Agreement (SLA).
Do not run DDL operations that modify primary keys or add table comments (for example, `ALTER TABLE table_name COMMENT='Table comment';`) during synchronization. These operations will fail.
Do not run DDL operations that change database or table schemas during schema synchronization or full synchronization. The task will fail. During full synchronization, DTS queries the source database, creating metadata locks that can block DDL operations on the source.
If the source database contains date values of `0000-00-00 00:00:00`, the task may fail. DTS converts these values to `null` in the destination. As a workaround, change the source values to `0001-01-01 00:00:00` or configure the destination field to allow null values.
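The two checks above can be run against the source before you configure the task. This is an illustrative sketch: `mydb`, `orders`, and `created_at` are example names, and depending on `sql_mode` you may need to relax `NO_ZERO_DATE` before comparing zero dates.

```sql
-- Find tables in database `mydb` that have neither a primary key nor a
-- unique constraint; such tables can produce duplicates in the destination.
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name  = t.table_name
 AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_schema = 'mydb'
  AND t.table_type   = 'BASE TABLE'
  AND c.constraint_name IS NULL;

-- Rewrite zero dates before starting the task (example table and column).
UPDATE mydb.orders
SET created_at = '0001-01-01 00:00:00'
WHERE created_at = '0000-00-00 00:00:00';
```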
Other limitations
Only table-level synchronization is supported.
The following data types are not supported: BIT, VARBIT, GEOMETRY, ARRAY, UUID, TSQUERY, TSVECTOR, TXID_SNAPSHOT.
Prefix indexes are not supported. If the source has prefix indexes, synchronization may fail.
Synchronization from a read-only node of the source PolarDB for MySQL cluster is not supported.
OSS foreign tables cannot be synchronized from the source.
The following object types are not supported: PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, FK, INDEX.
Append-Only (AO) tables are not supported as destination tables.
Primary/secondary failover is not supported during initial full data synchronization. If a failover occurs, reconfigure the task.
Online DDL tools: pt-online-schema-change is not supported and will cause the task to fail. For online DDL in temporary table mode or function-based indexes on unique key columns, data loss or task failure may occur.
If you use column mapping for non-full table synchronization, or if the source and destination table schemas are inconsistent, data in columns missing from the destination table will be lost.
If the table to synchronize has a primary key, the primary key column in the destination table must match the source table. If the table lacks a primary key, the primary key column in the destination table must match the distribution key.
The unique key in the destination table—including the primary key column—must include all columns in the distribution key.
Initial full data synchronization runs concurrent INSERT operations, which causes fragmentation. As a result, the tablespace of the destination instance may be larger than that of the source after full synchronization completes.
Schedule synchronization during off-peak hours. Full data synchronization consumes read and write resources on both databases and may increase the database load.
For table-level synchronization, do not write data other than the data synchronized by DTS to the destination database. Otherwise, the source and destination data may become inconsistent.
For table-level synchronization, do not use pt-online-schema-change for online DDL operations on synchronization objects in the source database.
For table-level synchronization, if no data other than data from DTS is written to the destination database, you can use Data Management (DMS) for online DDL operations. See Change schemas without locking tables.
If a task fails, DTS support staff will attempt to restore it within 8 hours. They may restart the task or adjust DTS task parameters (not database parameters). Parameters that may be adjusted are listed in Modify instance parameters.
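The primary key and distribution key rules above can be sketched with a hypothetical destination table in AnalyticDB for PostgreSQL: the distribution key column (`customer_id`) is part of the primary key, so the unique key includes all distribution key columns.

```sql
-- Hypothetical destination table; names are examples only.
CREATE TABLE orders (
    order_id    bigint,
    customer_id bigint,
    status      text,
    PRIMARY KEY (order_id, customer_id)
)
DISTRIBUTED BY (customer_id);
```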
Conflict resolution behavior
| Phase | When primary or unique key conflicts | Result |
|---|---|---|
| Full data synchronization | Schemas consistent: record exists in destination with same primary/unique key | DTS retains the destination record and skips the source record |
| Incremental synchronization | Schemas consistent: record exists in destination with same primary/unique key | DTS overwrites the destination record with the source record |
| Either phase | Schemas inconsistent | Data initialization may fail, resulting in partial or complete synchronization failure |
Usage notes
DTS periodically runs ``CREATE DATABASE IF NOT EXISTS `test` `` on the source database to advance the binary log offset.
When using DMS or gh-ost for online DDL changes, configure the Copy the temporary table of the Online DDL tool advanced setting accordingly.
Create a synchronization task
The task configuration has eight steps: open the task page, connect source and destination databases, select synchronization objects, configure advanced settings, optionally set up data verification, optionally configure destination table fields, run the precheck, and purchase the instance.
Step 1: Open the task configuration page
Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.
The navigation path may vary based on the DMS console mode and layout. See Simple mode console and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page of the DTS console.
Click Create Task.
Step 2: Configure source and destination databases
On the task configuration page, fill in the following fields:
| Category | Field | Description |
|---|---|---|
| N/A | Task Name | DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance from the drop-down list to auto-populate the fields below, or configure the fields manually. In the DMS console, this field is labeled Select a DMS database instance. |
| | Database Type | Select PolarDB for MySQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the source PolarDB for MySQL cluster resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No if the source instance belongs to the current Alibaba Cloud account. |
| | PolarDB Cluster ID | Select the ID of the source PolarDB for MySQL cluster. |
| | Database Account | Enter the database account for the source cluster. See Permissions required for database accounts. |
| | Database Password | Enter the password for the database account. |
| Destination Database | Select Existing Connection | Select a registered database instance from the drop-down list to auto-populate the fields below, or configure the fields manually. In the DMS console, this field is labeled Select a DMS database instance. |
| | Database Type | Select AnalyticDB for PostgreSQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the destination AnalyticDB for PostgreSQL instance resides. |
| | Instance ID | Select the ID of the destination AnalyticDB for PostgreSQL instance. |
| | Database Name | Enter the name of the destination database to receive the synchronized data. |
| | Database Account | Enter the database account for the destination instance. See Permissions required for database accounts. |
| | Database Password | Enter the password for the database account. |
After filling in all fields, click Test Connectivity and Proceed.
Make sure that the DTS IP address blocks have been added to the security settings of both the source and destination databases—either automatically by DTS or manually. See Add the IP address whitelist of DTS servers.
Step 3: Configure synchronization objects
On the Configure Objects page, configure the following settings:
Synchronization types
DTS always selects Incremental Data Synchronization. Also select Schema Synchronization and Full Data Synchronization (selected by default). After the precheck, DTS initializes the destination instance with the full data of the selected source objects, which serves as the baseline for subsequent incremental synchronization. With all three types enabled, DTS:
Copies the schema to the destination.
Loads a full snapshot of the selected source objects as the baseline.
Applies ongoing incremental changes from the binary log.
Objects and operations
| Field | Description |
|---|---|
| DDL and DML Operations to Be Synchronized | Select the DDL or DML operations to synchronize at the instance level. To select operations for a specific database or table, right-click the object in the Selected Objects box. See Supported SQL operations for incremental synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: Checks for tables with the same names in the destination. If any are found, the precheck reports an error and the task does not start. If you cannot delete or rename the conflicting table, use object name mapping to rename it in the destination. Ignore Errors and Proceed: Skips the duplicate-table check. Use with caution—this may cause data inconsistency. See Conflict resolution behavior for details. |
| Storage Engine Type | Select the storage engine for destination tables. The default is Beam. This field is available only if the destination AnalyticDB for PostgreSQL kernel version is v7.0.6.6 or later and Schema Synchronization is selected. |
| Capitalization of Object Names in Destination Instance | Configure the case policy for database, table, and column names in the destination. The default is DTS default policy. See Case policy for destination object names. |
| Source Objects | Click objects in the Source Objects box, then click the right arrow to move them to the Selected Objects box. Select databases or tables as synchronization objects. |
| Selected Objects | To rename a single object, right-click it in the Selected Objects box. See Map a single object name. To rename multiple objects in bulk, click Batch Edit in the upper-right corner. See Map multiple object names in bulk. To filter rows in a table, right-click the table and configure filter conditions. See Set filter conditions. If you use object name mapping, synchronization of dependent objects may fail. |
Step 4: Configure advanced settings
Click Next: Advanced Settings and configure the following options:
| Field | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses a shared cluster. To improve task stability, purchase a dedicated cluster. See What is a DTS dedicated cluster?. |
| Copy the temporary table of the Online DDL tool that is generated in the source table to the destination database. | If the source uses DMS or gh-ost for online DDL changes, select how DTS handles the generated temporary tables. Yes: Synchronizes the temporary tables. Large temporary tables may cause synchronization latency. No, Adapt to DMS Online DDL: Does not synchronize temporary tables. Synchronizes only the original DDL statements from DMS. This causes table locks on the destination. No, Adapt to gh-ost: Does not synchronize temporary tables. Synchronizes only the original DDL statements from gh-ost. This causes table locks on the destination. Note: DTS does not support pt-online-schema-change for online DDL changes; using it will cause the task to fail. |
| Retry Time for Failed Connections | If the connection to the source or destination database fails, DTS automatically retries within this time range. Default: 720 minutes. Range: 10–1,440 minutes. Set this to 30 minutes or more. If multiple DTS instances share a source or destination, DTS applies the shortest configured retry duration across all instances. DTS charges for task runtime during retries. |
| Retry Time for Other Issues | If a non-connection issue occurs (for example, a DDL or DML execution error), DTS automatically retries within this time range. Default: 10 minutes. Range: 1–1,440 minutes. Set this to 10 minutes or more. This value must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limit the full synchronization rate to reduce load on the databases. Set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. You can also adjust the rate while the task is running. |
| Enable Throttling for Incremental Data Synchronization | Limit the incremental synchronization rate by setting RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | Select a tag to identify the instance based on your business needs. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat SQL to the source database. The DTS instance may show latency. No: DTS writes heartbeat SQL to the source database. This may interfere with operations such as physical backups and cloning. |
| Configure ETL | Enable the extract, transform, and load (ETL) feature to transform data in flight. Yes: Enter data processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. No: Disable ETL. See What is ETL?. |
| Monitoring and Alerting | Yes: Configure alerts for task failures or latency exceeding a threshold. Set the alert threshold and notification contacts. See Configure monitoring and alerting during task configuration. No: No alerts. |
Step 5: Configure data verification (optional)
Click Data Verification to set up a data verification task. See Configure data verification.
Step 6: Configure destination table fields (optional)
Click Next: Configure Database and Table Fields to set the Type, Primary Key Column, and Distribution Key for each destination table in AnalyticDB for PostgreSQL.
This step is available only when Schema Synchronization is selected. Set Definition Status to All to modify the fields. To form a composite primary key, specify multiple columns in Primary Key Column, then select one or more of those columns as the Distribution Key. See Manage tables and Define table distribution.
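The field settings in this step map to destination DDL along these lines. This is a hypothetical example: choosing `(sale_id, sale_date)` as the Primary Key Column and `sale_date` as the Distribution Key corresponds to a table definition such as:

```sql
-- Composite primary key; the distribution key is a subset of it.
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date,
    amount    numeric(12,2),
    PRIMARY KEY (sale_id, sale_date)
)
DISTRIBUTED BY (sale_date);
```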
Step 7: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters.
Before a synchronization task starts, DTS performs a precheck. The task starts only if the precheck passes.
If the precheck fails, click View Details next to the failed item, fix the issue, and rerun the precheck.
If the precheck returns warnings:
For non-ignorable warnings, click View Details, fix the issue, and rerun the precheck.
For ignorable warnings, click Confirm Alert Details, then Ignore, then OK, then Precheck Again to proceed. Ignoring warnings may lead to data inconsistencies. Proceed with caution.
Step 8: Purchase and start the instance
When the Success Rate reaches 100%, click Next: Purchase Instance.
On the Purchase page, select the billing method and instance class:
| Category | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: Pay upfront for a set duration. Cost-effective for long-term, continuous tasks. Monthly options: 1–9 months. Yearly options: 1, 2, 3, or 5 years. Pay-as-you-go: Billed hourly for actual usage. Suitable for short-term or test tasks. |
| Resource Group Settings | — | The resource group for the instance. Default: default resource group. See What is Resource Management?. |
| Instance Class | — | Select a specification based on your required synchronization rate. See Data synchronization link specifications. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start and confirm in the dialog box.
Monitor the task progress on the Data Synchronization Tasks page of the new DTS console.
What's next
To verify that data is synchronized correctly, configure a data verification task. See Configure data verification.
To modify synchronization objects or task parameters after the task starts, see Modify instance parameters.