Use Data Transmission Service to synchronize data from PolarDB for MySQL to MaxCompute - Data Transmission Service

Use Data Transmission Service (DTS) to replicate data from a PolarDB for MySQL cluster to a MaxCompute project. DTS handles schema synchronization, initial full data load, and ongoing incremental change capture in a single managed task.

Before you begin

Before you begin, ensure that you have:

A MaxCompute project. See Activate MaxCompute and DataWorks and Create a MaxCompute project.
The MaxCompute whitelist configured to allow DTS access. See Configure a whitelist for Alibaba Cloud services to access MaxCompute.
An AccessKey pair for the Alibaba Cloud account that owns the destination MaxCompute project. See Create an AccessKey pair.

As an alternative to using the Alibaba Cloud account's AccessKey pair directly, create a RAM user and set the RAM user as the super administrator for the MaxCompute project.

Billing

Synchronization type	Fee
Schema synchronization and full data synchronization	Free
Incremental data synchronization	Charged. See Billing overview.

How synchronization works

DTS runs the synchronization in three sequential phases:

Schema synchronization — DTS replicates table schemas from PolarDB for MySQL to MaxCompute. Each source table receives a _base suffix in MaxCompute (for example, customer becomes customer_base).
Full data synchronization — DTS copies all existing rows from the source table to the corresponding _base table. This full baseline table is the starting point for incremental sync.
Incremental data synchronization — DTS creates a _log table for each source table (for example, customer_log) and continuously writes change records from the source binary log into that table.

For the structure of _log tables, see Structure of incremental data tables.

Supported synchronization topologies

One-way one-to-one synchronization
One-way one-to-many synchronization
One-way cascade synchronization
One-way many-to-one synchronization

For the full list, see Synchronization topologies.

SQL operations that can be synchronized

Operation type	SQL statements
DML	INSERT, UPDATE, DELETE
DDL	ADD COLUMN (ADD COLUMN with attribute columns cannot be synchronized)

DTS does not synchronize foreign keys. Cascade and delete operations on the source are not replicated to the destination.

Limitations

Source Database

Limitation	Details
Primary or unique key required	Tables must have PRIMARY KEY or UNIQUE constraints with all fields unique; otherwise duplicate records may appear in the destination.
Table limit per task	If you select individual tables as sync objects and need to rename them in the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, configure multiple tasks or sync at the database level.
Binary log requirements	See the table below.
DDL during sync	Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. The task will fail.

Binary log requirements for the source PolarDB for MySQL cluster

Parameter	Required value	Notes
`loose_polar_log_bin`	`ON`	Binary logging must be enabled. If not set, the precheck fails and the task cannot start. Enabling binary logging incurs storage charges. See Enable binary logging and Modify parameters.
Binary log retention period	At least 3 days (7 days recommended)	Retention periods shorter than 3 days risk data inconsistency or loss and may affect DTS service level agreement (SLA) guarantees. See Modify the retention period.

Other limitations

DTS does not synchronize read-only nodes of the source PolarDB for MySQL cluster.
DTS does not synchronize Object Storage Service (OSS) external tables from the source cluster.
Evaluate the impact on source and destination database performance before starting sync. Run synchronization during off-peak hours when possible. Initial full data synchronization uses read and write resources on both ends and may increase server load.
After initial full data synchronization completes, the size of used tablespace of the destination database is larger than that of the source due to fragmentation from concurrent INSERT operations.
Do not use tools such as pt-online-schema-change to run DDL operations on source tables during sync. The task will fail.
If no other sources write to the destination during sync, you can use Data Management (DMS) to run online DDL operations on source tables. See Perform lock-free DDL operations.
If other sources write to the destination while you also run online DDL via DMS, data loss may occur in the destination.
MaxCompute does not support PRIMARY KEY constraints. If network errors occur, DTS may write duplicate records to the MaxCompute project.
If a DTS task fails, DTS technical support will try to restore the task within 8 hours. During restoration, the task may restart and task parameters (not database parameters) may be modified. For the parameters that may change, see the Modify instance parameters section.

Special cases

DTS periodically executes CREATE DATABASE IF NOT EXISTS \test\`` on the source database to advance the binary log position.

Create a data synchronization task

Step 1: Go to the Data Synchronization page

Use either console to open the Data Synchronization page and select the region where the synchronization instance will reside.

DTS console

Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region in which the data synchronization instance resides.

DMS console

Note

The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.

Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region in which the data synchronization instance resides.

Step 2: Configure source and destination databases

Click Create Task to open the task configuration page.
Configure the source and destination databases using the following parameters.

Section	Parameter	Description
N/A	Task Name	The name of the DTS task. DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique.
Source Database	Select Existing Connection	If the source is already registered with DTS, select it from the drop-down list and DTS populates the remaining parameters automatically. See Manage database connections. In the DMS console, select the instance from the Select a DMS database instance list. If the instance is not registered, fill in the parameters below manually.
	Database Type	Select PolarDB for MySQL.
	Connection Type	Select Alibaba Cloud Instance.
	Instance Region	The region of the source PolarDB for MySQL cluster.
	Cross-account	Select No to synchronize within the same Alibaba Cloud account.
	PolarDB Cluster ID	The ID of the source PolarDB for MySQL cluster.
	Database Account	The account for the source cluster. The account must have read permissions on the objects to be synchronized.
	Database Password	The password for the database account.
	Encryption	Optional. Enable SSL encryption for the connection to the source cluster based on your security requirements. See Configure SSL encryption.
Destination Database	Select Existing Connection	If the destination is already registered with DTS, select it from the drop-down list. See Manage database connections. If not registered, fill in the parameters below manually.
	Database Type	Select MaxCompute.
	Connection Type	Select Alibaba Cloud Instance.
	Instance Region	The region of the destination MaxCompute project.
	Project	The name of the MaxCompute project.
	AccessKey ID of Alibaba Cloud Account	The AccessKey ID from your prerequisites.
	AccessKey Secret of Alibaba Cloud Account	The AccessKey secret from your prerequisites.

Click Test Connectivity and Proceed.

DTS must be able to access both the source and destination databases. If needed, add DTS server CIDR blocks to the security settings of your databases. See Add the CIDR blocks of DTS servers.

Click OK to complete the ODPS (MaxCompute) account authorization, then click Test Connectivity and Proceed again.

Step 3: Configure sync objects

In the Configure Objects step, set the following parameters.

Parameter	Description
Synchronization Types	By default, Incremental Data Synchronization is selected. Also select Schema Synchronization and Full Data Synchronization. All three are required to establish the baseline data and capture ongoing changes.
Naming Rules of Additional Columns	DTS adds metadata columns to the destination table. If those column names conflict with existing column names, the task fails. Select New Rule or Previous Rule based on your requirements. Before selecting, verify there are no name conflicts between additional columns and existing destination columns. See Naming rules for additional columns.
Partition Definition of Incremental Data Table	Select partition names based on your requirements. See Partition.
Processing Mode of Conflicting Tables	Precheck and Report Errors: Checks whether destination tables have the same names as source tables. If identical names exist, an error is returned and the task cannot start. To resolve conflicts without deleting destination tables, use the object name mapping feature. See Map object names. Ignore Errors and Proceed: Skips the name conflict check. If the source and destination tables have the same schema, DTS skips conflicting records during full sync but overwrites them during incremental sync. If schemas differ, data initialization may partially fail. Use with caution.
Capitalization of Object Names in Destination Instance	Controls the capitalization of database, table, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance.
Source Objects	Select tables from the Source Objects section and click the right-arrow icon to move them to the Selected Objects section.
Selected Objects	To rename a single object in the destination, right-click it in the Selected Objects section. See Map the name of a single object. To rename multiple objects at once, click Batch Edit in the upper-right corner. See Map multiple object names at a time.

Click Next: Advanced Settings and configure the following parameters.

Parameter	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules the task to the shared cluster. To improve task stability, purchase and specify a dedicated cluster. See What is a DTS dedicated cluster.
Retry Time for Failed Connections	How long DTS retries after a connection failure. Valid values: 10–1440 minutes. Default: 720 minutes. We recommend that you set this parameter to a value greater than 30. If DTS reconnects within this window, the task resumes; otherwise the task fails. If multiple tasks share the same source or destination database, the shortest retry time applies. DTS charges for the instance while retrying.
Retry Time for Other Issues	How long DTS retries after DDL or DML operation failures. Valid values: 1–1440 minutes. Default: 10 minutes. We recommend that you set this parameter to a value greater than 10. This value must be smaller than Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization	Limits resource usage during full data synchronization by configuring Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected.
Enable Throttling for Incremental Data Synchronization	Limits resource usage during incremental sync by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Whether to delete SQL operations on heartbeat tables of forward and reverse tasks	Controls whether DTS writes heartbeat SQL operations to the source database. Yesalert notification settings: Does not write heartbeat operations. A latency indicator may appear for the DTS instance. No: Writes heartbeat operations. Physical backup and cloning of the source database may be affected.
Environment Tag	An optional tag to identify the DTS instance.
Configure ETL	Enables the extract, transform, and load (ETL) feature. Yes: Opens a code editor for data processing statements. See Configure ETL in a data migration or data synchronization task. No: Skips ETL configuration. See What is ETL?
Monitoring and Alerting	Configures alerts for task failures or latency exceeding a threshold. No: Disables alerting. Yes: Enables alerting. Set the alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task.

Step 4: Run the precheck

Click Next: Save Task Settings and Precheck.

To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.

DTS runs a precheck before the task can start.

If the precheck fails, click View Details next to the failed item, resolve the issue, then click Precheck Again.
If an alert is triggered:
- If the alert cannot be ignored, click View Details, fix the issue, then rerun the precheck.
- If the alert can be ignored, click Confirm Alert Details, then click Ignore in the dialog box, confirm with OK, and click Precheck Again. Ignoring an alert may result in data inconsistency.

Step 5: Purchase an instance

Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the billing and instance settings.

Section	Parameter	Description
New Instance Class	Billing method	Subscription: Pay upfront for a fixed period. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop billing.
	Instance class	Controls synchronization speed. Select a class based on your throughput requirements. See Instance classes of data synchronization instances.
	Subscription duration	Available for subscription billing only. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years.
Resource group settings	—	The resource group for the instance. Default: default resource group. See What is Resource Management?

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.

The task appears in the task list. Monitor its progress there.

Structure of incremental data tables

Run set odps.sql.allow.fullscan=true; in MaxCompute to allow full table scans on the project before querying incremental data tables.

Each incremental data table (_log table) contains the original source columns plus the following metadata fields added by DTS.

Field	Description
`record_id`	A unique, incremental ID for each log entry. For UPDATE operations, DTS splits the change into two records (before and after values) that share the same `record_id`.
`operation_flag`	The operation type: `I` (INSERT), `D` (DELETE), or `U` (UPDATE).
`utc_timestamp`	The timestamp of the binary log entry in UTC.
`before_flag`	Indicates whether the row values represent the state before the update. Valid values: `Y` or `N`.
`after_flag`	Indicates whether the row values represent the state after the update. Valid values: `Y` or `N`.