Data Transmission Service: Synchronize data from a PolarDB for MySQL cluster to a MaxCompute project

Last Updated: Mar 28, 2026

Use Data Transmission Service (DTS) to replicate data from a PolarDB for MySQL cluster to a MaxCompute project. DTS handles schema synchronization, initial full data load, and ongoing incremental change capture in a single managed task.

Before you begin

Before you begin, make sure that the following preparations are complete:

  • As an alternative to using the AccessKey pair of the Alibaba Cloud account directly, you can create a RAM user and set the RAM user as the super administrator for the MaxCompute project.

Billing

| Synchronization type | Fee |
| --- | --- |
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |

How synchronization works

DTS runs the synchronization in three sequential phases:

  1. Schema synchronization — DTS replicates table schemas from PolarDB for MySQL to MaxCompute. Each source table receives a _base suffix in MaxCompute (for example, customer becomes customer_base).

  2. Full data synchronization — DTS copies all existing rows from the source table to the corresponding _base table. This full baseline table is the starting point for incremental sync.

  3. Incremental data synchronization — DTS creates a _log table for each source table (for example, customer_log) and continuously writes change records from the source binary log into that table.

For the structure of _log tables, see Structure of incremental data tables.
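The three phases above imply a simple model: each _base table holds the full baseline, and each _log table holds an ordered stream of change records. As an illustration only, the following Python sketch replays such a stream onto a baseline keyed by primary key. It is a simplified sketch of the model described here, not DTS's actual merge logic, and the data column names (id, name) are hypothetical; only the metadata fields follow this topic.

```python
# Sketch: replay DTS-style incremental change records (_log rows) onto a
# full baseline (_base rows) keyed by primary key. Simplified illustration,
# not the actual DTS implementation.

def replay_changes(base_rows, log_rows):
    """base_rows: {pk: row dict}; log_rows: ordered change records with
    operation_flag I/D/U; an UPDATE is split into before/after records
    that share the same record_id."""
    state = dict(base_rows)
    for rec in log_rows:
        op = rec["operation_flag"]
        row = rec["row"]
        pk = row["id"]
        if op == "I":
            state[pk] = row
        elif op == "D":
            state.pop(pk, None)
        elif op == "U" and rec["after_flag"] == "Y":
            # Only the after-image (after_flag == "Y") of an UPDATE is applied.
            state[pk] = row
    return state

base = {1: {"id": 1, "name": "alice"}}
log = [
    {"record_id": 1, "operation_flag": "U", "before_flag": "Y", "after_flag": "N",
     "row": {"id": 1, "name": "alice"}},
    {"record_id": 1, "operation_flag": "U", "before_flag": "N", "after_flag": "Y",
     "row": {"id": 1, "name": "alicia"}},
    {"record_id": 2, "operation_flag": "I", "before_flag": "N", "after_flag": "N",
     "row": {"id": 2, "name": "bob"}},
]
print(replay_changes(base, log))
```

The replayed state reflects the updated value for row 1 and the newly inserted row 2.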

Supported synchronization topologies

  • One-way one-to-one synchronization

  • One-way one-to-many synchronization

  • One-way cascade synchronization

  • One-way many-to-one synchronization

For the full list, see Synchronization topologies.

SQL operations that can be synchronized

| Operation type | SQL statements |
| --- | --- |
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN (ADD COLUMN operations that add attribute columns cannot be synchronized) |

Note: DTS does not synchronize foreign keys. Cascade update and delete operations on the source are not replicated to the destination.

Limitations

Source database

| Limitation | Details |
| --- | --- |
| Primary or unique key required | Tables must have PRIMARY KEY or UNIQUE constraints in which all fields are unique; otherwise, duplicate records may appear in the destination. |
| Table limit per task | If you select individual tables as the objects to synchronize and need to rename them in the destination, a single task supports up to 1,000 tables. To synchronize more than 1,000 tables, configure multiple tasks or synchronize at the database level. |
| Binary log requirements | See the table below. |
| DDL during synchronization | Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. Otherwise, the task fails. |
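When a workload exceeds the 1,000-table limit described above, the table list must be split across multiple tasks. The following Python sketch shows one way to batch a table list for that purpose; the helper is illustrative and is not part of any DTS API.

```python
# Sketch: split a table list into batches that respect the 1,000-table
# per-task limit. Illustrative helper, not a DTS API.

def split_into_tasks(tables, max_per_task=1000):
    """Return consecutive slices of `tables`, each at most max_per_task long."""
    return [tables[i:i + max_per_task] for i in range(0, len(tables), max_per_task)]

tables = [f"table_{i}" for i in range(2500)]
batches = split_into_tasks(tables)
print(len(batches), [len(b) for b in batches])  # 3 [1000, 1000, 500]
```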

Binary log requirements for the source PolarDB for MySQL cluster

| Parameter | Required value | Notes |
| --- | --- | --- |
| loose_polar_log_bin | ON | Binary logging must be enabled. If this parameter is not set, the precheck fails and the task cannot start. Enabling binary logging incurs storage charges. See Enable binary logging and Modify parameters. |
| Binary log retention period | At least 3 days (7 days recommended) | Retention periods shorter than 3 days risk data inconsistency or loss and may affect DTS service level agreement (SLA) guarantees. See Modify the retention period. |

Other limitations

  • DTS does not synchronize read-only nodes of the source PolarDB for MySQL cluster.

  • DTS does not synchronize Object Storage Service (OSS) external tables from the source cluster.

  • Evaluate the impact on source and destination database performance before starting sync. Run synchronization during off-peak hours when possible. Initial full data synchronization uses read and write resources on both ends and may increase server load.

  • After initial full data synchronization completes, the used tablespace of the destination database is larger than that of the source database because concurrent INSERT operations cause fragmentation in the destination tables.

  • Do not use tools such as pt-online-schema-change to run DDL operations on source tables during sync. The task will fail.

  • If no other sources write to the destination during sync, you can use Data Management (DMS) to run online DDL operations on source tables. See Perform lock-free DDL operations.

  • If other sources write to the destination while you also run online DDL via DMS, data loss may occur in the destination.

  • MaxCompute does not support PRIMARY KEY constraints. If network errors occur, DTS may write duplicate records to the MaxCompute project.

  • If a DTS task fails, DTS technical support will try to restore the task within 8 hours. During restoration, the task may restart and task parameters (not database parameters) may be modified. For the parameters that may change, see the Modify instance parameters section.
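Because MaxCompute has no PRIMARY KEY constraints, retried writes after a network error can leave duplicate change records, as noted above. A downstream consumer can deduplicate on the record_id and before_flag fields; the sketch below illustrates this idea under the assumption that those two fields uniquely identify a log row, which follows from the field descriptions in this topic but is not an official DTS recipe.

```python
# Sketch: drop duplicate change records that retried writes can leave in a
# MaxCompute _log table. Deduplication key (record_id, before_flag) is an
# assumption based on the field semantics described in this topic.

def dedupe(log_rows):
    seen = set()
    out = []
    for rec in log_rows:
        key = (rec["record_id"], rec["before_flag"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

rows = [
    {"record_id": 1, "before_flag": "N", "operation_flag": "I"},
    {"record_id": 1, "before_flag": "N", "operation_flag": "I"},  # retried write
    {"record_id": 2, "before_flag": "N", "operation_flag": "I"},
]
print(len(dedupe(rows)))  # 2
```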

Special cases

DTS periodically executes the CREATE DATABASE IF NOT EXISTS `test` statement on the source database to advance the binary log position.

Create a data synchronization task

Step 1: Go to the Data Synchronization page

Use either console to open the Data Synchronization page and select the region where the synchronization instance will reside.

DTS console

  1. Log on to the DTS console.

  2. In the left-side navigation pane, click Data Synchronization.

  3. In the upper-left corner of the page, select the region in which the data synchronization instance resides.

DMS console

Note

The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.

  1. Log on to the DMS console.

  2. In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.

  3. From the drop-down list to the right of Data Synchronization Tasks, select the region in which the data synchronization instance resides.

Step 2: Configure source and destination databases

  1. Click Create Task to open the task configuration page.

  2. Configure the source and destination databases using the following parameters.

| Section | Parameter | Description |
| --- | --- | --- |
| N/A | Task Name | The name of the DTS task. DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique. |
| Source Database | Select Existing Connection | If the source database is already registered with DTS, select it from the drop-down list, and DTS populates the remaining parameters automatically. See Manage database connections. In the DMS console, select the instance from the Select a DMS database instance list. If the instance is not registered, enter the following parameters manually. |
| | Database Type | Select PolarDB for MySQL. |
| | Connection Type | Select Alibaba Cloud Instance. |
| | Instance Region | The region in which the source PolarDB for MySQL cluster resides. |
| | Cross-account | Select No to synchronize data within the same Alibaba Cloud account. |
| | PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster. |
| | Database Account | The account of the source cluster. The account must have read permissions on the objects to be synchronized. |
| | Database Password | The password of the database account. |
| | Encryption | Optional. Enable SSL encryption for the connection to the source cluster based on your security requirements. See Configure SSL encryption. |
| Destination Database | Select Existing Connection | If the destination database is already registered with DTS, select it from the drop-down list. See Manage database connections. If it is not registered, enter the following parameters manually. |
| | Database Type | Select MaxCompute. |
| | Connection Type | Select Alibaba Cloud Instance. |
| | Instance Region | The region in which the destination MaxCompute project resides. |
| | Project | The name of the MaxCompute project. |
| | AccessKey ID of Alibaba Cloud Account | The AccessKey ID that you obtained in the prerequisites. |
| | AccessKey Secret of Alibaba Cloud Account | The AccessKey secret that you obtained in the prerequisites. |
  3. Click Test Connectivity and Proceed.

DTS must be able to access both the source and destination databases. If needed, add the CIDR blocks of DTS servers to the security settings of your databases. See Add the CIDR blocks of DTS servers.

  4. Click OK to complete the ODPS (MaxCompute) account authorization, then click Test Connectivity and Proceed again.
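If connectivity tests fail because of whitelist settings, it can help to verify programmatically whether a given DTS server IP would be accepted by your database's allow list. The sketch below uses Python's standard ipaddress module; the CIDR blocks shown are placeholders, and the real blocks for your region come from the "Add the CIDR blocks of DTS servers" documentation.

```python
# Sketch: check whether an IP address falls inside a set of whitelisted
# CIDR blocks. The example blocks are placeholders, not real DTS ranges.
import ipaddress

def is_allowed(ip, cidr_blocks):
    """Return True if `ip` is inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(block) for block in cidr_blocks)

example_dts_blocks = ["10.151.12.0/24", "100.104.0.0/16"]  # placeholders
print(is_allowed("100.104.3.7", example_dts_blocks))   # True
print(is_allowed("192.0.2.1", example_dts_blocks))     # False
```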

Step 3: Configure sync objects

  1. In the Configure Objects step, set the following parameters.

| Parameter | Description |
| --- | --- |
| Synchronization Types | By default, Incremental Data Synchronization is selected. Also select Schema Synchronization and Full Data Synchronization. All three are required to establish the baseline data and capture ongoing changes. |
| Naming Rules of Additional Columns | DTS adds metadata columns to the destination table. If those column names conflict with existing column names, the task fails. Select New Rule or Previous Rule based on your requirements. Before you select a rule, verify that there are no name conflicts between the additional columns and existing destination columns. See Naming rules for additional columns. |
| Partition Definition of Incremental Data Table | Select partition names based on your requirements. See Partition. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: checks whether destination tables have the same names as source tables. If identical names exist, an error is returned and the task cannot start. To resolve conflicts without deleting destination tables, use the object name mapping feature. See Map object names. Ignore Errors and Proceed: skips the name conflict check. If the source and destination tables have the same schema, DTS skips conflicting records during full synchronization but overwrites them during incremental synchronization. If the schemas differ, data initialization may partially fail. Use this mode with caution. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select tables in the Source Objects section and click the right-arrow icon to move them to the Selected Objects section. |
| Selected Objects | To rename a single object in the destination, right-click it in the Selected Objects section. See Map the name of a single object. To rename multiple objects at a time, click Batch Edit in the upper-right corner. See Map multiple object names at a time. |
  2. Click Next: Advanced Settings and configure the following parameters.

| Parameter | Description |
| --- | --- |
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster. To improve task stability, you can purchase and specify a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid values: 10 to 1440 minutes. Default: 720 minutes. We recommend that you set this parameter to a value greater than 30. If DTS reconnects within this window, the task resumes; otherwise, the task fails. If multiple tasks share the same source or destination database, the shortest retry time applies. DTS charges for the instance while it retries. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML operations fail. Valid values: 1 to 1440 minutes. Default: 10 minutes. We recommend that you set this parameter to a value greater than 10. This value must be smaller than the Retry Time for Failed Connections value. |
| Enable Throttling for Full Data Synchronization | Limits resource usage during full data synchronization by configuring Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits resource usage during incremental synchronization by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes heartbeat SQL operations to the source database. Yes: does not write heartbeat operations. A latency indicator may appear for the DTS instance. No: writes heartbeat operations. Physical backup and cloning of the source database may be affected. |
| Environment Tag | An optional tag that identifies the DTS instance. |
| Configure ETL | Specifies whether to enable the extract, transform, and load (ETL) feature. Yes: opens a code editor for data processing statements. See Configure ETL in a data migration or data synchronization task. No: skips ETL configuration. See What is ETL? |
| Monitoring and Alerting | Configures alerts for task failures or for latency that exceeds a threshold. No: disables alerting. Yes: enables alerting. Set the alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task. |
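The two retry settings interact: each has its own valid range, and the retry time for other issues must be smaller than the retry time for failed connections. The following Python sketch encodes those rules as a validator; the function is illustrative and is not part of any DTS SDK.

```python
# Sketch: validate the two DTS retry settings (in minutes) against the
# ranges and ordering rule described above. Illustrative, not a DTS API.

def validate_retry_settings(conn_retry_min, other_retry_min):
    if not 10 <= conn_retry_min <= 1440:
        return "Retry Time for Failed Connections must be 10-1440 minutes"
    if not 1 <= other_retry_min <= 1440:
        return "Retry Time for Other Issues must be 1-1440 minutes"
    if other_retry_min >= conn_retry_min:
        return "Retry Time for Other Issues must be smaller than Retry Time for Failed Connections"
    return "ok"

print(validate_retry_settings(720, 10))  # defaults -> ok
print(validate_retry_settings(30, 60))   # violates the ordering rule
```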

Step 4: Run the precheck

  1. Click Next: Save Task Settings and Precheck.

To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.

DTS runs a precheck before the task can start.

  2. If the precheck fails, click View Details next to the failed item, resolve the issue, then click Precheck Again.

  3. If an alert is triggered:

    • If the alert cannot be ignored, click View Details, fix the issue, then rerun the precheck.

    • If the alert can be ignored, click Confirm Alert Details, then click Ignore in the dialog box, confirm with OK, and click Precheck Again. Ignoring an alert may result in data inconsistency.

Step 5: Purchase an instance

  1. Wait until Success Rate reaches 100%, then click Next: Purchase Instance.

  2. On the buy page, configure the billing and instance settings.

| Section | Parameter | Description |
| --- | --- | --- |
| New Instance Class | Billing method | Subscription: pay upfront for a fixed period. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when it is no longer needed to stop billing. |
| | Instance class | Controls the synchronization speed. Select a class based on your throughput requirements. See Instance classes of data synchronization instances. |
| | Subscription duration | Available only for subscription billing. Options: 1 to 9 months, 1 year, 2 years, 3 years, or 5 years. |
| | Resource group settings | The resource group for the instance. Default: default resource group. See What is Resource Management? |
  3. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.

  4. Click Buy and Start, then click OK in the confirmation dialog.

The task appears in the task list. Monitor its progress there.

Structure of incremental data tables

Run set odps.sql.allow.fullscan=true; in MaxCompute to allow full table scans on the project before querying incremental data tables.

Each incremental data table (_log table) contains the original source columns plus the following metadata fields added by DTS.

| Field | Description |
| --- | --- |
| record_id | A unique, incremental ID for each log entry. For UPDATE operations, DTS splits the change into two records (before and after values) that share the same record_id. |
| operation_flag | The operation type: I (INSERT), D (DELETE), or U (UPDATE). |
| utc_timestamp | The timestamp of the binary log entry, in UTC. |
| before_flag | Indicates whether the row values represent the state before the update. Valid values: Y and N. |
| after_flag | Indicates whether the row values represent the state after the update. Valid values: Y and N. |
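Because each UPDATE produces two rows that share a record_id, consumers often need to reassemble the before and after images. The Python sketch below pairs them using the fields described above; it is an illustration of the field semantics, not an official DTS utility, and the data columns (id, name) are hypothetical.

```python
# Sketch: pair the two rows DTS writes for each UPDATE (same record_id;
# before_flag = 'Y' for the before-image, after_flag = 'Y' for the
# after-image). Illustrative only.
from collections import defaultdict

def pair_updates(log_rows):
    pairs = defaultdict(dict)
    for rec in log_rows:
        if rec["operation_flag"] == "U":
            side = "before" if rec["before_flag"] == "Y" else "after"
            pairs[rec["record_id"]][side] = rec["row"]
    return dict(pairs)

log = [
    {"record_id": 7, "operation_flag": "U", "before_flag": "Y", "after_flag": "N",
     "row": {"id": 1, "name": "alice"}},
    {"record_id": 7, "operation_flag": "U", "before_flag": "N", "after_flag": "Y",
     "row": {"id": 1, "name": "alicia"}},
]
print(pair_updates(log))
```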