Use Data Transmission Service (DTS) to replicate data from a PolarDB for MySQL cluster to a MaxCompute project. DTS handles schema synchronization, initial full data load, and ongoing incremental change capture in a single managed task.
Before you begin
Ensure that you have:
A MaxCompute project. See Activate MaxCompute and DataWorks and Create a MaxCompute project.
The MaxCompute whitelist configured to allow DTS access. See Configure a whitelist for Alibaba Cloud services to access MaxCompute.
An AccessKey pair for the Alibaba Cloud account that owns the destination MaxCompute project. See Create an AccessKey pair.
As an alternative to using the Alibaba Cloud account's AccessKey pair directly, create a RAM user and set the RAM user as the super administrator for the MaxCompute project.
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
How synchronization works
DTS runs the synchronization in three sequential phases:
Schema synchronization — DTS replicates table schemas from PolarDB for MySQL to MaxCompute. Each source table receives a `_base` suffix in MaxCompute (for example, `customer` becomes `customer_base`).
Full data synchronization — DTS copies all existing rows from the source table to the corresponding `_base` table. This full baseline table is the starting point for incremental synchronization.
Incremental data synchronization — DTS creates a `_log` table for each source table (for example, `customer_log`) and continuously writes change records from the source binary log into that table.
For the structure of _log tables, see Structure of incremental data tables.
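The naming convention above can be sketched as a small helper. This is purely illustrative — the function and its name are hypothetical, not part of any DTS SDK:

```python
def destination_tables(source_table: str) -> dict:
    """Map a source table name to the MaxCompute tables DTS creates.

    Per the phases described above: the schema and full data phases
    populate `<table>_base`, and the incremental phase writes change
    records to `<table>_log`. Hypothetical helper for illustration.
    """
    return {
        "baseline": f"{source_table}_base",    # schema + full data phases
        "incremental": f"{source_table}_log",  # incremental change records
    }

print(destination_tables("customer"))
```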
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way cascade synchronization
One-way many-to-one synchronization
For the full list, see Synchronization topologies.
SQL operations that can be synchronized
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN (ADD COLUMN with attribute columns cannot be synchronized) |
DTS does not synchronize foreign keys. Cascade and delete operations on the source are not replicated to the destination.
Limitations
Source database
| Limitation | Details |
|---|---|
| Primary or unique key required | Tables to be synchronized must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, duplicate records may appear in the destination. |
| Table limit per task | If you select individual tables as sync objects and need to rename them in the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, configure multiple tasks or sync at the database level. |
| Binary log requirements | See the table below. |
| DDL during sync | Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. The task will fail. |
Binary log requirements for the source PolarDB for MySQL cluster
| Parameter | Required value | Notes |
|---|---|---|
| loose_polar_log_bin | ON | Binary logging must be enabled. If this parameter is not set, the precheck fails and the task cannot start. Enabling binary logging incurs storage charges. See Enable binary logging and Modify parameters. |
| Binary log retention period | At least 3 days (7 days recommended) | Retention periods shorter than 3 days risk data inconsistency or loss and may affect DTS service level agreement (SLA) guarantees. See Modify the retention period. |
Other limitations
DTS does not synchronize read-only nodes of the source PolarDB for MySQL cluster.
DTS does not synchronize Object Storage Service (OSS) external tables from the source cluster.
Evaluate the impact on source and destination database performance before starting sync. Run synchronization during off-peak hours when possible. Initial full data synchronization uses read and write resources on both ends and may increase server load.
After initial full data synchronization completes, the used tablespace of the destination database is larger than that of the source database because of fragmentation caused by concurrent INSERT operations.
Do not use tools such as pt-online-schema-change to run DDL operations on source tables during sync. The task will fail.
If no other sources write to the destination during sync, you can use Data Management (DMS) to run online DDL operations on source tables. See Perform lock-free DDL operations.
If other sources write to the destination while you also run online DDL via DMS, data loss may occur in the destination.
MaxCompute does not support PRIMARY KEY constraints. If network errors occur, DTS may write duplicate records to the MaxCompute project.
If a DTS task fails, DTS technical support will try to restore the task within 8 hours. During restoration, the task may restart and task parameters (not database parameters) may be modified. For the parameters that may change, see the Modify instance parameters section.
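Because MaxCompute does not enforce PRIMARY KEY constraints, a downstream consumer may want to guard against the duplicate records mentioned above. A minimal sketch of one mitigation — deduplicating change records on record_id and before_flag — follows; this is a hypothetical downstream helper, not built-in DTS behavior:

```python
def dedupe(records, key=("record_id", "before_flag")):
    """Drop duplicate change records, keeping the first occurrence.

    `records` is an iterable of dicts shaped like rows of a _log table.
    Deduplicating on (record_id, before_flag) is a downstream mitigation
    sketch, not a DTS feature: record_id is unique per log entry, and
    before_flag distinguishes the two halves of a split UPDATE.
    """
    seen = set()
    out = []
    for rec in records:
        k = tuple(rec[field] for field in key)
        if k not in seen:
            seen.add(k)
            out.append(rec)
    return out
```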
Special cases
DTS periodically executes ``CREATE DATABASE IF NOT EXISTS `test` `` on the source database to advance the binary log position.
Create a data synchronization task
Step 1: Go to the Data Synchronization page
Use either console to open the Data Synchronization page and select the region where the synchronization instance will reside.
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region in which the data synchronization instance resides.
DMS console
The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose .
From the drop-down list to the right of Data Synchronization Tasks, select the region in which the data synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task to open the task configuration page.
Configure the source and destination databases using the following parameters.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | The name of the DTS task. DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique. |
| Source Database | Select Existing Connection | If the source is already registered with DTS, select it from the drop-down list and DTS populates the remaining parameters automatically. See Manage database connections. In the DMS console, select the instance from the Select a DMS database instance list. If the instance is not registered, fill in the parameters below manually. |
| | Database Type | Select PolarDB for MySQL. |
| | Connection Type | Select Alibaba Cloud Instance. |
| | Instance Region | The region of the source PolarDB for MySQL cluster. |
| | Cross-account | Select No to synchronize within the same Alibaba Cloud account. |
| | PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster. |
| | Database Account | The account for the source cluster. The account must have read permissions on the objects to be synchronized. |
| | Database Password | The password for the database account. |
| | Encryption | Optional. Enable SSL encryption for the connection to the source cluster based on your security requirements. See Configure SSL encryption. |
| Destination Database | Select Existing Connection | If the destination is already registered with DTS, select it from the drop-down list. See Manage database connections. If not registered, fill in the parameters below manually. |
| | Database Type | Select MaxCompute. |
| | Connection Type | Select Alibaba Cloud Instance. |
| | Instance Region | The region of the destination MaxCompute project. |
| | Project | The name of the MaxCompute project. |
| | AccessKey ID of Alibaba Cloud Account | The AccessKey ID from your prerequisites. |
| | AccessKey Secret of Alibaba Cloud Account | The AccessKey secret from your prerequisites. |
Click Test Connectivity and Proceed.
DTS must be able to access both the source and destination databases. If needed, add DTS server CIDR blocks to the security settings of your databases. See Add the CIDR blocks of DTS servers.
Click OK to complete the ODPS (MaxCompute) account authorization, then click Test Connectivity and Proceed again.
Step 3: Configure sync objects
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | By default, Incremental Data Synchronization is selected. Also select Schema Synchronization and Full Data Synchronization. All three are required to establish the baseline data and capture ongoing changes. |
| Naming Rules of Additional Columns | DTS adds metadata columns to the destination table. If those column names conflict with existing column names, the task fails. Select New Rule or Previous Rule based on your requirements. Before selecting, verify there are no name conflicts between additional columns and existing destination columns. See Naming rules for additional columns. |
| Partition Definition of Incremental Data Table | Select partition names based on your requirements. See Partition. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: Checks whether destination tables have the same names as source tables. If identical names exist, an error is returned and the task cannot start. To resolve conflicts without deleting destination tables, use the object name mapping feature. See Map object names. Ignore Errors and Proceed: Skips the name conflict check. If the source and destination tables have the same schema, DTS skips conflicting records during full sync but overwrites them during incremental sync. If schemas differ, data initialization may partially fail. Use with caution. |
| Capitalization of Object Names in Destination Instance | Controls the capitalization of database, table, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select tables from the Source Objects section and click the right-arrow icon to move them to the Selected Objects section. |
| Selected Objects | To rename a single object in the destination, right-click it in the Selected Objects section. See Map the name of a single object. To rename multiple objects at once, click Batch Edit in the upper-right corner. See Map multiple object names at a time. |
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster. To improve task stability, purchase and specify a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Valid values: 10–1440 minutes. Default: 720 minutes. We recommend that you set this parameter to a value greater than 30. If DTS reconnects within this window, the task resumes; otherwise the task fails. If multiple tasks share the same source or destination database, the shortest retry time applies. DTS charges for the instance while retrying. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML operation failures. Valid values: 1–1440 minutes. Default: 10 minutes. We recommend that you set this parameter to a value greater than 10. This value must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limits resource usage during full data synchronization by configuring Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits resource usage during incremental sync by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes heartbeat SQL operations to the source database. Yes: DTS does not write heartbeat operations. A latency indicator may appear for the DTS instance. No: DTS writes heartbeat operations. Physical backup and cloning of the source database may be affected. |
| Environment Tag | An optional tag to identify the DTS instance. |
| Configure ETL | Enables the extract, transform, and load (ETL) feature. Yes: Opens a code editor for data processing statements. See Configure ETL in a data migration or data synchronization task. No: Skips ETL configuration. See What is ETL? |
| Monitoring and Alerting | Configures alerts for task failures or latency exceeding a threshold. No: Disables alerting. Yes: Enables alerting. Set the alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task. |
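The two retry parameters above interact: Retry Time for Other Issues must stay below Retry Time for Failed Connections, and each has its own valid range. A hedged sketch of checking a planned configuration before entering it in the console — a hypothetical helper, not a DTS API:

```python
def validate_retry_settings(conn_retry_min: int, other_retry_min: int) -> list:
    """Check DTS retry settings against the documented constraints.

    Hypothetical helper: the ranges (10-1440 and 1-1440 minutes) and
    the rule that 'Retry Time for Other Issues' must be smaller than
    'Retry Time for Failed Connections' come from the table above.
    """
    problems = []
    if not 10 <= conn_retry_min <= 1440:
        problems.append("Retry Time for Failed Connections must be 10-1440 minutes")
    if not 1 <= other_retry_min <= 1440:
        problems.append("Retry Time for Other Issues must be 1-1440 minutes")
    if other_retry_min >= conn_retry_min:
        problems.append("Retry Time for Other Issues must be smaller than "
                        "Retry Time for Failed Connections")
    return problems
```

For example, the defaults (720 and 10 minutes) pass, while setting both to 720 violates the ordering constraint.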
Step 4: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
DTS runs a precheck before the task can start.
If the precheck fails, click View Details next to the failed item, resolve the issue, then click Precheck Again.
If an alert is triggered:
If the alert cannot be ignored, click View Details, fix the issue, then rerun the precheck.
If the alert can be ignored, click Confirm Alert Details, then click Ignore in the dialog box, confirm with OK, and click Precheck Again. Ignoring an alert may result in data inconsistency.
Step 5: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the billing and instance settings.
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing method | Subscription: Pay upfront for a fixed period. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop billing. |
| | Instance class | Controls synchronization speed. Select a class based on your throughput requirements. See Instance classes of data synchronization instances. |
| | Subscription duration | Available for subscription billing only. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
| Resource group settings | — | The resource group for the instance. Default: default resource group. See What is Resource Management? |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. Monitor its progress there.
Structure of incremental data tables
Before you query incremental data tables, run `set odps.sql.allow.fullscan=true;` in MaxCompute to allow full table scans on the project.

Each incremental data table (`_log` table) contains the original source columns plus the following metadata fields added by DTS.
| Field | Description |
|---|---|
| record_id | A unique, incremental ID for each log entry. For UPDATE operations, DTS splits the change into two records (before and after values) that share the same record_id. |
| operation_flag | The operation type: I (INSERT), D (DELETE), or U (UPDATE). |
| utc_timestamp | The timestamp of the binary log entry, in UTC. |
| before_flag | Indicates whether the row values represent the state before the update. Valid values: Y or N. |
| after_flag | Indicates whether the row values represent the state after the update. Valid values: Y or N. |
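To illustrate how these metadata fields describe changes, here is a sketch that replays _log records onto a _base snapshot. It assumes rows are keyed by a single `id` column (an assumption for illustration) and is not how DTS or MaxCompute merge data internally:

```python
def replay_log(base_rows, log_rows, key="id"):
    """Apply _log change records to a _base snapshot.

    Sketch under assumptions: rows are dicts keyed by `key`; an UPDATE
    arrives as two records sharing record_id -- the pre-image with
    before_flag='Y' and the post-image with after_flag='Y'. Only the
    post-image is applied here. Not a DTS or MaxCompute internal.
    """
    state = {r[key]: dict(r) for r in base_rows}
    for rec in sorted(log_rows, key=lambda r: r["record_id"]):
        op = rec["operation_flag"]
        # Strip the DTS metadata fields to recover the source columns.
        row = {k: v for k, v in rec.items()
               if k not in ("record_id", "operation_flag",
                            "utc_timestamp", "before_flag", "after_flag")}
        if op == "I":
            state[row[key]] = row
        elif op == "D":
            state.pop(row[key], None)
        elif op == "U" and rec["after_flag"] == "Y":
            state[row[key]] = row  # skip the before-image record
    return list(state.values())
```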