Use Data Transmission Service (DTS) to migrate data from a PolarDB for MySQL cluster to a MaxCompute project. DTS supports schema migration, full data migration, and incremental data migration—covering historical data and ongoing changes.
How it works
DTS uses a three-stage pipeline to move data from PolarDB for MySQL to MaxCompute:
Schema migration: DTS reads the source table schemas and creates corresponding tables in MaxCompute, appending the
_basesuffix to each table name. For example, thecustomertable becomescustomer_basein MaxCompute.Full data migration: DTS copies all existing rows from each source table to the corresponding
_basetable in MaxCompute. These tables are called full baseline tables and serve as the starting point for incremental sync.Incremental data migration: DTS creates a separate
_logtable for each migrated table (for example,customer_log) and streams ongoing INSERT, UPDATE, and DELETE operations from the source binary logs into these tables in real time.
Prerequisites
Before you begin, make sure you have:
An activated MaxCompute service and a MaxCompute project. See Activate MaxCompute and DataWorks and Create a MaxCompute project.
An IP address whitelist configured on the MaxCompute project to allow DTS access. See Configure an IP address whitelist to allow access from Alibaba Cloud services to MaxCompute.
An AccessKey pair for the Alibaba Cloud account that owns the destination MaxCompute project. See Create an AccessKey pair.
Instead of using the primary account AccessKey pair, create a RAM user and set it as the super administrator for the MaxCompute project.
Limitations
Task-blocking conditions (fix before starting)
These conditions cause the migration task to fail:
The server on which the source database is deployed must have sufficient outbound bandwidth. Otherwise, the data migration speed decreases.
Tables to be migrated must have a PRIMARY KEY or UNIQUE constraint with all fields unique. Without this, the destination may contain duplicate records.
If you select tables as migration objects and need to rename tables or columns, a single task supports up to 1,000 tables. Tasks with more than 1,000 tables in this mode return a request error. Configure multiple tasks or migrate at the database level instead.
For incremental data migration, binary logging must be enabled and the
loose_polar_log_binparameter must be set toon. If binary logging is not enabled, the precheck fails and the task cannot start. See Enable binary logging and Modify parameters.Do not execute DDL statements to change database or table schemas during schema migration or full data migration. The migration task fails if schemas change mid-task.
Do not use tools such as
pt-online-schema-changefor online DDL operations on objects being migrated. The migration task fails.
Enabling binary logging on a PolarDB for MySQL cluster incurs storage charges for the binary log files.
Data integrity risks (review before starting)
These conditions may cause data inconsistency or loss:
Binary logs must be retained for at least 3 days; retain them for 7 days to be safe. If DTS cannot read the binary logs, the task fails. In exceptional cases, data inconsistency or loss may occur. Retaining logs for fewer days than required means the Service Level Agreement (SLA) for DTS may not apply. See Modify the retention period.
Do not write data to the source database during full-only migration (schema migration + full data migration without incremental). Writing to the source during this phase causes data inconsistency. To safely write to the source during migration, include incremental data migration.
Do not write data from other sources to the destination MaxCompute project during migration. External writes cause data inconsistency.
MaxCompute does not support primary key constraints. If network errors occur, DTS may write duplicate records to the MaxCompute project.
DTS retries failed tasks for up to 7 days. Before switching workloads to the destination, stop or release failed tasks. Alternatively, revoke write permissions from the DTS accounts on the destination. If you do not, the source data overwrites destination data when a failed task resumes.
Operational limits
DTS does not migrate read-only nodes of the source PolarDB for MySQL cluster.
DTS does not migrate Object Storage Service (OSS) external tables from the source PolarDB for MySQL cluster.
DTS does not migrate foreign keys. Cascade and delete operations dependent on foreign keys are not replicated to the destination.
Migrate data during off-peak hours. Full data migration uses read and write resources on both the source and destination, which increases load on both database servers.
After full data migration, the destination tablespace is larger than the source because concurrent INSERT operations cause table fragmentation.
If a DTS instance fails, the DTS helpdesk attempts recovery within 8 hours. Recovery may involve restarting the instance or adjusting DTS instance parameters (not database parameters). For parameters that may be modified, see Modify instance parameters.
Billing
| Migration type | Task configuration fee | Internet traffic fee |
|---|---|---|
| Schema migration and full data migration | Free | Free, unless Access Method is set to Public IP Address. See Billing overview. |
| Incremental data migration | Charged. See Billing overview. | — |
SQL operations supported for incremental migration
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN |
ADD COLUMN operations that include attribute columns cannot be migrated.
Required database account permissions
| Database | Required permissions | How to grant |
|---|---|---|
| PolarDB for MySQL | Read permissions on the objects to migrate | Create and manage database accounts and Manage database account passwords |
Create a migration task
Step 1: Open the Data Migration page
Use either the DTS console or the DMS console.
DTS console
Log on to the DTS console.DTS console
In the left-side navigation pane, click Data Migration.
In the upper-left corner, select the region where the migration instance will reside.
DMS console
The DMS console layout varies by mode. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.DMS console
In the top navigation bar, move the pointer over Data + AI > DTS (DTS) > Data Migration.
From the drop-down list next to Data Migration Tasks, select the region where the migration instance will reside.
Step 2: Configure source and destination databases
Click Create Task.
Configure the following parameters.
General
| Parameter | Description |
|---|---|
| Task Name | A name for the DTS task. DTS generates a default name. Specify a descriptive name to identify the task easily. Task names do not need to be unique. |
Source database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the source database is already registered with DTS, select it from the drop-down list. DTS populates the remaining fields automatically. Otherwise, configure the fields below. In the DMS console, select from Select a DMS database instance. |
| Database Type | Select PolarDB for MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source PolarDB for MySQL cluster resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No when migrating within the same Alibaba Cloud account. |
| PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster. |
| Database Account | The database account for the source cluster. See Required database account permissions. |
| Database Password | The password for the database account. |
| Encryption | Whether to encrypt the connection to the source database. See Configure SSL encryption. |
Destination database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the destination MaxCompute project is already registered with DTS, select it from the drop-down list. Otherwise, configure the fields below. |
| Database Type | Select MaxCompute. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | The region where the destination MaxCompute project resides. |
| Project | The name of the destination MaxCompute project. |
| AccessKey ID of Alibaba Cloud Account | The AccessKey ID from the AccessKey pair you prepared in the prerequisites. |
| AccessKey Secret of Alibaba Cloud Account | The AccessKey Secret from the AccessKey pair. |
Click Test Connectivity and Proceed.
Make sure the CIDR blocks of DTS servers are added to the security settings of the source and destination databases. DTS can add them automatically, or you can add them manually. See Add DTS server IP addresses to a whitelist.
Click OK to grant DTS permissions on the MaxCompute project. Then click Test Connectivity and Proceed.
Step 3: Configure migration objects
On the Configure Objects page, set the following parameters.
Migration types
| Parameter | Description |
|---|---|
| Migration Types | Select the migration types based on your use case: <br>- Schema Migration + Full Data Migration: migrates historical data only. The task completes and stops automatically.<br>- Schema Migration + Full Data Migration + Incremental Data Migration: migrates historical data and replicates ongoing changes in real time. The task runs continuously until you stop it. |
If you skip Schema Migration, create the destination database and tables manually before starting the task, and enable object name mapping in Selected Objects.
If you skip Incremental Data Migration, do not write to the source database during migration to avoid data inconsistency.
Table configuration
| Parameter | Description |
|---|---|
| Naming Rules of Additional Columns | After migration, DTS adds extra columns to the destination table. If an added column name conflicts with an existing column, the task fails. Select New Rule or Previous Rule based on your setup. Check for name conflicts before selecting. See Naming rules for additional columns. |
| Partition Definition of Incremental Data Table | Select partition names for the incremental data table. See Partition. |
| Processing Mode of Conflicting Tables | How DTS handles tables in the destination with the same name as tables in the source:<br>- Precheck and Report Errors: the task fails the precheck if name conflicts are found. To resolve conflicts without renaming destination tables, use object name mapping. See Map object names.<br>- Ignore Errors and Proceed: skips the conflict check. If source and destination schemas match and primary keys conflict, full migration skips the conflicting record (the destination record is kept), while incremental migration overwrites it. If schemas differ, only specific columns are migrated or the task fails. Use with caution. |
| Capitalization of Object Names in Destination Instance | The capitalization policy for database, table, and column names in the destination. Defaults to DTS default policy. See Specify the capitalization of object names in the destination instance. |
Object selection
| Parameter | Description |
|---|---|
| Source Objects | Select the objects to migrate. Click the |
| Selected Objects | To rename a single object in the destination, right-click it in Selected Objects. See Map the name of a single object. To rename multiple objects at once, click Batch Edit. See Map multiple object names at a time. To filter rows in a table, right-click the table and configure filter conditions. See Specify filter conditions. Note Renaming an object with object name mapping may cause dependent objects to fail migration. |
Click Next: Advanced Settings.
Step 4: Configure advanced settings
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries when the source or destination database is unreachable. Valid values: 10–1,440 minutes. Default: 720 minutes. We recommend that you set the parameter to a value greater than 30. If DTS reconnects within the retry window, it resumes the task. If not, the task fails. Note When multiple tasks share a source or destination database, the most recently set retry time applies. DTS charges for the instance during retries. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Valid values: 1–1,440 minutes. Default: 10 minutes. We recommend that you set the parameter to a value greater than 10. Must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Migration | Limits the read and write load on the source and destination during full data migration. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration (rows per second), and Data migration speed for full migration (MB/s). Only available when Full Data Migration is selected. |
| Enable Throttling for Incremental Data Migration | Limits the load during incremental data migration. Configure RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s). Only available when Incremental Data Migration is selected. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes SQL operations to heartbeat tables in the source database while running. Yesalert notification settings: does not write to heartbeat tables (a latency indicator may appear on the instance). No: writes to heartbeat tables (may affect physical backup or cloning of the source). |
| Environment Tag | An optional tag to identify the DTS instance by environment (for example, production or test). |
| Configure ETL | Whether to enable the extract, transform, and load (ETL) feature. Yes: configure data processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. No: skip ETL configuration. See What is ETL? |
| Monitoring and Alerting | Whether to configure alerts for the migration task. Yes: set alert thresholds and notification contacts. Alerts fire if the task fails or migration latency exceeds the threshold. See Configure monitoring and alerting when you create a DTS task. No: skip alerting. |
Step 5: Run the precheck
To preview the API parameters before saving, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Click Next: Save Task Settings and Precheck.
DTS runs a precheck before the migration task can start. The task only starts after the precheck passes.
If a check item fails, click View Details next to it, fix the issue, and click Precheck Again.
If an alert fires on a check item and the item can be safely ignored, click Confirm Alert Details, then click Ignore in the dialog box and click OK. Then click Precheck Again. Ignoring an alert may cause data inconsistency.
Step 6: Purchase an instance and start the task
Wait for Success Rate to reach 100%, then click Next: Purchase Instance.
On the Purchase Instance page, configure the following parameters.
| Parameter | Description |
|---|---|
| Resource Group | The resource group for the migration instance. Defaults to the default resource group. See What is Resource Management? |
| Instance Class | The migration speed class. Select based on your throughput requirements. See Instance classes of data migration instances. |
Read and accept Data Transmission Service (Pay-as-you-go) Service Terms by selecting the check box.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears on the Data Migration page.
Tasks without incremental migration stop automatically when full migration finishes. The status shows Completed.
Tasks with incremental migration run continuously. The status shows Running. Stop the task manually when you are ready to cut over.
Structure of incremental data tables
Run set odps.sql.allow.fullscan=true; in MaxCompute to allow full table scans on the project before querying incremental data tables.DTS writes incremental changes from the source to a _log table in MaxCompute (for example, customer_log). Each row in the incremental data table represents one change event and includes both the data columns from the source table and the following metadata columns.
| Field | Description |
|---|---|
record_id | The unique identifier of the incremental log entry. IDs auto-increment with each new entry. For UPDATE operations, DTS generates two entries (pre-update and post-update values) with the same record_id. |
operation_flag | The type of change: I (INSERT), D (DELETE), or U (UPDATE). |
utc_timestamp | The timestamp of the operation in UTC, taken from the binary log. |
before_flag | Whether the row contains pre-update values: Y or N. |
after_flag | Whether the row contains post-update values: Y or N. |