
Data Transmission Service: Synchronize data from an RDS for MySQL instance to a MaxCompute project

Last Updated: Mar 30, 2026

Use Data Transmission Service (DTS) to continuously synchronize data from an ApsaraDB RDS for MySQL instance to a MaxCompute project. DTS handles the full pipeline—schema replication, historical data load, and ongoing change data capture—so you can build a real-time analytics system without managing the pipeline yourself.

What you'll do

  1. Set up prerequisites (MaxCompute project, IP whitelist, AccessKey).

  2. Configure source and destination connections in DTS.

  3. Select objects and set synchronization options.

  4. Run a precheck, then purchase and start the instance.

Billing

  • Schema synchronization and full data synchronization: Free

  • Incremental data synchronization: Charged. See Billing overview.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project in the destination region.

  • The CIDR blocks of DTS servers added to the IP whitelist of the MaxCompute project.

  • The AccessKey pair (AccessKey ID and AccessKey secret) of the Alibaba Cloud account that owns the MaxCompute project.

Instead of using the root account AccessKey, you can create a RAM user and set it as the super administrator for the MaxCompute project.

How it works

DTS synchronizes data in three sequential phases:

  1. Schema synchronization — DTS replicates table schemas from the source database to MaxCompute, appending _base to each table name. For example, the source table customer becomes customer_base in MaxCompute.

  2. Full data synchronization — DTS loads all historical data from the source into the _base tables (called *full baseline tables*). This snapshot is the starting point for incremental sync.

  3. Incremental data synchronization — DTS creates a corresponding _log table for each synchronized table (for example, customer_log) and continuously writes change events from the source binary log into it.

For the schema of the _log table, see Incremental data table schema.

Supported SQL operations for incremental sync

  • DML: INSERT, UPDATE, DELETE

  • DDL: ADD COLUMN (ADD COLUMN with attribute columns is not supported)

DTS does not synchronize foreign keys. Cascade update and cascade delete operations on the source are not reflected in the destination.

Required permissions

The database account for the source ApsaraDB RDS for MySQL instance must have read and write permissions. See Create an account and Modify account permissions.

If the account was not created in the ApsaraDB RDS for MySQL console, grant it: REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT.
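If you create the account directly in MySQL rather than in the console, the grants might look like the following sketch. The account name dts_sync, its password, and the database name dbname are placeholders, not values from this guide:

```sql
-- Hypothetical DTS account; replace the name, host, and password.
CREATE USER 'dts_sync'@'%' IDENTIFIED BY 'your_password';

-- Binary log access must be granted globally (ON *.*).
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'dts_sync'@'%';

-- Read access to the objects you plan to synchronize.
GRANT SHOW VIEW, SELECT ON dbname.* TO 'dts_sync'@'%';
```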

Limitations

Source database requirements

  • Tables must have PRIMARY KEY or UNIQUE constraints with all fields unique. Without this, the destination may contain duplicate records.

  • When synchronizing tables with column or table renaming, a single task supports up to 1,000 tables. If you run a task to synchronize more than 1,000 tables, a request error occurs. Use multiple tasks or synchronize at the database level instead.

  • Do not run DDL statements during schema synchronization or full data synchronization. Doing so causes the task to fail.

  • Data generated by cascade operations or physical backup restoration is not synchronized.

  • If the source runs MySQL 8.0.23 or later and tables contain invisible columns, those columns are excluded and data loss occurs. Run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE; to make them visible before starting the task. Tables without primary keys automatically receive invisible primary keys—make those visible too. See Invisible Columns and Generated Invisible Primary Keys.
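As a sketch, you can list invisible columns from information_schema and make them visible before starting the task. The schema name mydb, table customer, and column my_row_id below are placeholders:

```sql
-- Find invisible columns on MySQL 8.0.23 or later; the EXTRA column
-- contains 'INVISIBLE' for such columns.
SELECT TABLE_NAME, COLUMN_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND EXTRA LIKE '%INVISIBLE%';

-- Make a column visible so DTS includes it.
ALTER TABLE mydb.customer ALTER COLUMN my_row_id SET VISIBLE;
```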

Binary log requirements

Configure the following MySQL parameters before creating the task:

  • binlog_row_image = full (ApsaraDB RDS for MySQL and self-managed MySQL)

  • binlog_format = row (self-managed MySQL only)

  • log_slave_updates = ON (self-managed MySQL in a dual-primary cluster)
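For self-managed MySQL, these parameters are typically set in the server configuration file and require a restart (log_slave_updates in particular is not dynamic). A sketch of the relevant my.cnf section, where server_id is a placeholder that must be unique in your topology:

```ini
# my.cnf sketch for a self-managed MySQL source.
[mysqld]
log_bin           = mysql-bin
binlog_format     = row
binlog_row_image  = full
log_slave_updates = ON
server_id         = 1
```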

Binary log retention requirements:

  • ApsaraDB RDS for MySQL: retain binary logs for at least 3 days; 7 days recommended.

  • Self-managed MySQL: retain binary logs for at least 7 days.

If binary logs are purged before DTS reads them, the task fails and data inconsistency or loss may occur. See Delete binary log files for how to set the retention period.
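On a self-managed MySQL server, the retention period can be set along these lines (ApsaraDB RDS for MySQL instances are configured in the console instead):

```sql
-- Keep binary logs for 7 days (604,800 seconds) on MySQL 8.0+.
SET PERSIST binlog_expire_logs_seconds = 604800;

-- On MySQL 5.7 and earlier, use expire_logs_days instead:
-- SET GLOBAL expire_logs_days = 7;
```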

Important

For self-managed MySQL, you must also enable binary logging explicitly. For self-managed MySQL in a dual-primary cluster, setting log_slave_updates to ON ensures DTS can read all binary logs. See Create an account for a self-managed MySQL database and configure binary logging.

Other limitations

  • Synchronize during off-peak hours. Full data synchronization uses read and write resources on both source and destination, which increases server load.

  • After full data synchronization completes, the destination tablespace is larger than the source due to fragmentation from concurrent INSERT operations.

  • Do not run pt-online-schema-change or similar online DDL tools on synchronized objects while the task is running. The task will fail.

  • Writing data from other sources to the destination during synchronization may cause data inconsistency.

  • If the source has EncDB enabled, full data synchronization is not supported. Transparent Data Encryption (TDE) does not have this restriction—TDE instances support schema, full, and incremental synchronization.

  • If a DTS task fails, DTS technical support attempts to restore it within 8 hours. During this period, the task may be restarted and task parameters (not database parameters) may be modified.

Special cases for self-managed MySQL

  • A primary/secondary switchover while the task is running causes the task to fail.

  • Synchronization latency is calculated from the timestamp of the latest synced data in the destination versus the current source timestamp. If no DML operations run on the source for an extended period, the displayed latency may be inaccurate. To refresh it, run any DML operation on the source. If you're synchronizing an entire database, create a heartbeat table that receives updates every second.

  • DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` on the source to advance the binary log position.
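If you need such a heartbeat table, a minimal sketch using the MySQL event scheduler follows. The table and event names are made up for illustration, and the event scheduler must be enabled (event_scheduler = ON):

```sql
-- Single-row table that receives an update every second.
CREATE TABLE dts_heartbeat (
  id INT PRIMARY KEY,
  ts TIMESTAMP NOT NULL
);

-- Touch the row every second so the source always has recent DML.
CREATE EVENT dts_heartbeat_tick
  ON SCHEDULE EVERY 1 SECOND
  DO REPLACE INTO dts_heartbeat VALUES (1, NOW());
```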

Special cases for ApsaraDB RDS for MySQL

  • Read-only ApsaraDB RDS for MySQL 5.6 instances that do not record transaction logs cannot be used as the source.

  • DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` on the source to advance the binary log position.

Create a data synchronization task

Step 1: Open the Data Synchronization page

Use either console to open the task configuration page.

DTS console

  1. Log on to the DTS console.

  2. In the left-side navigation pane, click Data Synchronization.

  3. In the upper-left corner, select the region where the synchronization instance will reside.

DMS console

The exact navigation path depends on your DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.

  1. Log on to the DMS console.

  2. In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.

  3. From the drop-down list to the right of Data Synchronization Tasks, select the region.

Step 2: Configure source and destination databases

Click Create Task, then configure the following parameters.

Task Name: The task name. DTS auto-generates one. Specify a descriptive name for easy identification; uniqueness is not required.

Source Database parameters:

  • Select Existing Connection: Select a registered database instance to auto-populate parameters, or configure the connection manually.

  • Database Type: Select MySQL.

  • Access Method: Select Alibaba Cloud Instance.

  • Instance Region: The region of the source ApsaraDB RDS for MySQL instance.

  • Replicate Data Across Alibaba Cloud Accounts: Select No (same-account scenario).

  • RDS Instance ID: The ID of the source instance.

  • Database Account: The account with the required permissions. See Required permissions.

  • Database Password: The password for the database account.

  • Encryption: Select Non-encrypted or SSL-encrypted. For SSL, enable SSL encryption on the RDS instance first. See Use a cloud certificate to enable SSL encryption.

Destination Database parameters:

  • Select Existing Connection: Select a registered instance or configure manually.

  • Database Type: Select MaxCompute.

  • Access Method: Select Alibaba Cloud Instance.

  • Instance Region: The region of the MaxCompute project.

  • Project: The name of the destination MaxCompute project.

  • AccessKey ID of Alibaba Cloud Account: The AccessKey ID of the account that owns the MaxCompute project.

  • AccessKey Secret of Alibaba Cloud Account: The AccessKey secret for the account.

Step 3: Test connectivity

Click Test Connectivity and Proceed at the bottom of the page.

DTS needs access to both databases. If you're connecting to a self-managed database, DTS CIDR blocks must be added to its security settings. See Add the CIDR blocks of DTS servers. For self-managed databases not using Alibaba Cloud Instance access, click Test Connectivity in the CIDR Blocks of DTS Servers dialog.

Click OK to grant permissions to your MaxCompute account, then click Test Connectivity and Proceed.

Step 4: Configure objects

In the Configure Objects step, set the following parameters.

  • Synchronization Types: Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full data synchronization loads historical data before incremental sync begins.

  • Naming Rules of Additional Columns: DTS adds metadata columns to destination tables. If these conflict with existing column names, the task fails. Select New Rule or Previous Rule based on your setup. Check for conflicts before proceeding. See Naming rules for additional columns.

  • Partition Definition of Incremental Data Table: Select partition names based on your requirements. See Partition.

  • Processing Mode of Conflicting Tables: Precheck and Report Errors (default) fails the precheck if the destination has tables with the same names as the source. Ignore Errors and Proceed skips this check. Use with caution: if schemas differ, only some columns may sync or the task may fail. If schemas match and a record has the same primary or unique key, full sync retains the existing destination record, while incremental sync overwrites it.

  • Capitalization of Object Names in Destination Instance: Controls the case of database, table, and column names in MaxCompute. Default: DTS default policy. See Specify the capitalization of object names in the destination instance.

  • Source Objects: Select objects and click the arrow icon to move them to Selected Objects. You can select tables.

  • Selected Objects: Right-click an object to rename it or filter which SQL operations to sync. To rename multiple objects, click Batch Edit. To filter rows, right-click a table and specify WHERE conditions. See Map object names and Specify filter conditions.

Renaming an object with object name mapping may cause dependent objects to fail synchronization.

Step 5: Configure advanced settings

Click Next: Advanced Settings and configure the following parameters.

  • Dedicated Cluster for Task Scheduling: By default, DTS schedules tasks to the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster.

  • Retry Time for Failed Connections: How long DTS retries when a connection fails. Valid values: 10 to 1,440 minutes. Default: 720 minutes. Set to at least 30 minutes. If multiple tasks share a source or destination, the shortest retry time applies. During retries, DTS charges for the instance.

  • Retry Time for Other Issues: How long DTS retries when DML or DDL operations fail. Valid values: 1 to 1,440 minutes. Default: 10 minutes. Set to at least 10 minutes. Must be less than Retry Time for Failed Connections.

  • Enable Throttling for Full Data Synchronization: Throttle the QPS (queries per second) to the source, the RPS of full data migration, and the data migration speed (MB/s) to reduce load. Available only if Full Data Synchronization is selected.

  • Enable Throttling for Incremental Data Synchronization: Throttle the RPS and data synchronization speed (MB/s) for incremental sync to reduce destination load.

  • Environment Tag: Optional tag to identify the DTS instance by environment (for example, production or test).

  • Whether to delete SQL operations on heartbeat tables of forward and reverse tasks: Yes means DTS does not write heartbeat table operations to the source; the task may still display a latency reading. No means DTS writes heartbeat operations, which may affect physical backup and cloning of the source.

  • Configure ETL: Yes enables extract, transform, and load (ETL); enter processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. No disables ETL.

  • Monitoring and Alerting: Yes configures alerts for task failures or latency exceeding a threshold; set the threshold and notification contacts. See Configure monitoring and alerting. No disables alerting.

Step 6: Run the precheck

Click Next: Save Task Settings and Precheck.

To preview the API parameters for this task configuration, hover over the button and click Preview OpenAPI parameters before proceeding.

DTS runs a precheck before the task can start.

  • If an item fails, click View Details to see the cause. Fix the issue and click Precheck Again.

  • If an item triggers an alert that can be ignored, click Confirm Alert Details, then click Ignore in the dialog, and click OK. Click Precheck Again to continue. Ignoring alerts may result in data inconsistency.

Step 7: Purchase and start the instance

  1. Wait for Success Rate to reach 100%, then click Next: Purchase Instance.

  2. On the purchase page, configure the following parameters.

  • Billing Method: Subscription pays upfront for a fixed term and is more cost-effective for long-term use. Pay-as-you-go is billed hourly and suits short-term use; release the instance when done to stop charges.

  • Resource Group Settings: The resource group for the instance. Default: default resource group. See What is Resource Management?

  • Instance Class: Determines synchronization speed. See Instance classes of data synchronization instances.

  • Subscription Duration: Available for the subscription billing method. Options: 1 to 9 months, 1 year, 2 years, 3 years, or 5 years.
  3. Read and accept Data Transmission Service (Pay-as-you-go) Service Terms.

  4. Click Buy and Start. In the confirmation dialog, click OK.

The task appears in the task list. You can monitor its progress there.

Incremental data table schema

Before querying the incremental data table, run the following command in MaxCompute to allow full table scans:

set odps.sql.allow.fullscan=true;

Each _log table stores change events from the source. In addition to the original column data, DTS adds the following metadata fields to every row:

  • record_id: Unique identifier for the log entry. Auto-increments for each new entry. For UPDATE operations, DTS generates two entries (before and after the change) with the same record_id.

  • operation_flag: The operation type: I (INSERT), D (DELETE), or U (UPDATE).

  • utc_timestamp: The operation timestamp in UTC, corresponding to the binary log file timestamp.

  • before_flag: Y if the row contains pre-update values; N otherwise.

  • after_flag: Y if the row contains post-update values; N otherwise.

Reading UPDATE events: DTS writes two rows for each UPDATE—one with before_flag=Y (the original values) and one with after_flag=Y (the new values). Both rows share the same record_id.
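As an illustration of how these flags combine, the following MaxCompute SQL sketch reconstructs the latest image of each row from a _log table. The table name customer_log and the key column id are assumptions for illustration, not names from your project:

```sql
-- Latest post-image per key: keep the highest record_id per id among
-- INSERT/UPDATE after-images and DELETE entries, then drop keys whose
-- final operation was a DELETE.
SELECT t.*
FROM customer_log t
JOIN (
  SELECT id, MAX(record_id) AS latest_id
  FROM customer_log
  WHERE after_flag = 'Y' OR operation_flag = 'D'
  GROUP BY id
) m
  ON t.id = m.id AND t.record_id = m.latest_id
WHERE t.operation_flag <> 'D';
```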

What's next