Data Transmission Service (DTS) continuously replicates data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance. The sync pipeline covers schema synchronization, an initial full data load, and ongoing incremental change-data capture (CDC), so your SelectDB analytics workload always queries up-to-date production data.
What this route supports
| Capability | Details |
|---|---|
| Sync types | Schema synchronization, full data synchronization, incremental data synchronization |
| Supported DML | INSERT, UPDATE, DELETE |
| Supported DDL | ADD COLUMN, MODIFY COLUMN, CHANGE COLUMN, DROP COLUMN, DROP TABLE, TRUNCATE TABLE, RENAME TABLE |
| Destination engines | Unique engine, Duplicate engine |
| Max tables (with name mapping) | 1,000 per task |
| Incremental latency (normal) | Less than 10 seconds |
| Billing | Schema sync and full sync: free. Incremental sync: charged |
Prerequisites
Before you begin, make sure you have:
- A destination ApsaraDB for SelectDB instance whose storage space is larger than the storage used by the source RDS for MySQL instance. See Create an instance.
- The required permissions on both databases. See Required permissions.
- Binary logging enabled on the source, with `binlog_row_image` set to `full`. RDS for MySQL enables binary logging by default. Run `SHOW VARIABLES LIKE 'binlog_row_image';` to confirm the parameter value. The result must show `full`. If it does not, update the parameter in the RDS console. See Set instance parameters.
- Binary logs retained for the required minimum period. A shorter retention period can cause the DTS task to fail. In extreme cases, data inconsistency or data loss may occur. Issues caused by insufficient binary log retention are not covered by the DTS Service-Level Agreement (SLA):
  - RDS for MySQL: at least 3 days (7 days recommended). See Automatically delete local logs.
  - Self-managed MySQL: at least 7 days.
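The binary log checks above can be combined into one short SQL session. This is a sketch: `binlog_expire_logs_seconds` exists on MySQL 8.0, while MySQL 5.7 and earlier expose `expire_logs_days` instead.

```sql
-- Row image and format must be FULL and ROW for DTS to capture changes
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_format';

-- Check the retention window (MySQL 8.0; value is in seconds,
-- so 604800 = 7 days). On 5.7 query expire_logs_days instead.
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```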
Choose a destination engine
Before you configure the task, decide which SelectDB engine to use for each destination table. The engine determines how DTS handles duplicate data and UPDATE/DELETE operations.
| Dimension | Unique engine | Duplicate engine |
|---|---|---|
| Deduplication | Automatic (based on unique keys) | Manual (use _is_deleted, _version, _record_id) |
| UPDATE/DELETE handling | Native UPDATE/DELETE in destination | Converted to INSERT; use additional columns to filter |
| Duplicate data risk | Low — if source and destination unique keys match | Higher — can occur after task retry, restart, or concurrent DML |
| When to use | Source tables have a primary key or unique key | Source tables have no primary key or unique key |
| Required setup | All unique keys in the destination must also exist in the source and be included in sync objects | Select Schema Synchronization and set Engine to duplicate for the table |
Additional columns for the Duplicate engine
DTS automatically adds the following columns to destination tables that use the Duplicate engine. Use these columns to filter duplicate rows in queries.
| Column | Data type | Default | Value |
|---|---|---|---|
| _is_deleted | INT | 0 | 0 for INSERT and UPDATE; 1 for DELETE |
| _version | BIGINT | 0 | 0 during full sync; timestamp (seconds) from source binary log during incremental sync |
| _record_id | BIGINT | 0 | 0 during full sync; unique, incrementing record ID from the incremental log during incremental sync |
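For example, a query that resolves the latest visible version of each row in a Duplicate-engine table could use the three columns together. The table name `orders` and business key `order_id` are placeholders for illustration; SelectDB supports standard window functions.

```sql
-- Keep only the newest change record per business key, then drop deleted rows.
-- 'orders' and 'order_id' are hypothetical names.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY _version DESC, _record_id DESC
           ) AS rn
    FROM orders
) latest
WHERE rn = 1
  AND _is_deleted = 0;
```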
Required permissions for the database accounts
| Database | Required permissions | How to grant |
|---|---|---|
| Source RDS for MySQL | Read and write permissions on the synchronization objects | Create an account and Modify account permissions |
| Destination ApsaraDB for SelectDB | Usage_priv (cluster access), Select_priv, Load_priv, Alter_priv, Create_priv, Drop_priv | Permission management and Basic permission management |
If the source account was not created in the RDS for MySQL console, the account must also have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.
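If you manage the source account yourself rather than creating it in the RDS console, the grants might look like the following sketch. The account name `dts_sync` is a placeholder, and the REPLICATION privileges can only be granted globally on `*.*`.

```sql
-- Placeholder account for illustration
CREATE USER 'dts_sync'@'%' IDENTIFIED BY 'your_password';

-- Read access to the objects to be synchronized
GRANT SELECT, SHOW VIEW ON *.* TO 'dts_sync'@'%';

-- Binlog access for incremental capture; these privileges are global only
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'dts_sync'@'%';
```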
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview |
Limitations
Source database
All tables to be synchronized must have a primary key or a UNIQUE constraint. For tables with neither, select Schema Synchronization for Synchronization Types and set Engine to duplicate for those tables during configuration. Otherwise, the task may fail or data may be lost.
DTS adds extra columns to destination tables that use the Duplicate engine. See Additional columns for the Duplicate engine.
During initial schema synchronization and initial full data synchronization, do not run DDL operations that change database or table schemas. DTS queries the source during full sync, which creates metadata locks that may block DDL operations on the source.
Data changes that are not recorded in binary logs — such as data restored from a physical backup or data generated by cascade operations — are not synchronized.
If the source database is MySQL 8.0.23 or later and the data to be synchronized contains invisible columns, those columns cannot be read and data loss may occur. To make invisible columns visible, run:
`ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;` See Invisible Columns for details.
If you need to map table or column names during table-level sync, the task supports a maximum of 1,000 tables. Exceeding this limit causes a post-submission error. Split the tables across multiple tasks, or configure the task to synchronize the entire database instead.
If the always-encrypted (EncDB) feature is enabled for the source RDS for MySQL instance, initial full data synchronization is not supported. For RDS for MySQL instances with Transparent Data Encryption (TDE) enabled, schema sync, full sync, and incremental sync are all supported.
RDS for MySQL instances that do not record transaction logs, such as read-only instances of RDS for MySQL 5.6, are not supported as a source.
DTS periodically runs the ``CREATE DATABASE IF NOT EXISTS `test` `` command on the source RDS for MySQL instance to advance the binary log offset.

For self-managed MySQL sources:
- Enable binary logging, set `binlog_format` to `row`, and set `binlog_row_image` to `full`.
- For a primary/primary cluster (both nodes acting as primary and secondary to each other), enable `log_slave_updates` to make sure DTS can read all binary logs.
- Retain binary logs for at least 7 days.
- If a primary/secondary switchover occurs during synchronization, the task fails.
- If the source is an Amazon Aurora MySQL instance or another cluster-mode MySQL instance, the domain name or IP address configured for the task must always resolve to the read/write (RW) node address.

DTS latency is calculated by comparing the timestamp of the last synchronized data record in the destination with the current timestamp. If no DML operations are performed on the source database for a long time, the displayed latency may be inaccurate. To update the latency reading, perform a DML operation on the source. If you synchronize the entire database, you can also create a heartbeat table; DTS updates the heartbeat table every second.
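For a self-managed source, the binary log requirements above translate into a my.cnf fragment along these lines. Variable names match MySQL 5.7; MySQL 8.0 replaces `expire_logs_days` with `binlog_expire_logs_seconds`.

```ini
[mysqld]
log_bin           = mysql-bin    # enable binary logging
binlog_format     = ROW
binlog_row_image  = FULL
expire_logs_days  = 7            # retain binlogs for at least 7 days
log_slave_updates = ON           # required for primary/primary clusters
```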
Destination
Only Unique or Duplicate engine tables are supported in the SelectDB destination. Synchronization of INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, and FK objects is not supported.
Do not create a new cluster in the destination SelectDB instance during synchronization. If the task fails, restart the synchronization instance to resume.
Do not add backend (BE) nodes to the SelectDB instance during synchronization. If the task fails, restart the synchronization instance to resume.
Database and table names must start with a letter. Use the mapping feature to rename objects that do not comply.
Object names (databases, tables, or columns) that contain Chinese characters must be mapped to compliant names. Otherwise, the task may fail.
DDL operations that modify multiple columns at once, and consecutive DDL operations on the same table, are not supported.
In a multi-table merge scenario (multiple source tables synchronizing to a single destination table), all source tables must have the same schema to avoid data inconsistency or task failure.
Online DDL changes made with pt-online-schema-change on the source are not supported. Data loss or task failure may occur.
Online DDL changes made with DMS or the gh-ost tool are synchronized as the original DDL. No temporary table data is replicated, but the destination table may be locked temporarily.
Do not write data to the destination from other sources during synchronization. This causes data inconsistency between source and destination.
Before you synchronize data, evaluate the performance of the source and destination databases. We recommend that you perform data synchronization during off-peak hours to reduce the impact on source and destination database load.
During incremental synchronization, DTS uses a batch synchronization policy. By default, DTS writes to a single sync object at most once every 5 seconds, which may result in a normal synchronization latency of less than 10 seconds. To reduce latency, adjust the `selectdb.reservoir.timeout.milliseconds` parameter in the DTS console. The valid range is 1,000–10,000 milliseconds. A shorter batching interval increases write frequency, which may increase load and write response time on the destination and, in turn, increase DTS synchronization latency. Adjust based on actual destination load.
MySQL `VARCHAR(M)` uses character length. SelectDB `VARCHAR(N)` uses byte length. If you are not using schema synchronization, set the VARCHAR field length in SelectDB to four times the MySQL length. The maximum VARCHAR length in SelectDB is 65,533 bytes; fields that exceed this limit are converted to STRING.
Only the `bucket_count` parameter can be set in the Selected Objects box. The value must be a positive integer. The default is `auto`.
Initial full data synchronization uses concurrent INSERT operations, which causes fragmentation in destination tables. After full sync completes, the tablespace of the destination instance will be larger than that of the source.
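To illustrate the 4x length rule when you create destination tables manually (schema synchronization disabled), a MySQL `VARCHAR(255)` column maps to `VARCHAR(1020)` in SelectDB. Table, column, and bucket values below are hypothetical:

```sql
-- MySQL source table: length is measured in characters
CREATE TABLE customer (
    id   BIGINT PRIMARY KEY,
    name VARCHAR(255)
);

-- SelectDB destination table: length is measured in bytes, 4 x 255 = 1020
CREATE TABLE customer (
    id   BIGINT,
    name VARCHAR(1020)
)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 10;
```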
Create a synchronization task
Step 1: Go to the Data Synchronization page
Use one of the following consoles:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the data synchronization instance resides.
DMS console
The actual operations may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region.
Step 2: Configure source and destination connections
Click Create Task and configure the following parameters.
| Section | Parameter | Value |
|---|---|---|
| — | Task Name | A descriptive name to identify the task. Names do not need to be unique |
| Source Database | Select Existing Connection | Select a registered instance, or configure manually if not registered |
| | Database Type | MySQL |
| | Access Method | Alibaba Cloud Instance |
| | Instance Region | The region of the source RDS for MySQL instance |
| | Replicate Data Across Alibaba Cloud Accounts | No (for same-account sync) |
| | RDS Instance ID | The ID of the source RDS for MySQL instance |
| | Database Account | The account with the required source permissions |
| | Database Password | The account password |
| | Encryption | Non-encrypted or SSL-encrypted. To use SSL encryption, enable SSL on the RDS for MySQL instance first. See Use a cloud certificate to enable SSL encryption |
| Destination Database | Select Existing Connection | Select a registered instance, or configure manually if not registered |
| | Database Type | SelectDB |
| | Access Method | Alibaba Cloud Instance |
| | Instance Region | The region of the destination SelectDB instance |
| | Replicate Data Across Alibaba Cloud Accounts | No (for same-account sync) |
| | Instance ID | The ID of the destination SelectDB instance |
| | Database Account | The account with the required destination permissions |
| | Database Password | The account password |
Click Test Connectivity and Proceed.
Make sure the CIDR blocks of DTS servers are added to the security settings of both the source and destination databases. See Add DTS server IP addresses to a whitelist.
Step 3: Configure sync objects
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full sync provides the baseline data for incremental sync. If you skip schema synchronization, create destination tables with the Unique or Duplicate model manually beforehand. See Data type mapping and Data model |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): fails the precheck if a same-named table exists in the destination. Use this to prevent unintended overwrites. Ignore Errors and Proceed: skips the check. If schemas match, source records overwrite destination records with the same primary or unique key. If schemas differ, initialization may fail or only partial data may be synchronized |
| Capitalization of Object Names in Destination Instance | Controls how DTS capitalizes database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance |
| Source Objects | Select objects at the database or table level and click the arrow icon to add them to Selected Objects |
| Selected Objects | Right-click an object to: rename it in the destination (see Map schemas, tables, and columns); set the bucket_count parameter if schema synchronization is enabled; select specific DML/DDL operations; or set a WHERE clause filter (see Set filter conditions) |
Using the object name mapping feature may cause other objects that depend on the mapped object to fail to synchronize.
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | Leave blank to use the shared cluster. Purchase a dedicated cluster for higher stability. See What is a DTS dedicated cluster |
| Retry Time for Failed Connections | Time range (in minutes) DTS retries after a connection failure. Valid range: 10–1,440. Default: 720. Set to more than 30 minutes. If reconnection succeeds within this window, DTS resumes the task |
| Retry Time for Other Issues | Time range (in minutes) DTS retries after DDL or DML failures. Valid range: 1–1,440. Default: 10. Set to more than 10 minutes. Must be less than Retry Time for Failed Connections |
| Enable Throttling for Full Data Synchronization | Limits QPS to source and data transfer rate during full sync to reduce source and destination load. Available only when Full Data Synchronization is selected |
| Enable Throttling for Incremental Data Synchronization | Limits RPS and transfer rate during incremental sync to reduce destination load |
| Environment Tag | Optional. Tags the instance with an environment label |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat table operations to the source. A latency indicator may appear on the instance. No: DTS writes heartbeat operations to the source. This may affect physical backup and cloning of the source |
| Configure ETL | Yes: enables the extract, transform, and load (ETL) feature and opens a code editor for data processing statements. See Configure ETL in a data migration or data synchronization task. No: disables ETL |
| Monitoring and Alerting | Yes: sends alerts when the task fails or synchronization latency exceeds the threshold. Configure alert thresholds and contacts. No: no alerts |
Optional: Click Next: Configure Database and Table Fields to set the Primary Key Column, Distribution Key, and Engine for destination tables. This step is available only when Schema Synchronization is selected.
Set Definition Status to All to view and modify all tables.
For Primary Key Column, you can select multiple columns to form a composite primary key. Select one or more primary key columns as the Distribution Key.
For tables with neither a primary key nor a unique key, set Engine to duplicate.
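For a table synchronized with the Duplicate engine, the resulting destination table might look like the following sketch (placeholder names; the Duplicate model keeps every ingested row rather than deduplicating):

```sql
-- Hypothetical log table with no primary key on the source side
CREATE TABLE access_log (
    log_time DATETIME,
    user_id  BIGINT,
    action   VARCHAR(256)
)
DUPLICATE KEY (log_time, user_id)
DISTRIBUTED BY HASH (user_id) BUCKETS 10;
```

DTS additionally maintains the `_is_deleted`, `_version`, and `_record_id` columns on Duplicate-engine tables so that queries can filter out superseded or deleted rows.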
Step 4: Run the precheck and purchase the instance
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Wait for the precheck to complete. If the precheck fails:
Click View Details next to each failed item, address the issue, and click Precheck Again.
For alert items that can be ignored, click Confirm Alert Details, then Ignore, then OK, then Precheck Again. Ignoring alerts may result in data inconsistency.
After the precheck reaches a Success Rate of 100%, click Next: Purchase Instance.
On the buy page, configure the following parameters:
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront; more cost-effective for long-term use. Subscription durations: 1–9 months, or 1, 2, 3, or 5 years. Pay-as-you-go: billed per hour; suitable for short-term use |
| Resource Group Settings | The resource group for the instance. Default: default resource group |
| Instance Class | The sync speed varies by instance class. See Instance classes of data synchronization instances |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start. In the dialog box, click OK.
The task appears in the task list. DTS first runs schema synchronization and full data synchronization, then switches to incremental data synchronization.
If a DTS instance fails, DTS support will attempt to recover it within 8 hours. Recovery may involve restarting the instance or adjusting DTS instance parameters (not database parameters). For the parameters that may be modified, see Modify instance parameters.
Data type mapping
MySQL and SelectDB use different type systems. DTS converts types automatically during schema synchronization.
| Category | MySQL data type | SelectDB data type | Notes |
|---|---|---|---|
| Integer | TINYINT | TINYINT | |
| | TINYINT UNSIGNED | SMALLINT | |
| | SMALLINT | SMALLINT | |
| | SMALLINT UNSIGNED | INT | |
| | MEDIUMINT | INT | |
| | MEDIUMINT UNSIGNED | BIGINT | |
| | INT | INT | |
| | INT UNSIGNED | BIGINT | |
| | BIGINT | BIGINT | |
| | BIGINT UNSIGNED | LARGEINT | |
| | BIT(M) | INT | |
| Decimal | DECIMAL | DECIMAL | zerofill is not supported |
| | NUMERIC | DECIMAL | |
| | FLOAT | FLOAT | |
| | DOUBLE | DOUBLE | |
| | BOOL / BOOLEAN | BOOLEAN | |
| Date and time | DATE | DATEV2 | |
| | DATETIME[(fsp)] | DATETIMEV2 | |
| | TIMESTAMP[(fsp)] | DATETIMEV2 | |
| | TIME[(fsp)] | VARCHAR | |
| | YEAR[(4)] | INT | |
| String | CHAR / VARCHAR | VARCHAR | Converted to VARCHAR(4×n). If length exceeds 65,533 bytes, converted to STRING. Default length when unspecified: VARCHAR(65533) |
| | BINARY / VARBINARY | STRING | |
| | TINYTEXT / TEXT / MEDIUMTEXT / LONGTEXT | STRING | |
| | TINYBLOB / BLOB / MEDIUMBLOB / LONGBLOB | STRING | |
| | ENUM | STRING | |
| | SET | STRING | |
| | JSON | STRING | |
CHAR and VARCHAR(n) are converted to VARCHAR(4×n) to prevent data loss, because MySQL VARCHAR(M) measures character length while SelectDB VARCHAR(N) measures byte length.