Use Data Transmission Service (DTS) to synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for ClickHouse cluster. DTS handles full data initialization and continuous incremental sync via Change Data Capture (CDC), so your ClickHouse cluster stays current without manual intervention.
Prerequisites
Before you begin, make sure that you have:
Authorized DTS to access Alibaba Cloud resources. For more information, see Authorize DTS to access Alibaba Cloud resources.
Created an ApsaraDB for ClickHouse cluster running V20.8 or later with available storage that exceeds the total data size of the source RDS MySQL instance. For more information, see Create an ApsaraDB for ClickHouse cluster.
Created a destination database in the ApsaraDB for ClickHouse cluster. For more information, see Create a database.
Give the destination database the same name as the source database in RDS MySQL. If the names differ, use the name mapping feature in the Selected Objects section during the Configure Objects and Advanced Settings step to map the source database name to the destination database name. For more information, see Map object names.
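For example, assuming the source database is named `demo_db` (a hypothetical name), a matching destination database can be created in the ApsaraDB for ClickHouse cluster with a statement like the following. Depending on the cluster edition, an `ON CLUSTER` clause may also be required.

```sql
-- demo_db is a placeholder; use the same name as the source RDS MySQL database
CREATE DATABASE IF NOT EXISTS demo_db;
```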
Billing
| Synchronization type | Pricing |
|---|---|
| Schema synchronization and full data synchronization | Free of charge |
| Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | CREATE TABLE, DROP TABLE, TRUNCATE TABLE, ADD COLUMN, MODIFY COLUMN, DROP COLUMN |
DDL synchronization behavior
Understanding how each DDL statement behaves during synchronization helps you avoid unexpected data loss or task failures.
| DDL operation | Behavior |
|---|---|
| CREATE TABLE | Propagated automatically. The destination table is created with DTS-added fields (_sign, _version, _is_deleted). |
| DROP TABLE | Propagated automatically. The destination table is dropped. |
| TRUNCATE TABLE | Propagated automatically. All rows in the destination table are deleted. |
| ADD COLUMN | Propagated automatically. New rows replicated after the change include the added column. |
| MODIFY COLUMN | Propagated automatically. |
| DROP COLUMN | Propagated automatically. |
| RENAME TABLE | Not supported. The task may fail if the source runs RENAME TABLE. |
| INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, foreign key | Not synchronized. DDL statements outside the supported list above are not replicated; apply them manually in the destination if needed. |
Data type mappings
MySQL and ApsaraDB for ClickHouse support different data types, so a one-to-one mapping is not always possible. DTS maps MySQL types to the closest equivalent types supported by ApsaraDB for ClickHouse during initial schema synchronization. For more information, see Data type mappings for initial schema synchronization.
Database account permissions
| Database | Required permissions | How to configure |
|---|---|---|
| Source RDS MySQL | Read permissions on all objects to be synchronized | Create an account and Modify account permissions |
| Destination ApsaraDB for ClickHouse cluster (v22.8 or later) | Read and write permissions on the destination database. A privileged account meets the requirements. | Community-compatible Edition account management |
| Destination ApsaraDB for ClickHouse cluster (v21.8) | Read/Write And Settings and Allow DDL | Community-compatible Edition account management |
If the source database account was not created through the RDS MySQL console, make sure the account has the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions.
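As a sketch, the statements below create a source account with those permissions. The account name `dts_sync` and the host pattern are hypothetical; adjust them and replace the password placeholder for your environment.

```sql
-- Minimum source-side permissions for DTS (hypothetical account name)
CREATE USER 'dts_sync'@'%' IDENTIFIED BY '<password>';
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW ON *.* TO 'dts_sync'@'%';
FLUSH PRIVILEGES;
```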
Limitations
Source database limitations
| Limitation | Details |
|---|---|
| Primary key required | Tables without a primary key cannot be synchronized. |
| Table-level sync limit | When synchronizing at the table level with object edits (such as name mapping), a single task supports a maximum of 1,000 tables. If you exceed this limit, split the tables across multiple tasks or configure the task to synchronize the entire database. |
| Binary log retention | Retain local binary logs for at least 3 days (7 days recommended) for RDS MySQL, and at least 7 days for self-managed MySQL. If DTS cannot retrieve the required binary logs, the task fails. In extreme cases, this may cause data inconsistency or data loss. Issues caused by insufficient retention periods are not covered under the DTS SLA. To configure the retention period for an RDS MySQL instance, see Automatically delete local logs. |
| No DDL during synchronization | Do not run DDL operations that change database or table schemas during schema synchronization or full synchronization—the task will fail. During full synchronization, DTS queries the source database, which creates metadata locks that may block DDL operations. |
| Non-binlog data | Data generated by operations that do not write to binary logs—such as physical backup restores or cascade operations—is not synchronized. If this occurs, remove the affected objects from the sync task and add them back, if your business allows it. For more information, see Modify synchronization objects. |
| Invisible columns (MySQL 8.0.23+) | DTS may not read invisible hidden columns in MySQL 8.0.23 or later, which may cause data loss. Run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE; to make hidden columns visible before synchronizing. For more information, see Invisible columns. |
Binary log configuration
Configure these parameters before starting the sync task:
For ApsaraDB RDS for MySQL:
ApsaraDB RDS for MySQL enables binary logging by default. Verify the following:
Set `binlog_row_image` to `full`. If it is not, the precheck fails and the task cannot start. For instructions, see Configure instance parameters.
Retain local binary logs for at least 3 days (7 days recommended).
For self-managed MySQL:
Enable binary logging.
Set `binlog_format` to `row`.
Set `binlog_row_image` to `full`.
For dual-primary clusters, also enable `log_slave_updates` so DTS can capture all binary log events.
Retain local binary logs for at least 7 days.
For instructions, see Create an account and configure binary logging for a self-managed MySQL database.
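Before you start the task, you can confirm the settings above on the source with standard MySQL statements, for example:

```sql
-- Expected values: log_bin = ON, binlog_format = ROW, binlog_row_image = FULL
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
-- For dual-primary clusters, also expect log_slave_updates = ON
SHOW VARIABLES LIKE 'log_slave_updates';
```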
Other limitations
| Limitation | Details |
|---|---|
| Non-standard DDL syntax | If the DDL statements of the source RDS MySQL instance do not follow standard MySQL syntax, the task may fail or data may be lost. |
| Online DDL in temporary table mode | If the source database uses online DDL in temporary table mode—including multi-table merge scenarios or function-based indexes on unique key columns—data loss or task failure may occur. |
| Online DDL tools | DMS and gh-ost: DTS synchronizes only the original DDL statements, which may cause destination tables to be locked temporarily. pt-online-schema-change is not supported when synchronizing at the table level—if used at the source, data may be lost or the task may fail. You can use Data Management (DMS) to perform online DDL operations instead. For more information, see Change schemas without locking tables. |
| Primary key or unique key conflicts | During full synchronization, DTS retains the destination record and skips the conflicting source record. During incremental synchronization, DTS overwrites the destination record with the source record. If table schemas are inconsistent, data initialization may fail, resulting in partial or complete synchronization failure. |
| Time-type data range | ApsaraDB for ClickHouse enforces range limits on time-type data. If the time data in RDS MySQL falls outside these ranges, the synchronized values will be incorrect. See Time information. |
| Partition key | The partition key cannot be a nullable field. Supported types: BIGINT, INT, TIMESTAMP, DATETIME, and DATE. |
| Database limit | The number of databases to synchronize cannot exceed 256, the limit of ApsaraDB for ClickHouse. |
| Naming conventions | Database, table, and column names must comply with ApsaraDB for ClickHouse naming conventions. For more information, see Object naming conventions. |
| DTS-added fields | DTS adds _sign, _is_deleted, and _version fields to destination tables during schema synchronization. If you do not select Schema Synchronization, manually create the destination tables and add these fields. See Table and field information. |
| Performance impact | Full data synchronization consumes read and write resources from both the source and destination databases. Run the synchronization during off-peak hours to minimize impact. |
| External writes to destination | Do not allow any data source other than DTS to write to the destination database during synchronization. Doing so will cause data inconsistency. |
| Always-Encrypted | Full data synchronization is not supported for RDS MySQL instances with Always-Encrypted enabled. Instances with Transparent Data Encryption (TDE) enabled support all synchronization types. |
| Task failure recovery | If a task fails, DTS support staff will attempt to restore it within 8 hours. They may restart the task or adjust DTS task parameters (not database parameters) during restoration. |
Usage notes
For self-managed MySQL sources:
If a primary/secondary switchover occurs during synchronization, the task fails.
DTS calculates latency by comparing the timestamp of the last synchronized record against the current time. If no DML operations run for an extended period, latency reporting may become inaccurate. Run a DML operation in the source database to reset the reported latency. If you select a full database for synchronization, create a heartbeat table and update it every second.
DTS periodically runs `` CREATE DATABASE IF NOT EXISTS `test` `` in the source database to advance the binary log offset.
If the source is Amazon Aurora MySQL or another clustered MySQL instance, make sure the domain name or IP address used in the task configuration always resolves to a read/write (RW) node.
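A heartbeat table can be as simple as the following sketch. The table name `dts_heartbeat` is hypothetical; schedule the `REPLACE` statement (for example, with a MySQL event or a cron job) to run every second.

```sql
-- Hypothetical heartbeat table updated once per second to keep latency reporting accurate
CREATE TABLE IF NOT EXISTS dts_heartbeat (
  id INT PRIMARY KEY,
  update_time DATETIME NOT NULL
);
REPLACE INTO dts_heartbeat (id, update_time) VALUES (1, NOW());
```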
For ApsaraDB RDS for MySQL sources:
Read-only instances—such as ApsaraDB RDS for MySQL 5.6 read-only instances—that do not record transaction logs cannot serve as source databases.
DTS periodically runs `` CREATE DATABASE IF NOT EXISTS `test` `` in the source database to advance the binary log offset.
Create a synchronization task
The task configuration has seven steps. After the task starts, DTS first runs a full copy of the source data, then continuously applies incremental changes.
Step 1: Open the data synchronization page
Open the Data Synchronization task list in the destination region using one of the following methods:
DTS console
Log on to the DTS console.
In the left navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance will be located.
DMS console
The steps may vary depending on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top menu bar, choose Data + AI > DTS (Data Transmission Service) > Data Synchronization.
To the right of Data Synchronization Tasks, select the region of the synchronization instance.
Step 2: Create a task
Click Create Task to open the task configuration page.
Step 3: Configure source and destination databases
| Category | Parameter | Description |
|---|---|---|
| None | Task Name | DTS generates a name automatically. Specify a descriptive name for easy identification. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance from the drop-down list to auto-fill the connection details. In the DMS console, this parameter is labeled Select a DMS database instance. If no registered instance is available, configure the connection manually. |
| | Database Type | Select MySQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the source RDS MySQL instance resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No (same Alibaba Cloud account). |
| | RDS Instance ID | Select the ID of the source RDS MySQL instance. |
| | Database Account | Enter the database account. For permission requirements, see Database account permissions. |
| | Database Password | Enter the password for the database account. |
| | Encryption | Select Non-encrypted or SSL-encrypted. If you select SSL-encrypted, enable SSL encryption on the RDS MySQL instance first. For more information, see Use a cloud certificate to quickly enable SSL link encryption. |
| Destination Database | Select Existing Connection | Select a registered database instance from the drop-down list. In the DMS console, this parameter is labeled Select a DMS database instance. If no registered instance is available, configure the connection manually. |
| | Database Type | Select ClickHouse. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the destination ApsaraDB for ClickHouse cluster resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No (same Alibaba Cloud account). |
| | Cluster Type | Select the type of the ApsaraDB for ClickHouse cluster. |
| | Cluster ID | Select the ID of the destination ApsaraDB for ClickHouse cluster. |
| | Database Account | Enter the database account. For permission requirements, see Database account permissions. |
| | Database Password | Enter the password for the database account. |
Step 4: Test connectivity
Click Test Connectivity and Proceed at the bottom of the page.
Add the CIDR blocks of DTS servers to the security settings of both the source and destination databases to allow access. This can be done automatically or manually. For more information, see Add the IP address whitelist of DTS servers.
If the source or destination is a self-managed database (that is, Access Method is not Alibaba Cloud Instance), also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Step 5: Configure objects and advanced settings
Configure objects
On the Configure Objects page, specify the synchronization scope.
| Parameter | Description |
|---|---|
| Synchronization Types | DTS always selects Incremental Data Synchronization. By default, Schema Synchronization and Full Data Synchronization are also selected. After the precheck, DTS initializes the destination cluster with a full copy of the source data, which serves as the baseline for incremental synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: Checks for tables with identical names in the destination database. If any are found, an error is reported during precheck and the task does not start. If you cannot delete or rename the conflicting table, use name mapping to assign a different name in the destination. For more information, see Database table column name mapping. Ignore Errors and Proceed: Skips the duplicate table name check. During full synchronization, DTS retains the destination record and skips conflicting source records. During incremental synchronization, DTS overwrites destination records. If table schemas are inconsistent, data initialization may fail. Use with caution. |
| Capitalization of Object Names in Destination Instance | Sets the case-sensitivity policy for database, table, and column names in the destination. The default is DTS default policy. For more information, see Case policy for destination object names. |
| Source Objects | Click the objects to synchronize in the Source Objects box, then click the arrow to move them to the Selected Objects box. Select objects at the database or table level. |
| Selected Objects | To rename a single object in the destination, right-click it in the Selected Objects box. To rename multiple objects in bulk, click Batch Edit in the upper-right corner. To filter data, right-click a table in the Selected Objects box and set filter conditions. For more information, see Set filter conditions. Note that using name mapping may cause dependent objects to fail synchronization. |
Click Next: Advanced Settings.
Advanced settings
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | DTS uses a shared cluster by default. For greater task stability, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster?. |
| Time zone of destination database | Select the time zone for DateTime data written to the ApsaraDB for ClickHouse cluster. |
| Retry Time for Failed Connections | Sets how long DTS retries after a connection failure. Default: 720 minutes. Range: 10–1,440 minutes. Set to 30 minutes or more. If the connection is restored within this period, the task resumes automatically. If multiple DTS instances share a source or destination, DTS uses the shortest configured retry duration across all instances. DTS charges for task runtime during retries—release the DTS instance promptly after you release the source or destination instances. |
| Retry Time for Other Issues | Sets how long DTS retries after non-connection errors (such as DDL or DML execution failures). Default: 10 minutes. Range: 1–1,440 minutes. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limits the full synchronization rate to reduce load on the destination database. Set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Only available when Full Data Synchronization is selected. You can also adjust the rate after the task starts. |
| Enable Throttling for Incremental Data Synchronization | Limits the incremental synchronization rate. Set RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes heartbeat SQL to the source database. Yes: Does not write heartbeat SQL—the DTS instance may show higher latency. No: Writes heartbeat SQL—this may interfere with physical backups and cloning on the source. |
| Environment Tag | Optionally tag the instance to identify its environment. |
| Configure ETL | Yes: Enables extract, transform, and load (ETL). Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. No: Disables ETL. |
| Monitoring and Alerting | Yes: Configures alerts. Set the alert threshold and notification contacts. For more information, see Configure monitoring and alerting during task configuration. No: No alerts configured. |
Configure database and table fields
Click Next: Configure Database and Table Fields to configure the Type, Primary Key Column, Sort Key, Distribution Key, and Partition Key for each table being synchronized to ClickHouse.
DTS provides a default configuration. Set Definition Status to All to review and modify all tables.
The Primary Key Column and Sort Key support composite keys—select multiple fields from the drop-down list.
Only one field can be selected as the Distribution Key.
Select one or more columns from the Primary Key Column as the Partition Key.
The Partition Key is optional, but if specified, it cannot be a nullable field. Supported types: BIGINT, INT, TIMESTAMP, DATETIME, and DATE. For the calculation logic, see Partition key calculation logic.
For more information about primary key columns, sort keys, and partition keys, see CREATE TABLE.
Step 6: Save the task and run the precheck
To preview the API parameters for this task, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters in the tooltip.
Click Next: Save Task Settings and Precheck.
DTS runs a precheck before the task starts. The task only starts if the precheck passes.
If the precheck fails, click View Details next to the failed item, fix the issue, and rerun the precheck.
If the precheck generates warnings:
For non-ignorable warnings, click View Details, fix the issue, and rerun the precheck.
For ignorable warnings, click Confirm Alert Details > Ignore > OK, then click Precheck Again. Ignoring warnings may lead to data inconsistencies. Proceed with caution.
Step 7: Purchase the synchronization instance
When the Success Rate reaches 100%, click Next: Purchase Instance.
On the Purchase page, select the billing method and instance class.
| Category | Parameter | Description |
|---|---|---|
| New Instance Class | Billing method | Subscription: Pay upfront for a fixed duration. Suitable for long-term, continuous tasks. Monthly options: 1–9 months. Yearly options: 1, 2, 3, or 5 years. Pay-as-you-go: Billed hourly for actual usage. Suitable for short-term or test tasks. |
| Resource Group Settings | — | The resource group for the instance. Default: default resource group. For more information, see What is Resource Management?. |
| Instance Class | — | Affects synchronization performance. Select based on your business requirements. For more information, see Data synchronization link specifications. |
Read and select the Data Transmission Service (Pay-as-you-go) Service Terms checkbox.
Click Buy and Start, then click OK.
Monitor the task progress on the data synchronization page.
Appendix
Time information
| Data type | Minimum value | Maximum value |
|---|---|---|
| Date | 1970-01-01 00:00:00 | 2149-06-06 00:00:00 |
| Date32 | 1925-01-01 00:00:00 | 2283-11-11 00:00:00 |
| DateTime | 1970-01-01 08:00:00 | 2106-02-07 14:28:15 |
| DateTime64 | 1925-01-01 08:00:00 | 2283-11-12 07:59:59 |
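The DateTime bounds correspond to a 32-bit Unix timestamp rendered in UTC+8: the minimum is epoch second 0 and the maximum is epoch second 2^32 - 1. You can verify this in ClickHouse:

```sql
-- Epoch 0 and 2^32 - 1; the rendered values depend on the session time zone
SELECT toDateTime(0), toDateTime(4294967295);
```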
Table and field information
Table requirements
If you do not use the object name mapping feature, the tables you create must meet the following requirements.
If the destination table specifies an ENGINE, it must be ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Otherwise, data inconsistency may occur.
Community Edition instance: Create one local table and one distributed table. The distributed table name must match the source table name. The local table name must follow the format `<distributed_table_name>_local`.
Enterprise Edition instance: Create a table with the same name as the source table.
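If you create the destination tables manually, the Community Edition pair might look like the following sketch. The database `demo_db`, table `orders`, column list, and cluster name `default` are all hypothetical; only the engine signature is mandated by the requirements above.

```sql
-- Local table: holds the data and carries the DTS-added fields
CREATE TABLE demo_db.orders_local ON CLUSTER default
(
    id Int64,
    amount Decimal(18, 2),
    _sign Int8 DEFAULT 1,
    _is_deleted UInt8 DEFAULT 0,
    _version UInt64 DEFAULT 1
)
ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted)
ORDER BY id;

-- Distributed table: same name as the source table, routes queries to the local table
CREATE TABLE demo_db.orders ON CLUSTER default
AS demo_db.orders_local
ENGINE = Distributed(default, demo_db, orders_local, sipHash64(id));
```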
Fields added by DTS
DTS adds the following fields to destination tables during schema synchronization. To query data and filter out deleted records, run:
`SELECT * FROM table_name FINAL WHERE _sign > 0;`
The FINAL keyword filters out rows with duplicate sort keys, and `_sign > 0` excludes deleted records.
| Version | Field | Data type | Default value | Description |
|---|---|---|---|---|
| Community Edition earlier than v23.8 | _sign | Int8 | 1 | DML operation type. INSERT and UPDATE = 1. DELETE = -1. |
| Community Edition earlier than v23.8 | _version | UInt64 | 1 | Timestamp when the record was written to ClickHouse. |
| Enterprise Edition and Community Edition v23.8 or later | _sign | Int8 | 1 | DML operation type. INSERT and UPDATE = 1. DELETE = -1. |
| Enterprise Edition and Community Edition v23.8 or later | _is_deleted | UInt8 | 0 | Whether the record is deleted. INSERT and UPDATE = 0. DELETE = 1. |
| Enterprise Edition and Community Edition v23.8 or later | _version | UInt64 | 1 | Timestamp when the record was written to ClickHouse. |
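On Enterprise Edition and Community Edition v23.8 or later, deletions are tracked by `_is_deleted`, so an equivalent query might filter on that field instead:

```sql
-- Sketch for Enterprise Edition or Community Edition v23.8+ destinations
SELECT * FROM table_name FINAL WHERE _is_deleted = 0;
```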
Partition key calculation logic
| Source field type | Calculation logic |
|---|---|
| BIGINT | intDiv(field, 18014398509481984) |
| INT | intDiv(field, 4194304) |
| TIMESTAMP | toYYYYMM(field) |
| DATETIME | toYYYYMM(field) |
| DATE | toYYYYMM(field) |
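The integer divisors are powers of two: 18014398509481984 is 2^54 and 4194304 is 2^22, so integer partition keys are grouped into fixed-width buckets, while time types get one partition per month. A quick check in ClickHouse:

```sql
SELECT intDiv(4194303, 4194304);        -- 0: all INT values below 2^22 share a partition
SELECT intDiv(4194304, 4194304);        -- 1: the next bucket
SELECT toYYYYMM(toDate('2024-05-17'));  -- 202405: one partition per month
```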