Data Transmission Service (DTS) lets you synchronize data from a MySQL database, such as a self-managed MySQL database or an ApsaraDB RDS for MySQL instance, to an ApsaraDB for ClickHouse cluster. ApsaraDB for ClickHouse is a columnar database built for online analytical processing (OLAP) that delivers aggregate analysis and large-table queries at speeds an order of magnitude faster than other analytic databases. This guide walks you through configuring a synchronization task that performs an initial full load followed by continuous incremental CDC synchronization.
Before you begin
Before you begin, ensure that you have:
A destination ApsaraDB for ClickHouse cluster running version 20.8 or later. See Create a cluster
Storage space in the ClickHouse cluster that exceeds the storage space used by the source RDS MySQL instance
A database account on the source RDS MySQL instance with read permissions on the objects to be synchronized. If the account was not created through the RDS console, it must also have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions. See Create an account and Modify account permissions
A database account on the destination ApsaraDB for ClickHouse cluster with the required permissions:
Version 22.8 or later: read and write permissions (a privileged account meets this requirement)
Version 21.8: Read/Write And Settings and Allow DDL
Binary logging configured correctly on the source database:
ApsaraDB RDS for MySQL: Binary logging is enabled by default. The binlog_row_image parameter must be set to full. See Modify instance parameters.
Self-managed MySQL: You must manually enable the binary logging feature and set binlog_format to row and binlog_row_image to full. If the source is a self-managed MySQL database deployed in a dual-primary cluster, you must also set log_slave_updates to ON. See Create an account for a self-managed MySQL database and configure binary logging.
Binary logs must be retained for at least 3 days for ApsaraDB RDS for MySQL (7 days recommended), and at least 7 days for self-managed MySQL databases. See Delete binary log files.
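On a self-managed MySQL source, you can verify the binary logging prerequisites above with a quick check like the following sketch (these are standard MySQL system variables; the expected values reflect the DTS requirements listed above):

```sql
-- Check that binary logging is enabled and configured as DTS requires.
SHOW VARIABLES LIKE 'log_bin';           -- expect: ON
SHOW VARIABLES LIKE 'binlog_format';     -- expect: ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- expect: FULL

-- Only required for dual-primary clusters:
SHOW VARIABLES LIKE 'log_slave_updates'; -- expect: ON
```

If any value differs, update the corresponding setting in the MySQL configuration file (or via the RDS console parameter settings) before configuring the task.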
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported SQL operations
| Operation type | SQL statement |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | CREATE TABLE, DROP TABLE, TRUNCATE TABLE; ADD COLUMN, MODIFY COLUMN, DROP COLUMN |
Data type mappings
MySQL and ApsaraDB for ClickHouse support different data types, so one-to-one type mapping is not always possible. During schema synchronization, DTS maps source types to the closest supported types in ClickHouse. For the full mapping table, see Data type mappings for initial schema synchronization.
Limitations
Source database limitations
| Constraint | Details |
|---|---|
| Primary key required | Tables without a primary key cannot be synchronized. |
| Table-level sync limit | When synchronizing at the table level with object name mapping, a single task supports a maximum of 1,000 tables. If you exceed this limit, split the tables into multiple tasks or configure the task to synchronize the entire database. |
| Binary logging requirements | For ApsaraDB RDS for MySQL, binlog_row_image must be set to full. Otherwise, the precheck fails and the task cannot start. For self-managed MySQL databases, you must enable binary logging and set binlog_format to row and binlog_row_image to full. For self-managed MySQL databases deployed in a dual-primary cluster, log_slave_updates must also be set to ON. |
| Binary log retention | ApsaraDB RDS for MySQL: retain binary logs for at least 3 days (7 days recommended). Self-managed MySQL databases: retain binary logs for at least 7 days. If DTS cannot obtain the binary logs, the task may fail or data inconsistency may occur. |
| No DDL during initial sync | Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. |
| MySQL 8.0.23 and later | If the source MySQL version is 8.0.23 or later and the data includes invisible columns, those columns cannot be read and data loss occurs. To make a column visible, run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;. Tables without explicit primary keys automatically generate invisible primary keys — make them visible before synchronization. |
| EncDB | If EncDB is enabled on the RDS MySQL instance, full data synchronization cannot be performed. Instances with Transparent Data Encryption (TDE) enabled support schema synchronization, full data synchronization, and incremental data synchronization. |
| Read-only instances | A read-only ApsaraDB RDS for MySQL 5.6 instance (which does not record transaction logs) cannot be used as the source database. |
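The invisible-column limitation for MySQL 8.0.23 and later can be checked before you start the task. The following is a sketch (table and column names are illustrative placeholders); on MySQL 8.0.23 and later, information_schema reports INVISIBLE in the EXTRA column:

```sql
-- List invisible columns in user databases that DTS would fail to read.
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE extra LIKE '%INVISIBLE%'
  AND table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema');

-- Make an affected column visible so that DTS can synchronize it:
ALTER TABLE my_table ALTER COLUMN my_column SET VISIBLE;
```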
Destination and other limitations
| Constraint | Details |
|---|---|
| RENAME TABLE | The RENAME TABLE operation is not supported. |
| Supported object types | Synchronization of INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, and foreign key (FK) objects is not supported. |
| Non-standard DDL | If the source RDS MySQL instance uses DDL statements that do not follow standard MySQL syntax, the synchronization task may fail or data may be lost. |
| Online DDL with temporary tables | If you run online DDL operations that use temporary tables (such as table merges) on the source, data may be lost at the destination or the synchronization instance may fail. |
| Online DDL tools | DTS supports online DDL changes made with Data Management (DMS) or gh-ost — it synchronizes the original DDL statements to the destination. DTS does not support synchronizing online DDL changes made with pt-online-schema-change; if such changes exist, data may be lost at the destination or the synchronization instance may fail. |
| ClickHouse database limit | The number of databases to be synchronized cannot exceed the ClickHouse limit of 256. |
| Object naming | Database, table, and column names must comply with ClickHouse naming conventions. |
| Extra fields added by DTS | During schema synchronization, DTS adds _sign, _is_deleted, and _version fields to destination tables. If you skip schema synchronization, create the destination tables manually and add these fields. See Table and field information. |
| Partition Key type and nullability | The Partition Key cannot be a nullable field, or the task will fail. Supported types: BIGINT, INT, TIMESTAMP, DATETIME, and DATE. |
| ClickHouse time range | Time-type data in ApsaraDB for ClickHouse has range limits. If the source MySQL time data is outside the supported range, the synchronized values will be incorrect. See Time information. |
| Writes to destination during sync | Do not allow any data outside of DTS to be written to the destination database during synchronization. Doing so causes data inconsistency between the source and destination. |
| Primary/unique key conflicts | During full data synchronization, if a record in the destination has the same primary key or unique key value as a source record, the destination record is retained (not overwritten). During incremental data synchronization, the destination record is overwritten. If the source and destination schemas differ, data initialization may fail. |
| Synchronization performance | Evaluate source and destination database performance before starting. Run data synchronization during off-peak hours to avoid increased database load. |
| DTS task failure recovery | If a DTS task fails, DTS support attempts to restore it within 8 hours. The task may be restarted and task parameters may be modified during restoration. |
Usage notes
DTS periodically executes the `CREATE DATABASE IF NOT EXISTS 'test'` statement in the source database to advance the binary log file position.
DTS calculates synchronization latency based on the timestamp of the latest synchronized data at the destination and the current timestamp at the source. If no DML operations are performed on the source for a long time, latency readings may be inaccurate. Run a DML operation on the source to refresh the latency. If you synchronize an entire database, create a heartbeat table that is updated every second.
If the source database is a self-managed MySQL database and you perform a primary/secondary switchover on the source while the data synchronization task is running, the task fails.
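The heartbeat table recommended in the usage notes can be sketched as follows. This is a minimal example, not a DTS-prescribed schema: the table and event names are illustrative, and it assumes the MySQL event scheduler is enabled (`SET GLOBAL event_scheduler = ON;`):

```sql
-- A single-row table whose periodic update gives DTS a fresh DML change.
CREATE TABLE dts_heartbeat (
  id TINYINT PRIMARY KEY,
  ts TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3)
);
INSERT INTO dts_heartbeat (id) VALUES (1);

-- Touch the row every second so latency readings stay accurate.
CREATE EVENT dts_heartbeat_event
  ON SCHEDULE EVERY 1 SECOND
  DO UPDATE dts_heartbeat SET ts = CURRENT_TIMESTAMP(3) WHERE id = 1;
```

Include this table in the synchronization scope (or place it in a synchronized database) so its updates flow through the task.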
Configure a data synchronization task
Step 1: Open the Data Synchronization page
Use one of the following methods:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance resides.
DMS console
Operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the instance resides.
Step 2: Configure source and destination databases
Click Create Task.
Configure the parameters described in the following table.
Task name
| Parameter | Description |
|---|---|
| Task Name | Enter a descriptive name for the task. DTS generates a name automatically; you do not need to change it, and it does not need to be unique. |
Source database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the source instance is already registered with DTS, select it from the drop-down list. DTS populates the remaining parameters automatically. Otherwise, configure the parameters below. |
| Database Type | Select MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region where the source RDS MySQL instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| RDS Instance ID | Select the ID of the source RDS MySQL instance. |
| Database Account | Enter the database account. See Before you begin for permission requirements. |
| Database Password | Enter the account password. |
| Encryption | Select Non-encrypted or SSL-encrypted. To use SSL encryption, enable SSL on the RDS MySQL instance first. See Use a cloud certificate to enable SSL encryption. |
Destination database
| Parameter | Description |
|---|---|
| Select Existing Connection | If the destination cluster is already registered with DTS, select it from the drop-down list. DTS populates the remaining parameters automatically. Otherwise, configure the parameters below. |
| Database Type | Select ClickHouse. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region where the destination ApsaraDB for ClickHouse cluster resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| Cluster Type | Select the cluster type. |
| Cluster ID | Select the ID of the destination cluster. |
| Database Account | Enter the database account. See Before you begin for permission requirements. |
| Database Password | Enter the account password. |
Click Test Connectivity and Proceed.
Make sure that the CIDR blocks of DTS servers are allowed by the security settings of the source and destination databases. DTS adds these CIDR blocks automatically for Alibaba Cloud instances. For self-managed databases, add them manually. See Add the CIDR blocks of DTS servers.
Step 3: Select synchronization objects
In the Configure Objects step, configure the following settings.
| Configuration | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. DTS synchronizes historical data during the initial full load, which is required for subsequent incremental synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): the precheck fails if the destination contains tables with the same names as source tables. Ignore Errors and Proceed: skips this check. Warning: Selecting Ignore Errors and Proceed may cause data inconsistency. |
| Capitalization of Object Names in Destination Instance | The default value is DTS default policy. Select another option if you need to match the capitalization of the source or destination database. See Specify the capitalization of object names. |
| Source Objects | Select objects from the Source Objects section and click the icon to add them to Selected Objects. Select at the database or table level. |
| Selected Objects | To rename a single object, right-click it. To rename multiple objects, click Batch Edit. See Map object names. To filter rows, right-click a table and set filter conditions. See Set filter conditions. Note: If you use object name mapping, other objects that depend on mapped objects may fail to synchronize. |
Click Next: Advanced Settings and configure the following parameters.
| Configuration | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to a shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Time zone of destination database | Select the time zone for DateTime data written to the ClickHouse cluster. |
| Retry Time for Failed Connections | The period (in minutes) during which DTS retries the connection if the source or destination database fails to connect after the task starts. Valid values: 10–1440. Default: 720. Recommended: more than 30. |
| Retry Time for Other Issues | The period (in minutes) during which DTS retries failed DDL or DML operations. Valid values: 1–1440. Default: 10. Recommended: more than 10. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Configure QPS (queries per second), RPS (rows per second), and data speed limits to reduce load on the source and destination databases during full data synchronization. Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Configure RPS and data speed limits for incremental data synchronization to reduce destination database load. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat SQL statements to the source database. In this case, latency may still be displayed for the task. No: DTS writes heartbeat SQL statements to the source database, which may affect source database features such as physical backup and cloning. |
| Environment Tag | (Optional) Select a tag to identify the instance. |
| Configure ETL | Select Yes to configure extract, transform, and load (ETL) processing. See Configure ETL in a data migration or data synchronization task. Select No to skip. |
| Monitoring and Alerting | Select Yes to receive alerts if the task fails or latency exceeds a threshold. Configure the threshold and notification settings. See Configure monitoring and alerting. |
Click Next: Configure Database and Table Fields to configure ClickHouse table settings. For each table to synchronize, set the Type, Primary Key Column, Sort Key, Distribution Key, and Partition Key:
DTS provides default configurations. Set Definition Status to All to review and modify them.
Primary Key Column and Sort Key support composite keys — select multiple fields from the drop-down lists.
Select one or more columns from Primary Key Column as the Partition Key. Only one field can be selected as the Distribution Key.
The Partition Key is optional but cannot be a nullable field. Only BIGINT, INT, TIMESTAMP, DATETIME, and DATE types are supported as partition keys. See Calculation logic for partition keys.
For details on primary key columns, sort keys, and partition keys, see CREATE TABLE.
Step 4: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before the task can start.
If the precheck fails, click View Details next to each failed item, resolve the issues, and rerun the precheck.
If an alert is triggered:
For alerts that cannot be ignored: click View Details, fix the issue, and rerun the precheck.
For alerts that can be ignored: click Confirm Alert Details > Ignore > OK, then click Precheck Again. Ignoring alerts may cause data inconsistency.
Step 5: Purchase the instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a set period. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Release the instance when no longer needed to stop charges. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| Instance Class | Select a class based on your required synchronization speed. See Instance classes of data synchronization instances. |
| Subscription Duration | Available for the Subscription billing method. Options: 1–9 months, or 1, 2, 3, or 5 years. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. Monitor its progress from there.
Appendix
Time information
| Data type | Minimum value | Maximum value |
|---|---|---|
| Date | 1970-01-01 00:00:00 | 2149-06-06 00:00:00 |
| Date32 | 1925-01-01 00:00:00 | 2283-11-11 00:00:00 |
| DateTime | 1970-01-01 08:00:00 | 2106-02-07 14:28:15 |
| DateTime64 | 1925-01-01 08:00:00 | 2283-11-12 07:59:59 |
Table and field information
Table information
If you create destination tables manually (without using object name mapping), the tables must meet the following requirements.
If the destination table includes an ENGINE clause, it must be ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Otherwise, data inconsistency may occur.
ClickHouse Community Edition instance: create one local table and one distributed table. The distributed table name must match the source table name. The local table name must be <distributed_table_name>_local.
ClickHouse Enterprise Edition instance: create a table with the same name as the source table.
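For a ClickHouse Community Edition cluster, manually created destination tables might look like the following sketch. The database, table, column, and cluster names are illustrative; the required parts per the rules above are the ENGINE clause and the _sign, _is_deleted, and _version fields:

```sql
-- Local table on each node; name must be <distributed_table_name>_local.
CREATE TABLE my_db.orders_local ON CLUSTER default
(
  id UInt64,
  amount Decimal(18, 2),
  _sign Int8 DEFAULT 1,
  _is_deleted UInt8 DEFAULT 0,
  _version UInt64 DEFAULT 1
)
ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted)
ORDER BY id;

-- Distributed table; name must match the source table name.
CREATE TABLE my_db.orders ON CLUSTER default
AS my_db.orders_local
ENGINE = Distributed(default, my_db, orders_local, rand());
```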
Field information
DTS adds the following fields to each destination table. To query data, run `SELECT * FROM table_name FINAL WHERE _sign > 0;`. The WHERE _sign > 0 condition filters out deleted records. The FINAL keyword deduplicates rows with the same sort key.
| Version | Name | Data type | Default | Description |
|---|---|---|---|---|
| Community Edition earlier than 23.8 | _sign | Int8 | 1 | DML operation type. INSERT and UPDATE: 1. DELETE: -1. |
| Community Edition earlier than 23.8 | _version | UInt64 | 1 | Timestamp when the data was written to ClickHouse. |
| Enterprise Edition and Community Edition 23.8 and later | _sign | Int8 | 1 | DML operation type. INSERT and UPDATE: 1. DELETE: -1. |
| Enterprise Edition and Community Edition 23.8 and later | _is_deleted | UInt8 | 0 | Whether the record is deleted. INSERT and UPDATE: 0. DELETE: 1. |
| Enterprise Edition and Community Edition 23.8 and later | _version | UInt64 | 1 | Timestamp when the data was written to ClickHouse. |
Calculation logic for partition keys
| Source field type | Partition key calculation |
|---|---|
| BIGINT | intDiv(tablePartKey, 18014398509481984) |
| INT | intDiv(tablePartKey, 4194304) |
| TIMESTAMP, DATETIME, DATE | toYYYYMM(tablePartKey) |
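Applied to a destination table definition, the calculation logic above corresponds to PARTITION BY expressions like the following sketch (table and column names are illustrative):

```sql
-- For a BIGINT source column:  PARTITION BY intDiv(big_col, 18014398509481984)
-- For an INT source column:    PARTITION BY intDiv(int_col, 4194304)
-- For TIMESTAMP, DATETIME, or DATE source columns:
CREATE TABLE my_db.events_local
(
  id UInt64,
  created_at DateTime,
  _sign Int8 DEFAULT 1,
  _is_deleted UInt8 DEFAULT 0,
  _version UInt64 DEFAULT 1
)
ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted)
ORDER BY id
PARTITION BY toYYYYMM(created_at);
```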