Synchronize data from RDS for MySQL to ApsaraDB for ClickHouse - ApsaraDB for ClickHouse

How it works

Zero-ETL handles three phases automatically:

Schema synchronization — DTS reads the source schema and creates corresponding tables in ClickHouse. It adds _sign, _is_deleted, and _version fields to each destination table.
Full data synchronization — DTS copies all existing rows from the source to ClickHouse.
Incremental data synchronization — DTS reads binary logs from the source MySQL instance and applies INSERT, UPDATE, and DELETE changes to ClickHouse continuously.

When an UPDATE or DELETE runs on the source, ClickHouse appends a new record and uses _sign, _is_deleted, and _version to track the change. The destination database may be larger than the source as a result. To query only current rows, add FINAL after the table name or filter on _sign or _is_deleted.

Supported pipeline

Source	Destination	Access method	Supported phases
ApsaraDB RDS for MySQL	ApsaraDB for ClickHouse	Alibaba Cloud instance	Schema sync, full sync, incremental sync

Billing

Zero-ETL synchronization pipelines are free of charge.

Prerequisites

Before you begin, make sure that:

The ApsaraDB for ClickHouse cluster and the RDS for MySQL instance are in the same region
A database account exists for the source RDS for MySQL instance. See Create an account for an ApsaraDB RDS for MySQL instance
A database account exists for the destination ClickHouse cluster. See Create an account for an ApsaraDB for ClickHouse cluster

Limits

Source database (RDS for MySQL)

Constraint	Details
Primary keys	All tables to be synchronized must have primary keys
Table rename	`RENAME TABLE` operations are not synchronized
Table-level sync limit	Synchronizing individual tables (not entire databases) supports up to 1,000 tables per task. For more than 1,000 tables, split into multiple tasks or synchronize at the database level
DDL during schema sync or full sync	Do not run DDL operations while schema synchronization or full data synchronization is in progress. Doing so causes the task to fail
`pt-online-schema-change`	If you select one or more tables (instead of an entire database) as the objects to be synchronized, do not use `pt-online-schema-change` or similar tools for DDL operations on those tables during synchronization. Otherwise, data may fail to be synchronized
Non-standard DDL	DDL statements on the source that do not comply with standard MySQL syntax may cause the task to fail or result in data loss
Binary logging	`binlog_row_image` must be set to `full`. Binary logging is enabled by default for ApsaraDB RDS for MySQL. For self-managed MySQL, also set `binlog_format=row`. For a self-managed MySQL database in a primary/primary architecture, enable `log_slave_updates`
Binary log retention	Retain local binary logs for at least 3 days (7 days recommended) for ApsaraDB RDS for MySQL; at least 7 days for self-managed MySQL. Tasks fail if DTS cannot read the required binary logs, and issues caused by insufficient retention are outside the DTS Service-Level Agreement (SLA)
Invisible columns (MySQL 8.0.23+)	Invisible columns are not synchronized and data may be lost. Run `ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;` to make them visible before starting the task
Always-confidential (EncDB)	Full data synchronization is not supported when EncDB is enabled. Transparent Data Encryption (TDE) is supported for schema synchronization, full data synchronization, and incremental data synchronization
Read-only instances without transaction logs	Cannot be used as the source (for example, read-only RDS for MySQL 5.6 instances)
Heartbeat statement	The zero-ETL task periodically runs `CREATE DATABASE IF NOT EXISTS \`test\`` on the source to advance the binary log offset
Changes not in binary logs	Changes applied through physical backup recovery or cascade operations are not synchronized. If this occurs, remove and re-add the affected object from the synchronization scope

If you synchronize data from multiple instances to a single ApsaraDB for ClickHouse cluster, make sure that the synchronization objects in different tasks do not overlap.

General limits

Constraint	Details
Max databases per cluster	Up to 256 databases can be synchronized to a single ApsaraDB for ClickHouse cluster
Naming conventions	Database, table, and column names must comply with ApsaraDB for ClickHouse naming conventions. See Limits on object naming conventions
DATETIME range	DATETIME values in the source must fall within the time ranges supported by the destination cluster. See the Time range section

Data type mappings

RDS for MySQL and ApsaraDB for ClickHouse data types do not map one-to-one. During schema synchronization, DTS maps source field types to the closest supported type in ClickHouse. For the full mapping table, see Data type mappings for schema synchronization.

Usage notes

Maximum number of zero-ETL tasks per cluster

The maximum number of zero-ETL tasks you can create for a cluster depends on the edition:

Enterprise Edition: ceil(lower CCU limit / 8). For a cluster with a lower ClickHouse Compute Unit (CCU) limit of 22, the calculation is ceil(22 / 8) = 3.
Community Edition: ceil(total CPU cores / 8). For a two-node cluster where each node has 8 cores, the calculation is ceil(16 / 8) = 2.

If a cluster reaches its limit, delete unused tasks or create additional synchronization tasks directly in the DTS console.

Table structure in the destination

During schema synchronization, the zero-ETL feature adds _sign, _is_deleted, and _version fields to each destination table.

For Community-Compatible Edition clusters, the task creates both a local table and a distributed table for each source table:

Distributed table: Same name as the source table. Use this table for queries.
Local table: Named <distributed_table_name>_local. Used internally by ClickHouse for shard-level storage.

Scheduling and performance

Run the task during off-peak hours. During full data synchronization, DTS uses read and write resources on both the source and destination, which may increase server load.

Preparations

Before creating a zero-ETL task, set up the required service-linked role and Resource Access Management (RAM) user permissions.

Step 1: Create the service-linked role

Create the AliyunServiceRoleForClickHouseZeroETL service-linked role.

When you select a database instance ID during task configuration, the console prompts you to create this role automatically. Manual creation is not required in that case.

Step 2: Grant RAM user permissions

To allow a RAM user to create zero-ETL tasks, grant the following permissions. See Manage RAM user permissions.

Source RDS for MySQL: AliyunRDSFullAccess
Destination ClickHouse cluster: AliyunClickHouseFullAccess
DTS: Create a custom policy using the following script. See Create a custom permission policy.

{
    "Version": "1",
    "Statement": [
        {
            "Action": "dts:*",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "ram:PassRole",
            "Resource": "*",
            "Effect": "Allow",
            "Condition": {
                "StringEquals": {
                    "acs:Service": "dts.aliyuncs.com"
                }
            }
        }
    ]
}

Create and start a zero-ETL task

The overall flow has five steps: navigate to the zero-ETL page, configure source and destination databases, select objects to synchronize, configure table fields, and run the precheck before starting.

Step 1: Go to the zero-ETL page

Log on to the ApsaraDB for ClickHouse console.
In the upper-left corner, select the region of your cluster.
On the Cluster List page, click the List of Community Edition Instances tab, then click the target cluster ID.
In the left navigation pane, click Zero-ETL (Seamless Integration).

Step 2: Configure source and destination databases

Click Create Zero-ETL Task.
Enter a Task Name.

Configure the source and destination databases.

Configure the source and destination databases based on the following parameters. After you complete the configurations, click Next.

Source database

Parameter	Description
Database type	RDS for MySQL (only supported option)
Access method	Alibaba Cloud Instance (only supported option)
Instance region	Region of the source RDS for MySQL instance
RDS instance ID	ID of the source RDS for MySQL instance
Database account	Database account for the RDS for MySQL instance
Database password	Password for the database account of the RDS for MySQL instance
Encryption	Select Non-encrypted or SSL-encrypted. If you select SSL-encrypted, enable SSL encryption for the RDS for MySQL instance first. See Use a cloud certificate to quickly enable SSL link encryption.

Destination database

Parameter	Description
Database type	ClickHouse
Connection type	Alibaba Cloud instance (only supported option)
Instance region	Region of the destination cluster
Cluster ID	ID of the destination cluster
Cluster type	Community Edition or Enterprise Edition
Database account	Database account for the destination cluster
Database password	Password for the database account of the destination cluster

Click Test Connectivity and Proceed.

Step 3: Select objects to synchronize

In the Source Objects section, select the databases or tables to synchronize. Click to move them to the Selected Objects section.

Click Next: Configure Database and Table Fields.

Step 4: Configure table fields

On the Configurations for Databases, Tables, and Columns page, set the following for each table:

Field	Description
Type	Table engine type
Primary key column	One or more columns forming the primary key. Composite keys are supported
Sort key	One or more columns for sorting. Composite keys are supported
Distribution key	A single column for data distribution across shards
Partition key	(Optional) A non-empty column for partitioning. The column cannot be blank if set

By default, only tables with undefined configurations are shown. To view all tables, set Definition Status to All. The partition key must be selected from the primary key column. For details on these fields, see CREATE TABLE.

Click Next: Save Task Settings and Precheck. The task is saved regardless of whether the precheck passes.

Step 5: Run the precheck and start the task

When Success Rate reaches 100%, click Start.

If the precheck fails, fix the reported issues in the source or destination, then modify and rerun the precheck from the zero-ETL page.

After the task starts, the zero-ETL page shows the task ID/name, source/destination, and status.

Monitor zero-ETL tasks

Use one or more of the following methods to track task health. For ongoing production workloads, configure alerts or event subscriptions so you receive automatic notifications when issues occur.

Method	Best for	Limitation
Active viewing in the ClickHouse console	Reviewing replication performance, synchronization details, and task logs	No automatic notifications
Synchronization latency alerts in CloudMonitor	Receiving automatic alerts when latency exceeds a threshold	Monitors latency only
Event subscription in CloudMonitor	Receiving notifications on task failure and recovery	Monitors failure and recovery only

Monitor a task in the ApsaraDB for ClickHouse console

Log on to the ApsaraDB for ClickHouse console.
In the upper-left corner, select the region of your cluster.
On the Cluster List page, click the List of Community Edition Instances tab, then click the cluster ID.
In the left navigation pane, click Zero-ETL (Seamless Integration).
Find the task and click Task Details in the Actions column.

The task details page shows replication performance, synchronization status, and logs.

Set up synchronization latency alerts in CloudMonitor

Create an alert rule in CloudMonitor. See Create an alert rule in the CloudMonitor console. Set the following parameters:

Parameter	Value
Product	Clickhouse - ZeroETL Latency
Monitoring metrics	Synchronization Latency

To view current latency, log on to the CloudMonitor console. In the ClickHouse - ZeroETL Latency list, click Monitoring Charts in the Actions column for the target cluster.

Subscribe to zero-ETL task events in CloudMonitor

Subscribe to system events to receive automatic notifications when a task fails or recovers. See Manage event subscriptions.

When creating a subscription policy, set the following parameters:

Event	Parameter	Value
Task failure	Subscription type	System Event
	Product	ApsaraDB Clickhouse
	Event type	Abnormal
	Event name	ZeroETL task abnormal
Task recovery	Subscription type	System Event
	Product	ApsaraDB Clickhouse
	Event type	Restore
	Event name	ZeroETLTaskRestore

FAQ

Why is the destination database larger than the source after synchronization?

ClickHouse does not overwrite rows on UPDATE or DELETE. Instead, it appends a new record and uses _sign, _is_deleted, and _version to track the change. The destination accumulates more rows than the source over time.

To query only the latest data, add FINAL after the table name, or filter on _sign or _is_deleted (the exact field depends on your cluster version). For details, see Field information.

Why do tables named `<table>_local` appear in the destination database?

For Community-Compatible Edition clusters, zero-ETL creates both a local table and a distributed table for each source table. The distributed table (named after the source table) is the one you query. The _local table is used internally by ClickHouse for shard-level storage.

What happens if the binary log retention period is too short?

DTS uses binary logs to apply incremental changes. If the binary logs needed to continue replication have already been purged, the task fails and cannot resume. In this case, remove the affected database or table from the synchronization scope and re-add it to trigger a fresh full synchronization. For ApsaraDB RDS for MySQL, retain binary logs for at least 3 days (7 days recommended). For self-managed MySQL, retain binary logs for at least 7 days.

What causes a zero-ETL task to fail with a binary log offset error?

The task loses its position in the binary log if the log file is purged before DTS reads it, typically due to a short retention period. When this happens, the task cannot resume automatically. Remove the affected table or database from the synchronization scope and re-add it to trigger a fresh full synchronization. To prevent recurrence, for ApsaraDB RDS for MySQL, retain binary logs for at least 3 days (7 days recommended); for self-managed MySQL, retain binary logs for at least 7 days.