
ApsaraDB for ClickHouse:Synchronize data from RDS for MySQL to ApsaraDB for ClickHouse using zero-ETL

Last Updated: Mar 30, 2026

Zero-ETL streams data from ApsaraDB RDS for MySQL into ApsaraDB for ClickHouse in real time without requiring you to build or maintain a data pipeline. The feature is powered by Data Transmission Service (DTS) and is free of charge.

How it works

Zero-ETL handles three phases automatically:

  1. Schema synchronization — DTS reads the source schema and creates corresponding tables in ClickHouse. It adds _sign, _is_deleted, and _version fields to each destination table.

  2. Full data synchronization — DTS copies all existing rows from the source to ClickHouse.

  3. Incremental data synchronization — DTS reads binary logs from the source MySQL instance and applies INSERT, UPDATE, and DELETE changes to ClickHouse continuously.

When an UPDATE or DELETE runs on the source, ClickHouse appends a new record and uses _sign, _is_deleted, and _version to track the change. The destination database may be larger than the source as a result. To query only current rows, add FINAL after the table name or filter on _sign or _is_deleted.
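For example, assuming a synchronized table named orders (a hypothetical name), either approach returns only the current rows:

```sql
-- FINAL collapses each row to its latest version at query time
-- (convenient, but slower on large tables).
SELECT * FROM orders FINAL;

-- Alternatively, filter on the bookkeeping fields directly; whether to
-- filter on _sign or _is_deleted depends on your cluster version.
SELECT * FROM orders WHERE _sign = 1;
```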

Supported pipeline

| Source | Destination | Access method | Supported phases |
| --- | --- | --- | --- |
| ApsaraDB RDS for MySQL | ApsaraDB for ClickHouse | Alibaba Cloud instance | Schema sync, full sync, incremental sync |

Billing

Zero-ETL synchronization pipelines are free of charge.

Prerequisites

Before you begin, make sure that you have reviewed the limits below and completed the steps in the Preparations section (service-linked role and RAM user permissions).

Limits

Source database (RDS for MySQL)

| Constraint | Details |
| --- | --- |
| Primary keys | All tables to be synchronized must have primary keys. |
| Table rename | RENAME TABLE operations are not synchronized. |
| Table-level sync limit | Synchronizing individual tables (not entire databases) supports up to 1,000 tables per task. For more than 1,000 tables, split them across multiple tasks or synchronize at the database level. |
| DDL during schema sync or full sync | Do not run DDL operations while schema synchronization or full data synchronization is in progress. Doing so causes the task to fail. |
| pt-online-schema-change | If you select individual tables (instead of an entire database) as the objects to be synchronized, do not use pt-online-schema-change or similar tools to run DDL operations on those tables during synchronization. Otherwise, data may fail to be synchronized. |
| Non-standard DDL | DDL statements on the source that do not comply with standard MySQL syntax may cause the task to fail or result in data loss. |
| Binary logging | binlog_row_image must be set to full. Binary logging is enabled by default for ApsaraDB RDS for MySQL. For self-managed MySQL, also set binlog_format to row. For a self-managed MySQL database in a primary/primary architecture, enable log_slave_updates. |
| Binary log retention | Retain local binary logs for at least 3 days (7 days recommended) for ApsaraDB RDS for MySQL, and for at least 7 days for self-managed MySQL. Tasks fail if DTS cannot read the required binary logs, and issues caused by insufficient retention are outside the DTS Service-Level Agreement (SLA). |
| Invisible columns (MySQL 8.0.23+) | Invisible columns are not synchronized and their data may be lost. Run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE; to make them visible before starting the task. |
| Always-confidential (EncDB) | Full data synchronization is not supported when EncDB is enabled. Transparent Data Encryption (TDE) is supported for schema synchronization, full data synchronization, and incremental data synchronization. |
| Read-only instances without transaction logs | Cannot be used as the source (for example, read-only RDS for MySQL 5.6 instances). |
| Heartbeat statement | The zero-ETL task periodically runs CREATE DATABASE IF NOT EXISTS `test` on the source to advance the binary log offset. |
| Changes not in binary logs | Changes applied through physical backup recovery or cascade operations are not synchronized. If this occurs, remove the affected object from the synchronization scope and then add it back. |

Note: If you synchronize data from multiple instances to a single ApsaraDB for ClickHouse cluster, make sure that the synchronization objects in different tasks do not overlap.
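For self-managed MySQL sources, you can verify the binary log settings listed above before creating the task. These are standard MySQL system variables:

```sql
-- Expect full; required by zero-ETL.
SHOW VARIABLES LIKE 'binlog_row_image';

-- Expect ROW for self-managed MySQL sources.
SHOW VARIABLES LIKE 'binlog_format';

-- Expect ON; binary logging is enabled by default on ApsaraDB RDS for MySQL.
SHOW VARIABLES LIKE 'log_bin';
```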

General limits

| Constraint | Details |
| --- | --- |
| Max databases per cluster | Up to 256 databases can be synchronized to a single ApsaraDB for ClickHouse cluster. |
| Naming conventions | Database, table, and column names must comply with ApsaraDB for ClickHouse naming conventions. See Limits on object naming conventions. |
| DATETIME range | DATETIME values in the source must fall within the time ranges supported by the destination cluster. See the Time range section. |

Data type mappings

RDS for MySQL and ApsaraDB for ClickHouse data types do not map one-to-one. During schema synchronization, DTS maps source field types to the closest supported type in ClickHouse. For the full mapping table, see Data type mappings for schema synchronization.

Usage notes

Maximum number of zero-ETL tasks per cluster

The maximum number of zero-ETL tasks you can create for a cluster depends on the edition:

  • Enterprise Edition: ceil(lower CCU limit / 8). For a cluster with a lower ClickHouse Compute Unit (CCU) limit of 22, the calculation is ceil(22 / 8) = 3.

  • Community Edition: ceil(total CPU cores / 8). For a two-node cluster where each node has 8 cores, the calculation is ceil(16 / 8) = 2.

If a cluster reaches its limit, delete unused tasks or create additional synchronization tasks directly in the DTS console.
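The formulas above can be checked directly in SQL; ceil() behaves the same way in ClickHouse:

```sql
-- Enterprise Edition: lower CCU limit of 22 -> ceil(22 / 8) = 3 tasks.
-- Community Edition: 2 nodes x 8 cores -> ceil(16 / 8) = 2 tasks.
SELECT ceil(22 / 8) AS enterprise_task_limit,
       ceil(16 / 8) AS community_task_limit;
```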

Table structure in the destination

During schema synchronization, the zero-ETL feature adds _sign, _is_deleted, and _version fields to each destination table.

For Community-Compatible Edition clusters, the task creates both a local table and a distributed table for each source table:

  • Distributed table: Same name as the source table. Use this table for queries.

  • Local table: Named <distributed_table_name>_local. Used internally by ClickHouse for shard-level storage.
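For example, for a hypothetical source table named orders, the two tables can be queried as follows; always query the distributed table:

```sql
-- Distributed table (same name as the source): aggregates across all shards.
SELECT count() FROM orders;

-- Local table: returns only the rows stored on the shard you are connected to,
-- so its count may differ from the distributed table on multi-shard clusters.
SELECT count() FROM orders_local;
```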

Scheduling and performance

Run the task during off-peak hours. During full data synchronization, DTS uses read and write resources on both the source and destination, which may increase server load.

Preparations

Before creating a zero-ETL task, set up the required service-linked role and Resource Access Management (RAM) user permissions.

Step 1: Create the service-linked role

Create the AliyunServiceRoleForClickHouseZeroETL service-linked role.

When you select a database instance ID during task configuration, the console prompts you to create this role automatically. Manual creation is not required in that case.

Step 2: Grant RAM user permissions

To allow a RAM user to create zero-ETL tasks, grant the following permissions. See Manage RAM user permissions.

  • Source RDS for MySQL: AliyunRDSFullAccess

  • Destination ClickHouse cluster: AliyunClickHouseFullAccess

  • DTS: Create a custom policy using the following script. See Create a custom permission policy.

{
    "Version": "1",
    "Statement": [
        {
            "Action": "dts:*",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": "ram:PassRole",
            "Resource": "*",
            "Effect": "Allow",
            "Condition": {
                "StringEquals": {
                    "acs:Service": "dts.aliyuncs.com"
                }
            }
        }
    ]
}

Create and start a zero-ETL task

The overall flow has five steps: navigate to the zero-ETL page, configure source and destination databases, select objects to synchronize, configure table fields, and run the precheck before starting.

Step 1: Go to the zero-ETL page

  1. Log on to the ApsaraDB for ClickHouse console.

  2. In the upper-left corner, select the region of your cluster.

  3. On the Cluster List page, click the List of Community Edition Instances tab, then click the target cluster ID.

  4. In the left navigation pane, click Zero-ETL (Seamless Integration).

Step 2: Configure source and destination databases

  1. Click Create Zero-ETL Task.

  2. Enter a Task Name.

  3. Configure the source and destination databases based on the following parameters. After you complete the configurations, click Next.

    Source database

    | Parameter | Description |
    | --- | --- |
    | Database type | RDS for MySQL (only supported option) |
    | Access method | Alibaba Cloud Instance (only supported option) |
    | Instance region | Region of the source RDS for MySQL instance |
    | RDS instance ID | ID of the source RDS for MySQL instance |
    | Database account | Database account for the RDS for MySQL instance |
    | Database password | Password for the database account of the RDS for MySQL instance |
    | Encryption | Select Non-encrypted or SSL-encrypted. If you select SSL-encrypted, enable SSL encryption for the RDS for MySQL instance first. See Use a cloud certificate to quickly enable SSL link encryption. |

    Destination database

    | Parameter | Description |
    | --- | --- |
    | Database type | ClickHouse |
    | Connection type | Alibaba Cloud instance (only supported option) |
    | Instance region | Region of the destination cluster |
    | Cluster ID | ID of the destination cluster |
    | Cluster type | Community Edition or Enterprise Edition |
    | Database account | Database account for the destination cluster |
    | Database password | Password for the database account of the destination cluster |

  4. Click Test Connectivity and Proceed.

Step 3: Select objects to synchronize

In the Source Objects section, select the databases or tables to synchronize and move them to the Selected Objects section.


Click Next: Configure Database and Table Fields.

Step 4: Configure table fields

On the Configurations for Databases, Tables, and Columns page, set the following for each table:

| Field | Description |
| --- | --- |
| Type | Table engine type. |
| Primary key column | One or more columns forming the primary key. Composite keys are supported. |
| Sort key | One or more columns for sorting. Composite keys are supported. |
| Distribution key | A single column for data distribution across shards. |
| Partition key | (Optional) A column for partitioning. If set, the column cannot be blank, and it must be selected from the primary key columns. |

By default, only tables with undefined configurations are shown. To view all tables, set Definition Status to All. For details on these fields, see CREATE TABLE.
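As a rough illustration only (the actual DDL and engine are generated by the zero-ETL task, and orders, id, and amount are hypothetical names), a destination table combining these settings with the bookkeeping fields might resemble:

```sql
CREATE TABLE orders
(
    id          UInt64,
    amount      Decimal(18, 2),
    _sign       Int8,    -- added by zero-ETL
    _is_deleted UInt8,   -- added by zero-ETL
    _version    UInt64   -- added by zero-ETL
)
ENGINE = ReplacingMergeTree(_version)  -- engine choice is an assumption for this sketch
ORDER BY id;
```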

Click Next: Save Task Settings and Precheck. The task is saved regardless of whether the precheck passes.

Step 5: Run the precheck and start the task

When Success Rate reaches 100%, click Start.

If the precheck fails, fix the reported issues in the source or destination database, and then run the precheck again from the zero-ETL page.

After the task starts, the zero-ETL page shows the task ID/name, source/destination, and status.

Monitor zero-ETL tasks

Use one or more of the following methods to track task health. For ongoing production workloads, configure alerts or event subscriptions so you receive automatic notifications when issues occur.

| Method | Best for | Limitation |
| --- | --- | --- |
| Active viewing in the ApsaraDB for ClickHouse console | Reviewing replication performance, synchronization details, and task logs | No automatic notifications |
| Synchronization latency alerts in CloudMonitor | Receiving automatic alerts when latency exceeds a threshold | Monitors latency only |
| Event subscription in CloudMonitor | Receiving notifications on task failure and recovery | Monitors failure and recovery only |

Monitor a task in the ApsaraDB for ClickHouse console

  1. Log on to the ApsaraDB for ClickHouse console.

  2. In the upper-left corner, select the region of your cluster.

  3. On the Cluster List page, click the List of Community Edition Instances tab, then click the cluster ID.

  4. In the left navigation pane, click Zero-ETL (Seamless Integration).

  5. Find the task and click Task Details in the Actions column.

The task details page shows replication performance, synchronization status, and logs.


Set up synchronization latency alerts in CloudMonitor

  1. Create an alert rule in CloudMonitor. See Create an alert rule in the CloudMonitor console. Set the following parameters:

    | Parameter | Value |
    | --- | --- |
    | Product | Clickhouse - ZeroETL Latency |
    | Monitoring metrics | Synchronization Latency |
  2. To view current latency, log on to the CloudMonitor console. In the ClickHouse - ZeroETL Latency list, click Monitoring Charts in the Actions column for the target cluster.

Subscribe to zero-ETL task events in CloudMonitor

Subscribe to system events to receive automatic notifications when a task fails or recovers. See Manage event subscriptions.

When creating a subscription policy, set the following parameters:

| Event | Parameter | Value |
| --- | --- | --- |
| Task failure | Subscription type | System Event |
| | Product | ApsaraDB Clickhouse |
| | Event type | Abnormal |
| | Event name | ZeroETL task abnormal |
| Task recovery | Subscription type | System Event |
| | Product | ApsaraDB Clickhouse |
| | Event type | Restore |
| | Event name | ZeroETLTaskRestore |

FAQ

Why is the destination database larger than the source after synchronization?

ClickHouse does not overwrite rows on UPDATE or DELETE. Instead, it appends a new record and uses _sign, _is_deleted, and _version to track the change. The destination accumulates more rows than the source over time.

To query only the latest data, add FINAL after the table name, or filter on _sign or _is_deleted (the exact field depends on your cluster version). For details, see Field information.

Why do tables named `<table>_local` appear in the destination database?

For Community-Compatible Edition clusters, zero-ETL creates both a local table and a distributed table for each source table. The distributed table (named after the source table) is the one you query. The _local table is used internally by ClickHouse for shard-level storage.

What happens if the binary log retention period is too short?

DTS uses binary logs to apply incremental changes. If the binary logs needed to continue replication have already been purged, the task fails and cannot resume. In this case, remove the affected database or table from the synchronization scope and re-add it to trigger a fresh full synchronization. For ApsaraDB RDS for MySQL, retain binary logs for at least 3 days (7 days recommended). For self-managed MySQL, retain binary logs for at least 7 days.
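For a self-managed MySQL 8.0 source, the recommended 7-day retention can be configured with a standard system variable (ApsaraDB RDS for MySQL users set the retention policy in the console instead):

```sql
-- 7 days = 604800 seconds.
SET GLOBAL binlog_expire_logs_seconds = 604800;

-- Verify the new setting.
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```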

What causes a zero-ETL task to fail with a binary log offset error?

The task loses its position in the binary log if the log file is purged before DTS reads it, typically due to a short retention period. When this happens, the task cannot resume automatically. Remove the affected table or database from the synchronization scope and re-add it to trigger a fresh full synchronization. To prevent recurrence, for ApsaraDB RDS for MySQL, retain binary logs for at least 3 days (7 days recommended); for self-managed MySQL, retain binary logs for at least 7 days.