Data Transmission Service (DTS) keeps a PolarDB for MySQL cluster in sync with an ApsaraDB for ClickHouse cluster in real time, enabling centralized analytics without disrupting your source database.
Synchronization types and billing
| Synchronization type | Billing |
|---|---|
| Schema synchronization | Free |
| Full data synchronization | Free |
| Incremental data synchronization | Charged. For details, see Billing overview. |
Prerequisites
Before you begin, make sure that:
An ApsaraDB for ClickHouse cluster running ClickHouse V20.8 or later is ready as the destination. See Create an ApsaraDB for ClickHouse cluster.
The destination cluster has more available storage space than the total data size in the source PolarDB for MySQL cluster.
Binary logging is enabled on the source cluster and the loose_polar_log_bin parameter is set to on. See Enable binary logging and Modify parameters.
The source database accounts have the required permissions (see Permissions required for database accounts).
Binary logs on the source cluster must be retained for at least seven days. A shorter retention period can cause DTS to lose its position in the binlog, resulting in task failure or data loss. Enabling binary logging incurs storage charges on PolarDB for MySQL.
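Before you configure the task, you can verify from a client session that binary logging is active on the source. A minimal check, assuming a standard MySQL client connected to the PolarDB cluster (the retention variable applies to MySQL 8.0-compatible engines; PolarDB also exposes retention settings in the console):

```sql
-- Confirm that binary logging is enabled (expected value: ON).
SHOW VARIABLES LIKE 'log_bin';

-- Check the binlog retention period in seconds on MySQL 8.0-compatible
-- engines. Seven days corresponds to 604800 seconds.
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```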
Permissions required for database accounts
| Database | Required permissions | References |
|---|---|---|
| Source PolarDB for MySQL cluster | Read permissions on all objects to be synchronized | Create and manage a database account and Manage the password of a database account |
| Destination ApsaraDB for ClickHouse V22.8 or later | Read and write permissions on the destination database (a privileged account works) | Manage database accounts in a Community-compatible Edition cluster |
| Destination ApsaraDB for ClickHouse V21.8 | Read, write, and set permissions on the destination database, with DDL enabled | Same as above |
Limitations
Source instance limits
The source server must have enough outbound bandwidth. Insufficient bandwidth slows down synchronization.
If you synchronize tables and want to rename or modify columns at the destination, a single task supports up to 1,000 tables. For more than 1,000 tables, split into multiple tasks or synchronize the entire database.
Do not run DDL operations on the source during full data synchronization. Doing so causes the task to fail.
Other limits
DTS does not synchronize read-only nodes or Object Storage Service (OSS) external tables from the source cluster.
DDL statements that are not standard MySQL syntax may cause the task to fail or data to be lost.
RENAME TABLE is not supported.
DATETIME values in the source must fall within the time ranges supported by ClickHouse (see Time range).
All columns selected as Partition Key must be non-empty fields.
Up to 256 databases can be synchronized to a single ApsaraDB for ClickHouse cluster.
Database, table, and column names must comply with ApsaraDB for ClickHouse naming conventions. See Limits on object naming conventions.
During schema synchronization, DTS adds the _sign, _is_deleted, and _version fields to destination tables. If you disable Schema Synchronization, create the destination table manually and include these fields.
Do not write data from other sources to the destination database during synchronization. Concurrent writes cause data inconsistency.
Do not use pt-online-schema-change to run DDL on tables being synchronized at the table level. Use Data Management (DMS) lock-free DDL instead.
Run synchronization during off-peak hours when possible. Full data synchronization uses read and write resources on both clusters.
If a task fails, DTS support will attempt to restore it within 8 hours and may restart the task or modify task parameters (not database parameters).
SQL operations that support data synchronization
DML
INSERT, UPDATE, and DELETE
DDL
| DDL operation | Supported |
|---|---|
| CREATE TABLE | Yes |
| TRUNCATE TABLE | Yes |
| ALTER TABLE | Yes |
| ADD COLUMN | Yes |
| MODIFY COLUMN | Yes |
| DROP TABLE | Yes |
| DROP COLUMN | Yes |
| RENAME TABLE | No |
Data type mappings
PolarDB for MySQL and ApsaraDB for ClickHouse support different data types. During initial schema synchronization, DTS maps source types to compatible destination types. See Data type mappings for initial schema synchronization.
Create a synchronization task
Step 1: Go to the data synchronization page
DTS console
Log on to the DTS console
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance will reside.
DMS console
The exact navigation path varies by DMS console mode. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console
In the top navigation bar, move the pointer over Data + AI and choose DTS > Data Synchronization.
From the drop-down list next to Data Synchronization Tasks, select the region where the synchronization instance will reside.
Step 2: Configure source and destination databases
Click Create Task, then fill in the following parameters.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A name for the DTS task. DTS generates a name automatically. Use a descriptive name to identify the task easily. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance from the drop-down list, or configure the connection manually if the instance is not registered. |
| | Database Type | Select PolarDB for MySQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | The region where the source PolarDB for MySQL cluster resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| | PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster. |
| | Database Account | The database account for the source cluster. |
| | Database Password | The password for the account. |
| | Encryption | Whether to encrypt the connection to the source database. See Configure SSL encryption. |
| Destination Database | Select Existing Connection | Select a registered instance or configure the connection manually. |
| | Database Type | Select ClickHouse. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | The region where the destination ApsaraDB for ClickHouse cluster resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| | Cluster Type | The type of the destination ApsaraDB for ClickHouse cluster. |
| | Cluster ID | The ID of the destination ApsaraDB for ClickHouse cluster. |
| | Database Account | The database account for the destination cluster. |
| | Database Password | The password for the account. |
Click Test Connectivity and Proceed.
Make sure that the CIDR blocks of DTS servers are added to the security group rules of both source and destination databases. See Add the CIDR blocks of DTS servers.
Step 3: Configure objects to synchronize
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full data synchronization loads historical data before incremental synchronization begins. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): fails the precheck if the destination has tables with the same names as the source. Use object name mapping to avoid conflicts. Ignore Errors and Proceed: skips the conflict check. During full synchronization, existing records in the destination are kept; during incremental synchronization, they are overwritten. If schemas differ, the task may fail. Use with caution. |
| Capitalization of Object Names in Destination Instance | Controls case handling for database, table, and column names. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select the databases, tables, or columns to synchronize and click the arrow icon to add them to Selected Objects. |
| Selected Objects | Right-click an object to rename it, or click Batch Edit to rename multiple objects at once. Right-click a table to add SQL filter conditions. See Map object names and Set filter conditions |
If you use object name mapping, objects that depend on the renamed objects may fail to synchronize.
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. |
| Time zone of destination database | The time zone applied to DateTime data written to the destination ClickHouse cluster. |
| Retry Time for Failed Connections | How long DTS retries failed connections after the task starts. Valid values: 10–1440 minutes. Default: 720 minutes. Set to at least 30 minutes. If multiple tasks share the same source or destination, the shortest retry time applies. DTS charges continue during retries. |
| Retry Time for Other Issues | How long DTS retries failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10 minutes. Set to at least 10 minutes. This value must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limits the read and write load during full synchronization. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limits the load during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat SQL statements to the source database, so the task may display latency. No: DTS writes heartbeat SQL statements to the source database, which may affect features such as physical backup and cloning of the source. |
| Environment Tag | Optional tags to categorize the synchronization instance. |
| Configure ETL | Whether to enable the extract, transform, and load (ETL) feature. Yes: enter data processing statements in the code editor. See Configure ETL in a data migration or data synchronization task. No: skip ETL. |
| Monitoring and Alerting | Yes: configure alert thresholds and notification contacts so DTS alerts you when the task fails or synchronization latency exceeds the threshold. See Configure monitoring and alerting. No: no alerts. |
Click Next: Configure Database and Table Fields to set the Type, Primary Key Column, Sort Key, Distribution Key, and Partition Key for each ClickHouse destination table.
DTS auto-configures these fields by default. To change them, set Definition Status to All.
Primary Key Column and Sort Key accept multiple columns. Distribution Key accepts only one column.
Partition Key is optional. If configured, all selected columns must be non-empty fields.
For details on ClickHouse table parameters, see CREATE TABLE.
Step 4: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. If the precheck fails:
Click View Details next to a failed item, fix the issue, and click Precheck Again.
For alert items that can be ignored, click Confirm Alert Details > Ignore > OK, then click Precheck Again. Ignoring alerts may cause data inconsistency.
Step 5: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront, more cost-effective for long-term use. Pay-as-you-go: billed hourly, suitable for short-term use. Release the instance when no longer needed to stop charges. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The instance class determines synchronization speed. See Instance classes of data synchronization instances. |
| Subscription Duration | Available for Subscription billing only. Options: 1–9 months, or 1, 2, 3, or 5 years. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog box.
The task appears in the task list. Monitor its progress there.
Query synchronized data
After synchronization completes, use the following statement to query data in the destination ClickHouse cluster:
SELECT * FROM table_name FINAL WHERE _sign > 0;
The FINAL keyword merges rows with the same sort key and returns the latest version. The WHERE _sign > 0 filter excludes rows marked for deletion.
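To avoid repeating the FINAL modifier and the _sign filter in every query, you can wrap them in a view on the destination cluster. A sketch, assuming a synchronized table named orders (the table and view names are placeholders):

```sql
-- Hypothetical view that always returns the merged, non-deleted rows.
CREATE VIEW orders_current AS
SELECT * FROM orders FINAL WHERE _sign > 0;

-- Downstream queries then read from the view.
SELECT count() FROM orders_current;
```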
Appendix
Time range
| Data type | Minimum value | Maximum value |
|---|---|---|
| Date | 1970-01-01 00:00:00 | 2149-06-06 00:00:00 |
| Date32 | 1925-01-01 00:00:00 | 2283-11-11 00:00:00 |
| DateTime | 1970-01-01 08:00:00 | 2106-02-07 14:28:15 |
| DateTime64 | 1925-01-01 08:00:00 | 2283-11-12 07:59:59 |
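Before starting the task, you can scan the source for values that fall outside these ranges, since out-of-range DATETIME values are not supported (see the DATETIME limit above). A sketch against a hypothetical orders table with a created_at column, using the DateTime bounds from the table:

```sql
-- Rows whose created_at would overflow the ClickHouse DateTime range.
SELECT id, created_at
FROM orders
WHERE created_at < '1970-01-01 08:00:00'
   OR created_at > '2106-02-07 14:28:15';
```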
Tables
If you create destination tables manually (without schema synchronization), the tables must meet the following requirements.
If the destination table includes the ENGINE parameter, set it to ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Any other value causes data inconsistency.
Community-compatible Edition: create a local table and a distributed table. The distributed table name must match the source table name. The local table name follows the <distributed_table_name>_local format.
Enterprise Edition: create a table with the same name as the source table.
Fields
DTS adds the following metadata fields to destination tables.
ApsaraDB for ClickHouse Community-compatible Edition running V23.8 or earlier
| Field | Data type | Default value | Description |
|---|---|---|---|
| _sign | Int8 | 1 | DML operation type: INSERT or UPDATE = 1; DELETE = -1. |
| _version | UInt64 | 1 | The time when the row was written to the ClickHouse cluster. |
ApsaraDB for ClickHouse Enterprise Edition or Community-compatible Edition running V23.8 or later
| Field | Data type | Default value | Description |
|---|---|---|---|
| _sign | Int8 | 1 | DML operation type: INSERT or UPDATE = 1; DELETE = -1. |
| _is_deleted | UInt8 | 0 | Whether the row is deleted: INSERT or UPDATE = 0; DELETE = 1. |
| _version | UInt64 | 1 | The timestamp when the row was written to the ClickHouse cluster. |
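Putting the engine and field requirements together, a manually created Community-compatible Edition destination might look like the following. The database, table, and business column names are placeholders; only the metadata fields and the engine clause follow the requirements above, so adapt the cluster name and keys to your deployment:

```sql
-- Local table on each node; the name uses the <distributed_table_name>_local format.
CREATE TABLE orders_local ON CLUSTER default
(
    id UInt64,
    amount Decimal(18, 2),
    _sign Int8 DEFAULT 1,
    _is_deleted UInt8 DEFAULT 0,
    _version UInt64 DEFAULT 1
)
ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted)
ORDER BY id;

-- Distributed table; the name must match the source table name.
CREATE TABLE orders ON CLUSTER default AS orders_local
ENGINE = Distributed(default, currentDatabase(), orders_local, rand());
```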