Synchronize RDS PostgreSQL Data to ClickHouse for Real-Time Analytics via DTS - Data Transmission Service

Data Transmission Service (DTS) continuously replicates data from an ApsaraDB RDS for PostgreSQL instance to an ApsaraDB for ClickHouse cluster using logical replication. This lets you run analytics on ClickHouse without impacting your PostgreSQL workload.

Prerequisites

Before you begin, make sure you have:

An ApsaraDB for ClickHouse cluster with storage space larger than the data volume to synchronize. See Create a cluster.
The wal_level parameter on the source ApsaraDB RDS for PostgreSQL instance set to logical. See Set instance parameters.

For supported source and destination database versions, see Synchronization solution overview.

Billing

Synchronization type	Pricing
Schema synchronization and full data synchronization	Free
Incremental data synchronization	Charged. See Billing overview.

Supported SQL operations for incremental synchronization

Operation type	SQL statements
DML	INSERT, UPDATE, DELETE
DDL	CREATE TABLE, DROP TABLE, TRUNCATE TABLE, ADD COLUMN, MODIFY COLUMN, DROP COLUMN

Required permissions

Database	Required permissions	How to configure
Source ApsaraDB RDS for PostgreSQL	Privileged account that is the owner of the database to synchronize	Create an account and create a database
Destination ApsaraDB for ClickHouse cluster (v22.8 or later)	Read and write permissions on the destination database. A privileged account meets this requirement.	Community-Compatible Edition account management
Destination ApsaraDB for ClickHouse cluster (v21.8)	Read/write And Set Permissions and Allow DDL	Community-Compatible Edition account management

Create a synchronization task

Step 1: Go to the synchronization task list

Use one of the following methods:

Go to the page from the DTS console

From the DTS console:

Log on to the Data Transmission Service (DTS) console.Data Transmission Service (DTS) console
In the left navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance resides.

Go to the page from the DMS console

From the DMS console:

Log on to Data Management (DMS).Data Management (DMS)
On the top menu bar, choose Data + AI > Data Transmission (DTS) > Data Synchronization.
To the right of Data Synchronization Tasks, select the region where the synchronization instance is located.

The actual operations may vary based on the mode and layout of the DMS console. See Simple mode console and Customize the layout and style of the DMS UI.

Step 2: Configure source and destination databases

Click Create Task.

Configure the source and destination databases.

Task settings:

Parameter	Description
Task Name	DTS auto-generates a name. Specify a descriptive name for easy identification. The name does not need to be unique.

Source database (PostgreSQL):

Parameter	Value
Select Existing Connection	Select a pre-added instance from the drop-down list, or configure the connection manually.
Database Type	PostgreSQL
Access Method	Alibaba Cloud Instance
Instance Region	Region of the source ApsaraDB RDS for PostgreSQL instance
Replicate Data Across Alibaba Cloud Accounts	No (same account)
Instance ID	ID of the source instance
Database Name	Name of the database containing the objects to synchronize
Database Account	Database account. See Required permissions.
Database Password	Password for the database account

Destination database (ClickHouse):

Parameter	Value
Select Existing Connection	Select a pre-added instance from the drop-down list, or configure the connection manually.
Database Type	ClickHouse
Access Method	Alibaba Cloud Instance
Instance Region	Region of the destination ApsaraDB for ClickHouse cluster
Replicate Data Across Alibaba Cloud Accounts	No (same account)
Cluster Type	Select based on your needs
Cluster ID	ID of the destination cluster
Database Account	Database account. See Required permissions.
Database Password	Password for the database account

Click Test Connectivity and Proceed.
DTS server IP address blocks must be added to the security settings of both databases before the connectivity test passes. This can be done automatically or manually. See Add the IP address blocks of DTS servers to a whitelist. If the source or destination is a self-managed database, also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.

Step 3: Configure task objects

On the Configure Objects page, configure the synchronization settings.

Synchronization types:

Option	Description
Schema Synchronization	DTS creates the target table schema in ClickHouse, including the `_sign`, `_is_deleted`, and `_version` fields. Select this if you want DTS to manage the target table structure.
Full Data Synchronization	DTS initializes the destination with a full copy of the source data before starting incremental sync.
Incremental Data Synchronization	Selected by default. DTS continuously replicates changes from the source.

Select all three types for a complete initial load followed by ongoing synchronization.

Processing mode for conflicting tables:

Mode	Behavior
Precheck and Report Errors	Fails the precheck if a table with the same name already exists in the destination. Use this to avoid accidental overwrites. To resolve name conflicts without deleting the destination table, see Map table and column names.
Ignore Errors and Proceed	Skips the duplicate table check. If schemas match and primary keys conflict: full sync retains the destination record; incremental sync overwrites it with the source record. If schemas differ, partial or complete sync failure may occur.

Capitalization of Object Names in Destination Instance:

Configure the case sensitivity policy for database, table, and column names synchronized to the destination. The default is DTS default policy. You can also select the default policies of the source or destination database. See Case sensitivity policy for destination object names.

Object selection:

In the Source Objects box, click an object to synchronize, then click to move it to Selected Objects. You can select objects at the schema or table level.

In the Selected Objects box:

Right-click an object to rename it or map it to a different destination object. See Map table and column names.
Right-click a table to set a WHERE clause filter. See Set filter conditions.
Right-click an object to select specific SQL operations for incremental synchronization.
Click an object and then click to remove it from the selection.

Click Next: Advanced Settings and configure the options below.

Advanced settings:

Parameter	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules tasks on a shared cluster. For more stable performance, purchase a dedicated cluster. See What is a DTS dedicated cluster?
Retry Time for Failed Connections	Duration DTS retries after a connection failure. Default: 720 minutes. Range: 10–1,440 minutes. Set to 30 minutes or more. If multiple DTS instances share the same source or destination, the shortest retry duration applies to all of them.
Retry Time for Other Issues	Duration DTS retries after non-connection errors such as DDL or DML exceptions. Default: 10 minutes. Range: 1–1,440 minutes. Set to 10 minutes or more. Must be less than Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization	Limits the full sync rate to reduce load on the source and destination. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. You can also adjust the rate after the task starts.
Enable Throttling for Incremental Data Synchronization	Limits the incremental sync rate. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag	Optional tag to identify the instance.
Configure ETL	Enable the extract, transform, and load (ETL) feature to apply data processing logic during sync. See What is ETL? and Configure ETL in a data migration or data synchronization task.
Monitoring and Alerting	Configure alerts for task failures or latency exceeding a threshold. See alert notificationsConfigure monitoring and alerting during task configuration.

Click Next: Configure Database and Table Fields to configure ClickHouse-specific table settings.

ClickHouse table structure settings:

DTS provides a default configuration. Set Definition Status to All to review and modify all tables.

Setting	Description
Type	Table engine type
Primary Key Column	Supports composite keys. Select one or more fields. See CREATE TABLE.
Sort Key	Supports composite keys. Select one or more fields.
Distribution Key	Single field only.
Partition Key	Optional. Must not be a nullable field, or the task will fail. Supports BIGINT, INT, TIMESTAMP, DATETIME, and DATE types only. Select one or more columns from Primary Key Column as the partition key. For the calculation logic, see Partition key calculation logic.

Step 4: Run the precheck

Click Next: Save Task Settings and Precheck.

To view API parameters for the task configuration, hover over the button and click Preview OpenAPI parameters.

DTS runs a precheck before starting the synchronization task. The task starts only after all precheck items pass.

If the precheck fails, click View Details next to the failed item, fix the issue, then rerun the precheck.
If the precheck generates a warning for an item that cannot be ignored, click View Details, fix the issue, then rerun the precheck.
If the precheck generates a warning for an ignorable item, click Confirm Alert Details > Ignore > OK > Precheck Again to skip the warning. Skipping warnings may cause data inconsistency.

Step 5: Purchase the instance

When the Success Rate reaches 100%, click Next: Purchase Instance.

On the Purchase page, configure the instance.

Parameter	Description
Billing Method	Subscription: Pay upfront for 1–9 months or 1, 2, 3, or 5 years. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Release the instance when no longer needed to avoid charges.
Resource Group Configuration	Resource group for the instance. Default: default resource group. See What is Resource Management?
Link Specification	Determines synchronization throughput. Choose based on your data volume and latency requirements. See Data synchronization link specifications.

Read the Data Transmission Service (Pay-as-you-go) Service Terms and click Buy and Start. In the confirmation dialog box, click OK.

You can monitor task progress on the Data Synchronization page.

Limitations

Source database requirements

Bandwidth: The source database server must have outbound bandwidth of at least 100 Mbit/s. Lower bandwidth reduces synchronization speed.
Table structure: Tables to synchronize must have primary keys or UNIQUE constraints with unique fields. Otherwise, duplicate data may appear in the destination. If the destination table is not created by DTS (that is, Synchronization Types does not include Schema Synchronization), the destination table must have the same primary key or non-null UNIQUE constraint as the source table.
Database name: Database names cannot contain hyphens (-). For example, dts-testdata cannot be synchronized.
Unsupported objects: DTS does not synchronize TimescaleDB extension tables, tables with cross-schema inheritance, or tables with expression-based unique indexes.
Large table-level tasks: If table-level synchronization with object editing covers more than 5,000 tables in a single task, split them into multiple tasks or synchronize at the database level instead.
WAL retention:
- For incremental-only tasks: retain WAL logs for more than 24 hours.
- For full + incremental tasks: retain WAL logs for at least 7 days. After the initial full sync completes, you can reduce the retention to more than 24 hours.
- If the retention period is shorter than required and the task fails or data inconsistency occurs, the issue is not covered by the DTS Service-Level Agreement (SLA).
WAL configuration: Set wal_level to logical and enable Logical Replication Slot Failover for ApsaraDB RDS for PostgreSQL. This prevents logical replication interruptions during a failover.
DDL restrictions: Do not run DDL operations that change the schema of databases or tables during schema synchronization or full data synchronization. This causes the task to fail.
During full data synchronization, DTS queries the source database and acquires metadata locks, which may block DDL operations on the source.
Operation restrictions:
- Non-standard DDL syntax on synchronization objects may cause task failure or data loss.
- A major engine version upgrade on the source while the synchronization instance is running causes the instance to fail permanently. Reconfigure the synchronization instance.
- If a single incremental change exceeds 256 MB, the synchronization instance may fail permanently. Reconfigure the synchronization instance.
Long-running transactions: Long-running transactions combined with incremental synchronization prevent WAL logs from being cleared until the transactions commit. This may cause insufficient disk space on the source.
Partitioned tables: Include both the parent table and all child tables as synchronization objects. The parent table of a PostgreSQL partitioned table does not store data directly — all data is in the child tables. Missing child tables causes data inconsistency.

ClickHouse-specific requirements

Database limit: A single synchronization task can synchronize data from one database only. Configure separate tasks for each database. The total number of databases must not exceed the ClickHouse limit of 256.
Naming conventions: Database, table, and column names must comply with ClickHouse naming conventions. See Object naming convention limits.
Time-type data range: ClickHouse enforces range limits on time-type data. Values outside the supported range will be incorrect after synchronization. See Time data range limits.
Partition key: The partition key cannot be a nullable field. Only BIGINT, INT, TIMESTAMP, DATETIME, and DATE fields are supported.
Additional fields: During schema synchronization, DTS adds _sign, _is_deleted, and _version fields to the destination table. If you create the destination table manually (without Schema Synchronization), add these fields yourself. See Destination table and field structure.
REPLICA IDENTITY FULL: In the following three cases, run ALTER TABLE schema.table REPLICA IDENTITY FULL; on the tables to synchronize before writing data. This ensures data consistency. Do not run this command while the table is locked. If you skip the related precheck items, DTS runs this command automatically during instance initialization.
- When the synchronization instance starts for the first time.
- When the synchronization granularity is schema, and a new table is created in the schema or an existing table is rebuilt using RENAME.
- When you use the Modify Objects feature.
Replace schema and table with the actual schema and table names. Run this command during off-peak hours.

DTS internal behavior

Temporary tables: DTS creates the following temporary tables in the source database to track DDL statements, table structures, and heartbeat information. Do not delete these tables during synchronization — doing so will break the task. The tables are deleted automatically when the DTS instance is released. public.dts_pg_class, public.dts_pg_attribute, public.dts_pg_type, public.dts_pg_enum, public.dts_postgres_heartbeat, public.dts_ddl_command, public.dts_args_session, public.aliyun_dts_instance
Replication slot: DTS creates a replication slot with the prefix dts_sync_ in the source database to read incremental logs. DTS reads from this slot within 15 minutes. When the task fails or the instance is released, DTS attempts to clean up the replication slot automatically. If you change the database account password or remove the DTS IP address whitelist during synchronization, the replication slot cannot be cleaned up automatically. Clean it up manually to prevent disk space accumulation. If a failover occurs on the source, log on to the secondary database and clean up the replication slot manually.
Resource usage during full sync: Initial full data synchronization consumes read and write resources on both source and destination databases. Run the synchronization during off-peak hours when the CPU load on both databases is below 30%. Initial full sync runs concurrent INSERT operations, so the destination storage may be larger than the source due to table fragmentation.
External writes to destination: If data is written to the destination from a source other than DTS while the synchronization instance is running, data inconsistency may occur and the instance may fail.
Metadata validation: DTS validates data content but not metadata such as sequences. Validate metadata manually.
Task recovery: If the task fails, DTS technical support attempts recovery within 8 hours. This may involve restarting the task or modifying DTS task parameters. Database parameters are not changed. For parameters that may be modified, see Modify instance parameters.

Special cases

ApsaraDB RDS for PostgreSQL source: Do not change the endpoint or zone of the source instance during synchronization. This causes the task to fail.
Self-managed PostgreSQL source: The values of max_wal_senders and max_replication_slots must each be greater than the sum of currently active replication slots and the number of DTS instances to be created for this source.
Google Cloud SQL for PostgreSQL source: The database account must have the cloudsqlsuperuser permission. Select only objects that this account is authorized to manage, or grant the Owner permission for the objects by running:
An account with cloudsqlsuperuser permission cannot manage data owned by another account with cloudsqlsuperuser permission.
```
GRANT <owner_of_the_object_to_be_synchronized> TO <source_database_account_used_by_the_task>;
```

Appendix

Destination table and field structure

Table requirements

If you create the destination table manually (without using Schema Synchronization):

Important

If the destination table specifies an ENGINE, it must be ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Otherwise, data inconsistency may occur.

ClickHouse Community Edition: Create one local table and one distributed table. The distributed table name must match the source table name. The local table name must be <distributed_table_name>_local.
ClickHouse Enterprise Edition: Create a table with the same name as the source table.

DTS control fields

DTS adds three fields to the destination table to track row versions and deletion state. Use these fields in queries to get consistent results:

SELECT * FROM table_name FINAL WHERE _sign > 0;

The FINAL modifier deduplicates rows with the same sort key. The WHERE _sign > 0 condition filters out deleted rows.

How CDC operations map to ClickHouse rows:

ClickHouse uses an append-only model. DTS writes all change data capture (CDC) operations — including UPDATEs and DELETEs — as new versioned rows rather than modifying existing ones. The following example shows how three source operations appear in ClickHouse:

Source operation	Row ID	`_sign`	`_is_deleted`	`_version`	Notes
Initial INSERT (row 5)	5	1	0	1000	Row written at sync time
UPDATE row 5	5	1	0	1001	New row with higher `_version`; previous row retained
DELETE row 5	5	-1	1	1002	New row marking deletion; previous rows retained

Use FINAL in queries to merge these versioned rows and return the latest state of each row. The WHERE _sign > 0 filter excludes deleted rows from results.

Fields added by DTS:

For Community Edition earlier than 23.8:

Field	Data type	Default	Description
`_sign`	Int8	1	DML operation type: INSERT = 1, UPDATE = 1, DELETE = -1
`_version`	UInt64	1	Timestamp when the row was written to ClickHouse

For Enterprise Edition and Community Edition 23.8 and later:

Field	Data type	Default	Description
`_sign`	Int8	1	DML operation type: INSERT = 1, UPDATE = 1, DELETE = -1
`_is_deleted`	UInt8	0	Deletion marker: INSERT = 0, UPDATE = 0, DELETE = 1
`_version`	UInt64	1	Timestamp when the row was written to ClickHouse

Time data range limits

ClickHouse enforces the following limits on time-type data. Values outside these ranges will be incorrect after synchronization.

Data type	Minimum value	Maximum value
Date	1970-01-01 00:00:00	2149-06-06 00:00:00
Date32	1925-01-01 00:00:00	2283-11-11 00:00:00
DateTime	1970-01-01 08:00:00	2106-02-07 14:28:15
DateTime64	1925-01-01 08:00:00	2283-11-12 07:59:59

Partition key calculation logic

DTS calculates partition key values based on the source field type:

Source field type	Calculation
BIGINT	`intDiv(tablePartKey, 18014398509481984)`
INT	`intDiv(tablePartKey, 4194304)`
TIMESTAMP	`toYYYYMM(tablePartKey)`
DATETIME	`toYYYYMM(tablePartKey)`
DATE	`toYYYYMM(tablePartKey)`

Data type mappings

ApsaraDB RDS for PostgreSQL and ApsaraDB for ClickHouse do not have a one-to-one data type mapping. During initial schema synchronization, DTS maps types based on what the destination supports. See Data type mappings for schema synchronization.