Data Transmission Service (DTS) supports synchronizing data from a PostgreSQL database to an ApsaraDB for ClickHouse cluster. This topic uses an ApsaraDB RDS for PostgreSQL instance as an example to describe the procedure.
Prerequisites
An ApsaraDB for ClickHouse cluster is created. The storage space of the cluster must be larger than the data volume that you want to synchronize.
Note: For information about the supported versions of the source and destination databases, see Synchronization solution overview.
The wal_level parameter of the source ApsaraDB RDS for PostgreSQL instance is set to logical. For more information, see Set instance parameters.
Precautions
Source database limits
Bandwidth limit: The server where the source database resides must have an outbound bandwidth of at least 100 Mbit/s. Otherwise, the data synchronization speed is affected.
Synchronization object limits:
The tables to be synchronized must have primary keys or UNIQUE constraints, and the constrained columns must contain unique values. Otherwise, duplicate data may exist in the destination database.
Note: If the destination table is not created by DTS (that is, Synchronization Types is not set to Schema Synchronization), you must ensure that the table has the same primary key or non-null UNIQUE constraint as the source table. Otherwise, duplicate data may occur in the destination database.
The name of the database to be synchronized cannot contain hyphens (-). For example, a database named dts-testdata cannot be synchronized.
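As a quick illustration of the naming restriction above, you can check database names before you configure the task. This is a hypothetical helper, not part of DTS:

```python
def can_synchronize_database(name: str) -> bool:
    """Return False if the database name contains a hyphen (-),
    which DTS does not support for this synchronization scenario."""
    return "-" not in name

# A database named dts-testdata cannot be synchronized:
print(can_synchronize_database("dts-testdata"))  # False
print(can_synchronize_database("dts_testdata"))  # True
```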
DTS does not synchronize TimescaleDB extension tables, tables with cross-schema inheritance, or tables that contain expression-based unique indexes.
If you synchronize data at the table level and need to edit objects, such as mapping column names, and the number of tables in a single synchronization task exceeds 5,000, split the tables into multiple tasks or configure a task to synchronize the entire database. Otherwise, a request error may be reported after you submit the task.
Write-ahead log (WAL):
WAL must be enabled. Set the wal_level parameter to logical.
For an incremental synchronization task, DTS requires that the WAL logs in the source database are retained for more than 24 hours. For a task that performs both full and incremental synchronization, DTS requires that the WAL logs are retained for at least 7 days. You can change the log retention period to more than 24 hours after the initial full data synchronization is complete. If the task fails because DTS cannot obtain the required WAL logs, or in extreme cases, data inconsistency or data loss occurs, the issue is not covered by the DTS Service-Level Agreement (SLA) because the specified log retention period is shorter than required.
Operation limits:
If you perform DDL changes that use non-standard syntax on the synchronization objects, the synchronization task may fail or data may be lost.
If you perform a major engine version upgrade on the source database while the synchronization instance is running, the instance fails and cannot be recovered. You must reconfigure the synchronization instance.
Due to the limits of logical subscription in the source database, if a single piece of data to be synchronized exceeds 256 MB after an incremental change, the synchronization instance may fail and cannot be recovered. You must reconfigure the synchronization instance.
During schema synchronization and full data synchronization, do not perform Data Definition Language (DDL) operations that change the schema of databases or tables. Otherwise, the data synchronization task fails.
Note: During the full synchronization phase, DTS queries the source database, which acquires metadata locks. This may block DDL operations on the source database.
To ensure that the synchronization task runs properly and to prevent logical replication interruptions during a failover, you must enable Logical Replication Slot Failover for ApsaraDB RDS for PostgreSQL.
If the source database has long-running transactions and the instance includes an incremental synchronization task, the write-ahead logs (WALs) generated before the long-running transactions are committed cannot be cleared and may accumulate. This can cause the disk space of the source database to become insufficient.
Other limits
A single data synchronization task can synchronize data from only one database. To synchronize data from multiple databases, you must configure a data synchronization task for each database.
Time-type data in ApsaraDB for ClickHouse has a range limit. If the time value to be synchronized is outside this range, the value synchronized to ApsaraDB for ClickHouse will be incorrect. For more information about the range limit, see Time information.
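The range check described above can be sketched as follows. The bounds come from the Time information appendix of this topic; the helper itself is hypothetical and only illustrates the validation you may want to run before synchronization:

```python
from datetime import datetime

# Bounds taken from the "Time information" appendix of this topic.
CLICKHOUSE_RANGES = {
    "Date": (datetime(1970, 1, 1), datetime(2149, 6, 6)),
    "DateTime": (datetime(1970, 1, 1, 8), datetime(2106, 2, 7, 14, 28, 15)),
}

def in_clickhouse_range(col_type: str, value: datetime) -> bool:
    """Check whether a source time value fits the destination type's range.
    Values outside the range are synchronized incorrectly."""
    lo, hi = CLICKHOUSE_RANGES[col_type]
    return lo <= value <= hi

print(in_clickhouse_range("DateTime", datetime(1969, 12, 31, 23, 59)))  # False
```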
The Partition Key cannot be a nullable field. Otherwise, the synchronization task will fail.
Note: Only fields of the BIGINT, INT, TIMESTAMP, DATETIME, and DATE types are supported as partition keys.
The number of databases to be synchronized must not exceed the limit of ApsaraDB for ClickHouse, which is 256.
The names of the databases, tables, and columns to be synchronized must comply with the naming conventions of ApsaraDB for ClickHouse. For more information about the naming conventions, see Object naming convention limits.
During the schema synchronization phase, DTS adds the _sign, _is_deleted, and _version fields to the target table. If you do not select Schema Synchronization when you configure the Synchronization Types, you must manually create the target table and add these additional fields. For more information, see Table and field information.
In the following three scenarios, you must run the ALTER TABLE schema.table REPLICA IDENTITY FULL; command on the tables to be synchronized before you write data to them. This ensures data consistency. Do not perform table locking operations while this command is running. Otherwise, the tables may be locked. If you skip the related check items in the precheck, DTS automatically runs this command during the initialization of the instance.
When the instance runs for the first time.
When the synchronization granularity is schema, and a new table is created in the schema to be synchronized or a table to be synchronized is rebuilt using the RENAME command.
When you use the Modify Objects feature.
Note: In the command, replace schema and table with the names of the schema and table to which the data to be synchronized belongs. Perform this operation during off-peak hours.
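When many tables are involved, the command above can be generated per table. The helper below is a hypothetical sketch; executing the statements (for example, through psql) is left to you:

```python
def replica_identity_full_sql(tables):
    """Build an ALTER TABLE ... REPLICA IDENTITY FULL; statement
    for each (schema, table) pair to be synchronized."""
    return [
        f'ALTER TABLE "{schema}"."{table}" REPLICA IDENTITY FULL;'
        for schema, table in tables
    ]

for stmt in replica_identity_full_sql([("public", "orders"), ("sales", "line_items")]):
    print(stmt)
```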
DTS creates the following temporary tables in the source database to obtain the DDL statements of incremental data (the DDL statements are not written to the destination database), the structure of incremental tables, and heartbeat information. During synchronization, do not delete these temporary tables. Otherwise, the DTS task becomes abnormal. The temporary tables are automatically deleted after the DTS instance is released.
The temporary tables are: public.dts_pg_class, public.dts_pg_attribute, public.dts_pg_type, public.dts_pg_enum, public.dts_postgres_heartbeat, public.dts_ddl_command, public.dts_args_session, and public.aliyun_dts_instance.
During data synchronization, DTS creates a replication slot with the prefix dts_sync_ in the source database to replicate data. DTS uses this replication slot to obtain incremental logs from the source database within 15 minutes. When the data synchronization fails or the synchronization instance is released, DTS attempts to automatically clear this replication slot.
Note: If you change the password of the database account used by the task or delete the DTS IP address whitelist from the source database during data synchronization, the replication slot cannot be automatically cleared. In this case, you must manually clear the replication slot in the source database to prevent it from accumulating and occupying disk space, which can make the source database unavailable.
If a failover occurs on the source database, you must log on to the secondary database to manually clear the replication slot.
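For the manual cleanup described above, a statement like the following can be used. pg_replication_slots and pg_drop_replication_slot() are standard PostgreSQL; the dts_sync_ prefix comes from this topic, and the active = false filter is an added safety assumption so that only unused slots are dropped. Run it only after the DTS instance is released or has failed permanently (on the secondary database after a failover):

```python
# Sketch of the cleanup SQL, carried as a string for illustration.
CLEANUP_SQL = """\
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'dts_sync_%' AND active = false;"""

print(CLEANUP_SQL)
```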

During initial full data synchronization, DTS consumes read and write resources on the source and destination databases, which may increase the database load. Therefore, evaluate the performance of your source and destination databases before you synchronize data. We recommend that you perform the synchronization during off-peak hours, for example, when the CPU load of both databases is below 30%.
Initial full data synchronization runs concurrent INSERT operations. This causes table fragmentation in the destination database. As a result, the storage space occupied by the tables in the destination database is larger than that in the source database after the initial full data synchronization is complete.
While the synchronization instance is running, if data is written to the destination database from a data source other than DTS, data inconsistency may occur between the source and destination databases. The synchronization instance may also fail.
DTS validates data content but does not validate metadata such as sequences. You must validate the metadata yourself.
If the task fails, DTS technical support will attempt to recover it within 8 hours. During the recovery process, operations such as restarting the task or adjusting its parameters may be performed.
Note: When parameters are adjusted, only DTS task parameters are modified. Database parameters remain unchanged. The parameters that may be modified include but are not limited to those described in Modify instance parameters.
When you synchronize partitioned tables, you must include both the parent table and its child tables as synchronization objects. Otherwise, data inconsistency may occur in the partitioned table.
Note: The parent table of a PostgreSQL partitioned table does not directly store data. All data is stored in the child tables. The synchronization task must include the parent table and all its child tables. Otherwise, data in the child tables may not be synchronized, leading to data inconsistency between the source and destination.
Special cases
When the source instance is an ApsaraDB RDS for PostgreSQL instance
During synchronization, do not change the endpoint or zone of the ApsaraDB RDS for PostgreSQL instance. Otherwise, the synchronization fails.
When the source instance is a self-managed PostgreSQL database
Make sure that the values of the max_wal_senders and max_replication_slots parameters are greater than the sum of the number of replication slots in use and the number of DTS instances to be created with this self-managed PostgreSQL database as the source.
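The requirement above says both parameters must be strictly greater than the sum, so the minimum valid setting can be computed as follows. This is a hypothetical helper for planning, not a DTS utility:

```python
def min_replication_setting(slots_in_use: int, planned_dts_instances: int) -> int:
    """Smallest valid value for max_wal_senders and max_replication_slots:
    strictly greater than slots in use plus planned DTS instances."""
    return slots_in_use + planned_dts_instances + 1

# With 4 slots in use and 2 planned DTS instances, set both parameters to at least 7:
print(min_replication_setting(4, 2))  # 7
```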
When the source instance is Google Cloud Platform Cloud SQL for PostgreSQL, the Database Account for the source database must have the cloudsqlsuperuser permission. When you select synchronization objects, you must select objects that this account is authorized to manage, or grant the Owner permission for the objects to be synchronized to this account. For example, run the GRANT <owner_of_the_object_to_be_synchronized> TO <source_database_account_used_by_the_task>; command to allow this account to perform related operations as the object owner.
Note: An account with the cloudsqlsuperuser permission cannot manage data whose owner is another account that also has the cloudsqlsuperuser permission.
Billing
Synchronization type | Pricing |
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations for incremental synchronization
Operation type | SQL statement |
DML | INSERT, UPDATE, DELETE |
DDL |
|
Data type mappings
ApsaraDB RDS for PostgreSQL instances and ApsaraDB for ClickHouse clusters support different data types that do not have a one-to-one mapping. Therefore, during initial schema synchronization, DTS performs type mapping based on the data types supported by the destination database.
Permissions required for database accounts
Database | Required permissions | Creation and authorization method |
Source ApsaraDB RDS for PostgreSQL instance | A privileged account that is the owner of the database to be synchronized. | |
Destination ApsaraDB for ClickHouse cluster |
|
Procedure
Navigate to the list of synchronization tasks in the destination region. You can use one of the following methods:
Go to the page from the DTS console
Log on to the Data Transmission Service (DTS) console.
In the left navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region where the synchronization instance resides.
Go to the page from the DMS console
Note: The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS UI.
Log on to Data Management (DMS).
On the top menu bar, choose .
To the right of Data Synchronization Tasks, select the region where the synchronization instance is located.
Click Create Task to go to the task configuration page.
Configure the source and destination databases.
Category
Configuration
Description
None
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source Database
Select Existing Connection
If you want to use a database instance that is added to the system (newly created or saved), select the database instance from the drop-down list. The database information is automatically configured.
Note: In the DMS console, this configuration item is named Select a DMS database instance.
If you have not added the database instance to the system, or you do not need to use an instance that is already added, you must manually configure the following database information.
Database Type
Select PostgreSQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the source ApsaraDB RDS for PostgreSQL instance resides.
Replicate Data Across Alibaba Cloud Accounts
In this example, a database instance that belongs to the current Alibaba Cloud account is used. Select No.
Instance ID
Select the ID of the source ApsaraDB RDS for PostgreSQL instance.
Database Name
Enter the name of the database in the source ApsaraDB RDS for PostgreSQL instance that contains the objects to be synchronized.
Database Account
Enter the database account of the source ApsaraDB RDS for PostgreSQL instance. For more information about the permission requirements, see Permissions required for database accounts.
Database Password
Enter the password that corresponds to the database account.
Destination Database
Select Existing Connection
If you want to use a database instance that is added to the system (newly created or saved), select the database instance from the drop-down list. The database information is automatically configured.
Note: In the DMS console, this configuration item is named Select a DMS database instance.
If you have not added the database instance to the system, or you do not need to use an instance that is already added, you must manually configure the following database information.
Database Type
Select ClickHouse.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the destination ApsaraDB for ClickHouse cluster resides.
Replicate Data Across Alibaba Cloud Accounts
In this example, a database instance that belongs to the current Alibaba Cloud account is used. Select No.
Cluster Type
You can select the type of ApsaraDB for ClickHouse cluster based on your specific needs.
Cluster ID
Select the ID of the destination ApsaraDB for ClickHouse cluster.
Database Account
Enter the database account of the destination ApsaraDB for ClickHouse cluster. For more information about the permission requirements, see Permissions required for database accounts.
Database Password
Enter the password that corresponds to the database account.
After you have completed the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note: Ensure that the IP address blocks of DTS servers are added to the security settings of the source and destination databases to allow access from DTS servers. This can be done automatically or manually. For more information, see Add the IP address blocks of DTS servers to a whitelist.
If the source or destination database is a self-managed database (where the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Configure task objects.
On the Configure Objects page, you can configure the objects to be synchronized.
Configuration
Description
Synchronization Types
Incremental Data Synchronization is selected by default. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS initializes the full data of the objects to be synchronized from the source instance to the destination cluster, which serves as the baseline data for subsequent incremental synchronization.
Processing Mode of Conflicting Tables
Precheck and Report Errors: Checks whether a table with the same name exists in the destination database. If a table with the same name does not exist, the precheck passes. If a table with the same name exists, an error is reported during the precheck, and the data synchronization task does not start.
Note: If you cannot delete or rename the table with the same name in the destination database, you can map it to a different table name. For more information, see Map table and column names.
Ignore Errors and Proceed: Skips the check for duplicate table names in the destination database.
Warning: Selecting Ignore Errors and Proceed may cause data inconsistency and put your business at risk. For example:
If the table schemas are the same and a record in the destination database has the same primary key or unique key value as a record in the source database:
During full data synchronization, DTS retains the record in the destination cluster. The corresponding record from the source database is not synchronized.
During incremental data synchronization, the record from the source database overwrites the record in the destination database.
If the table schemas are different, initial data synchronization may fail, data from only some columns may be synchronized, or the entire synchronization may fail. Proceed with caution.
Capitalization of Object Names in Destination Instance
You can configure the case sensitivity policy for database, table, and column object names that are synchronized to the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policies of the source and destination databases. For more information, see Case sensitivity policy for destination object names.
Source Objects
In the Source Objects box, click the objects to synchronize, and then click the icon to move them to the Selected Objects box.
Note: In the Source Objects box, you can select objects for synchronization at the schema or table level.
Selected Objects
To set the name of a synchronization object in the destination instance or specify which object receives the data, right-click the synchronization object in the Selected Objects box and modify it. For more information, see Map table and column names.
To remove a synchronization object, click it in the Selected Objects box, and then click the icon to move it back to the Source Objects box.
Note: If you use the object name mapping feature, the synchronization of other objects that depend on the mapped object may fail.
To set a WHERE clause to filter data, right-click the table to synchronize in the Selected Objects box. In the dialog box that appears, set the filter condition. For more information, see Set filter conditions.
To select SQL operations for incremental synchronization, right-click the synchronization object in the Selected Objects box and select the desired operations from the dialog box that appears.
Click Next: Advanced Settings.
Configuration
Description
Dedicated Cluster for Task Scheduling
By default, DTS schedules tasks on a shared cluster, and you do not need to select a cluster. For more stable performance, you can purchase a dedicated cluster to run DTS synchronization tasks. For more information, see What is a DTS dedicated cluster?.
Retry Time for Failed Connections
After a synchronization task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can also specify a custom retry duration from 10 to 1,440 minutes. We recommend that you set the duration to 30 minutes or more. If DTS successfully reconnects to the database within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails.
Note: If you have multiple DTS instances (for example, Instance A and Instance B) that share the same source or destination, and you set the network retry time to 30 minutes for Instance A and 60 minutes for Instance B, the shorter duration of 30 minutes is used for both.
Because DTS charges for task runtime during the connection retry period, we recommend that you customize the retry duration based on your business needs or release the DTS instance as soon as possible after the source and destination database instances are released.
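The shortest-duration rule from the note above can be expressed simply. This is an illustrative sketch, not DTS behavior you can configure:

```python
def effective_retry_minutes(*instance_settings: int) -> int:
    """When DTS instances share a source or destination, the shortest
    configured retry duration applies to all of them."""
    return min(instance_settings)

# Instance A set to 30 minutes, Instance B set to 60 minutes:
print(effective_retry_minutes(30, 60))  # 30
```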
Retry Time for Other Issues
After the synchronization task starts, if other non-connectivity issues occur with the source or destination database (such as DDL or DML execution exceptions), DTS reports an error and immediately starts continuous retry operations. The default retry duration is 10 minutes. You can also customize the retry duration within the range of 1 to 1,440 minutes. We recommend that you set it to 10 minutes or more. If the relevant operations are successful within the set retry duration, the synchronization task automatically resumes. Otherwise, the task fails.
Important: The value for Retry Time for Other Issues must be less than that for Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization
During the full synchronization phase, DTS uses read and write resources from the source and destination databases, which can increase the database load. To reduce the load on the destination database, you can set a rate limit for the full synchronization task by configuring the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters.
Note: This configuration item is available only when Synchronization Types is set to Full Data Synchronization.
You can also adjust the full synchronization rate after the synchronization instance is running.
Enable Throttling for Incremental Data Synchronization
You can also set a rate limit for the incremental synchronization task. To relieve pressure on the destination database, set the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag
You can select an environment tag to identify the instance as needed. In this example, no tag is selected.
Configure ETL
Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:
Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task.
No: Disables the ETL feature.
Monitoring and Alerting
Specify whether to configure alerts. If the synchronization fails or the latency exceeds the specified threshold, a notification is sent to an alert contact.
No: No alerts are configured.
Yes: Sets an alert. You also need to specify the alert threshold and alert notifications. For more information, see Configure monitoring and alerting during task configuration.
Click Next: Configure Database and Table Fields to configure the Type, Primary Key Column, Sort Key, Distribution Key, and Partition Key for the tables to synchronize to ClickHouse.
DTS provides a default configuration. You can set Definition Status to All to modify the configuration.
The Primary Key Column and Sort Key can be composite keys. You can select multiple fields from the drop-down lists to define the Primary Key Column or Sort Key. You must also select one or more columns from the Primary Key Column as the Partition Key. Only one field can be selected as the Distribution Key. For more information about primary key columns, sort keys, and partition keys, see CREATE TABLE.
Note: The Partition Key is optional, but it cannot be a nullable field. Otherwise, the synchronization task will fail.
The partition key supports only fields of the BIGINT, INT, TIMESTAMP, DATETIME, and DATE data types. For the calculation logic, see Calculation logic for partition keys.
Save the task and run a precheck.
To view the API parameters for configuring this instance, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the bubble.
If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note: Before the synchronization job starts, DTS runs a precheck. The job can start only after all precheck items are passed.
If the precheck fails, click View Details for the failed item. Fix the issue as prompted, and then run the precheck again.
If the precheck generates a warning:
If a check item fails and cannot be ignored, click View Details next to the item. Follow the instructions to fix the issue, and then run the precheck again.
For check items that can be ignored, you can click Confirm Alert Details, Ignore, OK, and Precheck Again in sequence to skip the warning and rerun the precheck. Ignoring a warning item may cause issues such as data inconsistency and pose risks to your business.
Purchase the instance.
When the Success Rate is 100%, click Next: Purchase Instance.
On the Purchase page, select the billing method and link specification for the data synchronization instance. The following table describes these options in detail.
Category
Parameter
Description
New Instance Class
Billing Method
Subscription: You pay when you create the instance. This is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription duration, the higher the discount.
Pay-as-you-go: You are charged on an hourly basis. This is suitable for short-term needs. You can release the instance immediately after use to save costs.
Resource Group Configuration
The resource group to which the instance belongs. The default is default resource group. For more information, see What is Resource Management?.
Link Specification
DTS provides synchronization specifications with different performance levels. The synchronization link specification affects the synchronization rate. You can choose a specification based on your business scenario. For more information, see Data synchronization link specifications.
Subscription Duration
In subscription mode, select the duration and quantity for the subscription instance. You can choose a monthly subscription from 1 to 9 months, or a yearly subscription of 1, 2, 3, or 5 years.
Note: This option appears only when the billing method is Subscription.
After you complete the configuration, read and select the check box for the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the OK dialog box, click OK.
You can view the task progress on the Data Synchronization page.
Appendix
Time information
Data type | Minimum value | Maximum value |
Date | 1970-01-01 00:00:00 | 2149-06-06 00:00:00 |
Date32 | 1925-01-01 00:00:00 | 2283-11-11 00:00:00 |
DateTime | 1970-01-01 08:00:00 | 2106-02-07 14:28:15 |
DateTime64 | 1925-01-01 08:00:00 | 2283-11-12 07:59:59 |
Table and field information
Table information
If you do not use the object name mapping feature, the tables you create must meet the following requirements.
If the destination table includes an ENGINE, it must be ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Otherwise, data inconsistency may occur.
ClickHouse Community Edition instance: You must create one local table and one distributed table. The name of the distributed table must be the same as the source table name. The name of the local table must be <distributed_table_name>_local.
ClickHouse Enterprise Edition instance: You must create a table with the same name as the source table.
Field information
In a ClickHouse instance, you can run the select * from table_name final where _sign>0; statement to query data. The where condition filters out deleted data, and the final keyword after the table name filters out data with the same sort key.
Version | Name | Data type | Default value | Description |
Community Edition earlier than 23.8 | _sign | Int8 | 1 | The type of DML operation.
|
_version | UInt64 | 1 | The timestamp when the data was written to ClickHouse. | |
Enterprise Edition and Community Edition 23.8 and later | _sign | Int8 | 1 | The type of DML operation.
|
_is_deleted | UInt8 | 0 | Indicates whether the record is deleted:
| |
_version | UInt64 | 1 | The timestamp when the data was written to ClickHouse. |
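Putting the requirements above together, the manually created table might look like the sketch below. It uses the field set for Enterprise Edition and Community Edition 23.8 and later from the table above and the engine from the table requirements; the table name, business columns, and sort key choice are illustrative assumptions, so adapt them to your schema:

```python
def manual_target_table_ddl(table: str, columns: dict) -> str:
    """Sketch the DDL for a target table you create yourself when
    Schema Synchronization is not selected. Adds the _sign,
    _is_deleted, and _version fields that DTS expects."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns.items())
    extras = ("_sign Int8 DEFAULT 1,\n    "
              "_is_deleted UInt8 DEFAULT 0,\n    "
              "_version UInt64 DEFAULT 1")
    sort_key = next(iter(columns))  # illustrative: first column as sort key
    return (
        f"CREATE TABLE {table} (\n    {cols},\n    {extras}\n)"
        f" ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted)"
        f" ORDER BY {sort_key};"
    )

print(manual_target_table_ddl("orders_local", {"id": "Int64", "amount": "Float64"}))
```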
Calculation logic for partition keys
Source field type | Partition key calculation logic |
BIGINT |
|
INT |
|
TIMESTAMP |
|
DATETIME | |
DATE |