Migrate data from PostgreSQL to an ApsaraDB for ClickHouse cluster - Data Transmission Service

Data Transmission Service (DTS) lets you migrate data from a PostgreSQL database to an ApsaraDB for ClickHouse cluster. This topic describes how to migrate data from a source ApsaraDB RDS for PostgreSQL instance.

Prerequisites

You have created a target ClickHousecluster with storage greater than the size of the source data.
Note
For more information about the supported versions of the source and destination databases, see Migration solutions.
If you want to perform incremental data migration, you must set the wal_level parameter of the source ApsaraDB RDS for PostgreSQL instance to logical. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.

Notes

Source database limits

Bandwidth limit: The server that hosts the source database must have sufficient outbound bandwidth of 100 Mb/s or more. Otherwise, the data migration speed is affected.
Migration object limits:
- Tables to be migrated must have primary keys or UNIQUE constraints, and the fields must be unique. Otherwise, data duplication may occur in the destination database.
  Note
  If the destination table is not created by DTS (that is, you did not select Schema Migration for Migration Types), you must ensure that the table has the same primary key or a non-empty UNIQUE constraint as the source table. Otherwise, duplicate data may appear in the destination database.
  The name of the database to be migrated cannot contain a hyphen (-), such as dts-testdata.
- DTS does not support the migration of TimescaleDB extension tables, tables with cross-schema inheritance, or tables with expression-based unique indexes.
- If you migrate objects at the table level and need to edit them, for example, by mapping column names, a single data migration task supports a maximum of 1,000 tables. If you exceed this limit, an error is reported after you submit the task. In this case, split the tables into multiple migration tasks or configure a task to migrate the entire database.
For incremental migration, the write-ahead log (wal):
- Must be enabled. Set the wal_level parameter to logical.
- For an incremental migration task, DTS requires that WAL logs in the source database are retained for more than 24 hours. For a task that includes both full migration and incremental migration, DTS requires that WAL logs are retained for at least 7 days. You can change the log retention period to more than 24 hours after the full migration is complete. Otherwise, the task may fail because DTS cannot obtain the WAL logs. In extreme cases, this may cause data inconsistency or data loss. Issues caused by a log retention period shorter than the one required by DTS are not covered by the DTS Service-Level Agreement (SLA).
Operation limits:
- If you perform a DDL change on a migration object using non-standard syntax, the data migration task may fail or data may be lost.
- If you perform a major engine version upgrade on the source database while the DTS instance is running, the instance fails and cannot be recovered. You must reconfigure the DTS instance.
- During full migration, do not perform DDL operations to change the schema of the database or tables. Otherwise, the data migration task fails.
- Due to the limits of logical replication in the source database, if a single piece of incremental data to be migrated exceeds 256 MB during the migration, the DTS instance may fail and cannot be recovered. You must reconfigure the DTS instance.
- If you perform only full data migration, do not write new data to the source database. Otherwise, data inconsistency occurs between the source and destination databases. To ensure real-time data consistency, select Full Data Migration and Incremental Data Migration.
To ensure the migration task runs as expected and to prevent logical replication interruptions caused by failover, RDS for PostgreSQL must support and enable Logical Replication Slot Failover. For information about how to set this up, see Logical Replication Slot Failover.
If the source database has long-running transactions and the instance performs incremental migration, the write-ahead logs (WALs) before the transaction commit may not be cleared. This can cause the logs to accumulate and lead to insufficient disk space in the source database.

Other limits

A single data migration task can migrate data from only one database. To migrate data from multiple databases, you must configure a separate data migration task for each database.
Time-type data in ApsaraDB for ClickHouse has a range limit. If the time value to be migrated is outside this range, the time value migrated to ApsaraDB for ClickHouse will be incorrect. For more information about the range limit, see Time information.
You cannot select nullable fields for the Partition Key. Otherwise, the migration task fails.
Note
The partition key supports only fields of the BIGINT, INT, TIMESTAMP, DATETIME, and DATE types.
The number of databases to be migrated must not exceed the ApsaraDB for ClickHouse limit of 256.
The names of the databases, tables, and columns to be migrated must comply with the naming conventions of ApsaraDB for ClickHouse. For more information about the conventions, see Object naming conventions.
During the schema migration phase, DTS adds the _sign, _is_deleted, and _version columns to the target table. If you do not select Schema Migration when you configure the Migration Types, you must manually create a table in the destination and add the required columns. For more information about the table and column requirements, see Table and column information.
If the DTS instance performs an incremental data migration task, you must run the ALTER TABLE schema.table REPLICA IDENTITY FULL; command on the tables to be migrated in the source database before you write data to them. This applies to the following two scenarios and ensures data consistency. During the execution of this command, we recommend that you do not perform table lock operations. Otherwise, the tables may be locked. If you skip the relevant check in the precheck, DTS automatically runs this command during the instance initialization.
- When the instance runs for the first time.
- When the migration object granularity is Schema, and a new table is created in the schema to be migrated or a table to be migrated is rebuilt using the RENAME command.
Note
- In the command, replace schema and table with the schema name and table name of the data to be migrated.
- We recommend that you perform this operation during off-peak hours.
DTS creates the following temporary tables in the source database to obtain DDL statements for incremental data, the structure of incremental tables, and heartbeat information. Do not delete these temporary tables during migration. Otherwise, the DTS task becomes abnormal. The temporary tables are automatically deleted after the DTS instance is released.
public.dts_pg_class, public.dts_pg_attribute, public.dts_pg_type, public.dts_pg_enum, public.dts_postgres_heartbeat, public.dts_ddl_command, public.dts_args_session, and public.aliyun_dts_instance.
During incremental data migration, DTS creates a replication slot with the prefix dts_sync_ in the source database to replicate data. Using this replication slot, DTS can obtain incremental logs from the source database within the last 15 minutes. When the data migration fails or the migration instance is released, DTS attempts to automatically clear this replication slot.
Note
- If you change the password of the source database account used by the task or delete the DTS IP address from the whitelist of the source database during data migration, the replication slot cannot be automatically cleared. In this case, you must manually clear the replication slot in the source database to prevent it from accumulating and occupying disk space, which can make the source database unavailable.
- If a failover occurs in the source database, you must log on to the secondary database to manually clear the slot.
During full data migration, DTS consumes some read and write resources on the source and destination databases, which may increase the database load. Therefore, we recommend that you evaluate the performance of the source and destination databases before you migrate data and perform the migration during off-peak hours. For example, you can perform the migration when the CPU load of the source and destination databases is below 30%.
Because a full migration involves concurrent INSERT operations, fragmentation occurs in the destination database tables. As a result, the tables in the destination database use more storage space than the tables in the source database after the full migration is complete.
If data is written to the destination database from a data source other than DTS while the migration instance is running, data inconsistency may occur between the source and destination databases, and the migration instance may even fail.
DTS validates data content but does not currently support the validation of metadata such as Sequences. You must validate this metadata yourself.
If the task fails, DTS technical support will attempt to recover it within 8 hours. During the recovery process, operations such as restarting the task or adjusting its parameters may be performed.
Note
When parameters are adjusted, only DTS task parameters are modified. Database parameters remain unchanged.The parameters that may be modified include but are not limited to those described in Modify instance parameters.
When you migrate partitioned tables, you must include both the parent table and its child partitions as synchronization objects. Otherwise, data inconsistency may occur in the partitioned table.
Note
The parent table of a PostgreSQL partitioned table does not directly store data. All data is stored in the child partitions. The sync task must include the parent table and all its child partitions. Otherwise, data in the child partitions may not be synchronized, leading to data inconsistency between the source and destination.

Special cases

When the source instance is an RDS for PostgreSQL instance, do not change its endpoint or zone during migration. Otherwise, the migration fails.
When the source instance is a self-managed PostgreSQL database, make sure that the values of the `max_wal_senders` and `max_replication_slots` parameters are both greater than the sum of the number of replication slots currently in use and the number of DTS instances to be created with this database as the source.
If the source instance is Google Cloud Platform Cloud SQL for PostgreSQL, you must specify an account that has the cloudsqlsuperuser permission for the Database Account of the source database. When you select migration objects, you must select objects that this account has permission to manage, or grant this account the Owner permission for the objects to be migrated (for example, by running the GRANT <owner_of_objects_to_migrate> TO <source_db_account_for_task> command to allow this account to execute related operations as the owner of the objects).
Note
An account with `cloudsqlsuperuser` permission cannot manage data owned by another account with `cloudsqlsuperuser` permission.

Billing

Migration type	Link Configuration Fees	Data transfer cost
Schema migration and full data migration	Free of charge.	This example is free of charge.
Incremental data migration	Charged. For more information, see Billing overview.	This example is free of charge.

SQL operations supported for incremental migration

Operation type	SQL operation
DML	INSERT, UPDATE, DELETE
DDL	CREATE TABLE, DROP TABLE, TRUNCATE TABLE ADD COLUMN, MODIFY COLUMN, DROP COLUMN

Data type mappings

The data types supported by ApsaraDB RDS for PostgreSQL instances and ApsaraDB for ClickHouse clusters are different, so a one-to-one mapping is not always possible. During initial schema synchronization, Data Transmission Service (DTS) maps the data types based on the types supported by the destination database.

Permissions required for database accounts

Database	Required permissions	How to create an account and grant permissions
Source ApsaraDB RDS for PostgreSQL instance	A privileged account that owns the database to be migrated.	Create an account and Create a database.
Target ApsaraDB for ClickHouse cluster	For versions 22.8 and later: Read and write permissions on the destination database. A privileged account has the required permissions. For version 21.8: Read, write, and settings permissions, and permission to perform DDL operations.	Community-compatible Edition Account Management.

Procedure

Go to the migration task list page of the destination region. You can use one of the following methods.
From the DTS console
1. Log on to the Data Transmission Service (DTS) console.
2. In the navigation pane on the left, click Data Migration.
3. In the upper-left corner of the page, select the region where the migration instance is located.
From the DMS console
Note
The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
1. Log on to the Data Management (DMS) console.
2. In the top navigation bar, choose Data + AI > Data Transmission (DTS) > Data Migration.
3. To the right of Data Migration Tasks, select the region where the migration instance is located.
Click Create Task to go to the task configuration page.

Configure the source and destination databases.

Category	Configuration	Description
None	Task Name	DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source Database	Select Existing Connection	To use a database instance that has been added to the system (created or saved), select the desired database instance from the drop-down list. The database information below will be automatically configured. Note In the DMS console, this parameter is named Select a DMS database instance.. If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
	Database Type	Select PostgreSQL.
	Access Method	Select Alibaba Cloud Instance.
	Instance Region	Select the region where the source ApsaraDB RDS for PostgreSQL instance resides.
	Replicate Data Across Alibaba Cloud Accounts	In this example, a database instance under the current Alibaba Cloud account is used. Select No.
	Instance ID	Select the ID of the source ApsaraDB RDS for PostgreSQL instance.
	Database Name	Enter the name of the database that contains the objects to be migrated in the source ApsaraDB RDS for PostgreSQL instance.
	Database Account	Enter the database account of the source ApsaraDB RDS for PostgreSQL instance. For information about the permission requirements, see Permissions required for database accounts.
	Database Password	Enter the password that corresponds to the database account.
Destination Database	Select Existing Connection	To use a database instance that has been added to the system (created or saved), select the desired database instance from the drop-down list. The database information below will be automatically configured. Note In the DMS console, this parameter is named Select a DMS database instance.. If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
	Database Type	Select ClickHouse.
	Access Method	Select Alibaba Cloud Instance.
	Instance Region	Select the region where the destination ApsaraDB for ClickHouse cluster resides.
	Replicate Data Across Alibaba Cloud Accounts	In this example, a database instance under the current Alibaba Cloud account is used. Select No.
	Cluster Type	Select the type of the ApsaraDB for ClickHouse cluster as needed.
	Cluster ID	Select the ID of the destination ApsaraDB for ClickHouse cluster.
	Database Account	Enter the database account of the destination ApsaraDB for ClickHouse cluster. For information about the permission requirements, see Permissions required for database accounts.
	Database Password	Enter the password for the database account.

After you complete the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note
- Ensure that the IP address segment of the DTS service is automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add DTS server IP addresses to a whitelist.
- If the source or destination database is a self-managed database (the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box that appears.

Configure the task objects.

On the Configure Objects page, configure the objects to be migrated.

Configuration	Description
Migration Types	If you only need to perform a full migration, select both Schema Migration and Full Data Migration. To perform a migration with no downtime, select Schema Migration, Full Data Migration, and Incremental Data Migration. Note If you do not select Schema Migration, ensure that a database and tables to receive the data exist in the destination database. Also, use the object name mapping feature in the Selected Objects box as needed. If you do not select Incremental Data Migration, do not write new data to the source instance during data migration to ensure data consistency.
Processing Mode of Conflicting Tables	Precheck and Report Errors: Checks whether tables with the same names exist in the destination database. If no tables with the same names exist, the precheck item is passed. If tables with the same names exist, an error is reported during the precheck phase, and the data migration task does not start. Note If a table in the destination database has the same name but cannot be easily deleted or renamed, you can change the name of the table in the destination database. For more information, see Object name mapping. Ignore Errors and Proceed: Skips the check for tables with the same names. Warning Selecting Ignore Errors and Proceed may cause data inconsistency and business risks. For example: If the table schemas are consistent and a record in the destination database has the same primary key value as a record in the source database: During full migration, DTS keeps the record in the destination cluster. The record from the source database is not migrated to the destination database. During incremental migration, DTS does not keep the record in the destination cluster. The record from the source database overwrites the record in the destination database. If the table schemas are inconsistent, only some columns of data may be migrated, or the migration may fail. Proceed with caution.
Capitalization of Object Names in Destination Instance	You can configure the case sensitivity policy for the English names of migrated objects, such as databases, tables, and columns, in the destination instance. By default, DTS default policy is selected. You can also choose to keep it consistent with the default policy of the source or destination database. For more information, see Case sensitivity of object names in the destination database.
Source Objects	In the Source Objects box, click the objects to migrate, and then click to move them to the Selected Objects box. Note In the Source Objects box, you can select migration objects at the schema or table level.
Selected Objects	To set the name of a migration object in the destination instance, or to specify the object that receives data in the destination instance, right-click the migration object in the Selected Objects box to make changes. For more information, see Object name mapping. To remove a selected migration object, click the object in the Selected Objects box, and then click to move it to the Source Objects box. Note If you use the object name mapping feature, other objects that depend on the mapped object may fail to migrate. To set a WHERE clause to filter data, right-click the table to migrate in the Selected Objects box and set the filter condition in the dialog box that appears. For more information about how to set the condition, see Set filter conditions. To select the SQL operations for incremental migration, right-click the migration object in the Selected Objects box and select the desired SQL operations in the dialog box that appears.

Click Next: Advanced Settings to configure advanced parameters.

Configuration	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules tasks on a shared cluster. You do not need to select one. If you want more stable tasks, you can purchase a dedicated cluster to run DTS migration tasks.
Retry Time for Failed Connections	After the migration task starts, if the connection to the source or destination database fails, DTS reports an error and immediately starts continuous retry attempts. The default retry duration is 720 minutes. You can also customize the retry time within a range of 10 to 1440 minutes. We recommend that you set it to more than 30 minutes. If DTS reconnects to the source and destination databases within the set time, the migration task automatically resumes. Otherwise, the task fails. Note For multiple DTS instances that share the same source or destination, the network retry time is determined by the setting of the last created task. Because you are charged for the task during the connection retry period, we recommend that you customize the retry time based on your business needs, or release the DTS instance as soon as possible after the source and destination database instances are released.
Retry Time for Other Issues	After the migration task starts, if other non-connectivity issues occur in the source or destination database (such as a DDL or DML execution exception), DTS reports an error and immediately starts continuous retry attempts. The default retry duration is 10 minutes. You can also customize the retry time within a range of 1 to 1440 minutes. We recommend that you set it to more than 10 minutes. If the related operations succeed within the set retry time, the migration task automatically resumes. Otherwise, the task fails. Important The value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections.
Enable Throttling for Full Data Migration	During the full migration phase, DTS consumes some read and write resources of the source and destination databases, which may increase the database load. As needed, you can choose whether to set speed limits for the full migration task. You can set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce the pressure on the destination database. Note This configuration item is available only if you select Full Data Migration for Migration Types. You can also adjust the full migration speed after the migration instance is running.
Enable Throttling for Incremental Data Migration	As needed, you can also choose whether to set speed limits for the incremental migration task. You can set RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s) to reduce the pressure on the destination database. Note This configuration item is available only if you select Incremental Data Migration for Migration Types. You can also adjust the incremental migration speed after the migration instance is running.
Environment Tag	You can select an environment tag for the instance as needed. In this example, you do not need to select a tag.
Configure ETL	Based on your business needs, select whether to configure the ETL feature to process data. Yes: Configures the ETL feature. You must also enter data processing statements in the text box. No: Does not configure the ETL feature.
Monitoring and Alerting	Select whether to set alerts and receive alert notifications based on your business needs. No: Does not set an alert. Yes: Sets an alert. You must also set the alert threshold and alert notifications. The system sends an alert notification if the migration fails or the latency exceeds the threshold.

Click Next: Configure Database and Table Fields to configure the Type, Primary Key Column, Sort Key, Distribution Key, and Partition Key for the destination table in ClickHouse.
- DTS provides a default configuration. To modify this configuration, you can set Definition Status to All.
- The Primary Key Column and Sort Key can be composite keys. You can select multiple fields from the corresponding drop-down lists for the Primary Key Column or Sort Key. You must select one or more columns from the Primary Key Column to serve as the Partition Key. You can select only one field for the Distribution Key. For more information about primary keys, sort keys, and partition keys, see CREATE TABLE.
  Note
  You can leave the Partition Key unconfigured, but you cannot select a nullable field for it because this will cause the migration task to fail.
  The partition key supports only the BIGINT, INT, TIMESTAMP, DATETIME, and DATE data types. For information about the calculation logic, see Calculation logic for partition keys.

Save the task and run a precheck.
- To view the parameters for configuring this instance when you call the API operation, move the pointer over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the bubble.
- If you do not need to view or have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note
- Before the migration task starts, a precheck is performed. The task starts only after it passes the precheck.
- If the precheck fails, click View Details next to the failed check item, fix the issue based on the prompt, and then run the precheck again.
- If a warning is reported during the precheck:
  For check items that cannot be ignored, click View Details next to the failed item, fix the issue based on the prompt, and then run the precheck again.
  For check items that can be ignored and do not need to be fixed, you can click Confirm Alert Details, Ignore, OK, and Precheck Again to skip the alert item and run the precheck again. If you choose to shield an alert item, it may cause issues such as data inconsistency and pose risks to your business.

Purchase the instance.

When the Success Rate is 100%, click Next: Purchase Instance.

On the Purchase page, select the link specification for the data migration instance. For more information, see the following table.

Category	Parameter	Description
New Instance Class	Resource Group Settings	Select the resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?
New Instance Class	Instance Class	DTS provides migration specifications with different performance levels. The link specification affects the migration speed. You can select a specification based on your business scenario. For more information, see Data migration link specifications.

After the configuration is complete, read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, and in the OK dialog box that appears, click OK.
You can view the progress of the migration instance on the Data Migration Tasks list page.
Note
- If the migration instance does not include an incremental migration task, it stops automatically. After the instance stops, its Status is Completed.
- If the migration instance includes an incremental migration task, it does not stop automatically, and the incremental migration task continues to run. While the incremental migration task is running normally, the Status of the instance is Running.

Appendix

Time information

Data type	Minimum value	Maximum value
Date	1970-01-01 00:00:00	2149-06-06 00:00:00
Date32	1925-01-01 00:00:00	2283-11-11 00:00:00
DateTime	1970-01-01 08:00:00	2106-02-07 14:28:15
DateTime64	1925-01-01 08:00:00	2283-11-12 07:59:59

Table and field information

Table information

If you do not use the object name mapping feature, the tables you create must meet the following requirements.

Important

If the destination table includes an ENGINE, it must be ENGINE = ReplicatedReplacingMergeTree(_version, _is_deleted). Otherwise, data inconsistency may occur.

ClickHouse Community Edition instance: You need to create one local table and one distributed table. The name of the distributed table must be the same as the source table name. The name of the local table must be <distributed_table_name>_local.
ClickHouse Enterprise Edition instance: You need to create a table with the same name as the source table.

Field information

Note

In a ClickHouse instance, you can run the select * from table_name final where _sign>0; statement to query data. The where condition filters out deleted data, and the final keyword after the table name filters out data with the same sort key.

Version	Name	Data type	Default value	Description
Community Edition earlier than 23.8	_sign	Int8	1	The type of DML operation. INSERT: The record is 1. UPDATE: The record is 1. DELETE: The record is -1.
Community Edition earlier than 23.8	_version	UInt64	1	The timestamp when the data was written to ClickHouse.
Enterprise Edition and Community Edition 23.8 and later	_sign	Int8	1	The type of DML operation. INSERT: The record is 1. UPDATE: The record is 1. DELETE: The record is -1.
	_is_deleted	UInt8	0	Indicates whether the record is deleted: Insert: The record is 0. Delete: The record is 1. Update: The record is 0.
	_version	UInt64	1	The timestamp when the data was written to ClickHouse.

Calculation logic for partition keys

Source field type	Partition key calculation logic
BIGINT	`intDiv(" + tablePartKey + ", 18014398509481984)`
INT	`intDiv(" + tablePartKey + ", 4194304)`
TIMESTAMP	`toYYYYMM(" + tablePartKey + ")`
DATETIME
DATE