Use Data Transmission Service (DTS) to synchronize data from a PolarDB for PostgreSQL cluster to an ApsaraDB for SelectDB instance for large-scale data analytics. DTS supports schema synchronization, full data synchronization, and ongoing incremental data synchronization.
Prerequisites
Before you begin, make sure that you have:
A destination ApsaraDB for SelectDB instance with more disk space than the amount used by the source PolarDB for PostgreSQL cluster. For more information, see Create an instance.
The `wal_level` parameter of the source PolarDB for PostgreSQL cluster set to `logical`. For more information, see Set cluster parameters.
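You can verify the current setting before configuring the task. A minimal check, assuming you can connect to the source cluster with a SQL client (on PolarDB for PostgreSQL, the parameter itself is changed in the console, as described in Set cluster parameters):

```sql
-- Check the current WAL level on the source cluster.
-- The synchronization task requires the value to be 'logical'.
SHOW wal_level;
```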
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free of charge |
| Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations
| Operation type | SQL statement |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN, DROP COLUMN |
Required permissions
| Database | Required permissions | How to grant |
|---|---|---|
| Source PolarDB for PostgreSQL cluster | A privileged account that owns the database to be synchronized | Create a database account and Database Management |
| Destination ApsaraDB for SelectDB instance | Cluster access permissions (Usage_priv) and database access permissions (Select_priv, Load_priv, Alter_priv, Create_priv, and Drop_priv) | Cluster Permission Management and Basic Permission Management |
Limitations
Source database
All tables to be synchronized must have a primary key or a non-null unique index. If table fields are not unique, duplicate data may appear in the destination database.
For tables without a primary key or a non-null unique index, select Schema Synchronization for Synchronization Types and set Engine to duplicate in the Configure Database and Table Fields step.
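To find the affected tables before you configure the task, you can query the catalog on the source cluster. A sketch: it reports tables that have neither a primary key nor a unique index, but it does not verify that unique-index columns are also NOT NULL, so review the results manually.

```sql
-- List ordinary tables that have neither a primary key nor a unique index.
SELECT n.nspname AS schema_name,
       c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.oid
          AND (i.indisprimary OR i.indisunique)
      );
```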
Long-running transactions in the source database can cause write-ahead logging (WAL) logs to accumulate before those transactions commit, which may exhaust disk space on the source database.
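To spot long-running transactions that may cause WAL to accumulate, you can query `pg_stat_activity` on the source database; the one-hour threshold below is an arbitrary example, not a DTS requirement.

```sql
-- Find transactions that have been open for more than one hour.
SELECT pid, usename, xact_start, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;
```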
The PolarDB for PostgreSQL cluster must support and have Logical Replication Slot Failover enabled. If the cluster uses PostgreSQL 14, which does not support Logical Replication Slot Failover, a high-availability (HA) switchover may cause the synchronization instance to fail permanently and require full reconfiguration.
If a single incremental change exceeds 256 MB, the synchronization instance may fail permanently. Reconfigure the synchronization instance to recover.
Do not run DDL operations that alter schemas during initial schema synchronization or initial full data synchronization. DTS queries the source database during initial full data synchronization, which creates metadata locks that can block DDL operations.
Destination database
SelectDB data model requirements
DTS can synchronize data only to tables that use the Unique Key model or the Duplicate key model in ApsaraDB for SelectDB. Choose the model based on your requirements:
| Requirement | Recommended model | Notes |
|---|---|---|
| Consistent, deduplicated records | Unique Key model | All unique keys of the destination table must exist in the source table and the synchronization object. Missing keys cause data inconsistency. |
| Append-only or high-throughput ingestion | Duplicate key model | Duplicate data may appear if a retry occurs or two or more DML operations run on the same row after the synchronization instance starts. Use _is_deleted, _version, and _record_id columns to deduplicate. DTS converts UPDATE and DELETE statements to INSERT statements for Duplicate key model tables. |
Object and naming constraints
Only the `bucket_count` parameter can be specified in the Selected Objects section. The value must be a positive integer. Default value: auto.
ApsaraDB for SelectDB supports only database and table names that start with a letter. Use the object name mapping feature to rename objects that do not start with a letter.
Object names (databases, tables, or columns) that contain Chinese characters must be renamed using the object name mapping feature.
A single synchronization instance can synchronize only one database. To synchronize multiple databases, configure a separate synchronization instance for each database.
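To find tables whose names do not start with a letter, and therefore need object name mapping, one possible catalog query on the source database:

```sql
-- Tables whose names do not begin with an ASCII letter.
-- These must be renamed with the object name mapping feature.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  AND table_name !~ '^[A-Za-z]';
```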
Synchronization scope and DDL limits
Synchronization of TimescaleDB extension tables and tables with cross-schema inheritance is not supported.
DDL statements that modify multiple columns in a single operation are not supported, and consecutive DDL operations on the same table are not supported.
In a multi-table merge scenario (synchronizing multiple source tables to a single destination table), the schemas of all source tables must be identical. Otherwise, data inconsistency or task failure may occur.
Partitioned tables
When synchronizing partitioned tables, include both the parent table and all its child partitions as synchronization objects. The parent table of a PostgreSQL partitioned table does not store data directly—all data is in child partitions. Omitting partitions causes data inconsistency.
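To confirm that every child partition is included in the synchronization objects, you can list the partitions of a parent table with `pg_inherits`; `orders` below is a hypothetical parent table name.

```sql
-- List all child partitions of a partitioned table.
SELECT parent.relname AS parent_table,
       child.relname  AS child_partition
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child  ON pg_inherits.inhrelid  = child.oid
WHERE parent.relname = 'orders';  -- hypothetical parent table name
```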
REPLICA IDENTITY requirement
Run the following command on tables before writing data in these three scenarios: when the instance runs for the first time; when you select objects at the schema level and a new table is created or rebuilt using RENAME; and when you use the modify synchronization objects feature.

```sql
ALTER TABLE schema.table REPLICA IDENTITY FULL;
```

Replace `schema` and `table` with the actual schema name and table name. Run this command during off-peak hours. Do not perform table lock operations while this command is running. If you skip the related precheck, DTS automatically runs this command during instance initialization.
Operations that cause task failure
The following operations cause the synchronization task to fail. Restart the synchronization instance to resume:
- Adding backend nodes to the destination ApsaraDB for SelectDB database during synchronization.
- Creating clusters in the destination ApsaraDB for SelectDB instance during synchronization.
Temporary tables
DTS creates the following temporary tables in the source database to support incremental synchronization. Do not delete them during synchronization—they are automatically removed after the DTS instance is released:
public.dts_pg_class, public.dts_pg_attribute, public.dts_pg_type, public.dts_pg_enum, public.dts_postgres_heartbeat, public.dts_ddl_command, public.dts_args_session, and public.aliyun_dts_instance.
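You can confirm which of these temporary tables currently exist in the source database with a catalog query such as:

```sql
-- List DTS temporary tables in the public schema.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND (c.relname LIKE 'dts%' OR c.relname = 'aliyun_dts_instance');
```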
Replication slot
DTS creates a replication slot with the prefix dts_sync_ in the source database. This slot retains incremental logs for the last 15 minutes. When the synchronization task fails or the instance is released, DTS attempts to clean up the slot automatically.
If you change the password of the source database account or remove the DTS IP address whitelist during synchronization, the replication slot cannot be cleaned up automatically. Manually clean up the replication slot to prevent disk space from being exhausted. If a primary/secondary switchover occurs, log on to the secondary database to perform the cleanup.
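A sketch of the manual cleanup, assuming the slot name follows the `dts_sync_` prefix described above. Drop a slot only after the DTS instance has failed or been released; dropping an active slot breaks the synchronization.

```sql
-- Inspect replication slots created by DTS.
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'dts_sync_%';

-- Drop an orphaned, inactive slot so that WAL can be recycled.
-- Replace the name with the actual slot name from the query above.
SELECT pg_drop_replication_slot('dts_sync_example');
```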

Incremental synchronization latency
DTS uses a batch synchronization policy to reduce load on the destination. By default, DTS writes data at most once every 5 seconds per synchronization object, which results in a normal synchronization latency of up to 10 seconds. To reduce this latency, modify the selectdb.reservoir.timeout.milliseconds parameter of the DTS instance in the console. The valid range is 1,000 to 10,000 milliseconds.
A lower batching time increases write frequency, which may increase the load and write response time of the destination and, in turn, increase the DTS synchronization latency. Adjust the batching time based on the load of the destination.
Other notes
DTS does not check the validity of metadata such as sequences. Manually verify metadata validity before and after synchronization.
Perform full data synchronization when the CPU load of both the source and destination databases is below 30%.
If an instance fails, the DTS support team will attempt to recover it within 8 hours. During recovery, only DTS instance parameters may be modified—database parameters are not changed. For the parameters that may be modified, see Modify instance parameters.
Create a synchronization task
Step 1: Go to the Data Synchronization page
Use one of the following consoles.
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region where the synchronization task resides.
DMS console
The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task to go to the task configuration page.
Configure the source and destination databases using the following parameters.
| Category | Parameter | Description |
|---|---|---|
| N/A | Task Name | The task name. DTS automatically generates a name. Specify a descriptive name to make the task easy to identify. A unique name is not required. |
| Source Database | Select Existing Connection | If you registered the database instance with DTS, select it from the drop-down list. DTS automatically fills in the database parameters. Otherwise, configure the following parameters manually. |
| | Database Type | Select PolarDB for PostgreSQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the source PolarDB for PostgreSQL cluster resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No if the source and destination databases belong to the same Alibaba Cloud account. |
| | Instance ID | Select the ID of the source PolarDB for PostgreSQL cluster. |
| | Database Name | Enter the name of the database that contains the objects to be synchronized. |
| | Database Account | Enter the database account. For permission requirements, see Required permissions. |
| | Database Password | Enter the password for the database account. |
| Destination Database | Select Existing Connection | If you registered the database instance with DTS, select it from the drop-down list. DTS automatically fills in the database parameters. Otherwise, configure the following parameters manually. |
| | Database Type | Select SelectDB. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region where the destination SelectDB instance resides. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No if the source and destination databases belong to the same Alibaba Cloud account. |
| | Instance ID | Select the ID of the destination SelectDB instance. |
| | Database Account | Enter the database account. For permission requirements, see Required permissions. |
| | Database Password | Enter the password for the database account. |

Click Test Connectivity and Proceed.
Make sure the CIDR blocks of DTS servers are added to the security settings of both the source and destination databases. For more information, see Add DTS server IP addresses to a whitelist.
Step 3: Configure synchronization objects
In the Configure Objects step, configure the synchronization parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | By default, Incremental Data Synchronization is selected. Also select Schema Synchronization and Full Data Synchronization. After the precheck completes, DTS synchronizes historical data as the basis for ongoing incremental synchronization. Important: When synchronizing from PolarDB for PostgreSQL to SelectDB, data types are converted. If you do not select Schema Synchronization, create the destination tables manually using the corresponding Unique or Duplicate model. For more information, see Data type mapping, Additional column information, and Data model. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: DTS checks for tables with the same name in the destination database. If a duplicate table exists, the precheck fails and the task does not start. Note: If you cannot delete or rename the duplicate table, use the object name mapping feature. For more information, see Map schema, table, and column names. Ignore Errors and Proceed: DTS skips the check for duplicate table names. Warning: This option may cause data inconsistency. If table schemas are the same and a record has a matching primary key or unique key, the source record overwrites the destination record. If schemas differ, initialization may fail or only some columns may be synchronized. |
| Capitalization of Object Names in Destination Instance | The capitalization policy for database names, table names, and column names in the destination instance. Default: DTS default policy. For more information, see Specify the capitalization of object names in the destination instance. |
| Source Objects | Select one or more objects and click the right arrow icon to move them to Selected Objects. Objects can be selected at the schema, table, or column level. |
| Selected Objects | To rename a destination object or specify a receiving object, right-click it in the Selected Objects section. For more information, see Map object names. To remove a selected object, click it and then click the left arrow icon. To set the `bucket_count` parameter for a table (available only when Schema Synchronization is selected and objects are selected at the table level), right-click the table, set Enable Parameter Settings to Yes, enter the value, and click OK. Note: Renaming an object using object name mapping may cause dependent objects to fail synchronization. To filter data with WHERE conditions, right-click a table and specify the conditions. For more information, see Specify filter conditions. To select SQL operations for incremental synchronization, right-click an object and select the operations. |
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster. To improve stability, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | The time range for retrying failed connections. Valid values: 10–1440 minutes. Default: 720 minutes. Set to more than 30 minutes. If DTS reconnects within this window, the task resumes. Otherwise, the task fails. Note: If multiple tasks share the same source or destination database and have different retry windows, the shortest window takes precedence. DTS charges for the instance during retries. |
| Retry Time for Other Issues | The time range for retrying failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10 minutes. Set to more than 10 minutes. This value must be smaller than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Throttle full data synchronization to reduce load on the source and destination. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Throttle incremental data synchronization by configuring RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Environment Tag | An optional tag to identify the instance. |
| Configure ETL | Specify whether to enable the extract, transform, and load (ETL) feature. Select Yes to enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. Select No to skip ETL configuration. |
| Monitoring and Alerting | Specify whether to configure alerting. Select Yes to configure the alert threshold and notification settings. For more information, see the Configure monitoring and alerting when you create a DTS task section. |

(Optional) Click Next: Configure Database and Table Fields to set the Primary Key Column, Distribution Key, and Engine for destination tables.
- This step is available only when Schema Synchronization is selected for Synchronization Types. Set Definition Status to All to make modifications.
- You can select multiple columns as a composite primary key. At least one column in Primary Key Column must also be selected as the Distribution Key.
- For tables without a primary key or UNIQUE constraint, set Engine to duplicate. Otherwise, the synchronization may fail or data may be lost.
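If you create destination tables manually instead of selecting Schema Synchronization (see Synchronization Types above), the tables must use the Unique Key or Duplicate key model. A minimal sketch of a Unique Key table in ApsaraDB for SelectDB; the database, table, and column names are hypothetical, and the bucket count is an arbitrary example:

```sql
-- Hypothetical Unique Key table in ApsaraDB for SelectDB.
-- Key columns must be the leading columns and appear in UNIQUE KEY.
CREATE TABLE demo_db.orders (
    order_id   BIGINT NOT NULL,
    customer   VARCHAR(256),
    amount     DECIMAL(18, 2),
    updated_at DATETIME
)
UNIQUE KEY (order_id)
DISTRIBUTED BY HASH (order_id) BUCKETS 16;
```

The UNIQUE KEY columns become the deduplication key: rows with the same key are merged, which matches the "consistent, deduplicated records" requirement described above.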
Step 4: Run the precheck
Click Next: Save Task Settings and Precheck. DTS runs a precheck before the synchronization task can start. The task starts only after the precheck passes.
To view the API parameters for this configuration, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
If the precheck fails, click View Details next to each failed item, troubleshoot the issue, and rerun the precheck.
If an alert is triggered:
If the alert cannot be ignored, click View Details, fix the issue, and rerun the precheck.
If the alert can be ignored, click Confirm Alert Details, click Ignore in the View Details dialog box, click OK, and then click Precheck Again.
Ignoring alerts may cause data inconsistency and expose your business to risk.
Step 5: Purchase the instance
Wait until the Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: Pay upfront for a fixed duration. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Suitable for short-term use. Release the instance when no longer needed to reduce costs. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. For more information, see What is Resource Management? |
| Instance Class | The synchronization speed tier. Select based on your business requirements. For more information, see Instance classes of data synchronization instances. |
| Subscription Duration | Available only for the subscription billing method. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the dialog box that appears, click OK.
The task appears in the task list. Monitor its progress from there.
Data type mapping
The following table shows how PolarDB for PostgreSQL data types map to SelectDB data types.
| Category | PolarDB for PostgreSQL data type | SelectDB data type |
|---|---|---|
| Numeric | SMALLINT | SMALLINT |
| INTEGER | INT | |
| BIGINT | BIGINT | |
| DECIMAL | DECIMAL | |
| NUMERIC | DECIMAL | |
| REAL | DOUBLE | |
| DOUBLE | DOUBLE | |
| SMALLSERIAL | SMALLINT | |
| SERIAL | INT | |
| BIGSERIAL | BIGINT | |
| Monetary | MONEY | STRING |
| Character | CHAR(n), VARCHAR(n) | VARCHAR. Important: To prevent data loss, CHAR(n) and VARCHAR(n) are converted to VARCHAR(4 × n). If no length is specified, the SelectDB default VARCHAR(65533) is used. If the length exceeds 65533, the data is converted to STRING. |
| TEXT | STRING | |
| Binary | BYTEA | STRING |
| Date and time | TIMESTAMP [(P)] [WITHOUT TIME ZONE] | DATETIMEV2 |
| TIMESTAMP [(P)] WITH TIME ZONE | DATETIMEV2 | |
| DATE | DATEV2 | |
| TIME [(P)] [WITHOUT TIME ZONE] | VARCHAR(50) | |
| TIME [(P)] WITH TIME ZONE | VARCHAR(50) | |
| INTERVAL [FIELDS] [(P)] | STRING | |
| Boolean | BOOLEAN | BOOLEAN |
| Geometric | POINT, LINE, LSEG, BOX, PATH, POLYGON, CIRCLE | STRING |
| Network address | CIDR, INET, MACADDR, MACADDR8 | STRING |
| Text search | TSVECTOR | STRING |
| XML | XML | STRING |
| JSON | JSON | JSON |
Additional column information
For tables using the Duplicate key model, DTS automatically adds or requires the following columns in the destination table.
| Column name | Data type | Default value | Description |
|---|---|---|---|
| _is_deleted | Int | 0 | Deletion flag. Insert: 0. Update: 0. Delete: 1. |
| _version | Bigint | 0 | Full data synchronization: 0. Incremental data synchronization: the corresponding timestamp in seconds from the source database's write-ahead log (WAL). |
| _record_id | Bigint | 0 | Full data synchronization: 0. Incremental data synchronization: the record ID from the incremental log, which uniquely identifies the log entry. The value is unique and increments. |
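As noted in the Duplicate key model description above, these columns can be used to deduplicate query results in the destination. A sketch, assuming a hypothetical Duplicate key table `demo_db.orders_dup` with key column `order_id`:

```sql
-- Keep only the latest surviving version of each row.
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY _version DESC, _record_id DESC
           ) AS rn
    FROM demo_db.orders_dup t
) ranked
WHERE rn = 1
  AND _is_deleted = 0;
```

Ordering by `_version` and then `_record_id` resolves ties between changes that arrive within the same second.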