Data Transmission Service (DTS) supports data synchronization from a PostgreSQL database, such as a self-managed PostgreSQL database or an RDS for PostgreSQL instance, to a SelectDB instance for large-scale data analytics. This topic describes the procedure using a source RDS for PostgreSQL instance as an example.
Prerequisites
You have created a destination SelectDB instance. The storage space of the destination instance must be larger than the storage space used by the source RDS for PostgreSQL instance. For more information, see Create an instance.
The
wal_levelparameter of the source RDS for PostgreSQL instance is set tological. For more information, see Set instance parameters.
Precautions
Type | Description |
Source database limits |
|
Other limits |
|
Special cases |
|
Billing
Synchronization type | Pricing |
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations for synchronization
Operation type | SQL statements |
DML | INSERT, UPDATE, DELETE |
DDL | ADD COLUMN, DROP COLUMN |
Permissions required for database accounts
Database | Required permissions | Creation and authorization method |
Source RDS for PostgreSQL instance | A privileged account that is the Owner of the database to be synchronized (authorized account). | |
Destination SelectDB instance | Cluster access permissions (Usage_priv) and read and write permissions for the database (Select_priv, Load_priv, Alter_priv, Create_priv, Drop_priv). | Cluster Permission Management and Basic Permission Management. |
Procedure
Navigate to the list of synchronization tasks in the destination region. You can use one of the following methods:
Go to the page from the DTS console
Log on to the Data Transmission Service (DTS) console.
In the left navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region where the synchronization instance resides.
Go to the page from the DMS console
NoteThe actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS UI.
Log on to Data Management (DMS).
On the top menu bar, choose .
To the right of Data Synchronization Tasks, select the region where the synchronization instance is located.
Click Create Task to go to the task configuration page.
Configure the source and destination databases.
Category
Configuration
Description
None
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source Database
Select Existing Connection
If you want to use a database instance that is added to the system (newly created or saved), select the database instance from the drop-down list. The database information is automatically configured.
NoteIn the DMS console, this configuration item is named Select a DMS database instance..
If you have not added the database instance to the system, or you do not need to use an instance that is already added, you must manually configure the following database information.
Database Type
Select PostgreSQL.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the source RDS for PostgreSQL instance resides.
Replicate Data Across Alibaba Cloud Accounts
In this example, a database instance that belongs to the current Alibaba Cloud account is used. Select No.
Instance ID
Select the ID of the source RDS for PostgreSQL instance.
Database Name
Enter the name of the database that contains the objects to be synchronized in the source RDS for PostgreSQL instance.
Database Account
Enter the database account of the source RDS for PostgreSQL instance. For information about the required permissions, see Permissions required for database accounts.
Database Password
Enter the password that corresponds to the database account.
Destination Database
Select Existing Connection
If you want to use a database instance that is added to the system (newly created or saved), select the database instance from the drop-down list. The database information is automatically configured.
NoteIn the DMS console, this configuration item is named Select a DMS database instance..
If you have not added the database instance to the system, or you do not need to use an instance that is already added, you must manually configure the following database information.
Database Type
Select SelectDB.
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the destination SelectDB instance resides.
Replicate Data Across Alibaba Cloud Accounts
In this example, a database instance that belongs to the current Alibaba Cloud account is used. Select No.
Instance ID
Select the ID of the destination SelectDB instance.
Database Account
Enter the database account of the destination SelectDB instance. For information about the required permissions, see Permissions required for database accounts.
Database Password
Enter the password that corresponds to the database account.
After you have completed the configuration, click Test Connectivity and Proceed at the bottom of the page.
NoteEnsure that the IP address blocks of DTS servers are added to the security settings of the source and destination databases to allow access from DTS servers. This can be done automatically or manually. For more information, see Add the IP address blocks of DTS servers to a whitelist.
If the source or destination database is a self-managed database (where the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Configure task objects.
On the Configure Objects page, you can configure the objects to be synchronized.
Configuration
Description
Synchronization Types
Incremental Data Synchronization is selected by default. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS initializes the full data of the objects to be synchronized from the source instance to the destination cluster, which serves as the baseline data for subsequent incremental synchronization.
ImportantWhen data is synchronized from a PostgreSQL database to a SelectDB instance, data types are converted. If you do not select the Schema Synchronization check box, you must create Unique or Duplicate model tables with the appropriate schemas in the destination SelectDB instance in advance. For more information, see Data type mapping, , and Data models.
Processing Mode of Conflicting Tables
Precheck and Report Errors: Checks whether a table with the same name exists in the destination database. If a table with the same name does not exist, the check passes. If a table with the same name exists, the precheck fails and the data synchronization task does not start.
NoteIf you cannot delete or rename the table with the same name in the destination database, you can map it to a different table name in the destination database. For more information, see Map schemas, tables, and columns.
Ignore Errors and Proceed: Skips the check for tables with the same name in the destination database.
WarningIf you select Ignore Errors and Proceed, data inconsistency may occur, which can pose risks to your business. For example:
If the table schemas are the same and a record in the destination database has the same primary key or unique key value as a record in the source database, the record from the source database overwrites the record in the destination database.
If the table schemas are different, the data may fail to be initialized, only some columns of data can be synchronized, or the synchronization may fail. Proceed with caution.
Capitalization of Object Names in Destination Instance
You can configure the case sensitivity policy for database, table, and column object names that are synchronized to the destination instance. By default, the DTS default policy is selected. You can also choose to use the default policies of the source and destination databases. For more information, see Case sensitivity policy for destination object names.
Source Objects
In the Source Objects box, click an object to sync, and then click
to move it to the Selected Objects box.NoteYou can select objects at the schema or table level.
Selected Objects
To set the name of a synchronization object in the destination instance or specify which object receives the data, right-click the synchronization object in the Selected Objects box and modify it. For more information, see Map table and column names.
To remove a synchronization object, click it in the Selected Objects box and then click
to move it to the Source Objects box.If you select the Schema Synchronization check box for Synchronization Types and select objects at the table level, you can set the number of buckets (the
bucket_countparameter). To do this, right-click the table in the Selected Objects box. In the Parameter Settings section, set Enable Parameter Settings to Yes, specify the Value, and then click OK.
NoteIf you use the object name mapping feature, the synchronization of other objects that depend on the mapped object may fail.
To set a WHERE clause to filter data, right-click the table to synchronize in the Selected Objects box. In the dialog box that appears, set the filter condition. For more information, see Set filter conditions.
To select SQL operations for incremental synchronization, right-click the synchronization object in the Selected Objects box and select the desired operations from the dialog box that appears.
Click Next: Advanced Settings.
Configuration
Description
Dedicated Cluster for Task Scheduling
By default, DTS schedules tasks on a shared cluster, and you do not need to select a cluster. For more stable performance, you can purchase a dedicated cluster to run DTS synchronization tasks. For more information, see What is a DTS dedicated cluster?.
Retry Time for Failed Connections
After a synchronization task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can also specify a custom retry duration from 10 to 1,440 minutes. We recommend that you set the duration to 30 minutes or more. If DTS successfully reconnects to the database within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails.
NoteIf you have multiple DTS instances (for example, Instance A and Instance B) that share the same source or destination, and you set the network retry time to 30 minutes for Instance A and 60 minutes for Instance B, the shorter duration of 30 minutes is used for both.
Because DTS charges for task runtime during the connection retry period, we recommend that you customize the retry duration based on your business needs or release the DTS instance as soon as possible after the source and destination database instances are released.
Retry Time for Other Issues
After the synchronization task starts, if other non-connectivity issues occur with the source or destination database (such as DDL or DML execution exceptions), DTS reports an error and immediately starts continuous retry operations. The default retry duration is 10 minutes. You can also customize the retry duration within the range of 1 to 1,440 minutes. We recommend that you set it to 10 minutes or more. If the relevant operations are successful within the set retry duration, the synchronization task automatically resumes. Otherwise, the task fails.
ImportantThe value for Retry Time for Other Issues must be less than that for Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization
During the full synchronization phase, DTS uses read and write resources from the source and destination databases, which can increase the database load. To reduce the load on the destination database, you can set a rate limit for the full synchronization task by configuring the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters.
NoteThis configuration item is available only when Synchronization Types is set to Full Data Synchronization.
You can also adjust the full synchronization rate after the synchronization instance is running.
Enable Throttling for Incremental Data Synchronization
You can also set a rate limit for the incremental synchronization task. To relieve pressure on the destination database, set the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Environment Tag
You can select an environment tag to identify the instance. This setting is optional.
Configure ETL
Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:
Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task.
No: Disables the ETL feature.
Monitoring and Alerting
Specify whether to configure alerts. If the synchronization fails or the latency exceeds the specified threshold, a notification is sent to an alert contact.
No: No alerts are configured.
Yes: Sets an alert. You also need to specify the alert threshold and alert notifications. For more information, see Configure monitoring and alerting during task configuration.
Optional: After you complete the preceding configurations, click Next: Configure Database and Table Fields to set the Primary Key Column, Distribution Key, and Engine for the destination tables.
NoteThis step is available only if you select the Schema Synchronization check box for Synchronization Types. You can set Definition Status to All to modify the settings.
You can select multiple columns to form a composite primary key for Primary Key Column. You must select one or more columns from the Primary Key Column as the Distribution Key.
For a table that has no primary key or UNIQUE constraint, you must set Engine to duplicate. Otherwise, the synchronization task may fail or data may be lost.
Save the task and run a precheck.
To view the API parameters for configuring this instance, hover over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the bubble.
If you have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
NoteBefore the synchronization job starts, DTS runs a precheck. The job can start only after all precheck items are passed.
If the precheck fails, click View Details for the failed item. Fix the issue as prompted, and then run the precheck again.
If the precheck generates a warning:
If a check item fails and cannot be ignored, click View Details next to the item. Follow the instructions to fix the issue, and then run the precheck again.
For check items that can be ignored, you can click Confirm Alert Details, Ignore, OK, and Precheck Again in sequence to skip the warning and rerun the precheck. If you choose to shield the warning item, it may cause issues such as data inconsistency and pose risks to your business.
Purchase the instance.
When the Success Rate is 100%, click Next: Purchase Instance.
On the Purchase page, select the billing method and link specification for the data synchronization instance. The following table describes these options in detail.
Category
Parameter
Description
New Instance Class
Billing Method
Subscription: You pay when you create the instance. This is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription duration, the higher the discount.
Pay-as-you-go: You are charged on an hourly basis. This is suitable for short-term needs. You can release the instance immediately after use to save costs.
Resource Group Configuration
The resource group to which the instance belongs. The default is default resource group. For more information, see What is Resource Management?.
Link Specification
DTS provides synchronization specifications with different performance levels. The synchronization link specification affects the synchronization rate. You can choose a specification based on your business scenario. For more information, see Data synchronization link specifications.
Subscription Duration
In subscription mode, select the duration and quantity for the subscription instance. You can choose a monthly subscription from 1 to 9 months, or a yearly subscription of 1, 2, 3, or 5 years.
NoteThis option appears only when the billing method is Subscription.
After you complete the configuration, read and check the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the OK dialog box, click OK.
You can view the task progress on the Data Synchronization page.
Data type mapping
Category | PostgreSQL data type | SelectDB data type |
NUMERIC | SMALLINT | SMALLINT |
INTEGER | INT | |
BIGINT | BIGINT | |
DECIMAL | DECIMAL | |
NUMERIC | DECIMAL | |
REAL | DOUBLE | |
DOUBLE | DOUBLE | |
SMALLSERIAL | SMALLINT | |
SERIAL | INT | |
BIGSERIAL | BIGINT | |
MONETARY | MONEY | STRING |
CHARACTER |
| VARCHAR Important To prevent data loss, data of the CHAR(n) and VARCHAR(n) types is converted to VARCHAR(4*n) when synchronized to a SelectDB instance.
|
TEXT | STRING | |
BINARY | BYTEA | STRING |
DATE AND TIME | TIMESTAMP [(P)] [WITHOUT TIME ZONE] | DATETIMEV2 |
TIMESTAMP [(P)] WITH TIME ZONE | DATETIMEV2 | |
DATE | DATEV2 | |
TIME [(P)] [WITHOUT TIME ZONE] | VARCHAR(50) | |
TIME [(P)] WITH TIME ZONE | VARCHAR(50) | |
INTERVAL [FIELDS] [(P)] | STRING | |
BOOLEAN | BOOLEAN | BOOLEAN |
GEOMETRIC |
| STRING |
NETWORK ADDRESS |
| STRING |
TEXT SEARCH | TSVECTOR | STRING |
XML | XML | STRING |
JSON | JSON | JSON |
Additional column information
The following table describes the additional columns that DTS automatically adds or that you must manually add to the destination table that uses the Duplicate model.
Name | Data type | Default value | Description |
_is_deleted | Int | 0 | Indicates whether the data is deleted.
|
_version | Bigint | 0 |
|
_record_id | Bigint | 0 |
|
