Synchronize data from SQL Server on ECS to AnalyticDB for PostgreSQL - AnalyticDB

Data Transmission Service (DTS) synchronizes incremental data in real time from a user-created SQL Server database to an AnalyticDB for PostgreSQL instance.

Prerequisites

The user-created SQL Server database must be one of the following versions: 2008, 2008 R2, 2012, 2014, 2016, 2017, or 2019.

Note
If your SQL Server database is deployed in an Always On availability group, you must use the synchronous-commit mode.
The tables to be synchronized in the user-created SQL Server database must have a primary key or a unique non-null index. This ensures data idempotency during synchronization.
The destination AnalyticDB for PostgreSQL instance must have more available storage space than is used by the source SQL Server database.

Usage notes

During initial full data synchronization, DTS consumes read and write resources from the source and destination databases, which increases the database load. If database performance is poor, instance specifications are low, or business traffic is heavy (for example, the source database has many slow SQL queries or tables without primary keys, or the destination database experiences deadlocks), the database load increases and may even cause the service to become unavailable. Before you synchronize data, evaluate the performance of your source and destination instances. We recommend performing data synchronization during off-peak hours, for example, when the CPU utilization of both instances is below 30%.
You can select objects at the table level for synchronization. DTS does not support synchronizing append-optimized (AO) tables. You can modify column mappings. If you use column mapping and do not synchronize the entire table, or if the source and destination table schemas are inconsistent, data in the columns that exist in the source table but not in the destination table will be lost.
To ensure the data synchronization task runs smoothly, do not perform frequent backups on the source database. Retain logs for at least three days to prevent retrieval failures after log truncation.
To ensure that the displayed data synchronization latency is accurate, DTS adds a heartbeat table to the user-created SQL Server database. The table name is in the format <table_name>_dts_mysql_heartbeat.

Billing

Synchronization type	Pricing
Schema synchronization and full data synchronization	Free of charge.
Incremental data synchronization	Charged. For more information, see Billing overview.

Limitations

DTS does not support schema and table structure synchronization for assemblies, service brokers, full-text indexes, full-text catalogs, distributed schemas, distributed functions, CLR stored procedures, CLR scalar-valued functions, CLR table-valued functions, internal tables, system tables, or aggregate functions.
DTS does not synchronize data of the following types: TIMESTAMP, CURSOR, ROWVERSION, HIERARCHYID, SQL_VARIANT, SPATIAL GEOMETRY, SPATIAL GEOGRAPHY, and TABLE.
DTS does not synchronize tables that contain computed columns.

Supported SQL operations

DML operations: INSERT, UPDATE, DELETE
DDL operation: ADD COLUMN

Note
DTS does not migrate transactional DDL operations.

Account permissions

Database	Required permissions	Account creation and authorization
User-created SQL Server instance	sysadmin	CREATE USER Manage user permissions
AnalyticDB for PostgreSQL instance	LOGIN permission. SELECT, CREATE, INSERT, UPDATE, and DELETE permissions on the destination tables. CONNECT and CREATE permissions on the destination database. CREATE permission on the destination schema. COPY permission (for memory-based batch copy operations). Note You can also use the initial account of the AnalyticDB for PostgreSQL instance.	Create and manage a user Manage user permissions

Preparations

Before you configure the data synchronization task, configure log settings and create a clustered index on the source SQL Server database.

In the user-created SQL Server database, run the following command to change the recovery model of the source database to FULL. You can also modify this setting by using an SSMS client. For more information, see View or Change the Recovery Model of a Database.
```
use master;
GO
ALTER DATABASE <database_name> SET RECOVERY FULL WITH ROLLBACK IMMEDIATE;
GO
```
Parameters:

<database_name>: The name of the source database.

Example:
```
use master;
GO
ALTER DATABASE mytestdata SET RECOVERY FULL WITH ROLLBACK IMMEDIATE;
GO
```
Back up the source database by running the following command. You can skip this step if you have already performed a logical backup.
```
BACKUP DATABASE <database_name> TO DISK='<physical_backup_device_name>';
GO
```
Parameters:
- <database_name>: The name of the source database.
- <physical_backup_device_name>: The path and filename for the backup file.
Example:
```
BACKUP DATABASE mytestdata TO DISK='D:\backup\dbdata.bak';
GO
```
Back up the logs of the source database by running the following command.
```
BACKUP LOG <database_name> to DISK='<physical_backup_device_name>' WITH init;
GO
```
Parameters:
- <database_name>: The name of the source database.
- <physical_backup_device_name>: The path and filename for the backup file.
Example:
```
BACKUP LOG mytestdata TO DISK='D:\backup\dblog.bak' WITH init;
GO
```
Create a clustered index for the tables to be synchronized. For more information, see Create Clustered Indexes.

Procedure

Purchase a data synchronization instance. For more information, see Purchase procedure.

Note
On the buy page, set the SQL Server parameter to SQL Server, the Destination Instance parameter to AnalyticDB for PostgreSQL, and the One-way Synchronization parameter to One-way Synchronization.
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
At the top of the Synchronization Tasks page, select the region where the destination instance is located.
Find your data synchronization task and click Configure Synchronization Channel.

Configure the source and destination instances.

Category	Parameter	Description
N/A	Synchronization Task Name	DTS automatically generates a task name. We recommend that you provide a descriptive name for easy identification. The name does not need to be unique.
Source Instance Details	Instance Type	Select the deployment location of the source database. This topic uses User-Created Database in ECS Instance as an example. Note If you select another instance type for the user-created database, you must complete additional preparations. For more information, see Preparation overview.
	Instance Region	The region of the source instance that you selected when purchasing the data synchronization instance. This setting cannot be changed.
	ECS Instance ID	Select the ID of the ECS instance where the source database is located.
	Database Type	Fixed as SQL Server.
	Port Number	Enter the service port number of the user-created database. The default value is 1433.
	Database Account	Enter the database account for the source database. For permission requirements, see Account permissions.
	Database Password	Enter the password for the database account.
	Connection Mode	Select Non-encrypted or SSL-encrypted as needed. Note Currently, the Encryption setting is supported only in regions in the Chinese mainland and the China (Hong Kong) region.
Destination Instance Details	Instance Type	Select AnalyticDB for PostgreSQL.
	Instance Region	The region of the destination instance that you selected when purchasing the data synchronization instance. This setting cannot be changed.
	Instance ID	Select the ID of the destination AnalyticDB for PostgreSQL instance.
	Database Name	Enter the name of the database where the destination table is located.
	Database Account	Enter the database account for the AnalyticDB for PostgreSQL instance. For permission requirements, see Account permissions.
	Database Password	Enter the password for the database account.

In the lower-right corner of the page, click Set Whitelist and Next.

If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, DTS automatically adds the CIDR blocks of DTS servers to the IP address whitelist of the instance. If the source or destination database is a self-managed database hosted on an Elastic Compute Service (ECS) instance, DTS automatically adds the CIDR blocks of DTS servers to the security group rules of the ECS instance, and you must make sure that the ECS instance can access the database. If the self-managed database is hosted on multiple ECS instances, you must manually add the CIDR blocks of DTS servers to the security group rules of each ECS instance. If the source or destination database is a self-managed database that is deployed in a data center or provided by a third-party cloud service provider, you must manually add the CIDR blocks of DTS servers to the IP address whitelist of the database to allow DTS to access the database. For more information, see Whitelist DTS server IP addresses.

Warning
Adding the public IP address blocks of the DTS service, either automatically or manually, may pose security risks. Using this product, you acknowledge that you understand and accept the potential security risks and that you must implement basic security measures. These measures include, but are not limited to, strengthening password security, limiting the ports open to each CIDR block, using authentication for internal API calls, and regularly checking and restricting unnecessary CIDR blocks. Alternatively, you can connect through a private network using a leased line, VPN Gateway, or Smart Access Gateway.

Configure the synchronization policy and select the objects to synchronize.

For the synchronization operation types, select Insert, Update, and Delete. From the source objects, select the database to synchronize (such as dtstest) and move it to the selected objects pane. For name mapping, select Do not batch rename databases and tables, set the retry duration for connection failures of the source or destination database to 720 minutes, and select No for adding quotation marks to target objects.

Setting	Description
Initialize Synchronization	Schema Initialization, Full Data Initialization, and Incremental Data Initialization are selected by default. After a successful precheck, DTS performs a one-time synchronization of schemas and existing data of the selected objects from the source instance to the destination database. This creates the baseline for subsequent incremental data synchronization.
Processing Mode In Existed Target Table	Precheck and Report Errors: DTS checks whether the destination database contains tables with the same names as the source tables. If no tables with the same names exist, the precheck is passed. If tables with the same names exist, DTS reports an error and does not start the task. Note If a table with a conflicting name in the destination database cannot be deleted or renamed, you can change its name in the destination database. For more information, see Set object names in the destination instance. Ignore Errors and Proceed: DTS skips the check for tables with the same name in the destination database. Warning If you select Ignore Errors and Proceed, data inconsistency may occur and expose your business to risks. For example: If the table schemas are identical, and the destination database encounters a record with the same primary key value as a record in the source database, the record in the destination database is retained. The record from the source database is not synchronized. If the table schemas are different, data initialization may fail, only partial data may be synchronized, or the synchronization may fail.
Merge Multi Tables	Yes: In OLTP systems, sharding is often used to improve performance. However, AnalyticDB for PostgreSQL can efficiently query massive single tables. This feature lets you merge multiple sharded source tables with the same schema (sharded tables) into a single table in the destination AnalyticDB for PostgreSQL instance. Note After selecting multiple source tables, you must use the object name mapping feature to map them to the same table name in the destination AnalyticDB for PostgreSQL instance. For more information, see Set object names in the destination instance. You must add a `__dts_data_source` column of the TEXT type to the destination table to store the data source information. DTS writes the column value in the format of `<DTS data synchronization instance ID>:<source database name>.<source schema name>.<source table name>` to identify the source of the table. Example: `dts********:dtstestdata.testschema.customer1`. The table merging feature is configured at the task level, not the table level. If you need to merge some tables but not others, you must create two separate data synchronization tasks. No: This is the default option.
Select the operation types	Select the operation types to synchronize based on your business requirements. By default, all operation types are selected.
Select the objects to be synchronized	In the Source Objects box, click the objects that you want to synchronize, and then click the icon to move them to the Selected Objects box. In this heterogeneous synchronization scenario, you can only select objects at the table level, and other objects such as views, triggers, and stored procedures are not synchronized to the destination database. Note By default, the names of synchronized objects remain unchanged. If you want the synchronized objects to have different names in the destination instance, use the object name mapping feature. For more information, see Set object names in the destination instance. If you set Merge Tables to Yes, you must use the object name mapping feature to map the selected source tables to a single table name in the destination AnalyticDB for PostgreSQL instance.
Add quotation marks to the target object	Select whether to add quotation marks to the names of destination objects. If you select Yes and one of the following conditions is met, DTS adds single or double quotation marks to the destination object during the schema synchronization and incremental data synchronization phases: The business environment of the source database is case-sensitive and uses mixed case. A source table name does not start with a letter and contains characters other than letters, numbers, or special characters. Note The supported special characters are underscores (_), number signs (#), and dollar signs ($). The schema, table, or column name to be synchronized is a keyword, reserved word, or invalid character in the destination database. Note If you choose to add quotation marks, you must use the quoted object name when you query data after the synchronization is complete.
Rename databases and tables	Change the names of synchronized objects in the destination instance. For more information, see Map databases, tables, and columns.
Retry time for failed connections	If DTS cannot connect to the source or destination instance, it retries for 720 minutes (12 hours) by default. You can also specify a custom retry duration. If DTS reconnects to the source or destination instance within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails. Note You are billed for task run time during connection retries. Customize the retry duration based on your business needs, or release the DTS instance as soon as the source and destination instances are released.

Configure the table type, primary key column, and distribution key for the tables in the destination AnalyticDB for PostgreSQL instance.

For Type, select dimension table or partitioned table. For each table, select a Primary Key Column and set a Distribution Key. After completing the configuration, the Definition Status column displays Defined. Then, click Precheck and Start.

Note
For more information about primary key columns and distribution keys, see Table constraint definitions and Table distribution key definitions.
After completing the preceding configurations, click Precheck and Start in the lower-right corner of the page.
Note
- A precheck runs before the synchronization task starts, and you can only start the task after it passes.
- If the precheck fails, click the icon next to the failed item to view the details.
  
  You can fix the issues based on the cause and run the precheck again.
  
  If you do not need to fix the items that triggered warnings, you can click Ignore or Ignore Warnings and Rerun Precheck to skip the warnings and run the precheck again.
After the Precheck dialog box shows that Precheck Passed, close the Precheck dialog box. The synchronization task starts.
Wait for the task to complete initialization and enter the Synchronizing state.

You can view the status of the data synchronization task on the Data Synchronization page.