Use Data Transmission Service (DTS) to continuously synchronize data from a self-managed SQL Server database to a PolarDB for MySQL cluster — including schema, full historical data, and ongoing changes.
Prerequisites
Before you begin, ensure that you have:
A self-managed SQL Server database running a version supported by DTS. For supported versions, see Overview of data synchronization scenarios.
A PolarDB for MySQL cluster already created. See Purchase an Enterprise Edition cluster or Purchase a subscription cluster.
Enough available storage on the PolarDB for MySQL cluster to hold all data from the source SQL Server database
If the source database meets any of the following conditions, split the task into multiple subtasks:
Contains more than 10 databases
Backs up logs more frequently than once per hour
Executes more than 100 DDL statements per hour per database
Writes logs at more than 20 MB/s per database
Requires change data capture (CDC) to be enabled for more than 1,000 tables
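To check the storage prerequisite, you can query how much space the source database currently occupies. A minimal T-SQL sketch (the database name `mytestdata` is a placeholder; `sys.master_files` reports size in 8 KB pages):

```sql
-- Estimate the allocated size of the source database in MB.
SELECT
    DB_NAME(database_id)  AS database_name,
    SUM(size) * 8 / 1024  AS allocated_mb
FROM sys.master_files
WHERE database_id = DB_ID('mytestdata')
GROUP BY database_id;
```

Because full data synchronization uses concurrent INSERT operations, plan for the destination tablespace to end up somewhat larger than this figure.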
Choose an incremental synchronization mode
Before configuring the task, decide which incremental synchronization mode fits your source database. This choice affects both the source database preparation steps and the limits that apply.
| Mode | Supported table types | Impact on source database | Best for |
|---|---|---|---|
| Hybrid log-based parsing | All table types, including heap tables, tables without primary keys, compressed tables, and tables with computed columns | Creates trigger dts_cdc_sync_ddl, heartbeat table dts_sync_progress, and DDL history table dts_cdc_ddl_history; enables CDC on the source database and specific tables | Most SQL Server deployments where the source can tolerate DTS writing artifacts |
| Incremental synchronization based on source database logs | Tables with clustered indexes and primary key columns only; no heap tables, tables without primary keys, compressed tables, or tables with computed columns | Creates trigger dts_cdc_sync_ddl, heartbeat table dts_sync_progress, and DDL history table dts_cdc_ddl_history in the source database | Sources where you cannot modify the database |
| Polling CDC instances | Amazon RDS for SQL Server, Microsoft Azure SQL Database, Microsoft Azure SQL Managed Instance, Microsoft Azure SQL Server on Virtual Machine, and Google Cloud SQL for SQL Server | DTS enables database-level and table-level CDC; causes brief table locks (a few seconds) | Third-party cloud SQL Server databases |
For ApsaraDB RDS for SQL Server instances running the Web edition, you must use Incremental synchronization based on source database logs.
To check whether your tables are heap tables, tables without primary keys, compressed tables, or tables with computed columns, see the FAQ.
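As a quick first pass before consulting the FAQ, the standard SQL Server catalog views can flag these table types. A hedged T-SQL sketch (the output column names are illustrative):

```sql
-- Flag heap tables, tables without primary keys, compressed tables,
-- and tables with computed columns in the current database.
SELECT
    t.name AS table_name,
    MAX(CASE WHEN i.index_id = 0 THEN 1 ELSE 0 END)          AS is_heap,
    CASE WHEN OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
         THEN 1 ELSE 0 END                                   AS has_no_primary_key,
    MAX(CASE WHEN p.data_compression <> 0 THEN 1 ELSE 0 END) AS is_compressed,
    CASE WHEN EXISTS (SELECT 1 FROM sys.computed_columns c
                      WHERE c.object_id = t.object_id)
         THEN 1 ELSE 0 END                                   AS has_computed_columns
FROM sys.tables t
JOIN sys.indexes i    ON i.object_id = t.object_id
JOIN sys.partitions p ON p.object_id = t.object_id
GROUP BY t.name, t.object_id;
```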
Billing
| Synchronization type | Cost |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
For all supported topologies, see Synchronization topologies.
SQL operations that can be synchronized
| Operation type | Statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | CREATE TABLE, ADD COLUMN, DROP COLUMN, DROP TABLE, CREATE INDEX, DROP INDEX |
The following DDL operations are not synchronized:
CREATE TABLE for partitioned tables or tables that contain functions
DDL operations involving user-defined types
Online DDL operations
Transactional DDL statements
Limitations
Source database limitations
Tables must have a PRIMARY KEY or UNIQUE constraint, with all fields unique. Without this, the destination database may contain duplicate records.
If you select tables as objects and rename them in the destination database: maximum 5,000 tables per task. For more tables, configure multiple tasks or synchronize at the database level.
Maximum 10 databases per synchronization task. For more, configure multiple tasks.
DTS uses the `fn_log` function to read source logs. Do not clear source database logs before the task completes; doing so may cause the task to fail.
Data logging requirements:
The backup mode must be set to Full, and a full physical backup must be performed.
For incremental-only synchronization: retain logs for at least 24 hours.
For full plus incremental synchronization: retain logs for at least 7 days. After full synchronization completes, you can reduce the retention to 24 hours.
If the source is a read-only instance, DDL operations cannot be synchronized.
If the source database is an Azure SQL Database, a data synchronization task can synchronize data from only one database.
For ApsaraDB RDS for SQL Server instances: disable Transparent Data Encryption (TDE) before starting the task. See Disable TDE.
During schema synchronization and full data synchronization, do not execute DDL statements. Doing so causes the task to fail.
For full data synchronization, use the `READ_COMMITTED_SNAPSHOT` transaction isolation mode on the source database to prevent shared locks from blocking data writes. If you skip this, data inconsistency or instance failures may occur and are not covered by the DTS service level agreement (SLA).
CDC requirements (applies when CDC needs to be enabled on source tables):
The `srvname` field in `sys.sysservers` must match the return value of the `SERVERPROPERTY` function.
For self-managed SQL Server databases, the database owner must be the `sa` user. For ApsaraDB RDS for SQL Server, the database owner must be the `sqlsa` user.
Enterprise edition: SQL Server 2008 or later is required.
Standard edition: SQL Server 2016 SP1 or later is required.
If you run SQL Server 2017 (Standard or Enterprise edition), update to a newer version.
The number of CDC-enabled tables in a task cannot exceed the maximum supported by DTS. If it does, the precheck fails.
If a single field in a CDC-enabled table exceeds 64 KB, run:
```sql
exec sp_configure 'max text repl size', -1;
```
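The `READ_COMMITTED_SNAPSHOT` requirement above can be applied with a short script. A hedged T-SQL sketch, assuming the source database is named `mytestdata` (a placeholder):

```sql
-- Enable READ_COMMITTED_SNAPSHOT before full data synchronization.
ALTER DATABASE mytestdata
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Verify the setting took effect.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'mytestdata';
```

`WITH ROLLBACK IMMEDIATE` terminates open transactions so the change does not wait indefinitely; run it during off-peak hours.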
Mode-specific limitations
Hybrid log-based parsing:
Do not add or remove columns within a 10-minute window. For example, the following sequence within 10 minutes causes a task error:
```sql
ALTER TABLE test_table DROP COLUMN Flag;
ALTER TABLE test_table ADD Remark nvarchar(50) NOT NULL DEFAULT('');
```
You cannot run `SELECT INTO` or `TRUNCATE` on CDC-enabled tables. Do not delete the triggers that DTS creates.
The CDC component retains incremental data for 3 days by default. To change the retention period, run:
```sql
exec console.sys.sp_cdc_change_job @job_type = 'cleanup', @retention = <time>;
```
`<time>` is the retention period in minutes. If daily incremental records exceed 10 million, set `<time>` to `1440`.
Keep the CDC-enabled table count at or below 1,000 per task to avoid latency or instability.
Set the maximum write rate for CDC-enabled tables to 1,000 records per second.
Enabling CDC causes brief table locks (a few seconds) on the source database.
Incremental synchronization based on source database logs:
Only tables with clustered indexes that include primary key columns are supported.
Polling CDC instances:
The DTS account must have the `sysadmin` role to enable database-level CDC. For table-level CDC, a privileged account is required.
Microsoft Azure SQL Database (vCore model): all databases are supported. Azure SQL Database (DTU model): only databases with an S3 or higher service tier are supported.
Amazon RDS for SQL Server: privileged accounts can enable CDC at the database level via stored procedures.
Clustered columnstore indexes are not supported for CDC.
Maximum 1,000 tables per task.
Cannot add or remove columns more than twice per minute.
Do not modify CDC instances on the source database during synchronization — this may cause data loss.
CDC retains incremental data for 3 days by default. Use the same retention command as hybrid mode.
Enabling CDC causes brief table locks (a few seconds) on the source database.
General limitations
Schema synchronization supports databases, schemas, and tables. Views, triggers, and stored procedures are not synchronized in cross-database-engine scenarios.
Unsupported data types: `TIMESTAMP`, `CURSOR`, `ROWVERSION`, `SQL_VARIANT`, `HIERARCHYID`, `POLYGON`, `GEOMETRY`, and `GEOGRAPHY`.
If the synchronized data contains 4-byte characters (rare characters or emojis), the destination database and tables must use the `utf8mb4` character set. Set the `character_set_server` parameter on the destination database to `utf8mb4`.
DTS does not support complex DDL operations, DDL operations involving user-defined types, online DDL operations, or transactional DDL statements.
Do not perform reindexing during incremental synchronization — this may cause the task to fail and data may be lost. DTS also cannot synchronize DDL operations related to primary keys on CDC-enabled tables.
If DTS fails to write DDL statements to the destination database, the task continues running. View the failed DDL statements in task logs. See View task logs.
Full data synchronization uses concurrent INSERT operations, which causes fragmentation in destination tables. After full synchronization completes, the destination tablespace is larger than the source.
Before you synchronize data, evaluate the impact of data synchronization on the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. During initial full data synchronization, DTS uses the read and write resources of the source and destination databases, which may increase the loads on the database servers.
Do not write data from other sources to the destination database during synchronization. Doing so causes data inconsistency.
You cannot add or remove databases when modifying synchronization objects after the task starts.
If the source instance is an ApsaraDB RDS for SQL Server instance, DTS creates an account named `rdsdt_dtsacct`. Do not delete this account or change its password while the task runs. For more information, see System accounts.
If a DTS task fails, DTS technical support attempts to restore it within 8 hours. During restoration, the task may be restarted and task parameters (not database parameters) may be modified.
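The 4-byte character requirement in the list above can be verified on the destination side. A hedged sketch in MySQL syntax, run against the PolarDB for MySQL cluster (`dstdb` is a placeholder database name):

```sql
-- Check the server-level default character set.
SHOW VARIABLES LIKE 'character_set_server';

-- Create a destination database that can store 4-byte characters.
CREATE DATABASE dstdb
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_general_ci;
```

On PolarDB for MySQL, `character_set_server` itself is typically changed through the console's parameter settings rather than with a SQL statement.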
Prepare the source database
Configure log settings on the self-managed SQL Server database before creating the synchronization task. If you are synchronizing from multiple databases, repeat these steps for each database.
Change the recovery model to Full. You can also do this in SQL Server Management Studio (SSMS). See View or Change the Recovery Model of a Database (SQL Server).

```sql
use master;
GO
ALTER DATABASE <database_name> SET RECOVERY FULL WITH ROLLBACK IMMEDIATE;
GO
```

Replace `<database_name>` with the name of your source database. Example:

```sql
use master;
GO
ALTER DATABASE mytestdata SET RECOVERY FULL WITH ROLLBACK IMMEDIATE;
GO
```

Create a logical backup of the source database. Skip this step if a logical backup already exists.

`<database_name>`: the source database name
`<physical_backup_device_name>`: the path and file name of the backup file

```sql
BACKUP DATABASE <database_name> TO DISK='<physical_backup_device_name>';
GO
```

Example:

```sql
BACKUP DATABASE mytestdata TO DISK='D:\backup\dbdata.bak';
GO
```

Create a log backup for the source database.

```sql
BACKUP LOG <database_name> TO DISK='<physical_backup_device_name>' WITH init;
GO
```

Example:

```sql
BACKUP LOG mytestdata TO DISK='D:\backup\dblog.bak' WITH init;
GO
```
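After running the steps above, you can confirm that the recovery model actually switched before creating the DTS task. A minimal T-SQL check (`mytestdata` is the example database name used above):

```sql
-- Confirm the recovery model of the source database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'mytestdata';
```

The query should report `FULL` in the `recovery_model_desc` column.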
Configure the synchronization task
Go to the Data Synchronization Tasks page.
Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.
The navigation path may vary based on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page.
Select the region where the synchronization instance resides. In the new DTS console, select the region in the top navigation bar.
Click Create Task. Configure the source and destination databases using the following parameters.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A descriptive name to identify the task. Does not need to be unique. DTS generates a default name. |
| Source Database | Select a DMS database instance | Select an existing DMS database instance to auto-populate the parameters, or leave blank to configure manually. |
| | Database Type | Select SQL Server. |
| | Access Method | Select Express Connect, VPN Gateway, or Smart Access Gateway. |
| | Instance Region | The region where the self-managed SQL Server database resides. If it is a self-managed database, deploy the required network environment first. See Preparation overview. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No for this scenario. |
| | Connected VPC | The ID of the virtual private cloud (VPC) where the source SQL Server database is deployed. |
| | IP address | The server IP address of the source database. |
| | Port Number | The service port. Default: 1433. |
| | Database Account | The account used to connect to the source database. Must have `sysadmin` role permissions. See CREATE USER and GRANT (Transact-SQL). |
| | Database Password | The password for the database account. |
| Destination Database | Select a DMS database instance | Select an existing DMS database instance to auto-populate the parameters, or leave blank to configure manually. |
| | Database Type | Select PolarDB for MySQL. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | The region where the PolarDB for MySQL cluster resides. |
| | PolarDB Cluster ID | The ID of the destination PolarDB for MySQL cluster. |
| | Database Account | The account used to connect to the destination cluster. Must have read and write permissions on the destination database. |
| | Database Password | The password for the database account. |

Click Test Connectivity and Proceed. DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances and to the security group rules of Elastic Compute Service (ECS) instances.
For self-managed databases in data centers or on third-party cloud platforms, manually add the DTS server CIDR blocks to the database whitelist. See CIDR blocks of DTS servers.
Warning: Adding DTS CIDR blocks to whitelists or security group rules introduces security exposure. Before proceeding, take preventive measures such as using strong credentials, limiting exposed ports, authenticating API calls, and regularly auditing whitelist rules. For additional security, connect the database to DTS through Express Connect, VPN Gateway, or Smart Access Gateway.
Configure the objects to synchronize and advanced settings.
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full data synchronization runs first to copy historical data, providing the baseline for incremental synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): the precheck fails if the destination contains tables with the same names as source tables. Use object name mapping if you need to rename destination tables. Ignore Errors and Proceed: skips the name conflict check. During full synchronization, records with conflicting primary or unique keys are skipped. During incremental synchronization, they overwrite destination records. If schemas differ, initialization may fail or partial data may be lost. |
| SQL Server Incremental Synchronization Mode | Select the mode you chose in Choose an incremental synchronization mode. The UI labels are: Log-based Parsing for Non-heap Tables and CDC-based Incremental Synchronization for Heap Tables (Hybrid Log-based Parsing), Incremental Synchronization Based on Logs of Source Database (Heap tables are not supported), and Polling and querying CDC instances for incremental synchronization. |
| Source Objects | Select the objects to synchronize and click the arrow icon to move them to Selected Objects. In this heterogeneous database scenario, only tables can be synchronized. |
| Selected Objects | To rename a single object in the destination, right-click it and use the rename option. See Map the name of a single object. To rename multiple objects at once, click Batch Edit. See Map multiple object names at a time. To select which SQL operations to synchronize for a specific table, right-click it and configure the operations. To filter rows by condition, right-click the object and specify WHERE conditions. See Specify filter conditions. |

Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Monitoring and Alerting | Select Yes to receive notifications when the task fails or synchronization latency exceeds a threshold. Configure the alert threshold and notification settings. See Configure monitoring and alerting. |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720. Set to at least 30 minutes. If DTS reconnects within this period, the task resumes. If multiple tasks share the same source or destination database, the shortest retry time takes effect. When DTS retries a connection, you are charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements, and release the DTS instance promptly after the source and destination instances are released. |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10. Set to at least 10 minutes. Must be less than Retry Time for Failed Connections. |
| Configure ETL | Select Yes to use extract, transform, and load (ETL) to transform data during synchronization. See What is ETL? and Configure ETL. |

Click Next: Save Task Settings and Precheck. To preview the API parameters for this configuration, hover over the button and click Preview OpenAPI parameters before clicking through.
DTS runs a precheck before the task starts. If any check fails, click View Details to see the cause and troubleshoot. Then click Precheck Again. If an alert item can be ignored, click Confirm Alert Details, then Ignore, and then Precheck Again. Ignoring an alert may result in data inconsistency.
Wait for the Success Rate to reach 100%, then click Next: Purchase Instance.
Configure the billing method and instance class for the synchronization instance.
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: pay upfront for a fixed term; more cost-effective for long-term use. Pay-as-you-go: billed hourly; suitable for short-term use. Release the instance when you no longer need it to stop billing. |
| | Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| | Instance Class | The synchronization throughput class. See Instance classes of data synchronization instances. |
| | Subscription Duration | Available only for the Subscription billing method. Choose a term of one to nine months, or one, two, three, or five years. |

Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the confirmation dialog, click OK.
The task appears in the task list. You can monitor synchronization progress there.
What's next
Synchronization topologies — set up one-to-many or many-to-one topologies
Map object names — rename objects in the destination database
Configure monitoring and alerting — set up latency and failure alerts
View task logs — inspect DDL failures and task activity