Use Data Transmission Service (DTS) to stream incremental changes from an ApsaraDB RDS for SQL Server instance into a DataHub project. DTS captures changes using change data capture (CDC) and delivers them as a continuous stream without requiring full data exports.
Prerequisites
Before you begin, make sure you have:
An ApsaraDB RDS for SQL Server instance. For supported SQL Server versions, see Overview of data synchronization scenarios.
DataHub activated and a project created to receive the synchronized data. See Get started with DataHub and Manage projects.
Owner permissions on the objects to be synchronized. See Create a privileged account or a standard account and Modify the permissions of an account.
Consider splitting into multiple tasks if any of the following apply to the source instance:
More than 10 databases
Transaction log backup interval less than 1 hour per database
More than 100 Data Definition Language (DDL) operations per hour per database
Transaction log write rate exceeding 20 MB/s for any database
CDC needs to be enabled for more than 1,000 tables
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
For details, see Synchronization topologies.
Supported SQL operations
DTS synchronizes the following DML operations: INSERT, UPDATE, and DELETE.
If an UPDATE operation modifies only large fields, DTS does not synchronize it.
Full data synchronization is not supported for this source-destination pair. Only incremental data synchronization and schema synchronization are available.
Limitations
Source database requirements
Tables must have a primary key or unique key, with all fields unique. Tables without these constraints may produce duplicate records in the destination.
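To find tables that would violate this requirement, a query along the following lines can be run in each source database. This is a sketch: it lists user tables that have neither a primary key nor any unique index.

```sql
-- List user tables without a primary key or unique index
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.indexes AS i
    WHERE i.object_id = t.object_id
      AND (i.is_primary_key = 1 OR i.is_unique = 1)
);
```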
When selecting tables as the synchronization objects and applying object renaming, a single task can synchronize up to 5,000 tables. For more than 5,000 tables, configure multiple tasks or synchronize at the database level.
A single task can synchronize data from up to 10 databases. For more than 10 databases, configure multiple tasks.
DTS uses the fn_log function to read transaction logs. Do not clear transaction logs while a task is running — doing so may cause the task to fail.
Disable Transparent Data Encryption (TDE) on the source RDS SQL Server instance to ensure task stability. See Disable TDE.
Do not perform DDL operations that change database or table schemas during schema synchronization — the task will fail.
In hybrid log-based parsing mode, do not execute multiple ADD COLUMN or DROP COLUMN operations within 10 minutes. For example, the following sequence within 10 minutes causes a task error:

```sql
ALTER TABLE test_table DROP COLUMN Flag;
ALTER TABLE test_table ADD Remark nvarchar(50) NOT NULL DEFAULT('');
```

Enable READ_COMMITTED_SNAPSHOT on the source database during full data synchronization to prevent shared lock contention. Without this setting, data inconsistency or instance failures may occur and are not covered by the DTS service level agreement (SLA).
If the source is an Azure SQL Database, a single task can synchronize data from only one database.
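As a reference, READ_COMMITTED_SNAPSHOT can be checked and enabled with statements like the following (a sketch — replace your_database with the actual source database name; WITH ROLLBACK IMMEDIATE disconnects active sessions, so run it during a maintenance window):

```sql
-- Check the current setting (1 = enabled)
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'your_database';

-- Enable READ_COMMITTED_SNAPSHOT; WITH ROLLBACK IMMEDIATE
-- rolls back open transactions so the change can take effect
ALTER DATABASE your_database SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```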
Transaction log requirements
| Scenario | Minimum log retention |
|---|---|
| Incremental synchronization only | More than 24 hours |
| Full + incremental synchronization | At least 7 days |
Additional requirements:
The data logging feature must be enabled with the backup mode set to Full, and a full physical backup must have been performed.
After full synchronization completes, the log retention period can be reduced, but must remain longer than 24 hours.
Failure to meet log retention requirements may cause DTS to fail to obtain transaction logs, resulting in task failure, data inconsistency, or data loss. Outcomes in such circumstances are not covered by the DTS SLA.
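To verify that a source database runs in the Full backup mode required above, the recovery model can be inspected with a query like this (a sketch; your_database is a placeholder):

```sql
-- recovery_model_desc must be FULL for log-based incremental synchronization
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'your_database';
```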
CDC requirements
If CDC needs to be enabled for the source tables, the following conditions must be met — otherwise the precheck fails:
The value of the srvname field in the sys.sysservers view must match the return value of the SERVERPROPERTY function.
For self-managed SQL Server: the database owner must be the sa user. For ApsaraDB RDS for SQL Server: the database owner must be the sqlsa user.
Enterprise edition: SQL Server 2008 or later is required.
Standard edition: SQL Server 2016 SP1 or later is required.
If running SQL Server 2017 (Standard or Enterprise edition), update to a newer version.
The number of tables with CDC enabled in a single task cannot exceed 1,000. Tasks exceeding this limit fail the precheck.
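The srvname precheck condition above can be verified manually by comparing two values; a sketch:

```sql
-- These two values must match for the CDC precheck to pass
SELECT srvname FROM sys.sysservers WHERE srvid = 0;  -- local server entry
SELECT SERVERPROPERTY('ServerName') AS server_property_name;
```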
By default, CDC processes field values up to 64 KB. If a single field value written to a CDC-enabled table exceeds 64 KB, run the following command on the source database:

```sql
exec sp_configure 'max text repl size', -1;
```
Destination and other limits
A single string in the destination DataHub project cannot exceed 2 MB.
DTS does not synchronize foreign keys. Cascade update and cascade delete operations on the source are not propagated to the destination.
DTS does not synchronize tables that contain computed columns.
Do not write data from other sources to the destination during synchronization — this causes data inconsistency. For example, running online DDL operations via Data Management (DMS) while other sources write to the destination may cause data loss.
If the task involves incremental synchronization, reindexing operations are not supported. Performing a reindex may cause the task to fail and result in data loss.
DTS cannot synchronize DDL operations related to the primary key of a CDC-enabled table.
When modifying the objects to be synchronized, you cannot remove databases.
If multiple DTS instances share the same source SQL Server database, their incremental data collection modules operate independently.
If a DTS task fails, DTS technical support will attempt to restore it within 8 hours. During this period, the task may be restarted and task parameters (not database parameters) may be modified.
Schema synchronization scope
DTS supports initial schema synchronization for: schemas, tables, views, functions, and procedures.
The source and destination are heterogeneous databases. Data types do not map one-to-one. Evaluate the impact of data type conversion before starting the task. See Data type mappings for initial schema synchronization.
DTS does not synchronize the following object types: assemblies, service brokers, full-text indexes, full-text catalogs, distributed schemas, distributed functions, CLR stored procedures, CLR scalar-valued functions, CLR table-valued functions, internal tables, systems, and aggregate functions.
DTS does not synchronize the following data types: CURSOR, ROWVERSION, SQL_VARIANT, HIERARCHYID, POLYGON, GEOMETRY, GEOGRAPHY, and custom types created with CREATE TYPE.
Objects DTS creates in the source database
In Incremental Synchronization Based on Logs of Source Database mode, DTS creates the following objects in the source database to track synchronization state:
| Object | Type | Purpose |
|---|---|---|
| dts_cdc_sync_ddl | Trigger | Captures DDL changes |
| dts_sync_progress | Table | Tracks synchronization latency (heartbeat) |
| dts_cdc_ddl_history | Table | Stores DDL change history |
In hybrid log-based parsing mode, DTS also enables CDC on the source database and on specific tables. Keep the write throughput to CDC-enabled tables at or below 1,000 records per second (RPS).
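To see where CDC has been enabled by DTS, the standard SQL Server catalog views can be queried; a sketch:

```sql
-- Check whether CDC is enabled at the database level
SELECT name, is_cdc_enabled FROM sys.databases;

-- Check which tables have CDC enabled in the current database
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.is_tracked_by_cdc = 1;
```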
Set up the synchronization task
Step 1: Open the Data Synchronization page
Use one of the following methods:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance will reside.
DMS console
The steps may vary depending on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose Data Transmission Service (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance will reside.
Step 2: Configure source and destination databases
Click Create Task, then configure the following parameters:
| Parameter | Description | Required |
|---|---|---|
| Task name | A name for the DTS task. DTS auto-generates a name — specify a descriptive name for easier identification. Unique names are not required. | No |
| Source database — select existing connection | Select a registered database instance to auto-populate connection parameters. If no registered instance is available, configure the parameters below manually. | No |
| Database type (source) | Select SQL Server. | Yes |
| Access method (source) | Select Alibaba Cloud Instance. | Yes |
| Instance region (source) | The region of the source ApsaraDB RDS for SQL Server instance. | Yes |
| Replicate data across Alibaba Cloud accounts | Select No unless synchronizing across accounts. | Yes |
| RDS instance ID | The ID of the source ApsaraDB RDS for SQL Server instance. | Yes |
| Database account | The database account with owner permissions on the objects to be synchronized. | Yes |
| Database password | The password for the database account. | Yes |
| Encryption | Select Non-encrypted if SSL is not enabled on the source. Select SSL-encrypted if SSL is enabled — DTS trusts the server certificate by default. | Yes |
| Destination database — select existing connection | Select a registered database instance to auto-populate connection parameters. If no registered instance is available, configure the parameters below manually. | No |
| Database type (destination) | Select DataHub. | Yes |
| Access method (destination) | Select Alibaba Cloud Instance. | Yes |
| Instance region (destination) | The region of the destination DataHub project. | Yes |
| Project | The name of the destination DataHub project. | Yes |
After filling in the parameters, click Test Connectivity and Proceed.
Make sure the CIDR blocks of DTS servers are added to the security settings of the source and destination databases. See Add the CIDR blocks of DTS servers. If the source or destination is a self-managed database with an access method other than Alibaba Cloud Instance, click Test Connectivity in the CIDR blocks of DTS servers dialog box.
Step 3: Configure synchronization objects
In the Configure Objects step, set the following parameters:
| Parameter | Description | Required |
|---|---|---|
| Synchronization types | Incremental Data Synchronization is selected by default. Also select Schema Synchronization. | Yes |
| Processing mode of conflicting tables | Precheck and Report Errors: fails the precheck if tables with identical names exist in the destination. Ignore Errors and Proceed: skips the name conflict check — use with caution, as this may cause data inconsistency. See the conflict handling behavior details below. | Yes |
| Additional column rule | When DTS writes data to a DataHub topic, it adds extra columns. If these columns conflict with existing column names in the destination topic, the task fails. Select New Rule or Previous Rule based on your requirements. Check for name conflicts before setting this parameter. See Naming rules for additional columns. | Yes |
| Case Policy for Destination Object Names | Controls the casing of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. | No |
| Source Objects | Select one or more objects in the Source Objects section and click the rightwards arrow to move them to the Selected Objects section. | Yes |
| Selected objects | To rename a single object, right-click it. See Individual database, table, and column name mapping. To rename multiple objects, click Batch Edit. See Batch database, table, and column name mapping. Right-click an object to select specific SQL operations to synchronize, or to set WHERE conditions to filter rows. See Set filter conditions. Renaming an object may break other objects that depend on it. | Yes |
Conflict handling behavior (when Ignore Errors and Proceed is selected):
Same schema, matching primary or unique key: during full synchronization, the existing destination record is kept; during incremental synchronization, the destination record is overwritten.
Different schemas: initialization may fail, only some columns may be synchronized, or the task may fail.
Step 4: Configure advanced settings
Click Next: Advanced Settings and configure the following:
| Parameter | Description | Required |
|---|---|---|
| Dedicated cluster for task scheduling | By default, DTS schedules the task to the shared cluster. Purchase a dedicated cluster to improve task stability. See What is a DTS dedicated cluster. | No |
| Retry time for failed connections | How long DTS retries when the source or destination is unreachable after the task starts. Valid values: 10–1440 minutes. Default: 720. Set this parameter to more than 30 minutes. If multiple DTS tasks share the same source or destination, the shortest retry time configured among them takes effect. DTS charges for the instance during retry periods. | No |
| Retry time for other issues | How long DTS retries when DDL or DML operations fail. Valid values: 1–1440 minutes. Default: 10. Set to more than 10 minutes, and to a value less than Retry time for failed connections. | No |
| Enable throttling for incremental data synchronization | Limits the synchronization rate to reduce load on the destination. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s) if enabled. | No |
| Environment tag | A tag to identify the DTS instance. | No |
| Configure ETL | Select Yes to enable extract, transform, and load (ETL) processing and enter data processing statements. Select No to skip. See What is ETL? and Configure ETL in a data migration or data synchronization task. | No |
| Monitoring and alerting | Select Yes to receive notifications when the task fails or synchronization latency exceeds a threshold. Configure alert thresholds and notification settings. See Configure monitoring and alerting when you create a DTS task. | No |
Step 5: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. Handle precheck results as follows:
Failure: click View Details next to the failed item, resolve the issue, then rerun the precheck.
Alert — cannot be ignored: click View Details, resolve the issue, then rerun the precheck.
Alert — can be ignored: click Confirm Alert Details. In the View Details dialog box, click Ignore, then click OK. Click Precheck Again to proceed. Ignoring alerts may cause data inconsistency.
Step 6: Purchase and start the instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the Buy page, configure the following:
| Parameter | Description | Required |
|---|---|---|
| Billing method | Subscription: pay upfront; cost-effective for long-term use. Pay-as-you-go: billed hourly; suitable for short-term use. Release the instance when no longer needed to stop charges. | Yes |
| Resource group settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? | No |
| Instance class | The synchronization throughput tier. Select based on your data volume and latency requirements. See Instance classes of data synchronization instances. | Yes |
| Subscription duration | Available for the Subscription billing method. Options: 1–9 months, or 1, 2, 3, or 5 years. | Conditional |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the dialog box, click OK.
The task appears in the task list. Monitor its progress there.
Special cases
If the source is an ApsaraDB RDS for SQL Server instance, DTS automatically creates an account named rdsdt_dtsacct on the instance for synchronization. Do not delete this account or change its password while the task is running — otherwise, the task may fail. See System accounts.
What's next
Monitor task status and latency in the DTS console task list.
To adjust synchronization objects after the task starts, edit the task configuration — note that you cannot remove databases from the synchronization scope.
To improve task reliability for high-throughput workloads, consider purchasing a dedicated cluster.