Use Data Transmission Service (DTS) to stream incremental changes from an ApsaraDB RDS for SQL Server instance into a DataHub project. DTS captures changes using change data capture (CDC) and delivers them as a continuous stream without requiring full data exports.
Prerequisites
Before you begin, make sure you have:
An ApsaraDB RDS for SQL Server instance. For supported SQL Server versions, see Overview of data synchronization scenarios.
DataHub activated and a project created to receive the synchronized data. See Get started with DataHub and Manage projects.
Owner permissions on the objects to be synchronized. See Create a privileged account or a standard account and Modify the permissions of an account.
Consider splitting into multiple tasks if any of the following apply to the source instance:
More than 10 databases
Transaction log backup interval less than 1 hour per database
More than 100 Data Definition Language (DDL) operations per hour per database
Transaction log write rate exceeding 20 MB/s for any database
CDC needs to be enabled for more than 1,000 tables
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
For details, see Synchronization topologies.
Supported SQL operations
DTS synchronizes the following DML operations: INSERT, UPDATE, and DELETE.
If an UPDATE operation modifies only large fields, DTS does not synchronize it.
Full data synchronization is not supported for this source-destination pair. Only incremental data synchronization and schema synchronization are available.
Limitations
Source database requirements
Tables must have a primary key or unique key, with all fields unique. Tables without these constraints may produce duplicate records in the destination.
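To find tables that would violate this requirement, a query along the following lines can be run in each source database. This is a sketch: it lists user tables that have neither a primary key nor any unique index.

```sql
-- List user tables without a primary key or unique index
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.indexes AS i
    WHERE i.object_id = t.object_id
      AND (i.is_primary_key = 1 OR i.is_unique = 1)
);
```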
When selecting tables as the synchronization objects and applying object renaming, a single task can synchronize up to 5,000 tables. For more than 5,000 tables, configure multiple tasks or synchronize at the database level.
A single task can synchronize data from up to 10 databases. For more than 10 databases, configure multiple tasks.
DTS uses the fn_log function to read transaction logs. Do not clear transaction logs while a task is running — doing so may cause the task to fail.
Disable Transparent Data Encryption (TDE) on the source RDS SQL Server instance to ensure task stability. See Disable TDE.
Do not perform DDL operations that change database or table schemas during schema synchronization — the task will fail.
In hybrid log-based parsing mode, do not execute multiple ADD COLUMN or DROP COLUMN operations within 10 minutes. For example, the following sequence within 10 minutes causes a task error:

```sql
ALTER TABLE test_table DROP COLUMN Flag;
ALTER TABLE test_table ADD Remark nvarchar(50) NOT NULL DEFAULT('');
```

Enable READ_COMMITTED_SNAPSHOT on the source database during full data synchronization to prevent shared lock contention. Without this setting, data inconsistency or instance failures may occur and are not covered by the DTS service level agreement (SLA).
If the source is an Azure SQL Database, a single task can synchronize data from only one database.
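As a reference, READ_COMMITTED_SNAPSHOT can be checked and enabled with statements like the following (a sketch — replace your_database with the actual source database name; WITH ROLLBACK IMMEDIATE disconnects active sessions, so run it during a maintenance window):

```sql
-- Check the current setting (1 = enabled)
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'your_database';

-- Enable READ_COMMITTED_SNAPSHOT; WITH ROLLBACK IMMEDIATE
-- rolls back open transactions so the change can take effect
ALTER DATABASE your_database SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```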
Transaction log requirements
| Scenario | Minimum log retention |
|---|---|
| Incremental synchronization only | More than 24 hours |
| Full + incremental synchronization | At least 7 days |
Additional requirements:
The data logging feature must be enabled with the backup mode set to Full, and a full physical backup must have been performed.
After full synchronization completes, the log retention period can be reduced, but must remain longer than 24 hours.
Failure to meet log retention requirements may cause DTS to fail to obtain transaction logs, resulting in task failure, data inconsistency, or data loss. Outcomes in such circumstances are not covered by the DTS SLA.
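To verify that a source database runs in the Full backup mode required above, the recovery model can be inspected with a query like this (a sketch; your_database is a placeholder):

```sql
-- recovery_model_desc must be FULL for log-based incremental synchronization
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'your_database';
```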
CDC requirements
If CDC needs to be enabled for the source tables, the following conditions must be met — otherwise the precheck fails:
The value of the srvname field in the sys.sysservers view must match the return value of the SERVERPROPERTY function.
For self-managed SQL Server: the database owner must be the sa user. For ApsaraDB RDS for SQL Server: the database owner must be the sqlsa user.
Enterprise edition: SQL Server 2008 or later is required.
Standard edition: SQL Server 2016 SP1 or later is required.
If running SQL Server 2017 (Standard or Enterprise edition), update to a newer version.
The number of tables with CDC enabled in a single task cannot exceed 1,000. Tasks exceeding this limit fail the precheck.
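The srvname precheck condition above can be verified manually by comparing two values; a sketch:

```sql
-- These two values must match for the CDC precheck to pass
SELECT srvname FROM sys.sysservers WHERE srvid = 0;  -- local server entry
SELECT SERVERPROPERTY('ServerName') AS server_property_name;
```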
By default, CDC processes field values up to 64 KB. If a single field value written to a CDC-enabled table exceeds 64 KB, run the following command on the source database:

```sql
exec sp_configure 'max text repl size', -1;
```
Destination and other limits
A single string in the destination DataHub project cannot exceed 2 MB.
DTS does not synchronize foreign keys. Cascade update and cascade delete operations on the source are not propagated to the destination.
DTS does not synchronize tables that contain computed columns.
Do not write data from other sources to the destination during synchronization — this causes data inconsistency. For example, running online DDL operations via Data Management (DMS) while other sources write to the destination may cause data loss.
If the task involves incremental synchronization, reindexing operations are not supported. Performing a reindex may cause the task to fail and result in data loss.
DTS cannot synchronize DDL operations related to the primary key of a CDC-enabled table.
When modifying the objects to be synchronized, you cannot remove databases.
If multiple DTS instances share the same source SQL Server database, their incremental data collection modules operate independently.
If a DTS task fails, DTS technical support will attempt to restore it within 8 hours. During this period, the task may be restarted and task parameters (not database parameters) may be modified.
Schema synchronization scope
DTS supports initial schema synchronization for: schemas, tables, views, functions, and procedures.
The source and destination are heterogeneous databases. Data types do not map one-to-one. Evaluate the impact of data type conversion before starting the task. See Data type mappings for initial schema synchronization.
DTS does not synchronize the following object types: assemblies, service brokers, full-text indexes, full-text catalogs, distributed schemas, distributed functions, CLR stored procedures, CLR scalar-valued functions, CLR table-valued functions, internal tables, systems, and aggregate functions.
DTS does not synchronize the following data types: CURSOR, ROWVERSION, SQL_VARIANT, HIERARCHYID, POLYGON, GEOMETRY, GEOGRAPHY, and custom types created with CREATE TYPE.
Objects DTS creates in the source database
In Incremental Synchronization Based on Logs of Source Database mode, DTS creates the following objects in the source database to track synchronization state:
| Object | Type | Purpose |
|---|---|---|
| dts_cdc_sync_ddl | Trigger | Captures DDL changes |
| dts_sync_progress | Table | Tracks synchronization latency (heartbeat) |
| dts_cdc_ddl_history | Table | Stores DDL change history |
In hybrid log-based parsing mode, DTS also enables CDC on the source database and on specific tables. Keep the write throughput to CDC-enabled tables at or below 1,000 records per second (RPS).
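To see where CDC has been enabled by DTS, the standard SQL Server catalog views can be queried; a sketch:

```sql
-- Check whether CDC is enabled at the database level
SELECT name, is_cdc_enabled FROM sys.databases;

-- Check which tables have CDC enabled in the current database
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.is_tracked_by_cdc = 1;
```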
Set up the synchronization task
Step 1: Open the Data Synchronization page
Use one of the following methods:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance will reside.
DMS console
The steps may vary depending on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose Data Transmission Service (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance will reside.
Step 2: Configure source and destination databases
Click Create Task, then configure the following parameters:
| Parameter | Description | Required |
|---|---|---|
| Task name | A name for the DTS task. DTS auto-generates a name — specify a descriptive name for easier identification. Unique names are not required. | No |
| Source database — select existing connection | Select a registered database instance to auto-populate connection parameters. If no registered instance is available, configure the parameters below manually. | No |
| Database type (source) | Select SQL Server. | Yes |
| Access method (source) | Select Alibaba Cloud Instance. | Yes |
| Instance region (source) | The region of the source ApsaraDB RDS for SQL Server instance. | Yes |
| Replicate data across Alibaba Cloud accounts | Select No unless synchronizing across accounts. | Yes |
| RDS instance ID | The ID of the source ApsaraDB RDS for SQL Server instance. | Yes |
| Database account | The database account with owner permissions on the objects to be synchronized. | Yes |
| Database password | The password for the database account. | Yes |
| Encryption | Select Non-encrypted if SSL is not enabled on the source. Select SSL-encrypted if SSL is enabled — DTS trusts the server certificate by default. | Yes |
| Destination database — select existing connection | Select a registered database instance to auto-populate connection parameters. If no registered instance is available, configure the parameters below manually. | No |
| Database type (destination) | Select DataHub. | Yes |
| Access method (destination) | Select Alibaba Cloud Instance. | Yes |
| Instance region (destination) | The region of the destination DataHub project. | Yes |
| Project | The name of the destination DataHub project. | Yes |
After filling in the parameters, click Test Connectivity and Proceed.
Make sure the CIDR blocks of DTS servers are added to the security settings of the source and destination databases. See Add the CIDR blocks of DTS servers. If the source or destination is a self-managed database with an access method other than Alibaba Cloud Instance, click Test Connectivity in the CIDR blocks of DTS servers dialog box.
Step 3: Configure synchronization objects
In the Configure Objects step, set the following parameters:
| Parameter | Description | Required |
|---|---|---|
| Synchronization types | Incremental Data Synchronization is selected by default. Also select Schema Synchronization. | Yes |
| Processing mode of conflicting tables | Precheck and Report Errors: fails the precheck if tables with identical names exist in the destination. Ignore Errors and Proceed: skips the name conflict check — use with caution, as this may cause data inconsistency. See the conflict handling behavior details below. | Yes |
| Additional column rule | When DTS writes data to a DataHub topic, it adds extra columns. If these columns conflict with existing column names in the destination topic, the task fails. Select New Rule or Previous Rule based on your requirements. Check for name conflicts before setting this parameter. See Naming rules for additional columns. | Yes |
| Case Policy for Destination Object Names | Controls the casing of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. | No |
| Source Objects | Select one or more objects in the Source Objects section and click the rightwards arrow to move them to the Selected Objects section. | Yes |
| Selected objects | To rename a single object, right-click it. See Individual database, table, and column name mapping. To rename multiple objects, click Batch Edit. See Batch database, table, and column name mapping. Right-click an object to select specific SQL operations to synchronize, or to set WHERE conditions to filter rows. See Set filter conditions. Renaming an object may break other objects that depend on it. | Yes |
Conflict handling behavior (when Ignore Errors and Proceed is selected):
Same schema, matching primary or unique key: during full synchronization, the existing destination record is kept; during incremental synchronization, the destination record is overwritten.
Different schemas: initialization may fail, only some columns may be synchronized, or the task may fail.
Step 4: Configure advanced settings
Click Next: Advanced Settings and configure the following:
| Parameter | Description | Required |
|---|---|---|
| Dedicated cluster for task scheduling | By default, DTS schedules the task to the shared cluster. Purchase a dedicated cluster to improve task stability. See What is a DTS dedicated cluster. | No |
| Retry time for failed connections | How long DTS retries when the source or destination is unreachable after the task starts. Valid values: 10–1440 minutes. Default: 720. Set this parameter to more than 30 minutes. If multiple DTS tasks share the same source or destination, the shortest retry time configured among them takes effect. DTS charges for the instance during retry periods. | No |
| Retry time for other issues | How long DTS retries when DDL or DML operations fail. Valid values: 1–1440 minutes. Default: 10. Set to more than 10 minutes, and to a value less than Retry time for failed connections. | No |
| Enable throttling for incremental data synchronization | Limits the synchronization rate to reduce load on the destination. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s) if enabled. | No |
| Environment tag | A tag to identify the DTS instance. | No |
| Configure ETL | Select Yes to enable extract, transform, and load (ETL) processing and enter data processing statements. Select No to skip. See What is ETL? and Configure ETL in a data migration or data synchronization task. | No |
| Monitoring and alerting | Select Yes to receive notifications when the task fails or synchronization latency exceeds a threshold. Configure alert thresholds and notification settings. See Configure monitoring and alerting when you create a DTS task. | No |
Step 5: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. Handle precheck results as follows:
Failure: click View Details next to the failed item, resolve the issue, then rerun the precheck.
Alert — cannot be ignored: click View Details, resolve the issue, then rerun the precheck.
Alert — can be ignored: click Confirm Alert Details. In the View Details dialog box, click Ignore, then click OK. Click Precheck Again to proceed. Ignoring alerts may cause data inconsistency.
Step 6: Purchase and start the instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the Buy page, configure the following:
| Parameter | Description | Required |
|---|---|---|
| Billing method | Subscription: pay upfront; cost-effective for long-term use. Pay-as-you-go: billed hourly; suitable for short-term use. Release the instance when no longer needed to stop charges. | Yes |
| Resource group settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? | No |
| Instance class | The synchronization throughput tier. Select based on your data volume and latency requirements. See Instance classes of data synchronization instances. | Yes |
| Subscription duration | Available for the Subscription billing method. Options: 1–9 months, or 1, 2, 3, or 5 years. | Conditional |

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the dialog box, click OK.
The task appears in the task list. Monitor its progress there.
Special cases
If the source is an ApsaraDB RDS for SQL Server instance, DTS automatically creates an account named rdsdt_dtsacct on the instance for synchronization. Do not delete this account or change its password while the task is running — otherwise, the task may fail. See System accounts.
What's next
Monitor task status and latency in the DTS console task list.
To adjust synchronization objects after the task starts, edit the task configuration — note that you cannot remove databases from the synchronization scope.
To improve task reliability for high-throughput workloads, consider purchasing a dedicated cluster.