Architecture - Data Transmission Service - Alibaba Cloud Documentation Center

This topic describes the architecture of Data Transmission Service (DTS) and how it works in each replication mode.

DTS Architecture

Architecture description

Primary/secondary redundancy
Each module of DTS is deployed on servers with primary/secondary redundancy. The HA manager continuously performs health checks on each server. If an exception occurs on a server, the workloads on that server are switched over to a healthy server with minimal latency.
Endpoint change detection
For continuous data replication tasks, such as data synchronization and change tracking, the HA manager detects changes to the endpoints of data sources. If an instance endpoint changes, the HA manager reconfigures the data source so that the connection is maintained.
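
The switchover behavior described above can be illustrated with a minimal sketch. It is not DTS code: the `Server` class, the `fail_over` function, and the least-loaded placement policy are assumptions made for illustration only, and the health check is simulated by a boolean flag.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a server that hosts DTS module workloads."""
    name: str
    healthy: bool = True
    workloads: list[str] = field(default_factory=list)


def fail_over(servers: list[Server]) -> list[tuple[str, str, str]]:
    """Move workloads off unhealthy servers; return (workload, source, target) moves."""
    moves = []
    healthy = [s for s in servers if s.healthy]
    for server in servers:
        if server.healthy or not server.workloads:
            continue
        if not healthy:
            raise RuntimeError("no healthy server available")
        for workload in list(server.workloads):
            # Assumed policy: place the workload on the least-loaded healthy server.
            target = min(healthy, key=lambda s: len(s.workloads))
            server.workloads.remove(workload)
            target.workloads.append(workload)
            moves.append((workload, server.name, target.name))
    return moves


primary = Server("primary", workloads=["log-reader"])
secondary = Server("secondary")
primary.healthy = False  # simulate a failed health check
moves = fail_over([primary, secondary])
```

In this sketch, `moves` records that the `log-reader` workload left `primary` and now runs on `secondary`.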

How DTS works in data migration mode

A complete data migration process consists of three phases: schema migration, full data migration, and incremental data migration. To keep the source database operational during data migration, you must select all three phases when you configure a data migration task.

Schema migration: Before data is migrated, DTS re-creates the schema in the destination database. For data migration between heterogeneous databases, DTS parses the DDL code of the source database, translates the code into the syntax of the destination database, and then re-creates the schema objects in the destination database.
Full data migration: DTS migrates the historical data from the source database to the destination database. The source database remains operational and continues to accept data updates during this phase. DTS uses an incremental data reader to capture the data updates that occur during full data migration. Incremental data reading is activated when full data migration starts. During full data migration, the incremental data is parsed, reformatted, and stored locally on the DTS server.
Incremental data migration: After full data migration is completed, DTS retrieves the incremental data stored locally on the DTS server, reformats the data, and migrates it to the destination database. This process continues until all ongoing data changes are replicated and the destination database is in sync with the source database.
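
The interaction between the full and incremental phases can be sketched as follows. This is a simplified model under stated assumptions, not the DTS implementation: databases are plain dictionaries, the `migrate` function and the `("upsert" | "delete", key, value)` change format are hypothetical, and schema migration is omitted.

```python
def migrate(source: dict, changes_during_copy: list[tuple]) -> dict:
    """Copy a snapshot, buffer concurrent changes, then replay them in order."""
    destination = {}
    buffered = []  # incremental data held locally while the full copy runs

    # Full data migration works from the historical data as of the start.
    snapshot = dict(source)

    # The source stays writable: each concurrent change is applied to the
    # source and captured by the (simulated) incremental data reader.
    for op, key, value in changes_during_copy:
        if op == "delete":
            source.pop(key, None)
        else:
            source[key] = value
        buffered.append((op, key, value))

    destination.update(snapshot)

    # Incremental data migration: replay the buffered changes in order.
    for op, key, value in buffered:
        if op == "delete":
            destination.pop(key, None)
        else:
            destination[key] = value
    return destination


source = {1: "a", 2: "b"}
changes = [("upsert", 2, "B"), ("upsert", 3, "c"), ("delete", 1, None)]
destination = migrate(source, changes)
```

After the replay, `destination` holds the same rows as `source`, even though the source was modified while the snapshot was being copied.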

How DTS works in data synchronization mode

In data synchronization mode, DTS replicates ongoing data changes between two data stores. This mode is typically used for OLTP-to-OLAP replications. OLTP stands for online transaction processing, whereas OLAP stands for online analytical processing. A data synchronization task consists of the following phases:

Initial data synchronization: DTS synchronizes the historical data from the source database to the destination database.
Real-time data synchronization: DTS synchronizes ongoing data changes and keeps the destination database in sync with the source database.

To synchronize ongoing data changes, DTS uses two components that work with transaction logs:

Transaction log reader: The transaction log reader connects to the source instance over the corresponding protocol and reads the incremental logs of the source instance. For example, the reader connects to an ApsaraDB RDS for MySQL instance over the binlog dump protocol. After parsing, filtering, and syntax conversion, the data is persisted locally on the DTS server.
Transaction log applier: The transaction log applier retrieves data updates from the transaction log reader, filters out the updates that are not related to the objects being replicated, and applies the remaining updates to the destination database. In this process, the transaction log applier maintains the atomicity, consistency, isolation, and durability (ACID) properties of transactions.

Both the transaction log reader and the transaction log applier are deployed in redundancy mode. The HA manager checks the health of each server. If an exception occurs, the execution of transaction logs is resumed on a healthy server.
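
A minimal sketch of the applier's filter-then-apply step is shown below. It is an illustration under assumptions, not DTS code: the `apply_transactions` function, the dictionary-based change records, and the `upsert`/`delete` operation names are hypothetical, and only atomicity is modeled (each transaction is staged on a copy and committed all at once).

```python
import copy


def apply_transactions(destination: dict, transactions: list, replicated_tables: set) -> int:
    """Apply each transaction all-or-nothing; return the number applied."""
    applied = 0
    for txn in transactions:
        # Drop changes for objects that are not being replicated.
        relevant = [c for c in txn if c["table"] in replicated_tables]
        if not relevant:
            continue
        # Stage the changes on a copy so a failure leaves destination untouched.
        staged = copy.deepcopy(destination)
        for change in relevant:
            rows = staged.setdefault(change["table"], {})
            if change["op"] == "upsert":
                rows[change["key"]] = change.get("row")
            elif change["op"] == "delete":
                rows.pop(change["key"], None)
            else:
                raise ValueError(f"unsupported operation: {change['op']}")
        # Commit: replace the destination contents with the staged state.
        destination.clear()
        destination.update(staged)
        applied += 1
    return applied


destination = {}
transactions = [
    [
        {"table": "orders", "op": "upsert", "key": 1, "row": {"total": 10}},
        {"table": "audit", "op": "upsert", "key": 9, "row": {}},
    ],
    [{"table": "audit", "op": "delete", "key": 9}],
    [{"table": "orders", "op": "upsert", "key": 2, "row": {"total": 25}}],
]
applied = apply_transactions(destination, transactions, {"orders"})
```

Here only the `orders` table is replicated, so the second transaction is skipped entirely and the `audit` change in the first transaction is filtered out.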

How DTS works in change tracking mode

The change tracking mode allows you to obtain the logs about the incremental data of an ApsaraDB RDS instance in real time. You can track the logs on the change tracking server by using DTS SDKs. You can also customize data consumption rules based on your requirements.

The log processor obtains data from the source database. After parsing, filtering, and syntax conversion, the data is stored locally on the DTS server.

The log processor connects to the source instance over the corresponding protocol and reads the logs about the incremental data of the source instance. For example, this log processor connects to an ApsaraDB RDS for MySQL instance over the binlog dump protocol.

DTS ensures the high availability of the log processor and SDK-based consumption processes.

To ensure the high availability of the log processor, the HA manager restarts the log processor on a healthy service node when an exception is detected on the log processor.
To ensure the high availability of SDK-based consumption processes on the server, DTS allows you to start multiple SDK-based consumption processes for a single change tracking task. The server pushes incremental data by using a single process at a time. If an exception occurs in a process, the server pushes incremental data by using another consumption process.
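
The single-active-consumer behavior can be sketched as follows. This sketch does not use the DTS SDK: the `ChangeTrackingServer` class, its `push` method, and the callable consumers are hypothetical names chosen for illustration, and a consumer failure is simulated by raising an exception.

```python
class ChangeTrackingServer:
    """Pushes each record to one active consumer; switches on failure."""

    def __init__(self, consumers):
        self.consumers = list(consumers)
        self.active = 0  # index of the consumption process currently in use

    def push(self, record):
        """Deliver a record and return the index of the consumer that took it."""
        for _ in range(len(self.consumers)):
            consumer = self.consumers[self.active]
            try:
                consumer(record)
                return self.active
            except Exception:
                # The active process failed: switch to the next one and retry.
                self.active = (self.active + 1) % len(self.consumers)
        raise RuntimeError("no consumption process available")


received = {"a": [], "b": []}


def consumer_a(record):
    # Simulate a process that crashes after handling one record.
    if len(received["a"]) >= 1:
        raise ConnectionError("consumer a crashed")
    received["a"].append(record)


def consumer_b(record):
    received["b"].append(record)


server = ChangeTrackingServer([consumer_a, consumer_b])
used = [server.push(r) for r in ["r1", "r2", "r3"]]
```

In this run, the first record goes to `consumer_a`, and after it fails, the second and third records are delivered to `consumer_b`. No record is delivered to both processes.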