Data Transmission Service (DTS) can stream change data from ApsaraDB RDS for MySQL into ApsaraMQ for Kafka in real time. This lets downstream consumers — analytics pipelines, event-driven services, or data warehouses — react to row-level changes without querying the source database directly.
Prerequisites
Before you begin, make sure that you have:
An ApsaraDB RDS for MySQL instance and an ApsaraMQ for Kafka instance. For information on creating an RDS instance, see Create an ApsaraDB RDS for MySQL instance. For supported version combinations, see Overview of data synchronization scenarios.
A topic created in the destination Kafka instance to receive synchronized data. See Step 1: Create a topic.
Enough free storage space in the Kafka instance to hold all data from the source RDS MySQL instance.
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Limitations
Source database requirements
Tables to be synchronized must have a PRIMARY KEY or UNIQUE constraint with no duplicate field values. Otherwise, the destination may contain duplicate records.
If you rename tables or columns during synchronization and select individual tables as the sync objects, a single task supports up to 1,000 tables. To synchronize more tables, split them across multiple tasks or synchronize the entire database instead.
Do not execute DDL statements that change database or table schemas during schema synchronization or full data synchronization. Doing so causes the task to fail.
DTS does not synchronize foreign keys. Cascade and delete operations on the source database are not reflected in the destination.
Data generated by physical backup restores or cascade operations is not captured or synchronized while the task is running. If this data is missing from the destination, remove and re-add the affected databases and tables in the synchronization objects. See Modify the objects to be synchronized.
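The 1,000-table limit noted above (for tasks that select individual tables and rename tables or columns) means a large table list has to be divided across tasks. The following is a minimal sketch of that split, assuming the table names are already available as a list; `split_into_tasks` and the sample table names are hypothetical and not part of DTS.

```python
from typing import List

MAX_TABLES_PER_TASK = 1000  # per-task limit stated in this section


def split_into_tasks(tables: List[str],
                     limit: int = MAX_TABLES_PER_TASK) -> List[List[str]]:
    """Group table names into batches of at most `limit`, one batch per task."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [tables[i:i + limit] for i in range(0, len(tables), limit)]


if __name__ == "__main__":
    tables = [f"orders_{n}" for n in range(2500)]  # hypothetical table names
    batches = split_into_tasks(tables)
    print([len(batch) for batch in batches])  # prints [1000, 1000, 500]
```

Each batch would then be configured as the object list of a separate synchronization task.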
Binary logging requirements:
| Source type | Requirements |
|---|---|
| ApsaraDB RDS for MySQL | Binary logging is enabled by default. Set binlog_row_image to full. See Modify instance parameters. Retain binary logs for at least 3 days (7 days recommended). |
| Self-managed MySQL | Enable binary logging. Set binlog_format to row and binlog_row_image to full. For dual-primary clusters, also set log_slave_updates to ON. See Create an account for a self-managed MySQL database and configure binary logging. Retain binary logs for at least 7 days. |
If DTS cannot read the binary logs, the task fails and data inconsistency may occur. To set the retention period for RDS MySQL, see the Delete binary log files section.
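The binary logging requirements above can be checked before a task is created. The sketch below assumes you have already collected the relevant values, for example from `SHOW VARIABLES`, into a dictionary, and that you know the retention period in days. `check_binlog_settings` is a hypothetical helper, not a DTS API; it only encodes the rules from the table.

```python
from typing import Dict, List


def check_binlog_settings(variables: Dict[str, str], retention_days: float,
                          self_managed: bool = False,
                          dual_primary: bool = False) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    # Normalize names and values so "FULL", "full", and " Full " compare equal.
    v = {k.lower(): str(val).strip().lower() for k, val in variables.items()}
    problems: List[str] = []

    if self_managed:
        # RDS enables binary logging by default, so only self-managed
        # sources are checked for log_bin and binlog_format here.
        if v.get("log_bin") not in ("on", "1"):
            problems.append("binary logging must be enabled")
        if v.get("binlog_format") != "row":
            problems.append("binlog_format must be row")
    if v.get("binlog_row_image") != "full":
        problems.append("binlog_row_image must be full")
    if self_managed and dual_primary and v.get("log_slave_updates") not in ("on", "1"):
        problems.append("log_slave_updates must be ON for dual-primary clusters")

    # Minimum retention: 3 days for RDS, 7 days for self-managed MySQL.
    min_days = 7 if self_managed else 3
    if retention_days < min_days:
        problems.append(f"retain binary logs for at least {min_days} days")
    return problems


if __name__ == "__main__":
    print(check_binlog_settings({"binlog_row_image": "MINIMAL"}, retention_days=2))
```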
MySQL 8.0.23 and later — invisible columns:
Invisible columns cannot be synchronized and their data is lost. To make a column visible, run:
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;
Tables without explicit primary keys automatically get invisible primary keys. Make these visible before synchronizing. See Invisible Columns and Generated Invisible Primary Keys.
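When several tables are affected, the statement above can be generated per column. The sketch below assumes you already have a list of (table, column) pairs for the invisible columns, for instance gathered from `INFORMATION_SCHEMA.COLUMNS`. `make_visible_statements` is a hypothetical helper, and the sample table and column names are illustrative only.

```python
from typing import Iterable, List, Tuple


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def make_visible_statements(columns: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one ALTER TABLE ... SET VISIBLE statement per (table, column) pair."""
    return [
        f"ALTER TABLE {quote_identifier(table)} "
        f"ALTER COLUMN {quote_identifier(column)} SET VISIBLE;"
        for table, column in columns
    ]


if __name__ == "__main__":
    # Hypothetical input: one table with a generated invisible primary key.
    for statement in make_visible_statements([("orders", "my_row_id")]):
        print(statement)
```

Review the generated statements before running them against the source database.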
Other limitations
Evaluate the performance impact before starting. Full data synchronization reads and writes both databases heavily. Run synchronization during off-peak hours to reduce load.
Full data synchronization with concurrent INSERT operations causes table fragmentation in the destination. After full synchronization completes, the destination tablespace is larger than the source.
If you synchronize individual tables (not the entire database), do not use pt-online-schema-change for online DDL operations. Use Data Management (DMS) instead.
Do not write data from other sources to the destination Kafka instance during synchronization. Doing so causes data inconsistency.
If you scale the destination Kafka instance or cluster during synchronization, restart it afterward.
If a DTS task fails, DTS technical support attempts to restore it within 8 hours. The task may be restarted and task parameters may be modified during restoration.
ApsaraDB RDS for MySQL — instance-specific limitations:
| Instance type | Limitation |
|---|---|
| EncDB enabled | Full data synchronization is not supported. |
| Transparent Data Encryption (TDE) enabled | Schema synchronization, full data synchronization, and incremental data synchronization are all supported. |
| Read-only RDS MySQL 5.6 (no transaction logs) | Cannot be used as the source database. |
Special cases for self-managed MySQL
Performing a primary/secondary switchover while the task is running causes the task to fail.
If no DML operations are performed on the source database for a long time, synchronization latency reporting may be inaccurate. Perform a DML operation on the source database to reset the latency value. If you synchronize an entire database, create a heartbeat table that updates every second.
DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` in the source database to advance the binary log file position.
Special cases for ApsaraDB RDS for MySQL
DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` in the source database to advance the binary log file position.
Single-record size limit
The maximum size of a single record written to Kafka is 10 MB. If a source row exceeds this limit, the DTS task stops.
To work around this, exclude large-field tables from the synchronization objects, or use filter conditions to exclude the oversized fields. If the tables are already included, remove them, re-add them, and specify filter conditions that exclude the large fields.
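To find candidates for exclusion before the task stops, you could estimate row sizes from a sample of source data. The sketch below is only an approximation: it measures the UTF-8 length of a JSON encoding of each row, which is not necessarily the size of the record DTS writes, and it reads "10 MB" as 10 × 1024 × 1024 bytes, which is an assumption. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List

LIMIT_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB


def estimated_size(row: Dict[str, Any]) -> int:
    """Approximate a row's size as the UTF-8 length of its JSON encoding."""
    return len(json.dumps(row, default=str, ensure_ascii=False).encode("utf-8"))


def find_oversized(rows: List[Dict[str, Any]], limit: int = LIMIT_BYTES) -> List[int]:
    """Return the indexes of rows whose estimated size exceeds `limit`."""
    return [i for i, row in enumerate(rows) if estimated_size(row) > limit]


def largest_field(row: Dict[str, Any]) -> str:
    """Name the field that contributes most to the row's estimated size."""
    return max(row, key=lambda name: len(str(row[name]).encode("utf-8")))


if __name__ == "__main__":
    rows = [{"id": 1, "payload": "x" * 100}, {"id": 2, "payload": "y"}]
    for index in find_oversized(rows, limit=64):  # small limit for the demo
        print(index, largest_field(rows[index]))
```

A field reported by `largest_field` is a candidate for exclusion through filter conditions.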
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization
For details, see Synchronization topologies.
SQL operations that can be synchronized
| Type | Operations |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | CREATE TABLE, ALTER TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE; CREATE VIEW, ALTER VIEW, DROP VIEW; CREATE PROCEDURE, ALTER PROCEDURE, DROP PROCEDURE; CREATE FUNCTION, DROP FUNCTION, CREATE TRIGGER, DROP TRIGGER; CREATE INDEX, DROP INDEX |
Create a data synchronization task
Step 1: Go to the data synchronization page
Use one of the following methods:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance resides.
DMS console
The steps below may vary based on your DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose .
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task to go to the task configuration page.
Configure the source and destination database parameters.
Warning: After you configure the source and destination databases, read the Limits displayed on the page. Skipping this step may cause the task to fail or lead to data inconsistency.
Source database parameters
| Parameter | Description |
|---|---|
| Task Name | Enter a descriptive name. DTS generates a name automatically, but a meaningful name helps identify the task. Task names do not need to be unique. |
| Select Existing Connection | Select a registered database instance to auto-populate the connection fields. If the instance is not registered, configure the fields manually. For registration instructions, see Manage database connections. |
| Database Type | Select MySQL. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region where the source RDS MySQL instance resides. |
| Replicate Data Across Alibaba Cloud Accounts | Select No for same-account synchronization. |
| RDS Instance ID | Select the source RDS MySQL instance. |
| Database Account | Enter the account with read permissions on the objects to be synchronized. |
| Database Password | Enter the password for the database account. |
| Encryption | Select Non-encrypted or SSL-encrypted. To use SSL encryption, enable it on the RDS instance first. See Use a cloud certificate to enable SSL encryption. |
Destination database parameters
| Parameter | Description |
|---|---|
| Select Existing Connection | Select a registered database instance to auto-populate the connection fields. If the instance is not registered, configure the fields manually. |
| Database Type | Select Kafka. |
| Access Method | Select Alibaba Cloud Instance. |
| Instance Region | Select the region where the destination Kafka instance resides. |
| Kafka Instance ID | Select the destination Kafka instance. |
| Encryption | Select Non-encrypted or SCRAM-SHA-256 based on your security requirements. |
| Topic | Select the topic to receive synchronized data. |
| Topic That Stores DDL Information | (Optional) Select a topic to store DDL information separately. If left blank, DDL information is stored in the topic set by Topic. |
| Use Kafka Schema Registry | Select No or Yes. If you select Yes, enter the URL or IP address registered in Kafka Schema Registry for your Avro schemas. Kafka Schema Registry provides a RESTful API to store and retrieve Avro schemas. |
Click Test Connectivity and Proceed.
DTS server CIDR blocks must be added to the security settings of the source and destination databases. DTS adds them automatically for Alibaba Cloud instances. For self-managed databases, see Add the CIDR blocks of DTS servers. If the access method is not Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box first.
Step 3: Configure synchronization objects and options
In the Configure Objects step, set the following parameters.
| Parameter | Description |
|---|---|
| Synchronization Types | Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization to synchronize historical data first, which serves as the baseline for incremental synchronization. Note: If the destination is an ApsaraMQ for Kafka instance, Schema Synchronization is unavailable. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: fails the precheck if identical table names exist in both databases. Use object name mapping to rename the conflicting tables. Ignore Errors and Proceed: skips the check. If the source and destination databases have the same schema and a record in the destination has the same primary key value or unique key value as a record in the source: during full synchronization, the existing record in the destination is kept; during incremental synchronization, the existing record in the destination is overwritten. Schema mismatches may cause initialization failures. |
| Data Format in Kafka | DTS Avro: data parsed using the DTS Avro schema. See the schema definition on GitHub. Canal JSON: data in Canal JSON format. See the Canal JSON section. |
| Kafka Data Compression Format | Choose based on your workload. LZ4 (default): low compression ratio, fast speed. GZIP: high compression ratio, slow speed, high CPU usage. Snappy: balanced ratio and speed. |
| Policy for Shipping Data to Kafka Partitions | Select a partition routing policy. See Specify the policy for migrating data to Kafka partitions. |
| Message acknowledgement mechanism | Configure based on your reliability requirements. See Message acknowledgement mechanism. |
| Capitalization of Object Names in Destination Instance | Select DTS default policy or choose another option to match the capitalization of the source or destination database. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select one or more objects and add them to Selected Objects. Only tables can be selected as sync objects. |
| Selected Objects | Use the object name mapping feature to set the destination topic, number of partitions, and partition keys per table. See Use the object name mapping feature. To filter specific SQL operations for a table, right-click the object in Selected Objects and select the operations. Note: Renaming an object may break dependent objects. |
Click Next: Advanced Settings and configure the following parameters.
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to a shared cluster. Purchase a dedicated cluster to improve stability. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | The time range DTS retries failed connections. Valid values: 10–1440 minutes. Default: 720 minutes. Set this to more than 30 minutes. If multiple tasks share a source or destination database, the shortest retry time applies. DTS charges for the instance during retries. |
| Retry Time for Other Issues | The time range DTS retries failed DDL or DML operations. Valid values: 1–1440 minutes. Default: 10 minutes. Set this to more than 10 minutes. This value must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limit read QPS (queries per second) and write throughput during full synchronization to reduce load on the destination. Configure the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters. Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limit write throughput for incremental synchronization by configuring the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s) parameters. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat SQL to the source database. Synchronization latency may appear in the task. No: DTS writes heartbeat SQL to the source. Physical backup and cloning operations on the source database may be affected. |
| Environment Tag | (Optional) Assign an environment tag to identify this DTS instance. |
| Configure ETL | Yes: configure extract, transform, and load (ETL) processing by entering data processing statements. See Configure ETL in a data migration or data synchronization task. No: skip ETL. |
| Monitoring and Alerting | Yes: configure alert thresholds and notification contacts. DTS sends alerts when the task fails or synchronization latency exceeds the threshold. See Configure monitoring and alerting when you create a DTS task. No: no alerting. |
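The two retry settings above are constrained by their valid ranges and by each other. The following sketch encodes those constraints so that planned values can be checked before you enter them in the console. `validate_retry_times` is a hypothetical helper, not part of DTS, and it checks only the rules stated in the parameter descriptions.

```python
from typing import List


def validate_retry_times(connection_minutes: int, other_minutes: int) -> List[str]:
    """Return rule violations for the two retry settings; empty means valid."""
    errors: List[str] = []
    # Retry Time for Failed Connections: valid values are 10-1440 minutes.
    if not 10 <= connection_minutes <= 1440:
        errors.append("Retry Time for Failed Connections must be 10-1440 minutes")
    # Retry Time for Other Issues: valid values are 1-1440 minutes.
    if not 1 <= other_minutes <= 1440:
        errors.append("Retry Time for Other Issues must be 1-1440 minutes")
    # The second value must be strictly less than the first.
    if other_minutes >= connection_minutes:
        errors.append("Retry Time for Other Issues must be less than "
                      "Retry Time for Failed Connections")
    return errors


if __name__ == "__main__":
    print(validate_retry_times(720, 10))  # the documented defaults; prints []
```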
Step 4: Run a precheck
Click Next: Save Task Settings and Precheck. To preview the API parameters for this configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
DTS runs a precheck before the synchronization task starts. The task only starts after all precheck items pass.
If any precheck item fails, click View Details to see the cause, fix the issue, and click Precheck Again. If a precheck item generates an alert:
If the alert cannot be ignored, fix the issue and rerun the precheck.
If the alert can be ignored, click Confirm Alert Details, then click Ignore in the dialog box, click OK, and then click Precheck Again. Ignoring an alert may cause data inconsistency.
Step 5: Purchase and start the instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Suitable for short-term use. Release the instance when no longer needed to avoid ongoing charges. |
| Resource Group Settings | Select the resource group for this instance. Default: default resource group. See What is Resource Management? |
| Instance Class | Select an instance class based on the required synchronization throughput. See Instance classes of data synchronization instances. |
| Subscription Duration | (Subscription only) Set the duration: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the dialog box.
The task appears in the task list. Monitor its progress from there.
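Once the task is running, downstream consumers read the change records from the destination topic. If Data Format in Kafka is set to Canal JSON, a consumer might decode each message value roughly as sketched below. The field names (`type`, `table`, `data`, `old`, `isDdl`) are the ones commonly used by Canal flat messages and are an assumption here; check the Canal JSON section for the fields DTS actually emits. The sample message is invented for illustration, and fetching messages from Kafka is left out.

```python
import json
from typing import Any, Dict, Iterator, Optional, Tuple

Row = Dict[str, Any]


def row_changes(message: str) -> Iterator[Tuple[str, str, Optional[Row], Row]]:
    """Yield (operation, table, old_values, row) for each row in one message."""
    event = json.loads(message)
    if event.get("isDdl"):
        return  # DDL events carry no row data; skip them in this sketch
    rows = event.get("data") or []
    # "old" is assumed to hold prior values for UPDATE events and be absent otherwise.
    olds = event.get("old") or [None] * len(rows)
    for row, old in zip(rows, olds):
        yield event.get("type"), event.get("table"), old, row


if __name__ == "__main__":
    sample = json.dumps({  # illustrative message, not captured from DTS
        "database": "shop", "table": "orders", "type": "UPDATE", "isDdl": False,
        "data": [{"id": "1", "status": "paid"}],
        "old": [{"status": "new"}],
    })
    for change in row_changes(sample):
        print(change)
```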
Use the object name mapping feature
The object name mapping feature lets you route data from each source table to a specific Kafka topic, set the number of partitions, and define partition keys.
In the Selected Objects section, hover over a table name.
Right-click and select Edit.
In the Edit Table dialog box, configure the following parameters.
| Parameter | Description |
|---|---|
| Table Name | Enter the name of the destination topic. By default, this is the topic set in the Destination Database section. If the destination is an ApsaraMQ for Kafka instance, the topic must already exist; DTS does not create it. If the destination is a self-managed Kafka cluster and schema synchronization is included, DTS attempts to create the topic. |
| Filter Conditions | Specify SQL conditions to filter which rows are synchronized. See Specify filter conditions. |
| Number of Partitions | Set the number of partitions in the destination topic. |
| Partition Key | Available when Policy for Shipping Data to Kafka Partitions is set to Ship Data to Separate Partitions Based on Hash Values of Primary Keys. Specify one or more columns as partition keys. DTS routes rows to partitions based on the hash values of these columns. To select columns as partition keys, first clear Synchronize All Tables. |
Click OK.
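To illustrate what hash-based routing on partition keys means, the sketch below maps the values of the chosen key columns to a partition number. It is a conceptual example only: the hash function DTS uses is not specified here, so `zlib.crc32` is a stand-in, and `pick_partition` is a hypothetical name.

```python
import zlib
from typing import Any, Sequence


def pick_partition(key_values: Sequence[Any], num_partitions: int) -> int:
    """Map partition-key column values to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Join the values with a separator so ("ab", "c") and ("a", "bc") differ.
    key = "\x1f".join(str(value) for value in key_values).encode("utf-8")
    # crc32 is a stand-in for whatever hash the service really applies.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The same key always lands in the same partition, so changes to one
    # row stay in order relative to each other.
    first = pick_partition(["42"], 4)
    second = pick_partition(["42"], 4)
    print(first == second)  # prints True
```

Because a hash of the same key values always yields the same partition, all changes for a given key are written to one partition, where Kafka preserves their order.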
FAQ
Can I change the Kafka Data Compression Format or Message acknowledgement mechanism after the task is created?
Yes. Modify these settings through the object modification feature. See Modify the objects to be synchronized.