Synchronize data from RDS for MySQL to ApsaraMQ for Kafka using DTS

Use Data Transmission Service (DTS) to stream change data from an ApsaraDB RDS for MySQL instance into an ApsaraMQ for Kafka instance. DTS captures row-level changes via binary logs and delivers them to Kafka topics in real time, enabling downstream analytics, event-driven architectures, and data pipeline integrations.

Prerequisites

Before you begin, make sure you have:

An ApsaraDB RDS for MySQL instance. For setup instructions, see Create an ApsaraDB RDS for MySQL instance.
An ApsaraMQ for Kafka instance with a topic created to receive synchronized data. See Step 1: Create a topic. For supported source and destination versions, see Overview of data synchronization scenarios.
Enough free storage space in the Kafka instance — at least as large as the total data size in the RDS MySQL instance.
Binary logging enabled on the source instance with binlog_row_image set to full. Binary logging is enabled by default on ApsaraDB RDS for MySQL. To verify or change this parameter, see Modify instance parameters.
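The binary logging prerequisite can be checked before you create the task. The following is a minimal sketch, not part of DTS: it assumes you have already read the server variables (for example with SHOW VARIABLES) into a dictionary, and the function name find_binlog_problems and the sample values are hypothetical. The self_managed flag adds the binlog_format requirement that applies only to self-managed MySQL.

```python
# Required settings taken from the prerequisites; values are compared case-insensitively.
REQUIRED_SETTINGS = {"log_bin": "ON", "binlog_row_image": "FULL"}


def find_binlog_problems(variables, self_managed=False):
    """Return a list of human-readable problems; an empty list means the check passed."""
    required = dict(REQUIRED_SETTINGS)
    if self_managed:
        # Self-managed MySQL additionally needs row-based logging.
        required["binlog_format"] = "ROW"
    problems = []
    for name, expected in required.items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} is missing (expected {expected})")
        elif str(actual).upper() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems


# Hypothetical values, shaped like the rows returned by SHOW VARIABLES.
sample = {"log_bin": "ON", "binlog_row_image": "MINIMAL"}
print(find_binlog_problems(sample))
```

An empty result only mirrors the documented requirements. The DTS precheck remains the authoritative check.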

Limitations

Source database

Each table must have a primary key or a UNIQUE constraint, and the constrained fields must contain unique values. Otherwise, the destination may contain duplicate records.
If you select individual tables (not the entire database) and want to rename tables or columns during synchronization, a single task supports up to 1,000 tables. For more than 1,000 tables, configure multiple tasks or synchronize the entire database instead.
Binary log requirements:
- Set binlog_row_image to full. If this parameter is not set correctly, the precheck fails and the task cannot start.
- Retain binary logs for at least 3 days on ApsaraDB RDS for MySQL (7 days recommended). For self-managed MySQL, retain logs for at least 7 days. Shorter retention periods may cause task failures or data loss, and may affect DTS service reliability under its Service Level Agreement (SLA). For details, see the Delete binary log files section.
- For self-managed MySQL, also set binlog_format to row. In a dual-primary cluster, set log_slave_updates to ON so DTS can obtain all binary logs. See Create an account for a self-managed MySQL database and configure binary logging.
Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization — this causes the task to fail.
Changes that are not recorded in binary logs, such as data restored from a physical backup or data produced by cascade operations, are not captured or synchronized. If this occurs, remove the affected databases and tables from the synchronization objects and then re-add them. See Modify the objects to be synchronized.
For MySQL 8.0.23 and later, invisible columns cannot be synchronized and their data is lost. To make a column visible, run ALTER TABLE <table_name> ALTER COLUMN <column_name> SET VISIBLE;. Tables without explicit primary keys may auto-generate invisible primary keys — make those visible too. See Invisible Columns and Generated Invisible Primary Keys.
A read-only ApsaraDB RDS for MySQL 5.6 instance cannot be used as the source because it does not record transaction logs.
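For the invisible-column limit, the ALTER TABLE statements can be generated in bulk. This is a sketch under assumptions: the helper names and the table and column pairs are hypothetical, and the pairs would in practice come from your own inspection of the source schema. It only builds the statement text shown in the limit above and does not connect to a database.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def visible_column_statements(invisible_columns):
    """Build one ALTER TABLE ... SET VISIBLE statement per (table, column) pair."""
    return [
        f"ALTER TABLE {quote_identifier(table)} "
        f"ALTER COLUMN {quote_identifier(column)} SET VISIBLE;"
        for table, column in invisible_columns
    ]


# Hypothetical pairs. On MySQL 8.0.23 and later they could be collected from
# information_schema.COLUMNS rows whose EXTRA value contains INVISIBLE.
pairs = [("orders", "my_row_id"), ("audit_log", "internal_note")]
for statement in visible_column_statements(pairs):
    print(statement)
```

Review the generated statements before running them, because making a column visible changes what SELECT * returns to existing applications.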

Other limits

DTS does not synchronize foreign keys. Cascade and delete operations triggered in the source are not propagated to the destination.
Full data synchronization uses read and write resources of both source and destination instances, increasing database load. Run synchronization during off-peak hours when possible.
During full data synchronization, concurrent INSERT operations cause table fragmentation in the destination. After full synchronization, the destination tablespace is typically larger than the source.
If you select one or more tables instead of an entire database as the objects to be synchronized, do not use tools such as pt-online-schema-change for online DDL operations on those tables during synchronization — this may cause synchronization to fail. Use Data Management (DMS) for online DDL instead. See Perform lock-free DDL operations.
Do not write data from other sources to the destination during synchronization. External writes cause data inconsistency and may result in data loss.
If the destination Kafka instance is scaled during synchronization, restart the instance to resume the task.
ApsaraDB RDS for MySQL instances with the EncDB feature enabled do not support full data synchronization. Instances with Transparent Data Encryption (TDE) enabled support schema synchronization, full data synchronization, and incremental data synchronization.
If a DTS task fails, DTS support will attempt to restore it within 8 hours. During restoration, the task may be restarted and task parameters may be modified. Database parameters are not changed.
If you perform a primary/secondary switchover on a self-managed MySQL source while the task is running, the task fails.
DTS calculates synchronization latency based on the timestamp of the latest synchronized data in the destination database and the current timestamp in the source database. If no DML operation is performed on the source database for a long time, the synchronization latency may be inaccurate. If the latency appears too high, you can perform a DML operation on the source database to update the latency. If you select an entire database as the synchronization object, you can also create a heartbeat table — the heartbeat table is updated or receives data every second.
DTS executes CREATE DATABASE IF NOT EXISTS 'test' in the source database periodically to advance the binary log file position. This is expected behavior.
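The latency calculation described in this list can be illustrated with a small sketch. It assumes only what the text states: latency is the current source timestamp minus the timestamp of the latest synchronized data. The function name sync_latency and the timestamps are hypothetical, and this is not the DTS implementation.

```python
from datetime import datetime, timedelta


def sync_latency(latest_synced_at, source_now):
    """Latency = current source time minus the timestamp of the newest synchronized record."""
    return max(source_now - latest_synced_at, timedelta(0))


source_now = datetime(2024, 1, 1, 12, 0, 0)

# Idle source: the last change was synchronized 30 minutes ago, so the figure
# looks high even though no data is waiting to be synchronized.
idle = sync_latency(datetime(2024, 1, 1, 11, 30, 0), source_now)

# With a heartbeat row written every second, the newest record is about 1 s old.
with_heartbeat = sync_latency(datetime(2024, 1, 1, 11, 59, 59), source_now)

print(idle.total_seconds(), with_heartbeat.total_seconds())
```

The comparison shows why a DML operation or a heartbeat table brings the reported latency back to a realistic value on an idle source.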

Record size limit

The maximum size of a single record written to Kafka is 10 MB. If a source row exceeds 10 MB, the task is interrupted. To avoid this, exclude large-field columns using filter conditions when configuring the task. If a table with large fields is already included in the task objects, remove the table, re-add it, and configure filter conditions to exclude the large fields.
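To find candidate columns to exclude, you can estimate row sizes on sampled data. The sketch below is a rough estimate only: it assumes 10 MB means 10 × 1024 × 1024 bytes, it measures raw field payloads rather than the encoded Kafka message (which also carries format overhead), and the names oversized_report and field_size and the sample row are hypothetical.

```python
MAX_RECORD_BYTES = 10 * 1024 * 1024  # assumption: 10 MB interpreted as 10 MiB


def field_size(value):
    """Approximate the payload size of one field in bytes."""
    if value is None:
        return 0
    if isinstance(value, (bytes, bytearray)):
        return len(value)
    return len(str(value).encode("utf-8"))


def oversized_report(row, limit=MAX_RECORD_BYTES):
    """Return (total_bytes, columns sorted largest first) if the row exceeds the limit, else None."""
    sizes = {column: field_size(value) for column, value in row.items()}
    total = sum(sizes.values())
    if total <= limit:
        return None
    largest_first = sorted(sizes, key=sizes.get, reverse=True)
    return total, largest_first


# Hypothetical row with an 11 MiB binary attachment.
row = {"id": 1, "title": "report", "attachment": b"x" * (11 * 1024 * 1024)}
report = oversized_report(row)
if report is not None:
    total, columns = report
    print(f"{total} bytes, largest column: {columns[0]}")
```

Because the estimate ignores message overhead, treat rows close to the limit as at risk and exclude their large columns as well.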

Billing

Synchronization type	Fee
Schema synchronization and full data synchronization	Free of charge
Incremental data synchronization	Charged. See Billing overview.

Supported synchronization topologies

One-way one-to-one synchronization
One-way one-to-many synchronization
One-way many-to-one synchronization

For all supported topologies, see Synchronization topologies.

SQL operations that can be synchronized

Operation type	SQL statements
DML	INSERT, UPDATE, DELETE
DDL	CREATE TABLE, ALTER TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE; CREATE VIEW, ALTER VIEW, DROP VIEW; CREATE PROCEDURE, ALTER PROCEDURE, DROP PROCEDURE; CREATE FUNCTION, DROP FUNCTION, CREATE TRIGGER, DROP TRIGGER; CREATE INDEX, DROP INDEX

DTS does not synchronize foreign keys from the source to the destination. Cascade and delete operations on the source are not replicated.

Create a synchronization task

Step 1: Go to the Data Synchronization page

Use either the DTS console or the DMS console.

DTS console

Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the synchronization instance will reside.

DMS console

Exact steps may vary depending on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.

Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the synchronization instance will reside.

Step 2: Configure source and destination databases

Click Create Task.

Configure the source and destination databases using the parameters in the following table.

Warning

After configuring the source and destination databases, review the Limits shown on the page to avoid task failures or data inconsistency.

Section	Parameter	Description
N/A	Task Name	A name for the DTS task. DTS generates a name automatically. Specify a descriptive name to make the task easy to identify. The name does not need to be unique.
Source Database	Select Existing Connection	If the instance is registered with DTS, select it from the drop-down list and DTS fills in the remaining parameters automatically. Otherwise, configure the database parameters manually. In the DMS console, select from the Select a DMS database instance drop-down list.
	Database Type	Select MySQL.
	Access Method	Select Alibaba Cloud Instance.
	Instance Region	The region where the source ApsaraDB RDS for MySQL instance resides.
	Replicate Data Across Alibaba Cloud Accounts	Select No for same-account synchronization.
	RDS Instance ID	The ID of the source ApsaraDB RDS for MySQL instance.
	Database Account	A database account with read permissions on the objects to be synchronized.
	Database Password	The password for the database account.
	Encryption	Select Non-encrypted or SSL-encrypted. To use SSL encryption, enable SSL on the RDS instance before configuring the DTS task. See Use a cloud certificate to enable SSL encryption.
Destination Database	Select Existing Connection	If the instance is registered with DTS, select it from the drop-down list. Otherwise, configure the database parameters manually.
	Database Type	Select Kafka.
	Access Method	Select Alibaba Cloud Instance.
	Instance Region	The region where the destination ApsaraMQ for Kafka instance resides.
	Kafka Instance ID	The ID of the destination ApsaraMQ for Kafka instance.
	Encryption	Select Non-encrypted or SCRAM-SHA-256 based on your security requirements.
	Topic	The topic that receives the synchronized data. Select from the drop-down list.
	Topic That Stores DDL Information	The topic that stores DDL information. If left blank, DDL information is stored in the topic specified by Topic.
	Use Kafka Schema Registry	Whether to use Kafka Schema Registry for Avro schema storage and retrieval. Select No or Yes. If Yes, enter the URL or IP address registered in Kafka Schema Registry for your Avro schemas.

Click Test Connectivity and Proceed.
Make sure DTS server CIDR blocks are added to the security settings of both source and destination databases. See Add the CIDR blocks of DTS servers. For self-managed databases not using Alibaba Cloud Instance as the access method, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.

Step 3: Configure synchronization objects

In the Configure Objects step, set the synchronization parameters.

Parameter	Description
Synchronization Types	Incremental Data Synchronization is selected by default. Also select Full Data Synchronization to synchronize historical data as the baseline for incremental synchronization. Note: Schema synchronization is not available when the destination is an ApsaraMQ for Kafka instance.
Processing Mode of Conflicting Tables	Precheck and Report Errors: Fails the precheck if the destination has tables with the same names as source tables. Use object name mapping to rename conflicting tables. See Database, table, and column name mapping. Ignore Errors and Proceed: Skips the precheck for duplicate table names. During full synchronization, conflicting records are not overwritten — existing destination records are retained. During incremental synchronization, conflicting records overwrite existing destination records. If schemas differ, synchronization may fail or only some columns are synchronized. Proceed with caution.
Data Format in Kafka	The message format written to Kafka. DTS Avro: Data is structured per the DTS Avro schema definition. See the schema on GitHub. Canal Json: Data is stored in Canal JSON format. See Canal Json.
Kafka Data Compression Format	Compression algorithm for Kafka messages. LZ4 (default): low compression ratio, high speed. GZIP: high compression ratio, low speed — consumes significant CPU resources. Snappy: medium compression ratio and speed.
Policy for Shipping Data to Kafka Partitions	How records are distributed across Kafka partitions. See Specify the policy for migrating data to Kafka partitions.
Message acknowledgement mechanism	Kafka producer acknowledgement settings. See Message acknowledgement mechanism.
Capitalization of object names in destination instance	Controls the case of database, table, and column names in the destination. Default is DTS default policy. See Specify the capitalization of object names in the destination instance.
Source Objects	Select one or more objects and click the icon to add them to Selected Objects. You can select tables as the objects to be synchronized.
Selected Objects	Lists the selected objects. Use the object name mapping feature to specify the destination topic name, number of partitions, and partition keys for each source table. See Use the object name mapping feature. To filter SQL operations per object, right-click an object in the Selected Objects section and select the operations to synchronize.
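If you choose Canal Json as the data format, a downstream consumer has to decode each message. The following sketch assumes the flat message layout used by the open-source Canal project (fields such as database, table, type, isDdl, data, and old). The exact fields that DTS emits may differ, so check the Canal Json reference before relying on it. The sample message and the function name summarize_change are illustrative, not captured output.

```python
import json

# Illustrative message following the open-source Canal flat message layout (assumption).
SAMPLE_MESSAGE = json.dumps({
    "database": "shop",
    "table": "orders",
    "type": "UPDATE",
    "isDdl": False,
    "pkNames": ["id"],
    "data": [{"id": "42", "status": "paid"}],
    "old": [{"status": "pending"}],
    "es": 1700000000000,
    "ts": 1700000000123,
})


def summarize_change(raw):
    """Decode one Canal-style JSON message into a small summary dictionary."""
    message = json.loads(raw)
    target = f"{message.get('database')}.{message.get('table')}"
    if message.get("isDdl"):
        # DDL messages carry the statement text instead of row images.
        return {"kind": "DDL", "table": target, "sql": message.get("sql", "")}
    return {
        "kind": message.get("type"),
        "table": target,
        "rows": message.get("data") or [],
        "before": message.get("old") or [],
    }


print(summarize_change(SAMPLE_MESSAGE))
```

In a real consumer, the raw value would come from the Kafka client of your choice. The decoding step stays the same.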

Click Next: Advanced Settings and configure the advanced parameters.

Parameter	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules the task to the shared cluster. For improved stability, purchase and select a dedicated cluster. See What is a DTS dedicated cluster.
Retry Time for Failed Connections	How long DTS retries failed connections after the task starts. Valid values: 10–1,440 minutes. Default: 720 minutes. Set to at least 30 minutes. If DTS reconnects within this period, the task resumes; otherwise the task fails. If multiple tasks share the same source or destination, the shortest configured retry time applies. Note: You are charged for the DTS instance during the retry period. We recommend that you specify the retry time based on your business requirements, and release the DTS instance promptly after the source and destination instances are released.
Retry Time for Other Issues	How long DTS retries failed DDL or DML operations. Valid values: 1–1,440 minutes. Default: 10 minutes. Set to at least 10 minutes. This value must be less than Retry Time for Failed Connections.
Enable Throttling for Full Data Synchronization	Limits read/write throughput during full synchronization to reduce load on source and destination servers. Configure QPS (queries per second) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected.
Enable Throttling for Incremental Data Synchronization	Limits throughput during incremental synchronization. Configure RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s).
Whether to delete SQL operations on heartbeat tables of forward and reverse tasks	Controls whether DTS writes heartbeat SQL operations to the source database. Yes: Does not write heartbeat operations — a latency indicator may appear on the task. No: Writes heartbeat operations — may affect physical backup and cloning of the source database.
Environment Tag	An optional tag to identify the DTS instance.
Configure ETL	Whether to enable extract, transform, and load (ETL). Yes: Opens a code editor to enter data processing statements. See Configure ETL in a data migration or data synchronization task. No: ETL is disabled.
Monitoring and Alerting	Whether to configure alerts for the task. Yes: Set an alert threshold and notification contacts — see Configure monitoring and alerting when you create a DTS task. No: Alerts are disabled.
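The constraints on the two retry parameters can be checked together before you enter them. This is a small sketch that encodes only the ranges and the ordering rule stated in the table. The function name validate_retry_settings is hypothetical.

```python
def validate_retry_settings(connection_minutes, other_minutes):
    """Check the two retry times against the documented ranges and ordering rule."""
    errors = []
    if not 10 <= connection_minutes <= 1440:
        errors.append("Retry Time for Failed Connections must be 10 to 1,440 minutes")
    if not 1 <= other_minutes <= 1440:
        errors.append("Retry Time for Other Issues must be 1 to 1,440 minutes")
    if other_minutes >= connection_minutes:
        errors.append(
            "Retry Time for Other Issues must be less than "
            "Retry Time for Failed Connections"
        )
    return errors


# The documented defaults (720 and 10 minutes) satisfy every rule.
print(validate_retry_settings(720, 10))

# Equal values violate the ordering rule.
print(validate_retry_settings(30, 30))
```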

Step 4: Run the precheck

Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Review the precheck results:
- If all items pass, proceed to the next step.
- If an item fails, click View Details next to the failed item, resolve the issue, and click Precheck Again.
- If an alert is triggered for an item that cannot be ignored, resolve the issue and rerun the precheck. For ignorable alerts, click View Details next to the alert item, then click Ignore > OK, and click Precheck Again.
Warning
Ignoring precheck alerts may cause data inconsistency.

Step 5: Purchase an instance and start the task

Wait for Success Rate to reach 100%, then click Next: Purchase Instance.

On the buy page, configure the billing and instance parameters.

Parameter	Description
Billing Method	Subscription: Pay upfront for a fixed term. More cost-effective for long-term use. Subscription duration options: 1–9 months, or 1, 2, 3, or 5 years. Pay-as-you-go: Billed hourly. Suitable for short-term use — release the instance when no longer needed to stop billing.
Resource Group Settings	The resource group for the synchronization instance. Default: default resource group. See What is Resource Management?.
Instance Class	Instance classes vary in synchronization speed. See Instance classes of data synchronization instances.

Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the dialog box.

The task appears in the task list. Track its progress from there.

Use the object name mapping feature

Use this feature to route source table data to a specific Kafka topic, control the partition count, and set partition keys.

In the Selected Objects section, hover over the topic name.
Right-click and select Edit.

In the Edit Table dialog box, configure the parameters.

Parameter	Description
Table Name	The topic that receives data from this source table. Defaults to the topic set in the Destination Database section. For ApsaraMQ for Kafka destinations, the topic must already exist — DTS does not create it automatically. For self-managed Kafka with schema synchronization, DTS attempts to create the topic. Changing this value routes the source table's data to the specified topic.
Filter Conditions	SQL-based row filter for this table. See Specify filter conditions.
Number of Partitions	The number of partitions in the destination topic.
Partition Key	One or more columns used to compute partition hash values, applicable when Policy for Shipping Data to Kafka Partitions is set to Ship Data to Separate Partitions Based on Hash Values of Primary Keys. To configure partition keys, first clear Synchronize All Tables.

Click OK.
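The effect of a partition key can be pictured as hashing the key columns and taking the result modulo the partition count. The sketch below illustrates that general technique only: the hash function that DTS actually uses is not described here, CRC32 is a stand-in chosen because it is deterministic, and the function name choose_partition and the sample rows are hypothetical.

```python
import zlib


def choose_partition(row, partition_key_columns, partition_count):
    """Map a row to a partition index from the hash of its partition key columns."""
    key = "|".join(str(row[column]) for column in partition_key_columns)
    # CRC32 is an assumed stand-in for the real hash function.
    return zlib.crc32(key.encode("utf-8")) % partition_count


rows = [
    {"id": 1, "customer_id": 7, "status": "pending"},
    {"id": 1, "customer_id": 7, "status": "paid"},  # later change to the same row
    {"id": 2, "customer_id": 9, "status": "pending"},
]
for row in rows:
    print(row["id"], choose_partition(row, ["id"], 12))
```

Changes to the same key always map to the same partition, which preserves their relative order for consumers of that partition. Rows with different keys may share a partition or not.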

FAQ

Can I modify the Kafka Data Compression Format after the task starts?

Yes. See Modify the objects to be synchronized.

Can I modify the Message acknowledgement mechanism after the task starts?

Yes. See Modify the objects to be synchronized.