When your downstream applications need a real-time feed of database changes — for event-driven processing, search indexing, or analytics — streaming raw change data from PolarDB-X 2.0 to Apache Kafka decouples producers from consumers without modifying your application. Data Transmission Service (DTS) captures INSERT, UPDATE, and DELETE operations from the source binlog and delivers them to your Kafka topic so downstream consumers always have an up-to-date view of your data.
Prerequisites
Before you begin, ensure that you have:
- A PolarDB-X 2.0 instance compatible with MySQL 5.7
- A Message Queue for Apache Kafka instance whose version is supported by DTS — see Overview of data synchronization scenarios
- Available storage space in the Kafka instance that exceeds the total data size of the source PolarDB-X instance
- A Kafka topic created to receive the synchronized data — see Step 1: Create a topic
Limitations
Source database requirements
- Tables must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Without this, the destination may contain duplicate records.
- If you select tables as objects to synchronize and need to rename tables or columns in the destination, a single task supports at most 5,000 tables. For more than 5,000 tables, split the work across multiple tasks, or synchronize the entire database instead.
- Only tables can be selected as objects to synchronize. Views, triggers, and stored procedures are not synchronized.
- DTS does not synchronize foreign keys. Cascade and delete operations on the source are not replicated to the destination.
Binary log requirements
Enable binary logging in the PolarDB-X 2.0 console and set binlog_row_image to full. If this parameter is not set correctly, the precheck fails and the task cannot start. See Parameter settings.
Set the binary log retention period based on your synchronization type:
| Synchronization type | Minimum retention period |
|---|---|
| Incremental sync only | 24 hours |
| Full + incremental sync | 7 days (can be reduced to more than 24 hours after the full sync completes) |
If binary logs are purged before DTS processes them, the following problems can occur:
- Task failure: DTS may fail to obtain the binary logs and the task may fail.
- Data loss or inconsistency: In exceptional circumstances, data inconsistency or loss may occur. DTS cannot guarantee the reliability and performance defined in the Service Level Agreement (SLA). Set the retention period to meet the requirements above before starting the task.
Other limitations
- Avoid using pt-online-schema-change for DDL operations on synchronized objects — it may cause the task to fail.
- Write to the destination Kafka instance only through DTS during synchronization. Writing through other tools may cause data inconsistency. Using DMS for online DDL operations when other tools are also writing to the destination may result in data loss.
- If a table is renamed and the new name is not in the objects to synchronize, DTS stops synchronizing that table. To resume, add the object to the synchronization task and reselect the objects to synchronize.
- Initial full data synchronization uses concurrent INSERT operations, which causes table fragmentation in the destination. After full synchronization, the destination tablespace size is larger than that of the source.
- Run the task during off-peak hours when possible. Full data synchronization increases read and write load on both the source and destination databases.
DTS periodically updates the dts_health_check.ha_health_check table in the source database to advance the binary log position.
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged — see Billing overview |
Single-record size limit
Kafka rejects records larger than 10 MB. If a source row exceeds this limit, DTS cannot write the record and the task is interrupted. To avoid this, exclude large-field columns when configuring the task. If a table with large fields is already included, remove it from the objects list, re-add it, and set a filter condition to exclude the oversized columns.
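A rough pre-flight check can flag columns likely to push a record past the 10 MB limit before you configure filter conditions. A minimal sketch, assuming sample rows are available as dictionaries; JSON-serialized size is used only as an approximation of the record size DTS would write.

```python
import json

KAFKA_MAX_RECORD_BYTES = 10 * 1024 * 1024  # the 10 MB limit noted above

def oversized_columns(row: dict, limit: int = KAFKA_MAX_RECORD_BYTES):
    """Return column names whose serialized size alone exceeds the limit.

    A rough approximation: serialize each value as JSON and measure its
    UTF-8 byte length. Columns returned here are candidates to exclude
    with a filter condition when configuring the task.
    """
    return [
        col for col, value in row.items()
        if len(json.dumps(value, default=str).encode("utf-8")) > limit
    ]
```

Sampling the widest rows of large-field tables with such a check helps you decide which columns to filter out up front instead of discovering the limit when the task is interrupted.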
SQL operations that can be synchronized
DML only: INSERT, UPDATE, and DELETE.
Route DDL information to a separate Kafka topic using the Topic That Stores DDL Information parameter.
Configure the synchronization task
Step 1: Go to the Data Synchronization Tasks page
- Log on to the Data Management (DMS) console.
- In the top navigation bar, click DTS.
- In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.
Navigation options vary by console mode. See Simple mode and Customize the layout and style of the DMS console. Alternatively, go directly to the Data Synchronization Tasks page.
Step 2: Select the region
On the right side of Data Synchronization Tasks, select the region where your synchronization instance resides.
In the new DTS console, select the region in the top navigation bar instead.
Step 3: Configure source and destination databases
Click Create Task. On the page that appears, configure the following parameters.
General
| Parameter | Description |
|---|---|
| Task Name | A name for the task. DTS assigns a default name. Use a descriptive name to make the task easy to identify — it does not need to be unique. |
Source Database
Before you configure the source database, make sure the database account has the required permissions: SELECT, REPLICATION CLIENT, and REPLICATION SLAVE on the objects to be synchronized. For details on granting permissions, see Data synchronization tools for PolarDB-X.
| Parameter | Value / description |
|---|---|
| Select an existing DMS database instance | (Optional) Select an existing instance to auto-populate the parameters below. |
| Database Type | Select PolarDB-X 2.0. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source PolarDB-X instance resides. |
| Instance ID | The ID of the source PolarDB-X instance. |
| Database Account | The account with SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions. |
| Database Password | The password for the database account. |
Destination Database
| Parameter | Value / description |
|---|---|
| Select an existing DMS database instance | (Optional) Select an existing instance to auto-populate the parameters below. |
| Database Type | Select Kafka. |
| Connection Type | Select Express Connect, VPN Gateway, or Smart Access Gateway. DTS does not list Message Queue for Apache Kafka as a direct access method — configure it as a self-managed Kafka cluster. |
| Instance Region | The region where the destination Kafka instance resides. |
| Connected VPC | The virtual private cloud (VPC) ID of the Kafka instance. To find the VPC ID, go to the Kafka console, open the instance details page, and check the Configuration Information section. |
| IP Address or Domain Name | An IP address from the Default Endpoint field of the Kafka instance. Find this on the instance details page under Basic Information. |
| Port Number | The Kafka service port. Default: 9092. |
| Database Account | The Kafka account. Leave blank if the instance connects through a VPC — authentication is not required for VPC-connected instances. |
| Database Password | The password for the Kafka account. Leave blank for VPC-connected instances. |
| Kafka Version | The version of the destination Kafka instance. |
| Encryption | Select Non-encrypted or SCRAM-SHA-256 based on your security requirements. |
| Topic | The topic that receives the synchronized data. |
| Topic That Stores DDL Information | (Optional) A separate topic for DDL information. If left blank, DDL information is written to the topic specified in Topic. |
| Use Kafka Schema Registry | Whether to use Kafka Schema Registry for Avro schema management. Select Yes and enter the Schema Registry URL if needed; otherwise select No. |
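Downstream consumers will later need the same endpoint as a bootstrap-server list. A small helper, assuming the Default Endpoint field is a comma-separated list of host[:port] entries; the function name and the port fallback to 9092 are illustrative.

```python
def parse_endpoint(endpoint: str, default_port: int = 9092):
    """Split a comma-separated endpoint string into (host, port) pairs.

    Assumes the Default Endpoint field is a comma-separated list of
    host[:port] entries; entries without an explicit port fall back
    to the default Kafka port 9092.
    """
    servers = []
    for entry in endpoint.split(","):
        host, _, port = entry.strip().partition(":")
        servers.append((host, int(port) if port else default_port))
    return servers
```

The resulting pairs can be joined back into the bootstrap-server string most Kafka client libraries expect.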
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its CIDR blocks to the whitelist of Alibaba Cloud database instances and the security group rules of Elastic Compute Service (ECS)-hosted databases. For on-premises or third-party databases, add the DTS CIDR blocks manually. See Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Before proceeding, take precautions such as using strong credentials, limiting exposed ports, auditing API calls, reviewing whitelist rules regularly, and preferring private network connections (Express Connect, VPN Gateway, or Smart Access Gateway) over public internet access.
Step 5: Configure synchronization objects and settings
| Parameter | Description |
|---|---|
| Synchronization Type | Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization to synchronize historical data as a baseline for incremental sync. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): the precheck fails if the source and destination have tables with identical names. Use object name mapping to resolve name conflicts without deleting destination tables. Ignore Errors and Proceed: skips the name conflict check. During full sync, existing destination records are kept; during incremental sync, they are overwritten. Use with caution — data inconsistency may result. |
| Data Format in Kafka | The format in which DTS writes records to the Kafka topic. Choose based on how your downstream consumers read data: DTS Avro (recommended for schema-enforced pipelines): records are serialized using the DTS Avro schema definition. See the schema on GitHub. Canal Json: records are stored in Canal JSON format. See Data formats of a Kafka cluster for the full field reference. |
| Policy for Shipping Data to Kafka Partitions | Controls which Kafka partition each record is routed to. See Specify the policy for migrating data to Kafka partitions. Important: this feature is not supported if the source database is a PolarDB-X 1.0 database. |
| Capitalization of Object Names in Destination Instance | Determines the case of database, table, and column names in the destination. DTS default policy is selected by default. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select columns, tables, or databases from Source Objects and click the arrow icon to move them to Selected Objects. |
| Selected Objects | To rename a single object, right-click it. See Map the name of a single object. To rename multiple objects at once, click Batch Edit. See Map multiple object names at a time. To select which SQL operations to synchronize for a specific object, right-click it and choose the operations. To filter rows, right-click the object and specify a WHERE clause. See Set filter conditions. |
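If you select Canal Json as the data format, downstream consumers can dispatch each record by operation type. A consumer-side sketch: the field names (type, database, table, data, old) follow the Canal JSON convention referenced above, and the sample record is invented for illustration.

```python
import json

# Invented sample record in the Canal JSON shape: an UPDATE that changed
# one order's status from "pending" to "shipped".
sample = json.dumps({
    "database": "orders_db",
    "table": "orders",
    "type": "UPDATE",
    "data": [{"id": "42", "status": "shipped"}],
    "old": [{"status": "pending"}],
})

def read_change(record_bytes):
    """Extract the operation type, target table, and row images
    from one Canal JSON record."""
    record = json.loads(record_bytes)
    op = record["type"]            # INSERT, UPDATE, or DELETE (DML only)
    rows = record.get("data") or []
    return op, record["database"], record["table"], rows

op, db, table, rows = read_change(sample)
```

See Data formats of a Kafka cluster for the authoritative field reference; real records carry additional metadata fields beyond those shown here.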
Step 6: Configure advanced settings
Click Next: Advanced Settings.
| Parameter | Description |
|---|---|
| Set Alerts | Configure alerts to be notified when the task fails or synchronization latency exceeds a threshold. Select Yes to configure the threshold and alert contacts. See Configure monitoring and alerting for a new DTS task. |
| Specify the retry time range for failed connections | How long DTS retries a failed connection before marking the task as failed. Range: 10–1,440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share the same source or destination database, the shortest retry time among them applies. DTS instance charges accrue during retry periods. |
| Configure ETL | Whether to apply extract, transform, and load (ETL) transformations. Select Yes to enter processing statements. See Configure ETL in a data migration or data synchronization task. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes heartbeat operations to the source database. Yes: heartbeat writes are suppressed; synchronization latency readings may be inaccurate. No: heartbeat writes occur, which may affect physical backup or cloning of the source database. |
Step 7: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters used to create this task, hover over the button and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. If an item fails:
- Click View Details next to the failed item, fix the issue, then click Precheck Again.
- If an item shows an alert that can be ignored, click Confirm Alert Details, then click Ignore in the dialog box. Ignoring alerts may lead to data inconsistency — proceed with caution.
Step 8: Purchase the instance
Wait until the precheck reaches 100%, then click Next: Purchase Instance.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Release the instance when no longer needed to stop charges. |
| Resource Group | The resource group for the instance. Defaults to default resource group. See What is Resource Management?. |
| Instance Class | The synchronization specification, which determines throughput. See Specifications of data synchronization instances. |
| Subscription Duration | Available when Subscription is selected. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start.
The task appears in the task list. Monitor its progress there.