When your downstream applications need a real-time feed of database changes — for event-driven processing, search indexing, or analytics — streaming raw change data from PolarDB-X 2.0 to Apache Kafka decouples producers from consumers without modifying your application. Data Transmission Service (DTS) captures INSERT, UPDATE, and DELETE operations from the source binlog and delivers them to your Kafka topic so downstream consumers always have an up-to-date view of your data.
Prerequisites
Before you begin, ensure that you have:
- A PolarDB-X 2.0 instance compatible with MySQL 5.7
- A Message Queue for Apache Kafka instance whose version is supported by DTS — see Overview of data synchronization scenarios
- Available storage space in the Kafka instance that exceeds the total data size of the source PolarDB-X instance
- A Kafka topic created to receive the synchronized data — see Step 1: Create a topic
Limitations
Source database requirements
- Tables must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Without this, the destination may contain duplicate records.
- If you select tables as objects to synchronize and need to rename tables or columns in the destination, a single task supports at most 5,000 tables. For more than 5,000 tables, split the work across multiple tasks, or synchronize the entire database instead.
- Only tables can be selected as objects to synchronize. Views, triggers, and stored procedures are not synchronized.
- DTS does not synchronize foreign keys. Cascade and delete operations on the source are not replicated to the destination.
Binary log requirements
Enable binary logging in the PolarDB-X 2.0 console and set binlog_row_image to full. If this parameter is not set correctly, the precheck fails and the task cannot start. See Parameter settings.
Set the binary log retention period based on your synchronization type:
| Synchronization type | Minimum retention period |
|---|---|
| Incremental sync only | 24 hours |
| Full + incremental sync | 7 days (can be reduced to more than 24 hours after the full sync completes) |
If binary logs are purged before DTS processes them, the following problems can occur:
- Task failure: DTS may fail to obtain the binary logs and the task may fail.
- Data loss or inconsistency: In exceptional circumstances, data inconsistency or loss may occur. DTS cannot guarantee the reliability and performance defined in the Service Level Agreement (SLA). Set the retention period to meet the requirements above before starting the task.
Other limitations
- Avoid using pt-online-schema-change for DDL operations on synchronized objects — it may cause the task to fail.
- Write to the destination Kafka instance only through DTS during synchronization. Writing through other tools may cause data inconsistency. Using DMS for online DDL operations when other tools are also writing to the destination may result in data loss.
- If a table is renamed and the new name is not in the objects to synchronize, DTS stops synchronizing that table. To resume, add the object to the synchronization task and reselect the objects to synchronize.
- Initial full data synchronization uses concurrent INSERT operations, which causes table fragmentation in the destination. After full synchronization, the destination tablespace size is larger than that of the source.
- Run the task during off-peak hours when possible. Full data synchronization increases read and write load on both the source and destination databases.
DTS periodically updates the dts_health_check.ha_health_check table in the source database to advance the binary log position.
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged — see Billing overview |
Single-record size limit
Kafka rejects records larger than 10 MB. If a source row exceeds this limit, DTS cannot write the record and the task is interrupted. To avoid this, exclude large-field columns when configuring the task. If a table with large fields is already included, remove it from the objects list, re-add it, and set a filter condition to exclude the oversized columns.
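A rough pre-flight check can flag columns likely to push a record past the 10 MB limit before you configure filter conditions. A minimal sketch, assuming sample rows are available as dictionaries; JSON-serialized size is used only as an approximation of the record size DTS would write.

```python
import json

KAFKA_MAX_RECORD_BYTES = 10 * 1024 * 1024  # the 10 MB limit noted above

def oversized_columns(row: dict, limit: int = KAFKA_MAX_RECORD_BYTES):
    """Return column names whose serialized size alone exceeds the limit.

    A rough approximation: serialize each value as JSON and measure its
    UTF-8 byte length. Columns returned here are candidates to exclude
    with a filter condition when configuring the task.
    """
    return [
        col for col, value in row.items()
        if len(json.dumps(value, default=str).encode("utf-8")) > limit
    ]
```

Sampling the widest rows of large-field tables with such a check helps you decide which columns to filter out up front instead of discovering the limit when the task is interrupted.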
SQL operations that can be synchronized
DML only: INSERT, UPDATE, and DELETE.
Route DDL information to a separate Kafka topic using the Topic That Stores DDL Information parameter.
Configure the synchronization task
Step 1: Go to the Data Synchronization Tasks page
- Log on to the Data Management (DMS) console.
- In the top navigation bar, click DTS.
- In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.
Navigation options vary by console mode. See Simple mode and Customize the layout and style of the DMS console. Alternatively, go directly to the Data Synchronization Tasks page.
Step 2: Select the region
On the right side of Data Synchronization Tasks, select the region where your synchronization instance resides.
In the new DTS console, select the region in the top navigation bar instead.
Step 3: Configure source and destination databases
Click Create Task. On the page that appears, configure the following parameters.
General
| Parameter | Description |
|---|---|
| Task Name | A name for the task. DTS assigns a default name. Use a descriptive name to make the task easy to identify — it does not need to be unique. |
Source Database
Before you configure the source database, make sure the database account has the required permissions: SELECT, REPLICATION CLIENT, and REPLICATION SLAVE on the objects to be synchronized. For details on granting permissions, see Data synchronization tools for PolarDB-X.
| Parameter | Value / description |
|---|---|
| Select an existing DMS database instance | (Optional) Select an existing instance to auto-populate the parameters below. |
| Database Type | Select PolarDB-X 2.0. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region where the source PolarDB-X instance resides. |
| Instance ID | The ID of the source PolarDB-X instance. |
| Database Account | The account with SELECT, REPLICATION CLIENT, and REPLICATION SLAVE permissions. |
| Database Password | The password for the database account. |
Destination Database
| Parameter | Value / description |
|---|---|
| Select an existing DMS database instance | (Optional) Select an existing instance to auto-populate the parameters below. |
| Database Type | Select Kafka. |
| Connection Type | Select Express Connect, VPN Gateway, or Smart Access Gateway. DTS does not list Message Queue for Apache Kafka as a direct access method — configure it as a self-managed Kafka cluster. |
| Instance Region | The region where the destination Kafka instance resides. |
| Connected VPC | The virtual private cloud (VPC) ID of the Kafka instance. To find the VPC ID, go to the Kafka console, open the instance details page, and check the Configuration Information section. |
| IP Address or Domain Name | An IP address from the Default Endpoint field of the Kafka instance. Find this on the instance details page under Basic Information. |
| Port Number | The Kafka service port. Default: 9092. |
| Database Account | The Kafka account. Leave blank if the instance connects through a VPC — authentication is not required for VPC-connected instances. |
| Database Password | The password for the Kafka account. Leave blank for VPC-connected instances. |
| Kafka Version | The version of the destination Kafka instance. |
| Encryption | Select Non-encrypted or SCRAM-SHA-256 based on your security requirements. |
| Topic | The topic that receives the synchronized data. |
| Topic That Stores DDL Information | (Optional) A separate topic for DDL information. If left blank, DDL information is written to the topic specified in Topic. |
| Use Kafka Schema Registry | Whether to use Kafka Schema Registry for Avro schema management. Select Yes and enter the Schema Registry URL if needed; otherwise select No. |
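Downstream consumers will later need the same endpoint as a bootstrap-server list. A small helper, assuming the Default Endpoint field is a comma-separated list of host[:port] entries; the function name and the port fallback to 9092 are illustrative.

```python
def parse_endpoint(endpoint: str, default_port: int = 9092):
    """Split a comma-separated endpoint string into (host, port) pairs.

    Assumes the Default Endpoint field is a comma-separated list of
    host[:port] entries; entries without an explicit port fall back
    to the default Kafka port 9092.
    """
    servers = []
    for entry in endpoint.split(","):
        host, _, port = entry.strip().partition(":")
        servers.append((host, int(port) if port else default_port))
    return servers
```

The resulting pairs can be joined back into the bootstrap-server string most Kafka client libraries expect.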
Step 4: Test connectivity
Click Test Connectivity and Proceed.
DTS automatically adds its CIDR blocks to the whitelist of Alibaba Cloud database instances and the security group rules of Elastic Compute Service (ECS)-hosted databases. For on-premises or third-party databases, add the DTS CIDR blocks manually. See Add the CIDR blocks of DTS servers to the security settings of on-premises databases.
Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Before proceeding, take precautions such as using strong credentials, limiting exposed ports, auditing API calls, reviewing whitelist rules regularly, and preferring private network connections (Express Connect, VPN Gateway, or Smart Access Gateway) over public internet access.
Step 5: Configure synchronization objects and settings
| Parameter | Description |
|---|---|
| Synchronization Type | Incremental Data Synchronization is selected by default. Also select Schema Synchronization and Full Data Synchronization to synchronize historical data as a baseline for incremental sync. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): the precheck fails if the source and destination have tables with identical names. Use object name mapping to resolve name conflicts without deleting destination tables. Ignore Errors and Proceed: skips the name conflict check. During full sync, existing destination records are kept; during incremental sync, they are overwritten. Use with caution — data inconsistency may result. |
| Data Format in Kafka | The format in which DTS writes records to the Kafka topic. Choose based on how your downstream consumers read data: DTS Avro (recommended for schema-enforced pipelines): records are serialized using the DTS Avro schema definition. See the schema on GitHub. Canal Json: records are stored in Canal JSON format. See Data formats of a Kafka cluster for the full field reference. |
| Policy for Shipping Data to Kafka Partitions | Controls which Kafka partition each record is routed to. See Specify the policy for migrating data to Kafka partitions. Important: this feature is not supported if the source database is a PolarDB-X 1.0 database. |
| Capitalization of Object Names in Destination Instance | Determines the case of database, table, and column names in the destination. DTS default policy is selected by default. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select columns, tables, or databases from Source Objects and click the arrow icon to move them to Selected Objects. |
| Selected Objects | To rename a single object, right-click it. See Map the name of a single object. To rename multiple objects at once, click Batch Edit. See Map multiple object names at a time. To select which SQL operations to synchronize for a specific object, right-click it and choose the operations. To filter rows, right-click the object and specify a WHERE clause. See Set filter conditions. |
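If you select Canal Json as the data format, downstream consumers can dispatch each record by operation type. A consumer-side sketch: the field names (type, database, table, data, old) follow the Canal JSON convention referenced above, and the sample record is invented for illustration.

```python
import json

# Invented sample record in the Canal JSON shape: an UPDATE that changed
# one order's status from "pending" to "shipped".
sample = json.dumps({
    "database": "orders_db",
    "table": "orders",
    "type": "UPDATE",
    "data": [{"id": "42", "status": "shipped"}],
    "old": [{"status": "pending"}],
})

def read_change(record_bytes):
    """Extract the operation type, target table, and row images
    from one Canal JSON record."""
    record = json.loads(record_bytes)
    op = record["type"]            # INSERT, UPDATE, or DELETE (DML only)
    rows = record.get("data") or []
    return op, record["database"], record["table"], rows

op, db, table, rows = read_change(sample)
```

See Data formats of a Kafka cluster for the authoritative field reference; real records carry additional metadata fields beyond those shown here.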
Step 6: Configure advanced settings
Click Next: Advanced Settings.
| Parameter | Description |
|---|---|
| Set Alerts | Configure alerts to be notified when the task fails or synchronization latency exceeds a threshold. Select Yes to configure the threshold and alert contacts. See Configure monitoring and alerting for a new DTS task. |
| Specify the retry time range for failed connections | How long DTS retries a failed connection before marking the task as failed. Range: 10–1,440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share the same source or destination database, the shortest retry time among them applies. DTS instance charges accrue during retry periods. |
| Configure ETL | Whether to apply extract, transform, and load (ETL) transformations. Select Yes to enter processing statements. See Configure ETL in a data migration or data synchronization task. |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Controls whether DTS writes heartbeat operations to the source database. Yes: heartbeat writes are suppressed; synchronization latency readings may be inaccurate. No: heartbeat writes occur, which may affect physical backup or cloning of the source database. |
Step 7: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters used to create this task, hover over the button and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. If an item fails:
- Click View Details next to the failed item, fix the issue, then click Precheck Again.
- If an item shows an alert that can be ignored, click Confirm Alert Details, then click Ignore in the dialog box. Ignoring alerts may lead to data inconsistency — proceed with caution.
Step 8: Purchase the instance
Wait until the precheck reaches 100%, then click Next: Purchase Instance.
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront. More cost-effective for long-term use. Pay-as-you-go: billed hourly. Release the instance when no longer needed to stop charges. |
| Resource Group | The resource group for the instance. Defaults to default resource group. See What is Resource Management?. |
| Instance Class | The synchronization specification, which determines throughput. See Specifications of data synchronization instances. |
| Subscription Duration | Available when Subscription is selected. Options: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |
Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start.
The task appears in the task list. Monitor its progress there.