Use Data Transmission Service (DTS) to stream change data from a PolarDB-X 1.0 instance into an ApsaraMQ for Kafka topic in real time. DTS runs the sync as distributed subtasks — one per attached ApsaraDB RDS for MySQL instance — so you can fan out CDC events to Kafka consumers without modifying your application.
Prerequisites
Before you begin, make sure you have:
- A PolarDB-X 1.0 instance. See Create a PolarDB-X 1.0 instance.
- A topic in the destination ApsaraMQ for Kafka instance to receive the synchronized data. See Getting started overview.
- Enough free storage space in the ApsaraMQ for Kafka instance to hold the full dataset from PolarDB-X 1.0.
- Read permissions on the objects to be synchronized, granted to the database account. See Manage accounts.
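The exact grant workflow depends on where the account is managed (PolarDB-X 1.0 accounts are typically created and authorized in the console), but on a MySQL-compatible endpoint the required privilege level looks roughly like the following sketch. The account name, host pattern, and database name are illustrative:

```sql
-- Illustrative names; PolarDB-X 1.0 accounts are usually managed in the console.
CREATE USER 'dts_sync'@'%' IDENTIFIED BY '<strong-password>';
-- Read access to the objects that will be synchronized:
GRANT SELECT ON mydb.* TO 'dts_sync'@'%';
```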
Limitations
Source database requirements
- Tables to be synchronized must have PRIMARY KEY or UNIQUE constraints, and all fields must be unique. Otherwise, the destination may contain duplicate data records. Tables that have only UNIQUE constraints do not support schema synchronization, so we recommend synchronizing tables that have PRIMARY KEY constraints. Tables with secondary indexes cannot be synchronized.
- If you select tables (rather than an entire database) as the objects to synchronize and need to rename tables or columns in the destination, a single task supports up to 5,000 tables. To synchronize more tables, split them across multiple tasks or synchronize the entire database instead.
- The ApsaraDB RDS for MySQL instances attached to the PolarDB-X 1.0 instance must meet the following binary log requirements:
  - Binary logging must be enabled and `binlog_row_image` must be set to `full`. If not, the precheck fails and the task cannot start. To verify these settings, run the following SQL on the source database:

    ```sql
    -- Check binary logging status
    SHOW VARIABLES LIKE 'log_bin';
    -- Check binlog_row_image value (must be FULL)
    SHOW VARIABLES LIKE 'binlog_row_image';
    ```

    If either setting is incorrect, update the MySQL configuration file (`my.cnf`) on each attached RDS instance:

    ```ini
    log_bin = ON
    binlog_format = ROW
    binlog_row_image = FULL
    ```

  - For incremental-only synchronization, binary logs must be retained for at least 24 hours.
  - For full plus incremental synchronization, binary logs must be retained for at least seven days. If DTS cannot read the binary logs, the task fails and data loss may occur. After full data synchronization is complete, you can reduce the retention period, but keep it above 24 hours. If these retention requirements are not met, the service reliability and performance stated in the Service Level Agreement (SLA) of DTS cannot be guaranteed.
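Before creating the task, you can sanity-check the source against these requirements with queries like the following, run on each attached RDS for MySQL instance. The schema exclusion list is illustrative; adjust it to your environment:

```sql
-- Find base tables that lack both PRIMARY KEY and UNIQUE constraints
-- (these can produce duplicate records in the destination):
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL;

-- Check how long binary logs are retained (MySQL 5.7 uses expire_logs_days;
-- MySQL 8.0 uses binlog_expire_logs_seconds):
SHOW VARIABLES LIKE 'expire_logs_days';
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```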
Operational restrictions
- Do not perform the following operations during synchronization; they cause task failure or data inconsistency:
  - Upgrading or downgrading the attached ApsaraDB RDS for MySQL instances
  - Changing the distribution of the physical databases and tables that correspond to the logical databases and tables in the ApsaraDB RDS for MySQL instances
  - Changing shard keys
  - Performing DDL operations on the objects being synchronized
  - Using gh-ost or pt-online-schema-change for DDL operations
- DTS disables foreign key constraint checks and cascade operations at the session level during synchronization. If you run cascade updates or deletes on the source, data inconsistency may result.
- If the network type of the PolarDB-X 1.0 instance changes during synchronization, update the network connection settings in the DTS task.
- For full-data-only synchronization: do not write to the source database during the task. To ensure data consistency, select schema synchronization, full data synchronization, and incremental data synchronization together.
- Write data to the destination only through DTS. Writing to the destination through other tools may cause data loss if DMS performs online DDL operations.
- PolarDB-X 1.0 synchronization runs as distributed synchronization: each attached ApsaraDB RDS for MySQL instance maps to one DTS subtask. Monitor subtask status in the task topology.
- If the destination ApsaraMQ for Kafka instance is upgraded or downgraded during synchronization, restart the instance to resume synchronization.
- Run synchronization during off-peak hours when possible. Initial full data synchronization increases the read and write load on both the source and the destination.
- After initial full data synchronization completes, the destination tablespace may be larger than the source because concurrent INSERT operations cause fragmentation in the destination tables.
Supported synchronization topologies
- One-way one-to-one
- One-way one-to-many
- One-way cascade
- One-way many-to-one
For details, see Synchronization topologies.
SQL operations that can be synchronized
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
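As an illustration, each of the following statements against a table selected as a sync object would be captured and shipped to the Kafka topic, while DDL statements are not synchronized as data. The `orders` table and its columns are hypothetical:

```sql
-- Hypothetical table; any table selected as a sync object behaves the same.
INSERT INTO orders (order_id, amount) VALUES (1001, 19.99);
UPDATE orders SET amount = 29.99 WHERE order_id = 1001;
DELETE FROM orders WHERE order_id = 1001;
```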
Create a synchronization task
- Go to the Data Synchronization page in the DTS console.
  Note: Alternatively, log on to the Data Management (DMS) console, move the pointer over Data + AI in the top navigation bar, and choose DTS (DTS) > Data Synchronization.
- In the upper-left corner, select the region where the synchronization instance will reside.
- Click Create Task and configure the source and destination databases.
Task settings

| Parameter | Description |
|---|---|
| Task Name | Enter a descriptive name. DTS generates a name automatically; unique names are not required. |

Source Database

| Parameter | Description |
|---|---|
| Select a DMS database instance | Optional. Select an existing instance and DTS populates the remaining fields automatically. |
| Database Type | Select PolarDB-X 1.0. |
| Connection Type | Select Alibaba Cloud Instance. |
| Instance Region | The region of the source PolarDB-X 1.0 instance. |
| Cross-account | Select No for same-account synchronization. |
| Instance ID | The ID of the source PolarDB-X 1.0 instance. |
| Database Account | The database account with read permissions on the objects to synchronize. |
| Database Password | The password for the database account. |

Destination Database

| Parameter | Description |
|---|---|
| Select a DMS database instance | Optional. Select an existing instance and DTS populates the remaining fields automatically. |
| Database Type | Select Kafka. |
| Connection Type | Select Express Connect, VPN Gateway, or Smart Access Gateway. Alibaba Cloud Instance is not supported. |
| Instance Region | The region of the destination ApsaraMQ for Kafka instance. |
| Connected VPC | The virtual private cloud (VPC) ID of the ApsaraMQ for Kafka instance. To find the VPC ID, go to the instance details page in the ApsaraMQ for Kafka console and check Configuration Information on the Instance Information tab. |
| Domain Name or IP | An IP address of the ApsaraMQ for Kafka instance. To find the IP, go to the instance details page in the ApsaraMQ for Kafka console and copy an IP from the Default Endpoint field under Endpoint Information on the Instance Information tab. |
| Port Number | The service port of the ApsaraMQ for Kafka instance. Default: 9092. |
| Database Account | The account for the ApsaraMQ for Kafka instance. Required only if access control list (ACL) authentication is enabled. See Grant permissions to SASL users. |
| Database Password | The password for the Kafka account. Required only if ACL authentication is enabled. |
| Kafka Version | The version of the destination ApsaraMQ for Kafka instance. |
| Encryption | Select Non-encrypted or SCRAM-SHA-256 based on your business and security requirements. |
| Topic | The topic that receives the synchronized data. Select from the drop-down list. |
| Topic That Stores DDL Information | The topic that stores DDL information. If left blank, DDL information is stored in the topic specified by Topic. |
| Use Kafka Schema Registry | Whether to use Kafka Schema Registry for Avro schema storage. Select Yes and provide the Schema Registry URL to enable it, or select No to disable it. |
- Click Test Connectivity and Proceed. DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances or to the security group rules of Elastic Compute Service (ECS) instances. For self-managed databases in data centers or hosted by third-party providers, manually add the DTS server CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.
  Warning: Adding DTS server CIDR blocks to whitelists or security groups introduces security exposure. Take preventive measures: strengthen credentials, restrict exposed ports, authenticate API calls, audit whitelist rules regularly, and consider connecting through Express Connect, VPN Gateway, or Smart Access Gateway.
- Select the objects to synchronize and configure synchronization settings.

| Parameter | Description |
|---|---|
| Synchronization Types | By default, Incremental Data Synchronization is selected. You must also select Schema Synchronization and Full Data Synchronization. Running all three ensures that historical data is loaded first, forming a consistent baseline for incremental synchronization. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors (default): fails the precheck if the destination contains tables with the same names. Ignore Errors and Proceed: skips the check, but may cause data inconsistency. Use with caution. |
| Data Format in Kafka | DTS Avro (default and only option for PolarDB-X 1.0): data is parsed using the DTS Avro schema definition. Canal JSON is not supported for PolarDB-X 1.0. For the DTS Avro schema, see GitHub. |
| Policy for Shipping Data to Kafka Partitions | Not supported. |
| Capitalization of Object Names in Destination Instance | Controls the case of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names. |
| Source Objects | Select objects from Source Objects and click the arrow icon to move them to Selected Objects. Select individual tables rather than entire databases. If you select an entire database, DTS does not synchronize CREATE TABLE or DROP TABLE operations. |
| Selected Objects | To rename a single object, right-click it and select a mapping option. To rename multiple objects at once, click Batch Edit. To filter rows by condition, right-click an object and specify a WHERE clause. See Map object names and Specify filter conditions. |
- Click Next: Advanced Settings and configure the following options.

| Parameter | Description |
|---|---|
| Migrate a DTS instance from a dedicated cluster to a shared cluster | Leave blank unless you need a DTS dedicated cluster. See What is a DTS dedicated cluster? |
| Monitoring and Alerting | Select Yes to receive notifications when the task fails or synchronization latency exceeds a threshold. Configure the alert threshold and contacts. See Configure monitoring and alerting. |
| Retry Time for Failed Connections | The window within which DTS retries failed connections. Range: 10 to 1,440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If multiple tasks share the same source or destination database, the shortest retry window takes precedence. You are charged for the DTS instance while it retries connections, so set the retry window based on your business requirements and release the instance promptly after the source and destination instances are released. |
| The wait time before a retry when other issues occur in the source and destination databases | The window within which DTS retries failed DML or DDL operations. Range: 1 to 1,440 minutes. Default: 10 minutes. Set this to at least 10 minutes and keep it lower than the Retry Time for Failed Connections value. |
| Configure ETL | Select Yes to apply extract, transform, and load (ETL) transformations and enter processing statements in the code editor. See Configure ETL. |
- Save the task settings and run a precheck.
  - To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
  - Click Next: Save Task Settings and Precheck.
    Note: The task starts only after it passes the precheck. If any item fails, click View Details to review the cause, fix the issue, and rerun the precheck. For alert items that can be safely ignored, click Confirm Alert Details, then Ignore, then Precheck Again.
- Wait until the Success Rate reaches 100%, then click Next: Purchase Instance.
- On the Buy page, configure the billing and instance settings.

| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term; more cost-effective for long-term use. Pay-as-you-go: billed hourly; suitable for short-term use. Release the instance when it is no longer needed to stop billing. |
| Resource Group Settings | The resource group for the synchronization instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The synchronization throughput tier. See Instance classes of data synchronization instances. |
| Subscription Duration | Available only for the Subscription billing method. Options: 1 to 9 months, 1 year, 2 years, 3 years, or 5 years. |
- Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms.
- Click Buy and Start, then click OK in the dialog box.
The task appears in the task list. You can monitor the synchronization progress from there.
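Once the task is running, you can spot-check that change records are arriving by consuming a few messages from the topic. The broker address and topic name below are placeholders; because DTS ships records in the DTS Avro format, the console consumer prints raw bytes rather than readable rows, which is still enough to confirm delivery:

```
# Placeholder endpoint and topic; substitute your instance's default endpoint
# and the topic you selected in the task configuration.
kafka-console-consumer.sh \
  --bootstrap-server 192.168.0.10:9092 \
  --topic polardbx-cdc-topic \
  --from-beginning \
  --max-messages 10
```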