
Data Transmission Service: Synchronize data from a PolarDB for MySQL cluster to a DataHub project

Last Updated: Mar 28, 2026

Data Transmission Service (DTS) streams incremental data changes from a PolarDB for MySQL cluster to a DataHub project in real time. Once data lands in DataHub, you can feed it into downstream analytics services such as Realtime Compute for Apache Flink.

Prerequisites

Before you begin, make sure you have:

  • A PolarDB for MySQL cluster to serve as the source.

  • A DataHub project in the destination region to receive the synchronized data.

Limitations

DTS does not synchronize foreign keys from the source to the destination. As a result, cascade update and delete operations on the source are not replicated to the destination.

Source database

  • Primary key or UNIQUE constraint required: Tables without PRIMARY KEY or UNIQUE constraints may produce duplicate records in the destination. If your tables lack these constraints, enable the Exactly-Once write feature. See Synchronize tables without primary keys or UNIQUE constraints.

  • 1,000-table limit per task (with renaming): If you select tables as sync objects and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. Exceeding this limit causes a request error. Split the workload across multiple tasks, or sync at the database level instead.

  • Binary logging and loose_polar_log_bin required for incremental sync: Binary logging must be enabled and loose_polar_log_bin must be set to on. Otherwise, the precheck fails and the task cannot start. See Enable binary logging and Modify parameters. Enabling binary logging incurs storage charges. A quick way to spot-check these settings is shown after this list.

  • Binary log retention period: Retain binary logs for at least 24 hours for incremental-only sync, or at least 7 days for full + incremental sync. Insufficient retention may cause task failure and, in exceptional cases, data loss. After full data synchronization completes, you can shorten the retention period, provided it remains longer than 24 hours.

  • No DDL during schema or full data sync: Do not run DDL statements while schema synchronization or full data synchronization is in progress. Doing so causes task failure.
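
A minimal way to run that spot check from a SQL client, assuming standard MySQL-compatible system variables (the authoritative setting is the loose_polar_log_bin cluster parameter in the console, and exact variable names can vary by cluster version):

    -- Binary logging should be enabled (expect ON).
    SHOW VARIABLES LIKE 'log_bin';

    -- PolarDB passes loose_-prefixed parameters through; the running value
    -- may be exposed without the loose_ prefix, so match loosely.
    SHOW VARIABLES LIKE '%polar_log_bin%';

    -- Retention in seconds on MySQL 8.0-compatible clusters: at least 86400
    -- (24 hours) for incremental-only, 604800 (7 days) for full + incremental.
    SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';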

Other limitations

  • 2 MB string limit: A single string in the destination DataHub project cannot exceed 2 MB.

  • Sync object types: Only tables and databases can be selected as sync objects.

  • Read-only nodes excluded: DTS does not sync read-only nodes of the source PolarDB for MySQL cluster.

  • OSS external tables excluded: DTS does not sync Object Storage Service (OSS) external tables from the source cluster.

  • Avoid pt-online-schema-change: Using tools such as pt-online-schema-change for DDL during sync causes task failure.

  • Online DDL during sync (single source only): If no other sources write to the destination during sync, you can use Data Management (DMS) to perform lock-free DDL on source tables. See Perform lock-free DDL operations.

  • Data loss risk with concurrent writes: If other sources write to the destination while you run online DDL through DMS, data in the destination may be lost.

  • Task restoration SLA: If a task fails, DTS support attempts restoration within 8 hours. The task may be restarted and task parameters may be modified during restoration. Database parameters are not modified.

Special case: DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` on the source database to advance the binary log file position.

Billing

  • Schema synchronization and full data synchronization: Free.

  • Incremental data synchronization: Charged. See Billing overview.

Supported synchronization topologies

  • One-way one-to-one synchronization

  • One-way one-to-many synchronization

  • One-way many-to-one synchronization

  • One-way cascade synchronization

For the full topology reference, see Synchronization topologies.

SQL operations that can be synchronized

INSERT, UPDATE, and DELETE.
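
For reference, each of the three supported operation types is captured by incremental synchronization. The table name below is hypothetical:

    -- All of the following DML statements are synchronized to the topic.
    INSERT INTO orders (id, amount) VALUES (1, 9.99);
    UPDATE orders SET amount = 19.99 WHERE id = 1;
    DELETE FROM orders WHERE id = 1;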

Permissions required

The database account of the source PolarDB for MySQL cluster needs at least read permissions on the objects to be synchronized.
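
For example, a minimal account setup might look like the following. The account name, password placeholder, and database name are illustrative, and standard MySQL account syntax is assumed:

    -- Create a dedicated synchronization account (all names are placeholders).
    CREATE USER 'dts_sync'@'%' IDENTIFIED BY 'your-strong-password';

    -- Grant read access to the objects to be synchronized.
    GRANT SELECT ON mydb.* TO 'dts_sync'@'%';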

Create a data synchronization task

The following steps are based on the new DTS console. If there are discrepancies between the DTS console and the DTS module in the Data Management (DMS) console, the DMS console takes precedence.
  1. Go to the Data Synchronization Tasks page.

    1. Log on to the Data Management (DMS) console.

    2. In the top navigation bar, click Data + AI.

    3. In the left-side navigation pane, choose DTS (DTS) > Data Synchronization.

    Steps may vary based on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console. Alternatively, go directly to the Data Synchronization Tasks page.
  2. Select the region where the data synchronization instance resides.

    In the new DTS console, select the region from the top navigation bar.
  3. Click Create Task. In the wizard, configure the source and destination databases.

    Source Database

    • Select DMS Database Instance: Select an existing database instance to auto-populate the fields, or leave blank to configure manually.
    • Database Type: Select PolarDB for MySQL.
    • Connection Type: Select Alibaba Cloud Instance.
    • Instance Region: The region where the source PolarDB for MySQL cluster resides.
    • Cross-account: Select No for same-account sync.
    • PolarDB Cluster ID: The ID of the source PolarDB for MySQL cluster.
    • Database Account: The database account for the source cluster. See Permissions required.
    • Database Password: The password for the database account.

    Destination Database

    • Select DMS Database Instance: Select an existing database instance to auto-populate the fields, or leave blank to configure manually.
    • Database Type: Select DataHub.
    • Connection Type: Select Alibaba Cloud Instance.
    • Instance Region: The region where the destination DataHub project resides.
    • Project: The DataHub project to receive the synchronized data.
  4. Click Test Connectivity and Proceed. DTS automatically adds its server CIDR blocks to the whitelist of Alibaba Cloud database instances (such as ApsaraDB RDS for MySQL or ApsaraDB for MongoDB) and to the security group rules of Elastic Compute Service (ECS)-hosted databases. For databases deployed across multiple ECS instances, manually add the DTS CIDR blocks to each instance's security group rules. For self-managed databases in data centers or on third-party clouds, manually add the CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.

    Warning

    Adding DTS CIDR blocks to whitelists or security groups introduces security exposure. Before proceeding, take protective measures, such as using strong credentials, restricting exposed ports, authenticating API calls, regularly reviewing the whitelist or ECS security group rules to remove unauthorized CIDR blocks, and connecting through Express Connect, VPN Gateway, or Smart Access Gateway where possible.

  5. Configure the objects to be synchronized and the synchronization settings.

    • Synchronization Type: Incremental Data Synchronization is selected by default. You can also select Schema Synchronization. Full Data Synchronization is not available for this destination type. During schema synchronization, DTS copies the schemas of the selected tables from the source to the destination DataHub project.
    • Processing Mode of Conflicting Tables: Precheck and Report Errors (default) fails the precheck if identically named tables already exist in the destination. To resolve name conflicts without deleting or renaming destination tables, use the object name mapping feature (see Map object names). Ignore Errors and Proceed skips the name conflict check.

    Warning

    Ignore Errors and Proceed risks data inconsistency. During full data synchronization, existing destination records with matching primary or unique key values are retained (not overwritten). During incremental synchronization, they are overwritten. Schema mismatches may cause partial synchronization or task failure.

    • Naming Rules of Additional Columns: Select Yes or No to control whether DTS uses the new naming rules for additional columns added to the destination topic. Check for naming conflicts before setting this option; conflicts cause task failure or data loss. See Naming rules for additional columns.
    • Case Policy for Destination Object Names: Controls the case of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names.
    • Source Objects: Select the tables or databases to synchronize and click the arrow icon to move them to the Selected Objects section.
    • Selected Objects: Right-click an object to rename it (single object). Click Batch Edit to rename multiple objects at once. See Map object names.
  6. Click Next: Advanced Settings.

    • Monitoring and Alerting: No disables alerting. Yes enables alerting; configure the alert threshold and notification contacts. See Configure monitoring and alerting.
    • Retry Time for Failed Connections: How long DTS retries failed connections after a task starts. Range: 10 to 1,440 minutes. Default: 720 minutes. Set this to at least 30 minutes. If DTS reconnects within the retry window, the task resumes; otherwise, it fails. When multiple tasks share the same source or destination, the shortest retry window applies to all. DTS charges for the instance during retry, so release the instance promptly if the source or destination is decommissioned.
    • Configure ETL: No (default) or Yes. If you select Yes, enter extract, transform, and load (ETL) domain-specific language (DSL) statements in the code editor. See Configure ETL.
  7. (Optional) In the Selected Objects section, right-click a topic name to rename a table or database, or set a shard key for partitioning.

  8. Click Next: Save Task Settings and Precheck. To preview the OpenAPI parameters for this task, hover over the button and click Preview OpenAPI parameters before proceeding.

    DTS runs a precheck before starting the task. The task starts only after passing the precheck. If the precheck fails, click View Details next to each failed item, fix the underlying issue, then click Precheck Again. If an alert item can be ignored, click Confirm Alert Details > Ignore > OK > Precheck Again. Ignoring alerts may cause data inconsistency.
  9. Wait for the Success Rate to reach 100%, then click Next: Purchase Instance.

  10. On the purchase page, configure the instance.

    • Billing Method: Subscription (pay upfront for a fixed term; more cost-effective for long-term use) or Pay-as-you-go (billed hourly; suitable for short-term use). Release a pay-as-you-go instance when it is no longer needed to stop charges.
    • Resource Group Settings: The resource group for the instance. Default: default resource group. See What is Resource Management?
    • Instance Class: The synchronization throughput class. See Instance classes of data synchronization instances.
    • Subscription Duration: Available for subscription billing only. Options: 1 to 9 months, 1 year, 2 years, 3 years, or 5 years.
  11. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.

  12. Click Buy and Start, then click OK in the confirmation dialog.

The task appears in the task list. You can track its progress there.

DataHub topic schema

When DTS writes incremental data to a DataHub topic, it adds system columns to store change metadata alongside the original data fields.

The following figure shows an example topic schema. In this example, id, name, and address are original data fields. With the previous naming rules, DTS adds a dts_ prefix to all fields including the originals. With the new naming rules, original data fields keep their names without prefixes.

(Figure: example topic schema)
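
As a concrete sketch of the same idea (table and field names are illustrative), a source table with fields id, name, and address yields the following topic fields under the new naming rules:

    -- Source table (illustrative).
    CREATE TABLE customer (
      id      BIGINT PRIMARY KEY,
      name    VARCHAR(64),
      address VARCHAR(255)
    );

    -- Resulting DataHub topic fields under the new naming rules:
    --   id, name, address                       (original fields, unprefixed)
    --   new_dts_sync_dts_record_id, new_dts_sync_dts_operation_flag,
    --   new_dts_sync_dts_instance_id, new_dts_sync_dts_db_name,
    --   new_dts_sync_dts_table_name, new_dts_sync_dts_utc_timestamp,
    --   new_dts_sync_dts_before_flag, new_dts_sync_dts_after_flag
    --                                           (additional columns, all String)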

The list below describes each additional column, in the form previous name → new name (type).

  • dts_record_id → new_dts_sync_dts_record_id (String): Unique ID of the incremental log entry. Auto-increments under normal conditions; may not increment after a rollback in disaster recovery scenarios, so some IDs can be duplicated. For UPDATE operations, both log entries (pre-update and post-update) share the same dts_record_id.

  • dts_operation_flag → new_dts_sync_dts_operation_flag (String): Operation type. Values: I = INSERT, D = DELETE, U = UPDATE, F = full data synchronization.

  • dts_instance_id → new_dts_sync_dts_instance_id (String): The server ID of the database.

  • dts_db_name → new_dts_sync_dts_db_name (String): The database name.

  • dts_table_name → new_dts_sync_dts_table_name (String): The table name.

  • dts_utc_timestamp → new_dts_sync_dts_utc_timestamp (String): UTC timestamp of the operation (also the log file timestamp).

  • dts_before_flag → new_dts_sync_dts_before_flag (String): Whether the row values are pre-update values. Y = yes, N = no. INSERT: N. UPDATE pre-update entry: Y. UPDATE post-update entry: N. DELETE: Y.

  • dts_after_flag → new_dts_sync_dts_after_flag (String): Whether the row values are post-update values. Y = yes, N = no. INSERT: Y. UPDATE pre-update entry: N. UPDATE post-update entry: Y. DELETE: N.

Flag values by operation type

The dts_before_flag and dts_after_flag columns encode which version of a row a given log entry represents.

INSERT

An INSERT entry records the newly inserted values (post-update values).

Flags: dts_before_flag = N, dts_after_flag = Y.

UPDATE

DTS generates two log entries per UPDATE: one for the pre-update state and one for the post-update state. Both entries share the same dts_record_id, dts_operation_flag, and dts_utc_timestamp values.

Flags: the pre-update entry (entry 1) has dts_before_flag = Y and dts_after_flag = N; the post-update entry (entry 2) has dts_before_flag = N and dts_after_flag = Y.
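
A minimal worked example (table and values are hypothetical) of the two entries produced by a single UPDATE:

    -- Source operation.
    UPDATE customer SET address = 'New Road 1' WHERE id = 42;

    -- Entry 1 (pre-update image):  dts_operation_flag = U,
    --   dts_before_flag = Y, dts_after_flag = N, address = 'Old Road 9'
    -- Entry 2 (post-update image): dts_operation_flag = U,
    --   dts_before_flag = N, dts_after_flag = Y, address = 'New Road 1'
    -- Both entries carry the same dts_record_id and dts_utc_timestamp.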

DELETE

A DELETE entry records the deleted values (pre-update values).

Flags: dts_before_flag = Y, dts_after_flag = N.

What's next

After the synchronization task is running, use Realtime Compute for Apache Flink to analyze the data flowing into the DataHub project. See What is Alibaba Cloud Realtime Compute for Apache Flink?
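
As an illustrative starting point, a Flink SQL source table over the DataHub topic might be declared as follows. The connector option names are assumptions based on the Alibaba Cloud DataHub connector; verify them against the connector documentation before use:

    -- Hypothetical Flink SQL DDL for reading the synchronized topic.
    CREATE TEMPORARY TABLE datahub_source (
      id      BIGINT,
      name    STRING,
      address STRING,
      new_dts_sync_dts_operation_flag STRING
    ) WITH (
      'connector' = 'datahub',
      'endPoint'  = '<datahub-endpoint>',   -- placeholder
      'project'   = '<project-name>',       -- placeholder
      'topic'     = '<topic-name>',         -- placeholder
      'accessId'  = '<access-key-id>',      -- placeholder
      'accessKey' = '<access-key-secret>'   -- placeholder
    );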