
Data Transmission Service: Synchronize data from a PolarDB for MySQL cluster to a self-managed Doris database

Last Updated: Mar 28, 2026

Data Transmission Service (DTS) lets you continuously sync data from a PolarDB for MySQL cluster to a self-managed Doris database for large-scale analytics. This topic walks through the setup using a Doris database deployed on an Elastic Compute Service (ECS) instance.

Prerequisites

Before you begin, make sure you have:

  • A destination Doris database with available storage space larger than the storage used by the source PolarDB for MySQL cluster

  • Binary logging enabled on the source PolarDB for MySQL cluster, with the loose_polar_log_bin parameter set to ON. See Enable binary logging and Modify parameters

  • Binary log retention period of at least three days (seven days recommended)

  • Database accounts with the required permissions (see Permissions required)

Enabling binary logging on a PolarDB for MySQL cluster incurs storage charges for binary log files.

For supported source and destination database versions, see Overview of data synchronization solutions.
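As a quick sanity check before you configure the task, you can confirm the binary log settings from a MySQL client connected to the source cluster. This is a sketch: the variable names visible to clients can vary by PolarDB for MySQL version.

```sql
-- Confirm binary logging is enabled on the source cluster
-- (loose_polar_log_bin is set in the console; log_bin reflects the result).
SHOW VARIABLES LIKE '%log_bin%';

-- List retained binary log files to sanity-check the retention window.
SHOW BINARY LOGS;
```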

Permissions required

| Database | Required permissions | Reference |
| --- | --- | --- |
| Source PolarDB for MySQL cluster | Read and write permissions on the objects to be synchronized | Create and manage a database account |
| Destination Doris database | USAGE_PRIV and the following: SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV | Authentication and authorization |
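The grants can be sketched as follows. The account name `dts_user` and database `mydb` are placeholders, and exact privilege syntax varies by Doris version; in recent Doris releases, USAGE_PRIV is granted at the resource or workload-group level rather than per database, so it is omitted here.

```sql
-- Source (PolarDB for MySQL): read and write on the objects to sync.
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'dts_user'@'%';

-- Destination (Doris): the table-level privileges DTS requires.
GRANT SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV
    ON mydb.* TO 'dts_user'@'%';
```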

Billing

| Synchronization type | Cost |
| --- | --- |
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview |

Supported SQL operations for incremental synchronization

| Operation type | Statements |
| --- | --- |
| DML (Data Manipulation Language) | INSERT, UPDATE, DELETE |
| DDL (Data Definition Language) | ADD COLUMN, MODIFY COLUMN, CHANGE COLUMN, DROP COLUMN, DROP TABLE, TRUNCATE TABLE, RENAME TABLE |
Important

The RENAME TABLE operation can cause data inconsistency. If you rename a table that is selected as a synchronization object, its data stops syncing to the destination. To avoid this, select the entire database as the synchronization object instead of individual tables, and make sure that both the original and renamed table belong to databases included in the synchronization scope.
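For example, with the entire database selected as the synchronization object, a rename stays inside the synchronization scope (the database and table names below are illustrative):

```sql
-- db1 is selected as the synchronization object, so both the original
-- and the renamed table remain inside the synchronization scope and
-- continue to sync.
RENAME TABLE db1.orders TO db1.orders_archive;
```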

Limitations

Source database

  • Tables with PRIMARY KEY or UNIQUE constraints: all fields must be unique, or the destination may contain duplicate data.

  • Tables without PRIMARY KEY or UNIQUE constraints: select Schema Synchronization for Synchronization Types and duplicate for Engine in the Configurations for Databases, Tables, and Columns step.

  • If you select tables (not databases) as synchronization objects and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. Exceeding this limit causes a request error. Configure multiple tasks or sync the entire database instead.

  • Binary log requirements:

    • loose_polar_log_bin must be set to ON

    • Retention period: at least three days, seven days recommended

  • Do not run DDL operations that change database or table schemas during initial schema synchronization or initial full data synchronization. DTS queries the source database during full sync, which creates metadata locks that may block DDL operations.

  • Changes from operations not recorded in binary logs—such as data restored from physical backups or generated by cascade operations—are not synced to the destination. If this occurs, you can remove and re-add the affected database or table from the synchronization objects if your business permits. See Modify synchronization objects.

General limitations

  • DTS does not sync read-only nodes of the source PolarDB for MySQL cluster.

  • DTS does not sync Object Storage Service (OSS) external tables from the source.

  • Primary/standby switchover is not supported during initial full data synchronization. If a switchover occurs, reconfigure the synchronization task.

  • Only tables using the Unique Key Model or Duplicate Key Model in Doris are supported. When using the Duplicate Key Model, DTS converts UPDATE and DELETE statements to INSERT statements. Duplicate data may appear if any of the following occur:

    • A retry operation runs on the data synchronization instance

    • The instance is restarted

    • Two or more DML operations are applied to the same row after the instance starts

    Use the additional columns _is_deleted, _version, and _record_id to deduplicate if needed (see Additional columns added by DTS).

  • Only the bucket_count parameter can be specified in the Selected Objects section. The value must be a positive integer. Default: auto.

  • Do not create clusters in the destination Doris database during synchronization. If this happens, the task fails. Restart the data synchronization instance to resume.

  • Doris only supports database and table names that start with a letter. Use the object name mapping feature to rename any object whose name starts with a non-letter character.

  • If a database, table, or column name contains Chinese characters, use object name mapping to rename it (for example, to an English name). Otherwise, the task may fail.

  • A single DDL statement that modifies multiple columns at once is not supported, nor are consecutive DDL changes to the same table.

  • Do not add backend (BE) nodes to the destination Doris database during synchronization. If this happens, the task fails. Restart the data synchronization instance to resume.

  • In multi-table merge scenarios (multiple source tables syncing to one destination table), the schemas of all source tables must be identical. Otherwise, data inconsistency or task failure may occur.

  • In PolarDB for MySQL, VARCHAR(M) uses character length for M. In Doris, VARCHAR(N) uses byte length for N. If you are not using schema synchronization, set the Doris VARCHAR length to four times the corresponding MySQL VARCHAR length to avoid data loss.

  • When using DMS or gh-ost to perform online DDL changes on the source, DTS syncs only the original DDL to the destination. Online DDL changes made with pt-online-schema-change are not supported and may cause data loss or synchronization failure.

  • During initial full data synchronization, DTS uses read and write resources of both databases. Evaluate the performance impact beforehand and schedule synchronization during off-peak hours, when CPU usage on both ends is below 30%.

  • Concurrent INSERT operations during initial full data synchronization cause table fragmentation in the destination. After full sync completes, destination tables may occupy more storage than the source.

  • Do not use pt-online-schema-change or similar tools for online DDL operations on synchronization objects during synchronization. This causes sync failure.

  • If data is written to the destination database from sources other than DTS during synchronization, data inconsistency may occur between source and destination.

  • DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` on the source to advance the binary log position.

  • During incremental synchronization, DTS uses a batch strategy and writes to each synchronization object at most once every five seconds. This may introduce a normal synchronization latency of less than 10 seconds. To reduce latency, adjust the selectdb.reservoir.timeout.milliseconds parameter (range: 1,000–10,000 milliseconds) in the DTS console. Shorter batching intervals increase write frequency, which may raise the destination's load and response time.

  • If an instance fails, DTS support attempts recovery within eight hours. During recovery, the instance may be restarted or its parameters adjusted. Only DTS instance parameters are modified—database parameters remain unchanged.
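The two supported Doris table models can be sketched as follows. Table and column names are illustrative, and PROPERTIES such as replication_num are omitted because they depend on your cluster.

```sql
-- Unique Key Model: UPDATE and DELETE are applied as-is; rows are keyed
-- by the id column.
CREATE TABLE orders_unique (
    id BIGINT,
    amount DECIMAL(10, 2)
)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 10;

-- Duplicate Key Model: DTS rewrites UPDATE and DELETE as INSERT, so the
-- table accumulates row versions that must be deduplicated at query time
-- using the additional columns DTS appends.
CREATE TABLE orders_duplicate (
    id BIGINT,
    amount DECIMAL(10, 2)
)
DUPLICATE KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 10;
```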

Create a data synchronization task

Step 1: Go to the data synchronization page

Use one of the following methods:

DTS console

  1. Log on to the DTS console.

  2. In the left-side navigation pane, click Data Synchronization.

  3. In the upper-left corner, select the region where the data synchronization task resides.

DMS console

Steps may vary based on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.

  1. Log on to the DMS console.

  2. In the top navigation bar, move the pointer over Data + AI and choose Data Transmission Service (DTS) > Data Synchronization.

  3. From the drop-down list next to Data Synchronization Tasks, select the region where the instance resides.

Step 2: Configure source and destination databases

  1. Click Create Task.

  2. Configure the Task Name. DTS generates a name automatically. Specify a descriptive name to make the task easy to identify—a unique name is not required.

  3. Configure the source and destination database connections using the parameters below.

Source database (PolarDB for MySQL)

| Parameter | Value |
| --- | --- |
| Select Existing Connection | Select a registered instance from the list to auto-populate the following fields, or configure them manually |
| Database Type | PolarDB for MySQL |
| Access Method | Alibaba Cloud Instance |
| Instance Region | The region where the source PolarDB for MySQL cluster resides |
| Replicate Data Across Alibaba Cloud Accounts | No (this example uses the current account) |
| PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster |
| Database Account | The account for the source cluster (see Permissions required) |
| Database Password | The password for the database account |
| Encryption | Select as needed. For SSL configuration, see Configure SSL encryption |

Destination database (Doris)

| Parameter | Value |
| --- | --- |
| Select Existing Connection | Select a registered instance from the list to auto-populate the following fields, or configure them manually |
| Database Type | Doris |
| Access Method | Self-managed Database on ECS (this example). For other access methods, see Preparations |
| Instance Region | The region where the destination Doris database resides |
| ECS Instance ID | The ID of the ECS instance running the Doris database. If BE or frontend (FE) nodes are on separate ECS instances, add the CIDR blocks of DTS servers to each instance's security rules |
| Port Number | The service port of the destination Doris database. Default: 9030 |
| Database Account | The account for the destination database (see Permissions required) |
| Database Password | The password for the database account |
  4. Click Test Connectivity and Proceed. In the CIDR Blocks of DTS Servers dialog box, click Test Connectivity.

Make sure that the CIDR blocks of DTS servers are added to the security settings of both source and destination databases. See Add DTS server IP addresses to a whitelist.

Step 3: Select synchronization objects

  1. In the Configure Objects step, configure the following:

| Parameter | Description |
| --- | --- |
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full sync establishes the baseline data before incremental sync begins. If you skip schema synchronization, create the destination tables using the Unique Key Model or Duplicate Key Model in advance. See Data type mappings, Additional columns added by DTS, and Unique Key Model |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: reports an error if a table with the same name exists in the destination. Ignore Errors and Proceed: skips the check. If table schemas differ, data initialization may fail or only partial columns sync. If schemas are the same, source data overwrites destination records with matching primary or unique keys |
| Capitalization of Object Names in Destination Instance | Defaults to DTS default policy. Adjust as needed. See Specify the capitalization of object names |
| Source Objects | Select databases or tables to sync, then click the right arrow icon to add them to Selected Objects |
| Selected Objects | Right-click an object to rename it, filter data with WHERE conditions, or select specific SQL operations. If schema synchronization is enabled, right-click a table and configure bucket_count in Parameter Settings |
  2. Click Next: Advanced Settings and configure the following:

| Parameter | Description |
| --- | --- |
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720. Set to at least 30. If reconnection succeeds within this window, sync resumes; otherwise, the task fails. The shortest value across tasks sharing the same source or destination applies |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10. Set to at least 10. Must be less than Retry Time for Failed Connections |
| Enable Throttling for Full Data Synchronization | Throttle full sync by setting QPS (queries per second) to the source, RPS (rows per second) of full migration, and data migration speed. Available only when Full Data Synchronization is selected |
| Enable Throttling for Incremental Data Synchronization | Throttle incremental sync by setting RPS and data sync speed |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: does not write heartbeat SQL to the source (synchronization latency may still be displayed). No: writes heartbeat SQL (may affect physical backup and cloning of the source) |
| Environment Tag | Optional. Tag the instance for environment identification |
| Configure ETL | Enable extract, transform, and load (ETL) to apply data transformations. See What is ETL? and Configure ETL |
| Monitoring and Alerting | Configure alerts for task failures or latency exceeding a threshold. See Configure monitoring and alerting |
  3. (Optional) Click Next: Configure Database and Table Fields. Specify Primary Key Column, Distribution Key, and Engine for each table.

This step is available only when Schema Synchronization is selected. Set Definition Status to All to view all tables.
Primary Key Column supports multiple columns. Distribution Key must be a subset of the primary key columns.
For tables without primary keys or UNIQUE constraints, select duplicate for Engine. Otherwise, the task may fail or data loss may occur.

Step 4: Run the precheck

Click Next: Save Task Settings and Precheck.

DTS runs a precheck before starting the task. The task starts only after passing all checks.
If any check fails, click View Details to see the cause, fix the issue, and rerun the precheck.
If a warning appears: fix and rerun if the item cannot be ignored; click Confirm Alert Details > Ignore > OK > Precheck Again if the item can be safely ignored. Ignoring alerts may cause data inconsistency.

To preview the OpenAPI parameters for this task configuration, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.

Step 5: Purchase an instance

  1. Wait until Success Rate reaches 100%, then click Next: Purchase Instance.

  2. On the purchase page, configure the following:

| Parameter | Description |
| --- | --- |
| Billing Method | Subscription: pay upfront for a fixed term, more cost-effective for long-term use. Pay-as-you-go: billed hourly, suitable for short-term use. Release the instance when no longer needed to stop charges |
| Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management? |
| Instance Class | Select based on your throughput requirements. See Instance classes of data synchronization instances |
| Subscription Duration | Available for subscription billing. Options: 1–9 months, 1, 2, 3, or 5 years |
  3. Read and select Data Transmission Service (Pay-as-you-go) Service Terms.

  4. Click Buy and Start, then click OK in the dialog box.

The task appears in the task list. Monitor its progress from there.

Data type mappings

The following table shows how PolarDB for MySQL data types map to Doris data types.

| Category | PolarDB for MySQL | Doris |
| --- | --- | --- |
| Numeric | TINYINT | TINYINT |
| | TINYINT UNSIGNED | SMALLINT |
| | SMALLINT | SMALLINT |
| | SMALLINT UNSIGNED | INT |
| | MEDIUMINT | INT |
| | MEDIUMINT UNSIGNED | BIGINT |
| | INT | INT |
| | INT UNSIGNED | BIGINT |
| | BIGINT | BIGINT |
| | BIGINT UNSIGNED | LARGEINT |
| | BIT(M) | INT |
| Decimal | DECIMAL | DECIMAL (ZEROFILL not supported) |
| | NUMERIC | DECIMAL |
| | FLOAT | FLOAT |
| | DOUBLE | DOUBLE |
| | BOOL / BOOLEAN | BOOLEAN |
| Date and time | DATE | DATEV2 |
| | DATETIME[(fsp)] | DATETIMEV2 |
| | TIMESTAMP[(fsp)] | DATETIMEV2 |
| | TIME[(fsp)] | VARCHAR |
| | YEAR[(4)] | INT |
| String | CHAR / VARCHAR | VARCHAR |
| | BINARY / VARBINARY | STRING |
| | TINYTEXT / TEXT / MEDIUMTEXT / LONGTEXT | STRING |
| | TINYBLOB / BLOB / MEDIUMBLOB / LONGBLOB | STRING |
| | ENUM | STRING |
| | SET | STRING |
| | JSON | STRING |

CHAR and VARCHAR conversion notes:

  • CHAR and VARCHAR(n) values are converted to VARCHAR(4*n) to avoid data loss. This accounts for the difference in how PolarDB for MySQL (character length) and Doris (byte length) define VARCHAR size.

  • If no length is specified, the default is VARCHAR(65533).

  • Values exceeding 65,533 characters are converted to the STRING type.

Important

If you are not using schema synchronization, set the Doris VARCHAR length to four times the corresponding MySQL VARCHAR length.
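A sketch of the VARCHAR sizing rule when you create the destination table yourself (table and column names are illustrative; Doris PROPERTIES are omitted):

```sql
-- Source (PolarDB for MySQL): VARCHAR length counts characters.
CREATE TABLE users (
    id   BIGINT PRIMARY KEY,
    name VARCHAR(255)
);

-- Destination (Doris): VARCHAR length counts bytes, so use four times the
-- source length (255 * 4 = 1020) to hold any 4-byte UTF-8 content.
CREATE TABLE users (
    id   BIGINT,
    name VARCHAR(1020)
)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 10;
```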

Additional columns added by DTS

When syncing to a table using the Duplicate Key Model, DTS automatically adds the following columns to the destination table:

| Column | Data type | Default | Description |
| --- | --- | --- | --- |
| _is_deleted | INT | 0 | Deletion flag: 0 for INSERT and UPDATE operations, 1 for DELETE operations |
| _version | BIGINT | 0 | 0 during full sync; the corresponding binary log timestamp (in seconds) during incremental sync |
| _record_id | BIGINT | 0 | 0 during full sync; the unique, incremental record ID from the incremental log during incremental sync |

Use these columns to deduplicate data when the destination table uses the Duplicate Key Model.
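One way to deduplicate at query time, assuming a Duplicate Key destination table keyed by an illustrative id column (table and column names are placeholders):

```sql
-- Keep only the newest record per key (_version, then _record_id break
-- ties), then drop rows whose latest state is a delete.
SELECT id, amount
FROM (
    SELECT id, amount, _is_deleted,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY _version DESC, _record_id DESC
           ) AS rn
    FROM orders_duplicate
) t
WHERE rn = 1 AND _is_deleted = 0;
```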

What's next