Data Transmission Service (DTS) lets you continuously sync data from a PolarDB for MySQL cluster to a self-managed Doris database for large-scale analytics. This topic walks through the setup using a Doris database deployed on an Elastic Compute Service (ECS) instance.
Prerequisites
Before you begin, make sure you have:
A destination Doris database with available storage space larger than the storage used by the source PolarDB for MySQL cluster
Binary logging enabled on the source PolarDB for MySQL cluster, with the `loose_polar_log_bin` parameter set to ON (see Enable binary logging and Modify parameters)
A binary log retention period of at least three days (seven days recommended)
Database accounts with the required permissions (see Permissions required)
Enabling binary logging on a PolarDB for MySQL cluster incurs storage charges for binary log files.
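To verify the source side, you can check the effective binary logging status with a standard MySQL statement. This assumes `loose_polar_log_bin` has already been set to ON in the console and the cluster has restarted:

```sql
-- Expect Value = ON once binary logging is in effect on the source cluster.
SHOW VARIABLES LIKE 'log_bin';
```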
For supported source and destination database versions, see Overview of data synchronization solutions.
Permissions required
| Database | Required permissions | Reference |
|---|---|---|
| Source PolarDB for MySQL cluster | Read and write permissions on the objects to be synchronized | Create and manage a database account |
| Destination Doris database | USAGE_PRIV and the following: SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV | Authentication and authorization |
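The following statements sketch how such accounts might be granted these permissions. The account names, host, and database scope are placeholders, and the accounts are assumed to already exist (PolarDB accounts are typically created in the console):

```sql
-- Source (PolarDB for MySQL): read and write on the objects to synchronize.
-- 'dts_source' and mydb are placeholder names.
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'dts_source'@'%';

-- Destination (Doris): the privileges listed above ('dts_dest' is a placeholder).
GRANT SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV
ON *.* TO 'dts_dest'@'%';
-- USAGE_PRIV is granted at the resource level in Doris, for example:
-- GRANT USAGE_PRIV ON RESOURCE '<resource_name>' TO 'dts_dest'@'%';
```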
Billing
| Synchronization type | Cost |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview |
Supported SQL operations for incremental synchronization
| Operation type | Statements |
|---|---|
| DML (Data Manipulation Language) | INSERT, UPDATE, DELETE |
| DDL (Data Definition Language) | ADD COLUMN, MODIFY COLUMN, CHANGE COLUMN, DROP COLUMN, DROP TABLE, TRUNCATE TABLE, RENAME TABLE |
The RENAME TABLE operation can cause data inconsistency. If you rename a table that is selected as a synchronization object, its data stops syncing to the destination. To avoid this, select the entire database as the synchronization object instead of individual tables, and make sure that both the original and renamed table belong to databases included in the synchronization scope.
Limitations
Source database
Tables with PRIMARY KEY or UNIQUE constraints: all fields must be unique, or the destination may contain duplicate data.
Tables without PRIMARY KEY or UNIQUE constraints: select Schema Synchronization for Synchronization Types and duplicate for Engine in the Configurations for Databases, Tables, and Columns step.
If you select tables (not databases) as synchronization objects and need to rename tables or columns in the destination, a single task supports up to 1,000 tables. Exceeding this limit causes a request error. Configure multiple tasks or sync the entire database instead.
Binary log requirements: `loose_polar_log_bin` must be set to ON, and the retention period must be at least three days (seven days recommended).
Do not run DDL operations that change database or table schemas during initial schema synchronization or initial full data synchronization. DTS queries the source database during full sync, which creates metadata locks that may block DDL operations.
Changes from operations not recorded in binary logs—such as data restored from physical backups or generated by cascade operations—are not synced to the destination. If this occurs, you can remove and re-add the affected database or table from the synchronization objects if your business permits. See Modify synchronization objects.
General limitations
DTS does not sync read-only nodes of the source PolarDB for MySQL cluster.
DTS does not sync Object Storage Service (OSS) external tables from the source.
Primary/standby switchover is not supported during initial full data synchronization. If a switchover occurs, reconfigure the synchronization task.
Only tables using the Unique Key Model or Duplicate Key Model in Doris are supported (for a sample pre-created table definition, see the sketch after this list). When using the Duplicate Key Model, DTS converts UPDATE and DELETE statements to INSERT statements. Duplicate data may appear if any of the following occur:
A retry operation runs on the data synchronization instance
The instance is restarted
Two or more DML operations are applied to the same row after the instance starts
Use the additional columns `_is_deleted`, `_version`, and `_record_id` to deduplicate if needed (see Additional columns added by DTS).
Only the `bucket_count` parameter can be specified in the Selected Objects section. The value must be a positive integer. Default: auto.
Do not create clusters in the destination Doris database during synchronization. If this happens, the task fails. Restart the data synchronization instance to resume.
Doris only supports database and table names that start with a letter. Use the object name mapping feature to rename any object whose name starts with a non-letter character.
If a database, table, or column name contains Chinese characters, use object name mapping to rename it (for example, to an English name). Otherwise, the task may fail.
A single DDL statement cannot modify multiple columns, and you cannot run DDL statements on the same table consecutively.
Do not add backend (BE) nodes to the destination Doris database during synchronization. If this happens, the task fails. Restart the data synchronization instance to resume.
In multi-table merge scenarios (multiple source tables syncing to one destination table), the schemas of all source tables must be identical. Otherwise, data inconsistency or task failure may occur.
In PolarDB for MySQL, `VARCHAR(M)` measures M in characters. In Doris, `VARCHAR(N)` measures N in bytes. If you are not using schema synchronization, set the Doris VARCHAR length to four times the corresponding MySQL VARCHAR length to avoid data loss (see the sketch after this list).
When using DMS or gh-ost to perform online DDL changes on the source, DTS syncs only the original DDL to the destination. Online DDL changes made with pt-online-schema-change are not supported and may cause data loss or synchronization failure.
During initial full data synchronization, DTS uses read and write resources of both databases. Evaluate the performance impact beforehand and schedule synchronization during off-peak hours, when CPU usage on both ends is below 30%.
Concurrent INSERT operations during initial full data synchronization cause table fragmentation in the destination. After full sync completes, destination tables may occupy more storage than the source.
Do not use pt-online-schema-change or similar tools for online DDL operations on synchronization objects during synchronization. This causes sync failure.
If data is written to the destination database from sources other than DTS during synchronization, data inconsistency may occur between source and destination.
DTS periodically executes CREATE DATABASE IF NOT EXISTS `test` on the source to advance the binary log position.
During incremental synchronization, DTS uses a batch strategy and writes to each synchronization object at most once every five seconds. This may introduce a normal synchronization latency of less than 10 seconds. To reduce latency, adjust the `selectdb.reservoir.timeout.milliseconds` parameter (range: 1,000–10,000 milliseconds) in the DTS console. Shorter batching intervals increase write frequency, which may raise the destination's load and response time.
If an instance fails, DTS support attempts recovery within eight hours. During recovery, the instance may be restarted or its parameters adjusted. Only DTS instance parameters are modified; database parameters remain unchanged.
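As referenced above, here is a minimal sketch of a pre-created destination table, assuming a hypothetical `orders` table: it uses the Unique Key Model, sizes a VARCHAR column at four times its MySQL character length, and sets an explicit bucket count. Adapt names, types, and properties to your schema and cluster.

```sql
-- Minimal sketch of a pre-created Doris destination table (all names hypothetical).
-- Source column in PolarDB for MySQL: customer_name VARCHAR(64)  -- 64 characters
CREATE TABLE orders (
    order_id      BIGINT,
    customer_name VARCHAR(256),  -- 64 characters x 4 bytes per UTF-8 character
    amount        DECIMAL(10, 2)
)
UNIQUE KEY (order_id)                        -- Unique Key Model
DISTRIBUTED BY HASH (order_id) BUCKETS 10    -- explicit bucket count
PROPERTIES ("replication_num" = "1");        -- single replica; adjust for your cluster
```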
Create a data synchronization task
Step 1: Go to the data synchronization page
Use one of the following methods:
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the data synchronization task resides.
DMS console
Steps may vary based on the DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list next to Data Synchronization Tasks, select the region where the instance resides.
Step 2: Configure source and destination databases
Click Create Task.
Configure the Task Name. DTS generates a name automatically. Specify a descriptive name to make the task easy to identify—a unique name is not required.
Configure the source and destination database connections using the parameters below.
Source database (PolarDB for MySQL)
| Parameter | Value |
|---|---|
| Select Existing Connection | Select a registered instance from the list to auto-populate the following fields, or configure them manually |
| Database Type | PolarDB for MySQL |
| Access Method | Alibaba Cloud Instance |
| Instance Region | The region where the source PolarDB for MySQL cluster resides |
| Replicate Data Across Alibaba Cloud Accounts | No (this example uses the current account) |
| PolarDB Cluster ID | The ID of the source PolarDB for MySQL cluster |
| Database Account | The account for the source cluster (see Permissions required) |
| Database Password | The password for the database account |
| Encryption | Select as needed. For SSL configuration, see Configure SSL encryption |
Destination database (Doris)
| Parameter | Value |
|---|---|
| Select Existing Connection | Select a registered instance from the list to auto-populate the following fields, or configure them manually |
| Database Type | Doris |
| Access Method | Self-managed Database on ECS (this example). For other access methods, see Preparations |
| Instance Region | The region where the destination Doris database resides |
| ECS Instance ID | The ID of the ECS instance running the Doris database. If BE or frontend (FE) nodes are on separate ECS instances, add the CIDR blocks of DTS servers to each instance's security rules |
| Port Number | The service port of the destination Doris database. Default: 9030 |
| Database Account | The account for the destination database (see Permissions required) |
| Database Password | The password for the database account |
Click Test Connectivity and Proceed. In the CIDR Blocks of DTS Servers dialog box, click Test Connectivity.
Make sure that the CIDR blocks of DTS servers are added to the security settings of both source and destination databases. See Add DTS server IP addresses to a whitelist.
Step 3: Select synchronization objects
In the Configure Objects step, configure the following:
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Full sync establishes the baseline data before incremental sync begins. If you skip schema synchronization, create the destination tables using the Unique Key Model or Duplicate Key Model in advance. See Data type mappings, Additional columns added by DTS, and Unique Key Model |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: reports an error if a table with the same name exists in the destination. Ignore Errors and Proceed: skips the check. If table schemas differ, data initialization may fail or only partial columns sync. If schemas are the same, source data overwrites destination records with matching primary or unique keys |
| Capitalization of Object Names in Destination Instance | Defaults to DTS default policy. Adjust as needed. See Specify the capitalization of object names |
| Source Objects | Select databases or tables to sync, then click the right arrow icon to add them to Selected Objects |
| Selected Objects | Right-click an object to rename it, filter data with WHERE conditions, or select specific SQL operations. If schema synchronization is enabled, right-click a table and configure bucket_count in Parameter Settings |
Click Next: Advanced Settings and configure the following:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase a dedicated cluster. See What is a DTS dedicated cluster |
| Retry Time for Failed Connections | How long DTS retries after a connection failure. Range: 10–1,440 minutes. Default: 720. Set to at least 30. If reconnection succeeds within this window, sync resumes; otherwise, the task fails. The shortest value across tasks sharing the same source or destination applies |
| Retry Time for Other Issues | How long DTS retries after DDL or DML failures. Range: 1–1,440 minutes. Default: 10. Set to at least 10. Must be less than Retry Time for Failed Connections |
| Enable Throttling for Full Data Synchronization | Throttle full sync by setting QPS (queries per second) to the source, RPS (rows per second) of full migration, and data migration speed. Available only when Full Data Synchronization is selected |
| Enable Throttling for Incremental Data Synchronization | Throttle incremental sync by setting RPS and data sync speed |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: does not write heartbeat SQL to the source (latency may be displayed for the instance). No: writes heartbeat SQL (may affect physical backup and cloning of the source) |
| Environment Tag | Optional. Tag the instance for environment identification |
| Configure ETL | Enable extract, transform, and load (ETL) to apply data transformations. See What is ETL? and Configure ETL |
| Monitoring and Alerting | Configure alerts for task failures or latency exceeding a threshold. See Configure monitoring and alerting |
(Optional) Click Next: Configure Database and Table Fields. Specify Primary Key Column, Distribution Key, and Engine for each table.
This step is available only when Schema Synchronization is selected. Set Definition Status to All to view all tables.
Primary Key Column supports multiple columns. Distribution Key must be a subset of the primary key columns.
For tables without primary keys or UNIQUE constraints, select duplicate for Engine. Otherwise, the task may fail or data loss may occur.
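If you skip schema synchronization and create destination tables yourself, a source table without a PRIMARY KEY or UNIQUE constraint maps naturally to the Duplicate Key Model. A minimal sketch, assuming a hypothetical `event_log` table:

```sql
-- Minimal sketch of a Duplicate Key Model table for a source table without
-- a PRIMARY KEY or UNIQUE constraint (all names hypothetical).
CREATE TABLE event_log (
    event_time DATETIME,
    event_type VARCHAR(128),
    payload    STRING
)
DUPLICATE KEY (event_time, event_type)       -- sort columns; duplicate rows are kept
DISTRIBUTED BY HASH (event_type) BUCKETS 10
PROPERTIES ("replication_num" = "1");        -- single replica; adjust for your cluster
```

When DTS writes to a Duplicate Key Model table, UPDATE and DELETE statements arrive as INSERT statements, and DTS adds the `_is_deleted`, `_version`, and `_record_id` columns described in Additional columns added by DTS.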
Step 4: Run the precheck
Click Next: Save Task Settings and Precheck.
DTS runs a precheck before starting the task. The task starts only after passing all checks.
If any check fails, click View Details to see the cause, fix the issue, and rerun the precheck.
If a warning appears: if the item cannot be ignored, fix the issue and run the precheck again; if the item can be safely ignored, click Confirm Alert Details > Ignore > OK > Precheck Again. Ignoring alerts may cause data inconsistency.
To preview the OpenAPI parameters for this task configuration, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
Step 5: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the purchase page, configure the following:
| Parameter | Description |
|---|---|
| Billing Method | Subscription: pay upfront for a fixed term, more cost-effective for long-term use. Pay-as-you-go: billed hourly, suitable for short-term use. Release the instance when no longer needed to stop charges |
| Resource Group Settings | The resource group for the instance. Default: default resource group. See What is Resource Management? |
| Instance Class | Select based on your throughput requirements. See Instance classes of data synchronization instances |
| Subscription Duration | Available for subscription billing. Options: 1–9 months, 1, 2, 3, or 5 years |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the dialog box.
The task appears in the task list. Monitor its progress from there.
Data type mappings
The following table shows how PolarDB for MySQL data types map to Doris data types.
| Category | PolarDB for MySQL | Doris |
|---|---|---|
| Numeric | TINYINT | TINYINT |
| | TINYINT UNSIGNED | SMALLINT |
| | SMALLINT | SMALLINT |
| | SMALLINT UNSIGNED | INT |
| | MEDIUMINT | INT |
| | MEDIUMINT UNSIGNED | BIGINT |
| | INT | INT |
| | INT UNSIGNED | BIGINT |
| | BIGINT | BIGINT |
| | BIGINT UNSIGNED | LARGEINT |
| | BIT(M) | INT |
| | DECIMAL | DECIMAL (ZEROFILL not supported) |
| | NUMERIC | DECIMAL |
| | FLOAT | FLOAT |
| | DOUBLE | DOUBLE |
| | BOOL / BOOLEAN | BOOLEAN |
| Date and time | DATE | DATEV2 |
| | DATETIME[(fsp)] | DATETIMEV2 |
| | TIMESTAMP[(fsp)] | DATETIMEV2 |
| | TIME[(fsp)] | VARCHAR |
| | YEAR[(4)] | INT |
| String | CHAR / VARCHAR | VARCHAR |
| | BINARY / VARBINARY | STRING |
| | TINYTEXT / TEXT / MEDIUMTEXT / LONGTEXT | STRING |
| | TINYBLOB / BLOB / MEDIUMBLOB / LONGBLOB | STRING |
| | ENUM | STRING |
| | SET | STRING |
| | JSON | STRING |
CHAR and VARCHAR conversion notes:
CHAR and VARCHAR(n) values are converted to `VARCHAR(4*n)` to avoid data loss. This accounts for the difference in how PolarDB for MySQL (character length) and Doris (byte length) define VARCHAR size.
If no length is specified, the default is `VARCHAR(65533)`.
Values exceeding 65,533 characters are converted to the STRING type.
If you are not using schema synchronization, set the Doris VARCHAR length to four times the corresponding MySQL VARCHAR length, as in the example below.
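To make the arithmetic concrete, here is how the conversion works out for two hypothetical source columns (a hedged sketch; all names are placeholders):

```sql
-- title VARCHAR(100)   -> 100 x 4 = 400 bytes        -> Doris VARCHAR(400)
-- body  VARCHAR(30000) -> 30000 x 4 = 120,000 bytes  -> exceeds 65,533, so STRING
CREATE TABLE articles (
    article_id BIGINT,
    title      VARCHAR(400),
    body       STRING
)
UNIQUE KEY (article_id)
DISTRIBUTED BY HASH (article_id) BUCKETS 10
PROPERTIES ("replication_num" = "1");
```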
Additional columns added by DTS
When syncing to a table using the Duplicate Key Model, DTS automatically adds the following columns to the destination table:
| Column | Data type | Default | Description |
|---|---|---|---|
| _is_deleted | INT | 0 | Deletion flag: 0 for INSERT and UPDATE operations, 1 for DELETE operations |
| _version | BIGINT | 0 | 0 during full sync; the corresponding binary log timestamp (in seconds) during incremental sync |
| _record_id | BIGINT | 0 | 0 during full sync; the unique, incremental record ID from the incremental log during incremental sync |
Use these columns to deduplicate data when the destination table uses the Duplicate Key Model.
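For example, a query along these lines can reconstruct the latest state of each row. A hedged sketch, assuming a hypothetical `orders` table keyed by `order_id`:

```sql
-- Keep only the most recent record per key (highest _version, then highest
-- _record_id), and drop rows whose latest operation was a DELETE.
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY _version DESC, _record_id DESC
           ) AS rn
    FROM orders t
) latest
WHERE rn = 1
  AND _is_deleted = 0;
```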