Data Transmission Service (DTS) migrates data from a PolarDB for MySQL cluster to a self-managed Doris database, enabling large-scale analytics workloads on Doris. This topic walks through the setup using a Doris database deployed on an ECS instance as the destination.
To complete this migration:
Grant the required permissions to database accounts.
Create a DTS task and connect the source and destination databases.
Select migration types and objects.
Run a precheck, purchase a DTS instance, and start the migration.
Prerequisites
A destination Doris database is created. The available storage space of this database must be larger than the storage space used by the source PolarDB for MySQL cluster.
For more information about the supported versions of the source and destination databases, see Overview of migration solutions.
Permissions required for database accounts
The permissions required vary based on which migration types you select.
| Database | Schema migration | Full data migration | Incremental data migration |
|---|---|---|---|
| Source PolarDB for MySQL cluster | SELECT | SELECT | Read and write permissions |
| Destination Doris database | Usage_priv, Select_priv, Load_priv, Alter_priv, Create_priv, Drop_priv | (same) | (same) |
To create and authorize database accounts:
For a PolarDB for MySQL cluster, see Create and manage a database account.
For a Doris database, see Authentication and Authorization.
Billing
| Migration type | Link configuration fees | Data transfer cost |
|---|---|---|
| Schema migration and full data migration | Free | Free, unless Access Method for the destination is set to Public IP Address. See Billing overview. |
| Incremental data migration | Charged. See Billing overview. | — |
Supported SQL operations for incremental migration
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN, MODIFY COLUMN, CHANGE COLUMN, DROP COLUMN, DROP TABLE, TRUNCATE TABLE, RENAME TABLE |
RENAME TABLE operations may cause data inconsistency. If you select a table as the migration object and rename it during migration, that table's data is not migrated to the destination. To prevent this, select the database containing the table as the migration object, and make sure both the pre-rename and post-rename databases are included in the migration objects.
Constraints and limits
Review the following constraints before configuring your migration task.
Source database limits
Outbound bandwidth: the server hosting the source database must have enough outbound bandwidth. Insufficient bandwidth reduces migration speed.
Table requirements:
Tables with primary keys or UNIQUE constraints: make sure table fields are unique to avoid duplicate data in the destination.
Tables without primary keys or UNIQUE constraints: when configuring the task, select Schema Migration under Migration Types, then set Engine to duplicate for these tables in the Configurations for Databases, Tables, and Columns step. Otherwise, the task may fail or data may be lost.
Column mapping limit: if you map column names or perform per-column edits, a single task can migrate a maximum of 1,000 tables. To exceed this limit, split the tables across multiple tasks or migrate the entire database without per-column edits.
Incremental migration only:
Enable binary logging and set
loose_polar_log_bintoon. Without this, the precheck fails and the task cannot start. See Enable binary logging and Modify parameters. > Note: Enabling binary logging on a PolarDB for MySQL cluster consumes storage space and incurs fees.Retain binary logs for at least 3 days (7 days recommended). A shorter retention period may prevent DTS from reading the required logs, which can cause data inconsistency or data loss—issues not covered by the DTS Service-Level Agreement (SLA). See Modify the retention period.
Operations to avoid during migration:
Do not perform DDL operations that change the database or table schema during schema migration or full data migration. Doing so causes the task to fail.
If you run full data migration only (without incremental), do not write new data to the source. To maintain real-time consistency, select schema migration, full data migration, and incremental data migration together.
Data changes from operations not recorded in binary logs—such as physical backup recovery or cascade operations—are not migrated. > Note: If this occurs, run a full data migration again when your business allows.
Other limits
Read-only nodes: migration of read-only nodes from the source PolarDB for MySQL cluster is not supported.
OSS external tables: migration of OSS external tables from the source is not supported.
Primary/secondary failover: DTS does not support source database primary/secondary failover during full migration. If a failover occurs, reconfigure the task promptly.
Destination table models: data can be migrated only to tables that use the Unique Key model or Duplicate Key model in Doris.
Duplicate Key model: duplicate data may appear if a retry or restart occurs, or if two or more DML operations are performed on the same row after the task starts. Deduplicate using the additional columns (
_is_deleted,_version,_record_id). DTS converts UPDATE and DELETE statements to INSERT statements for Duplicate Key model tables.
`bucket_count` parameter: the only configurable parameter in the Selected Objects box. Must be a positive integer. Default: auto.
Cluster creation: do not create clusters in the destination Doris database during migration. If this causes a failure, restart the migration instance to resume.
Object name requirements:
Doris supports only database and table names that start with a letter. Use the object name mapping feature to rename objects that do not meet this requirement.
If a database, table, or column name contains Chinese characters, use the object name mapping feature to rename it (for example, to an English equivalent). Otherwise, the task may fail.
DDL restrictions: DDL operations that modify multiple columns simultaneously, or that modify the same table in succession, are not supported.
Backend (BE) nodes: do not add BE nodes to the destination Doris database during migration. If this causes a failure, restart the migration instance to resume.
Multi-table merge: when migrating data from multiple source tables into a single destination table, all source table schemas must be consistent. Inconsistent schemas may cause data inconsistency or task failure.
VARCHAR length: in PolarDB for MySQL,
VARCHAR(M)specifies character length. In Doris,VARCHAR(N)specifies byte length. If you are not using DTS schema migration, set the VARCHAR field length in Doris to four times the MySQL length.Online DDL tools:
DTS supports online DDL changes performed using DMS or gh-ost. DTS migrates only the original DDL statements, not temporary table data. However, the destination table may be locked during this process.
DTS does not support online DDL changes performed using tools such as pt-online-schema-change on the source. Such changes on the source may cause data loss or task failure.
Performance impact: during full migration, DTS reads from the source and writes to the destination, increasing database server load. Run migration during off-peak hours when the CPU load is below 30%.
Tablespace fragmentation: concurrent INSERT operations during full migration cause table fragmentation in the destination. After full migration, the used tablespace in the destination will be larger than in the source.
Concurrent writes to destination: writing data from other sources to the destination during migration may cause data inconsistency.
Heartbeat table: DTS periodically runs
CREATE DATABASE IF NOT EXISTS \test\`` on the source to advance the binary log offset.Incremental migration latency: DTS uses a batch synchronization policy to reduce load on the destination. By default, DTS writes to a single object at most every 5 seconds, resulting in a typical synchronization latency within 10 seconds. To reduce latency, adjust the
selectdb.reservoir.timeout.millisecondsparameter in the console. Valid range: 1,000–10,000 milliseconds.A smaller value increases write frequency, which may increase the load and response time (RT) of the destination and, in turn, increase synchronization latency. Adjust based on the destination load.
Instance recovery: if a DTS instance fails, the helpdesk attempts recovery within 8 hours. During recovery, DTS may restart the instance or adjust its parameters (only DTS instance parameters—not database parameters). See Modify instance parameters for the parameters that may be modified.
Migrate data from PolarDB for MySQL to Doris
Step 1: Open the Data Migration page
Use one of the following methods to navigate to the Data Migration page and select the region where the migration instance resides.
DTS console
Log on to the DTS console.DTS console
In the left-side navigation pane, click Data Migration.
In the upper-left corner, select the region where the data migration instance resides.
DMS console
The actual steps may vary based on the mode and layout of the DMS console. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.DMS console
In the top navigation bar, move the pointer over Data + AI > DTS (DTS) > Data Migration.
From the drop-down list to the right of Data Migration Tasks, select the region where the data migration instance resides.
Step 2: Configure source and destination databases
Click Create Task.
Configure the source and destination database parameters described in the following table.
Category Parameter Description N/A Task Name The name of the DTS task. DTS auto-generates a name. Specify a descriptive name that makes the task easy to identify. The name does not need to be unique. Source Database Select Existing Connection If the instance is registered with DTS, select it from the drop-down list. DTS auto-fills the parameters below. If not registered, configure the parameters manually. Database Type Select PolarDB for MySQL. Access Method Select Alibaba Cloud Instance. Instance Region Select the region where the source PolarDB for MySQL cluster resides. Replicate Data Across Alibaba Cloud Accounts Select No to use an instance under the current Alibaba Cloud account. PolarDB Cluster ID Select the ID of the source PolarDB for MySQL cluster. Database Account Enter the database account of the source cluster. See Permissions required for database accounts. Database Password Enter the password for the database account. Encryption Select a connection method. For SSL encryption, see Set SSL encryption. Destination Database Select Existing Connection If the instance is registered with DTS, select it from the drop-down list. DTS auto-fills the parameters below. If not registered, configure the parameters manually. Database Type Select Doris. Access Method Select Self-managed Database on ECS. For other connection types, see Preparations. Instance Region Select the region where the destination Doris database resides. ECS Instance ID Select the ID of the ECS instance where the Doris database is deployed. If Doris is deployed across multiple ECS instances (for example, BE or Frontend (FE) nodes on separate instances), add the CIDR blocks of DTS servers to the security rules of each ECS instance. Port Number Enter the service port of the destination Doris database. Default: 9030. Database Account Enter the database account of the destination Doris database. See Permissions required for database accounts. Database Password Enter the password for the database account. Click Test Connectivity and Proceed, then click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
Make sure the CIDR blocks of DTS servers are added to the security settings of the source and destination databases. See Add DTS server IP addresses to a whitelist.
Step 3: Configure migration objects
On the Configure Objects page, set the following parameters.
Parameter Description Migration Types Select the migration types based on your requirements: - Full migration only: select Schema Migration and Full Data Migration.
- Full migration with ongoing sync: select Schema Migration, Full Data Migration, and Incremental Data Migration.
Important- After migration, PolarDB for MySQL data types are converted to Doris types. If you skip Schema Migration, create destination tables using the Unique Key model or Duplicate Key model before migration. See Data type mappings, Additional column information, and Unique Key Model.
- If you do not select Incremental Data Migration, do not write data to the source during migration.
Processing Mode of Conflicting Tables - Precheck and Report Errors: DTS checks for tables with the same name in the destination. If conflicts exist, the precheck fails and the task does not start. To resolve conflicts without deleting the destination table, use object name mapping to rename the object. See Map object names.
- Ignore Errors and Proceed: DTS skips the name conflict check. Warning
This may cause data inconsistency. If schemas match, source records overwrite destination records with the same primary key. If schemas differ, partial migration or failure may occur.
Capitalization of object names in destination instance Controls the capitalization of database, table, and column names in the destination. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. Source Objects Select one or more objects from Source Objects, then click the arrow icon to add them to Selected Objects. Select databases or tables as migration objects. Selected Objects - To rename a migration object in the destination, right-click the object. See Map object names.
- To set the bucket count (
bucket_count) for a table: right-click the table, set Enable Parameter Settings to Yesalert notification settings, enter the value, and click OK. Available only when Schema Migration is selected. - To select incremental SQL operations at the database or table level, right-click the object and select the SQL operations.
- To filter rows using WHERE conditions, right-click the table and configure conditions. See Set filter conditions.
- If you use object name mapping, other objects that depend on the mapped object may fail to migrate.
Click Next: Advanced Settings and configure the following parameters.
Parameter Description Dedicated cluster for task scheduling DTS schedules tasks to the shared cluster by default. For improved stability, purchase a dedicated cluster. See What is a DTS dedicated cluster. Retry time for failed connections The time range DTS retries a connection when the source or destination cannot be reached after the task starts. Valid values: 10–1,440 minutes. Default: 720 minutes. Set to a value greater than 30. If DTS reconnects within this window, it resumes the task; otherwise, the task fails.<br> NoteIf multiple tasks share the same source or destination, the most recently specified value takes precedence. DTS charges for the instance during retries.
Retry time for other issues The time range DTS retries failed DDL or DML operations after the task starts. Valid values: 1–1,440 minutes. Default: 10 minutes. Set to a value greater than 10. ImportantThis value must be smaller than Retry time for failed connections.
Enable throttling for full data migration Limits read and write operations during full migration to reduce server load. Configure Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Migration is selected. Enable throttling for incremental data migration Limits operations during incremental migration. Configure RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s). Available only when Incremental Data Migration is selected. Whether to delete SQL operations on heartbeat tables of forward and reverse tasks Controls whether DTS writes SQL operations on heartbeat tables to the source while the instance is running. - Yes: does not write heartbeat table operations. A latency may appear on the DTS instance.
- No: writes heartbeat table operations. This may affect features such as physical backup and cloning of the source database.
Environment tag (Optional) Select a tag to identify the instance. Configure ETL Specifies whether to configure the extract, transform, and load (ETL) feature. See What is ETL? - Yes: enable ETL and enter processing statements in the code editor. See Configure ETL in a data migration or data synchronization task.
- No: skip ETL configuration.
Monitoring and alerting Configures alerts for the task. If the task fails or migration latency exceeds the threshold, alert contacts are notified. - No: no alerting.
- Yes: configure alert threshold and notification settings. See Configure monitoring and alerting when you create a DTS task.
(Optional) Click Next: Configure Database and Table Fields to set the Primary Key Column, Distribution Key, and Engine for destination tables.
- This step is available only if you selected Schema Migration in the previous step. Set Definition Status to All to view and modify all tables. - For Primary Key Column, select one or more columns to form a composite primary key. At least one Primary Key Column must also be set as the Distribution Key. - For tables without primary keys or UNIQUE constraints, set Engine to duplicate. Otherwise, migration may fail or data may be lost.
Step 4: Run a precheck and start migration
Click Next: Save Task Settings and Precheck.
- To view the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters. - DTS runs a precheck before starting the task. The task starts only after passing the precheck. - If the precheck fails, click View Details next to each failed item, fix the issues, and run the precheck again. - If an alert is triggered during the precheck: for alerts that cannot be ignored, fix the issue and rerun the precheck. For alerts that can be ignored, click Confirm Alert Details > Ignore > OK, then run the precheck again. Ignoring alerts may cause data inconsistency or expose your business to risk.
After Success Rate reaches 100%, click Next: Purchase Instance.
On the Purchase Instance page, configure the following parameters.
Section Parameter Description New Instance Class Resource Group The resource group for the migration instance. Default: default resource group. See What is Resource Management? Instance Class The instance class determines migration speed. Select a class based on your requirements. See Instance classes of data migration instances. Read and accept Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start. In the confirmation message, click OK.
Track progress on the Data Migration page.
If the task cannot migrate incremental data, it stops automatically. The Status column shows Completed.
If the task migrates incremental data, it runs continuously and does not stop automatically. The Status column shows Running.
Data type mappings
When DTS migrates data from PolarDB for MySQL to Doris, the following type conversions apply.
| Category | PolarDB for MySQL type | Doris type | Notes |
|---|---|---|---|
| Integer | TINYINT | TINYINT | |
| TINYINT UNSIGNED | SMALLINT | Unsigned types are promoted to the next signed type to avoid overflow. | |
| SMALLINT | SMALLINT | ||
| SMALLINT UNSIGNED | INT | ||
| MEDIUMINT | INT | ||
| MEDIUMINT UNSIGNED | BIGINT | ||
| INT | INT | ||
| INT UNSIGNED | BIGINT | ||
| BIGINT | BIGINT | ||
| BIGINT UNSIGNED | LARGEINT | ||
| BIT(M) | INT | ||
| Decimal | DECIMAL | DECIMAL | ZEROFILL is not supported. |
| NUMERIC | DECIMAL | ||
| FLOAT | FLOAT | ||
| DOUBLE | DOUBLE | ||
| BOOL / BOOLEAN | BOOLEAN | ||
| Date and time | DATE | DATEV2 | |
| DATETIME[(fsp)] | DATETIMEV2 | ||
| TIMESTAMP[(fsp)] | DATETIMEV2 | ||
| TIME[(fsp)] | VARCHAR | Doris has no native TIME type. | |
| YEAR[(4)] | INT | ||
| String | CHAR / VARCHAR | VARCHAR | Important To avoid data loss, CHAR and VARCHAR(n) are converted to VARCHAR(4×n). If no length is specified, the default is VARCHAR(65533). Data longer than 65,533 characters is converted to STRING. |
| BINARY / VARBINARY | STRING | ||
| TINYTEXT / TEXT / MEDIUMTEXT / LONGTEXT | STRING | ||
| TINYBLOB / BLOB / MEDIUMBLOB / LONGBLOB | STRING | ||
| ENUM | STRING | ||
| SET | STRING | ||
| JSON | STRING |
Additional column information
For tables that use the Duplicate Key model, DTS automatically adds the following columns to the destination tables. Use these columns to deduplicate data.
| Column name | Data type | Default value | Description |
|---|---|---|---|
_is_deleted | INT | 0 | Indicates whether the row is deleted. INSERT and UPDATE set this to 0; DELETE sets this to 1. |
_version | BIGINT | 0 | For full data migration: 0. For incremental data migration: the timestamp in seconds from the source binary log. |
_record_id | BIGINT | 0 | For full data migration: 0. For incremental data migration: the unique, auto-incrementing record ID of the incremental log entry. |