Use Data Transmission Service (DTS) to replicate data from a PolarDB-X 2.0 instance to an ApsaraDB for SelectDB instance. DTS handles schema synchronization, full data loading, and ongoing incremental replication via binlog.
This feature is in the grayscale testing phase and is available only to some users.
Prerequisites
Before you begin, make sure you have:
A source PolarDB-X 2.0 instance
A destination SelectDB instance
The required permissions on both instances (see Permissions required for database accounts)
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Limitations
Source database requirements
Bandwidth: The server hosting the source database must have at least 100 Mbit/s outbound bandwidth. Lower bandwidth reduces synchronization speed.
Instance type: Enterprise Edition PolarDB-X 2.0 read-only instances are not supported.
Unsupported objects: INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, and FOREIGN KEY objects cannot be synchronized. TABLEGROUP objects and databases or schemas with a Locality attribute are also not supported.
Table names: Tables with uppercase letters in their names cannot be synchronized.
Primary key requirement: Tables to be synchronized must have a PRIMARY KEY or a UNIQUE constraint, and the constrained fields must be unique. Otherwise, the destination database may contain duplicate records.
Table limit per task: If you select individual tables as objects to synchronize and need to rename tables or columns in the destination, a single task supports up to 5,000 tables. For more than 5,000 tables, split into multiple tasks or synchronize at the database level.
Binary logging: Binary logging is enabled by default on PolarDB-X 2.0. Verify that the `binlog_row_image` parameter is set to `full`. If it is not, the precheck fails and the task cannot start. See Parameter settings.
DDL during synchronization: Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. Doing so causes the task to fail.
Network changes: If the network configuration of the PolarDB-X 2.0 instance changes, the synchronization instance may experience latency for a period of time.
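The `binlog_row_image` requirement above can be checked with a script before you create the task. The following sketch is illustrative only: the connection parameters are placeholders, and the PyMySQL usage is shown commented out because it assumes a reachable source endpoint.

```python
def row_image_is_full(value: str) -> bool:
    """DTS requires binlog_row_image to be 'full' (case-insensitive)."""
    return value.strip().lower() == "full"

# Hypothetical usage against the source instance (requires the pymysql package;
# host, account, and password below are placeholders, not real values):
# import pymysql
# conn = pymysql.connect(host="<polardbx-endpoint>", user="<account>", password="<password>")
# with conn.cursor() as cur:
#     cur.execute("SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'")
#     name, value = cur.fetchone()
#     print(row_image_is_full(value))  # False means the DTS precheck will fail
```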
Destination database requirements
Supported table types: Data can only be synchronized to Unique or Duplicate engine tables in SelectDB.
Unique engine tables: All unique keys in the destination table must exist in the source table and be included in the objects to synchronize. Missing keys cause data inconsistency.
Duplicate engine tables: Duplicate records may appear when the synchronization instance is retried or restarted, or when the same record is modified by more than one DML operation after the task starts. DTS converts UPDATE and DELETE statements to INSERT statements for Duplicate engine tables. Use the additional columns (`_is_deleted`, `_version`, `_record_id`) to deduplicate. See Additional columns for Duplicate engine tables.
Do not create clusters during synchronization: Creating clusters in the destination SelectDB instance while the task is running causes the task to fail. If the task fails, restart the synchronization instance to resume.
Do not add backend nodes during synchronization: Adding backend nodes to the SelectDB database during synchronization causes the task to fail. Restart the synchronization instance to resume.
Naming requirements: SelectDB supports only database and table names that start with a letter. Use the object name mapping feature to rename any database or table whose name starts with a non-letter character.
Chinese character names: If a database, table, or column name contains Chinese characters, use object name mapping to rename it before synchronizing. Otherwise, the task may fail.
VARCHAR length difference: In PolarDB-X 2.0, `VARCHAR(M)` specifies a length in characters. In SelectDB, `VARCHAR(N)` specifies a length in bytes. If you do not use DTS schema synchronization, set the VARCHAR field length in SelectDB to four times the corresponding PolarDB-X 2.0 length to avoid data loss.
DDL restrictions: DDL statements that modify multiple columns at a time are not supported, nor are consecutive DDL operations on the same table.
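The character-to-byte conversion described above can be expressed as a small helper. This is a sketch of the stated rule (four times the character length, falling back to STRING past SelectDB's 65533-byte VARCHAR limit), not part of DTS itself; the function name is illustrative.

```python
VARCHAR_MAX_BYTES = 65533  # SelectDB's maximum VARCHAR length in bytes

def selectdb_varchar_type(polardbx_char_len: int) -> str:
    """Map a PolarDB-X 2.0 VARCHAR(M) (characters) to a SelectDB type (bytes).

    Multiplies by 4 to cover 4-byte UTF-8 characters; lengths that would
    exceed the SelectDB VARCHAR limit fall back to STRING, mirroring the
    behavior described in the data type mapping table.
    """
    byte_len = polardbx_char_len * 4
    if byte_len > VARCHAR_MAX_BYTES:
        return "STRING"
    return f"VARCHAR({byte_len})"

print(selectdb_varchar_type(255))    # VARCHAR(1020)
print(selectdb_varchar_type(20000))  # STRING
```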
Operational considerations
Heartbeat writes: DTS periodically runs ``CREATE DATABASE IF NOT EXISTS `test` `` in the source database to write heartbeat data that advances the binlog position. If you set Whether to delete SQL operations on heartbeat tables of forward and reverse tasks to Yes (or the source account lacks the CREATE DATABASE permission), and no DML operations run on the source for an extended period, the displayed synchronization latency may be inaccurate. To refresh the latency display, run a DML operation on the source database.
`bucket_count` parameter: In the Selected Objects section, you can specify the `bucket_count` parameter. The value must be a positive integer. The default is `auto`.
Full data synchronization load: During full data synchronization, DTS uses read and write resources on both the source and destination databases, which increases server load. Run the task during off-peak hours, when CPU usage on both databases is below 30%.
Storage fragmentation: Concurrent INSERT operations during full initialization may leave fragments in destination tables. After full initialization, destination tables may occupy more storage space than their source counterparts.
External writes during synchronization: If sources other than DTS write to the destination database during synchronization, data inconsistency may occur and the task may fail.
Task failure recovery: If a DTS task fails, DTS support attempts to restore it within 8 hours. During restoration, the task may be restarted and task parameters may be modified. Database parameters are not modified. The parameters that may be modified include but are not limited to the parameters in the "Modify instance parameters" section of the Modify the parameters of a DTS instance topic.
Supported SQL operations for incremental synchronization
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN; DROP COLUMN; DROP TABLE; TRUNCATE TABLE; RENAME TABLE |
RENAME TABLE may cause data inconsistency if you selected the table (rather than its parent database) as the synchronization object. If a table is renamed and its new parent database is not included in the synchronized objects, data for that table stops synchronizing. To avoid this, select the database as the synchronization object instead of individual tables.
Permissions required for database accounts
| Database | Required permissions | References |
|---|---|---|
| Source PolarDB-X 2.0 | REPLICATION SLAVE, REPLICATION CLIENT, SELECT (on objects to be synchronized) | Manage database accounts and Account permission issues during data synchronization |
| Destination SelectDB | Usage_priv (cluster access), Select_priv, Load_priv, Alter_priv, Create_priv, Drop_priv | Cluster permission management and Basic permission management |
Create a data synchronization task
This procedure has seven steps:
Go to the Data Synchronization page.
Configure source and destination databases.
Select objects to synchronize.
Configure advanced settings.
(Optional) Configure database and table fields.
Run a precheck.
Purchase an instance.
Step 1: Go to the Data Synchronization page
Use either the DTS console or the DMS console.
DTS console:
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the data synchronization task resides.
DMS console:
The exact steps may vary based on your DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the data synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task.
Configure the parameters described in the following table.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A descriptive name for this DTS task. DTS generates a default name. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance to auto-fill the connection parameters, or configure the connection manually. |
| | Database Type | Select PolarDB-X 2.0. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region of the source PolarDB-X 2.0 instance. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No to use an instance in the current Alibaba Cloud account. |
| | Instance ID | Select the ID of the source PolarDB-X 2.0 instance. |
| | Database Account | Enter the account for the source database. For required permissions, see Permissions required for database accounts. |
| | Database Password | Enter the password for the source database account. |
| Destination Database | Select Existing Connection | Select a registered database instance to auto-fill the connection parameters, or configure the connection manually. |
| | Database Type | Select SelectDB. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region of the destination SelectDB instance. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No to use an instance in the current Alibaba Cloud account. |
| | Instance ID | Select the ID of the destination SelectDB instance. |
| | Database Account | Enter the account for the destination database. For required permissions, see Permissions required for database accounts. |
| | Database Password | Enter the password for the destination database account. |

Click Test Connectivity and Proceed.
DTS server CIDR blocks must be added to the security settings of both the source and destination databases. This may happen automatically, or you may need to add them manually. See Add the CIDR blocks of DTS servers.
Step 3: Select objects to synchronize
In the Configure Objects step, configure the following parameters:
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Incremental Data Synchronization is selected by default. Full data synchronization loads the existing data in the source into the destination before incremental synchronization begins. Important If you do not select Schema Synchronization, create the Unique or Duplicate model tables in the destination SelectDB instance manually before starting the task. See Data type mapping, Additional columns for Duplicate engine tables, and Data model. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: The precheck fails if the destination contains tables with the same names as source tables. This is the safe default. If identical table names exist and cannot be deleted or renamed in the destination, use the object name mapping feature to rename them. See Map object names. Ignore Errors and Proceed: Skips the identical table name check. Warning This may cause data inconsistency. During full synchronization, records with matching primary or unique key values are skipped; the existing destination record is kept. During incremental synchronization, matching records overwrite the destination. If schemas differ, initialization may fail or only some columns synchronize. |
| Capitalization of Object Names in Destination Instance | Controls the case of database, table, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select databases or tables from the Source Objects section and click the |
| Selected Objects | Right-click an object to rename it, filter rows with WHERE conditions, or select specific SQL operations to synchronize. If you selected Schema Synchronization, right-click a table, enable Parameter Settings, set bucket_count to a positive integer, and click OK. Note Renaming an object with the object name mapping feature may cause dependent objects to fail synchronization. See Specify filter conditions. |
Step 4: Configure advanced settings
Click Next: Advanced Settings and configure the following parameters:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase and select a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries failed connections before failing the task. Valid values: 10–1440 minutes. Default: 720. We recommend that you set this parameter to a value greater than 30. If multiple tasks share the same source or destination, the shortest retry time takes precedence. DTS charges for the instance during retries. |
| Retry Time for Other Issues | How long DTS retries failed DDL or DML operations before failing the task. Valid values: 1–1440 minutes. Default: 10. We recommend that you set this parameter to a value greater than 10. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limit the read/write load on source and destination during full data synchronization by setting Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limit the load during incremental synchronization by setting RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat operations to the source database. The displayed synchronization latency may be inaccurate if no DML activity occurs on the source for an extended period. No: DTS writes heartbeat operations, which may affect physical backup and cloning of the source database. |
| Environment Tag | An optional tag to identify the DTS instance. |
| Configure ETL | Yes: Configure extract, transform, and load (ETL) processing with data processing statements. See Configure ETL in a data migration or data synchronization task. No: Skip ETL configuration. |
| Monitoring and Alerting | Yes: Configure alert thresholds and notification recipients for task failures or latency spikes. See Configure monitoring and alerting when you create a DTS task. No: No alerts. |
Step 5: (Optional) Configure database and table fields
Click Next: Configure Database and Table Fields to specify how DTS maps source table structures to the destination.
This step is available only if Schema Synchronization is selected. Set Definition Status to All to view and edit all tables.
| Parameter | Description |
|---|---|
| Primary Key Column | Select one or more columns as the primary key in the destination. |
| Distribution Key | Select one or more columns from the primary key columns as the distribution key. |
| Engine | Select the destination table engine: Unique or Duplicate. If the source table has no primary key or UNIQUE constraint, select Duplicate. Using the wrong engine may cause the task to fail or result in data loss. |
Step 6: Run a precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this configuration, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
DTS runs a precheck before the task starts. If the precheck fails:
Click View Details next to a failed item, fix the reported issue, and click Precheck Again.
If an item generates an alert that can be ignored, click Confirm Alert Details, then Ignore, then OK, and then Precheck Again. Ignoring alerts may result in data inconsistency.
Step 7: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters:
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: Pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Release the instance when it is no longer needed to stop charges. |
| | Resource Group Settings | The resource group for this instance. Default: default resource group. See What is Resource Management? |
| | Instance Class | The synchronization speed varies by instance class. See Instance classes of data synchronization instances. |
| | Subscription Duration | Available only for the Subscription billing method. Valid values: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |

Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. Monitor its progress there.
Data type mapping
DTS automatically converts PolarDB-X 2.0 data types to compatible SelectDB types during schema synchronization. Review the mappings before synchronizing to identify any columns that require schema adjustments.
| Category | PolarDB-X 2.0 type | SelectDB type | Notes |
|---|---|---|---|
| Numeric | TINYINT | TINYINT | |
| | TINYINT UNSIGNED | SMALLINT | Unsigned range exceeds TINYINT; promotes to SMALLINT. |
| | SMALLINT | SMALLINT | |
| | SMALLINT UNSIGNED | INT | |
| | MEDIUMINT | INT | |
| | MEDIUMINT UNSIGNED | BIGINT | |
| | INT | INT | |
| | INT UNSIGNED | BIGINT | |
| | BIGINT | BIGINT | |
| | BIGINT UNSIGNED | LARGEINT | |
| | BIT(M) | INT | |
| Decimal | DECIMAL | DECIMAL | zerofill is not supported. |
| | NUMERIC | DECIMAL | |
| | FLOAT | FLOAT | |
| | DOUBLE | DOUBLE | |
| | BOOL / BOOLEAN | BOOLEAN | |
| Date and time | DATE | DATEV2 | |
| | DATETIME[(fsp)] | DATETIMEV2 | |
| | TIMESTAMP[(fsp)] | DATETIMEV2 | |
| | TIME[(fsp)] | VARCHAR | SelectDB has no TIME type; stored as a string. |
| | YEAR[(4)] | INT | |
| String | CHAR, VARCHAR | VARCHAR | Converted to VARCHAR(4 × n) to prevent data loss, because PolarDB-X 2.0 counts characters while SelectDB counts bytes. If no length is specified, defaults to VARCHAR(65533). If the converted length exceeds 65533, the type becomes STRING. |
| | BINARY, VARBINARY | STRING | |
| | TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | STRING | |
| | TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB | STRING | |
| | ENUM | STRING | |
| | SET | STRING | |
| | JSON | STRING | |
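If you create destination tables manually instead of using DTS schema synchronization, the integer promotions in the table above can be captured in a lookup. The sketch below covers only the integer rows; the function name and structure are illustrative, not a DTS API.

```python
# Unsigned integer promotions from the mapping table: the unsigned range of
# each PolarDB-X 2.0 type exceeds the signed SelectDB type of the same width,
# so the mapping promotes to the next wider type.
UNSIGNED_PROMOTIONS = {
    "TINYINT UNSIGNED": "SMALLINT",
    "SMALLINT UNSIGNED": "INT",
    "MEDIUMINT UNSIGNED": "BIGINT",
    "INT UNSIGNED": "BIGINT",
    "BIGINT UNSIGNED": "LARGEINT",
}

# Signed integer types map to themselves, except MEDIUMINT, which SelectDB
# lacks and therefore widens to INT.
SIGNED_MAP = {
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "MEDIUMINT": "INT",
    "INT": "INT",
    "BIGINT": "BIGINT",
}

def selectdb_int_type(polardbx_type: str) -> str:
    """Return the SelectDB type for a PolarDB-X 2.0 integer type."""
    t = polardbx_type.strip().upper()
    return UNSIGNED_PROMOTIONS.get(t) or SIGNED_MAP[t]

print(selectdb_int_type("int unsigned"))  # BIGINT
```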
Additional columns for Duplicate engine tables
DTS automatically adds the following columns to Duplicate engine tables in the destination. If you created the destination table manually (without using DTS schema synchronization), add these columns before starting the task.
| Column | Data type | Default | Description |
|---|---|---|---|
| `_is_deleted` | INT | 0 | Indicates whether the record was deleted. 0 for INSERT and UPDATE; 1 for DELETE. |
| `_version` | BIGINT | 0 | Version timestamp. 0 for records loaded during full synchronization. For incremental records, the timestamp (in seconds) from the source binlog. |
| `_record_id` | BIGINT | 0 | Unique record identifier. 0 for full synchronization records. For incremental records, the unique, incrementing record ID from the incremental log. |
Use these columns to identify and remove duplicate records when deduplication is required.
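The deduplication these columns enable can be sketched as: for each business key, keep the record with the highest (`_version`, `_record_id`) pair, and drop the key entirely if that latest record is a DELETE. The following is an illustrative client-side sketch under that assumption; in practice you would express the same logic in SQL against the Duplicate engine table.

```python
def deduplicate(rows, key_fields):
    """Collapse Duplicate engine rows to the latest state per business key.

    rows: dicts that include the DTS additional columns _is_deleted,
    _version, and _record_id. Full-load rows carry 0 in all three, so any
    incremental change outranks the full-load copy.
    """
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        rank = (row["_version"], row["_record_id"])
        if key not in latest or rank > (latest[key]["_version"], latest[key]["_record_id"]):
            latest[key] = row
    # Keys whose most recent operation was a DELETE are dropped.
    return [r for r in latest.values() if r["_is_deleted"] == 0]

# Hypothetical change stream for a table keyed on "id":
rows = [
    {"id": 1, "v": "a", "_is_deleted": 0, "_version": 0,  "_record_id": 0},  # full load
    {"id": 1, "v": "b", "_is_deleted": 0, "_version": 10, "_record_id": 5},  # UPDATE
    {"id": 2, "v": "c", "_is_deleted": 0, "_version": 11, "_record_id": 6},  # INSERT
    {"id": 2, "v": "c", "_is_deleted": 1, "_version": 12, "_record_id": 7},  # DELETE
]
print(deduplicate(rows, ["id"]))  # only id=1 survives, with its updated value "b"
```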