Use Data Transmission Service (DTS) to replicate data from a PolarDB-X 2.0 instance to an ApsaraDB for SelectDB instance. DTS handles schema synchronization, full data loading, and ongoing incremental replication via binlog.
This feature is in the grayscale testing phase and is available only to some users.
Prerequisites
Before you begin, make sure you have:
A source PolarDB-X 2.0 instance
A destination SelectDB instance
The required permissions on both instances (see Permissions required for database accounts)
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free |
| Incremental data synchronization | Charged. See Billing overview. |
Limitations
Source database requirements
Bandwidth: The server hosting the source database must have at least 100 Mbit/s outbound bandwidth. Lower bandwidth reduces synchronization speed.
Instance type: Enterprise Edition PolarDB-X 2.0 read-only instances are not supported.
Unsupported objects: INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, and FOREIGN KEY objects cannot be synchronized. TABLEGROUP objects and databases or schemas with a Locality attribute are also not supported.
Table names: Tables with uppercase letters in their names cannot be synchronized.
Primary key requirement: Tables to be synchronized must have a PRIMARY KEY or a UNIQUE constraint, and the constrained fields must be unique. Otherwise, the destination database may contain duplicate records.
Table limit per task: If you select individual tables as objects to synchronize and need to rename tables or columns in the destination, a single task supports up to 5,000 tables. For more than 5,000 tables, split into multiple tasks or synchronize at the database level.
Binary logging: Binary logging is enabled by default on PolarDB-X 2.0. Verify that the `binlog_row_image` parameter is set to `full`. If it is not, the precheck fails and the task cannot start. See Parameter settings.
DDL during synchronization: Do not run DDL statements that change database or table schemas during schema synchronization or full data synchronization. Doing so causes the task to fail.
Network changes: If the network configuration of the PolarDB-X 2.0 instance changes, the synchronization instance may experience latency for a period of time.
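The `binlog_row_image` requirement above can be checked with a script before you create the task. The following sketch is illustrative only: the connection parameters are placeholders, and the PyMySQL usage is shown commented out because it assumes a reachable source endpoint.

```python
def row_image_is_full(value: str) -> bool:
    """DTS requires binlog_row_image to be 'full' (case-insensitive)."""
    return value.strip().lower() == "full"

# Hypothetical usage against the source instance (requires the pymysql package;
# host, account, and password below are placeholders, not real values):
# import pymysql
# conn = pymysql.connect(host="<polardbx-endpoint>", user="<account>", password="<password>")
# with conn.cursor() as cur:
#     cur.execute("SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'")
#     name, value = cur.fetchone()
#     print(row_image_is_full(value))  # False means the DTS precheck will fail
```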
Destination database requirements
Supported table types: Data can only be synchronized to Unique or Duplicate engine tables in SelectDB.
Unique engine tables: All unique keys in the destination table must exist in the source table and be included in the objects to synchronize. Missing keys cause data inconsistency.
Duplicate engine tables: Duplicate records may appear when the synchronization instance is retried or restarted, or when the same record is modified by more than one DML operation after the task starts. DTS converts UPDATE and DELETE statements to INSERT statements for Duplicate engine tables. Use the additional columns (`_is_deleted`, `_version`, `_record_id`) to deduplicate. See Additional columns for Duplicate engine tables.
Do not create clusters during synchronization: Creating clusters in the destination SelectDB instance while the task is running causes the task to fail. If the task fails, restart the synchronization instance to resume.
Do not add backend nodes during synchronization: Adding backend nodes to the SelectDB database during synchronization causes the task to fail. Restart the synchronization instance to resume.
Naming requirements: SelectDB supports only database and table names that start with a letter. Use the object name mapping feature to rename any database or table whose name starts with a non-letter character.
Chinese character names: If a database, table, or column name contains Chinese characters, use object name mapping to rename it before synchronizing. Otherwise, the task may fail.
VARCHAR length difference: In PolarDB-X 2.0, `VARCHAR(M)` specifies a length in characters. In SelectDB, `VARCHAR(N)` specifies a length in bytes. If you do not use DTS schema synchronization, set the VARCHAR field length in SelectDB to four times the corresponding PolarDB-X 2.0 length to avoid data loss.
DDL restrictions: DDL statements that modify multiple columns at a time are not supported, nor are consecutive DDL operations on the same table.
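The character-to-byte conversion described above can be expressed as a small helper. This is a sketch of the stated rule (four times the character length, falling back to STRING past SelectDB's 65533-byte VARCHAR limit), not part of DTS itself; the function name is illustrative.

```python
VARCHAR_MAX_BYTES = 65533  # SelectDB's maximum VARCHAR length in bytes

def selectdb_varchar_type(polardbx_char_len: int) -> str:
    """Map a PolarDB-X 2.0 VARCHAR(M) (characters) to a SelectDB type (bytes).

    Multiplies by 4 to cover 4-byte UTF-8 characters; lengths that would
    exceed the SelectDB VARCHAR limit fall back to STRING, mirroring the
    behavior described in the data type mapping table.
    """
    byte_len = polardbx_char_len * 4
    if byte_len > VARCHAR_MAX_BYTES:
        return "STRING"
    return f"VARCHAR({byte_len})"

print(selectdb_varchar_type(255))    # VARCHAR(1020)
print(selectdb_varchar_type(20000))  # STRING
```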
Operational considerations
Heartbeat writes: DTS periodically runs ``CREATE DATABASE IF NOT EXISTS `test` `` in the source database to write heartbeat data that advances the binlog position. If you set Whether to delete SQL operations on heartbeat tables of forward and reverse tasks to Yes (or the source account lacks the CREATE DATABASE permission), and no DML operations run on the source for an extended period, the displayed synchronization latency may be inaccurate. To refresh the latency display, run a DML operation on the source database.
`bucket_count` parameter: In the Selected Objects section, you can specify the `bucket_count` parameter. The value must be a positive integer. The default is `auto`.
Full data synchronization load: During full data synchronization, DTS uses read and write resources on both the source and destination databases, which increases server load. Run the task during off-peak hours, when CPU usage on both databases is below 30%.
Storage fragmentation: Concurrent INSERT operations during full initialization may leave fragments in destination tables. After full initialization, destination tables may occupy more storage space than their source counterparts.
External writes during synchronization: If sources other than DTS write to the destination database during synchronization, data inconsistency may occur and the task may fail.
Task failure recovery: If a DTS task fails, DTS support attempts to restore it within 8 hours. During restoration, the task may be restarted and task parameters may be modified. Database parameters are not modified. The parameters that may be modified include but are not limited to the parameters in the "Modify instance parameters" section of the Modify the parameters of a DTS instance topic.
Supported SQL operations for incremental synchronization
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
| DDL | ADD COLUMN; DROP COLUMN; DROP TABLE; TRUNCATE TABLE; RENAME TABLE |
RENAME TABLE may cause data inconsistency if you selected the table (rather than its parent database) as the synchronization object. If a table is renamed and its new parent database is not included in the synchronized objects, data for that table stops synchronizing. To avoid this, select the database as the synchronization object instead of individual tables.
Permissions required for database accounts
| Database | Required permissions | References |
|---|---|---|
| Source PolarDB-X 2.0 | REPLICATION SLAVE, REPLICATION CLIENT, SELECT (on objects to be synchronized) | Manage database accounts and Account permission issues during data synchronization |
| Destination SelectDB | Usage_priv (cluster access), Select_priv, Load_priv, Alter_priv, Create_priv, Drop_priv | Cluster permission management and Basic permission management |
Create a data synchronization task
This procedure has seven steps:
Go to the Data Synchronization page.
Configure source and destination databases.
Select objects to synchronize.
Configure advanced settings.
(Optional) Configure database and table fields.
Run a precheck.
Purchase an instance.
Step 1: Go to the Data Synchronization page
Use either the DTS console or the DMS console.
DTS console:
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner, select the region where the data synchronization task resides.
DMS console:
The exact steps may vary based on your DMS console mode and layout. See Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data + AI and choose DTS (DTS) > Data Synchronization.
From the drop-down list to the right of Data Synchronization Tasks, select the region where the data synchronization instance resides.
Step 2: Configure source and destination databases
Click Create Task.
Configure the parameters described in the following table.
| Section | Parameter | Description |
|---|---|---|
| N/A | Task Name | A descriptive name for this DTS task. DTS generates a default name. The name does not need to be unique. |
| Source Database | Select Existing Connection | Select a registered database instance to auto-fill the connection parameters, or configure the connection manually. |
| | Database Type | Select PolarDB-X 2.0. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region of the source PolarDB-X 2.0 instance. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No to use an instance in the current Alibaba Cloud account. |
| | Instance ID | Select the ID of the source PolarDB-X 2.0 instance. |
| | Database Account | Enter the account for the source database. For required permissions, see Permissions required for database accounts. |
| | Database Password | Enter the password for the source database account. |
| Destination Database | Select Existing Connection | Select a registered database instance to auto-fill the connection parameters, or configure the connection manually. |
| | Database Type | Select SelectDB. |
| | Access Method | Select Alibaba Cloud Instance. |
| | Instance Region | Select the region of the destination SelectDB instance. |
| | Replicate Data Across Alibaba Cloud Accounts | Select No to use an instance in the current Alibaba Cloud account. |
| | Instance ID | Select the ID of the destination SelectDB instance. |
| | Database Account | Enter the account for the destination database. For required permissions, see Permissions required for database accounts. |
| | Database Password | Enter the password for the destination database account. |

Click Test Connectivity and Proceed.
DTS server CIDR blocks must be added to the security settings of both the source and destination databases. This may happen automatically, or you may need to add them manually. See Add the CIDR blocks of DTS servers.
Step 3: Select objects to synchronize
In the Configure Objects step, configure the following parameters:
| Parameter | Description |
|---|---|
| Synchronization Types | Select Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization. Incremental Data Synchronization is selected by default. Full data synchronization loads the existing data in the source into the destination before incremental synchronization begins. Important If you do not select Schema Synchronization, create the Unique or Duplicate model tables in the destination SelectDB instance manually before starting the task. See Data type mapping, Additional columns for Duplicate engine tables, and Data model. |
| Processing Mode of Conflicting Tables | Precheck and Report Errors: The precheck fails if the destination contains tables with the same names as source tables. This is the safe default. If identical table names exist and cannot be deleted or renamed in the destination, use the object name mapping feature to rename them. See Map object names. Ignore Errors and Proceed: Skips the identical table name check. Warning This may cause data inconsistency. During full synchronization, records with matching primary or unique key values are skipped; the existing destination record is kept. During incremental synchronization, matching records overwrite the destination. If schemas differ, initialization may fail or only some columns synchronize. |
| Capitalization of Object Names in Destination Instance | Controls the case of database, table, and column names in the destination. The default is DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select databases or tables from the Source Objects section and click the |
| Selected Objects | Right-click an object to rename it, filter rows with WHERE conditions, or select specific SQL operations to synchronize. If you selected Schema Synchronization, right-click a table, enable Parameter Settings, set bucket_count to a positive integer, and click OK. Note Renaming an object with the object name mapping feature may cause dependent objects to fail synchronization. See Specify filter conditions. |
Step 4: Configure advanced settings
Click Next: Advanced Settings and configure the following parameters:
| Parameter | Description |
|---|---|
| Dedicated Cluster for Task Scheduling | By default, DTS uses the shared cluster. For higher stability, purchase and select a dedicated cluster. See What is a DTS dedicated cluster. |
| Retry Time for Failed Connections | How long DTS retries failed connections before failing the task. Valid values: 10–1440 minutes. Default: 720. We recommend that you set this parameter to a value greater than 30. If multiple tasks share the same source or destination, the shortest retry time takes precedence. DTS charges for the instance during retries. |
| Retry Time for Other Issues | How long DTS retries failed DDL or DML operations before failing the task. Valid values: 1–1440 minutes. Default: 10. We recommend that you set this parameter to a value greater than 10. Must be less than Retry Time for Failed Connections. |
| Enable Throttling for Full Data Synchronization | Limit the read/write load on source and destination during full data synchronization by setting Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s). Available only when Full Data Synchronization is selected. |
| Enable Throttling for Incremental Data Synchronization | Limit the load during incremental synchronization by setting RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s). |
| Whether to delete SQL operations on heartbeat tables of forward and reverse tasks | Yes: DTS does not write heartbeat operations to the source database. The displayed synchronization latency may be inaccurate if no DML activity occurs on the source for an extended period. No: DTS writes heartbeat operations, which may affect physical backup and cloning of the source database. |
| Environment Tag | An optional tag to identify the DTS instance. |
| Configure ETL | Yes: Configure extract, transform, and load (ETL) processing with data processing statements. See Configure ETL in a data migration or data synchronization task. No: Skip ETL configuration. |
| Monitoring and Alerting | Yes: Configure alert thresholds and notification recipients for task failures or latency spikes. See Configure monitoring and alerting when you create a DTS task. No: No alerts. |
Step 5: (Optional) Configure database and table fields
Click Next: Configure Database and Table Fields to specify how DTS maps source table structures to the destination.
This step is available only if Schema Synchronization is selected. Set Definition Status to All to view and edit all tables.
| Parameter | Description |
|---|---|
| Primary Key Column | Select one or more columns as the primary key in the destination. |
| Distribution Key | Select one or more columns from the primary key columns as the distribution key. |
| Engine | Select the destination table engine: Unique or Duplicate. If the source table has no primary key or UNIQUE constraint, select Duplicate. Using the wrong engine may cause the task to fail or result in data loss. |
Step 6: Run a precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this configuration, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters before proceeding.
DTS runs a precheck before the task starts. If the precheck fails:
Click View Details next to a failed item, fix the reported issue, and click Precheck Again.
If an item generates an alert that can be ignored, click Confirm Alert Details, then Ignore, then OK, and then Precheck Again. Ignoring alerts may result in data inconsistency.
Step 7: Purchase an instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the buy page, configure the following parameters:
| Section | Parameter | Description |
|---|---|---|
| New Instance Class | Billing Method | Subscription: Pay upfront for a fixed term. More cost-effective for long-term use. Pay-as-you-go: Billed hourly. Release the instance when it is no longer needed to stop charges. |
| | Resource Group Settings | The resource group for this instance. Default: default resource group. See What is Resource Management? |
| | Instance Class | The synchronization speed varies by instance class. See Instance classes of data synchronization instances. |
| | Subscription Duration | Available only for the Subscription billing method. Valid values: 1–9 months, 1 year, 2 years, 3 years, or 5 years. |

Read and accept the Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, then click OK in the confirmation dialog.
The task appears in the task list. Monitor its progress there.
Data type mapping
DTS automatically converts PolarDB-X 2.0 data types to compatible SelectDB types during schema synchronization. Review the mappings before synchronizing to identify any columns that require schema adjustments.
| Category | PolarDB-X 2.0 type | SelectDB type | Notes |
|---|---|---|---|
| Numeric | TINYINT | TINYINT | |
| | TINYINT UNSIGNED | SMALLINT | Unsigned range exceeds TINYINT; promotes to SMALLINT. |
| | SMALLINT | SMALLINT | |
| | SMALLINT UNSIGNED | INT | |
| | MEDIUMINT | INT | |
| | MEDIUMINT UNSIGNED | BIGINT | |
| | INT | INT | |
| | INT UNSIGNED | BIGINT | |
| | BIGINT | BIGINT | |
| | BIGINT UNSIGNED | LARGEINT | |
| | BIT(M) | INT | |
| Decimal | DECIMAL | DECIMAL | zerofill is not supported. |
| | NUMERIC | DECIMAL | |
| | FLOAT | FLOAT | |
| | DOUBLE | DOUBLE | |
| | BOOL / BOOLEAN | BOOLEAN | |
| Date and time | DATE | DATEV2 | |
| | DATETIME[(fsp)] | DATETIMEV2 | |
| | TIMESTAMP[(fsp)] | DATETIMEV2 | |
| | TIME[(fsp)] | VARCHAR | SelectDB has no TIME type; stored as a string. |
| | YEAR[(4)] | INT | |
| String | CHAR, VARCHAR | VARCHAR | Converted to VARCHAR(4 × n) to prevent data loss, because PolarDB-X 2.0 counts characters while SelectDB counts bytes. If no length is specified, defaults to VARCHAR(65533). If the converted length exceeds 65533, the type becomes STRING. |
| | BINARY, VARBINARY | STRING | |
| | TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | STRING | |
| | TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB | STRING | |
| | ENUM | STRING | |
| | SET | STRING | |
| | JSON | STRING | |
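If you create destination tables manually instead of using DTS schema synchronization, the integer promotions in the table above can be captured in a lookup. The sketch below covers only the integer rows; the function name and structure are illustrative, not a DTS API.

```python
# Unsigned integer promotions from the mapping table: the unsigned range of
# each PolarDB-X 2.0 type exceeds the signed SelectDB type of the same width,
# so the mapping promotes to the next wider type.
UNSIGNED_PROMOTIONS = {
    "TINYINT UNSIGNED": "SMALLINT",
    "SMALLINT UNSIGNED": "INT",
    "MEDIUMINT UNSIGNED": "BIGINT",
    "INT UNSIGNED": "BIGINT",
    "BIGINT UNSIGNED": "LARGEINT",
}

# Signed integer types map to themselves, except MEDIUMINT, which SelectDB
# lacks and therefore widens to INT.
SIGNED_MAP = {
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "MEDIUMINT": "INT",
    "INT": "INT",
    "BIGINT": "BIGINT",
}

def selectdb_int_type(polardbx_type: str) -> str:
    """Return the SelectDB type for a PolarDB-X 2.0 integer type."""
    t = polardbx_type.strip().upper()
    return UNSIGNED_PROMOTIONS.get(t) or SIGNED_MAP[t]

print(selectdb_int_type("int unsigned"))  # BIGINT
```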
Additional columns for Duplicate engine tables
DTS automatically adds the following columns to Duplicate engine tables in the destination. If you created the destination table manually (without using DTS schema synchronization), add these columns before starting the task.
| Column | Data type | Default | Description |
|---|---|---|---|
| `_is_deleted` | INT | 0 | Indicates whether the record was deleted. 0 for INSERT and UPDATE; 1 for DELETE. |
| `_version` | BIGINT | 0 | Version timestamp. 0 for records loaded during full synchronization. For incremental records, the timestamp (in seconds) from the source binlog. |
| `_record_id` | BIGINT | 0 | Unique record identifier. 0 for full synchronization records. For incremental records, the unique, incrementing record ID from the incremental log. |
Use these columns to identify and remove duplicate records when deduplication is required.
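The deduplication these columns enable can be sketched as: for each business key, keep the record with the highest (`_version`, `_record_id`) pair, and drop the key entirely if that latest record is a DELETE. The following is an illustrative client-side sketch under that assumption; in practice you would express the same logic in SQL against the Duplicate engine table.

```python
def deduplicate(rows, key_fields):
    """Collapse Duplicate engine rows to the latest state per business key.

    rows: dicts that include the DTS additional columns _is_deleted,
    _version, and _record_id. Full-load rows carry 0 in all three, so any
    incremental change outranks the full-load copy.
    """
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        rank = (row["_version"], row["_record_id"])
        if key not in latest or rank > (latest[key]["_version"], latest[key]["_record_id"]):
            latest[key] = row
    # Keys whose most recent operation was a DELETE are dropped.
    return [r for r in latest.values() if r["_is_deleted"] == 0]

# Hypothetical change stream for a table keyed on "id":
rows = [
    {"id": 1, "v": "a", "_is_deleted": 0, "_version": 0,  "_record_id": 0},  # full load
    {"id": 1, "v": "b", "_is_deleted": 0, "_version": 10, "_record_id": 5},  # UPDATE
    {"id": 2, "v": "c", "_is_deleted": 0, "_version": 11, "_record_id": 6},  # INSERT
    {"id": 2, "v": "c", "_is_deleted": 1, "_version": 12, "_record_id": 7},  # DELETE
]
print(deduplicate(rows, ["id"]))  # only id=1 survives, with its updated value "b"
```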