How to synchronize data from PolarDB for MySQL to AnalyticDB for MySQL - Data Transmission Service

Data Transmission Service (DTS) synchronizes data from PolarDB for MySQL to AnalyticDB for MySQL for real-time analytics. After synchronization, you can use AnalyticDB for MySQL to build BI systems, interactive query systems, or real-time reporting systems.

Prerequisites

Before you begin, make sure that you have:

An AnalyticDB for MySQL cluster with enough storage for the data to synchronize. See Create an AnalyticDB for MySQL cluster.
Binary logging enabled for the PolarDB for MySQL cluster. See Enable binary logging.
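
One rough way to judge whether the destination cluster has enough storage is to compare the projected disk usage with the 80% level at which an AnalyticDB for MySQL cluster is locked (see Limitations). The following sketch is plain arithmetic. The function names and the sample sizes are hypothetical, and the estimate ignores differences in storage format between the source and the destination, such as compression and per-node distribution.

```python
# Hypothetical helpers for a back-of-the-envelope storage check.
# They are not part of DTS or AnalyticDB for MySQL.
LOCK_THRESHOLD = 0.80  # the cluster is locked when disk usage exceeds 80%


def projected_usage(used_gb, incoming_gb, capacity_gb):
    """Return the projected disk usage ratio after loading incoming_gb."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return (used_gb + incoming_gb) / capacity_gb


def has_headroom(used_gb, incoming_gb, capacity_gb):
    """Return True if the projected usage stays at or below the threshold."""
    return projected_usage(used_gb, incoming_gb, capacity_gb) <= LOCK_THRESHOLD


if __name__ == "__main__":
    # Sample numbers only: 100 GB used, 200 GB to synchronize, 500 GB capacity.
    print(has_headroom(100, 200, 500))
```

Because the lock applies to the disk usage of individual nodes, treat a result close to the threshold as insufficient.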

Billing

Synchronization type	Fee
Schema synchronization and full data synchronization	Free of charge
Incremental data synchronization	Charged. See Billing overview.

Supported SQL operations

DDL operations: CREATE TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE, ADD COLUMN, DROP COLUMN, and MODIFY COLUMN
DML operations: INSERT, UPDATE, and DELETE

Note

If the data type of a field in the source table is changed during synchronization, an error is reported and the task is interrupted. See Troubleshoot the synchronization failure that occurs due to field type changes.
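
To review a change script before it runs on the source cluster, you can sort its statements by the operation types listed above. The following sketch is a rough keyword heuristic, not a SQL parser. The function name `classify` is hypothetical, and the patterns miss valid variants such as ALTER TABLE ... ADD without the COLUMN keyword.

```python
import re

# Patterns for the operations listed in "Supported SQL operations".
# A rough heuristic for reviewing scripts, not a SQL parser.
SUPPORTED = [
    ("CREATE TABLE", r"^CREATE\s+TABLE\b"),
    ("DROP TABLE", r"^DROP\s+TABLE\b"),
    ("RENAME TABLE", r"^RENAME\s+TABLE\b"),
    ("TRUNCATE TABLE", r"^TRUNCATE\s+TABLE\b"),
    ("ADD COLUMN", r"^ALTER\s+TABLE\b.*\bADD\s+COLUMN\b"),
    ("DROP COLUMN", r"^ALTER\s+TABLE\b.*\bDROP\s+COLUMN\b"),
    ("MODIFY COLUMN", r"^ALTER\s+TABLE\b.*\bMODIFY\s+COLUMN\b"),
    ("INSERT", r"^INSERT\b"),
    ("UPDATE", r"^UPDATE\b"),
    ("DELETE", r"^DELETE\b"),
]


def classify(statement):
    """Return the matching operation name, or None if no pattern matches."""
    text = " ".join(statement.split())  # collapse newlines and repeated spaces
    for name, pattern in SUPPORTED:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return name
    return None


if __name__ == "__main__":
    for sql in ("ALTER TABLE t ADD COLUMN c INT", "CREATE INDEX i ON t (c)"):
        print(sql, "->", classify(sql))
```

A statement classified as MODIFY COLUMN still needs a manual check, because a change to the data type of a field interrupts the task.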

Required database account permissions

Database	Required permission
PolarDB for MySQL cluster	Read permissions on the objects to be synchronized
AnalyticDB for MySQL cluster	Read and write permissions on the required objects

To create and authorize a database account, see Create and manage a database account for PolarDB for MySQL and Create a database account for AnalyticDB for MySQL.
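
If you grant the source-side read permissions with SQL rather than in the console, the statements can be generated per table. The following sketch builds standard MySQL GRANT SELECT statements. The account, database, and table names are hypothetical, and it assumes that you run the statements with a privileged account that is allowed to grant permissions on the PolarDB for MySQL cluster.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_str(value):
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def grant_read(user, database, tables, host="%"):
    """Build one GRANT SELECT statement per table for the given account."""
    account = f"{quote_str(user)}@{quote_str(host)}"
    return [
        f"GRANT SELECT ON {quote_ident(database)}.{quote_ident(table)} TO {account};"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical account and objects, for illustration only.
    for statement in grant_read("dts_sync", "shop", ["orders", "items"]):
        print(statement)
```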

Data type mappings

See Data type mappings for schema synchronization.

Limitations

DTS uses read and write resources of the source and destination instances during initial full data synchronization. This may increase the database load. If instance performance is degraded, specifications are low, or data volume is large, database services may become unavailable. For example, DTS occupies a large amount of read and write resources when a large number of slow SQL queries run on the source instance, tables have no primary keys, or a deadlock occurs in the destination instance. Evaluate the performance impact before synchronization. Synchronize data during off-peak hours when CPU utilization of the source and destination instances is less than 30%.
Do not use gh-ost or pt-online-schema-change to perform DDL operations on synchronized objects during data synchronization. Otherwise, data may fail to be synchronized.
If the disk space usage of the nodes in an AnalyticDB for MySQL cluster exceeds 80%, the cluster is locked. Estimate the required disk space based on the objects to synchronize. Make sure that the destination cluster has enough storage.
Prefix indexes cannot be synchronized. If the source database contains prefix indexes, data may fail to be synchronized.
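
Two of the risks above, prefix indexes and tables without primary keys, can be checked in the source database before you configure the task. The following sketch uses standard MySQL information_schema views and assumes that PolarDB for MySQL exposes them in the same way. The `%s` placeholders assume a driver with the PyMySQL-style parameter format, and the helper `prefix_indexes` is hypothetical. It only filters rows that you have already fetched.

```python
# Columns indexed by prefix have a non-NULL SUB_PART in STATISTICS.
PREFIX_INDEX_SQL = """
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, COLUMN_NAME, SUB_PART
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = %s AND SUB_PART IS NOT NULL
"""

# Base tables that have no PRIMARY KEY constraint.
NO_PRIMARY_KEY_SQL = """
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME = t.TABLE_NAME
 AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_SCHEMA = %s
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND c.CONSTRAINT_NAME IS NULL
"""


def prefix_indexes(statistics_rows):
    """Return (table, index, column) for rows that describe a prefix index.

    statistics_rows is a list of dicts shaped like information_schema.STATISTICS.
    """
    return [
        (row["TABLE_NAME"], row["INDEX_NAME"], row["COLUMN_NAME"])
        for row in statistics_rows
        if row.get("SUB_PART") is not None
    ]


if __name__ == "__main__":
    sample = [
        {"TABLE_NAME": "orders", "INDEX_NAME": "idx_note", "COLUMN_NAME": "note", "SUB_PART": 20},
        {"TABLE_NAME": "orders", "INDEX_NAME": "PRIMARY", "COLUMN_NAME": "id", "SUB_PART": None},
    ]
    print(prefix_indexes(sample))
```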

Procedure

Step 1: Purchase a DTS instance

Purchase a data synchronization instance. See Purchase a data synchronization instance.

On the buy page, set Source Instance to PolarDB, set Target Instance to AnalyticDB for MySQL, and set Synchronization Topology to One-way Synchronization.

Step 2: Configure source and destination databases

Log on to the DTS console.
If you are redirected to the Data Management (DMS) console, click the icon that returns you to the previous version of the DTS console.
In the left-side navigation pane, click Data Synchronization.
At the top of the Data Synchronization Tasks page, select the region where the destination instance resides.
Find the data synchronization instance and click Configure Task in the Actions column.

Configure the source and destination databases.

Section	Parameter	Description
N/A	Task Name	DTS automatically generates a task name. Specify a descriptive name that makes the task easy to identify. The name does not need to be unique.
Source Database	Database Type	Set to PolarDB for MySQL. This value cannot be changed.
	Instance Region	The source region you selected on the buy page. This value cannot be changed.
	PolarDB Cluster ID	The ID of the source PolarDB for MySQL cluster.
	Database Account	The database account of the source cluster. See Required database account permissions.
	Database Password	The password of the database account.
Destination Database	Database Type	Set to AnalyticDB for MySQL 3.0 under the Data Warehouse tab. This value cannot be changed.
	Instance Region	The destination region you selected on the buy page. This value cannot be changed.
	Instance ID	The ID of the destination AnalyticDB for MySQL cluster.
	Database Account	The account of the AnalyticDB for MySQL database. See Required database account permissions.
	Database Password	The password of the database account.

In the lower-right corner of the page, click Test Connectivity and Proceed. DTS automatically adds the CIDR blocks of DTS servers to the whitelist of Alibaba Cloud database instances (such as ApsaraDB RDS for MySQL or ApsaraDB for MongoDB). For self-managed databases hosted on Elastic Compute Service (ECS) instances, DTS automatically adds the CIDR blocks to the ECS security group rules. Make sure that the ECS instance can access the database. If the self-managed database is hosted on multiple ECS instances, manually add the CIDR blocks of DTS servers to each ECS security group. For databases in data centers or on third-party cloud platforms, manually add the CIDR blocks to the database whitelist. See Add the CIDR blocks of DTS servers.
Warning
Adding the CIDR blocks of DTS servers to the whitelist or ECS security group rules may pose security risks. Before using DTS, understand the risks and take preventive measures. These measures include strengthening username and password security, limiting exposed ports, authenticating API calls, regularly reviewing the whitelist or ECS security group rules, removing unauthorized CIDR blocks, and connecting to DTS through Express Connect, VPN Gateway, or Smart Access Gateway.
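
The regular whitelist review mentioned in the warning can be partly automated. The following sketch flags whitelist entries that do not fall inside a list of CIDR blocks that you consider authorized. It uses the Python standard library `ipaddress` module. The function name is hypothetical, and the addresses are reserved documentation ranges, not real DTS server CIDR blocks.

```python
import ipaddress


def unauthorized_entries(whitelist, allowed_blocks):
    """Return whitelist entries that are not contained in any allowed block."""
    allowed = [ipaddress.ip_network(block) for block in allowed_blocks]
    flagged = []
    for entry in whitelist:
        # strict=False accepts host addresses written with a prefix length.
        network = ipaddress.ip_network(entry, strict=False)
        covered = any(
            network.version == block.version and network.subnet_of(block)
            for block in allowed
        )
        if not covered:
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    # Documentation ranges used as placeholders for the real CIDR blocks.
    allowed = ["192.0.2.0/24", "198.51.100.0/24"]
    current = ["192.0.2.0/25", "198.51.100.10", "203.0.113.0/24"]
    print(unauthorized_entries(current, allowed))
```

Take the authorized list from the topic Add the CIDR blocks of DTS servers for your region, and review flagged entries manually before you remove them.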

Step 3: Select synchronization policy and objects

Configure the synchronization policy and select the objects to synchronize.

Parameter	Description
Select the initial synchronization types	Select both Initial Schema Synchronization and Initial Full Data Synchronization. After the precheck completes, DTS synchronizes the schema and data of the selected objects from the source instance to the destination cluster. These form the basis for subsequent incremental synchronization.
Processing Mode In Existed Target Table	Precheck and Report Errors: Checks whether the destination database contains tables that have the same names as tables in the source database. If no such tables exist, the precheck passes. Otherwise, an error is returned and the task cannot start. To avoid a name conflict, use the object name mapping feature to rename the tables that are synchronized to the destination database. See Rename an object to be synchronized. Ignore Errors and Proceed: Skips the precheck for identical table names. Warning: Selecting Ignore Errors and Proceed may cause data inconsistency. If the source and destination databases have the same schema, DTS does not synchronize data records that have the same primary keys as records in the destination database. If the schemas differ, initial data synchronization may fail: only some columns may be synchronized, or the task may fail.
Merge Multi Tables	Yes: All selected source tables in the task are merged into one destination table. DTS adds the `__dts_data_source` column to each table to record the data source, and DDL operations cannot be synchronized. To merge only some of the source tables, create two data synchronization tasks. No (default): Tables are not merged, and DDL operations can be synchronized.
Select the operation types to be synchronized	Select the operation types based on your business requirements. All operation types are selected by default. See Supported SQL operations.
Select the objects to be synchronized	Select one or more objects from the Available section and click the icon to add them to the Selected section. You can select tables or databases. If a database is selected, all schema changes in the database are synchronized. If a table is selected, only ADD COLUMN operations on the table are synchronized. By default, object names remain unchanged after synchronization. Use the object name mapping feature to rename objects synchronized to the destination cluster. See Rename an object to be synchronized.
Rename Databases and Tables	Use the object name mapping feature to rename objects synchronized to the destination instance. See Object name mapping.
Replicate Temporary Tables When DMS Performs DDL Operations	If DMS is used to perform online DDL operations on the source database, specify whether to synchronize temporary tables generated by online DDL operations. Yes: DTS synchronizes the data of temporary tables. If online DDL operations generate a large amount of data, the data synchronization task may be delayed. No: DTS does not synchronize the data of temporary tables. Only the original DDL data of the source database is synchronized. If No is selected, tables in the destination database may be locked.
Retry Time for Failed Connections	By default, if DTS fails to connect to the source or destination database, DTS retries within 720 minutes (12 hours). Specify the retry time based on your business needs. If DTS reconnects within the specified time, the data synchronization task resumes. Otherwise, the task fails. DTS charges for the instance during retries, so release the DTS instance as soon as possible after the source and destination instances are released.
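
The behavior described for Precheck and Report Errors can be pictured as a name comparison between the two databases. The following sketch only illustrates that description and is not how DTS implements the precheck. The function name and the sample table names are hypothetical, and the comparison is case-sensitive.

```python
def conflicting_tables(source_tables, destination_tables, name_mapping=None):
    """Return (source_name, destination_name) pairs that already exist.

    name_mapping optionally renames source tables, as object name mapping does.
    """
    mapping = name_mapping or {}
    existing = set(destination_tables)
    conflicts = []
    for name in source_tables:
        target = mapping.get(name, name)  # unmapped tables keep their names
        if target in existing:
            conflicts.append((name, target))
    return conflicts


if __name__ == "__main__":
    source = ["orders", "items"]
    destination = ["orders", "customers"]
    print(conflicting_tables(source, destination))
    print(conflicting_tables(source, destination, {"orders": "orders_polardb"}))
```

In this sketch, the conflict on the sample table disappears once it is mapped to a name that does not exist in the destination, which is the effect that the object name mapping feature has on the precheck.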

In the lower-right corner of the page, click Next.

Step 4: Configure table types

Specify a type for the tables to synchronize to the destination database.
If you selected Initial Schema Synchronization in the previous step, specify the type, primary key column, and partition key column for each table to synchronize to the AnalyticDB for MySQL cluster. See CREATE TABLE.
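
The settings on this page correspond to clauses of a CREATE TABLE statement in the destination cluster. The following sketch builds such a statement as a string to show how the table type and the primary key fit together. It assumes the AnalyticDB for MySQL 3.0 clauses DISTRIBUTED BY HASH and DISTRIBUTED BY BROADCAST, and it assumes that the distribution column must be part of the primary key. Verify both assumptions against the CREATE TABLE reference. The function, table, and column names are hypothetical, and the partition key clause is omitted.

```python
def quote_ident(name):
    """Quote an identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_create_table(table, columns, primary_key, distribution_key=None):
    """Build a CREATE TABLE statement for a hash-distributed or broadcast table.

    columns is a list of (name, type) pairs. If distribution_key is None,
    the table is created with DISTRIBUTED BY BROADCAST.
    """
    if distribution_key is not None and distribution_key not in primary_key:
        # Assumption: the distribution column must belong to the primary key.
        raise ValueError("distribution_key must be part of primary_key")
    lines = [f"  {quote_ident(name)} {col_type}" for name, col_type in columns]
    keys = ", ".join(quote_ident(name) for name in primary_key)
    lines.append(f"  PRIMARY KEY ({keys})")
    if distribution_key is None:
        distribution = "DISTRIBUTED BY BROADCAST"
    else:
        distribution = f"DISTRIBUTED BY HASH({quote_ident(distribution_key)})"
    body = ",\n".join(lines)
    return f"CREATE TABLE {quote_ident(table)} (\n{body}\n) {distribution};"


if __name__ == "__main__":
    print(build_create_table(
        "orders",
        [("order_id", "BIGINT"), ("amount", "DECIMAL(10,2)")],
        primary_key=["order_id"],
        distribution_key="order_id",
    ))
```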

Step 5: Run the precheck and start the task

In the lower-right corner of the page, click Precheck.
DTS performs a precheck before the data synchronization task starts. The task starts only after the precheck passes. If the precheck fails, click the icon next to each failed item to view details. After you troubleshoot the issues, run the precheck again. If a failed item does not need to be fixed, ignore it and run the precheck again.
Close the Precheck dialog box after the message Precheck Passed appears. The data synchronization task starts automatically.
Wait until initial synchronization completes and the task enters the Synchronizing state. View the task status on the Data Synchronization Tasks page.
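
If you script the wait instead of watching the console, the logic is a polling loop with a timeout. The following sketch takes a caller-supplied `get_status` function, which is hypothetical and stands in for whatever you use to read the task status. It does not call any DTS API. Only the Synchronizing state comes from this topic. The other status strings in the example are placeholders.

```python
import time


def wait_until_synchronizing(get_status, interval_seconds=30.0,
                             timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns "Synchronizing" or the timeout expires.

    Returns the number of polls that were needed.
    Raises TimeoutError if the state is not reached in time.
    """
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while True:
        polls += 1
        if get_status() == "Synchronizing":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach the Synchronizing state")
        sleep(interval_seconds)


if __name__ == "__main__":
    # Placeholder statuses that simulate a task finishing initial synchronization.
    statuses = iter(["placeholder-1", "placeholder-2", "Synchronizing"])
    print(wait_until_synchronizing(lambda: next(statuses), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.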
