Synchronize data from a self-managed MySQL to AnalyticDB for PostgreSQL - Data Transmission Service

Data Transmission Service (DTS) supports synchronizing data from a self-managed MySQL database connected over Express Connect, VPN Gateway, or Smart Access Gateway to an AnalyticDB for PostgreSQL instance. This enables you to easily transfer data and perform centralized analytics on your enterprise data.

Prerequisites

The self-managed MySQL database is version 5.1, 5.5, 5.6, 5.7, or 8.0.
The source tables to be synchronized must have a primary key.
Binary logging (binlog) must be enabled on the source database. We recommend that you create a dedicated account for data synchronization. For more information, see Create an account for a self-managed MySQL database and enable binlog.

Note
The account must have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions on all objects to be synchronized.
The on-premises network that hosts the self-managed database must be connected to a VPC over Express Connect, VPN Gateway, or Smart Access Gateway. You have already established connectivity between your on-premises data center and the DTS service through a VPN Gateway.

Note
For more information about connection solutions, see Overview of solutions for connecting an on-premises data center to Alibaba Cloud. This topic does not provide detailed instructions.
You must have a destination AnalyticDB for PostgreSQL instance. To create one, see Create an AnalyticDB for PostgreSQL instance.

Precautions

During initial full data synchronization, DTS consumes read and write resources from both the source and destination databases, increasing the database load. If your databases have low performance, have low specifications, or are under high transaction loads (for example, the source database has a large number of slow SQL queries or tables without primary keys, or the destination database has deadlocks), the additional pressure may degrade performance or even cause service outages. Before you start, evaluate the performance of your source and destination databases. We recommend that you run the data synchronization task during off-peak hours, for example, when the CPU utilization of both databases is below 30%.

Billing

Synchronization type	Pricing
Schema synchronization and full data synchronization	Free of charge.
Incremental data synchronization	Charged. For more information, see Billing overview.

Limitations

You can select only tables as synchronization objects.
DTS does not support the synchronization of the following data types: BIT, VARBIT, GEOMETRY, ARRAY, UUID, TSQUERY, TSVECTOR, TXID_SNAPSHOT, and POINT.
Tables with prefix indexes cannot be synchronized and may cause task failure.
During data synchronization, do not use online DDL tools such as gh-ost or pt-online-schema-change to perform schema changes on the synchronization objects. Otherwise, the synchronization task will fail.

Supported SQL operations

DML operations: INSERT, UPDATE, and DELETE.
DDL operation: ADD COLUMN.

Note
The CREATE TABLE operation is not supported. To synchronize a new table, you must add it as a synchronization object.

Supported synchronization topologies

One-to-one, one-way synchronization.
One-to-many, one-way synchronization.
Many-to-one, one-way synchronization.

Term mappings

MySQL	AnalyticDB for PostgreSQL
Database	Schema
Table	Table

Procedure

Purchase a data synchronization task. For more information, see Purchase process.

Note
On the purchase page, set the source instance to MySQL, the destination instance to AnalyticDB for PostgreSQL, and the synchronization topology to One-way Synchronization.
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
At the top of the Synchronization Tasks page, select the region where the destination instance is located.
Find your data synchronization task and click Configure Synchronization Channel.

Configure the source and destination instances.

Category	Parameter	Description
N/A	Synchronization task name	DTS automatically generates a task name. We recommend that you specify an informative name for easier identification. The name does not need to be unique.
Source instance details	Instance type	Select User-Created Database Connected over Express Connect, VPN Gateway, or Smart Access Gateway.
	Instance region	The region of the source instance that you selected when you purchased the data synchronization task. This value cannot be changed.
	Peer VPC	Select the ID of the VPC to which the self-managed database is connected.
	Database type	The value is fixed as MySQL and cannot be changed.
	IP address	Enter the server IP address of the self-managed MySQL database.
	Port	Enter the service port of the self-managed database. The default value is 3306.
	Database account	Enter the database account of the self-managed MySQL database. Note The account must have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions on all objects to be synchronized.
	Database password	Enter the password for the database account.
Destination instance details	Instance type	The value is fixed as AnalyticDB for PostgreSQL. No configuration is required.
	Instance region	The region of the destination instance that you selected when you purchased the data synchronization task. You cannot change this value.
	Instance ID	Select the ID of the destination AnalyticDB for PostgreSQL instance.
	Database name	Enter the name of the database that contains the destination tables.
	Database account	Enter the Initial Account of the AnalyticDB for PostgreSQL instance. For more information, see Create and manage database accounts. Note You can also enter an account that has the RDS_SUPERUSER permission. For information about how to create such an account, see Create and manage database accounts.
	Database password	Enter the password for the database account.

In the lower-right corner of the page, click Set Whitelist and Next.

If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, DTS automatically adds the CIDR blocks of DTS servers to the IP address whitelist of the instance. If the source or destination database is a self-managed database hosted on an Elastic Compute Service (ECS) instance, DTS automatically adds the CIDR blocks of DTS servers to the security group rules of the ECS instance, and you must make sure that the ECS instance can access the database. If the self-managed database is hosted on multiple ECS instances, you must manually add the CIDR blocks of DTS servers to the security group rules of each ECS instance. If the source or destination database is a self-managed database that is deployed in a data center or provided by a third-party cloud service provider, you must manually add the CIDR blocks of DTS servers to the IP address whitelist of the database to allow DTS to access the database. For more information, see Whitelist DTS server IP addresses.

Warning
Adding the public IP address blocks of the DTS service, either automatically or manually, may pose security risks. Using this product, you acknowledge that you understand and accept the potential security risks and that you must implement basic security measures. These measures include, but are not limited to, strengthening password security, limiting the ports open to each CIDR block, using authentication for internal API calls, and regularly checking and restricting unnecessary CIDR blocks. Alternatively, you can connect through a private network using a leased line, VPN Gateway, or Smart Access Gateway.

Configure the synchronization policy and objects.

Category	Parameter	Description
Synchronization policy	Initial synchronization	By default, you must select both Initial Schema Synchronization and Initial Full Data Synchronization. After the precheck is complete, DTS initializes the schema and data of the selected objects in the destination instance. This establishes the baseline for subsequent incremental synchronization.
	Processing mode for existing target tables	Clear Target Table Skips the Schema Name Conflict check item during the precheck phase. Clears the data in the destination table before full initialization. This is suitable for production synchronization after you complete testing the synchronization task. Ignore Errors and Proceed Skips the Schema Name Conflict check item during the precheck phase. During full data initialization, data is appended directly. This applies to data aggregation scenarios where data from multiple tables is synchronized into a single table.
	Synchronization type	Select the operation types to synchronize based on your business requirements: Insert Update Delete AlterTable
Select synchronization objects	N/A	In the Source Objects box, select the tables to synchronize, and then click the > icon () to move them to the Selected Objects box. Note You can select objects only at the table level. If you need the column names in the destination table to be different from those in the source table, use the field mapping feature of DTS. For more information, see Set object names in the destination instance.
Object name mapping	N/A	Change the names of synchronized objects in the destination instance. For more information, see Map databases, tables, and columns.
Replicate temporary tables during DMS online DDL operations	N/A	If you use Data Management (DMS) to perform online DDL changes on the source database, you can choose whether to synchronize the temporary tables generated by the DDL changes. Yes: Synchronizes the temporary tables generated by online DDL changes. Note If a large amount of temporary table data is generated by online DDL changes, the data synchronization task may be delayed. No: Does not synchronize the temporary tables generated by online DDL changes. Only the original DDL operations from the source database are synchronized. Note This option causes tables in the destination database to be locked.
Retry duration for failed connections	N/A	If DTS cannot connect to the source or destination instance, it retries for 720 minutes (12 hours) by default. You can also specify a custom retry duration. If DTS reconnects to the source or destination instance within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails. Note You are billed for task run time during connection retries. Customize the retry duration based on your business needs, or release the DTS instance as soon as the source and destination instances are released.

Set the primary key and distribution columns for the synchronized tables in the AnalyticDB for PostgreSQL instance.

After you complete the settings, the Status column displays Defined. After you confirm the settings, click Pre-check and Start.

Note
This page appears only if you selected Initial Schema Synchronization in the previous step. For more information about primary key and distribution columns, see Table constraints and Table distribution.
After completing the preceding configurations, click Precheck and Start in the lower-right corner of the page.
Note
- A precheck runs before the synchronization task starts, and you can only start the task after it passes.
- If the precheck fails, click the icon next to the failed item to view the details.
  
  You can fix the issues based on the cause and run the precheck again.
  
  If you do not need to fix the items that triggered warnings, you can click Ignore or Ignore Warnings and Rerun Precheck to skip the warnings and run the precheck again.
After the Precheck dialog box shows that Precheck Passed, close the Precheck dialog box. The synchronization task starts.
Wait for the task to complete initialization and enter the Synchronizing state.

You can view the status of the data synchronization task on the Data Synchronization page.