ApsaraDB RDS: Use an ApsaraDB RDS for PostgreSQL instance as the real-time data analytics instance of an ApsaraDB RDS for MySQL instance

Last Updated: Jul 24, 2025

This topic describes how to migrate data from an ApsaraDB RDS for MySQL instance to an ApsaraDB RDS for PostgreSQL instance and use the PostgreSQL instance for real-time data analytics.

Background information

If you want to use PostgreSQL features to analyze the data of an ApsaraDB RDS for MySQL instance, for example, to process spatio-temporal data or analyze user profiles by using GIS, you can use Data Transmission Service (DTS) to migrate data from the MySQL instance to a PostgreSQL instance. You can then use the PostgreSQL instance as a real-time analytics database for your MySQL instance.

Prerequisites

  • A source ApsaraDB RDS for MySQL instance is created.

  • A destination ApsaraDB RDS for PostgreSQL instance is created.

Procedure

  1. Prepare test data on the source instance.

  2. Create a migration task.

  3. Migrate the data of the source instance to the destination instance.

Prepare test data on the source instance

  1. Connect to the source instance.

    mysql -h <Endpoint> -u <Username> -P <Port> -p
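    For example, with a hypothetical endpoint and account (replace the values with your own):

    mysql -h rm-bp1xxxxxxxxxxxx.mysql.rds.aliyuncs.com -u test_user -P 3306 -p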
  2. Create a test database named db1.

    CREATE DATABASE db1;
  3. Switch to the db1 database.

    USE db1;
  4. Create two test tables named test_mm and test_innodb.

    CREATE TABLE `test_mm` (
      `id` INT(11) NOT NULL AUTO_INCREMENT,
      `user_id` VARCHAR(20) NOT NULL,
      `group_id` INT(11) NOT NULL,
      `create_time` DATETIME NOT NULL,
      PRIMARY KEY (`id`),
      KEY `index_user_id` (`user_id`) USING HASH
    ) ENGINE = InnoDB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8;

    CREATE TABLE `test_innodb` (
      `id` INT(11) NOT NULL AUTO_INCREMENT,
      `user_id` VARCHAR(20) NOT NULL,
      `group_id` INT(11) NOT NULL,
      `create_time` DATETIME NOT NULL,
      PRIMARY KEY (`id`),
      KEY `index_user_id` (`user_id`) USING HASH
    ) ENGINE = InnoDB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8;
  5. Create a rand_string function that is used to generate random strings.

    DELIMITER $$
    -- Returns a random string of n characters drawn from [a-zA-Z0-9].
    CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
    BEGIN
      DECLARE chars_str VARCHAR(100)
        DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
      DECLARE return_str VARCHAR(255) DEFAULT '';
      DECLARE i INT DEFAULT 0;
      WHILE i < n DO
        -- Append one character chosen at random from the 62 candidates.
        SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1 + RAND() * 62), 1));
        SET i = i + 1;
      END WHILE;
      RETURN return_str;
    END $$
    DELIMITER ;
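    To quickly verify that the function works (an optional sanity check), call it directly:

    SELECT rand_string(10);
    -- Returns a random 10-character string, for example: aZ3kQ9xLm2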
  6. Create a stored procedure that is used to insert test data.

    DELIMITER $$
    -- Inserts n rows of random test data into the test_mm table.
    CREATE PROCEDURE `insert_data`(IN n INT)
    BEGIN
      DECLARE i INT DEFAULT 1;
      WHILE (i <= n) DO
        INSERT INTO test_mm (user_id, group_id, create_time)
        VALUES (rand_string(20), FLOOR(RAND() * 100), NOW());
        SET i = i + 1;
      END WHILE;
    END $$
    DELIMITER ;
  7. Call the stored procedure that you created.

    CALL insert_data(1000000);
    INSERT INTO test_innodb SELECT * FROM test_mm;
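    Inserting one million rows can take a few minutes. As an optional sanity check (a sketch, not part of the original procedure), verify the row counts before you start the migration:

    SELECT COUNT(*) FROM test_mm;      -- expected: 1000000
    SELECT COUNT(*) FROM test_innodb;  -- expected: 1000000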

Create a migration task

  1. Before you create a migration task, you must log on to the ApsaraDB RDS console and create a database in the destination instance. The database that you create is the destination database to which you can migrate the data of the source instance. For more information, see Create a database.

    Note

    In the example provided in this section, the database that you create is named db2.
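    Creating the database in the ApsaraDB RDS console is the documented path. Alternatively, as a sketch, if your account on the destination instance has the CREATEDB privilege, you can create the database over a psql connection:

    -- Run on the destination PostgreSQL instance (assumes CREATEDB privilege).
    CREATE DATABASE db2;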

  2. Use one of the following methods to go to the Data Migration page and select the region in which the data migration instance resides.

    DTS console

    1. Log on to the DTS console.

    2. In the left-side navigation pane, click Data Migration.

    3. In the upper-left corner of the page, select the region in which the data migration instance resides.

    DMS console

    Note

    The actual operation may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.

    1. Log on to the DMS console.

    2. In the top navigation bar, move the pointer over Data + AI > DTS (DTS) > Data Migration.

    3. From the drop-down list to the right of Data Migration Tasks, select the region in which the data migration instance resides.

  3. Click Create Task to go to the task configuration page.

  4. Specify the information about the source database and destination database.

    Section: N/A

    • Task Name: The name of the DTS task. DTS automatically generates a task name. We recommend that you specify a descriptive name that makes the task easy to identify. The task name does not need to be unique.

    Section: Source Database

    • Database Type: Select MySQL.

    • Access Method: Select Cloud Instance.

    • Instance Region: Select the region where the source ApsaraDB RDS for MySQL instance resides.

    • Cross-account Migration: In this example, the migration is performed between instances that belong to the same Alibaba Cloud account. Select No.

    • RDS Instance ID: Select the ID of the source ApsaraDB RDS for MySQL instance.

    • Database Account: Enter the username of an account on the source ApsaraDB RDS for MySQL instance that has the following permissions (see the sketch after this table for example GRANT statements):

      • The SELECT permission, which is required for schema migration and full data migration.

      • The REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions, which are required for incremental data migration.

    • Database Password: The password that is used to access the database.

    • Encryption: Specifies whether to encrypt the connection to the database. You can select Non-encrypted or SSL-encrypted based on your business requirements. If you want to set this parameter to SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance before you configure the DTS task. For more information, see Use a cloud certificate to enable SSL encryption.

    Section: Destination Database

    • Database Type: Select PostgreSQL.

    • Access Method: Select Cloud Instance.

    • Instance Region: Select the region where the destination ApsaraDB RDS for PostgreSQL instance resides.

    • Instance ID: Select the ID of the destination ApsaraDB RDS for PostgreSQL instance.

    • Database Name: Enter the name of the database in the destination ApsaraDB RDS for PostgreSQL instance to which the objects are migrated. In this example, enter db2.

    • Database Account: Enter the username of the account that has permissions on the destination ApsaraDB RDS for PostgreSQL instance.

    • Database Password: The password that is used to access the database.

    • Encryption: Specifies whether to encrypt the connection to the destination database. You can configure this parameter based on your business requirements. In this example, Non-encrypted is selected.

      If you want to establish an SSL-encrypted connection to the destination database, select SSL-encrypted, upload the CA Certificate, Client Certificate, and Private Key of Client Certificate as needed, and then specify the Private Key Password of Client Certificate.

      Note
      • If you set Encryption to SSL-encrypted for a self-managed PostgreSQL database, you must upload the CA Certificate.

      • If you want to use a client certificate, you must upload the Client Certificate and Private Key of Client Certificate and specify the Private Key Password of Client Certificate.

      • For information about how to configure SSL encryption for an ApsaraDB RDS for PostgreSQL instance, see SSL encryption.
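    As an illustrative sketch of the source account permissions described above, the following GRANT statements assume a hypothetical account named dtsuser that has already been created (on ApsaraDB RDS for MySQL, accounts are usually created and authorized in the console):

    -- Database-level privileges for schema migration and full data migration:
    GRANT SELECT, SHOW VIEW ON db1.* TO 'dtsuser'@'%';
    -- REPLICATION CLIENT and REPLICATION SLAVE are global privileges:
    GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'dtsuser'@'%';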

  5. In the lower part of the page, click Test Connectivity and Proceed.

    Note
    • Make sure that the CIDR blocks of DTS servers can be automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add the CIDR blocks of DTS servers.

    • If the source or destination database is a self-managed database and its Access Method is not set to Alibaba Cloud Instance, click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.

  6. Configure the objects to be migrated.

    1. On the Configure Objects page, configure the objects that you want to migrate.

      • Migration Types: Select Schema Migration, Full Data Migration, and Incremental Data Migration.

      • Processing Mode of Conflicting Tables: Select one of the following options:

        • Precheck and Report Errors: checks whether the destination database contains tables that have the same names as tables in the source database. If no identical table names exist, the precheck is passed. Otherwise, an error is returned during the precheck and the data migration task cannot be started.

          Note

          If the source and destination databases contain tables with identical names and the tables in the destination database cannot be deleted or renamed, you can use the object name mapping feature to rename the tables that are migrated to the destination database. For more information, see Map object names.

        • Ignore Errors and Proceed: skips the precheck for identical table names in the source and destination databases.

          Warning

          If you select Ignore Errors and Proceed, data inconsistency may occur and your business may be exposed to the following potential risks:

          • If the source and destination databases have the same schema and a data record has the same primary key as an existing data record in the destination database, the following scenarios may occur:

            • During full data migration, DTS does not migrate the data record to the destination database. The existing data record in the destination database is retained.

            • During incremental data migration, DTS migrates the data record to the destination database. The existing data record in the destination database is overwritten.

          • If the source and destination databases have different schemas, only specific columns are migrated or the data migration task fails. Proceed with caution.

      • Capitalization of Object Names in Destination Instance: The capitalization of database names, table names, and column names in the destination instance. The default value is DTS default policy. You can select another option to make sure that the capitalization of object names is consistent with that of the source or destination database. For more information, see Specify the capitalization of object names in the destination instance.

      • Source Objects: Select one or more objects in the Source Objects section and click the right arrow icon to add them to the Selected Objects section.

        Note

        You can select databases, tables, or columns as the objects to be migrated.

      • Selected Objects:

        • To rename an object that you want to migrate to the destination instance, right-click the object in the Selected Objects section. For more information, see Map the name of a single object.

        • To rename multiple objects at a time, click Batch Edit in the upper-right corner of the Selected Objects section. For more information, see Map multiple object names at a time.

        Note
        • Timestamp fields are supported. If the value of a time field in the source database is 0, the corresponding field in the destination database is automatically converted to null.

        • To set a WHERE condition to filter data, right-click the table that you want to migrate in the Selected Objects section and configure the condition in the dialog box that appears. For more information, see Configure a filter condition.

        • If you use the object name mapping feature, other objects that depend on the renamed object may fail to be migrated.

    2. Click Next: Advanced Settings to configure advanced settings.

      • Dedicated Cluster for Task Scheduling: By default, DTS schedules the data migration task to a shared cluster if you do not specify a dedicated cluster. If you want to improve the stability of data migration tasks, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster.

      • Retry Time for Failed Connections: The retry time range for failed connections. If the source or destination database fails to be connected after the data migration task is started, DTS immediately retries a connection within the retry time range. Valid values: 10 to 1440. Unit: minutes. Default value: 720. We recommend that you set this parameter to a value greater than 30. If DTS reconnects to the source and destination databases within the specified retry time range, DTS resumes the data migration task. Otherwise, the data migration task fails.

        Note
        • If you specify different retry time ranges for multiple data migration tasks that share the same source or destination database, the value that is specified later takes precedence.

        • While DTS retries a connection, you are still charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements and release the DTS instance at the earliest opportunity after the source and destination instances are released.

      • Retry Time for Other Issues: The retry time range for other issues. For example, if DDL or DML operations fail to be performed after the data migration task is started, DTS immediately retries the operations within the retry time range. Valid values: 1 to 1440. Unit: minutes. Default value: 10. We recommend that you set this parameter to a value greater than 10. If the failed operations are successfully performed within the specified retry time range, DTS resumes the data migration task. Otherwise, the data migration task fails.

        Important

        The value of the Retry Time for Other Issues parameter must be smaller than the value of the Retry Time for Failed Connections parameter.

      • Enable Throttling for Full Data Migration: Specifies whether to enable throttling for full data migration. During full data migration, DTS uses the read and write resources of the source and destination databases, which may increase the loads of the database servers. You can enable throttling based on your business requirements. To configure throttling, set the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters. This reduces the load on the destination database server.

        Note

        You can configure this parameter only if you select Full Data Migration for the Migration Types parameter.

      • Enable Throttling for Incremental Data Migration: Specifies whether to enable throttling for incremental data migration. To configure throttling, set the RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s) parameters. This reduces the load on the destination database server.

        Note

        You can configure this parameter only if you select Incremental Data Migration for the Migration Types parameter.

      • Environment Tag: The environment tag that is used to identify the DTS instance. You can select one based on your business requirements. In this example, no environment tag is selected.

      • Whether to delete SQL operations on heartbeat tables of forward and reverse tasks: Specifies whether DTS writes SQL operations on heartbeat tables to the source database while the DTS instance is running. Valid values:

        • Yes: DTS does not write SQL operations on heartbeat tables. In this case, the DTS instance may display latency.

        • No: DTS writes SQL operations on heartbeat tables. In this case, features such as physical backup and cloning of the source database may be affected.

      • Configure ETL: Specifies whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL?

      • Monitoring and Alerting: Specifies whether to configure alerting for the data migration task. If the task fails or the migration latency exceeds the specified threshold, the alert contacts receive notifications.

    3. Click Next Step: Data Verification to configure the data verification task.

      For more information about how to use the data verification feature, see Configure a data verification task.

  7. Save the task settings and run a precheck.

    • To view the parameters to be specified when you call the relevant API operation to configure the DTS task, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.

    • If you do not need to view or have viewed the parameters, click Next: Save Task Settings and Precheck in the lower part of the page.

    Note
    • Before you can start the data migration task, DTS performs a precheck. You can start the data migration task only after the task passes the precheck.

    • If the task fails to pass the precheck, click View Details next to each failed item. After you analyze the causes based on the check results, troubleshoot the issues. Then, run a precheck again.

    • If an alert is triggered for an item during the precheck:

      • If an alert item cannot be ignored, click View Details next to the failed item and troubleshoot the issues. Then, run a precheck again.

      • If the alert item can be ignored, click Confirm Alert Details. In the View Details dialog box, click Ignore. In the message that appears, click OK. Then, click Precheck Again to run a precheck again. If you ignore the alert item, data inconsistency may occur, and your business may be exposed to potential risks.

  8. Purchase an instance.

    1. Wait until the Success Rate becomes 100%. Then, click Next: Purchase Instance.

    2. On the Purchase Instance page, configure the Instance Class parameter for the data migration instance. The following table describes the parameters.

      Section: New Instance Class

      • Resource Group: The resource group to which the data migration instance belongs. Default value: default resource group. For more information, see What is Resource Management?

      • Instance Class: DTS provides instance classes that vary in migration speed. You can select an instance class based on your business scenario. For more information, see Instance classes of data migration instances.

    3. Read and agree to Data Transmission Service (Pay-as-you-go) Service Terms by selecting the check box.

    4. Click Buy and Start. In the message that appears, click OK.

      You can view the progress of the task on the Data Migration page.

      Note
      • If a data migration task cannot be used to migrate incremental data, the task automatically stops, and Completed is displayed in the Status section.

      • If a data migration task can be used to migrate incremental data, the task does not automatically stop, and Running is displayed in the Status section.

Migrate the data of the source instance to the destination instance

  1. View the progress of the full data migration operation.

    1. Connect to the destination instance.

      psql -h <Endpoint> -U <Username> -p <Port number> -d db2
      Note

      The db2 database that is created in the "Create a migration task" section serves as the destination database. Therefore, when you connect to the destination instance, specify db2 as the database name.
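      For example, with a hypothetical endpoint (replace the values with your own):

      psql -h pgm-bp1xxxxxxxxxxxx.pg.rds.aliyuncs.com -U test1 -p 5432 -d db2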

    2. Run the \dn command to check whether the db1 database is mapped as a schema named db1 in the db2 database.

         List of schemas
        Name  |   Owner
      --------+------------
       db1    | test1
       public | pg*******
      (2 rows)
      Note

      After the data of the db1 database in the source instance is migrated to the db2 database in the destination instance, a schema named db1 is generated in the db2 database to accommodate the data of the db1 database.

    3. Run the \dt+ db1.* command to view the status of the tables in the db1 database.

                                  List of relations
       Schema |    Name     | Type  | Owner | Persistence | Size  | Description
      --------+-------------+-------+-------+-------------+-------+-------------
       db1    | test_innodb | table | test1 | permanent   | 65 MB |
       db1    | test_mm     | table | test1 | permanent   | 65 MB |
      (2 rows)
                                      
    4. Run the following commands to query the number of data records in the test_innodb table and the number of data records in the test_mm table:

      • Query the number of data records in the test_innodb table.

        SELECT COUNT(*) FROM db1.test_innodb;
          count
        ---------
         1000000
        (1 row)
      • Query the number of data records in the test_mm table.

        SELECT COUNT(*) FROM db1.test_mm;
          count
        ---------
         1000000
        (1 row) 
      Note

      Because a MySQL database is mapped to a schema in PostgreSQL, you must specify the schema when you query data in db1.

      If you do not want to specify the schema every time you query the data of the db1 database, you can set the search_path parameter.

      1. View the search_path.

        show search_path;
           search_path
        -----------------
         "$user", public
        (1 row)
      2. Set the search_path.

        set search_path = db1, "$user", public;
        SET
      3. View the modification result.

        show search_path;
             search_path
        ----------------------
         db1, "$user", public
        (1 row)
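        Note that SET search_path applies only to the current session. As a sketch, to persist the setting for the test1 role across sessions, you can store it at the role level:

        -- Applies to new sessions of role test1 in database db2.
        ALTER ROLE test1 IN DATABASE db2 SET search_path = db1, "$user", public;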
  2. Test whether the data that is inserted into the source database is continuously migrated to the destination database.

    1. Insert data into the source database.

      INSERT INTO test_innodb (user_id, group_id, create_time) VALUES ('testuser', 1, '2021-07-29 12:00:00');

      Result:

      SELECT * FROM test_innodb WHERE user_id = 'testuser';
      +---------+----------+----------+---------------------+
      | id      | user_id  | group_id | create_time         |
      +---------+----------+----------+---------------------+
      | 1000001 | testuser |        1 | 2021-07-29 12:00:00 |
      +---------+----------+----------+---------------------+
      1 row in set (0.03 sec)
    2. Check whether the data that is inserted into the source database is migrated to the destination database.

      SELECT * FROM test_innodb WHERE user_id = 'testuser';
         id    | user_id  | group_id |     create_time
      ---------+----------+----------+---------------------
       1000001 | testuser |        1 | 2021-07-29 12:00:00
      (1 row)
  3. Test whether the data that is updated in the source database is continuously migrated to the destination database.

    1. Update data in the source database.

      UPDATE test_innodb set group_id = 2 WHERE user_id = 'testuser';

      Result:

      SELECT * FROM test_innodb WHERE user_id = 'testuser';
      +---------+----------+----------+---------------------+
      | id      | user_id  | group_id | create_time         |
      +---------+----------+----------+---------------------+
      | 1000001 | testuser |        2 | 2021-07-29 12:00:00 |
      +---------+----------+----------+---------------------+
      1 row in set (0.03 sec)
    2. Check whether the data that is updated in the source database is migrated to the destination database.

      SELECT * FROM test_innodb WHERE user_id = 'testuser';
         id    | user_id  | group_id |     create_time
      ---------+----------+----------+---------------------
       1000001 | testuser |        2 | 2021-07-29 12:00:00
      (1 row)
  4. Test whether the data that is deleted from the source database is continuously migrated to the destination database.

    1. Delete data from the source database.

      DELETE FROM test_innodb WHERE user_id = 'testuser';

      Result:

      • Query the data record for which the value of the user_id field is testuser.

        SELECT * FROM test_innodb WHERE user_id = 'testuser';
        Empty set (0.03 sec)
      • Query the maximum value of the id field.

        SELECT MAX(id) FROM test_innodb;
        +---------+
        | MAX(id) |
        +---------+
        | 1000000 |
        +---------+
        1 row in set (0.03 sec)
      Note

      Before the test record was inserted, the maximum value of the id field was 1000000. After the record was inserted, the maximum value increased to 1000001. After the record was deleted, the maximum value returned to 1000000 because the row with the id of 1000001 no longer exists. The AUTO_INCREMENT counter itself does not decrease.

    2. Check whether the data that is deleted from the source database is also deleted from the destination database.

      SELECT * FROM test_innodb WHERE user_id = 'testuser';
       id | user_id | group_id | create_time
      ----+---------+----------+-------------
      (0 rows)
      SELECT MAX(id) FROM test_innodb;
         max
      ---------
       1000000
      (1 row)
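With incremental data migration running, the destination PostgreSQL instance serves as a near real-time replica that you can query for analytics. As an illustrative sketch (not part of the original procedure), the following query aggregates the migrated test data by group on the destination instance:

    -- Top 10 groups by row count in the migrated table.
    SELECT group_id,
           COUNT(*)         AS user_count,
           MAX(create_time) AS latest_create_time
    FROM db1.test_innodb
    GROUP BY group_id
    ORDER BY user_count DESC
    LIMIT 10;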