All Products
Search
Document Center

:Regularly back up data from relational databases to OSS in DMS

Last Updated:Jul 27, 2020

Background

Data Management Service (DMS) is a web service platform provided by Alibaba Cloud for data operations, data security management, and data development. DMS serves as a database client, supporting multiple database engines, such as MySQL, PostgreSQL, SQL Server, Oracle, Redis, and MongoDB. DMS supports unified permission management and stable changes for registered databases. It also integrates data development capabilities including data warehouse development and task orchestration.
1

Scenarios of DMS-based data development

The data development capabilities provided by DMS can be used in various scenarios, such as:

  1. Offline data: Generate reports for offline data on a T+1 basis.
  2. Real-time data: Generate reports for data in the last five minutes.
  3. Intelligent data: Connect to computing models such as AI computing.
  4. Cold data: Regularly back up cold data to Object Storage Service (OSS).
  5. Transaction data: Regularly delete or update a large amount of data.

1

This topic describes how to regularly back up cold data and expired data to OSS by using the data development feature provided by DMS.

Back up cold data and expired data in DMS

To ensure stable operations, high performance, and high cost-effectiveness, ApsaraDB for RDS removes cold data and expired data based on the configured lifecycle. However, cold data and expired data cannot be directly deleted because they may be needed in some scenarios, such as historical record query and long-term data mining. To reduce costs, you need to migrate these data to a more cost-effective storage service. In Data Lake Analytics (DLA) provided by Alibaba Cloud, you can use an SQL-like syntax to read historical data and cold data from ApsaraDB for RDS and write the data to OSS, where you can store data at a lower cost.
The database management feature provided by DMS allows you to manage ApsaraDB for RDS. It also allows you to manage OSS through DLA. Therefore, you can conveniently back up data from ApsaraDB for RDS to OSS through DLA in the DMS console. In addition, the task orchestration feature provided by DMS allows you to regularly back up cold data to OSS through DLA.

Case background

This topic provides a specific use case, involving a table named orders in an ApsaraDB for RDS instance and a database named demo_schema in a DLA instance. In this use case, data in the orders table is backed up to the demo_schema database by day. Data of the previous day in the orders table is backed up at 05:00 every day. You can execute the following SQL statement to create the orders table.

  1. create table orders(
  2. order_id bigint,
  3. product_name varchar(32),
  4. price double,
  5. total_amount double,
  6. created_date date
  7. );

To create the demo_schema database in DLA, execute the following SQL statement:

  1. CREATE DATABASE demo_schema
  2. WITH DBPROPERTIES (
  3. catalog = 'oss',
  4. location = 'oss://xxxxxx/dla_demo/'
  5. )

2

Preparations

Execute the following SQL statements in the SQLConsole to insert some data to the orders table:

  1. insert into orders values(1, 'product1', 1.0, 10.9, date_add(curdate(), interval -1 day));
  2. insert into orders values(2, 'product1', 2.0, 20.9, date_add(curdate(), interval -1 day));
  3. insert into orders values(3, 'product1', 3.0, 30.9, date_add(curdate(), interval -1 day));
  4. insert into orders values(4, 'product1', 4.0, 40.9, date_add(curdate(), interval -1 day));
  5. insert into orders values(5, 'product1', 5.0, 50.9, curdate());
  6. insert into orders values(6, 'product1', 6.0, 60.9, curdate());
  7. insert into orders values(7, 'product1', 7.0, 70.9, curdate());
  8. insert into orders values(8, 'product1', 8.0, 80.9, curdate());

3

Implement a task flow

This section describes how to back up historical data from ApsaraDB for RDS to OSS.

Create a task flow

In the top navigation bar in the DMS console, choose Data Factory > Task orchestration. On the Task orchestration page, click New task flow in the Free orchestration tasks section. In the dialog box that appears, enter rds_data_to_oss in the Task Flow Name field and click OK.
5

Create DLA-SQL task nodes

On the rds_data_to_oss page that appears, create the following three DLA-SQL task nodes in sequence:

  1. Create a schema for mapping to the source: This task node creates a schema in DLA to map to an ApsaraDB for RDS instance.
  2. Create a backup table in OSS: This task node creates a table in OSS through DLA to store historical data from the ApsaraDB for RDS instance.
  3. Back up data: This task node backs up data of the previous day from the ApsaraDB for RDS instance to the backup table.

6

Create the Create a schema for mapping to the source task node

Drag a DLA-SQL task node to the canvas and click the task node. On the Node Name tab, set Task Name to Create a schema for mapping to the source. On the Content tab, write an SQL script for creating a schema, dla_mysql_rds, that maps to the ApsaraDB for RDS instance. The destination database of this task node is demo_schema in the DLA instance. By creating the dla_mysql_rds schema, you can read data from the ApsaraDB for RDS instance directly in DLA.

  1. CREATE SCHEMA if not exists dla_mysql_rds WITH DBPROPERTIES (
  2. CATALOG = 'mysql',
  3. LOCATION = 'jdbc:mysql://xxxxxx.rds.aliyuncs.com:3306/dmstest',
  4. USER = 'dmstest',
  5. PASSWORD = 'xxxxxxxxx',
  6. INSTANCE_ID = 'xxxxxxxxx',
  7. VPC_ID = 'xxxxxxxxx'
  8. );
  9. msck repair database dla_mysql_rds;

Before you create the schema that maps to the ApsaraDB for RDS instance, you must add the CIDR block 100.104.0.0/16 to the whitelist for accessing the ApsaraDB for RDS instance.The ApsaraDB for RDS instance is in a Virtual Private Cloud (VPC), and by default DLA has no access to resources in the VPC. To enable access to the ApsaraDB for RDS instance from DLA, you need to use the reverse VPC technology, that is, add the CIDR block 100.104.0.0/16 to the whitelist for accessing the ApsaraDB for RDS instance.

7

Create the Create a backup table in OSS task node

Drag a DLA-SQL task node to the canvas and click the task node. On the Node Name tab, set Task Name to Create a backup table in OSS. On the Content tab, write an SQL script for creating a backup table named oss_orders in OSS through DLA. The oss_orders table is used to store data migrated from the orders table in the ApsaraDB for RDS instance. It shares the same schema with the orders table and is partitioned by year, month, and day.

  1. CREATE EXTERNAL TABLE oss_orders (
  2. order_id bigint,
  3. product_name varchar(32),
  4. price double,
  5. total_amount double,
  6. created_date date)
  7. PARTITIONED BY (y string, m string, d string)
  8. STORED AS TEXTFILE
  9. LOCATION 'oss://xxxxxx/dla_demo/';

8

Create the Back up data task node

Drag a DLA-SQL task node to the canvas and click the task node. On the Node Name tab, set Task Name to Back up data. On the Variable Setting tab, define the time variables to be used in SQL statements for backing up data. The destination database of this task node is demo_schema in the DLA instance.

Define time variables

In this example, define the following three time variables:

  • year: the year of the previous day, in the format of yyyy.
  • month: the month of the previous day, in the format of mm.
  • day: the date of the previous day, in the format of dd.

9

Implement data backup

To implement data backup, follow these steps:

  • Create a temporary table in OSS through DLA
    Create a temporary table named oss_orders_tmp in the directory of the oss_orders table in OSS. The name of the directory needs to include year, month, and day. The temporary table automatically becomes a partition of the oss_orders table.
    The temporary table shares the same schema with the oss_orders table.
    The specific location of the temporary table depends on the current date.

  • Back up data by day
    Execute an INSERT INTO SELECT statement to read data from the orders table in dla_mysql_rds and write the data to the temporary table in OSS.

  • Update partition information and delete the temporary table
    Update the metadata information of the oss_orders table. Load data of the temporary table to the oss_orders table.
    Delete the temporary table.

SQL script for backing up data

  1. /* Create a temporary table */
  2. CREATE EXTERNAL TABLE oss_orders_tmp (
  3. order_id bigint,
  4. product_name varchar(32),
  5. price double,
  6. total_amount double,
  7. created_date date)
  8. STORED AS TEXTFILE
  9. LOCATION 'oss://xxxxxx/dla_demo/y=${year}/m=${month}/d=${day}'
  10. TBLPROPERTIES('auto.create.location'= 'true');
  11. /* Back up data by day */
  12. insert into oss_orders_tmp
  13. SELECT * FROM mysql_rds_to_oss.orders
  14. where DATE_FORMAT(created_date, '%Y') = '${year}' and
  15. DATE_FORMAT(created_date, '%m') = '${month}' and
  16. DATE_FORMAT(created_date, '%d') = '${day}';
  17. /* Update the partition information of the backup table and delete the temporary table */
  18. msck repair table oss_orders;
  19. drop table oss_orders_tmp;

Run the task flow

After the preceding task nodes are configured, click Try Run in the upper-left corner to run the task flow. Go to the SQLConsole and query the oss_orders table created in OSS through DLA.

Configure a scheduling policy

After you click Try Run, the system will check whether the task flow is correctly configured. After the task flow is successfully run, click the blank area. On the Scheduling tab that appears, set the task flow to be run at 05:00 every morning.
11

Considerations

  • Before you create the task flow, you need to create a table named orders in the ApsaraDB for RDS instance and a database named demo_schema in the DLA instance.
  • Make sure that one or more fields in the orders table record the time when a row of data is inserted.
  • If the ApsaraDB for RDS instance is managed in the Secure Collaboration mode, to insert data to the orders table, you need to apply for the permission to change the table and modify security rules to allow SQL statements to be executed in the SQLConsole.
  • If the ApsaraDB for RDS and DLA instances are both manged in the Secure Collaboration mode, to run SQL scripts written for the preceding task nodes, you need to apply for permissions and modify security rules to allow the scripts to be run.

Summary

With a specific use case, this topic describes in detail how to periodically back up cold data and expired data from ApsaraDB for RDS to OSS by using DLA in DMS. In this use case, stable operations and high performance of ApsaraDB for RDS are effectively guaranteed and the data storage cost is reduced. This use case shows the powerful capabilities provided by DMS for backing up cold data and expired data.

Except for backup of cold data and expired data, the data development feature provided by DMS can also be used in other scenarios. For more information, see DMS documentation.