Use the Task Orchestration feature in Data Management (DMS) to build a scheduled, multi-step task flow that automatically archives historical data. By the end of this tutorial, you will have a working task flow that:
Creates a monthly archive table for records older than six months
Copies the qualifying rows from the source table into the archive table
Deletes the migrated rows from the source table
Runs automatically at 01:00 on the first day of each month
Prerequisites
Before you begin, ensure that you have:
A database instance registered with DMS. For details, see Register an Alibaba Cloud database instance
An instance with a Normal status. To check, go to the DMS console home page, hover over the instance in the left-side navigation pane, and read the status tooltip
Owner access to the task flow, if the instance control mode is Flexible Management or Stable Change. This requirement does not apply to instances in Security Collaboration mode. For details, see Overview
Supported databases
Task Orchestration supports the following database types:
Relational databases
MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, ApsaraDB MyBase for MySQL, PolarDB for Xscale, and MySQL databases from other sources
SQL Server: ApsaraDB RDS for SQL Server, ApsaraDB MyBase for SQL Server, and SQL Server databases from other sources
PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, ApsaraDB MyBase for PostgreSQL, and PostgreSQL databases from other sources
OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases
PolarDB for PostgreSQL (Compatible with Oracle)
Oracle
Dameng (DM)
Db2
NoSQL database: Lindorm
Data warehouses: AnalyticDB for MySQL, AnalyticDB for PostgreSQL, Data Lake Analytics (DLA), MaxCompute, Hologres
Object storage: Object Storage Service (OSS)
Set up the source table
This tutorial uses a source table named test. Run the following statements to create the table and insert sample data:
-- Create the source table
CREATE TABLE test (
`id` bigint(20) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'Primary key',
`gmt_create` DATETIME NOT NULL COMMENT 'Creation time',
`gmt_modified` DATETIME NOT NULL COMMENT 'Modification time',
`content` TEXT COMMENT 'Test data'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Table for a task orchestration test';
-- Insert sample data
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-01-01 01:00:00', '2020-01-01 01:00:00', 'value1');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-02-01 01:00:00', '2020-02-01 01:00:00', 'value2');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-03-01 01:00:00', '2020-03-01 01:00:00', 'value3');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-04-01 01:00:00', '2020-04-01 01:00:00', 'value4');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-05-01 01:00:00', '2020-05-01 01:00:00', 'value5');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-06-01 01:00:00', '2020-06-01 01:00:00', 'value6');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-07-01 01:00:00', '2020-07-01 01:00:00', 'value7');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-08-01 01:00:00', '2020-08-01 01:00:00', 'value8');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-09-01 01:00:00', '2020-09-01 01:00:00', 'value9');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-10-01 01:00:00', '2020-10-01 01:00:00', 'value10');

You can run these statements on the SQLConsole tab in DMS or by submitting a data change ticket. For details, see Manage a database on the SQLConsole tab and Normal Data Modify.
Step 1: Create a task flow
Log on to the DMS console V5.0.
In the top navigation bar, choose All Features > Data + AI > Data Development > Task Orchestration.
If you are using the normal mode of the DMS console, hover over the icon in the upper-left corner and choose Data + AI > Data Development > Task Orchestration.
Click Create Task Flow.
In the New Task Flow dialog box, enter a Task Flow Name and Description, then click OK.
Step 2: Configure time variables
The task flow uses two time variables as SQL filter conditions so the archival window automatically shifts back six months relative to the current date. All nodes in the task flow can reference these variables using the syntax ${variable_name}.
On the Variable Configuration tab at the bottom of the page, click Task Flow Variable in the left-side pane and create the following two variables:
| Variable | Time format | Offset | Description | Example (if today is June 27, 2023) |
|---|---|---|---|---|
| yearmonth6_name | yyyy-MM | -, 6, Month | Month six months prior, accurate to the month | 2022-12 |
| yearmonth6_date | yyyy-MM-01 | -, 6, Month | First day of the month six months prior, accurate to the day | 2022-12-01 |
`yearmonth6_name` is used as a suffix in the archive table name (for example, `test_2022-12`). `yearmonth6_date` is used as the cutoff date in the `WHERE` clause of both SQL nodes.
For the full variable syntax and options, see Variables.
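To see how the two variables resolve, the "-, 6, Month" offset can be reproduced outside DMS. The following Python sketch is an assumption about how the offset is applied (the month arithmetic and function name are illustrative, not part of DMS):

```python
from datetime import date

def month_offset(d: date, months: int) -> date:
    """Return the first day of the month `months` months away from d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

today = date(2023, 6, 27)                      # example date from the table above
cutoff = month_offset(today, -6)
yearmonth6_name = cutoff.strftime("%Y-%m")     # archive-table suffix
yearmonth6_date = cutoff.strftime("%Y-%m-01")  # WHERE-clause cutoff

print(yearmonth6_name, yearmonth6_date)  # 2022-12 2022-12-01
```

The printed values match the examples in the table: for June 27, 2023, the archival cutoff falls on December 1, 2022.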
Step 3: Create and configure nodes
The task flow requires two Single Instance SQL nodes: one to migrate data into the archive table, and one to delete the migrated rows from the source table.
Create the Migrate Data node
This node creates a monthly archive table (if it does not already exist) and copies all records older than six months from test into it.
From the Task Type panel on the left side of the canvas, drag a Single Instance SQL node onto the canvas.
Right-click the node and select Rename. Enter Migrate Data as the node name.
Double-click the node to open the SQL editor.
Select your database in the database search box and enter the following SQL:
CREATE TABLE IF NOT EXISTS `test_${yearmonth6_name}` LIKE test;
INSERT INTO `test_${yearmonth6_name}` SELECT * FROM `test` WHERE gmt_create < '${yearmonth6_date}';

The first statement creates `test_${yearmonth6_name}` with the same schema as `test`, skipping creation if the table already exists. The second statement copies all rows with a `gmt_create` date before the six-month cutoff into the archive table.

Click Preview to validate the SQL statements.
Create the Clear Historical Data node
This node deletes from test all rows that were copied to the archive table.
Drag another Single Instance SQL node onto the canvas.
Right-click the node and select Rename. Enter Clear Historical Data as the node name.
Double-click the node to open the SQL editor.
Select your database in the database search box and enter the following SQL:
Warning: Execute the DELETE statement during off-peak hours. Deleting large volumes of data can hold table locks for an extended period and interrupt other operations.

DELETE FROM `test` WHERE gmt_create < '${yearmonth6_date}';

This statement removes all rows already copied to the archive table, keeping `test` free of records older than six months.

Click Preview to validate the SQL statement.
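Together, the two nodes implement a copy-then-delete archival pattern. The end-to-end behavior can be sketched with Python's built-in sqlite3 module; note this is a simulation only (SQLite has no CREATE TABLE ... LIKE, so the archive table and the fixed cutoff date are stand-ins for the tutorial's templated SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table mirroring the tutorial's `test` table (simplified columns)
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, gmt_create TEXT, content TEXT)")
rows = [("2020-%02d-01 01:00:00" % m, "value%d" % m) for m in range(1, 11)]
cur.executemany("INSERT INTO test (gmt_create, content) VALUES (?, ?)", rows)

cutoff = "2020-05-01"  # stands in for ${yearmonth6_date}

# Node 1: Migrate Data -- create the archive table and copy rows older than the cutoff
cur.execute("CREATE TABLE IF NOT EXISTS test_archive "
            "(id INTEGER PRIMARY KEY, gmt_create TEXT, content TEXT)")
cur.execute("INSERT INTO test_archive SELECT * FROM test WHERE gmt_create < ?", (cutoff,))

# Node 2: Clear Historical Data -- delete the rows that were just copied
cur.execute("DELETE FROM test WHERE gmt_create < ?", (cutoff,))
conn.commit()

archived = cur.execute("SELECT COUNT(*) FROM test_archive").fetchone()[0]
remaining = cur.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(archived, remaining)  # 4 6 -- January through April archived, six rows remain
```

Because the same cutoff drives both statements, every row that leaves `test` is guaranteed to exist in the archive table first, provided the Migrate Data node completes before the delete runs — which is exactly what the node connection in Step 4 enforces.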
Step 4: Connect nodes and set up scheduling
Connect the nodes
On the canvas, hover over the Migrate Data node, click the hollow circle on its right side, and drag the connection line to the Clear Historical Data node. This defines the execution order: Migrate Data runs first, then Clear Historical Data.
Configure scheduling
At the bottom of the page, click the Task Flow Information tab.
In the Scheduling Settings section, configure the following parameters:
| Parameter | Value |
|---|---|
| Enable scheduling | Turn on |
| Scheduling type | Timed Scheduling/Cyclic scheduling |
| Effective time | 1970-01-01 – 9999-01-01 (default; the schedule runs indefinitely) |
| Scheduling cycle | Month |
| Specified time | 1st of each month |
| Specific point in time | 01:00 |
| Time zone | Your preferred time zone |
| Cron expression | Auto-generated based on the settings above |

This example schedules the task flow to run at 01:00 on the first day of each month. Adjust the settings to fit your requirements. For more options, see Overview.
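As a sanity check on the cadence, the next fire time for "01:00 on the 1st of each month" can be computed with stdlib Python. This mimics the schedule only; DMS's own cron evaluation and time-zone handling may differ:

```python
from datetime import datetime

def next_monthly_run(now: datetime) -> datetime:
    """Next occurrence of 01:00 on the 1st of a month, strictly after `now`."""
    candidate = now.replace(day=1, hour=1, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; roll over to the 1st of the following month
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate

print(next_monthly_run(datetime(2023, 6, 27, 10, 30)))  # 2023-07-01 01:00:00
```

For example, from mid-June the next run lands on July 1 at 01:00, and a December date rolls over correctly into January of the following year.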
Step 5: Run the task flow and verify the results
Test run
Click Try Run in the upper-left corner of the canvas. Check the log output:
status SUCCEEDED in the last line: the test run completed successfully.
status FAILED in the last line: the test run failed. Review the logs to identify which node failed and why, fix the configuration, and try again.
Verify the results
On the SQLConsole tab, run queries to confirm that:
Records with `gmt_create` before the six-month cutoff have been copied to the archive table `test_${yearmonth6_name}`.
Those same records no longer exist in the `test` table.
For details on querying data, see Manage a database on the SQLConsole tab.