Use the Task Orchestration feature in Data Management (DMS) to build a scheduled, multi-step task flow that automatically archives historical data. By the end of this tutorial, you will have a working task flow that:
Creates a monthly archive table for records older than six months
Copies the qualifying rows from the source table into the archive table
Deletes the migrated rows from the source table
Runs automatically at 01:00 on the first day of each month
Prerequisites
Before you begin, ensure that you have:
A database instance registered with DMS. For details, see Register an Alibaba Cloud database instance
An instance with a Normal status. To check, go to the DMS console home page, hover over the instance in the left-side navigation pane, and read the status tooltip
Owner access to the task flow, if the instance control mode is Flexible Management or Stable Change. This requirement does not apply to instances in Security Collaboration mode. For details, see Overview
Supported databases
Task Orchestration supports the following database types:
Relational databases
MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, ApsaraDB MyBase for MySQL, PolarDB for Xscale, and MySQL databases from other sources
SQL Server: ApsaraDB RDS for SQL Server, ApsaraDB MyBase for SQL Server, and SQL Server databases from other sources
PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, ApsaraDB MyBase for PostgreSQL, and PostgreSQL databases from other sources
OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases
PolarDB for PostgreSQL (Compatible with Oracle)
Oracle
Dameng (DM)
Db2
NoSQL database: Lindorm
Data warehouses: AnalyticDB for MySQL, AnalyticDB for PostgreSQL, Data Lake Analytics (DLA), MaxCompute, Hologres
Object storage: Object Storage Service (OSS)
Set up the source table
This tutorial uses a source table named test. Run the following statements to create the table and insert sample data:
-- Create the source table
CREATE TABLE test (
`id` bigint(20) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'Primary key',
`gmt_create` DATETIME NOT NULL COMMENT 'Creation time',
`gmt_modified` DATETIME NOT NULL COMMENT 'Modification time',
`content` TEXT COMMENT 'Test data'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Table for a task orchestration test';
-- Insert sample data
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-01-01 01:00:00', '2020-01-01 01:00:00', 'value1');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-02-01 01:00:00', '2020-02-01 01:00:00', 'value2');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-03-01 01:00:00', '2020-03-01 01:00:00', 'value3');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-04-01 01:00:00', '2020-04-01 01:00:00', 'value4');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-05-01 01:00:00', '2020-05-01 01:00:00', 'value5');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-06-01 01:00:00', '2020-06-01 01:00:00', 'value6');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-07-01 01:00:00', '2020-07-01 01:00:00', 'value7');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-08-01 01:00:00', '2020-08-01 01:00:00', 'value8');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-09-01 01:00:00', '2020-09-01 01:00:00', 'value9');
INSERT INTO test(`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-10-01 01:00:00', '2020-10-01 01:00:00', 'value10');

You can run these statements on the SQLConsole tab in DMS or by submitting a data change ticket. For details, see Manage a database on the SQLConsole tab and Normal Data Modify.
Step 1: Create a task flow
Log on to the DMS console V5.0.
In the top navigation bar, choose All Features > Data + AI > Data Development > Task Orchestration.
If you are using the normal mode of the DMS console, hover over the icon in the upper-left corner and choose Data + AI > Data Development > Task Orchestration.
Click Create Task Flow.
In the New Task Flow dialog box, enter a Task Flow Name and Description, then click OK.
Step 2: Configure time variables
The task flow uses two time variables as SQL filter conditions so the archival window automatically shifts back six months relative to the current date. All nodes in the task flow can reference these variables using the syntax ${variable_name}.
On the Variable Configuration tab at the bottom of the page, click Task Flow Variable in the left-side pane and create the following two variables:
| Variable | Time format | Offset | Description | Example (if today is June 27, 2023) |
|---|---|---|---|---|
| yearmonth6_name | yyyy-MM | -, 6, Month | Month six months prior, accurate to the month | 2022-12 |
| yearmonth6_date | yyyy-MM-01 | -, 6, Month | First day of the month six months prior, accurate to the day | 2022-12-01 |
`yearmonth6_name` is used as a suffix in the archive table name (for example, `test_2022-12`). `yearmonth6_date` is used as the cutoff date in the `WHERE` clause of both SQL nodes.
For the full variable syntax and options, see Variables.
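To see how the two variables resolve, the "-, 6, Month" offset can be reproduced outside DMS. The following Python sketch is an assumption about how the offset is applied (the month arithmetic and function name are illustrative, not part of DMS):

```python
from datetime import date

def month_offset(d: date, months: int) -> date:
    """Return the first day of the month `months` months away from d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

today = date(2023, 6, 27)                      # example date from the table above
cutoff = month_offset(today, -6)
yearmonth6_name = cutoff.strftime("%Y-%m")     # archive-table suffix
yearmonth6_date = cutoff.strftime("%Y-%m-01")  # WHERE-clause cutoff

print(yearmonth6_name, yearmonth6_date)  # 2022-12 2022-12-01
```

The printed values match the examples in the table: for June 27, 2023, the archival cutoff falls on December 1, 2022.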
Step 3: Create and configure nodes
The task flow requires two Single Instance SQL nodes: one to migrate data into the archive table, and one to delete the migrated rows from the source table.
Create the Migrate Data node
This node creates a monthly archive table (if it does not already exist) and copies all records older than six months from test into it.
From the Task Type panel on the left side of the canvas, drag a Single Instance SQL node onto the canvas.
Right-click the node and select Rename. Enter Migrate Data as the node name.
Double-click the node to open the SQL editor.
Select your database in the database search box and enter the following SQL:
CREATE TABLE IF NOT EXISTS `test_${yearmonth6_name}` LIKE test;
INSERT INTO `test_${yearmonth6_name}` SELECT * FROM `test` WHERE gmt_create < '${yearmonth6_date}';

The first statement creates `test_${yearmonth6_name}` with the same schema as `test`, skipping creation if the table already exists. The second statement copies all rows with a `gmt_create` date before the six-month cutoff into the archive table.

Click Preview to validate the SQL statements.
Create the Clear Historical Data node
This node deletes from test all rows that were copied to the archive table.
Drag another Single Instance SQL node onto the canvas.
Right-click the node and select Rename. Enter Clear Historical Data as the node name.
Double-click the node to open the SQL editor.
Select your database in the database search box and enter the following SQL:
Warning: Execute the DELETE statement during off-peak hours. Deleting large volumes of data can hold table locks for an extended period and interrupt other operations.

DELETE FROM `test` WHERE gmt_create < '${yearmonth6_date}';

This statement removes all rows already copied to the archive table, keeping `test` free of records older than six months.

Click Preview to validate the SQL statement.
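Together, the two nodes implement a copy-then-delete archival pattern. The end-to-end behavior can be sketched with Python's built-in sqlite3 module; note this is a simulation only (SQLite has no CREATE TABLE ... LIKE, so the archive table and the fixed cutoff date are stand-ins for the tutorial's templated SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table mirroring the tutorial's `test` table (simplified columns)
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, gmt_create TEXT, content TEXT)")
rows = [("2020-%02d-01 01:00:00" % m, "value%d" % m) for m in range(1, 11)]
cur.executemany("INSERT INTO test (gmt_create, content) VALUES (?, ?)", rows)

cutoff = "2020-05-01"  # stands in for ${yearmonth6_date}

# Node 1: Migrate Data -- create the archive table and copy rows older than the cutoff
cur.execute("CREATE TABLE IF NOT EXISTS test_archive "
            "(id INTEGER PRIMARY KEY, gmt_create TEXT, content TEXT)")
cur.execute("INSERT INTO test_archive SELECT * FROM test WHERE gmt_create < ?", (cutoff,))

# Node 2: Clear Historical Data -- delete the rows that were just copied
cur.execute("DELETE FROM test WHERE gmt_create < ?", (cutoff,))
conn.commit()

archived = cur.execute("SELECT COUNT(*) FROM test_archive").fetchone()[0]
remaining = cur.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(archived, remaining)  # 4 6 -- January through April archived, six rows remain
```

Because the same cutoff drives both statements, every row that leaves `test` is guaranteed to exist in the archive table first, provided the Migrate Data node completes before the delete runs — which is exactly what the node connection in Step 4 enforces.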
Step 4: Connect nodes and set up scheduling
Connect the nodes
On the canvas, hover over the Migrate Data node, click the hollow circle on its right side, and drag the connection line to the Clear Historical Data node. This defines the execution order: Migrate Data runs first, then Clear Historical Data.
Configure scheduling
At the bottom of the page, click the Task Flow Information tab.
In the Scheduling Settings section, configure the following parameters:
| Parameter | Value |
|---|---|
| Enable scheduling | Turn on |
| Scheduling type | Timed Scheduling/Cyclic scheduling |
| Effective time | 1970-01-01 – 9999-01-01 (default; the schedule runs indefinitely) |
| Scheduling cycle | Month |
| Specified time | 1st of each month |
| Specific point in time | 01:00 |
| Time zone | Your preferred time zone |
| Cron expression | Auto-generated based on the settings above |

This example schedules the task flow to run at 01:00 on the first day of each month. Adjust the settings to fit your requirements. For more options, see Overview.
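As a sanity check on the cadence, the next fire time for "01:00 on the 1st of each month" can be computed with stdlib Python. This mimics the schedule only; DMS's own cron evaluation and time-zone handling may differ:

```python
from datetime import datetime

def next_monthly_run(now: datetime) -> datetime:
    """Next occurrence of 01:00 on the 1st of a month, strictly after `now`."""
    candidate = now.replace(day=1, hour=1, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; roll over to the 1st of the following month
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate

print(next_monthly_run(datetime(2023, 6, 27, 10, 30)))  # 2023-07-01 01:00:00
```

For example, from mid-June the next run lands on July 1 at 01:00, and a December date rolls over correctly into January of the following year.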
Step 5: Run the task flow and verify the results
Test run
Click Try Run in the upper-left corner of the canvas. Check the log output:
status SUCCEEDED in the last line: the test run completed successfully.
status FAILED in the last line: the test run failed. Review the logs to identify which node failed and why, fix the configuration, and try again.
Verify the results
On the SQLConsole tab, run queries to confirm that:
Records with `gmt_create` before the six-month cutoff have been copied to the archive table `test_${yearmonth6_name}`.
Those same records no longer exist in the `test` table.
For details on querying data, see Manage a database on the SQLConsole tab.