The task orchestration feature of Data Management (DMS) schedules and orchestrates tasks. You can create a task flow that consists of one or more task nodes to implement complex scheduling and improve data development efficiency.

Supported database types

  • Relational databases
    • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB-X, and MySQL databases from other sources
    • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases from other sources
    • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, and PostgreSQL databases from other sources
    • OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases
    • PolarDB for Oracle
    • Oracle
    • DM
    • Db2
  • Data warehouses
    • AnalyticDB for MySQL
    • AnalyticDB for PostgreSQL
    • DLA
    • MaxCompute
    • Hologres
  • Object storage: OSS

Procedure

  1. Go to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Task Orchestration.
  3. Create a task flow.
    1. Click Create Task Flow.
      Note If you are using the previous version of the DMS console, click the Develop Space icon on the left-side navigation submenu of the Task Orchestration tab. On the page that appears, click New Task Flow.
    2. In the Create Task Flow dialog box, set the Task Flow Name and Description parameters as needed and click OK.
  4. In the Task Type list on the left side of the canvas, drag task nodes to the blank area on the canvas. For more information, see Task node types.
  5. To configure a task node, click the task node on the canvas and then click the Settings icon. You can also double-click the task node to go to the configuration page.
  6. Connect the task nodes to form a task flow. Move the pointer over the upstream node, click and hold the circle on the right side of the upstream node, and then draw a line from the circle to the downstream node.
    For example, to connect an SQL Assignment for Single Instance node to a downstream Conditional Branch node, move the pointer over the SQL Assignment for Single Instance node, click and hold the circle on its right side, and then draw a line from the circle to the Conditional Branch node.
  7. In the lower part of the page, configure and view information about the task flow.
    1. Click the Task Flow Information tab and configure the basic settings of the task flow.
      In the Properties section, set the Task Flow Name, Description, Owner, Stakeholders, Error Handling Policy, and Concurrency Control Policy parameters as needed.
    2. In the Scheduling Settings section, turn on Enable Scheduling to configure the scheduling cycle for the task flow.
      Table 1. Scheduling properties
      • Scheduling Type: The scheduling type of the task flow. Valid values:
        • Cyclic scheduling: The task flow is periodically scheduled. For example, the task flow is run once a week.
        • Schedule once: The task flow is run only once, at a specific point in time. You need to specify only that point in time.
      • Effective Time: The period during which the scheduling properties take effect. The default period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling properties permanently take effect.
      • Scheduling Cycle: The scheduling cycle of the task flow. Valid values:
        • Hour: The task flow is run within the hours that you select.
        • Day: The task flow is run at the specified point in time every day.
        • Week: The task flow is run at the specified point in time on the days that you select every week.
        • Month: The task flow is run at the specified point in time on the days that you select every month.
      • Timed Scheduling: Specify one of the following scheduling methods:
        • Scheduling at a specific interval:
          • Starting Time: the time when the task flow starts to run.
          • Intervals: the interval at which the task flow is run. Unit: hours.
          • End Time: the time when the task flow stops running.
          For example, if you set the Starting Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59, DMS runs the task flow at 00:00, 06:00, 12:00, and 18:00.
        • Scheduling at a specified point in time: You must set the Specified Time parameter. For example, if you select the hours 0 and 5, DMS schedules the task flow at 00:00 and 05:00.
      • Specified Time: The days on which the task flow is run.
        • If you set the Scheduling Cycle parameter to Week, select one or more days of the week from the drop-down list.
        • If you set the Scheduling Cycle parameter to Month, select one or more days of the month from the drop-down list.
      • Specific Point in Time: The point in time on the specified days at which the task flow is run. For example, if you set this parameter to 02:55, DMS runs the task flow at 02:55 on the specified days.
      • Cron Expression: The CRON expression that is automatically generated based on the specified scheduling cycle and time settings.
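The interval-based scheduling example above (Starting Time 00:00, Intervals 6, End Time 20:59) can be sanity-checked with a short script. This is a standalone sketch, not part of DMS: the `interval_runs` helper is hypothetical and only reproduces the arithmetic that determines the run times.

```python
from datetime import datetime, timedelta

def interval_runs(start, interval_hours, end):
    """Return the times of day at which a task flow runs, given a
    starting time, an interval in hours, and an end time (hypothetical
    helper; mirrors the Starting Time/Intervals/End Time settings)."""
    runs = []
    t = start
    while t <= end:
        runs.append(t.strftime("%H:%M"))
        t += timedelta(hours=interval_hours)
    return runs

# Starting Time 00:00, Intervals 6, End Time 20:59
day = datetime(2024, 1, 1)
print(interval_runs(day, 6, day.replace(hour=20, minute=59)))
# → ['00:00', '06:00', '12:00', '18:00']
```

The run at 24:00 is skipped because it falls after the 20:59 end time, which is why the example stops at 18:00.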
    3. To view the operation logs of the task flow, click the Operations tab.
    4. If you want to receive notifications about the execution status of the task flow, turn on the switches on the Notification Configurations tab as needed. The following list describes the switches:
      • Success Notification: You are notified if the task flow is run as expected.
      • Failure Notification: You are notified if the task flow fails.
      • Timeout Notification: You are notified if the execution of the task flow times out.
  8. Publish the task flow. For more information, see Publish a task flow.

Task node types

  • Data integration
    • DTS Data Migration: Migrates data of selected tables or all tables from one database to another. This type of node supports full data migration and can migrate both data and schemas. For more information, see Configure a DTS data migration node.
    • Batch Integration: Synchronizes data between data sources. You can use this type of node in scenarios such as data migration and data transmission. For more information, see Configure a batch integration node.
  • Data processing
    • Single Instance SQL: Executes SQL statements in a specific relational database.
    • Cross-Database Spark SQL: Uses the Spark engine to process and transmit large amounts of data across databases. This type of node applies to cross-database data synchronization and processing. For more information, see Configure a cross-database Spark SQL node.
    • Cross-Database SQL: Uses dynamic SQL (DSQL) statements to query data across databases. You can use this type of node to analyze data across databases and migrate small amounts of data.
    • DLA Serverless Spark: Configures Spark jobs based on the serverless Spark engine of Data Lake Analytics (DLA). For more information, see Create and run Spark jobs.
    • DLA Spark SQL: Uses SQL statements to submit jobs to the Spark clusters of DLA.
  • General operations
    • SQL Assignment for Single Instance: Assigns the data that is obtained by using a SELECT statement to output variables. The output variables can be used as the input variables of the downstream node. For more information, see Configure an SQL assignment node.
    • Conditional Branch: Makes conditional judgments in task flows. During the execution of a task flow, if the conditional expression of a conditional branch node evaluates to true, the subsequent tasks are run. Otherwise, the subsequent tasks are not run. For more information, see Configure a conditional branch node.
    • Script: Uses Database Gateway-based script tasks to execute scripts periodically or at a specific point in time. For more information, see Configure a script node.
  • Status check
    • Check Whether Data Exists in Table After Specified Time: Checks whether incremental data exists in a table after a specific point in time.
    • Lindorm File Check: Checks whether a file exists in an ApsaraDB for Lindorm instance that supports Hadoop Distributed File System (HDFS).
    • Audit Task: Checks the data quality of a table. After you specify a quality rule for the table and a scheduling cycle for the audit task, DMS checks the data quality of the table and generates a report.
    • Check for Task Flow Dependency: Configures self-dependency for a task flow and dependencies across task flows. You can make a task flow depend on another task flow or a task node.
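To illustrate how an SQL Assignment for Single Instance node can feed a Conditional Branch node, the following sketch pairs a SELECT statement with a conditional expression. The table name `orders`, the column alias `row_count`, and the `${row_count}` variable syntax are illustrative assumptions; the exact variable reference format depends on your DMS version, so check the node configuration pages linked above.

```sql
-- SQL Assignment for Single Instance node (illustrative):
-- each column of the SELECT result becomes an output variable.
SELECT COUNT(*) AS row_count
FROM orders                          -- hypothetical table
WHERE gmt_create >= CURRENT_DATE;

-- Conditional Branch node (illustrative): downstream tasks run only
-- when the conditional expression evaluates to true, for example:
--   ${row_count} > 0
```

A pattern like this lets the task flow skip downstream processing on days when no new rows have arrived.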

References