The task orchestration feature of Data Management (DMS) enables you to orchestrate and schedule tasks. You can create a task flow that contains one or more task nodes to implement complex scheduling and improve data development efficiency.

Supported database types

  • Relational databases
    • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB for Xscale, and MySQL databases from other sources
    • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases from other sources
    • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, and PostgreSQL databases from other sources
    • OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases
    • PolarDB for PostgreSQL (Compatible with Oracle)
    • Oracle
    • DM
    • Db2
  • NoSQL database: ApsaraDB for Lindorm
  • Data warehouses
    • AnalyticDB for MySQL
    • AnalyticDB for PostgreSQL
    • DLA
    • MaxCompute
    • Hologres
  • Object storage: OSS

Task orchestration flowchart


Procedure

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Task Orchestration.
  3. Create a task flow.
    1. Click Create Task Flow.
    2. In the Create Task Flow dialog box, specify the Task Flow Name and Description parameters and click OK.
  4. Create task nodes.
    1. Add task nodes. In the Task Type list on the left side of the canvas, drag task nodes to the blank area on the canvas. For more information, see Task node types.
    2. Configure the task nodes. To configure a task node, click the task node on the canvas and then click the Settings icon. You can also double-click the task node to go to the configuration page.
    3. Optional: Connect the task nodes to form a task flow. Move the pointer over the upstream node, click and hold the circle on the right side of the node, and then draw a line from the circle to the downstream node.
      For example, to connect an SQL Assignment for Single Instance node to a downstream Conditional Branch node, move the pointer over the SQL Assignment for Single Instance node, click and hold the circle on its right side, and then draw a line from the circle to the Conditional Branch node.
  5. In the lower part of the page, configure and view information about the task flow.
    1. Click the Task Flow Information tab and specify the following parameters.
      • Task Flow Name: The name of the task flow.
      • Description: The purpose or objective of the task flow. Specify a clear description to reduce communication costs.
      • Owner: The owner can modify the task nodes and configurations of the task flow and perform test runs. The owner also receives an alert if the task flow fails.
        Note: After you change the owner, you must republish the task flow for the change to take effect.
      • Stakeholders: A stakeholder of a task flow can view the task flow and task configurations and perform test runs on tasks. However, a stakeholder does not have permissions to edit the task flow or task configurations.
        Note: DMS administrators and database administrators (DBAs) are the default stakeholders of each task flow. They can also change the owners of task flows.
      • Error Handling Policy: The action to take when an error occurs for the first time during the execution of the task flow. Valid values:
        • Complete Running Tasks: If an error occurs, tasks that are running continue to run until they are completed. Other tasks are not run. After the task flow is complete, it is marked as failed.
        • Immediately Stop All Tasks: If an error occurs, all tasks in the task flow are stopped.
        • Complete Unrelated Tasks: If an error occurs in node A, tasks that are running continue to run. Subsequent tasks that do not depend on node A are also run.
      • Concurrency Control Policy: The policy that applies if you run the task flow while a previous execution of the task flow is still running. Valid values:
        • Skip: The system skips the new execution.
        • Ignore: The system runs the new execution. Make sure that the previous execution of the task flow is not affected.
        • Run in parallel: The system runs the two executions in parallel.
          • Run in parallel 1: The system suspends Task A of the new execution until Task A of the previous execution is completed.
          • Run in parallel 2: The system suspends Task A of the new execution until Task A and its downstream tasks in the previous execution are completed.
      • Set task flow to public: Specifies whether to make the task flow public. If you make the task flow public, all users of the tenant can view the task flow, but they cannot edit or run it. Only the owner can edit and run the task flow. Unless you are the owner, the task flow is not included in the statistics of task flows in different states on the dashboard.
    2. In the Scheduling Settings section of the Task Flow Information tab, turn on Enable Scheduling and configure the scheduling cycle.
      • Scheduling Type: The scheduling type of the task flow. Valid values:
        • Cyclic scheduling: The task flow is periodically scheduled. For example, the task flow is run once a week.
        • Schedule once: The task flow is run once at a specific point in time. You need to specify only the point in time at which the task flow is run.
      • Effective Time: The period during which the scheduling settings take effect. The default period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling settings permanently take effect.
      • Scheduling Cycle: The scheduling cycle of the task flow. Valid values:
        • Hour: The task flow is run within the hours that you specify. If you select this value, you must also specify the Timed Scheduling parameter.
        • Day: The task flow is run at the specified point in time every day. If you select this value, you must also specify the Specific Point in Time parameter.
        • Week: The task flow is run at the specified point in time on the specified days of each week. If you select this value, you must also specify the Specified Time and Specific Point in Time parameters.
        • Month: The task flow is run at the specified point in time on the specified days of each month. If you select this value, you must also specify the Specified Time and Specific Point in Time parameters.
      • Timed Scheduling: The scheduling method of the task flow. DMS provides the following scheduling methods:
        • Scheduling at a specific interval:
          • Start Time: the time at which the task flow starts to run.
          • Intervals: the interval at which the task flow is run. Unit: hours.
          • End Time: the time at which the task flow stops running.
          For example, if you set the Start Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59, DMS runs the task flow at 00:00, 06:00, 12:00, and 18:00.
        • Scheduling at specific points in time: You must set the Specified Time parameter. For example, if you select 0 Hour and 5 Hour, DMS runs the task flow at 00:00 and 05:00.
      • Specified Time:
        • If you set the Scheduling Cycle parameter to Week, you can select one or more days of the week from the drop-down list.
        • If you set the Scheduling Cycle parameter to Month, you can select one or more days of the month from the drop-down list.
      • Specific Point in Time: The point in time on the specified days at which the task flow is run. For example, if you set this parameter to 02:55, DMS runs the task flow at 02:55 on the specified days.
      • CRON Expression: The CRON expression that is automatically generated based on the specified scheduling cycle and time settings.
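      The interval-scheduling example above can be sketched in Python. This is only an illustration of the schedule arithmetic, not DMS behavior, and the function name is hypothetical:

      ```python
      from datetime import datetime, timedelta

      def interval_run_times(start, end, interval_hours):
          """Return the HH:MM run times of an interval-scheduled task flow
          within one day. Runs repeat every `interval_hours` hours from
          `start` and stop once the next run would fall after `end`."""
          current = datetime.strptime(start, "%H:%M")
          stop = datetime.strptime(end, "%H:%M")
          runs = []
          while current <= stop:
              runs.append(current.strftime("%H:%M"))
              current += timedelta(hours=interval_hours)
          return runs

      # Start Time 00:00, Intervals 6, End Time 20:59 -> four runs per day
      print(interval_run_times("00:00", "20:59", 6))
      # ['00:00', '06:00', '12:00', '18:00']
      ```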
    3. Optional: To view the operation logs of the task flow, click the Operations tab.
    4. Optional: If you want to receive notifications about the execution status of the task flow, turn on the following switches on the Notification Configurations tab as needed:
      • Success Notification: You are notified if the task flow is run as expected.
      • Failure Notification: You are notified if the task flow fails.
      • Timeout Notification: You are notified if the execution of the task flow times out.
  6. Publish the task flow. For more information, see Publish or unpublish a task flow.
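The way a Conditional Branch node gates its downstream tasks on a variable produced by an upstream SQL Assignment for Single Instance node can be sketched as follows. This is a minimal Python illustration of the control flow only; DMS defines its own variable and expression syntax, and the function and variable names here are hypothetical:

```python
def should_run_downstream(variables, expression):
    """Evaluate a boolean expression against task flow variables.
    Mimics the gating behavior of a Conditional Branch node."""
    # eval with no builtins, so only the supplied variables are visible
    return bool(eval(expression, {"__builtins__": {}}, dict(variables)))

# Suppose an upstream SQL Assignment for Single Instance node produced
# an output variable row_count = 42 (hypothetical name):
variables = {"row_count": 42}
if should_run_downstream(variables, "row_count > 0"):
    print("condition is true: downstream tasks run")
else:
    print("condition is false: downstream tasks are skipped")
```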

Task node types

  • Data integration
    • DTS data migration: Migrates data of selected tables or all tables from one database to another. This type of node supports full data migration and can migrate both data and schemas. For more information, see Configure a DTS data migration node.
    • Batch Integration: Synchronizes data between data sources. You can use this type of node in scenarios such as data migration and data transmission. For more information, see Configure a batch integration node.
  • Data processing
    • Single Instance SQL: Executes SQL statements in a specific relational database.
      Note: If you enable the lock-free schema change feature for the specified database instance, DMS uses this feature when you run Single Instance SQL tasks. This prevents tables from being locked. For more information, see Enable the lock-free schema change feature.
    • Cross-Database Spark SQL: Uses the Spark engine to process and transmit large amounts of data across databases. You can use this type of node for cross-database data synchronization and processing. For more information, see Configure a cross-database Spark SQL node.
    • DLA Serverless Spark: Configures Spark jobs based on the serverless Spark engine of Data Lake Analytics (DLA). For more information, see Create and run Spark jobs.
    • Lock-free Data Change: Uses the lock-free data change feature of DMS to perform operations such as UPDATE and DELETE operations. For more information, see Overview.
      Note: You can use this type of node only if the lock-free schema change feature is enabled for the database instance. For more information, see Enable the lock-free schema change feature.
    • DLA Spark SQL: Uses SQL statements to submit jobs to the Spark clusters of DLA.
  • General operations
    • SQL Assignment for Single Instance: Assigns the data that is obtained by using a SELECT statement to output variables. The output variables can be used as the input variables of downstream nodes. For more information, see Configure an SQL assignment node.
    • Conditional Branch: Performs conditional judgment in a task flow. During the execution of a task flow, if the conditional expression of a conditional branch node evaluates to true, the subsequent tasks are run. Otherwise, the subsequent tasks are not run. For more information, see Configure a conditional branch node.
    • DLA one-click DW: Uploads the data in a database to Object Storage Service (OSS) to build a data warehouse by using the one-click data warehousing feature of DLA. For more information, see One-click data warehousing.
    • DBS Backup: Uses Database Backup (DBS) to back up data from a database to the OSS bucket provided by DBS. For more information, see DBS.
    • Script: Uses script tasks based on Database Gateway to execute scripts periodically or at a specific point in time. For more information, see Configure a script node.
  • Status check
    • Check Whether Data Exists in Table After Specified Time: Checks whether incremental data exists in a table after a specific point in time.
    • Lindorm File Check: Checks whether a file exists in an ApsaraDB for Lindorm instance that supports Hadoop Distributed File System (HDFS).
    • SQL Status Check: Checks the status of data by using SQL statements. For example, you can check whether a class contains more than 10 boys.
    • Audit Task: Checks the data quality of a table. After you specify a quality rule for the table and a scheduling cycle for the audit task, DMS checks the data quality of the table and generates a report.
    • Check for Task Flow Dependency: Configures self-dependency for a task flow and dependencies across task flows. You can configure the task flow to depend on another task flow or a task node. For more information, see Configure a dependency check node for a task flow.
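The SQL Status Check example above (checking whether a class contains more than 10 boys) can be illustrated with a small self-contained sketch. SQLite stands in for the checked database, and the table and column names are hypothetical, not part of DMS:

```python
import sqlite3

# In-memory SQLite database stands in for the checked database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [(f"student{i}", "M" if i % 2 == 0 else "F") for i in range(30)],
)

# The check query and its threshold: the status check passes only if
# the class contains more than 10 boys.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM class WHERE gender = 'M'"
).fetchone()
check_passed = count > 10
print(count, check_passed)  # 15 True
```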

References