Data Management: Overview

Last Updated: Jun 08, 2023

The task orchestration feature of Data Management (DMS) is used to orchestrate and schedule tasks. You can create a task flow that contains one or more task nodes to implement complex scheduling and improve data development efficiency.

Supported database types

  • Relational databases:

    • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, MyBase for MySQL, PolarDB for Xscale, and MySQL databases from other sources

    • SQL Server: ApsaraDB RDS for SQL Server, MyBase for SQL Server, and SQL Server databases from other sources

    • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, MyBase for PostgreSQL, and PostgreSQL databases from other sources

    • OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases

    • PolarDB for PostgreSQL (Compatible with Oracle)

    • Oracle

    • DM

    • Db2

  • NoSQL: ApsaraDB for Lindorm

  • Data warehouses:

    • AnalyticDB for MySQL

    • AnalyticDB for PostgreSQL

    • DLA

    • MaxCompute

    • Hologres

  • Object storage: OSS

Task orchestration flowchart

[Figure: task orchestration flowchart]
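
A task flow of connected task nodes can be thought of as a directed graph in which each connection points from an upstream node to a downstream node. The following Python sketch is only an illustration of that idea, not how DMS is implemented: the node names and the `upstream` mapping are hypothetical. It derives one valid execution order and finds every node that depends on a given node, which is the kind of relationship the Complete Unrelated Tasks error handling policy in the procedure refers to.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task flow: each key is a task node, each value is the set of
# upstream nodes that must finish first. Node names are illustrative only.
upstream = {
    "sql_assignment": set(),
    "conditional_branch": {"sql_assignment"},
    "single_instance_sql": {"conditional_branch"},
    "audit_task": set(),
}


def execution_order(upstream_map):
    """Return one valid order in which the nodes could run."""
    return list(TopologicalSorter(upstream_map).static_order())


def downstream_of(node, upstream_map):
    """Return every node that directly or indirectly depends on `node`."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for candidate, parents in upstream_map.items():
            if current in parents and candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found


order = execution_order(upstream)
print(order)
print(downstream_of("sql_assignment", upstream))
```

In this model, `audit_task` has no connection to `sql_assignment`, so it is not in the downstream set and would count as an unrelated task if `sql_assignment` failed.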

Procedure

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Task Orchestration.
  3. Create a task flow.

    1. On the Task Orchestration page, click Create Task Flow.

    2. In the Create Task Flow dialog box, set the Task Flow Name and Description parameters, and then click OK.

  4. Create and configure task nodes.

    1. Create task nodes. In the Task Type list on the left side of the canvas, drag the required types of task nodes to the blank area on the canvas. For more information, see Task node types.

    2. Configure task nodes. To configure a task node, click the task node on the canvas and then click the Settings icon. You can also double-click the task node to go to the configuration page.

    3. Optional: Connect task nodes to form a task flow. Move the pointer over the upstream node, click and hold the circle on the right side of the upstream node, and then draw a line from the circle to the downstream node.

      For example, to connect the SQL Assignment for Single Instance node to its downstream Conditional Branch node, move the pointer over the SQL Assignment for Single Instance node, click and hold the circle on its right side, and then draw a line from the circle to the Conditional Branch node.

  5. In the lower part of the page, configure and view information about the task flow.

    1. Click the Task Flow Information tab and set the parameters that are described in the following list.

      • Task Flow Name: The name of the task flow.

      • Description: The purpose or objective of the task flow. Specify a clear description to reduce communication costs.

      • Owner: The owner of the task flow. The owner can modify the task nodes and configurations of the task flow and perform test runs. The owner can also receive an alert if the task flow fails.

        Note: After you change the owner, you must republish the task flow for the change to take effect.

      • Stakeholders: The stakeholders of the task flow. A stakeholder can view the task flow and task configurations and perform test runs on tasks, but cannot modify the task flow or task configurations.

        Note: DMS administrators and database administrators (DBAs) are the default stakeholders of each task flow. They can also change the owners of task flows.

      • Error Handling Policy: The action to take when an error occurs for the first time during the execution of a task flow. Valid values:

        • Complete Running Tasks: If an error occurs, tasks that are running continue to run until they are complete. Other tasks are not run. After the task flow is complete, it is marked as a failed task flow.

        • Immediately Stop All Tasks: If an error occurs, all tasks in the task flow are stopped.

        • Complete Unrelated Tasks: If an error occurs in node A, tasks that are running continue to run. Subsequent tasks that are not related to node A are also run.

      • Concurrency Control Policy: The policy that applies when the task flow is triggered again while a previous run of the task flow is still in progress. Valid values:

        • Skip: The system skips the task flow.

        • Ignore: The system runs the task flow. Make sure that the previous execution of the task flow is not affected.

        • Run in Parallel: The system runs the two instances of the task flow concurrently.

          • Mode 1: The system suspends Task A until Task A of the previous task flow is complete.

          • Mode 2: The system suspends Task A until Task A and its downstream tasks in the previous task flow are complete.

      • Set task flow to public: Specifies whether to set the task flow to public. If you set the task flow to public, all the users of the tenant can view the task flow, but they cannot modify or run the task flow. The owner can modify and run the task flow. The task flow is not included in the statistics of task flows in different states on the dashboard unless you are the owner of the task flow.

    2. In the Scheduling Settings section of the Task Flow Information tab, turn on Enable Scheduling and configure the scheduling cycle.

      • Scheduling Type: The scheduling type of the task flow. Valid values:

        • Cyclic scheduling: The task flow is periodically scheduled. For example, the task flow is run once a week.

        • Schedule once: The task flow is run once at a specific point in time. You need to specify only the point in time when the task flow is run.

      • Effective Time: The period during which the scheduling properties take effect. The default time period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling properties permanently take effect.

      • Scheduling Cycle: The scheduling cycle of the task flow. Valid values:

        • Hour: The task flow is run within the hours that you specify. If you select this value, you must set the Timed Scheduling parameter.

        • Day: The task flow is run at the specified point in time every day. If you select this value, you must set the Specific Point in Time parameter.

        • Week: The task flow is run at the specified point in time on the days that you select every week. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

        • Month: The task flow is run at the specified point in time on the days that you select every month. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

      • Timed Scheduling: The scheduling method of the task flow. DMS provides the following scheduling methods:

        • Run at an interval:

          • Starting Time: the time when DMS runs the task flow.

          • Intervals: the interval at which the task flow is run. Unit: hours.

          • End Time: the time when DMS stops running the task flow.

          For example, you can set the Starting Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59. In this case, DMS runs the task flow at 00:00, 06:00, 12:00, and 18:00.

        • Run at the specified point in time: You can select the hours at which DMS runs the task flow by using the Specified Time parameter.

          For example, if you select 0Hour and 5Hour, DMS runs the task flow at 00:00 and 05:00.

      • Specified Time:

        • If you set the Scheduling Cycle parameter to Week, you can select one or more days of a week from the drop-down list.

        • If you set the Scheduling Cycle parameter to Month, you can select one or more days of a month from the drop-down list.

      • Specific Point in Time: The point in time of the specified days at which the task flow is run.

        For example, if you set this parameter to 02:55, DMS runs the task flow at 02:55 on the specified days.

      • Cron Expression: The cron expression that is automatically generated based on the values that you specify for the preceding parameters.

    3. Optional: To view the operation log of the task flow, click the Operations tab.

    4. Optional: If you want to receive notifications about the execution status of the task flow, turn on the following switches on the Notification Configurations tab based on your business requirements.

      • Success Notification: You are notified if the task flow is run as expected.

      • Failure Notification: You are notified if the task flow fails.

      • Timeout Notification: You are notified if the execution of the task flow times out.

  6. Publish the task flow. For more information, see Publish or unpublish a task flow.
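
The Run at an interval example in the scheduling settings (Starting Time 00:00, Intervals 6, End Time 20:59) can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only: the function name `interval_run_times` is hypothetical, and treating End Time as an inclusive upper bound is an assumption rather than documented DMS behavior.

```python
from datetime import datetime, timedelta


def interval_run_times(start, interval_hours, end):
    """List the HH:MM run times from `start` up to `end`, stepping by whole hours.

    Assumption: a run that falls exactly on `end` is still included.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    current = datetime.strptime(start, "%H:%M")
    last = datetime.strptime(end, "%H:%M")
    times = []
    while current <= last:
        times.append(current.strftime("%H:%M"))
        current += timedelta(hours=interval_hours)
    return times


print(interval_run_times("00:00", 6, "20:59"))
# ['00:00', '06:00', '12:00', '18:00']
```

The next candidate after 18:00 would be 00:00 on the following day, which is past the End Time, so the list stops at four runs, matching the example in the scheduling settings.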

Task node types

  • Data processing:

    • Single Instance SQL: Executes SQL statements in a specific relational database.

      Note: If you enable the lock-free schema change feature for the specified database instance, DMS uses this feature when you run Single Instance SQL tasks. This prevents tables from being locked. For more information, see Enable the lock-free schema change feature.

      References: N/A

  • General operations:

    • SQL Assignment for Single Instance: Assigns the data that is obtained by using a SELECT statement to output variables. The output variables can be used as the input variables of the downstream node.

      References: Configure an SQL assignment node

    • Conditional Branch: Evaluates a condition in a task flow. During the execution of a task flow, if the conditional expression of a conditional branch node evaluates to true, the subsequent tasks are run. Otherwise, the subsequent tasks are not run.

      References: Configure a conditional branch node

    • ECS Remote Commands: Runs shell, PowerShell, or batch scripts on a remote Elastic Compute Service (ECS) instance by using Cloud Assistant.

      References: N/A

  • Status checking:

    • Check Whether Data Exists in Table After Specified Time: Checks whether incremental data exists in a table after a specific point in time.

      References: N/A

    • Audit Task: Checks the data quality of a table. After you specify a quality rule for the table and a scheduling cycle for the audit task, DMS checks the data quality of the table and generates a report.

      References: N/A

    • Check for Task Flow Dependency: Configures self-dependency for a task flow and dependencies across task flows. You can configure the task flow to depend on another task flow or a task node.

      References: Configure a dependency check node for a task flow
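
To illustrate how an SQL Assignment for Single Instance node can feed a Conditional Branch node, the following Python sketch imitates the pattern locally. It does not use DMS variable or expression syntax: the in-memory SQLite database, the `orders` table, and the helper names `sql_assignment` and `conditional_branch` are all assumptions made for this example.

```python
import sqlite3

# In-memory SQLite stands in for the relational database; the table and
# column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders (created_at) VALUES (?)",
    [("2023-06-07 23:10:00",), ("2023-06-08 01:15:00",), ("2023-06-08 02:40:00",)],
)


def sql_assignment(connection, query, params=()):
    """Run a SELECT and expose the first row's columns as output variables."""
    cursor = connection.execute(query, params)
    row = cursor.fetchone()
    names = [column[0] for column in cursor.description]
    return dict(zip(names, row)) if row is not None else {}


def conditional_branch(variables, predicate):
    """Return True if the downstream tasks should run, False otherwise."""
    return bool(predicate(variables))


# Upstream node: count rows created after a given point in time.
outputs = sql_assignment(
    conn,
    "SELECT COUNT(*) AS new_rows FROM orders WHERE created_at > ?",
    ("2023-06-08 00:00:00",),
)
# Downstream node: continue only if new rows exist.
run_downstream = conditional_branch(outputs, lambda v: v.get("new_rows", 0) > 0)
print(outputs, run_downstream)
```

Two of the three sample rows are later than the cutoff, so the output variable `new_rows` is 2 and the branch condition evaluates to true. With a cutoff later than every row, the condition would be false and the downstream tasks would be skipped.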
