
Data Management: Terms

Last Updated: Dec 13, 2023

This topic introduces the basic concepts related to task orchestration.

Task

A task is an operation to be performed.

Task node

A task node represents a task in a task flow. A node can be placed at a specific position in a task flow. An edge is used to establish a dependency between task nodes. The following types of task nodes are provided:

  • Data integration nodes

  • Data processing nodes

  • Status checking nodes

  • General nodes

Node instance

Multiple node instances form a task flow instance. The dependencies among node instances may vary based on the version of the task flow.

When you restore a task flow, only the node instances that failed to run are rerun, following the dependencies. When you rerun a task flow, all node instances are rerun, following the dependencies. Both restore and rerun operations use the version of the task flow that was run.
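The difference between restoring and rerunning can be sketched in Python. This is a minimal illustration, not the product's implementation: the `DEPENDENCIES` graph, the `nodes_to_run` function, and the string states are hypothetical, and `dependencies` stands for the node dependencies of the task flow version that was run. The sketch assumes that "failed to run" covers any instance that did not end in SUCCESS, including nodes that never started because an upstream node failed.

```python
from graphlib import TopologicalSorter

# Hypothetical node-instance graph: each node maps to the set of its upstream nodes.
DEPENDENCIES = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def nodes_to_run(dependencies, states, mode):
    """Select the node instances to execute, in upstream-before-downstream order.

    Assumption: for "restore", any instance not in SUCCESS (FAIL, or never
    started because an upstream node failed) counts as "failed to run".
    """
    order = list(TopologicalSorter(dependencies).static_order())
    if mode == "rerun":
        return order  # every node instance runs again
    if mode == "restore":
        return [node for node in order if states.get(node) != "SUCCESS"]
    raise ValueError(f"unknown mode: {mode}")


# "clean" failed, so "aggregate" and "report" never ran.
print(nodes_to_run(DEPENDENCIES, {"extract": "SUCCESS", "clean": "FAIL"}, "restore"))
```

Because the graph is a simple chain, the restore call above selects `clean`, `aggregate`, and `report`, while a rerun would select all four nodes.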

A node instance can be in one of the following states:

  • RUNNING

  • SUSPEND

  • FAIL

  • SUCCESS

  • KILLING

  • SKIPPED

  • CANCELLED

  • QUEUED

Task flow

A task flow is a group of task nodes that are arranged in a specific order. A task node can depend on another task node, and a directed acyclic graph (DAG) represents the dependencies among the task nodes. All task nodes in a task flow are run according to these dependencies.
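To illustrate what the DAG requirement means, here is a minimal Python sketch (not the product's API) that rejects cyclic dependencies and groups task nodes into layers in which every node's upstream nodes appear in an earlier layer. The function name `execution_layers` and the dictionary input format are assumptions for illustration; the sketch does not claim that the service runs nodes in the same layer in parallel.

```python
from graphlib import CycleError, TopologicalSorter


def execution_layers(dependencies):
    """Group task nodes so that every node's upstream nodes are in earlier layers.

    `dependencies` maps each task node to the set of upstream nodes it depends on.
    Raises ValueError if the dependencies contain a cycle (that is, are not a DAG).
    """
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # detects cycles before any node is handed out
    except CycleError as err:
        # The second element of err.args lists the nodes that form the cycle.
        raise ValueError(f"task flow is not a DAG; cycle: {err.args[1]}") from err
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose upstream nodes are all done
        layers.append(ready)
        sorter.done(*ready)
    return layers


print(execution_layers({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}))
```

For the diamond-shaped flow above, `b` and `c` both wait only on `a`, and `d` waits on both of them.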

Task flow version

A task flow version is a snapshot of a task flow at a point in time. A task flow version is in one of the following states:

  • Editing version: the current editable version of a task flow. Each time you test run a task flow, an editing version of the task flow is saved. A test run is manually triggered and is based on the latest editing version.

  • Published version: a version that is generated after a task flow is published. Each scheduled run of a task flow is based on the latest published version.

Trigger method of a task flow

A task flow can be triggered manually or on a schedule.

  • Manual trigger: The task flow is manually triggered based on the latest editing version.

  • Scheduled trigger: The task flow is scheduled to run based on the latest published version.
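The relationship between trigger methods and versions can be modeled as follows. This is a hypothetical Python sketch: the `TaskFlow` class and its method names are illustrative, and versions are represented as plain strings stored oldest to newest.

```python
from dataclasses import dataclass, field


@dataclass
class TaskFlow:
    """Hypothetical model; each version list is ordered oldest to newest."""
    editing_versions: list = field(default_factory=list)
    published_versions: list = field(default_factory=list)

    def save_test_run_version(self, snapshot):
        # Each test run saves a new editing version.
        self.editing_versions.append(snapshot)

    def publish(self, snapshot):
        # Publishing generates a new published version.
        self.published_versions.append(snapshot)

    def version_for(self, trigger):
        """Manual triggers use the latest editing version; scheduled triggers use the latest published version."""
        if trigger == "manual":
            versions = self.editing_versions
        elif trigger == "scheduled":
            versions = self.published_versions
        else:
            raise ValueError(f"unknown trigger: {trigger}")
        if not versions:
            raise LookupError(f"no version available for a {trigger} trigger")
        return versions[-1]


flow = TaskFlow()
flow.publish("published-1")
flow.save_test_run_version("editing-2")
print(flow.version_for("manual"), flow.version_for("scheduled"))
```

Editing the flow after it is published therefore affects manual runs immediately but affects scheduled runs only after the next publish.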

Task flow instance

A task flow instance is the runtime record of a task flow. You can rerun a task flow instance that ran successfully. For a task flow instance that failed, you can restore it or change its state from FAIL to SUCCESS.
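The operations allowed on a task flow instance depend on its state. The following Python sketch encodes only the rules stated above; the `ALLOWED_FROM` table and the `can_apply` and `mark_success` functions are hypothetical names, not part of the product.

```python
# States from which each instance-level operation is allowed, per the rules above.
ALLOWED_FROM = {
    "rerun": {"SUCCESS"},      # an instance that ran successfully can be rerun
    "restore": {"FAIL"},       # a failed instance can be restored
    "mark_success": {"FAIL"},  # a failed instance can be changed to SUCCESS
}


def can_apply(state: str, operation: str) -> bool:
    """Return True if `operation` is allowed for an instance in `state`."""
    if operation not in ALLOWED_FROM:
        raise ValueError(f"unknown operation: {operation}")
    return state in ALLOWED_FROM[operation]


def mark_success(state: str) -> str:
    """Change the recorded state of a failed instance to SUCCESS."""
    if not can_apply(state, "mark_success"):
        raise ValueError(f"cannot mark an instance in state {state} as SUCCESS")
    return "SUCCESS"


print(can_apply("SUCCESS", "rerun"), can_apply("RUNNING", "restore"))
```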

Note

When you manually trigger a task flow, a version of the task flow instance is generated.

A task flow instance can be in one of the following states:

  • WAIT_SCHEDULE

  • RUNNING

  • SUSPEND

  • FAIL

  • SUCCESS

  • KILLING

  • QUEUED

Edge

An edge is a directional line in the DAG of a task flow. An edge connects a start node to an end node and establishes a dependency between the two task nodes: the start node is upstream of the end node, and the end node is downstream of the start node. A downstream node runs only after all of its upstream nodes have run successfully. The following types of edges are provided:

  • Edge: a directional line that reflects the dependency between two task nodes, or between a task node and a task flow.

  • Edge in a task flow: a dependency between two task nodes that belong to the same task flow.

  • Edge across task flows: a dependency between two task nodes that belong to different task flows. You can configure event scheduling for a task flow to create an edge of this type.
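The rule that a downstream node runs only after all of its upstream nodes succeed can be sketched in Python as follows. This is a simplified, hypothetical model: edges are `(start_node, end_node)` tuples, and the `flow/node` naming used to represent an edge across task flows is an illustration only; in the product, such edges are created by configuring event scheduling.

```python
def upstream_nodes(node, edges):
    """Return the start nodes of all edges that end at `node`."""
    return {start for start, end in edges if end == node}


def can_run(node, edges, states):
    """A downstream node may run only if every upstream node finished with SUCCESS."""
    return all(states.get(up) == "SUCCESS" for up in upstream_nodes(node, edges))


# One edge within flow_a and one edge across task flows (flow_b -> flow_a).
EDGES = [("flow_a/load", "flow_a/report"), ("flow_b/export", "flow_a/report")]

print(can_run("flow_a/report", EDGES, {"flow_a/load": "SUCCESS", "flow_b/export": "FAIL"}))
```

A node with no incoming edges has no upstream nodes, so `all()` over an empty set lets it run immediately.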

Business time

The business time is one day earlier than the time when a task flow is run. The business time is represented by the bizdate variable. For more information, see Variables.
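As a worked example of the one-day offset, this Python sketch computes the business date for a given run date. The function name `business_date` is hypothetical, and the sketch returns a `date` object rather than assuming any particular string format for the bizdate variable.

```python
from datetime import date, timedelta


def business_date(run_date: date) -> date:
    """The business time is one day earlier than the run time (bizdate)."""
    return run_date - timedelta(days=1)


print(business_date(date(2023, 12, 13)))  # a run on Dec 13 processes Dec 12
print(business_date(date(2024, 3, 1)))    # month and leap-year boundaries are handled by timedelta
```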

Data backfill

A data backfill operation generates one or more task flow instances for a specific point in time or a time range, based on published version N of a task flow.
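A backfill can be pictured as generating one instance per business date, all pinned to the same published version. The following Python sketch assumes a daily scheduling cycle; the `BackfillInstance` record and the `backfill` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackfillInstance:
    """Hypothetical record for one generated task flow instance."""
    published_version: int
    business_date: date


def backfill(published_version, start, end):
    """Generate one instance per day from start to end (inclusive).

    Assumption: a daily scheduling cycle. A single point in time is start == end.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = (end - start).days + 1
    return [BackfillInstance(published_version, start + timedelta(days=i)) for i in range(days)]


for instance in backfill(3, date(2023, 12, 1), date(2023, 12, 3)):
    print(instance)
```

Every generated instance references the same published version, so edits made after that version was published do not affect the backfilled runs.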

Owner and stakeholder

  • Owner: Only the owner of a task flow can edit the task flow and its related configurations. The owner can test run the task flow and receives runtime alerts for it.

  • Stakeholder: A stakeholder of a task flow has permissions to view the task flow and its related configurations. A stakeholder can test run the task flow but cannot edit the task flow or its related configurations.
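The two roles can be summarized as a capability table. This Python sketch lists only the capabilities stated above (with viewing assumed for the owner, since the owner can edit); the `CAPABILITIES` mapping and `is_allowed` function are hypothetical.

```python
# Capabilities per role, limited to what the text above states.
CAPABILITIES = {
    "owner": {"view", "edit", "test_run", "receive_alerts"},
    "stakeholder": {"view", "test_run"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if `role` has the `action` capability; unknown roles have none."""
    return action in CAPABILITIES.get(role, set())


print(is_allowed("stakeholder", "test_run"), is_allowed("stakeholder", "edit"))
```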

Permissions to trigger a task flow

  • The owner and stakeholders of a task flow can trigger the task flow.

  • Whether a triggered run succeeds depends on whether the owner has the required permissions on the relevant databases and tables.

    Important

    If the owner does not have the required permissions on the relevant databases and tables, the task flow fails even if the stakeholder who triggers it has the required permissions.
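The key point of the Important note is that only the owner's permissions are checked. The following hypothetical Python sketch makes that explicit: the triggering user's permissions are accepted as a parameter but deliberately ignored. The function name and parameters are illustrative, and the sketch models only the permission check, not every reason a run can fail.

```python
TRIGGER_ROLES = {"owner", "stakeholder"}


def passes_permission_check(trigger_role: str,
                            owner_has_permissions: bool,
                            trigger_user_has_permissions: bool) -> bool:
    """Return whether a triggered run passes the database and table permission check.

    Hypothetical model: only the owner's permissions are evaluated.
    """
    if trigger_role not in TRIGGER_ROLES:
        raise PermissionError("only the owner or a stakeholder can trigger the task flow")
    del trigger_user_has_permissions  # intentionally not consulted
    return owner_has_permissions


# A stakeholder with full permissions still cannot compensate for a missing owner permission.
print(passes_permission_check("stakeholder", owner_has_permissions=False,
                              trigger_user_has_permissions=True))
```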

Run time

The time when a task is run.

Run mode

  • Try Run

    Run the task flow now.

  • Dry Run

    Suppose Task Flow A depends on Task Flow B through a task node that checks task flow dependencies, and you want to run Task Flow A without running Task Flow B. In this case, you can dry run Task Flow B to generate a scheduled trigger record for it. Task Flow A can then run normally.

  • Run at a Specific Point in Time

    To use this mode, you must define a time variable for a task flow and reference it in SQL statements. The value of the time variable is calculated based on the day before the task flow is run and the specified offset. This allows you to run a task flow at a specific point in time without modifying the SQL statements and related configurations.

  • Run at a Specific Time Range

    To use this mode, you must define multiple time variables for a task flow. A maximum of 50 node instances can be run each time a task flow is started in this mode.

    For example, if the scheduling cycle of a task flow is one day, a single run in this mode can cover at most 50 days.

    Note

    When a task flow runs over a time range, its tasks are run serially: the tasks must run successfully for one business time before they can run for the next business time.
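The time-based run modes can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's behavior: the offset is assumed to be measured in days, the 50-instance limit is treated as 50 daily business dates (matching the one-day example), and the names `time_variable`, `business_dates_in_range`, and `run_serially` are hypothetical.

```python
from datetime import date, timedelta

MAX_NODE_INSTANCES = 50  # limit stated for the "Run at a Specific Time Range" mode


def time_variable(run_date, offset_days):
    """Day before the run, shifted by a specified offset (assumed to be in days)."""
    return run_date - timedelta(days=1) + timedelta(days=offset_days)


def business_dates_in_range(start, end):
    """Daily business dates from start to end (inclusive), capped at the 50-instance limit."""
    days = (end - start).days + 1
    if days < 1:
        raise ValueError("end must not be earlier than start")
    if days > MAX_NODE_INSTANCES:
        raise ValueError(f"range covers {days} days; at most {MAX_NODE_INSTANCES} are allowed")
    return [start + timedelta(days=i) for i in range(days)]


def run_serially(dates, run_for_date):
    """Run one business date at a time; stop at the first failure.

    `run_for_date` is a caller-supplied function that returns True on success.
    Returns the business dates that completed successfully.
    """
    completed = []
    for biz_date in dates:
        if not run_for_date(biz_date):
            break  # later business dates never start
        completed.append(biz_date)
    return completed


dates = business_dates_in_range(date(2023, 12, 1), date(2023, 12, 5))
print(run_serially(dates, lambda d: d != date(2023, 12, 3)))
```

In the usage line, the run for Dec 3 fails, so only Dec 1 and Dec 2 complete and the remaining dates are not attempted, which mirrors the serial rule in the note above.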