Basic concepts of task orchestration

Task

A task is an operation to be performed.

Task node

A task node represents a task within a task flow. Nodes can be positioned at any point in the flow, and edges between nodes define execution dependencies. DMS provides four types of task nodes:

Data integration nodes
Data processing nodes
Status checking nodes
General nodes

Node instance

Multiple node instances together form a task flow instance. The dependencies between node instances follow the version of the task flow that was run.

When a task flow instance fails, you have two options:

Restore: reruns only the failed node instances, based on their dependencies.
Rerun: reruns all node instances from the beginning.

Both operations use the same task flow version that was originally run.

A node instance can be in one of the following states:

RUNNING
SUSPEND
FAIL
SUCCESS
KILLING
SKIPPED
CANCELLED
QUEUED
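
The difference between Restore and Rerun can be sketched in a few lines. This is an illustration only, not DMS code: NodeState mirrors the state names listed above, while nodes_to_run, the operation strings, and the node names are hypothetical. It assumes Restore selects exactly the node instances in the FAIL state.

```python
from enum import Enum


class NodeState(Enum):
    """Node instance states as listed in this section."""
    RUNNING = "RUNNING"
    SUSPEND = "SUSPEND"
    FAIL = "FAIL"
    SUCCESS = "SUCCESS"
    KILLING = "KILLING"
    SKIPPED = "SKIPPED"
    CANCELLED = "CANCELLED"
    QUEUED = "QUEUED"


def nodes_to_run(states: dict[str, NodeState], operation: str) -> list[str]:
    """Return the node instances that an operation would run again.

    Hypothetical helper: "restore" picks only FAIL instances,
    "rerun" picks every instance.
    """
    if operation == "rerun":
        return list(states)
    if operation == "restore":
        return [name for name, state in states.items() if state is NodeState.FAIL]
    raise ValueError(f"unknown operation: {operation}")


# Hypothetical task flow instance with one failed node.
instance = {
    "extract": NodeState.SUCCESS,
    "transform": NodeState.FAIL,
    "load": NodeState.SKIPPED,
}
restored = nodes_to_run(instance, "restore")
rerun = nodes_to_run(instance, "rerun")
```

In this sketch, restored contains only the failed node, while rerun contains every node in the instance.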

Task flow

A task flow is a group of task nodes arranged in a specific execution order. Dependencies between nodes are expressed as a directed acyclic graph (DAG), where each downstream node runs only after all its upstream nodes succeed. All task nodes in a task flow can be run.
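
As a rough illustration of the DAG rule, the following sketch uses Python's standard graphlib module to derive an order in which every node comes after all of its upstream nodes. The node names and the helper functions are hypothetical; this is not how DMS schedules nodes internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task flow: each key maps a node to the set of its upstream nodes.
upstream = {
    "load_orders": set(),
    "load_users": set(),
    "join": {"load_orders", "load_users"},
    "report": {"join"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return the nodes so that each one appears after all its upstream nodes."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Return True if the dependencies do not form a valid DAG."""
    try:
        execution_order(graph)
    except CycleError:
        return True
    return False


order = execution_order(upstream)
```

A graph in which two nodes depend on each other is rejected by has_cycle, which is why the dependencies must be acyclic.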

Task flow version

A task flow version is a snapshot of a task flow at a point in time. There are two version types:

Editing version: the current editable version. Each time you trigger a test run, the editing version is saved. Test runs always use the latest editing version.
Published version: generated when you publish a task flow. Scheduled runs always use the latest published version.

Trigger method

A task flow can be triggered in two ways:

Manual trigger: runs the task flow immediately, based on the latest editing version.
Scheduled trigger: runs the task flow on a schedule, based on the latest published version.
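
The mapping from trigger method to task flow version could be modeled as follows. This is a minimal sketch under stated assumptions: FlowVersion, version_for_trigger, and the "manual"/"scheduled" strings are hypothetical names, and the sketch assumes that a higher version number means a later version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowVersion:
    number: int  # assumption: larger numbers are more recent
    kind: str    # "editing" or "published"


def version_for_trigger(versions: list[FlowVersion], trigger: str) -> FlowVersion:
    """Pick the version a trigger would use.

    Manual triggers use the latest editing version;
    scheduled triggers use the latest published version.
    """
    kind = {"manual": "editing", "scheduled": "published"}[trigger]
    candidates = [v for v in versions if v.kind == kind]
    if not candidates:
        raise LookupError(f"no {kind} version available")
    return max(candidates, key=lambda v: v.number)


# Hypothetical version history of one task flow.
versions = [
    FlowVersion(1, "editing"),
    FlowVersion(2, "published"),
    FlowVersion(3, "editing"),
]
manual = version_for_trigger(versions, "manual")
scheduled = version_for_trigger(versions, "scheduled")
```

Here the manual trigger resolves to version 3 and the scheduled trigger to version 2, so unpublished edits never affect scheduled runs.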

Task flow instance

A task flow instance is the runtime record of a task flow.

If a task flow instance succeeded, you can rerun it.
If a task flow instance failed, you can restore it to rerun only the failed nodes, or manually change its state from FAIL to SUCCESS.

Triggering a task flow manually creates a new version of the task flow instance.

A task flow instance can be in one of the following states:

WAIT_SCHEDULE
RUNNING
SUSPEND
FAIL
SUCCESS
KILLING
QUEUED

Edge

An edge is a directional line in the DAG that connects two task nodes and establishes a dependency between them. The start node is the upstream node; the end node is the downstream node. A downstream node runs only after all its upstream nodes succeed.

DMS provides three types of edges:

Edge: a dependency between two task nodes, or between a task node and a task flow.
Edge in a task flow: a dependency between two task nodes that belong to the same task flow.
Edge across task flows: a dependency between two task nodes that belong to different task flows. Configure event scheduling to create this type of edge.

Business time

Business time is the date for which a task flow processes data. It is always one day earlier than the date on which the task flow runs. The business time is represented by the bizdate variable. For more information, see Variables.
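
The one-day offset can be expressed as simple date arithmetic. The function name below is hypothetical, and the sketch does not cover how DMS formats the bizdate variable.

```python
from datetime import date, timedelta


def business_date(run_date: date) -> date:
    """Return the business date for a run: one day before the run date."""
    return run_date - timedelta(days=1)


# A run on 1 March 2024 processes data for 29 February 2024.
example = business_date(date(2024, 3, 1))
```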

Data backfill

A data backfill operation generates one or more task flow instances for a specific point in time or time range, based on published version N of the task flow.

Owner and stakeholder

Role	Permissions
Owner	Edit the task flow and its configurations; trigger test runs; receive runtime alerts.
Stakeholder	View the task flow and its configurations; trigger test runs. Cannot edit the task flow.

Permissions to trigger a task flow

Both the owner and stakeholders can trigger a task flow. Whether a trigger succeeds depends on whether the owner has the required permissions on the relevant databases and tables.

Important

If the owner lacks the required permissions, the task flow fails, even if the stakeholder who triggers it has those permissions.
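
The rule can be summarized as a check that considers only the owner's permissions. The sketch below is hypothetical: the function name and the permission strings are invented for illustration and do not correspond to DMS permission identifiers.

```python
def trigger_succeeds(
    owner_perms: set[str],
    triggering_user_perms: set[str],
    required: set[str],
) -> bool:
    """Return True if a triggered task flow can access what it needs.

    triggering_user_perms is deliberately ignored: only the owner's
    permissions decide the outcome.
    """
    return required.issubset(owner_perms)


# Hypothetical permission names.
required = {"db1.orders:query", "db1.report:change"}
stakeholder = {"db1.orders:query", "db1.report:change"}

# The owner lacks one permission, so the run fails even though
# the stakeholder who triggers it has both.
fails = trigger_succeeds({"db1.orders:query"}, stakeholder, required)

# The owner has both permissions, so the run can succeed even if
# the triggering stakeholder has none.
succeeds = trigger_succeeds(required, set(), required)
```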

Run time

The time when a task runs.

Run mode

DMS provides four run modes:

Try Run: runs the task flow immediately.
Dry Run: generates a scheduled trigger record for a task flow without actually running it. Use this when Task Flow A depends on Task Flow B, but Task Flow A does not actually need Task Flow B to run.
Run at a specific point in time: runs a task flow as of a specific point in time without modifying SQL statements. Define a time variable in the task flow and reference it in SQL statements. The variable value is calculated based on the day before the run and the specified offset.
Run at a specific time range: runs a task flow across a range of business dates. Define multiple time variables in the task flow. Tasks within the run are executed serially: the run for each business date must complete successfully before the next one starts.

A single run in Run at a specific time range mode can cover at most 50 node instances. For example, if the scheduling cycle is one day, the run covers at most 50 days.
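
The serial behavior and the 50-instance limit of a time-range run might be modeled as follows. This is a sketch only, assuming a one-day scheduling cycle so that each business date corresponds to one instance; business_dates, run_range, and the run_one callback are hypothetical names.

```python
from datetime import date, timedelta
from typing import Callable

MAX_INSTANCES = 50  # limit stated in this section


def business_dates(start: date, end: date) -> list[date]:
    """List the business dates in [start, end], enforcing the 50-instance limit."""
    days = (end - start).days + 1
    if days < 1:
        raise ValueError("end must not be earlier than start")
    if days > MAX_INSTANCES:
        raise ValueError(f"range covers {days} days; at most {MAX_INSTANCES} allowed")
    return [start + timedelta(days=i) for i in range(days)]


def run_range(start: date, end: date, run_one: Callable[[date], bool]) -> list[date]:
    """Run business dates one at a time, stopping at the first failure.

    run_one is a caller-supplied function that returns True on success.
    Returns the business dates that completed successfully.
    """
    completed = []
    for day in business_dates(start, end):
        if not run_one(day):
            break  # later dates do not start until this one succeeds
        completed.append(day)
    return completed


# Hypothetical run in which the second business date fails.
done = run_range(
    date(2024, 1, 1),
    date(2024, 1, 3),
    run_one=lambda d: d != date(2024, 1, 2),
)
```

In this example only the first business date completes, because the failure on the second date prevents the third from starting.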