This page covers the core concepts for task orchestration in DMS.
Task
A task is an operation to be performed.
Task node
A task node represents a task within a task flow. Nodes can be positioned at any point in the flow, and edges between nodes define execution dependencies. DMS provides four types of task nodes:
- Data integration nodes
- Data processing nodes
- Status checking nodes
- General nodes
Node instance
A node instance is the runtime record of a task node. Multiple node instances together form a task flow instance. The dependencies between node instances follow the version of the task flow that was run.
When a task flow fails, you have two options:
- Restore: reruns only the failed node instances, based on their dependencies.
- Rerun: reruns all node instances from the beginning.
Both operations use the same task flow version that was originally run.
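The difference between the two options can be illustrated with a minimal Python sketch. The function name, the `RecoveryMode` enum, and the node names are hypothetical and are not part of any DMS API; the sketch assumes the node names are already sorted so that upstream nodes come first.

```python
from enum import Enum


class RecoveryMode(Enum):
    RESTORE = "restore"  # rerun only the failed node instances
    RERUN = "rerun"      # rerun every node instance from the beginning


def nodes_to_rerun(mode, states, order):
    """Return the node names to execute again, in dependency order.

    states: dict mapping node name -> state string such as "FAIL" or "SUCCESS"
    order:  node names sorted so that upstream nodes come first (assumed input)
    """
    if mode is RecoveryMode.RERUN:
        return list(order)
    return [name for name in order if states[name] == "FAIL"]


# Hypothetical flow: "clean" and "enrich" both depend on "extract".
states = {"extract": "SUCCESS", "clean": "FAIL", "enrich": "FAIL"}
order = ["extract", "clean", "enrich"]
restore_plan = nodes_to_rerun(RecoveryMode.RESTORE, states, order)
rerun_plan = nodes_to_rerun(RecoveryMode.RERUN, states, order)
```

In this sketch `restore_plan` contains only the two failed nodes, while `rerun_plan` contains all three.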
A node instance can be in one of the following states:
- RUNNING
- SUSPEND
- FAIL
- SUCCESS
- KILLING
- SKIPPED
- CANCELLED
- QUEUED
Task flow
A task flow is a group of task nodes arranged in a specific execution order. Dependencies between nodes are expressed as a directed acyclic graph (DAG), where each downstream node runs only after all its upstream nodes succeed. All task nodes in a task flow can be run.
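The rule that a downstream node runs only after all its upstream nodes succeed can be sketched with Python's standard `graphlib` module. This is an illustration of the DAG rule, not a description of how DMS schedules nodes; the node names and the `run_node` callback are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task flow: each key maps a node to the set of its upstream nodes.
upstream = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "report": {"join"},
}


def run_flow(upstream, run_node):
    """Run nodes in dependency order and return the nodes that succeeded.

    run_node(name) is an assumed callback that returns True on success.
    A node that fails is never marked done, so its downstream nodes
    never become ready.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    succeeded = []
    while sorter.is_active():
        ready = sorter.get_ready()
        if not ready:
            break  # remaining nodes are blocked by a failed upstream node
        for node in sorted(ready):
            if run_node(node):
                succeeded.append(node)
                sorter.done(node)
    return succeeded
```

If `run_node` fails for `extract_users`, only `extract_orders` is returned, because `join` and `report` never have all of their upstream nodes in a successful state.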
Task flow version
A task flow version is a snapshot of a task flow at a point in time. There are two version types:
- Editing version: the current editable version. Each time you trigger a test run, the editing version is saved. Test runs always use the latest editing version.
- Published version: generated when you publish a task flow. Scheduled runs always use the latest published version.
Trigger method
A task flow can be triggered in two ways:
- Manual trigger: runs the task flow immediately, based on the latest editing version.
- Scheduled trigger: runs the task flow on a schedule, based on the latest published version.
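The mapping from trigger method to task flow version can be pictured with a small Python sketch. The `TaskFlow` class, its fields, and the trigger strings are assumptions made for illustration only and do not reflect the DMS data model.

```python
from dataclasses import dataclass, field


@dataclass
class TaskFlow:
    # Version labels in the order they were created (hypothetical representation).
    editing_versions: list = field(default_factory=list)
    published_versions: list = field(default_factory=list)

    def version_for(self, trigger):
        """Pick the snapshot a run uses for the given trigger method."""
        if trigger == "manual":
            versions = self.editing_versions
        elif trigger == "scheduled":
            versions = self.published_versions
        else:
            raise ValueError(f"unknown trigger method: {trigger}")
        if not versions:
            raise LookupError(f"no version available for a {trigger} trigger")
        return versions[-1]  # always the latest version of that type


flow = TaskFlow(editing_versions=["e1", "e2"], published_versions=["p1"])
manual_version = flow.version_for("manual")
scheduled_version = flow.version_for("scheduled")
```

Here a manual trigger resolves to the latest editing version and a scheduled trigger resolves to the latest published version.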
Task flow instance
A task flow instance is the runtime record of a task flow.
- If a task flow instance succeeded, you can rerun it.
- If a task flow instance failed, you can restore it to rerun only the failed nodes, or manually change its state from FAIL to SUCCESS.
Triggering a task flow manually creates a new version of the task flow instance.
A task flow instance can be in one of the following states:
- WAIT_SCHEDULE
- RUNNING
- SUSPEND
- FAIL
- SUCCESS
- KILLING
- QUEUED
Edge
An edge is a directional line in the DAG that connects two task nodes and establishes a dependency between them. The start node is the upstream node; the end node is the downstream node. A downstream node runs only after all its upstream nodes succeed.
DMS provides three types of edges:
- Edge: a dependency between two task nodes, or between a task node and a task flow.
- Edge in a task flow: a dependency between two task nodes that belong to the same task flow.
- Edge across task flows: a dependency between two task nodes that belong to different task flows. Configure event scheduling to create this type of edge.
Business time
Business time is the date whose data a task flow processes. It is always one day earlier than the date on which the task flow runs. The business time is represented by the bizdate variable. For more information, see Variables.
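As a worked example of the one-day offset, the following Python sketch derives a business date from a run date. The function name is hypothetical, and the sketch does not show how DMS formats the bizdate variable.

```python
from datetime import date, timedelta


def business_date(run_date):
    """Return the business date: one day before the run date."""
    return run_date - timedelta(days=1)


# A task flow that runs on 1 March 2024 processes data for 29 February 2024.
bizdate = business_date(date(2024, 3, 1))
```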
Data backfill
A data backfill operation generates one or more task flow instances for a specific point in time or time range, based on published version N of the task flow.
Owner and stakeholder
| Role | Permissions |
|---|---|
| Owner | Edit the task flow and its configurations; trigger test runs; receive runtime alerts. |
| Stakeholder | View the task flow and its configurations; trigger test runs. Cannot edit the task flow. |
Permissions to trigger a task flow
Both the owner and stakeholders can trigger a task flow. Whether a trigger succeeds depends on whether the owner has the required permissions on the relevant databases and tables.
If the owner lacks the required permissions, the task flow fails — even if the stakeholder who triggers it has those permissions.
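The rule can be summarized in a short Python sketch: the check looks only at the owner's permissions, regardless of who triggers the run. The function, the user names, and the permission strings are hypothetical and do not correspond to DMS permission names.

```python
def permission_check_passes(triggered_by, owner, grants, required):
    """Check a run against the owner's permissions.

    grants:   dict mapping user name -> set of permission strings (assumed shape)
    required: set of permissions the task flow needs on databases and tables
    The triggered_by argument is intentionally ignored.
    """
    return required <= grants.get(owner, set())


grants = {
    "owner_a": {"sales.orders:query"},
    "stakeholder_b": {"sales.orders:query", "sales.report:change"},
}
required = {"sales.orders:query", "sales.report:change"}

# The stakeholder holds every required permission, but the owner does not.
result = permission_check_passes("stakeholder_b", "owner_a", grants, required)
```

In this sketch `result` is `False`, because only the owner's permissions are consulted.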
Run time
Run time is the time at which a task runs.
Run mode
DMS provides four run modes:
- Try Run: runs the task flow immediately.
- Dry Run: generates a scheduled trigger record for a task flow without actually running it. Use this when Task Flow A depends on Task Flow B, but Task Flow A can be run without running Task Flow B.
- Run at a specific point in time: runs a task flow as of a specific point in time without modifying SQL statements. Define a time variable in the task flow and reference it in SQL statements. The variable value is calculated based on the day before the run and the specified offset.
- Run at a specific time range: runs a task flow across a range of business dates. Define multiple time variables in the task flow. Tasks within the run are executed serially: each business date must complete successfully before the next one starts.
A single run in the Run at a specific time range mode can cover at most 50 node instances. For example, if the scheduling cycle is one day, the run covers at most 50 days.
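The serial execution and the 50-instance limit can be sketched in Python as follows. The sketch assumes one instance per business date, matching the one-day scheduling cycle in the example above; the function names and the `run_for_date` callback are hypothetical.

```python
from datetime import date, timedelta

MAX_INSTANCES = 50  # limit for a single time-range run, as stated above


def business_dates(start, end, cycle_days=1):
    """List the business dates from start to end inclusive, one per cycle."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=cycle_days)
    if len(dates) > MAX_INSTANCES:
        raise ValueError(
            f"range yields {len(dates)} instances, limit is {MAX_INSTANCES}"
        )
    return dates


def run_range(start, end, run_for_date):
    """Run business dates one at a time and stop at the first failure.

    run_for_date(day) is an assumed callback that returns True on success.
    """
    completed = []
    for day in business_dates(start, end):
        if not run_for_date(day):
            break  # later business dates wait for this one to succeed
        completed.append(day)
    return completed
```

With a one-day cycle, the range from 1 January 2024 to 19 February 2024 yields exactly 50 business dates, so extending it by one more day raises `ValueError` in this sketch.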