Scheduling dependencies in DataWorks are dependencies between instances, not tasks. Each time a task runs, DataWorks generates one instance for that scheduling cycle. When ancestor and descendant tasks run at different frequencies, the rule that determines which ancestor instance a descendant instance waits for is called the principle of scheduling time proximity.
Key concepts
Instance: The execution unit for a single scheduling cycle. A task scheduled hourly generates one instance per hour.
Principle of scheduling time proximity: A descendant instance depends on the ancestor instance whose scheduling time is closest to, but not later than, its own scheduling time. Several descendant instances can resolve to the same ancestor instance.
Self-dependency: A configuration that makes each instance depend on the previous instance of the same task, so instances run sequentially rather than in parallel. For example, when an hour or minute task with self-dependency depends on a day task, only the first instance of the day depends on the day task; every subsequent instance depends on the preceding instance of the same task.
Dry-run instance: A placeholder instance generated for weekly, monthly, or yearly tasks on days when no actual run is scheduled. Its status is automatically set to successful without running any code, so it does not block descendant tasks.
How the proximity principle works
The proximity principle applies whenever ancestor and descendant tasks run at different frequencies:
A descendant instance depends on the ancestor instance whose scheduling time is closest to but not later than its own scheduling time.
Two edge cases apply:
- If no ancestor instance has a scheduling time earlier than the first descendant instance of the day, the first descendant instance depends on the first ancestor instance of the day.
- If a descendant task's scheduling time arrives before its ancestor has finished running, the descendant waits. Its actual start time is no earlier than the ancestor's completion time.
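The rule and its first edge case can be sketched in a few lines of Python. This is a minimal illustration, not DataWorks code: the function name `closest_ancestor` and the sample schedule are hypothetical, and the sketch applies the first-instance fallback to any descendant time that precedes every ancestor time.

```python
from bisect import bisect_right
from datetime import time


def closest_ancestor(descendant_time, ancestor_times):
    """Return the ancestor scheduling time a descendant instance waits for.

    ancestor_times must be non-empty and sorted in ascending order.
    """
    # Number of ancestor times that are not later than the descendant time.
    idx = bisect_right(ancestor_times, descendant_time)
    if idx == 0:
        # Edge case: no ancestor instance is scheduled at or before the
        # descendant, so fall back to the first ancestor instance of the day.
        return ancestor_times[0]
    return ancestor_times[idx - 1]


# Hypothetical schedule: the ancestor runs at 06:00, 12:00, and 18:00.
ancestor = [time(6, 0), time(12, 0), time(18, 0)]
for t in (time(0, 0), time(8, 0), time(12, 0), time(23, 0)):
    print(t, "waits for ancestor instance at", closest_ancestor(t, ancestor))
```

The second edge case is not modeled here: the function only resolves which instance is waited for, not when the descendant actually starts.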
Worked example — hour task depends on day task
Task A (day task) is scheduled at 07:00. Task B (hour task) is scheduled at 00:00, 08:00, and 16:00.
Task A's instance is not scheduled until 07:00, so when 00:00 arrives it has not yet run. Task B's 00:00 instance waits. The earliest it can start is when Task A completes, which is no earlier than 07:00.
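The waiting behavior in this example can be sketched as follows. The 20-minute run time for Task A and the helper name `actual_start` are assumptions made for illustration. The sketch also assumes that every Task B instance waits for the single Task A instance of the day.

```python
from datetime import datetime, timedelta


def actual_start(scheduled, ancestor_finished):
    """Start at the scheduling time or when the ancestor finishes, whichever is later."""
    return max(scheduled, ancestor_finished)


day = datetime(2024, 1, 1)
# Assumption: Task A starts at 07:00 and takes 20 minutes to run.
task_a_finished = day + timedelta(hours=7, minutes=20)

task_b_starts = {}
for hour in (0, 8, 16):
    scheduled = day + timedelta(hours=hour)
    task_b_starts[hour] = actual_start(scheduled, task_a_finished)
    print(f"B scheduled {scheduled:%H:%M} -> starts {task_b_starts[hour]:%H:%M}")
```

Under these assumptions only the 00:00 instance is delayed; the 08:00 and 16:00 instances start on schedule because Task A has already completed.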
When the proximity principle applies
| Scenario | Proximity principle applies? | Mapping |
|---|---|---|
| Hour task depends on hour task — same instance count per day | No | One-to-one: first depends on first, second on second, and so on |
| Hour task depends on hour task — different instance count per day | Yes | Each descendant depends on the closest ancestor instance not later than its own scheduling time |
| Hour task depends on minute task | Yes | See Hour and minute dependencies |
| Minute task depends on hour task | Yes | Each minute instance depends on the closest preceding hour instance |
| Hour or minute task depends on day task | No (all instances depend on the single day instance) | See Hour or minute task depending on a day task |
| Day task depends on hour or minute task | No (depends on all instances by default) | See Day task depending on hour or minute |
| Day, hour, or minute task depends on weekly, monthly, or yearly task | Yes (dry-run instances fill non-scheduled days) | See Dependencies on weekly, monthly, and yearly tasks |
Dependency modes by frequency pair
Hour and minute dependencies
Hour depends on hour
When both tasks generate the same number of instances per day, one-to-one mappings apply regardless of when each task starts its scheduling period.
- Scenario 1 (equal instance counts, same period): Node A and Node B are both scheduled every 5 hours within 00:00–23:59. Each generates the same number of instances. One-to-one mappings apply: B's first instance depends on A's first instance, B's second on A's second, and so on.
- Scenario 2 (equal instance counts, different start times): Node A runs every 4 hours from 06:10 to 21:59 (4 instances: 06:10, 10:10, 14:10, 18:10). Node B runs every 4 hours from 08:00 to 23:59 (4 instances: 08:00, 12:00, 16:00, 20:00). The instance counts are equal, so one-to-one mappings apply: B's 08:00 instance depends on A's 06:10 instance, B's 12:00 instance on A's 10:10 instance, and so on.
When the instance counts differ, the proximity principle determines the mapping. If an ancestor generates fewer instances than the descendant, multiple descendant instances may depend on the same ancestor instance.
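The two cases can be sketched together. The helper names `instance_times`, `hhmm`, and `map_hourly` are hypothetical, times are expressed as minutes since midnight, and the schedule used for the unequal case is invented for illustration.

```python
def instance_times(start, end, every):
    """Scheduling times in minutes since midnight, from start through end."""
    return list(range(start, end + 1, every))


def hhmm(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def map_hourly(descendant, ancestor):
    """Pair each descendant time with the ancestor time it depends on."""
    if len(descendant) == len(ancestor):
        # Equal instance counts: one-to-one mapping by position.
        return list(zip(descendant, ancestor))
    pairs = []
    for d in descendant:
        # Different counts: closest ancestor time that is not later than d,
        # falling back to the first ancestor instance of the day.
        earlier = [a for a in ancestor if a <= d]
        pairs.append((d, earlier[-1] if earlier else ancestor[0]))
    return pairs


# Scenario 2 from the text: equal counts, different start times.
node_a = instance_times(6 * 60 + 10, 21 * 60 + 59, 4 * 60)
node_b = instance_times(8 * 60, 23 * 60 + 59, 4 * 60)
scenario_2 = [(hhmm(d), hhmm(a)) for d, a in map_hourly(node_b, node_a)]
print(scenario_2)

# Invented unequal case: descendant every 2 hours, ancestor every 6 hours.
sparse_ancestor = instance_times(0, 23 * 60 + 59, 6 * 60)
dense_descendant = instance_times(0, 23 * 60 + 59, 2 * 60)
unequal = [(hhmm(d), hhmm(a)) for d, a in map_hourly(dense_descendant, sparse_ancestor)]
print(unequal)
```

In the unequal case, three consecutive descendant instances resolve to each ancestor instance, which is the sharing behavior described above.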
Hour depends on minute
| Mode | Behavior |
|---|---|
| No self-dependency on the minute task | The hour instance for a given hour depends on all minute instances within that same hour. |
| Self-dependency on both tasks | The hour instance depends on the minute instance whose scheduling time is closest to but not later than the hour instance's scheduling time within the same hour. |
Minute depends on hour
| Mode | Behavior |
|---|---|
| No self-dependency on the minute task | Each minute instance depends on the hour instance whose scheduling time is closest to but not later than that minute instance's scheduling time. Multiple minute instances may share the same ancestor hour instance. |
| Self-dependency on both tasks | Each minute instance also depends on its own previous-cycle instance, in addition to the closest preceding hour instance. |
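A rough sketch of the two modes in the table above, with times in minutes since midnight. The function name `minute_deps` and the sample schedules are hypothetical, and the sketch leaves out any dependency of the first instance of the day on the previous day's instances.

```python
def minute_deps(minute_times, hour_times, self_dependency=False):
    """Map each minute instance to the list of instances it waits for."""
    deps = {}
    for i, m in enumerate(minute_times):
        # Closest hour instance that is not later than this minute instance.
        earlier = [h for h in hour_times if h <= m]
        wait_for = [("hour", earlier[-1] if earlier else hour_times[0])]
        if self_dependency and i > 0:
            # With self-dependency, also wait for the previous-cycle instance.
            wait_for.append(("minute", minute_times[i - 1]))
        deps[m] = wait_for
    return deps


# Hypothetical schedules: minute task every 30 minutes, hour task every hour.
minute_times = [0, 30, 60, 90, 120, 150]
hour_times = [0, 60, 120]

without_self = minute_deps(minute_times, hour_times)
with_self = minute_deps(minute_times, hour_times, self_dependency=True)
print(without_self[90])
print(with_self[90])
```

Here the 01:00 and 01:30 minute instances (60 and 90) both resolve to the 01:00 hour instance, showing how several minute instances share one ancestor.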
Hour or minute task depending on a day task
| Mode | Behavior |
|---|---|
| No self-dependency on the hour/minute task | All instances generated for the hour or minute task on the current day depend on the single day instance. After the day task finishes, all queued hour/minute instances whose scheduling time has passed run in parallel. |
| Self-dependency on the hour/minute task | Only the first instance of the day depends on the day task. Each subsequent instance depends on the previous instance of the same task, so instances run sequentially. |
When self-dependency is enabled, the first instance of the current day may depend on the last instance of the previous day. If that previous-day instance has not finished, the hour or minute task cannot start on the current day.
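The difference between the parallel burst and the sequential run can be illustrated with a small simulation. Everything numeric here is an assumption: the day task is taken to finish at 07:20, each hour instance is taken to run for 30 minutes, and the function name `start_times` is hypothetical. Times are minutes since midnight, and the possible dependency on the previous day's last instance is ignored.

```python
def start_times(scheduled, day_finish, run_minutes, self_dependency):
    """Return the actual start time of each hour instance."""
    starts = []
    prev_finish = day_finish  # the first instance waits for the day task
    for t in scheduled:
        if self_dependency:
            # Wait for the previous instance (or the day task for the first one).
            start = max(t, prev_finish)
            prev_finish = start + run_minutes
        else:
            # Every instance waits only for the day task.
            start = max(t, day_finish)
        starts.append(start)
    return starts


scheduled = [0, 120, 240, 360, 480, 600]  # every 2 hours from 00:00 to 10:00
day_finish = 7 * 60 + 20                  # assumed completion at 07:20

parallel = start_times(scheduled, day_finish, 30, self_dependency=False)
sequential = start_times(scheduled, day_finish, 30, self_dependency=True)
print(parallel)
print(sequential)
```

Without self-dependency, the four instances scheduled before 07:20 all start at the same moment. With self-dependency, they start one after another, and the backlog can delay later instances as well.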
Day task depending on hour or minute
| Mode | Behavior |
|---|---|
| No self-dependency on the hour/minute task | The day instance depends on all hour or minute instances on the current day. The day task starts after all those instances complete. |
| Self-dependency on the hour/minute task | The day instance depends on the hour or minute instance whose scheduling time is closest to but not later than the day task's scheduling time. The day task starts after that specific instance completes. |
Enable self-dependency on the hour or minute task when the day task should start immediately after the last upstream run before its scheduled time, rather than waiting for all upstream instances to complete.
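As a sketch of how the two modes change what the day instance waits for, the hypothetical function `day_waits_for` below returns the upstream scheduling times (in minutes since midnight) under each mode. The schedules are invented for illustration.

```python
def day_waits_for(day_time, hour_times, hour_self_dependency):
    """Return the upstream hour instances the day instance depends on."""
    if not hour_self_dependency:
        # Default: wait for every upstream instance of the current day.
        return list(hour_times)
    # Self-dependency upstream: only the closest instance not later than
    # the day task's scheduling time (first instance if none is earlier).
    earlier = [h for h in hour_times if h <= day_time]
    return [earlier[-1]] if earlier else [hour_times[0]]


hour_times = [0, 360, 720, 1080]  # 00:00, 06:00, 12:00, 18:00
day_time = 13 * 60                # day task scheduled at 13:00

print(day_waits_for(day_time, hour_times, hour_self_dependency=False))
print(day_waits_for(day_time, hour_times, hour_self_dependency=True))
```

In this example the day task either waits until the 18:00 run completes or starts once the 12:00 run has completed. Waiting for that single instance is sufficient because self-dependency means it cannot finish before the instances that precede it.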
Day task depending on another day task
| Mode | Behavior |
|---|---|
| No self-dependency on the ancestor day task | The descendant day instance depends on the ancestor day instance in the same scheduling cycle (same calendar day). |
| Self-dependency on the ancestor day task | The ancestor's current-day instance also depends on its own previous-day instance (a cross-cycle dependency). The descendant instance cannot start until the ancestor's previous-day instance and then its current-day instance have both completed. |
Day task depending on hour or minute on the previous day
| Mode | Behavior |
|---|---|
| No self-dependency on the hour/minute task | The day task on the current day depends on all instances generated for the hour or minute task on the previous day. |
| Self-dependency on the hour/minute task | The day task on the current day depends only on the last instance generated for the hour or minute task on the previous day. |
Dependencies on weekly, monthly, and yearly tasks
When a day, hour, or minute task depends on a weekly, monthly, or yearly task, DataWorks generates dry-run instances for the low-frequency task on every day that falls outside its scheduled run days. Dry-run instances complete immediately with a successful status, so descendant tasks are not blocked.
Example — day task depends on weekly task (no self-dependency)
The weekly task is scheduled every Monday and Friday.
- Monday and Friday: the weekly task runs normally. After it completes, the day task runs.
- Tuesday, Wednesday, Thursday, Saturday, and Sunday: dry-run instances are generated for the weekly task. Their status is set to successful immediately without running any code. The day task runs as usual.
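The example can be sketched with the standard `datetime` module. The function name `weekly_instance_kind` and the labels `"normal"` and `"dry-run"` are illustrative only and are not DataWorks identifiers.

```python
from datetime import date, timedelta

# date.weekday() returns 0 for Monday and 4 for Friday.
RUN_WEEKDAYS = {0, 4}


def weekly_instance_kind(day):
    """Classify the weekly task's instance for a given calendar day."""
    return "normal" if day.weekday() in RUN_WEEKDAYS else "dry-run"


start = date(2024, 1, 1)  # a Monday
week = [start + timedelta(days=i) for i in range(7)]
for day in week:
    kind = weekly_instance_kind(day)
    # A dry-run instance succeeds immediately, so the day task is not blocked.
    print(f"{day:%a %Y-%m-%d}: weekly instance is {kind}")
```

Either way the day task has an upstream instance to depend on every day; only on Monday and Friday does it wait for real work to complete.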
Choose the right dependency mode
Use this table to identify the right configuration for common scenarios.
| Goal | Configuration |
|---|---|
| Day task waits for all hourly data before running | Day depends on hour task; no self-dependency on the hour task |
| Day task starts as soon as the last hourly run before its scheduled time completes | Day depends on hour task; enable self-dependency on the hour task |
| Hour instances run sequentially after the day task finishes (no parallel burst) | Hour depends on day task; enable self-dependency on the hour task |
| Hour instances whose scheduling time has passed all start as soon as the day task finishes (parallel burst is acceptable) | Hour depends on day task; no self-dependency on the hour task |
| Day task waits for all previous-day hourly data | Day depends on previous-day hour task; no self-dependency on the hour task |
| Day task waits only for the last run of the previous-day hourly task | Day depends on previous-day hour task; enable self-dependency on the hour task |
What's next
- For same-cycle and previous-cycle scheduling dependency configuration, see the scheduling dependency configuration topics.
- In DataWorks, you can configure ancestor and descendant nodes for a workflow as a whole. For workflow-level dependency configuration, see Recurring workflows.