
DataWorks: Configure time properties

Last Updated: Mar 03, 2026

This guide explains how to configure time properties for DataWorks nodes in a production environment. It covers basic scheduling settings and advanced logic for complex business scenarios. Time properties determine the timeliness of data output and directly affect the stability and determinism of your production workflows. By configuring instance lifetime, scheduled time, execution policies, and fault tolerance, you can build a flexible and robust automated scheduling system that decouples business logic from the computation flow.

Basic principles of time configuration

The DataWorks scheduling system combines two types of triggering logic: dependency-driven and time-constrained. A node is triggered only when both conditions are met: its upstream dependencies have completed successfully, and its own scheduled time has arrived. Understanding these two patterns is fundamental to configuring schedules.

Note

For more information about scheduling dependencies, see Configure scheduling dependencies.

Pattern 1: Dependency-driven execution

Use this pattern when the goal is to complete all computations in a workflow as quickly as possible. All nodes in the workflow should run as soon as their input data is ready, completing the entire chain of computation at maximum speed.

  • Configuration method: Set a specific scheduled time, such as 02:00, for only the first node (or nodes with no upstream dependencies) in the workflow. Set the scheduled time for all descendant nodes to 00:00.

  • Running logic: Descendant nodes remain in a waiting state until their upstream dependencies are met. As soon as an ancestor node runs successfully, its descendant nodes are triggered immediately.

  • Configuration and runtime example:


    | Node | Scheduled Time Configuration | Estimated Runtime | Trigger Logic |
    | --- | --- | --- | --- |
    | A (start node) | 02:00 | 02:00 | Runs when its scheduled time arrives. |
    | B (downstream) | 00:00 | ~02:10 (after A completes) | Node B is triggered immediately after its dependency, Node A, runs successfully. |
    | C (downstream) | 00:00 | ~02:18 (after B completes) | Node C is triggered immediately after its dependency, Node B, runs successfully. |

Pattern 2: Time-constrained execution

Use this pattern when a node in a workflow must run after a specific point in time due to external dependencies or business rules. For example, a node must start its computation after 05:00 because of an external business rule or a system maintenance window.

  • Configuration method: Set a specific scheduled time, such as 05:00, for the time-constrained node.

  • Running logic: The node runs only after both conditions are met: its upstream dependencies have completed, and its own scheduled time has arrived. Even if the ancestor Node A finishes at 02:00, Node B will wait until its scheduled time of 05:00 to start running.

  • Configuration and runtime example:


    | Node | Scheduling Configuration | Instance Runtime | Trigger Logic |
    | --- | --- | --- | --- |
    | A (start node) | 02:00 | 02:00 | Runs when its scheduled time arrives. |
    | B (time-constrained) | 05:00 | 05:00 | The upstream dependency, Node A, is met; the node waits for its own scheduled time to arrive. |
    | C (time-constrained) | 08:00 | 08:00 | The upstream dependency, Node B, is met; the node waits for its own scheduled time to arrive. |
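The two patterns share a single gating rule: an instance starts only when its upstream dependencies have succeeded and its own scheduled time has passed. A minimal sketch of that rule in plain Python (the function and parameter names are illustrative, not a DataWorks API):

```python
from datetime import datetime, time

def is_ready(upstream_done: bool, scheduled: time, now: datetime) -> bool:
    """An instance is triggered only when BOTH conditions hold:
    1. every upstream instance has run successfully, and
    2. the instance's own scheduled time has arrived."""
    return upstream_done and now.time() >= scheduled

# Pattern 2 example: Node B is scheduled at 05:00 and depends on Node A.
print(is_ready(True, time(5, 0), datetime(2026, 3, 3, 2, 0)))  # A done at 02:00, B still waits -> False
print(is_ready(True, time(5, 0), datetime(2026, 3, 3, 5, 0)))  # 05:00 arrives -> True
```

In Pattern 1, descendant nodes use a scheduled time of 00:00, so the time condition is trivially satisfied and the dependency condition alone drives execution.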

Planning for scheduling time configuration

Ensure on-time completion of critical tasks

When output data has strict delivery time requirements (for example, Node E must be completed before 09:00 daily), plan the scheduled times for the entire business process backward from the final node.

Solution 1: Manually set the time

  1. Determine the delivery time and buffer time for the final output node

    • Goal: Node E must be delivered to the downstream business team before 09:00.

    • The estimated runtime for Node E is 20 minutes. Add a 10-minute buffer to account for fluctuations.

  2. Calculate and set the scheduled time for the final output node

    • Latest start time = Delivery time - (Estimated runtime + Buffer time).

    • Scheduled time for Node E = 09:00 - (20 minutes + 10 minutes) = 08:30.

  3. Calculate the scheduled time for each ancestor node

    Working backward from the end, calculate and set the latest start time for every ancestor node, such as C, D, B, and A.

Solution 2: Use dependency-driven auto-adjustment (Recommended)

  1. Determine the SLA and buffer time for the final output node

    • Goal: Node E must be completed before 09:00.

    • The estimated runtime for Node E is 20 minutes. Add a 10-minute buffer to account for fluctuations.

  2. Calculate and set the scheduled time for the final output node

    • Latest start time = Service-Level Agreement (SLA) time - (Estimated runtime + Buffer time).

    • Scheduled time for Node E = 09:00 - (20 minutes + 10 minutes) = 08:30.

  3. Calculate the scheduled time for the start node

    Working backward from the end, calculate and set the scheduled time for only the start node (Node A in this example). The remaining descendant nodes keep their default times and are triggered by scheduling dependencies.

Solution 2 combines static planning with dynamic scheduling capabilities. This approach ensures delivery times with lower maintenance costs and higher operational flexibility. It minimizes manual configuration by requiring you to focus only on the start and end points. The system intelligently manages the intermediate process. This solution is highly recommended.
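The backward calculation in step 2 can be sketched as a small helper (the function name is illustrative; in practice you perform this arithmetic when filling in the node's time properties):

```python
from datetime import datetime, timedelta

def latest_start(sla: str, runtime_min: int, buffer_min: int) -> str:
    """Latest start time = SLA (or delivery) time - (estimated runtime + buffer)."""
    t = datetime.strptime(sla, "%H:%M") - timedelta(minutes=runtime_min + buffer_min)
    return t.strftime("%H:%M")

# Node E must finish by 09:00, runs ~20 minutes, plus a 10-minute buffer:
print(latest_start("09:00", 20, 10))  # -> 08:30
```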

Note

The default time of 00:00 in the figure is only an example. In practice, the default scheduled time for a daily node is randomly generated within the 00:00 to 00:30 time frame.

| Node | Scheduled Time (Solution 1) | Scheduled Time (Solution 2) | Actual Runtime |
| --- | --- | --- | --- |
| A | 07:00 | 07:00 | 07:00:00 |
| B | 07:20 | ~00:00 (keep default, no adjustment needed) | ~07:20:00 |
| C | 07:45 | ~00:00 (keep default, no adjustment needed) | ~07:45:00 |
| D | 07:30 | ~00:00 (keep default, no adjustment needed) | ~07:30:00 |
| E | 08:30 | ~00:00 (keep default, no adjustment needed) | ~08:30:00 |

Using baseline priority for resource peak-load shifting

The dependency-driven method is easy to configure, but it can cause many nodes to start simultaneously, such as at 00:00 (midnight). This leads to competition for computing resources and node queuing. In this case, use the priority settings in baseline management to give critical core nodes higher scheduling priority.

  • Identify node priority: Distinguish between core nodes, such as Operational Data Store (ODS) layer data extraction, and non-core nodes, such as some internal reports.

  • Set the scheduling priority for a job: Set a baseline so that core nodes acquire computing resources first.

  • Optimization comparison diagram:


By combining scheduled times with baselines, you can achieve reasonable allocation of schedule resources and priority-based intelligent scheduling. This reduces the O&M costs and human errors associated with setting individual scheduled times for each node.

| Scenario | Description |
| --- | --- |
| Before optimization: resource contention | Scheduling all jobs (core A/B, report C/D) to run at 00:00 causes high concurrency, intense resource competition, and widespread job queuing. |
| After optimization: staggered execution | Core nodes A and B get resources promptly and run at 00:00 because of their high priority. Report nodes C and D wait for A and B to finish and start running at 02:00, ensuring stable execution without competing with important nodes for resources. |

Complex scenario practices

Configuring cross-cycle dependencies

When a node's execution depends on an instance of its ancestor node from a previous cycle, you need to configure a cross-cycle dependency. An example is a T+1 daily summary node that must wait for all hourly nodes from day T to complete.

  • Scenario: A daily summary node B is scheduled to run at 02:00 every day. Its data source is an hourly node A. Node B can run only after all hourly instances of Node A from the previous day (day T), from 00:00 to 23:00, have run successfully.

  • Configuration method: To configure the scheduling dependencies for Node B, set its dependency on the ancestor Node A as a cross-cycle dependency. For more information, see Configure a dependency on the previous cycle (cross-cycle dependency).


  • How it works: After configuration, the instance of Node B for the data timestamp 2025-12-02 will wait until all instances of Node A for the data timestamp 2025-12-01 have run successfully before it is triggered.

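Under this cross-cycle configuration, the daily instance for data timestamp T waits for all 24 hourly instances of day T-1. A sketch of the upstream timestamps involved (plain Python for illustration, assuming an hourly upstream node; not a DataWorks API):

```python
from datetime import date, datetime, timedelta

def upstream_hourly_instances(data_timestamp: date) -> list[datetime]:
    """The daily instance for data timestamp T depends on the hourly
    instances of the PREVIOUS day (T-1), from 00:00 through 23:00."""
    prev_day = data_timestamp - timedelta(days=1)
    midnight = datetime.combine(prev_day, datetime.min.time())
    return [midnight + timedelta(hours=h) for h in range(24)]

# The 2025-12-02 instance of Node B waits on these 24 instances of Node A:
deps = upstream_hourly_instances(date(2025, 12, 2))
print(deps[0], deps[-1])  # 2025-12-01 00:00:00 2025-12-01 23:00:00
```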

Implementing complex recurring schedules

For nodes with special recurring patterns, such as quarterly or semi-annual tasks like a quarterly closing node, you can combine scheduling cycles with scheduling parameters.

  • Scenario: A financial closing node needs to run on the last closing day of each quarter and depends on data from the entire past quarter.

    When setting a closing day, a buffer period is usually reserved to handle special month-end items such as cross-month supplementary orders, refund reversals, and manual audits.
  • Configuration method:

    1. Set the scheduling cycle: In the node's time properties, select a yearly schedule, specify the months as 1, 4, 7, 10, and select Last day of the month for the date. DataWorks automatically handles months with different numbers of days (30 or 31) and leap years.

    2. Use scheduling parameters: In your code, use scheduling parameters or user-defined functions to dynamically calculate the required data date range. For example, you can determine the start and end dates of the quarter based on the current data timestamp. For more information, see Supported formats of scheduling parameters.

  • Running logic: DataWorks automatically identifies whether the 30th, the 31st, or even February 29 in a leap year is the last day of the month. On all other days of the scheduling cycle, the node's instances automatically perform a dry run. This preserves dependency continuity while ensuring that the financial calculation runs precisely on the closing day.

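The dynamic date-range calculation from step 2 can be sketched in plain Python. In a real node you would derive the input date from scheduling parameters such as the data timestamp; the function below is purely illustrative:

```python
from datetime import date, timedelta

def quarter_range(data_ts: date) -> tuple[date, date]:
    """Return the first and last day of the quarter containing data_ts."""
    q_start_month = 3 * ((data_ts.month - 1) // 3) + 1       # 1, 4, 7, or 10
    start = date(data_ts.year, q_start_month, 1)
    if q_start_month == 10:                                  # Q4 ends on Dec 31
        end = date(data_ts.year, 12, 31)
    else:                                                    # day before next quarter starts
        end = date(data_ts.year, q_start_month + 3, 1) - timedelta(days=1)
    return start, end

# A closing instance with data timestamp 2026-03-31 covers Q1 of 2026:
print(quarter_range(date(2026, 3, 31)))
```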

Using scheduling calendars for trading day schedules

The scheduled time (cron expression) defines when a node runs within its scheduling cycle, while a scheduling calendar acts as a filter for the execution dates. By combining them, you can precisely control nodes to run only on specific business days, such as trading days or promotion days.

  • Scenario: A securities clearing node for a brokerage firm must run at 22:00 on every trading day (non-holiday). If a scheduled day falls on a weekend or a public holiday, the node must automatically stop running to avoid generating invalid instances or wasting resources on dry-runs.

  • Solution: Scheduling calendar + Scheduled time

    • Create a custom calendar: In the DataWorks resource center, you can maintain a "Trading Calendar" by manually or automatically synchronizing all trading dates for the year. For more information, see Configure a scheduling calendar.

    • Configure scheduling properties: Set the node to trigger daily at 22:00 and select the custom "Trading Calendar".

  • Execution logic:

    • On a trading day: The system detects that the current date is in the calendar, and the node starts on time at 22:00.

    • On a non-trading day (such as Chinese New Year): The system automatically skips generating an instance for the node, or the generated instance performs a dry-run without consuming actual computing resources.

Note

A scheduling calendar can be thought of as a filter for execution dates. By combining it with hourly or minute-level schedules, you can achieve dual filtering for both date and time.

For example, an hour-level node configured to run at 08:00 and 18:00 each day, if associated with a scheduling calendar that includes only Mondays and Fridays, will ultimately run only at those two times on Mondays and Fridays.
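The dual-filter behavior can be sketched as follows. The calendar dates and times below are hypothetical placeholders for a real scheduling calendar; this is a conceptual illustration, not DataWorks internals:

```python
from datetime import datetime, time

# Hypothetical trading calendar and hour-level schedule (illustrative values).
trading_days = {"2026-03-02", "2026-03-06"}   # a Monday and a Friday
scheduled_times = {time(8, 0), time(18, 0)}

def should_fire(ts: datetime) -> bool:
    """Dual filter: the calendar gates the DATE, the schedule gates the TIME."""
    return ts.strftime("%Y-%m-%d") in trading_days and ts.time() in scheduled_times

print(should_fire(datetime(2026, 3, 2, 8, 0)))   # Monday 08:00  -> True
print(should_fire(datetime(2026, 3, 3, 8, 0)))   # Tuesday 08:00 -> False
```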

Best practices

1. Static scheduling planning and configuration

Goal: Decouple scheduling logic from business logic using a layered strategy.

Core strategies:

  1. Linear workflows

    Configure the scheduled time for only the first node, for example, set the first node's scheduled time to 07:00. Descendant nodes are automatically triggered through dependencies to maximize execution efficiency.

  2. Time-based task

    1. Set precise, independent scheduled times for specific nodes. When setting times, avoid scheduling an ancestor node later than its descendant node, which would prevent the descendant from running on time.

    2. Use scheduling calendars and effective dates to control the node's active period. For example, you can control a node to run only on weekdays between January 1, 2026, and December 31, 2026.

  3. Dynamic scheduling parameters

    Combine these settings with scheduling parameters, such as ${yyyymmdd}, so that time values in your code are replaced dynamically at run time.
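Conceptually, ${yyyymmdd} resolves to the instance's data timestamp rendered as an eight-digit date string. A plain-Python stand-in for that substitution (not a DataWorks API; the actual replacement happens in the scheduling engine):

```python
from datetime import date

def resolve_yyyymmdd(data_timestamp: date) -> str:
    """Conceptual stand-in for the ${yyyymmdd} scheduling parameter:
    the data timestamp rendered as an 8-digit date string."""
    return data_timestamp.strftime("%Y%m%d")

print(resolve_yyyymmdd(date(2026, 3, 3)))  # -> 20260303
```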

2. Intelligent dynamic control with baselines

Goal: Guarantee delivery times for core nodes and reduce manual intervention costs.

Prerequisite: You have created a baseline and set node priorities.

Core mechanisms:

  1. Commitment time and priority definition:

    Define the committed completion time for core nodes, such as 09:00, and attach them to a high-priority baseline. The system automatically identifies the critical path based on priority and ensures that high-priority nodes, such as ODS layer extraction, get computing resources first.

  2. Automatic resource "peak-load shifting":

    You do not need to manually set staggered start times for each non-core node. The scheduling engine automatically queues non-core report nodes to avoid resource contention during peak times, prioritizing resource supply for the critical path.

  3. Dynamic prediction and real-time alerting:

    Based on historical runtimes, the system can dynamically predict in the early morning whether the day's workflow will miss its delivery time. If at 07:00 the system predicts a delay to 09:15, it immediately triggers an alert and highlights the bottleneck nodes on the critical path. This changes the approach from "post-mortem recovery" to "proactive intervention".

The best practice is to combine the latest start time, derived by working backward from the end node, with intelligent baselines. This method sets the starting point through static planning and then uses baselines for dynamic, priority-based scheduling across the entire workflow. It reduces manual maintenance costs and builds a comprehensive assurance system from planning to prediction, ensuring a high degree of certainty for core data output.
