
DataWorks: Configure time properties

Last Updated: Mar 03, 2026

This guide explains how to configure time properties for DataWorks nodes in a production environment. It covers basic scheduling settings and advanced logic for complex business scenarios. Time properties determine the timeliness of data output and directly affect the stability and determinism of your production workflows. By configuring instance lifetime, scheduled time, execution policies, and fault tolerance, you can build a flexible and robust automated scheduling system that decouples business logic from the computation flow.

Basic principles of time configuration

The DataWorks scheduling system combines two types of triggering logic: dependency-driven and time-constrained. A node is triggered only when both conditions are met: its upstream dependencies have completed successfully, and its own scheduled time has arrived. Understanding these two patterns is fundamental to configuring schedules.

Note

For more information about scheduling dependencies, see Configure scheduling dependencies.

Pattern 1: Dependency-driven execution

Use this pattern when the goal is to complete all computations in a workflow as quickly as possible. All nodes in the workflow should run as soon as their input data is ready, completing the entire chain of computation at maximum speed.

  • Configuration method: Set a specific scheduled time, such as 02:00, for only the first node (or nodes with no upstream dependencies) in the workflow. Set the scheduled time for all descendant nodes to 00:00.

  • Running logic: Descendant nodes remain in a waiting state until their upstream dependencies are met. As soon as an ancestor node runs successfully, its descendant nodes are triggered immediately.

  • Configuration and runtime example:


    | Node | Scheduled Time Configuration | Estimated Runtime | Trigger Logic |
    | --- | --- | --- | --- |
    | A (start node) | 02:00 | 02:00 | Runs when its scheduled time arrives. |
    | B (downstream) | 00:00 | ~02:10 (after A completes) | Node B is triggered immediately after its dependency, Node A, runs successfully. |
    | C (downstream) | 00:00 | ~02:18 (after B completes) | Node C is triggered immediately after its dependency, Node B, runs successfully. |

Pattern 2: Time-constrained execution

Use this pattern when a node in a workflow must run after a specific point in time due to external dependencies or business rules. For example, a node must start its computation after 05:00 because of an external business rule or a system maintenance window.

  • Configuration method: Set a specific scheduled time, such as 05:00, for the time-constrained node.

  • Running logic: The node runs only after both conditions are met: its upstream dependencies have completed, and its own scheduled time has arrived. Even if the ancestor Node A finishes at 02:00, Node B will wait until its scheduled time of 05:00 to start running.

  • Configuration and runtime example:


    | Node | Scheduling Configuration | Instance Runtime | Trigger Logic |
    | --- | --- | --- | --- |
    | A (start node) | 02:00 | 02:00 | Runs when its scheduled time arrives. |
    | B (time-constrained) | 05:00 | 05:00 | The upstream dependency, Node A, is met; the node waits for its own scheduled time to arrive. |
    | C (time-constrained) | 08:00 | 08:00 | The upstream dependency, Node B, is met; the node waits for its own scheduled time to arrive. |
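The two patterns share a single gating rule: an instance starts only when its upstream dependencies have succeeded and its own scheduled time has passed. A minimal sketch of that rule in plain Python (the function and parameter names are illustrative, not a DataWorks API):

```python
from datetime import datetime, time

def is_ready(upstream_done: bool, scheduled: time, now: datetime) -> bool:
    """An instance is triggered only when BOTH conditions hold:
    1. every upstream instance has run successfully, and
    2. the instance's own scheduled time has arrived."""
    return upstream_done and now.time() >= scheduled

# Pattern 2 example: Node B is scheduled at 05:00 and depends on Node A.
print(is_ready(True, time(5, 0), datetime(2026, 3, 3, 2, 0)))  # A done at 02:00, B still waits -> False
print(is_ready(True, time(5, 0), datetime(2026, 3, 3, 5, 0)))  # 05:00 arrives -> True
```

In Pattern 1, descendant nodes use a scheduled time of 00:00, so the time condition is trivially satisfied and the dependency condition alone drives execution.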

Planning for scheduling time configuration

Ensure on-time completion of critical tasks

When output data has strict delivery time requirements (for example, Node E must be completed before 09:00 daily), plan the scheduled times for the entire business process backward from the final node.

Solution 1: Manually set the time

  1. Determine the delivery time and buffer time for the final output node

    • Goal: Node E must be delivered to the downstream business team before 09:00.

    • The estimated runtime for Node E is 20 minutes. Add a 10-minute buffer to account for fluctuations.

  2. Calculate and set the scheduled time for the final output node

    • Latest start time = Delivery time - (Estimated runtime + Buffer time).

    • Scheduled time for Node E = 09:00 - (20 minutes + 10 minutes) = 08:30.

  3. Calculate the scheduled time for each ancestor node

    Working backward from the end, calculate and set the latest start time for every ancestor node, such as C, D, B, and A.

Solution 2: Use dependency-driven auto-adjustment (Recommended)

  1. Determine the SLA and buffer time for the final output node

    • Goal: Node E must be completed before 09:00.

    • The estimated runtime for Node E is 20 minutes. Add a 10-minute buffer to account for fluctuations.

  2. Calculate and set the scheduled time for the final output node

    • Latest start time = Service-Level Agreement (SLA) time - (Estimated runtime + Buffer time).

    • Scheduled time for Node E = 09:00 - (20 minutes + 10 minutes) = 08:30.

  3. Calculate the scheduled time for the start node

    Working backward from the end, calculate and set the scheduled time for only the start node (Node A in this example). The remaining descendant nodes keep their default times and are triggered by scheduling dependencies.

Solution 2 combines static planning with dynamic scheduling capabilities. This approach ensures delivery times with lower maintenance costs and higher operational flexibility. It minimizes manual configuration by requiring you to focus only on the start and end points. The system intelligently manages the intermediate process. This solution is highly recommended.
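The backward calculation in step 2 can be sketched as a small helper (the function name is illustrative; in practice you perform this arithmetic when filling in the node's time properties):

```python
from datetime import datetime, timedelta

def latest_start(sla: str, runtime_min: int, buffer_min: int) -> str:
    """Latest start time = SLA (or delivery) time - (estimated runtime + buffer)."""
    t = datetime.strptime(sla, "%H:%M") - timedelta(minutes=runtime_min + buffer_min)
    return t.strftime("%H:%M")

# Node E must finish by 09:00, runs ~20 minutes, plus a 10-minute buffer:
print(latest_start("09:00", 20, 10))  # -> 08:30
```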

Note

The default time of 00:00 in the figure is only an example. In practice, the default scheduled time for a daily node is randomly generated within the 00:00 to 00:30 time frame.

| Node | Scheduled Time (Solution 1) | Scheduled Time (Solution 2) | Actual Runtime |
| --- | --- | --- | --- |
| A | 07:00 | 07:00 | 07:00:00 |
| B | 07:20 | ~00:00 (keep default, no adjustment needed) | ~07:20:00 |
| C | 07:45 | ~00:00 (keep default, no adjustment needed) | ~07:45:00 |
| D | 07:30 | ~00:00 (keep default, no adjustment needed) | ~07:30:00 |
| E | 08:30 | ~00:00 (keep default, no adjustment needed) | ~08:30:00 |

Using baseline priority for resource peak-load shifting

The dependency-driven method is easy to configure, but it can cause many nodes to start simultaneously, such as at 00:00 (midnight). This leads to competition for computing resources and node queuing. In this case, use the priority settings in baseline management to give critical core nodes higher scheduling priority.

  • Identify node priority: Distinguish between core nodes, such as Operational Data Store (ODS) layer data extraction, and non-core nodes, such as some internal reports.

  • Set the scheduling priority for a job: Set a baseline so that core nodes acquire computing resources first.

  • Optimization comparison diagram:


By combining scheduled times with baselines, you can achieve reasonable allocation of schedule resources and priority-based intelligent scheduling. This reduces the O&M costs and human errors associated with setting individual scheduled times for each node.

| Scenario | Description |
| --- | --- |
| Before optimization: resource contention | Scheduling all jobs (core A/B, report C/D) to run at 00:00 causes high concurrency, intense resource competition, and widespread job queuing. |
| After optimization: staggered execution | Core nodes A and B get resources promptly and run at 00:00 because of their high priority. Report nodes C and D wait for A and B to finish and start running at 02:00, ensuring stable execution without competing with important nodes for resources. |

Complex scenario practices

Configuring cross-cycle dependencies

When a node's execution depends on an instance of its ancestor node from a previous cycle, you need to configure a cross-cycle dependency. An example is a T+1 daily summary node that must wait for all hourly nodes from day T to complete.

  • Scenario: A daily summary node B is scheduled to run at 02:00 every day. Its data source is an hourly node A. Node B can run only after all hourly instances of Node A from the previous day (day T), from 00:00 to 23:00, have run successfully.

  • Configuration method: To configure the scheduling dependencies for Node B, set its dependency on the ancestor Node A as a cross-cycle dependency. For more information, see Configure a dependency on the previous cycle (cross-cycle dependency).


  • How it works: After configuration, the instance of Node B for the data timestamp 2025-12-02 will wait until all instances of Node A for the data timestamp 2025-12-01 have run successfully before it is triggered.

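Under this cross-cycle configuration, the daily instance for data timestamp T waits for all 24 hourly instances of day T-1. A sketch of the upstream timestamps involved (plain Python for illustration, assuming an hourly upstream node; not a DataWorks API):

```python
from datetime import date, datetime, timedelta

def upstream_hourly_instances(data_timestamp: date) -> list[datetime]:
    """The daily instance for data timestamp T depends on the hourly
    instances of the PREVIOUS day (T-1), from 00:00 through 23:00."""
    prev_day = data_timestamp - timedelta(days=1)
    midnight = datetime.combine(prev_day, datetime.min.time())
    return [midnight + timedelta(hours=h) for h in range(24)]

# The 2025-12-02 instance of Node B waits on these 24 instances of Node A:
deps = upstream_hourly_instances(date(2025, 12, 2))
print(deps[0], deps[-1])  # 2025-12-01 00:00:00 2025-12-01 23:00:00
```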

Implementing complex recurring schedules

For nodes with special recurring patterns, such as quarterly or semi-annual tasks like a quarterly closing node, you can combine scheduling cycles with scheduling parameters.

  • Scenario: A financial closing node needs to run on the last closing day of each quarter and depends on data from the entire past quarter.

    When setting a closing day, a buffer period is usually reserved to handle special month-end items such as cross-month supplementary orders, refund reversals, and manual audits.
  • Configuration method:

    1. Set the scheduling cycle: In the node's time properties, select a yearly schedule, specify the months as 1, 4, 7, 10, and select Last day of the month for the date. DataWorks automatically handles months with different numbers of days (30 or 31) and leap years.

    2. Use scheduling parameters: In your code, use scheduling parameters or user-defined functions to dynamically calculate the required data date range. For example, you can determine the start and end dates of the quarter based on the current data timestamp. For more information, see Supported formats of scheduling parameters.

  • Running logic: DataWorks automatically identifies whether the 30th, the 31st, or even February 29 in a leap year is the last day of the month. On all other days of the scheduling cycle, the node's instances automatically perform a dry run. This preserves dependency continuity while ensuring that the financial calculation runs precisely on the closing day.

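The dynamic date-range calculation from step 2 can be sketched in plain Python. In a real node you would derive the input date from scheduling parameters such as the data timestamp; the function below is purely illustrative:

```python
from datetime import date, timedelta

def quarter_range(data_ts: date) -> tuple[date, date]:
    """Return the first and last day of the quarter containing data_ts."""
    q_start_month = 3 * ((data_ts.month - 1) // 3) + 1       # 1, 4, 7, or 10
    start = date(data_ts.year, q_start_month, 1)
    if q_start_month == 10:                                  # Q4 ends on Dec 31
        end = date(data_ts.year, 12, 31)
    else:                                                    # day before next quarter starts
        end = date(data_ts.year, q_start_month + 3, 1) - timedelta(days=1)
    return start, end

# A closing instance with data timestamp 2026-03-31 covers Q1 of 2026:
print(quarter_range(date(2026, 3, 31)))
```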

Using scheduling calendars for trading day schedules

The scheduled time (cron expression) defines when a node runs within its scheduling cycle, while a scheduling calendar acts as a filter for the execution dates. By combining them, you can precisely control nodes to run only on specific business days, such as trading days or promotion days.

  • Scenario: A securities clearing node for a brokerage firm must run at 22:00 on every trading day (non-holiday). If a scheduled day falls on a weekend or a public holiday, the node must automatically stop running to avoid generating invalid instances or wasting resources on dry-runs.

  • Solution: Scheduling calendar + Scheduled time

    • Create a custom calendar: In the DataWorks resource center, you can maintain a "Trading Calendar" by manually or automatically synchronizing all trading dates for the year. For more information, see Configure a scheduling calendar.

    • Configure scheduling properties: Set the node to trigger daily at 22:00 and select the custom "Trading Calendar".

  • Execution logic:

    • On a trading day: The system detects that the current date is in the calendar, and the node starts on time at 22:00.

    • On a non-trading day (such as Chinese New Year): The system automatically skips generating an instance for the node, or the generated instance performs a dry-run without consuming actual computing resources.

Note

A scheduling calendar can be thought of as a filter for execution dates. By combining it with hourly or minute-level schedules, you can achieve dual filtering for both date and time.

For example, an hour-level node configured to run at 08:00 and 18:00 each day, if associated with a scheduling calendar that includes only Mondays and Fridays, will ultimately run only at those two times on Mondays and Fridays.
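The dual-filter behavior can be sketched as follows. The calendar dates and times below are hypothetical placeholders for a real scheduling calendar; this is a conceptual illustration, not DataWorks internals:

```python
from datetime import datetime, time

# Hypothetical trading calendar and hour-level schedule (illustrative values).
trading_days = {"2026-03-02", "2026-03-06"}   # a Monday and a Friday
scheduled_times = {time(8, 0), time(18, 0)}

def should_fire(ts: datetime) -> bool:
    """Dual filter: the calendar gates the DATE, the schedule gates the TIME."""
    return ts.strftime("%Y-%m-%d") in trading_days and ts.time() in scheduled_times

print(should_fire(datetime(2026, 3, 2, 8, 0)))   # Monday 08:00  -> True
print(should_fire(datetime(2026, 3, 3, 8, 0)))   # Tuesday 08:00 -> False
```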

Best practices

1. Static scheduling planning and configuration

Goal: Decouple scheduling logic from business logic using a layered strategy.

Core strategies:

  1. Linear workflows

    Configure the scheduled time for only the first node, for example, set the first node's scheduled time to 07:00. Descendant nodes are automatically triggered through dependencies to maximize execution efficiency.

  2. Time-based task

    1. Set precise, independent scheduled times for specific nodes. When setting times, avoid scheduling an ancestor node later than its descendant node, which would prevent the descendant from running on time.

    2. Use scheduling calendars and effective dates to control the node's active period. For example, you can control a node to run only on weekdays between January 1, 2026, and December 31, 2026.

  3. Dynamic scheduling parameters

    Combine these settings with scheduling parameters, such as ${yyyymmdd}, so that time values in your code are replaced dynamically at run time.
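Conceptually, ${yyyymmdd} resolves to the instance's data timestamp rendered as an eight-digit date string. A plain-Python stand-in for that substitution (not a DataWorks API; the actual replacement happens in the scheduling engine):

```python
from datetime import date

def resolve_yyyymmdd(data_timestamp: date) -> str:
    """Conceptual stand-in for the ${yyyymmdd} scheduling parameter:
    the data timestamp rendered as an 8-digit date string."""
    return data_timestamp.strftime("%Y%m%d")

print(resolve_yyyymmdd(date(2026, 3, 3)))  # -> 20260303
```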

2. Intelligent dynamic control with baselines

Goal: Guarantee delivery times for core nodes and reduce manual intervention costs.

Prerequisite: You have created a baseline and set node priorities.

Core mechanisms:

  1. Commitment time and priority definition:

    Define the committed completion time for core nodes, such as 09:00, and attach them to a high-priority baseline. The system automatically identifies the critical path based on priority and ensures that high-priority nodes, such as ODS layer extraction, get computing resources first.

  2. Automatic resource "peak-load shifting":

    You do not need to manually set staggered start times for each non-core node. The scheduling engine automatically queues non-core report nodes to avoid resource contention during peak times, prioritizing resource supply for the critical path.

  3. Dynamic prediction and real-time alerting:

    Based on historical runtimes, the system can dynamically predict in the early morning whether the day's workflow will miss its delivery time. If at 07:00 the system predicts a delay to 09:15, it immediately triggers an alert and highlights the bottleneck nodes on the critical path. This changes the approach from "post-mortem recovery" to "proactive intervention".

The best practice is to combine the latest start time, derived by working backward from the end node, with intelligent baselines. This method sets the starting point through static planning and then uses baselines for dynamic, priority-based scheduling across the entire workflow. It reduces manual maintenance costs and builds a comprehensive assurance system from planning to prediction, ensuring a high degree of certainty for core data output.
