Scheduling dependencies define which ancestor nodes must finish successfully before a descendant node can start. Configure them correctly to make sure each node gets valid data from its upstream tables.
How it works
DataWorks runs auto triggered nodes based on a directed acyclic graph (DAG). When you set a scheduling dependency between two nodes, the descendant node waits until all its ancestor nodes complete successfully before it starts. This ensures downstream nodes always operate on data that was generated in the current or previous scheduling cycle.
The scheduling time of a node sets its earliest possible start time. If an ancestor node hasn't finished yet, the descendant waits — even if its scheduled start time has already passed. For details on diagnosing wait states, see Use the Intelligent Diagnosis feature.
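The waiting rule can be sketched in a few lines of Python. This is an illustrative model only, not DataWorks code: the function name `actual_start` and the sample times are hypothetical, and the sketch assumes every ancestor finishes successfully and ignores resource availability.

```python
from datetime import datetime, timedelta


def actual_start(scheduled: datetime, ancestor_finish_times: list[datetime]) -> datetime:
    """Return the earliest moment an instance can start.

    The instance never starts before its scheduled time, and never before
    the last of its ancestors has finished.
    """
    return max([scheduled, *ancestor_finish_times])


# Hypothetical example: a node scheduled for 06:00 with two ancestors.
scheduled = datetime(2024, 1, 15, 6, 0)
ancestors = [datetime(2024, 1, 15, 5, 40), datetime(2024, 1, 15, 6, 25)]

start = actual_start(scheduled, ancestors)
wait = start - scheduled  # time spent waiting past the scheduled start
```

Here the second ancestor finishes after 06:00, so the descendant waits past its scheduled time until that ancestor is done.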
Choose a dependency method
Two methods are available. Which one to use depends on whether a strong data lineage exists between the current node's output table and its ancestor nodes' output tables.
Strong lineage means the current node reads data that ancestor nodes write. If ancestor nodes fail to produce data, the current node also fails to get valid data.
| Method | When to use |
|---|---|
| Configure based on table data lineage | A strong lineage exists between the current node's table and its ancestor nodes' tables, and ancestor nodes are auto triggered nodes |
| Configure custom dependencies | No strong lineage exists, or the upstream data source is not an auto triggered node |
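The decision in the table reduces to two yes/no questions. The following Python sketch encodes it; the function name and return strings are hypothetical labels chosen for illustration, not DataWorks settings.

```python
def choose_dependency_method(strong_lineage: bool, ancestors_auto_triggered: bool) -> str:
    """Pick a dependency method from the two conditions in the table above."""
    if strong_lineage and ancestors_auto_triggered:
        return "table data lineage"
    # Either no strong lineage exists, or the upstream data is not
    # produced by an auto triggered node.
    return "custom dependencies"


method_for_daily_report = choose_dependency_method(True, True)
method_for_uploaded_table = choose_dependency_method(True, False)
```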
Configure scheduling dependencies based on table data lineage
When a strong lineage exists and the upstream tables are generated by auto triggered nodes, configure table-lineage-based dependencies. DataWorks tracks node completion status to determine when upstream data is ready.
DataWorks supports dependencies between nodes scheduled by minute, hour, day, week, month, or year. The number of instances per node depends on its scheduling frequency and cycle count, so the dependency mapping between ancestor and descendant instances can vary. Preview dependencies before deploying if ancestor and descendant nodes have different scheduling frequencies or scheduling times. For details, see Principles and samples of scheduling configurations in complex dependency scenarios.
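To see why the mapping is not always one-to-one, it helps to count the instances each node generates in a day. The sketch below is a simplified model with hypothetical names (`instances_for_day`) and sample schedules. It does not reproduce the actual DataWorks mapping rules, which the dependency preview shows.

```python
from datetime import date, datetime, time, timedelta


def instances_for_day(day: date, start: time, end: time, interval: timedelta) -> list[datetime]:
    """List the scheduled times of a node's instances on one day."""
    current = datetime.combine(day, start)
    stop = datetime.combine(day, end)
    scheduled = []
    while current <= stop:
        scheduled.append(current)
        current += interval
    return scheduled


day = date(2024, 1, 15)
# Hypothetical hourly node that runs every hour from 00:00 through 23:59.
hourly = instances_for_day(day, time(0, 0), time(23, 59), timedelta(hours=1))
# Hypothetical daily node that runs once at 06:00.
daily = instances_for_day(day, time(6, 0), time(6, 0), timedelta(days=1))
```

The hourly node produces 24 instances and the daily node produces one, so a dependency between them has to map one instance to many. That is the case where previewing the dependencies is most useful.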
Step 1: Confirm table data lineage
Check whether the current node's output depends on the ancestor node's output. Ask: if the ancestor fails to generate data today, can the current node still produce valid results? If not, a strong lineage exists.
For cross-workspace dependencies or cases where you cannot view the ancestor node's scheduling parameter configuration, see Confirm the lineage of a table.
Step 2: Select a dependency type
The scheduling parameters in node code determine which specific ancestor instance the current instance depends on. DataWorks replaces parameter values at scheduling time based on the data timestamp and scheduling time of the node. The following diagram shows how the selection logic works.
Select the dependency type based on which data the current node needs:
| Dependency type | When to use | Example |
|---|---|---|
| Same-cycle | The current node needs data generated by the ancestor node on the same day | A daily report node (runs at 06:00) depends on a daily aggregation node (runs at 05:00) — the 06:00 instance depends on the same day's 05:00 instance |
| Cross-cycle | The current node needs data generated by the ancestor node on the previous day, or an hour/minute-scheduled instance depends on the previous cycle's instance | A daily report node (runs at 01:00) depends on yesterday's end-of-day aggregation — the January 15 instance depends on the January 14 ancestor instance |
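For daily nodes, the two dependency types differ only in which day's ancestor instance is selected. A minimal sketch, assuming daily scheduling on both sides (the function name and type labels are hypothetical):

```python
from datetime import date, timedelta


def ancestor_instance_date(current: date, dependency_type: str) -> date:
    """Return the date of the ancestor instance a daily instance depends on."""
    if dependency_type == "same-cycle":
        return current
    if dependency_type == "cross-cycle":
        # Cross-cycle for a daily node means the previous day's instance.
        return current - timedelta(days=1)
    raise ValueError(f"unknown dependency type: {dependency_type}")


same = ancestor_instance_date(date(2024, 1, 15), "same-cycle")
cross = ancestor_instance_date(date(2024, 1, 15), "cross-cycle")
```

With a cross-cycle dependency, the January 15 instance resolves to the January 14 ancestor instance, matching the example in the table.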
Special cases — hour and minute scheduling:
- A node scheduled by hour or minute can depend on its own previous-cycle instance. For example, the instance generated for an hourly node in the current cycle depends on the instance generated for the same node in the previous cycle. For details, see Dependency on the instance generated for the current node in the previous cycle.
- If Node A (hourly) and Node B (hourly) share the same scheduling time, configure a cross-cycle dependency on Node A that points to Node B. This makes the 02:00 instance of Node A depend on the 01:00 instance of Node B instead of waiting for Node B's 02:00 instance. The same logic applies to minute-scheduled nodes.
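The effect of a dependency on the previous-cycle instance of the same node is that instances run one after another. The following sketch models this for an hourly node. The function name, schedule, and run durations are hypothetical, and the model ignores failures and other ancestors.

```python
from datetime import datetime, timedelta


def serialized_starts(scheduled: list[datetime], durations: list[timedelta]) -> list[datetime]:
    """Compute start times when each instance waits for the previous cycle's instance."""
    starts = []
    previous_finish = None
    for when, duration in zip(scheduled, durations):
        # The first instance has no previous cycle to wait for.
        start = when if previous_finish is None else max(when, previous_finish)
        starts.append(start)
        previous_finish = start + duration
    return starts


scheduled = [datetime(2024, 1, 15, hour) for hour in (0, 1, 2)]
durations = [timedelta(minutes=30), timedelta(minutes=90), timedelta(minutes=10)]
starts = serialized_starts(scheduled, durations)
```

In this example the 01:00 instance runs for 90 minutes, so the 02:00 instance cannot start until 02:30.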
Step 3: Verify the dependencies
After configuring, verify that dependencies work as expected before and after deployment.
| Method | What it checks | When to use |
|---|---|---|
| Preview scheduling dependencies | Shows which ancestor instance each descendant instance depends on, which catches cross-frequency mapping errors before deployment | When you configure dependencies between nodes with different scheduling frequencies (for example, a daily node that depends on an hourly node), or when you configure cross-cycle dependencies |
| Compare code parsing results | Shows the diff between the current and new dependencies, which confirms that your changes won't break data in production | When you modify dependencies on a node that uses automatic code parsing |
| View Auto Triggered Nodes page after deployment | Shows the actual dependency graph in the production environment, which catches mismatches between the development configuration and the production state | After you deploy a node, to verify that the production environment reflects the intended dependencies |
Verify in the production environment:
In a workspace running in standard mode, development and production environments can have different dependency configurations. After you deploy a node on DataStudio, go to the Auto Triggered Nodes page in Operation Center and expand the ancestor/descendant view to confirm the production dependencies.
Also check that the partition data generated by ancestor nodes matches what the current node expects. A correctly configured dependency doesn't guarantee partition alignment.
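A partition check can be as simple as comparing the partitions the current node expects with the partitions the ancestor actually produced. The sketch below is a generic illustration. The `ds=YYYYMMDD` partition naming and the function name are assumptions, and obtaining the list of produced partitions from your compute engine is out of scope here.

```python
def missing_partitions(expected: set[str], produced: set[str]) -> list[str]:
    """Return expected partitions that the ancestor did not produce, sorted."""
    return sorted(expected - produced)


# Hypothetical partition names.
expected = {"ds=20240113", "ds=20240114"}
produced = {"ds=20240112", "ds=20240113"}
gaps = missing_partitions(expected, produced)
```

An empty result means the partitions align. Any entry in the result indicates that the dependency fired but the data the current node reads is not there.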
The Auto Triggered Nodes page shows the latest node status in production. Whether instances are added or removed depends on the Instance Generation Mode. For details, see the Instance generation mode section.
If the deployment procedure is blocked by a process control approval, go to the Cycle Task page in Operation Center to check dependency configuration and deployment status. See Deploy nodes.
Configure custom dependencies
Use custom dependencies when no strong lineage exists between nodes, or when the upstream data isn't produced by an auto triggered node.
Rule: DataWorks can only track tables generated by its own auto triggered nodes. Any table produced outside this mechanism — regardless of update frequency — cannot participate in scheduling dependency checks.
Common examples of tables that fall outside DataWorks scheduling:
- Tables generated by real-time synchronization nodes
- Tables uploaded from on-premises machines
- Dimension tables
- Tables generated by manually triggered nodes
- Tables periodically updated by external systems, not by auto triggered nodes in DataWorks
For these cases, use one of the following options:
Specify the workspace root node as the ancestor
Use this when a data synchronization node depends on other business databases, or an SQL node processes data from a real-time synchronization node. Setting the root node as the ancestor lets the descendant node run at its scheduled time without waiting for upstream data readiness checks.
Specify a zero load node as the ancestor
Use a zero load node when a workflow has many nodes or complex relationships. A zero load node acts as a coordination point — you can use it to set scheduling times, batch-freeze nodes, or centralize dependency management. This makes data forwarding paths in the workspace easier to trace.
Usage notes
Node uniqueness
A node can have different dependency configurations in the development and production environments, but the node itself must be unique across both. Before undeploying a node:
- Remove all descendant nodes from both the development and production environments.
- Reconfigure a replacement ancestor node for those descendants.
- Commit and deploy all changes.
This ensures descendant nodes continue to get valid data after the undeploy.
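Before undeploying, it can help to list every descendant that still points at the node in either environment. The sketch below assumes you have exported the direct descendants of each node into plain dictionaries. The function name, dictionary layout, and node names are hypothetical.

```python
def blocking_descendants(node: str, dev: dict[str, list[str]], prod: dict[str, list[str]]) -> list[str]:
    """Return the direct descendants of a node in development or production."""
    return sorted(set(dev.get(node, [])) | set(prod.get(node, [])))


# Hypothetical mappings: node name -> direct descendant node names.
dev = {"ods_orders": ["dwd_orders"]}
prod = {"ods_orders": ["dwd_orders", "rpt_sales"]}
blockers = blocking_descendants("ods_orders", dev, prod)
```

The node is safe to undeploy only when the returned list is empty for both environments combined.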
Instance generation mode
Set the Instance Generation Mode parameter to the same value for a node and all its ancestor nodes.
If ancestor nodes are set to Immediately After Deployment but the descendant is set to Next Day, the descendant may become an isolated node.
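A consistency check for this setting compares each node with its direct ancestors. Checking every direct edge also covers indirect ancestors. The sketch below is illustrative: the function name, node names, and dictionary layout are hypothetical, and only the two mode values named in this section are used.

```python
def mode_mismatches(modes: dict[str, str], ancestors: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (ancestor, descendant) pairs whose Instance Generation Mode differs."""
    mismatches = []
    for node, parents in ancestors.items():
        for parent in parents:
            if modes[parent] != modes[node]:
                mismatches.append((parent, node))
    return mismatches


# Hypothetical configuration.
modes = {
    "ods_orders": "Immediately After Deployment",
    "dwd_orders": "Immediately After Deployment",
    "rpt_sales": "Next Day",
}
ancestors = {"dwd_orders": ["ods_orders"], "rpt_sales": ["dwd_orders"]}
problems = mode_mismatches(modes, ancestors)
```

Each pair in the result is an edge where the descendant risks becoming an isolated node.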
If you change an existing node's scheduling frequency and set Instance Generation Mode to Immediately After Deployment, previously generated instances are not automatically removed. This can create complex dependency states for instances generated on the current day. See Impacts on scheduling dependencies when Instance Generation Mode is set to Immediately After Deployment.
200-upstream limit
If you get the error 'One file could not have more than 200 inputs' when calling the API to update a job, the node has exceeded 200 direct upstream dependencies.
To resolve this, use DataStudio to insert a zero load node between the ancestor and descendant nodes, reducing the number of direct dependencies on the current node.
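The regrouping can be planned before you edit anything in DataStudio. The following sketch splits a long upstream list into groups of at most 200 and places one intermediate node in front of each group. The function name, the `zero_load_N` and `current_node` labels, and the dictionary layout are hypothetical. The sketch only produces a plan and does not call any DataWorks API.

```python
def add_zero_load_layer(upstreams: list[str], limit: int = 200) -> dict[str, list[str]]:
    """Map each node to its direct inputs so that no node exceeds `limit` inputs."""
    if len(upstreams) <= limit:
        return {"current_node": list(upstreams)}

    groups = [upstreams[i:i + limit] for i in range(0, len(upstreams), limit)]
    if len(groups) > limit:
        # A single intermediate layer is not enough for this many upstreams.
        raise ValueError("too many upstream nodes for one zero load layer")

    plan = {f"zero_load_{index + 1}": group for index, group in enumerate(groups)}
    # The current node now depends only on the intermediate nodes.
    plan["current_node"] = [name for name in plan if name != "current_node"]
    return plan


upstream_names = [f"node_{n}" for n in range(450)]
plan = add_zero_load_layer(upstream_names)
```

With 450 upstream nodes, the plan contains three intermediate nodes (200, 200, and 50 inputs), and the current node has three direct dependencies instead of 450.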