Configure Same-Cycle Node Dependencies for Reliable DAG Execution - DataWorks

When you configure same-cycle scheduling dependencies for a node, the instance that the node generates in a scheduling cycle depends on the instance that another node generates in the same cycle. The current node runs only after that ancestor instance runs successfully. Configure same-cycle scheduling dependencies when the current node reads data from a table that another node produces in the same scheduling cycle. DataWorks supports several methods for configuring these dependencies and provides a dependency preview feature, so you can find and correct misconfigured dependencies early and keep nodes running on schedule. This topic describes the logic, configuration methods, and precautions for same-cycle scheduling dependencies.

How it works

A same-cycle scheduling dependency connects two nodes that run in the same scheduling cycle. When Node A (the ancestor) produces a table that Node B (the descendant) reads, declaring that relationship as a dependency tells the scheduler to run Node A first. Node B starts only after Node A finishes successfully.

To declare the dependency, set the output of Node A as the input of Node B. Once the link is established, the directed acyclic graph (DAG) of the descendant node shows the dependency as a solid line.
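To make the ordering rule concrete, the following minimal Python sketch (not DataWorks code) models each node as a hypothetical dict of declared outputs and parent output names, matches parent outputs to the nodes that produce them, and derives a same-cycle run order with Kahn's topological sort. The node names, the `proj_root` root output, and the `run_order` helper are illustrative assumptions.

```python
from collections import deque

# Hypothetical nodes: each declares its outputs and the output names of its
# ancestors (Parent Nodes). This is an illustration, not a DataWorks API.
NODES = {
    "ods_orders": {"outputs": ["proj.ods_orders"], "parents": ["proj_root"]},
    "dwd_orders": {"outputs": ["proj.dwd_orders"], "parents": ["proj.ods_orders"]},
    "ads_report": {"outputs": ["proj.ads_report"], "parents": ["proj.dwd_orders"]},
}


def run_order(nodes, root_outputs=("proj_root",)):
    """Return node names in an order where every ancestor runs first."""
    producer = {out: name for name, n in nodes.items() for out in n["outputs"]}
    indegree = {name: 0 for name in nodes}
    children = {name: [] for name in nodes}
    for name, n in nodes.items():
        for parent_out in n["parents"]:
            if parent_out in root_outputs:
                continue  # assumption: the workspace root node is always ready
            ancestor = producer[parent_out]  # KeyError: ancestor output missing
            children[ancestor].append(name)
            indegree[name] += 1
    ready = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: the graph is not a DAG")
    return order


print(run_order(NODES))  # ['ods_orders', 'dwd_orders', 'ads_report']
```

The cycle check reflects why the graph must be acyclic: if two nodes declared each other as ancestors, neither could ever start.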

Same-cycle dependencies (solid lines in the DAG) express a within-run ordering relationship: run A before B in the same scheduling cycle. This is different from cross-cycle dependencies, which express a temporal relationship: run B only after a specific instance of A from a previous cycle completes. Do not confuse the two. See When to use cross-cycle dependencies to determine which type applies to your scenario.

Configuration entry point

Go to the node in DataStudio and click the Properties tab in the right-side navigation pane. Scheduling dependencies are configured in the Dependencies section.

The Dependencies section has two subsections:

Parent Nodes: specify the ancestor nodes that the current node depends on. Enter the output name of each ancestor node.
Output Name of Current Node: specify the outputs of the current node so that other nodes can declare a dependency on it.

Important

Every node must have at least one ancestor node. If no table lineage exists for a node, set the root node or zero-load node of the workspace as its ancestor. For details, see the Scheduling dependency configuration guide.
Ancestor nodes must be committed before you commit the current node. If you get an error stating that the output of an ancestor node does not exist, check whether the ancestor node has been committed.
After a descendant node that depends on an output of the current node is committed, that descendant node is listed for the output in the Output Name of Current Node section of the current node. You cannot manually modify this descendant node entry from the current node.
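The first two rules above can be expressed as a pre-commit check. The sketch below is a hypothetical illustration, not how DataWorks implements validation: `check_commit`, the set of committed outputs, and the `proj_root` root output name are assumptions made for the example.

```python
# Hypothetical pre-commit check mirroring the rules above; not a DataWorks API.
def check_commit(node_name, parents, committed_outputs):
    """Return a list of problems that would block committing node_name."""
    problems = []
    if not parents:
        problems.append(
            f"{node_name}: no ancestor node; use the workspace root "
            "or a zero-load node"
        )
    for parent_out in parents:
        if parent_out not in committed_outputs:
            problems.append(
                f"{node_name}: output of ancestor node does not exist: "
                f"{parent_out} (has the ancestor node been committed?)"
            )
    return problems


committed = {"proj_root", "proj.ods_orders"}
print(check_commit("dwd_orders", ["proj.ods_orders"], committed))  # []
print(len(check_commit("ads_report", ["proj.dwd_orders"], committed)))  # 1
```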

Configuration methods

DataWorks provides three ways to configure same-cycle scheduling dependencies. Use automatic parsing as the default approach: it reads your SQL code and configures dependencies automatically. Fall back to the other methods when automatic parsing does not cover your scenario.

Method	When to use
Automatic parsing (recommended)	Default. Works for most ODPS SQL nodes where table lineage is clear from the code.
Draw lines on the workflow canvas	When you prefer a visual approach or when working in a workflow.
Manual configuration in Dependencies	When automatic parsing produces incorrect results, or when the source table is not generated by periodic scheduling.

Automatic parsing

Automatic parsing reads the SELECT and INSERT statements in your node code and constructs the dependency graph from table lineage:

Each table read in a SELECT statement is added to Parent Nodes, which links the current node to the node that produces that table.
Each table written by an INSERT statement is added to Output Name of Current Node.

Output names are formatted as projectname.tablename.
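The following Python sketch approximates this mapping with regular expressions so the rules are easy to see: FROM and JOIN targets become inputs, INSERT targets become outputs, unqualified names are prefixed with the current project, and tables prefixed with `t_` are dropped. It is a simplified assumption, not DataWorks's parser: it ignores subqueries, CTE names, and string literals, and the project name `proj` and the sample tables are hypothetical.

```python
import re

# Simplified, regex-based illustration; DataWorks uses its own SQL parser.
# Assumption: ODPS SQL of the form INSERT {OVERWRITE|INTO} [TABLE] <name>.
INSERT_RE = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
# Tables read by the query: any identifier that follows FROM or JOIN.
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def qualify(table, project):
    """Normalize a table reference to projectname.tablename."""
    table = table.lower()
    return table if "." in table else f"{project}.{table}"


def is_temp(name):
    """Temporary tables prefixed with t_ are never inputs or outputs."""
    return name.split(".")[-1].startswith("t_")


def parse_lineage(sql, project):
    """Return (inputs, outputs) as sorted lists of projectname.tablename."""
    inputs = {qualify(t, project) for t in SOURCE_RE.findall(sql)}
    outputs = {qualify(t, project) for t in INSERT_RE.findall(sql)}
    return (sorted(t for t in inputs if not is_temp(t)),
            sorted(t for t in outputs if not is_temp(t)))


SQL = """
INSERT OVERWRITE TABLE dwd_orders PARTITION (ds='${bizdate}')
SELECT o.id, c.name
FROM ods_orders o
JOIN other_proj.dim_customer c ON o.cid = c.id
JOIN t_tmp_fix f ON o.id = f.id;
"""

print(parse_lineage(SQL, "proj"))
# (['other_proj.dim_customer', 'proj.ods_orders'], ['proj.dwd_orders'])
```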

Override automatic parsing results with code comments

To remove or add inputs and outputs without disabling automatic parsing, add the following comments to your node code:

Comment	Effect
`--@exclude_input=projectname.tablename`	Remove a specific input
`--@exclude_output=projectname.tablename`	Remove a specific output
`--@extra_input=projectname.tablename`	Add an input
`--@extra_output=projectname.tablename`	Add an output

Limitations of automatic parsing

Temporary tables prefixed with t_ are excluded. DataWorks does not add them as inputs or outputs.
Automatic recommendation of ancestor nodes is available only for ODPS SQL nodes.
Automatically recommended ancestor nodes are updated with a one-day delay. For a node to appear as a recommendation, it must be committed to the scheduling system on the previous day and have generated data on the current day.
Output names must be unique within the workspace. If two nodes in a workspace have the same name, their default output names can conflict; rename the output of one of them manually.
If two nodes write to the same table, automatic parsing may report an error for one of them. See Can multiple nodes have the same output name?
For batch synchronization nodes, either name the node after its output table or explicitly add the output table as a node output. Otherwise, downstream SQL nodes that depend on the batch synchronization node may fail to commit with the error: The output that is named in the ${projectname.tablename} format for the ancestor node does not exist.
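To spot the naming conflicts described above before automatic parsing reports them, you could scan the outputs declared across a workspace. The Python sketch below is a hypothetical helper working on an assumed `{node name: [output names]}` inventory; it does not call any DataWorks API.

```python
from collections import defaultdict


def find_duplicate_outputs(node_outputs):
    """Map each output name claimed by more than one node to those nodes."""
    owners = defaultdict(list)
    for node, outputs in node_outputs.items():
        for out in outputs:
            owners[out].append(node)
    return {out: sorted(nodes) for out, nodes in owners.items() if len(nodes) > 1}


# Hypothetical workspace inventory.
workspace = {
    "dwd_orders": ["proj.dwd_orders"],
    "dwd_orders_backfill": ["proj.dwd_orders"],  # second writer of the same table
    "ads_report": ["proj.ads_report"],
}
print(find_duplicate_outputs(workspace))
# {'proj.dwd_orders': ['dwd_orders', 'dwd_orders_backfill']}
```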

Draw lines on the workflow canvas

On the configuration tab of a workflow, draw a line from an ancestor node to a descendant node. DataWorks automatically adds the ancestor node's output (suffixed with `_out`) as the input of the descendant node.

Removing a connection line from the workflow canvas also removes the corresponding scheduling dependency from the node configuration.
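The canvas behavior amounts to adding or removing one parent entry on the descendant node. The Python sketch below models that with a plain dict. It is a hypothetical in-memory illustration: the `<node>_out` default output name is an assumption based on the `_out` suffix described above, and the real name may carry a project or node ID prefix.

```python
# Hypothetical in-memory model of canvas edits; not a DataWorks API.
# Assumption: a node's default output is "<node>_out"; the real name format
# may include a project or node ID prefix.
def default_output(node):
    return f"{node}_out"


def connect(parents, ancestor, descendant):
    """Drawing a line adds the ancestor's default output as a parent input."""
    parents.setdefault(descendant, set()).add(default_output(ancestor))


def disconnect(parents, ancestor, descendant):
    """Removing the line removes the same scheduling dependency."""
    parents.get(descendant, set()).discard(default_output(ancestor))


parents = {}
connect(parents, "ods_orders", "dwd_orders")
print(parents)  # {'dwd_orders': {'ods_orders_out'}}
disconnect(parents, "ods_orders", "dwd_orders")
print(parents)  # {'dwd_orders': set()}
```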

Manual configuration in Dependencies

In the Parent Nodes section of the Dependencies configuration, enter the output name of the ancestor node in projectname.tablename format to add it as an ancestor.

Use this method to correct dependencies that automatic parsing configured incorrectly, or to add dependencies on tables that are not produced by periodic scheduling.

When to use cross-cycle dependencies

Use same-cycle scheduling dependencies when both nodes run on the same schedule and produce data in the same cycle. Switch to cross-cycle scheduling dependencies in these situations:

A daily node needs data from yesterday's run of another daily node (the current node depends on the previous cycle's instance).
A daily node depends on an hourly node. By default, the daily node depends on all instances generated for the hourly node on the current day. If you need the daily node to wait for a specific hourly instance rather than all hourly instances, configure self-dependency for the hourly node so that you can control which hourly instance the daily node depends on.
An hourly or minute-level node needs to wait for its own previous-cycle instance to finish before it starts (self-dependency).
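The difference between these cases is which upstream instances a given instance waits for. The Python sketch below is an illustrative model, not DataWorks code: instances are identified only by their cycle date or time, and the `hour` parameter is a hypothetical stand-in for narrowing a daily node's dependency to one hourly instance.

```python
from datetime import date, datetime, timedelta

# Illustrative model of which upstream instances a descendant instance waits
# for. Instances are identified by cycle date/time; this is not DataWorks code.


def same_cycle(cycle_day):
    """Same-cycle dependency: wait for the ancestor's instance of the same day."""
    return [cycle_day]


def previous_cycle(cycle_day):
    """Cross-cycle dependency: wait for the ancestor's previous-day instance."""
    return [cycle_day - timedelta(days=1)]


def daily_on_hourly(cycle_day, hour=None):
    """Daily node on an hourly node: all 24 hourly instances by default,
    or one hourly instance when `hour` is given (hypothetical parameter)."""
    midnight = datetime(cycle_day.year, cycle_day.month, cycle_day.day)
    hours = range(24) if hour is None else [hour]
    return [midnight + timedelta(hours=h) for h in hours]


def self_dependency(instance_time, interval=timedelta(hours=1)):
    """Self-dependency: an instance waits for its own previous instance."""
    return [instance_time - interval]


day = date(2024, 3, 1)
print(previous_cycle(day))        # [datetime.date(2024, 2, 29)]
print(len(daily_on_hourly(day)))  # 24
print(self_dependency(datetime(2024, 3, 1, 0)))
# [datetime.datetime(2024, 2, 29, 23, 0)]
```

Note that the self-dependency for the first hourly instance of a day reaches back into the previous day, which is exactly the cross-cycle relationship that same-cycle dependencies cannot express.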

For self-dependency configuration, see Configure cross-cycle scheduling dependencies. For complex multi-schedule dependency scenarios, see Principles and samples of scheduling configurations in complex dependency scenarios.

Development best practices

Follow these guidelines to get the most out of automatic parsing and keep dependency graphs clean:

Node naming: Name each node after its output table. This keeps the default output name in sync with the table and makes automatic parsing reliable.
One writer per table: Do not write to the same table from multiple nodes. Automatic parsing may produce errors when it detects multiple writers for the same output.
Use the output table as the node output: Set the table a node generates as that node's declared output. This makes the lineage explicit and supports automatic downstream dependency resolution.
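A lightweight lint can flag nodes that drift from the naming and output guidelines above. The Python sketch below is a hypothetical check, not a DataWorks feature; it assumes you already know, for example from the parsed INSERT statements, which fully qualified tables a node writes.

```python
def lint_node(node_name, written_tables, declared_outputs):
    """Flag deviations from the naming and output guidelines above.

    written_tables: tables the node writes, as projectname.tablename.
    declared_outputs: outputs configured in Output Name of Current Node.
    """
    issues = []
    table_names = {t.split(".")[-1] for t in written_tables}
    if table_names and node_name not in table_names:
        issues.append(f"node {node_name!r} is not named after its output table")
    for table in sorted(written_tables):
        if table not in declared_outputs:
            issues.append(f"table {table} is written but not declared as an output")
    return issues


print(lint_node("dwd_orders", {"proj.dwd_orders"}, {"proj.dwd_orders"}))  # []
print(len(lint_node("job_1", {"proj.dwd_orders"}, set())))  # 2
```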

Verify the configuration

After configuring dependencies, confirm that the scheduling graph matches your expectations before deploying to production:

Preview scheduling dependencies: Use the preview feature to inspect the DAG for the node. This catches misconfigured dependencies before they delay production runs. See Preview scheduling dependencies of a node.
Commit the node: DataWorks validates dependencies at commit time and reports errors if an ancestor node's output cannot be found.
Confirm in Operation Center: After deployment, open Operation Center and inspect the auto triggered nodes in the production environment. Check that the scheduling dependencies between instances match your intent. Note that dependencies between instances are affected by the Instance Generation Mode parameter. See Confirm the scheduling dependencies.
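Previewing a node's dependencies amounts to walking the graph a few levels up and down from that node. The Python sketch below mimics that with a depth-limited breadth-first walk over a hypothetical `{node: set of ancestor nodes}` map; it is an offline illustration, not the DataWorks preview feature.

```python
def preview(parents, node, depth=2):
    """Return (ancestors, descendants) of `node` within `depth` levels."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)

    def walk(start, edges):
        seen, frontier = set(), {start}
        for _ in range(depth):
            frontier = {n for f in frontier for n in edges.get(f, ())} - seen
            seen |= frontier
        return sorted(seen)

    return walk(node, parents), walk(node, children)


# Hypothetical dependency map: node -> ancestor nodes.
deps = {
    "ods_orders": {"proj_root"},
    "dwd_orders": {"ods_orders"},
    "ads_report": {"dwd_orders"},
    "ads_kpi": {"dwd_orders"},
}
print(preview(deps, "dwd_orders"))
# (['ods_orders', 'proj_root'], ['ads_kpi', 'ads_report'])
```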

Impacts of removing or modifying a node output

Important

Removing or renaming the output of a node that has descendant nodes can break those nodes. Notify descendant node owners before making changes and ask them to update their dependency configurations.

Specific impacts:

Removing an output has no effect on the table itself. The underlying data is not deleted.
Removing an output from a node that has descendant nodes may cause those descendant nodes to become isolated nodes, which cannot be scheduled. Downstream data may also be affected due to missing scheduling dependencies.
Modifying an output (for example, renaming a table) transfers the dependency to the new output. For details on how to safely transfer outputs, see Impacts of removing or changing the output of a node.

If you must remove an output, first remove the dependency on the current ancestor node from all descendant nodes, then remove the output.
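The removal order above can be sketched as a guarded operation. In the hypothetical Python model below, `removal_impact` lists the descendants that reference an output and those that would be left with no ancestor at all, and `remove_output` refuses to run while any descendant still depends on the output. The replacement output `proj.dwd_orders_v2` and all other names are illustrative assumptions.

```python
def removal_impact(parents, output):
    """Descendants that reference `output`, and those left with no parents."""
    affected = sorted(n for n, outs in parents.items() if output in outs)
    isolated = [n for n in affected if parents[n] == {output}]
    return affected, isolated


def remove_output(parents, node_outputs, owner, output):
    """Remove `output` from `owner` only after no descendant depends on it."""
    affected, _ = removal_impact(parents, output)
    if affected:
        raise RuntimeError(f"remove the dependency from {affected} first")
    node_outputs[owner].remove(output)


parents = {
    "ads_report": {"proj.dwd_orders"},
    "ads_kpi": {"proj.dwd_orders", "proj.dim_customer"},
}
node_outputs = {"dwd_orders": ["proj.dwd_orders"]}

print(removal_impact(parents, "proj.dwd_orders"))
# (['ads_kpi', 'ads_report'], ['ads_report'])

for descendant in ("ads_report", "ads_kpi"):
    # Step 1: each descendant owner drops the old dependency and points the
    # node at a replacement output (hypothetical name).
    parents[descendant].discard("proj.dwd_orders")
    parents[descendant].add("proj.dwd_orders_v2")
remove_output(parents, node_outputs, "dwd_orders", "proj.dwd_orders")  # step 2
print(node_outputs)  # {'dwd_orders': []}
```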

FAQ

What do I do if the output of an ancestor node does not exist?

This error usually means that the ancestor node has not been committed. Commit the ancestor node first, then retry committing the current node. See What do I do if the output of a node on which the current node depends does not exist?

What do I do if the output name of a node is not unique?

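Output names must be unique within the workspace. Rename the output of one of the conflicting nodes manually. For details, see What do I do if the output name of a node is not unique?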

For more frequently asked questions, see Scheduling dependencies.