Validate Node Dependencies by Comparing Code Parsing Results - DataWorks

When automatic parsing is enabled for a node, DataWorks parses the node's inputs and outputs based on table lineage in the node code each time you commit the node. If the resulting scheduling dependencies differ from those in the production environment, DataWorks prompts you to review the changes before the commit takes effect.

How it works

DataWorks runs the dependency comparison only when all three of the following conditions are met:

Automatic parsing is enabled for the node.
You commit the node.
The scheduling dependencies derived from the current code differ from those in the production environment.

When a difference is detected, DataWorks displays a message listing which inputs or outputs were added or removed. Review the changes and decide whether to accept them. After you commit the node, the updated inputs and outputs are automatically reflected in Parent Nodes and Output in the Dependencies section of the Properties tab.
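The comparison described above can be pictured as a set difference between the dependencies recorded in production and those parsed from the current code. The following is a minimal sketch of that idea, not the DataWorks implementation: the names `DependencyDiff`, `diff_dependencies`, and `should_prompt`, and the `project.table` naming in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependencyDiff:
    """Inputs or outputs that were added or removed relative to production."""
    added: frozenset
    removed: frozenset

    @property
    def has_changes(self) -> bool:
        return bool(self.added or self.removed)


def diff_dependencies(production, parsed):
    """Compare production dependencies with those parsed from the current code."""
    prod, cur = frozenset(production), frozenset(parsed)
    return DependencyDiff(added=cur - prod, removed=prod - cur)


def should_prompt(auto_parse_enabled, committing, production, parsed):
    """Mirror the three conditions: parsing enabled, a commit, and a difference."""
    if not (auto_parse_enabled and committing):
        return False
    return diff_dependencies(production, parsed).has_changes


# Hypothetical example: table_a was dropped from the code and table_c was added.
diff = diff_dependencies(
    production={"proj.table_a", "proj.table_b"},
    parsed={"proj.table_b", "proj.table_c"},
)
print("added:", sorted(diff.added))
print("removed:", sorted(diff.removed))
```

An empty `added` and `removed` pair corresponds to the case where no message is shown at commit time.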

If automatic parsing is not enabled, configure scheduling dependencies manually and track changes by comparing node versions.

Limitations

DataWorks cannot compare code parsing results for cross-cycle scheduling dependencies. If your node has cross-cycle scheduling dependencies, verify that those dependencies meet your expectations before committing the node.

Review dependency changes before committing

When a difference is detected, check whether the current scheduling dependencies match your business requirements before accepting them. Inappropriate changes can disrupt data generation across the entire downstream pipeline, especially when the node has many descendant nodes. Do not change scheduling dependencies unless the change is necessary.

The two scenarios below describe common types of dependency changes and their potential impact.

Scenario 1: A required ancestor node is missing from the current dependencies

If the current scheduling dependencies no longer include an input that exists in the production environment, check whether the ancestor node that produces that input was intentionally removed.

Example: Suppose input A, which is the output of an ancestor node, is present in production but missing from the current dependencies. If the node's code still reads from table A in a SELECT statement but the node that generates table A is no longer listed as an ancestor node, the current node may start running before the data in table A is ready. The node can then fail or produce incorrect data.
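One way to check for this situation yourself is to list the tables that the code reads and confirm that each one is still covered by a declared input. The sketch below assumes simple SQL and uses a naive regular expression on FROM and JOIN clauses; it is not the DataWorks lineage parser, and the helper names `tables_read` and `unguarded_reads` are hypothetical.

```python
import re

# Naive pattern: captures the identifier that follows FROM or JOIN.
# It ignores subqueries, CTEs, comments, and quoted identifiers.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_read(sql):
    """Return the lowercased table names that appear after FROM or JOIN."""
    return {match.group(1).lower() for match in _TABLE_REF.finditer(sql)}


def unguarded_reads(sql, declared_inputs):
    """Return tables the code reads that no declared input covers."""
    declared = {name.lower() for name in declared_inputs}
    return tables_read(sql) - declared


# Hypothetical node code that still reads table_a after the input was removed.
node_sql = "SELECT a.id FROM table_a a JOIN table_b b ON a.id = b.id"
missing = unguarded_reads(node_sql, declared_inputs={"table_b"})
print("read without an ancestor dependency:", sorted(missing))
```

Any table reported here is a candidate for the problem in this scenario: the node reads it, but nothing in the scheduling dependencies guarantees that it is ready first.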

Scenario 2: An output of the current node is removed

If the current scheduling dependencies no longer include an output that exists in the production environment, check whether removing that output will break downstream nodes that depend on it.

Example: Suppose output B is present in production but removed from the current dependencies, and a descendant node such as rpt_user_info_d depends on output B. Removing the output can isolate rpt_user_info_d from the scheduling pipeline, preventing it from running as expected or causing errors when it tries to fetch data.
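Before accepting a removed output, it can help to list which descendant nodes consume it. The following sketch assumes you already have a mapping from each descendant node to the outputs it depends on; the mapping, the output names, and the function `affected_descendants` are hypothetical, and only direct descendants are considered.

```python
def affected_descendants(removed_outputs, consumers):
    """Map each descendant node to the removed outputs it depends on.

    consumers: dict of descendant node name -> iterable of upstream outputs.
    Nodes that use none of the removed outputs are left out of the result.
    """
    removed = set(removed_outputs)
    return {
        node: sorted(removed & set(inputs))
        for node, inputs in consumers.items()
        if removed & set(inputs)
    }


# Hypothetical dependency map for the nodes downstream of the current node.
consumers = {
    "rpt_user_info_d": ["output_b"],
    "other_report": ["output_c"],
}
impact = affected_descendants(["output_b"], consumers)
for node, outputs in impact.items():
    print(f"{node} would lose its dependency on {outputs}")
```

A non-empty result means that at least one descendant node would be cut off from the current node if the change is accepted.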

Important

If you accept the current scheduling dependencies and the node's output is removed, confirm the impact on all descendant nodes before proceeding. For details, see Impacts of removing or changing the output of a node.