This topic provides answers to some frequently asked questions (FAQ) about node dependencies.

After DataWorks automatically parses the input and output of a node, the node fails to be committed, and an error message indicates that the output name of the parent node does not exist. Why does this error occur?

Commitment failed
Possible causes are as follows:
  • The parent node is not committed. Commit the parent node and try again.
  • The parent node is committed, but workshop_yanshi.tb_2 is not an output name of the parent node.
Note Generally, the input and output names of a node are automatically parsed from its SQL code: a table name that follows the FROM keyword is parsed as an input, and a table name that is specified in an INSERT or CREATE statement is parsed as an output. This applies to both the parent node and the current node. Make sure that your code follows the principles of automatic dependency parsing.
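To make the parsing rule concrete, the following is a minimal sketch of it, assuming a simplified regex-based approach. It is not the DataWorks parser: the function name parse_io is hypothetical, only the INSERT, CREATE, and FROM keywords mentioned above are handled (JOIN, subqueries, and comments are ignored), and the table workshop_yanshi.tb_3 is a made-up example.

```python
import re

# Simplified model of the rule: tables after INSERT INTO/OVERWRITE or
# CREATE TABLE are outputs; tables after FROM are inputs.
OUTPUT_PATTERNS = [
    re.compile(r"\bINSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?([\w.]+)", re.IGNORECASE),
    re.compile(r"\bCREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)", re.IGNORECASE),
]
INPUT_PATTERN = re.compile(r"\bFROM\s+([\w.]+)", re.IGNORECASE)


def parse_io(sql: str) -> tuple[set[str], set[str]]:
    """Return (inputs, outputs): table names found in the SQL text (hypothetical helper)."""
    outputs = {name for pattern in OUTPUT_PATTERNS for name in pattern.findall(sql)}
    inputs = set(INPUT_PATTERN.findall(sql))
    return inputs, outputs


sql = """
INSERT OVERWRITE TABLE workshop_yanshi.tb_3 PARTITION (ds='${bizdate}')
SELECT * FROM workshop_yanshi.tb_2;
"""
inputs, outputs = parse_io(sql)
print(sorted(inputs), sorted(outputs))  # ['workshop_yanshi.tb_2'] ['workshop_yanshi.tb_3']
```

Under this model, the current node can find a parent only if some committed node lists workshop_yanshi.tb_2 among its outputs, which is exactly what the error message complains about.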

In the output of the current node, the child node name and ID are empty and cannot be specified. Why does this happen?

If the current node does not have a child node, the child node name and ID are empty. After a child node is configured for the current node, these fields are automatically populated.

What is the output name of a node used for?

The output name of a node is used to establish dependencies between nodes. For example, if the output name of Node A is ABC and Node B uses ABC as its input name, a dependency is established between Nodes A and B.
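The matching described above can be modelled as a lookup from output name to producing node. The sketch below assumes a hypothetical in-memory Node type and resolve_parents helper; it only illustrates the idea and is not a DataWorks API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    outputs: set[str] = field(default_factory=set)  # output names of this node
    inputs: set[str] = field(default_factory=set)   # output names of its parent nodes


def resolve_parents(nodes: list[Node]) -> dict[str, set[str]]:
    """Map each node name to the names of the nodes it depends on."""
    producer = {out: n.name for n in nodes for out in n.outputs}
    return {n.name: {producer[i] for i in n.inputs if i in producer} for n in nodes}


a = Node("A", outputs={"ABC"})
b = Node("B", inputs={"ABC"})
print(resolve_parents([a, b]))  # {'A': set(), 'B': {'A'}}
```

Because outputs is a set, the same model also covers a node with several output names: using any one of them as an input creates the dependency.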

Can a node have multiple output names?

Yes, a node can have multiple output names. If a descendant node uses any one of these output names as its input, that is, as the output name of its parent node, a dependency is established between the descendant node and the current node.

Can multiple nodes have the same output name?

No, the output name of each node must be unique under your Alibaba Cloud account. If you want multiple nodes to export data to the same MaxCompute table, we recommend that you use Table name_Partition ID as the output name format of these nodes.
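A minimal sketch of the uniqueness rule and the recommended naming format, assuming a hypothetical OutputNameRegistry class and partitioned_output_name helper. The table name sales and partition IDs p1 and p2 are made-up examples.

```python
class OutputNameRegistry:
    """Hypothetical model: each output name may belong to only one node."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}

    def register(self, node: str, output_name: str) -> None:
        owner = self._owner.get(output_name)
        if owner is not None and owner != node:
            raise ValueError(f"output name {output_name!r} is already used by node {owner!r}")
        self._owner[output_name] = node


def partitioned_output_name(table: str, partition_id: str) -> str:
    # "Table name_Partition ID" format: one distinct name per writing node.
    return f"{table}_{partition_id}"


registry = OutputNameRegistry()
registry.register("node_1", partitioned_output_name("sales", "p1"))
registry.register("node_2", partitioned_output_name("sales", "p2"))  # distinct names: accepted
try:
    registry.register("node_3", "sales_p1")  # duplicate name: rejected
except ValueError as err:
    print(err)  # output name 'sales_p1' is already used by node 'node_1'
```

Both nodes still write to the same table; only their output names differ, so descendants can depend on each writer separately.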

How do I prevent intermediate tables from being parsed as dependencies when DataWorks automatically parses node dependencies?

Right-click an intermediate table name in the SQL code and select Delete Input or Delete Output. Then, click Parse I/O to parse the input and output of the node.

How do I configure dependencies for the topmost node in a workflow?

You can set the node to depend on the root node of the current workspace.

When I search by output name for the parent node of Node A, why can I still find an output name of Node B that no longer exists?

DataWorks searches for the specified output name only among the output names of nodes that have been committed to the scheduling system. If you delete an output name of Node B after Node B is committed but do not commit Node B to the scheduling system again, the deleted output name can still be found.
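The behavior can be modelled as two separate copies of each node's output names: the draft you edit and the committed version that search reads. The Workspace class and the output name b_out below are hypothetical; this is a sketch of the idea, not how DataWorks stores nodes.

```python
class Workspace:
    """Hypothetical model of draft vs. committed output names."""

    def __init__(self) -> None:
        self.draft: dict[str, set[str]] = {}      # node -> output names being edited
        self.committed: dict[str, set[str]] = {}  # node -> output names in the scheduling system

    def commit(self, node: str) -> None:
        self.committed[node] = set(self.draft.get(node, set()))

    def search_output(self, output_name: str) -> list[str]:
        # Search only looks at committed output names, never at drafts.
        return [n for n, outs in self.committed.items() if output_name in outs]


ws = Workspace()
ws.draft["B"] = {"b_out"}
ws.commit("B")
ws.draft["B"].discard("b_out")    # deleted in the editor but not recommitted
print(ws.search_output("b_out"))  # ['B']
ws.commit("B")
print(ws.search_output("b_out"))  # []
```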

How do I enable Nodes A, B, and C to run in sequence once per hour?

Set the output of Node A as the input of Node B and the output of Node B as the input of Node C. Also, set Nodes A, B, and C to run once per hour.
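Within each hourly cycle, the resulting chain is run in dependency order. The sketch below derives that order from the input and output names using Python's graphlib.TopologicalSorter. The output names a_out, b_out, and c_out and the run_order helper are hypothetical, and the model ignores scheduling times.

```python
from graphlib import TopologicalSorter


def run_order(nodes: dict[str, dict[str, set[str]]]) -> list[str]:
    """Order nodes so that every node runs after the producers of its inputs."""
    producer = {out: name for name, io in nodes.items() for out in io["outputs"]}
    # graph maps each node to the set of nodes it must wait for.
    graph = {name: {producer[i] for i in io["inputs"]} for name, io in nodes.items()}
    return list(TopologicalSorter(graph).static_order())


# Each node's input is the previous node's output.
chain = {
    "A": {"inputs": set(), "outputs": {"a_out"}},
    "B": {"inputs": {"a_out"}, "outputs": {"b_out"}},
    "C": {"inputs": {"b_out"}, "outputs": {"c_out"}},
}
print(run_order(chain))  # ['A', 'B', 'C']
```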

What can I do if the parent node ID fails to be parsed based on an output table name?

This error does not indicate that the table does not exist. Instead, it indicates that the table is not set as the output of a specific node. Therefore, the table name cannot be used to find the node that generates the table data. In this case, the dependency on the node cannot be created.

According to the principles of automatic dependency parsing, a dependency is created only when the output of an ancestor node is used as the input of a descendant node. For example, if no ancestor node can be parsed based on the xc_demo_partition table that is referenced in SQL statements, no committed node uses the xc_demo_partition table as its output.

You can resolve this issue in the following way:
  1. Find the node that generates the table data and view the node output.

    If you do not know which node generates the table data, you can enter keywords to search the code of nodes in fuzzy match mode.

  2. If the table data is uploaded from a local server or you do not need to depend on the node, you can right-click the table name in the code and select Delete Input.
Note We recommend that you specify the correct lineage in the code of nodes to reduce custom dependencies.
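The check behind this error can be sketched as a set difference: any input table that no committed node declares as an output has no parent node ID. The unresolved_inputs helper, node_x, and xc_demo_other are hypothetical; only xc_demo_partition comes from the example above.

```python
def unresolved_inputs(inputs: set[str], committed_outputs: dict[str, set[str]]) -> set[str]:
    """Return input tables that no committed node declares as an output."""
    produced = set().union(*committed_outputs.values())
    return inputs - produced


committed = {"node_x": {"xc_demo_other"}}
missing = unresolved_inputs({"xc_demo_partition", "xc_demo_other"}, committed)
print(missing)  # {'xc_demo_partition'}
```

Each table left in the result needs one of the two fixes listed above: add it to the output of the node that generates it, or delete it as an input.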

Assume that Node A scheduled by day depends on Node B scheduled by hour. How do I enable Node A to run at 12:00 every day instead of after all 24 instances of Node B are run?

For Node B, select Cross-Cycle Dependencies and select Instances of Current Node from the Depend On drop-down list. For Node A, set Run At to 12:00 and clear Cross-Cycle Dependencies.

The scheduling system runs Node A after the instance of Node B generated for 12:00 is run.

How do I configure a node scheduled by day to depend on the data that a node scheduled by hour generated on the previous day?

For the node scheduled by day, select Cross-Cycle Dependencies, select Instances of Custom Nodes from the Depend On drop-down list, and then enter the ID of the node scheduled by hour.

How do I configure the current node to depend on the data generated by its instances in the last cycle if I do not know when the data in the last cycle is generated?

Select Cross-Cycle Dependencies and select Instances of Current Node from the Depend On drop-down list for the current node.

Assume that Node A scheduled by hour depends on Node B scheduled by day. By the time the instance of Node B finishes running, the scheduled times of multiple instances of Node A have already arrived. As a result, these instances of Node A run concurrently. How do I resolve this issue?

Select Cross-Cycle Dependencies and select Instances of Current Node from the Depend On drop-down list for Node A scheduled by hour.
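The effect of this setting can be sketched with a simple model: with Instances of Current Node selected, each hourly instance of Node A also waits for the previous hour's instance, so instances that became due together run one after another. The run_batches helper is hypothetical and assumes all listed instances are already due and the daily instance of Node B has finished.

```python
def run_batches(hours: list[int], self_dependent: bool) -> list[list[int]]:
    """Group due hourly instances into batches that start at the same time."""
    ordered = sorted(hours)
    if not self_dependent:
        # No cross-cycle dependency: every due instance starts at once.
        return [ordered] if ordered else []
    # Each instance waits for the previous hour's instance: one at a time.
    return [[h] for h in ordered]


print(run_batches([0, 1, 2], self_dependent=False))  # [[0, 1, 2]]
print(run_batches([0, 1, 2], self_dependent=True))   # [[0], [1], [2]]
```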