This topic describes what a merge node is, how to create a merge node, and how to define its merging logic. It also provides an example that shows the scheduling configuration and running details of a merge node.

Prerequisites

DataWorks Standard Edition or a more advanced edition is activated so that you can use merge nodes.

Background information

A merge node is a logical control node in DataStudio. It merges the running results of its parent nodes regardless of their running status. Its purpose is to allow nodes that depend on the output of the child nodes of a branch node to run as scheduled.

You cannot change the running status of a merge node. A merge node merges the running results of the child nodes of a branch node and then sets its own running status to Successful. Therefore, to ensure that a node that depends on the output of the child nodes of a branch node runs properly, configure that node to depend directly on a merge node.

For example, Branch node C has two mutually exclusive branches, C1 and C2, which use different logic to write data to the same MaxCompute table. Node B depends on the data in this table. To ensure that Node B runs as expected, use Merge node J to merge the running results of C1 and C2, and configure Merge node J as the parent node of Node B. If Node B instead depends directly on C1 and C2, one of the two branches is not run each time Branch node C runs, because only one branch meets the branch condition. As a result, Node B and its descendant nodes cannot be triggered as scheduled.
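The dependency problem in this example can be illustrated with a toy Python model. This is a sketch for explanation only, not DataWorks code or a DataWorks API: the node names C1, C2, J, and B come from the example, while `Status`, `run_branch_node`, `merge_node`, and `is_triggered` are hypothetical. The model also assumes, as a simplification of DataWorks scheduling, that an ordinary node is triggered only when every parent node is Successful, and that a branch that runs succeeds.

```python
from enum import Enum


class Status(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"
    NOT_RUN = "Branch Not Running"


def run_branch_node(condition_for_c1: bool) -> dict[str, Status]:
    """Hypothetical model of Branch node C: exactly one branch runs per cycle.

    Assumption: the branch that runs always succeeds.
    """
    if condition_for_c1:
        return {"C1": Status.SUCCESSFUL, "C2": Status.NOT_RUN}
    return {"C1": Status.NOT_RUN, "C2": Status.SUCCESSFUL}


def merge_node(parent_statuses: dict[str, Status]) -> Status:
    """Hypothetical model of Merge node J.

    The parent states are deliberately ignored when computing J's own
    status: J reports Successful whatever its parents' states are.
    """
    return Status.SUCCESSFUL


def is_triggered(parent_statuses: list[Status]) -> bool:
    """Assumed rule: an ordinary node runs only if every parent succeeded."""
    return all(s is Status.SUCCESSFUL for s in parent_statuses)


if __name__ == "__main__":
    for cond in (True, False):
        branches = run_branch_node(cond)
        # Node B depends directly on C1 and C2.
        direct = is_triggered(list(branches.values()))
        # Node B depends only on Merge node J.
        via_merge = is_triggered([merge_node(branches)])
        print(f"C1 condition={cond}: B direct={direct}, B via J={via_merge}")
```

Under these assumptions, Node B is never triggered when it depends directly on both branches, because one of them is always in the Branch Not Running state, but it is triggered in every cycle when it depends on Merge node J.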

Create a merge node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose General > MERGE Nodes.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
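The naming rule in step 3 can be checked before you enter a name, for example in a script that generates node names. The following is a minimal sketch, not part of DataWorks: the function `is_valid_node_name` is hypothetical, and it assumes that "letters" means ASCII letters only.

```python
import re

# Documented rule: 1 to 128 characters; letters, digits, underscores (_),
# and periods (.) only. Assumption: "letters" means ASCII A-Z and a-z.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the name satisfies the (assumed) node naming rule."""
    # fullmatch requires the whole string to match, not just a prefix.
    return NODE_NAME_PATTERN.fullmatch(name) is not None
```

For example, `merge_j.v1` passes this check, while `merge-node` and an empty string do not.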

Define the merging logic

After the merge node is created, its configuration tab appears. Specify the branches that the node merges: enter the output name or output table name of each parent node and click the Add icon. You can view the running status of each merged branch in the Result section. The possible states are Failed, Successful, and Branch Not Running.

On the node configuration tab, click Properties in the right-side navigation pane and configure the scheduling properties of the merge node.

Example of the merge node

You can associate child nodes with different outputs of a branch node to define the branches that run under different conditions. For example, in the workflow shown in the following figure, Branch_1 and Branch_2 are defined as child nodes of the branch node.
Branch_1 depends on the output named autotest.fenzhi121902_1.
Branch_2 depends on the output named autotest.fenzhi121902_2.

Run nodes

The condition of Branch_1 is met, so the child node of this branch is run. To view the running details of the child node, select the branch and open the Runtime Log tab.

The condition of Branch_2 is not met, so the child node of this branch is skipped. To view the related information, select the branch and open the Runtime Log tab.

Because the merge node sets its own status to Successful regardless of which branch ran, the child node of the merge node is run as expected.
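The routing in this example, where each child node depends on one named output of the branch node, can be sketched as a small Python model. This is an illustration only, not DataWorks code: the output names autotest.fenzhi121902_1 and autotest.fenzhi121902_2 and the nodes Branch_1 and Branch_2 come from the example, while the `flag` parameter, the branch conditions, and `run_example` are hypothetical. The sketch also assumes that a child node that runs always succeeds.

```python
from typing import Callable

# Output names come from the example; the conditions on "flag" are hypothetical.
BRANCH_OUTPUTS: list[tuple[str, Callable[[dict], bool]]] = [
    ("autotest.fenzhi121902_1", lambda ctx: ctx["flag"] == 1),
    ("autotest.fenzhi121902_2", lambda ctx: ctx["flag"] == 2),
]

# Each child node of the branch node depends on exactly one output.
CHILD_DEPENDS_ON = {
    "Branch_1": "autotest.fenzhi121902_1",
    "Branch_2": "autotest.fenzhi121902_2",
}


def run_example(ctx: dict) -> tuple[dict[str, str], str]:
    """Return the state of each merged branch and the merge node's status."""
    # Outputs whose branch condition is met in this run.
    active = {out for out, cond in BRANCH_OUTPUTS if cond(ctx)}
    # A child node runs only if the output it depends on is active;
    # otherwise it is skipped. Assumption: a child node that runs succeeds.
    branch_states = {
        child: "Successful" if out in active else "Branch Not Running"
        for child, out in CHILD_DEPENDS_ON.items()
    }
    # The merge node records each branch's state but reports Successful.
    merge_status = "Successful"
    return branch_states, merge_status


if __name__ == "__main__":
    states, status = run_example({"flag": 1})
    print(states, status)
```

With `flag` set to 1, Branch_1 is Successful, Branch_2 is Branch Not Running, and the merge node is still Successful, so its child node is triggered.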