DataWorks provides do-while nodes. In a do-while node, you can rearrange the internal workflow, write the logic that you want to run in a loop, and configure an end node that determines whether to exit the loop. You can also use a do-while node together with an assignment node to loop through the result set that the assignment node passes. This topic uses a simple example and a complex example to describe how to configure a do-while node.

Prerequisites

DataWorks Standard Edition or a more advanced edition is activated so that you can use a do-while node.

Background information

A do-while node in DataWorks is a special node that contains internal nodes. When you create a do-while node, three internal nodes are automatically created: the start node (loop start node), the sql node (loop task node), and the end node (loop end node). These internal nodes form an internal workflow that runs the task in a loop. You can replace or customize the sql node and use the built-in variables provided by the do-while node to write the code of the end node, which controls the number of times the loop is run. For more information about logic principles, see Logic principles. You can plan the internal workflow based on your business requirements. The following sections describe how to configure a do-while node.

Limits and usage notes

  • Support for do-while nodes
    • You can use do-while nodes only in DataWorks Standard Edition or a more advanced edition.
    • A do-while node can run a loop a maximum of 128 times. If the end node requires the loop to run more than 128 times, an error is returned.
  • Internal nodes
    • When you customize a do-while node, you can delete the dependencies between the internal nodes and rearrange the internal workflow of the do-while node. However, you must use the start node and the end node as the start and end nodes of the internal workflow of the do-while node.
    • If the internal workflow of a do-while node uses a branch node to perform logical judgments or traverse results, you must also use a merge node.
    • You cannot add comments when you develop the code of the end node of a do-while node.
  • Test and running
    • If the workspace is in standard mode, you cannot directly test and run a do-while node in DataStudio.

      To test the do-while node and view the result, you must commit the do-while node to Operation Center and run the do-while node in Operation Center. If you use the value passed by an assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.

    • When you view the operational logs of a do-while node in Operation Center, right-click the do-while node and select View Internal Nodes to view the operational logs of the internal nodes.
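
The 128-run limit can be checked locally before you commit a node. The following Python sketch is not DataWorks code. It emulates the loop under the assumption that the end node condition depends only on the loop count and that the error occurs when a 129th run would be needed. The names MAX_LOOPS and count_runs are hypothetical.

```python
MAX_LOOPS = 128  # limit stated in this topic


def count_runs(should_continue, max_loops=MAX_LOOPS):
    """Return how many times the loop body would run.

    should_continue(loop_times) plays the role of the end node:
    it returns True to run the loop again and False to exit.
    """
    loop_times = 1  # the body always runs once before the first check
    while should_continue(loop_times):
        if loop_times >= max_loops:
            raise RuntimeError(
                "end node requests more than %d runs" % max_loops
            )
        loop_times += 1
    return loop_times


print(count_runs(lambda n: n < 5))  # 5
try:
    count_runs(lambda n: n < 200)   # would need 200 runs
except RuntimeError as err:
    print("error:", err)
```

A condition such as n < 128 still passes this check because it stops after exactly 128 runs.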

Create a do-while node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. In the Data Analytics pane, move the pointer over the Create icon and choose General > do-while.
    Alternatively, you can click the workflow in which you want to create a do-while node, right-click General, and then choose Create > do-while.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
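
If you generate node names in a script, the naming rule in the preceding note can be checked in advance. The following Python sketch assumes that "letters" means ASCII letters, which this topic does not state. The names NODE_NAME_PATTERN and is_valid_node_name are hypothetical.

```python
import re

# Assumption: "letters" means ASCII letters. Adjust the character
# class if your workspace accepts other characters.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name):
    """Check a name against the rule stated in the preceding note."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_node_name("do_while_demo.v1"))  # True
print(is_valid_node_name("loop#1"))            # False (unsupported character)
print(is_valid_node_name("a" * 129))           # False (too long)
```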

Simple example of a do-while node

This section describes how to use a do-while node to repeat a loop five times and display the loop count each time the loop is run.

  1. Double-click the name of the do-while node. The configuration tab of the node appears.
    By default, the do-while node consists of the start, sql, and end nodes.
    • The start node marks the start of a loop and does not run any business logic.
    • DataWorks provides the sql node as a sample business processing node. You must replace the sql node with your own business processing node, such as a Shell node named Display loop count.
    • The end node marks the end of a loop and determines whether to start the loop again. In this example, the end node defines the condition for exiting the loop for the do-while node.
  2. Delete the sql node.
    1. Right-click the sql node in the middle of the do-while node and select Delete Node.
    2. In the Delete message, click OK.
  3. Create and configure a loop task node. In this example, create and configure a Shell node.
    1. Choose General > Shell and drag Shell to the canvas on the right.
    2. In the Create Node dialog box, enter a name in the Node Name field.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
    4. On the canvas of the do-while node, drag lines to configure the Shell node as the child node of the start node and the parent node of the end node.
    5. Double-click the Shell node. The configuration tab of the Shell node appears.
    6. Enter the following code in the code editor:
      echo ${dag.loopTimes}  # Display the loop count.
      • The ${dag.loopTimes} variable is a reserved variable of the system. This variable indicates the loop count, and the value increments from 1. All internal nodes of the do-while node can reference this variable. For more information about built-in variables, see Built-in variables and Examples of variable values.
      • After you modify the code of the Shell node, save the modification. No message reminds you to save the modification when you commit the node. If you do not save the modification, the committed node does not contain the latest code.
  4. Configure the end node to control the number of times the loop is run.
    1. Double-click the end node. The configuration tab of the node appears.
    2. Select Python from the Language drop-down list.
    3. Enter the following code to define the condition for exiting the loop for the do-while node:
      if ${dag.loopTimes}<5: 
       print True; 
      else: 
       print False;
      • The code compares the value of the ${dag.loopTimes} variable with 5 to limit the loop count. The value is 1 the first time the loop is run and increments by 1 each time: 2 for the second loop, and so on up to 5 for the fifth loop. In the fifth loop, ${dag.loopTimes}<5 evaluates to False, the end node prints False, and the do-while node exits the loop.
  5. On the node configuration tab, click the Properties tab on the right side to set the scheduling properties for the node. For more information, see Basic properties.
  6. Click the Save icon in the top toolbar.
  7. Commit the do-while node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Submit icon in the top toolbar.
    2. In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Description field.
    3. Click Commit.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the do-while node. For more information, see Deploy nodes.
  8. Test the node and view the result.
    Note If the workspace is in standard mode, you cannot directly test and run the do-while node in DataStudio.

    To test the do-while node and view the result, you must commit the do-while node to Operation Center and run the do-while node in Operation Center. If you use the value passed by the assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.

    1. On the node configuration tab, click Operation Center in the upper-right corner to go to Operation Center.
    2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
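    3. Select the do-while node. In the directed acyclic graph (DAG) on the right, right-click the do-while node and choose Run > Current and Descendent Nodes Retroactively. If the do-while node uses the value passed by an assignment node, right-click the assignment node instead so that both nodes are run.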
    4. Refresh the Patch Data page. After the retroactive instance is run, click DAG in the Actions column of the instance.
    5. Right-click the do-while node and select View Internal Nodes.
      The internal node view of the do-while node is divided into three panes:
      • The left pane of the view lists the rerun history of the do-while node. A record is generated each time a do-while node instance is run.
      • The middle pane of the view is a loop record list that shows all the existing loops of the do-while node and the status of each loop.
      • The right pane of the view shows the details about each loop. You can click a record in the loop record list to view the details of each instance in the loop.
    6. On the internal node page, click Loop 3 on the left, right-click the Shell node, and then select View Runtime Log.
    In the preceding example, the do-while node works in the following manner:
    1. Start the loop from the start node. The loop count is 1.
    2. Run the internal nodes in sequence based on the defined node dependencies.
    3. Run the conditional statement that is defined in the end node after the other internal nodes are run.
    4. If the conditional statement prints True in the logs of the end node, increment the loop count by 1 and start the loop again.
    5. If the conditional statement prints False in the logs of the end node, exit the loop.
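
The behavior of the simple example can be traced locally. The following Python sketch is an emulation, not code that runs in DataWorks: it reproduces what the Shell node echoes and what the end node prints in each loop. The function name run_simple_example and the "shell:" and "end:" prefixes are hypothetical and are used only to label the lines.

```python
def run_simple_example(limit=5):
    """Emulate the simple example: Shell node output plus end node result."""
    log = []
    loop_times = 1  # ${dag.loopTimes} starts from 1
    while True:
        log.append("shell: %d" % loop_times)  # echo ${dag.loopTimes}
        run_again = loop_times < limit        # end node condition
        log.append("end: %s" % run_again)     # print True / print False
        if not run_again:
            break                             # end node printed False
        loop_times += 1
    return log


for line in run_simple_example():
    print(line)
```

The trace contains five "shell" lines (1 to 5). The first four "end" lines are True, and the fifth is False, which ends the loop.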

Complex example of a do-while node

In addition to the preceding simple scenario, you may encounter complex scenarios where each data entry is processed in sequence by using a loop. Before you process data in such scenarios, make sure that the following conditions are met:
  • You have deployed a parent node that can export queried data to the child node. You can use an assignment node to meet this condition.
  • The do-while node can obtain the output of the parent assignment node. You can configure the node context and dependencies to meet this condition.
  • The internal nodes of the do-while node can reference each data entry. To support this, the node context is extended so that internal nodes can use the system variable ${dag.offset} together with the context of the do-while node to reference the entry of the current loop.
The following example shows how to configure a do-while node in a complex scenario. In this example:
  • The output of the assignment node is a two-dimensional array. The two-dimensional array is passed to the do-while node.
    Example value of the two-dimensional array:
    +---------------+----------------+-----------------+------------------------+
    | uid           | region         | age_range       | zodiac                 |
    +---------------+----------------+-----------------+------------------------+
    | 0016359810821 | Hubei Province | 30–40 years old | Cancer (constellation) |
    | 0016359814159 | Unknown        | 30–40 years old | Cancer (constellation) |
    +---------------+----------------+-----------------+------------------------+
  • The internal nodes of the do-while node use variables to obtain and print the loop parameters, offsets, and parameter values of the input of the parent assignment node.
  1. Create and configure an assignment node.
    Key points:
    • Value assignment code and context parameters: Select the language of the assignment node and write the value assignment code. The output of the assignment node is an output parameter that is generated based on specific rules.
      Note The output of the assignment node is used as the input of the do-while node.
    • Node dependencies: You can create an assignment node in the business process and drag a line to configure the assignment node as the parent node of the do-while node.
    For more information, see Configure an assignment node.
  2. Add the output of the assignment node as the input of the do-while node.
    On the configuration tab of the do-while node, click the Properties tab on the right side. In the Parameters section, click Create. Set the Parameter Name parameter to input and the Value Source parameter to the output parameter of the parent assignment node.
    Note The context relationship is the context parameter configuration between the assignment node and the do-while node, not between the internal nodes of the do-while node.
  3. Configure the internal loop task node of the do-while node.

    Double-click the name of the do-while node. The node configuration tab appears. Then, define the loop body.

    By default, the do-while node consists of three nodes: the start, sql, and end nodes. You must delete the sql node, create a Shell node, and then write the code of the Shell node to print the loop parameters. The following part describes the key points:
    • Node dependencies: After you delete the sql node and create a Shell node, you must drag lines to create dependencies between the internal nodes.
    • Loop task code: When you write the code of the internal Shell node, you can use built-in variables to print various loop parameters. For more information about the built-in variables available for the do-while node, see Built-in variables. The following part shows the sample code of the Shell node.
      echo '${dag.input}';
      echo 'Obtain the row data of the current loop:'${dag.input[${dag.offset}]};
      
      echo 'Obtain the offset:'${dag.offset};
      
      echo 'Obtain the number of times the loop is run:'${dag.loopTimes};
      
      echo 'Obtain the length of the dataset passed by the parent assignment node odpssql:'${dag.input.length};
      
      echo 'If you want to select data in a specific row and a specific column in the dataset passed by the assignment node, select the value based on a two-dimensional array:'${dag.input[0][1]};
  4. Define the loop exit condition for the end node.
    You can use the built-in variables supported by the do-while node for loop control. In this example, compare the values of the dag.loopTimes and dag.input.length variables. The dag.loopTimes variable specifies the number of times the loop is run, and the dag.input.length variable specifies the length of the value of the input parameter. If the value of the dag.loopTimes variable is less than that of the dag.input.length variable, the end node returns True and the loop continues. Otherwise, the end node returns False and the loop stops. The following code is used in this example:
    if ${dag.loopTimes}<${dag.input.length}:
        print True;
    else:
        print False;
  5. Run the do-while node and view the result.
    After you go to Operation Center, right-click the node and choose Run > Current and Descendent Nodes Retroactively. Select the assignment node and the do-while node. After the nodes are run, you can view the result in the logs.
    Note
    • If you use the value passed by the assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.
    • When you view the operational logs of the do-while node in Operation Center, right-click the do-while node and select View Internal Nodes to view the operational logs of the internal nodes.
    The logs contain the following results:
    • Output of the assignment node
    • Result that is returned when the end node is run for the first time
    • Result that is returned when the end node is run for the second time
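
To see which values the built-in variables take in each loop of the complex example, you can emulate the loop locally. The following Python sketch rests on several assumptions: the two-dimensional array contains only the two data rows of the example table (not the header row), the rows are modeled as Python lists, and the function name loop_values is hypothetical. The way DataWorks renders these values in the logs may differ.

```python
# The two data rows from the example table (header row excluded by assumption).
dag_input = [
    ["0016359810821", "Hubei Province", "30–40 years old", "Cancer (constellation)"],
    ["0016359814159", "Unknown", "30–40 years old", "Cancer (constellation)"],
]


def loop_values(rows):
    """Yield the values that the built-in variables take in each loop."""
    loop_times = 1
    while True:
        offset = loop_times - 1
        yield {
            "dag.loopTimes": loop_times,
            "dag.offset": offset,
            "dag.input[dag.offset]": rows[offset],
            "dag.input.length": len(rows),
            "dag.input[0][1]": rows[0][1],
        }
        if not loop_times < len(rows):  # end node prints False: exit
            break
        loop_times += 1


for values in loop_values(dag_input):
    print(values)
```

With two rows, the loop runs twice: the end node condition is True after the first loop (1 < 2) and False after the second loop (2 < 2).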

Summary

  • Comparison between a do-while node and the while, for-each, and do-while loop statements:
    • A do-while node contains a loop body that runs a loop before evaluation. This node functions the same way as the do-while statement. A do-while node can use the system variable ${dag.offset} and the node context to implement the feature of the for-each statement.
    • A do-while node cannot achieve the feature of the while statement because a do-while node runs a loop before evaluation.
  • Work procedure of a do-while node:
    1. Run nodes in the loop body in sequence by starting from the start node based on node dependencies.
    2. Run the code that is defined for the end node.
      • Run the loop again if the end node returns True.
      • Exit the loop if the end node returns False.
  • Method to use the node context: The internal nodes of a do-while node can use ${dag.Context parameter name} to reference a context parameter that is defined for the do-while node. For example, ${dag.input} references the context parameter named input.
  • System parameters: DataWorks provides two system variables for the internal nodes of the do-while node:
    • dag.loopTimes: the loop count, which starts from 1.
    • dag.offset: the offset of the current loop relative to the first loop, which starts from 0. The value is 1 less than the value of dag.loopTimes.
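
The difference between running the body before the check (do-while) and checking before the body (while) matters most when the input is empty. The following Python sketch compares the two patterns in plain Python. It does not call DataWorks, and the function names do_while_runs and while_runs are hypothetical.

```python
def do_while_runs(rows):
    """Body first, then the check (the pattern of a do-while node)."""
    runs = 0
    loop_times = 1
    while True:
        runs += 1                       # loop body
        if not loop_times < len(rows):  # end node check
            break
        loop_times += 1
    return runs


def while_runs(rows):
    """Check first, then the body (a plain while statement)."""
    runs = 0
    offset = 0
    while offset < len(rows):
        runs += 1
        offset += 1
    return runs


for rows in ([], ["a"], ["a", "b", "c"]):
    print(len(rows), do_while_runs(rows), while_runs(rows))
```

For an empty list, the do-while pattern still runs the body once, whereas the while pattern does not run it at all. If the parent assignment node can pass an empty result set, the internal nodes presumably need to handle that case themselves.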