This topic uses a simple example and a complex example to describe how to configure a do-while node.

Prerequisites

DataWorks Standard Edition or a more advanced edition is activated so that you can use a do-while node.

Background information

You can define mutually dependent nodes, including a loop decision node named end, in a do-while node. DataWorks repeatedly runs the nodes and exits the loop only when the end node returns False.
Do-while nodes support the MaxCompute SQL, Shell, and Python assignment languages. If you use MaxCompute SQL, you can use a CASE WHEN statement to evaluate whether the specified condition is met.
Note A do-while node can run its loop a maximum of 128 times. If the loop count exceeds this limit, an error occurs.
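
The loop semantics described above can be sketched in plain Python. This is only a local simulation, not DataWorks code: run_do_while, body, end, and MAX_LOOPS are hypothetical names, and the 128-loop limit is modeled as a raised exception on the assumption that it behaves like a hard error.

```python
MAX_LOOPS = 128  # loop limit stated in the note above


def run_do_while(body, end, max_loops=MAX_LOOPS):
    """Run body, then end, until end returns False (hypothetical simulation).

    body(loop_times) stands in for the business processing nodes.
    end(loop_times) stands in for the end node; True means "loop again".
    """
    loop_times = 1  # the loop count starts from 1
    while True:
        if loop_times > max_loops:
            raise RuntimeError("loop count exceeded %d" % max_loops)
        body(loop_times)          # the body always runs before the check
        if not end(loop_times):   # False from the end node exits the loop
            return loop_times
        loop_times += 1


seen = []
total = run_do_while(seen.append, lambda n: n < 3)
print(total, seen)
```

In this sketch, an end function that always returns True lets the body run 128 times and then raises the error.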

Create a do-while node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the Data Analytics tab, move the pointer over the Create icon and choose General > do-while.
    Alternatively, you can find the required workflow, right-click General, and then choose Create > do-while.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.

Simple example of a do-while node

This section describes how to use a do-while node to repeat a loop five times and display the loop count each time the loop is run.

  1. Double-click the name of the do-while node. The configuration tab of the node appears.
    By default, the do-while node consists of the start, sql, and end nodes.
    • The start node marks the startup of a loop and does not have any business effect.
    • DataWorks provides the sql node as a sample business processing node. You must delete the sql node and replace it with your own business processing node, for example, a Shell node named Display loop count.
    • The end node marks the end of a loop and determines whether to start the loop again. In this example, it defines the condition for exiting the loop for the do-while node.
  2. Delete the sql node.
    1. Right-click the sql node in the middle of the do-while node and select Delete Node.
    2. In the Delete message, click OK.
  3. Create and configure a Shell node.
    1. Choose General > Shell and drag Shell to the canvas on the right.
    2. In the Create Node dialog box, enter a name in the Node Name field.
      Notice The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
    4. On the canvas of the do-while node, drag lines to configure the Shell node as the child node of the start node and as the parent node of the end node.
    5. Double-click the Shell node. The configuration tab of the Shell node appears.
    6. Enter the following code in the code editor:
      echo ${dag.loopTimes}  # Display the loop count.
      Note After you modify the code of the Shell node, save the modification. DataWorks does not remind you to save the modification when you commit the node. If you do not save the modification, the committed node does not contain the latest version of the code.
  4. Configure the end node.
    1. Double-click the end node. The configuration tab of the node appears.
    2. Select Python from the Language drop-down list.
    3. Enter the following code to define the condition for exiting the loop for the do-while node:
      if ${dag.loopTimes}<5:
          print(True)
      else:
          print(False)

      The end node is an assignment node. It outputs only True or False, which indicates whether to start the loop again or exit the loop.

      The ${dag.loopTimes} variable is used in both the Display loop count node and the end node. It is a reserved variable of the system. This variable indicates the loop count and the value increments from 1. All internal nodes of the do-while node can reference this variable.

      In the code, the value of the ${dag.loopTimes} variable is compared with 5 to limit the loop count. The value is 1 when the loop is run for the first time and is incremented by 1 each time the loop runs again: 2 for the second time, and so on up to 5 for the fifth time. The condition ${dag.loopTimes}<5 is True for the first four loops and False for the fifth loop, so the do-while node exits the loop after the loop is run five times.

  5. On the node configuration tab, click the Properties tab on the right side to set the scheduling properties for the node. For more information, see Basic properties.
  6. Click the Save icon in the toolbar.
  7. Commit the do-while node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Commit icon in the toolbar.
    2. In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Description field.
    3. Click Commit.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the do-while node.
  8. Test the node and view the result.
    1. On the node configuration tab, click Operation Center in the upper-right corner to go to Operation Center.
    2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
    3. Select the do-while node. In the directed acyclic graph (DAG) on the right, right-click the do-while node and choose Run > Current and Descendent Nodes Retroactively.
    4. Refresh the Patch Data page. After the retroactive instance is run, click DAG of the instance.
    5. Right-click the do-while node and select View Internal Nodes.
      The view of the internal nodes of the do-while node is divided into three panes:
      • The left pane of the view lists the rerun history of the do-while node. A record is generated each time a do-while node instance is run.
      • The middle pane of the view is a loop record list that shows all the existing loops of the do-while node and the status of each loop.
      • The right pane of the view shows the details about each loop. You can click a record in the loop record list to view the running details of each instance in the loop.
    6. On the internal node page, click Loop 3 on the left, right-click the Shell node, and then select View Runtime Log.
      On the operational log page, view the logs of the end node when the loop is run for the third time.

      View the logs of the end node when the loop is run for the fifth time.

      The do-while node exits the loop after the loop is run for the fifth time.

    Based on the preceding example, the do-while node works in the following manner:
    1. Run from the start node.
    2. Run nodes in sequence based on the defined node dependencies.
    3. After the other nodes are run, run the conditional statement of the end node, which defines the condition for exiting the loop.
    4. If the conditional statement returns True in the logs of the end node, increase the loop count by 1 and start the loop again.
    5. If the conditional statement returns False in the logs of the end node, exit the loop.
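
The steps above can be traced with a short Python sketch of this example. It is a local simulation only: end_node is a hypothetical stand-in for the code of the end node, and loop_times plays the role of the value that DataWorks substitutes for ${dag.loopTimes}.

```python
def end_node(loop_times):
    """Mirror the end node's check: loop again while the count is below 5."""
    return loop_times < 5


loop_times = 1        # the loop count starts from 1
shell_output = []     # what the Shell node would echo in each loop
while True:
    shell_output.append(loop_times)   # Shell node: echo the loop count
    if not end_node(loop_times):      # end node returned False: exit the loop
        break
    loop_times += 1                   # end node returned True: loop again

print(shell_output)
```

The body runs once for each loop count from 1 to 5, and the check fails only on the fifth loop.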

Complex example of a do-while node

In addition to the preceding simple scenario, you may encounter complex scenarios where each data entry is processed in sequence by using a loop. Before you process data in such scenarios, make sure that:
  • You have deployed a parent node that can export queried data to the child node. You can use an assignment node to meet this condition.
  • The do-while node can obtain the output of the parent node. You can configure the node context and dependencies to meet this condition.
  • The internal nodes of the do-while node can reference each data entry. To support this, the node context is extended so that internal nodes can use the system variable ${dag.offset} to index into the context of the do-while node.

This section describes how to use a do-while node to display the data entries 0 and 1 in the tb_dataset table in sequence by using loops.

  1. Create an assignment node and configure dependencies for the node.
    In this example, create an assignment node named Initialize dataset to create a test dataset and configure it as the parent node of a do-while node.
    1. Double-click the name of the workflow to which the do-while node belongs. The configuration tab of the workflow appears.
    2. Click Assignment Node and drag it into the canvas on the right.
    3. In the Create Node dialog box, enter a name in the Node Name field.
      Notice The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
    4. Click Commit.
    5. Drag a line to configure the assignment node as the parent node of the do-while node.
  2. Double-click the name of the do-while node. The node configuration tab appears. Then, define the loop body.
    By default, the do-while node consists of the start, sql, and end nodes. You must delete the sql node and create a Shell node.
    1. Right-click the sql node in the middle of the do-while node and select Delete Node.
    2. In the Delete message, click OK.
    3. Choose General > Shell and drag Shell to the canvas on the right.
    4. In the Create Node dialog box, enter a name in the Node Name field.
      Notice The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
    5. Click Commit.
    6. On the canvas of the do-while node, drag lines to configure the Shell node as the child node of the start node and as the parent node of the end node.
    7. Click Properties in the right-side navigation pane. In the Parameters section, click Create. Set the Parameter Name parameter to input and the Value Source parameter to the outputs parameter of the parent node named Initialize dataset. In other words, specify the output of the parent node as the input of the do-while node.
  3. Configure and save the Shell node.
    1. Double-click the Shell node. The configuration tab of the node appears. Enter the following code:
      echo ${dag.input[${dag.offset}]}
      Parameter description:
      • dag.offset: a reserved variable of DataWorks. This variable indicates the offset of the current loop, which is the loop count minus 1. For example, the offset is 0 when the loop is run for the first time, 1 for the second time, and 2 for the third time. The offset is n-1 when the loop is run for the nth time.
      • dag.input: the context that you configure for the do-while node. In the preceding steps, the input parameter is defined for the do-while node and the value of the input parameter is the output of the parent node named Initialize dataset.

        The internal nodes of the do-while node can use ${dag.${ctxKey}} to reference the context. In the preceding step, the context key is set to input. Therefore, you can use ${dag.input} to reference the context.

      • ${dag.input[${dag.offset}]}: The output of the Initialize dataset node is a table. DataWorks can obtain a data entry from the table based on an offset. The value of the ${dag.offset} variable increments from 0 each time the loop is run. Therefore, the data entries such as ${dag.input[0]} and ${dag.input[1]} are returned until all data entries in the dataset are returned.
    2. Click the Save icon in the toolbar.
  4. Define the exit condition for the end node.
    In this example, compare the values of the dag.loopTimes and dag.input.length variables. If the value of dag.loopTimes is less than dag.input.length, the end node returns True and the loop continues. Otherwise, the end node returns False and the loop stops.
    Note The ${dag.input.length} variable indicates the number of data entries in the array that is specified by the input parameter. It is automatically set by the system based on the context that you configure for the do-while node.
  5. Run the do-while node and view the result.
    The Initialize dataset node generates data entries 0 and 1.
    The end node returns two results:
    • When the end node is run for the first time, the loop count (1) is less than the number of data entries (2). Therefore, the end node returns True and the loop continues.
    • When the end node is run for the second time, the loop count (2) equals the number of data entries (2). Therefore, the end node returns False and the loop stops.
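
The way the offset, the context, and the exit condition work together can be sketched in Python. This is a local simulation, not DataWorks code: dag_input is a hypothetical stand-in for the ${dag.input} context, and its two entries are taken from the example above. In the real end node, the comparison would be written against ${dag.loopTimes} and ${dag.input.length}, which DataWorks substitutes before the code runs.

```python
dag_input = ["0", "1"]   # assumed output of the parent assignment node

echoed = []              # what the Shell node would echo in each loop
end_results = []         # what the end node would return in each loop
loop_times = 1           # dag.loopTimes starts from 1
while True:
    offset = loop_times - 1                 # dag.offset is loopTimes - 1
    echoed.append(dag_input[offset])        # echo ${dag.input[${dag.offset}]}
    again = loop_times < len(dag_input)     # loopTimes < dag.input.length
    end_results.append(again)
    if not again:                           # False: stop the loop
        break
    loop_times += 1                         # True: run the next loop

print(echoed, end_results)
```

Because the body runs before the check, this sketch would still attempt one loop on an empty dataset and fail with an IndexError.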

Summary

  • Comparison between a do-while node and the while, foreach, and do...while loop statements:
    • A do-while node runs its loop body before it evaluates the exit condition. This node functions the same way as the do...while statement. A do-while node can use the system variable ${dag.offset} and the node context to implement the feature of the foreach statement.
    • A do-while node cannot implement the feature of the while statement because the loop body always runs once before the condition is evaluated.
  • Work procedure of a do-while node:
    1. Run nodes in the loop body in sequence by starting from the start node based on node dependencies.
    2. Run the code that is defined for the end node.
      • Run the loop again if the end node returns True.
      • Exit the loop if the end node returns False.
  • Method to use the node context: The internal nodes of a do-while node can use ${dag.${ctxKey}} to reference the context that is defined for the do-while node, where ctxKey is the name of the context parameter.
  • System parameters: DataWorks provides two system variables for the internal nodes of the do-while node:
    • dag.loopTimes: the loop count, which starts from 1.
    • dag.offset: the offset of the current loop, which is the loop count minus 1 and starts from 0.
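
The difference between evaluating the condition after the body and before the body can be illustrated in Python, which has no built-in do...while statement. The sketch below emulates one with while True and break. The function names are hypothetical and are not part of DataWorks.

```python
def do_while_runs(condition):
    """Count body runs when the condition is checked after the body."""
    runs = 0
    while True:
        runs += 1                 # the body runs first
        if not condition(runs):   # then the condition is evaluated
            break
    return runs


def while_runs(condition):
    """Count body runs when the condition is checked before the body."""
    runs = 0
    while condition(runs + 1):    # the condition is evaluated first
        runs += 1
    return runs


def never(loop_times):
    """A condition that is False from the start."""
    return False


print(do_while_runs(never), while_runs(never))
```

With a condition that is False from the start, the do...while form still runs the body once, whereas the while form does not run it at all.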