This topic describes how to create and configure a do-while node to be used in simple and complex scenarios.

Prerequisites

DataWorks Standard Edition or higher is activated.

Background information

You can define mutually dependent nodes, including a loop decision node named end, in a do-while node. DataWorks repeatedly runs the nodes and exits the loop only when the end node returns False.
Do-while nodes support the MaxCompute SQL, SHELL, and Python languages. If you use MaxCompute SQL, you can use a CASE WHEN statement to evaluate whether the specified condition for exiting the loop is met.
Note A loop can be repeated for a maximum of 128 times. If the loop count exceeds this limit, an error occurs.
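
The loop semantics described above can be sketched in plain Python. This is only an illustrative model, not DataWorks code: the function name run_do_while, its parameters, and the RuntimeError are hypothetical, and the way DataWorks actually reports the error when the limit is exceeded is not specified here.

```python
MAX_LOOPS = 128  # limit stated in the note above


def run_do_while(body, end_node, max_loops=MAX_LOOPS):
    """Model of a do-while node (hypothetical helper, not a DataWorks API).

    body(loop_times) stands for the business nodes inside the loop.
    end_node(loop_times) stands for the end node: True means "loop again",
    False means "exit". Returns the number of completed loop runs.
    """
    loop_times = 0
    while True:
        loop_times += 1
        body(loop_times)                  # the loop body always runs first
        if not end_node(loop_times):      # end node returned False: exit
            return loop_times
        if loop_times >= max_loops:       # another run would exceed the limit
            raise RuntimeError("loop count exceeded %d" % max_loops)


runs = run_do_while(body=lambda n: None, end_node=lambda n: n < 3)
print(runs)  # 3
```

In this model, an end node that never returns False raises the error right after the 128th run instead of looping indefinitely.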

Create a do-while node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the Data Development tab, move the pointer over the Create icon and choose Universal > do-while.
    Alternatively, you can click a workflow in the Business process section, right-click General, and then choose New > do-while.
  3. In the New node dialog box, set the Node name and Destination folder parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
  4. Click Commit.

Simple example

This section describes how to use a do-while node to repeat a loop five times and display the loop count each time the loop is run.

  1. Double-click the do-while node in the Business process section. The configuration tab of the node appears.
    By default, the do-while node consists of the start, sql, and end nodes.
    • The start node marks the startup of a loop and does not have any business effect.
    • DataWorks provides the sql node as a sample business processing node. You must replace the sql node with your own business processing node, for example, a Shell node named Display loop count.
    • The end node marks the end of a loop and determines whether to start the loop again. In this example, it defines the condition for exiting the loop for the do-while node.
  2. Delete the sql node.
    1. Right-click the sql node and select Delete node.
    2. In the Delete message, click Confirm.
  3. Create and configure a Shell node.
    1. On the configuration tab of the do-while node, drag Shell under Universal to the canvas on the right.
    2. In the New node dialog box, set the Node name parameter.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click Submit.
    4. On the configuration tab of the do-while node, drag directed lines to configure the Shell node as the child node of the start node and as the parent node of the end node.
    5. Double-click the Shell node. The configuration tab of the Shell node appears.
    6. Enter the following code in the code editor:
      echo ${dag.loopTimes}  # Display the loop count.
      Note After you modify the code of the Shell node, save the modification. DataWorks does not prompt you to save when you commit the node. If you do not save the modification, the committed node does not contain your latest code.
  4. Configure the end node.
    1. On the configuration tab of the do-while node, double-click the end node. The configuration tab of the end node appears.
    2. Select Python from the Please select an assignment language drop-down list.
    3. Enter the following code in the code editor to define the condition for exiting the loop:
      if ${dag.loopTimes}<5:
          print(True)
      else:
          print(False)
      The end node is an assignment node. It outputs only True or False, which indicate whether to start the loop again or exit the loop, respectively.

      The ${dag.loopTimes} variable is used in both the Display loop count node and the end node. It is a reserved variable of DataWorks. This variable indicates the loop count and the value increments from 1. All internal nodes of the do-while node can reference this variable.

      In the preceding code, the value of the ${dag.loopTimes} variable is compared with 5 to limit the loop count. The value is 1 when the loop is run for the first time and is incremented by 1 for each subsequent run: 2 for the second run, 3 for the third run, and so on. In the fifth run, the value is 5, so ${dag.loopTimes}<5 evaluates to False and the do-while node exits the loop.

  5. On the node configuration tab, click the Scheduling configuration tab in the right-side navigation pane. On the Scheduling configuration tab, set the scheduling properties for the node. For more information, see Basic properties.
  6. Click the Save icon in the toolbar.
  7. Commit the do-while node.
    Notice You must set the Rerun attribute and Dependent upstream node parameters on the Scheduling configuration tab before you can commit the node.
    1. Click the Submit icon in the toolbar.
    2. In the Submit dialog box, select the nodes to commit and enter your comments in the Remarks field.
    3. Click Submit.
    In a workspace in standard mode, you must click Publish in the upper-right corner after you commit the do-while node.
  8. Test the do-while node and view the running result.
    1. On the configuration tab of the do-while node, click Operation & Maintenance (O & M) in the upper-right corner to go to Operation Center.
    2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
    3. On the page that appears, click the name of the do-while node that you created in the node list. In the directed acyclic graph (DAG) on the right, right-click the do-while node and choose Run > Current and Descendant Nodes Retroactively. In the Patch Data dialog box, select the do-while node and click OK.
    4. On the Patch Data page, wait until the retroactive data generation instance is run and click DAG in the ACTIONS column of the instance.
    5. In the DAG that appears, right-click the do-while node and select View Internal Nodes. The view of the internal loop body of the do-while node appears.
      The view is divided into three parts.
      • The left pane of the view lists the rerun history of the do-while node. A record is generated each time a do-while node instance is run.
      • The middle pane of the view shows a loop record list. A record is generated each time the loop of the do-while node is run. The running status of each loop also appears.
      • The right pane of the view shows the details about the do-while node each time the loop is run. You can click a record in the loop record list to view the running details.
    6. In the view, click Loop 3 in the middle pane, right-click the end node in the right pane, and then select View Runtime Log.
      On the page that appears, view the operational logs of the end node when the loop is run for the third time.

      View the operational logs of the end node when the loop is run for the fifth time.

      The do-while node exits the loop after the loop is run for the fifth time.

    Based on the preceding simple example, the do-while node works in the following way:
    1. Run from the start node.
    2. Run nodes in sequence based on the defined node dependencies.
    3. Define the condition for exiting the loop in the end node.
    4. Run the conditional statement of the end node after the loop is run for the first time.
    5. Record the loop count as 1 and start the loop again if the conditional statement returns True in the operational logs of the end node.
    6. Exit the loop if the conditional statement returns False in the operational logs of the end node.
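
The run of the simple example can be traced with a short Python sketch. It is a simulation under stated assumptions, not code that runs inside DataWorks: end_node mirrors the condition configured for the end node, and trace_simple_example is a hypothetical helper that plays the role of the scheduler.

```python
def end_node(loop_times):
    """Mirror of the end node condition: True = loop again, False = exit."""
    return loop_times < 5


def trace_simple_example():
    """Return (loop count, end node output) for every loop run."""
    trace = []
    loop_times = 1                      # ${dag.loopTimes} starts at 1
    while True:
        decision = end_node(loop_times)
        trace.append((loop_times, decision))
        if not decision:                # end node printed False: exit
            break
        loop_times += 1                 # next run sees the next loop count
    return trace


for loop_times, decision in trace_simple_example():
    print("loop %d: Shell node echoes %d, end node prints %s"
          % (loop_times, loop_times, decision))
```

The trace contains five entries: the end node prints True for loops 1 to 4 and False for loop 5, which matches the logs viewed for the third and fifth loops.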

Complex example

A do-while node can also be used in complex scenarios where each data entry is processed in sequence by using a loop. Before you process data in such scenarios, make sure that:
  • You have deployed a parent node that can export queried data to the do-while node. You can use an assignment node to meet this condition.
  • The do-while node can obtain the output of the parent node. You can configure the node context and dependencies to meet this condition.
  • The internal nodes of the do-while node can reference each data entry. In this example, the existing node context is enhanced and the system variable ${dag.offset} is used to reference the context of the do-while node.

This section describes how to use an existing do-while node to display the data entries in a table in sequence until all data entries in the table are displayed. Each time the loop is run, a data entry is displayed.

  1. Create an assignment node and configure dependencies for the node.
    In this example, create an assignment node named Initialize dataset and configure it as the parent node of a do-while node.
    1. Double-click the workflow to which the do-while node belongs. The configuration tab of the workflow appears.
    2. Click Assignment node in the left-side navigation tree.
    3. In the New node dialog box, set the Node name parameter.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    4. Click Submit.
    5. Drag a directed line to configure the assignment node as the parent node of the do-while node.
  2. Double-click the do-while node. The configuration tab of the node appears.
    By default, the do-while node consists of the start, sql, and end nodes. You must delete the sql node and create a Shell node.
    1. Right-click the sql node and select Delete node.
    2. In the Delete message, click Confirm.
    3. Drag Shell under Universal to the canvas on the right.
    4. In the New node dialog box, set the Node name parameter.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    5. Click Submit.
    6. Drag directed lines to configure the Shell node as the child node of the start node and as the parent node of the end node.
    7. Click Scheduling configuration in the right-side navigation pane. On the Scheduling configuration tab, click Add under Input parameters of this node in the Node context section. Set Parameter name to input and Value source to the outputs parameter of the parent node named Initialize dataset.
  3. Configure and save the Shell node.
    1. Double-click the Shell node. The configuration tab of the node appears. Enter the following code in the code editor:
      echo ${dag.input[${dag.offset}]}
      Parameter description:
      • ${dag.offset}: a reserved variable of DataWorks. This variable indicates the zero-based offset of the current loop run, which equals the loop count minus 1. For example, the offset is 0 when the loop is run for the first time and 1 for the second time.
      • ${dag.input}: the context that you configure for the do-while node. In the preceding steps, the input parameter is defined for the do-while node and the value of the input parameter is the output of the parent node named Initialize dataset.

        The internal nodes of the do-while node can use ${dag.${ctxKey}} to reference the context. In this example, ${ctxKey} is set to input. Therefore, you can use ${dag.input} to reference the context.

      • ${dag.input[${dag.offset}]}: the data obtained from the table generated by the Initialize dataset node. DataWorks can obtain a data entry from the table based on the specified offset. The value of the ${dag.offset} variable increments from 0. Therefore, the data entries such as ${dag.input[0]} and ${dag.input[1]} are returned until all data entries in the dataset are returned.
    2. Click the Save icon in the toolbar.
  4. Define the condition for exiting the loop for the end node.
    In this example, configure the end node to compare the values of the ${dag.loopTimes} and ${dag.input.length} variables. If the value of ${dag.loopTimes} is less than that of ${dag.input.length}, the end node returns True and the loop continues. Otherwise, the end node returns False and the loop stops.
    Note The system automatically sets the ${dag.input.length} variable to the number of data entries in the array specified by the input parameter based on the context configured for the do-while node.
  5. Run the do-while node and view the running result.
    The Initialize dataset node generates data entries 0 and 1.
    • Operational logs of the Shell node when the loop is run for the first time.
    • Operational logs of the Shell node when the loop is run for the second time.
    The end node is run twice before it exits the loop.
    • Runtime logs of the end node when the loop is run for the first time.
    • Runtime logs of the end node when the loop is run for the second time.

    The loop count is less than the number of data entries when the loop is run for the first time. Therefore, the end node returns True and the loop continues. The loop count equals the number of data entries when the loop is run for the second time. Therefore, the end node returns False and the loop stops.
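
The interplay of ${dag.offset}, ${dag.input}, and ${dag.input.length} in this example can be modeled in Python as follows. This is a sketch under assumptions: run_foreach_style_loop is a hypothetical helper, a Python list stands in for the output of the Initialize dataset node, and the comments show which DataWorks expression each line imitates.

```python
def run_foreach_style_loop(dataset):
    """Simulate the complex example: one data entry per loop run.

    Returns (entries echoed by the Shell node, outputs of the end node).
    """
    shell_output = []
    end_output = []
    loop_times = 0
    while True:
        loop_times += 1                    # ${dag.loopTimes}, starts at 1
        offset = loop_times - 1            # ${dag.offset}, starts at 0
        shell_output.append(dataset[offset])      # echo ${dag.input[${dag.offset}]}
        keep_looping = loop_times < len(dataset)  # ${dag.loopTimes} < ${dag.input.length}
        end_output.append(keep_looping)
        if not keep_looping:               # end node returned False: exit
            break
    return shell_output, end_output


entries, decisions = run_foreach_style_loop([0, 1])
print(entries)    # [0, 1]
print(decisions)  # [True, False]
```

Because the body runs before the condition is checked, this model fails on an empty dataset at the first lookup, so the sketch assumes the parent node returns at least one data entry.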

Summary

  • Compared with the while, foreach, and do...while statements, a do-while node has the following characteristics:
    • A do-while node contains a loop body that runs before the conditional statement is evaluated. This node functions the same way as the do...while statement. A do-while node can also use the system variable ${dag.offset} and the node context to emulate the foreach statement.
    • A do-while node cannot behave like the while statement because a do-while node always runs the loop body once before it evaluates the conditional statement.
  • A do-while node works in the following way:
    1. Run nodes in the loop body starting from the start node based on node dependencies.
    2. Run the code defined for the end node.
      • Run the loop again if the end node returns True.
      • Exit the loop if the end node returns False.
  • How to use the node context: The internal nodes of a do-while node can use ${dag.${ctxKey}} to reference the context defined for the do-while node.
  • System parameters: DataWorks provides the following system variables for the internal nodes of the do-while node:
    • ${dag.loopTimes}: the loop count, starting from 1.
    • ${dag.offset}: the zero-based offset of the current loop run (the loop count minus 1), starting from 0.
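
The difference between the while and do...while statements mentioned in this summary can be illustrated with a minimal Python comparison. Both functions are hypothetical helpers written for this illustration. They take a condition that receives the loop count and return how many times the body ran.

```python
def while_style(condition, body):
    """Check the condition before each run, so the body may never run."""
    runs = 0
    while condition(runs + 1):
        runs += 1
        body(runs)
    return runs


def do_while_style(condition, body):
    """Run the body first, then check the condition (do-while node behavior)."""
    runs = 0
    while True:
        runs += 1
        body(runs)
        if not condition(runs):
            break
    return runs


def never(loop_times):
    """A condition that is False from the start."""
    return False


def no_op(loop_times):
    """Placeholder for the loop body."""
    return None


print(while_style(never, no_op))     # 0
print(do_while_style(never, no_op))  # 1
```

With a condition that is False from the start, the while-style loop skips the body, whereas the do-while-style loop still runs it once.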