You can define mutually dependent nodes, including a loop decision node named end, in a do-while node. DataWorks repeatedly runs the nodes and exits the loop only when the end node returns False.

Note
  • You can only use do-while nodes in DataWorks Standard Edition or higher.
  • A loop can run a maximum of 128 times. If the loop count exceeds this limit, an error occurs.

The do-while node supports the ODPS SQL, Shell, and Python languages. If you use ODPS SQL, you can use a CASE WHEN statement to check whether the condition for exiting the loop is met.

Simple example

This section describes how to use a do-while node to run a loop five times and display the loop count in each iteration.
  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. On the Data Analytics tab that appears, move the pointer over the Create icon and choose General > do-while.

    Alternatively, you can click a workflow in the left-side navigation pane, right-click General, and then choose Create > do-while.

  3. In the Create Node dialog box that appears, set Node Name and Location, and then click Commit.
    Note The node name must be 1 to 128 characters in length.
  4. Double-click the created do-while node. On the node configuration tab that appears, define the loop body.
    By default, the do-while node consists of the start, SQL, and end nodes.
    • The start node marks the start of the loop and has no business effect.
    • DataWorks provides the SQL node as a sample business processing node. You must replace the SQL node with your own business processing node, for example, a Shell node named Display loop count.
    • The end node marks the end of a loop iteration and determines whether to start the loop again. In this example, it defines the condition for exiting the loop.

      The end node is an assignment node. It outputs only True or False: True starts the loop again, and False exits the loop.

      The ${dag.loopTimes} variable is used in both the Display loop count node and the end node. It is a reserved variable of DataWorks that indicates the loop count. Its value is 1 in the first iteration and increases by 1 in each subsequent iteration. The internal nodes of the do-while node can reference this variable directly.

      In the code shown in the preceding figure, the end node compares ${dag.loopTimes} with 5 to limit the loop count. ${dag.loopTimes} is 1 in the first iteration, 2 in the second iteration, and so on. In the fifth iteration its value is 5, so the conditional statement ${dag.loopTimes}<5 returns False and the do-while node exits the loop.

  5. Run the do-while node.

    You can configure the scheduling properties for the do-while node as needed and then commit it to Operation Center to run it.

    • do-while node: The do-while node appears as a whole node in Operation Center. To view the loop details about the do-while node, right-click the node in the directed acyclic graph (DAG) and select View Internal Nodes.
    • Internal loop body: This view is divided into three panes.
      • The left pane lists the rerun history of the do-while node. A record is generated each time a do-while node instance runs.
      • The middle pane shows the loop record list. A record is generated for each iteration of the loop, together with the running status of that iteration.
      • The right pane shows the running details of each iteration. Click a record in the loop record list to view its details.
  6. View the running result.

    View the internal loop body. In the loop record list, click the record for the third iteration. The runtime logs show that the loop count is 3.

    You can also view the runtime logs that the end node generates in the third and fifth iterations.

    As shown in the preceding figures, the conditional statement 3<5 returns True in the third iteration, and the conditional statement 5<5 returns False in the fifth iteration. Therefore, the do-while node exits the loop after the fifth iteration.
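The behavior described in steps 4 to 6 can be reproduced with a small local Python model. This is not DataWorks code and does not call any DataWorks API: `display_loop_count` and `end_node` are hypothetical stand-ins for the Display loop count node and the end node, `loop_times` plays the role of ${dag.loopTimes}, and `MAX_LOOPS` mirrors the documented 128-iteration limit.

```python
from typing import List, Tuple

MAX_LOOPS = 128  # documented maximum number of iterations


def display_loop_count(loop_times: int) -> str:
    # Hypothetical stand-in for the "Display loop count" node.
    return f"loop count: {loop_times}"


def end_node(loop_times: int) -> Tuple[str, bool]:
    # Hypothetical stand-in for the end node: evaluate ${dag.loopTimes}<5.
    return f"{loop_times}<5", loop_times < 5


def run_simple_example() -> Tuple[List[str], List[str]]:
    outputs: List[str] = []
    end_logs: List[str] = []
    loop_times = 1  # ${dag.loopTimes} starts at 1
    while loop_times <= MAX_LOOPS:
        outputs.append(display_loop_count(loop_times))
        statement, keep_going = end_node(loop_times)
        end_logs.append(f"{statement} returns {keep_going}")
        if not keep_going:  # False: exit the loop
            return outputs, end_logs
        loop_times += 1  # True: start the loop again
    raise RuntimeError(f"loop count exceeded {MAX_LOOPS}")


if __name__ == "__main__":
    outputs, end_logs = run_simple_example()
    for output, log in zip(outputs, end_logs):
        print(output, "|", log)
```

In this model the body runs five times, and the end node log reads `3<5 returns True` in the third iteration and `5<5 returns False` in the fifth, which matches the runtime logs described above.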

Based on the preceding simple example, the do-while node works in the following way:
  1. Run from the start node.
  2. Run nodes in sequence based on the defined node dependencies.
  3. Define the condition for exiting the loop in the end node.
  4. After each iteration, run the conditional statement of the end node. The loop count is 1 after the first iteration and increases by 1 after each subsequent iteration.
  5. Start the loop again if the conditional statement returns True in the runtime logs of the end node.
  6. Exit the loop if the conditional statement returns False in the runtime logs of the end node.
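The six steps above can be sketched as a generic local model. It is an illustration under stated assumptions, not the DataWorks scheduler: the loop body is a hypothetical mapping from each node name to the set of nodes it depends on, each business node is a Python callable that receives the loop count, and the end node is passed separately and runs after all other body nodes. The standard library's `graphlib.TopologicalSorter` (Python 3.9 and later) orders the body so that the start node runs first.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

MAX_LOOPS = 128  # documented maximum number of iterations


def run_do_while(
    deps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[int], None]],
    end_node: Callable[[int], bool],
) -> int:
    """Run the body in dependency order, then the end node, until it returns False.

    deps maps each body node to the set of nodes it depends on.
    Nodes without an entry in actions (such as start) have no business effect.
    Returns the loop count of the last iteration.
    """
    # static_order() yields every node after all of its dependencies.
    order: List[str] = list(TopologicalSorter(deps).static_order())
    loop_times = 1  # same role as ${dag.loopTimes}
    while True:
        for name in order:
            action = actions.get(name)
            if action is not None:
                action(loop_times)
        if not end_node(loop_times):  # False: exit the loop
            return loop_times
        loop_times += 1  # True: start the loop again
        if loop_times > MAX_LOOPS:
            raise RuntimeError(f"loop count exceeded {MAX_LOOPS}")


if __name__ == "__main__":
    log: List[str] = []
    body = {"start": set(), "display": {"start"}}  # hypothetical two-node body
    final = run_do_while(
        body,
        {"display": lambda n: log.append(f"loop count: {n}")},
        lambda n: n < 5,
    )
    print(final, log)
```

Ordering by dependencies rather than by list position reflects step 2: in this model, a node runs only after all of its upstream nodes in the same iteration have run.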

Complex example

In addition to simple scenarios, do-while nodes can also be used in complex scenarios where each row of data is processed in sequence by using a loop. Before processing data in such scenarios, make sure that:
  • You have deployed a parent node that can export queried data to the do-while node. You can use an assignment node to meet this condition.
  • The do-while node can obtain the output of the parent node. You can configure the node context and dependencies to meet this condition.
  • The internal nodes of the do-while node can reference each row of data. In this example, the existing node context is enhanced and the system variable ${dag.offset} is used to reference the context of the do-while node.
This section describes how to use a do-while node to display the two rows of the tb_dataset table, data 0 and data 1, one row in each iteration of the loop.
  1. On the Data Analytics tab, move the pointer over the Create icon and choose General > do-while.

    Alternatively, you can click a workflow in the left-side navigation pane, right-click General, and then choose Create > do-while.

  2. In the Create Node dialog box that appears, set the parameters and click Commit.
  3. Double-click the created do-while node. On the node configuration tab that appears, define the loop body.
    1. Create an assignment node named Initialize dataset and add it as the parent node of the do-while node. The parent node generates a test dataset.
    2. On the Properties tab of the do-while node, define an input parameter in the Parameters section. Set Parameter Name to input and Value Source to the output of the parent node.
    3. Write code for the business processing node named Print each data row.
      • ${dag.offset}: a reserved variable of DataWorks that indicates the offset of the current iteration, which equals the loop count minus 1. For example, the offset is 0 in the first iteration and 1 in the second iteration.
      • ${dag.input}: the context that you configure for the do-while node. In the preceding steps, the input parameter is defined for the do-while node and the value of the input parameter is the output of the parent node named Initialize dataset.

        The internal nodes of the do-while node can directly use ${dag.${ctxKey}} to reference the context, where ${ctxKey} is the name of a parameter defined in the context. In this example, ${ctxKey} is input, so you can use ${dag.input} to reference the context.

      • ${dag.input[${dag.offset}]}: the row of data at the current offset in the table generated by the Initialize dataset node. DataWorks obtains one row of data from the table based on the specified offset. Because ${dag.offset} starts at 0 and increases by 1 in each iteration, ${dag.input[0]}, ${dag.input[1]}, and so on are returned in turn until all data in the dataset has been returned.
    4. Define the condition for exiting the loop in the end node. As shown in the following figure, the end node compares ${dag.loopTimes} with ${dag.input.length}. If ${dag.loopTimes} is less than ${dag.input.length}, the end node returns True and the do-while node continues the loop. Otherwise, the end node returns False and the do-while node exits the loop.
      Note The system automatically sets the ${dag.input.length} variable to the number of rows in the array specified by the input parameter based on the context configured for the do-while node.
  4. Run the do-while node and view the running result.

    In the first iteration, the loop count (1) is less than the number of data rows (2), so the end node returns True and the loop continues. In the second iteration, the loop count (2) equals the number of data rows, so the end node returns False and the loop ends.
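The complex example can also be modeled locally. The following Python sketch is a hypothetical model, not DataWorks' implementation: `resolve` substitutes ${dag.loopTimes} and ${dag.offset} first and then ${dag.input.length} and ${dag.input[N]}, so that the nested reference ${dag.input[${dag.offset}]} becomes ${dag.input[0]}, ${dag.input[1]}, and so on. The context key is fixed to input, and the two-row dataset is made up for illustration.

```python
import re
from typing import List, Tuple

MAX_LOOPS = 128  # documented maximum number of iterations

# Hypothetical patterns for the variables used in the complex example.
SIMPLE_VAR = re.compile(r"\$\{dag\.(loopTimes|offset)\}")
INPUT_LENGTH = re.compile(r"\$\{dag\.input\.length\}")
INDEXED_INPUT = re.compile(r"\$\{dag\.input\[(\d+)\]\}")


def resolve(template: str, loop_times: int, rows: List[str]) -> str:
    """Substitute do-while variables in template (a model, not DataWorks' code)."""
    values = {"loopTimes": loop_times, "offset": loop_times - 1}
    # Inner variables first: ${dag.input[${dag.offset}]} -> ${dag.input[0]}, ...
    text = SIMPLE_VAR.sub(lambda m: str(values[m.group(1)]), template)
    text = INPUT_LENGTH.sub(str(len(rows)), text)
    # Then replace each ${dag.input[N]} with row N of the context.
    return INDEXED_INPUT.sub(lambda m: rows[int(m.group(1))], text)


def run_complex_example(rows: List[str]) -> Tuple[List[str], List[str]]:
    printed: List[str] = []   # output of "Print each data row"
    end_logs: List[str] = []  # resolved conditional statement of the end node
    loop_times = 1
    while loop_times <= MAX_LOOPS:
        printed.append(resolve("${dag.input[${dag.offset}]}", loop_times, rows))
        end_logs.append(resolve("${dag.loopTimes}<${dag.input.length}", loop_times, rows))
        if not loop_times < len(rows):  # end node returns False: exit the loop
            return printed, end_logs
        loop_times += 1
    raise RuntimeError(f"loop count exceeded {MAX_LOOPS}")


if __name__ == "__main__":
    dataset = ["first row", "second row"]  # hypothetical two-row dataset
    printed, end_logs = run_complex_example(dataset)
    print(printed)
    print(end_logs)
```

With the two hypothetical rows, the model prints each row once, and the end node's resolved statement is `1<2` in the first iteration and `2<2` in the second, which matches the result described in step 4.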

Summary

  • Compared with the while, foreach, and do...while statements, a do-while node has the following characteristics:
    • A do-while node contains a loop body that runs once before the conditional statement is evaluated, which is how the do...while statement works. A do-while node can also use the system variable ${dag.offset} and the node context to implement the function of the foreach statement.
    • A do-while node cannot implement the function of the while statement, because a while statement evaluates its condition before the loop body runs, whereas a do-while node always runs the loop body once before it evaluates the conditional statement.
  • A do-while node works in the following way:
    1. Run nodes in the loop body starting from the start node based on node dependencies.
    2. Run the code defined for the end node.
      • Run the loop again if the end node returns True.
      • Exit the loop if the end node returns False.
  • How to use the node context: The internal nodes of a do-while node can use ${dag.${ctxKey}} to reference the context defined for the do-while node.
  • System parameters: DataWorks provides the following system variables for the internal nodes of the do-while node:
    • ${dag.loopTimes}: the loop count, starting from 1.
    • ${dag.offset}: the loop count minus 1, starting from 0.
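Python has no do...while statement, so the following sketch uses plain Python loops to contrast the loop forms compared in this summary. The function names are illustrative only. `do_while` runs the body before testing the condition, `while_loop` tests first, and `foreach_with_offset` emulates foreach with a do-while loop and an offset, in the same way that the complex example uses ${dag.offset} and ${dag.input.length}.

```python
from typing import Callable, List


def do_while(body: Callable[[], None], should_continue: Callable[[], bool]) -> int:
    """do...while semantics: run the body first, then test. Returns the run count."""
    runs = 0
    while True:
        body()
        runs += 1
        if not should_continue():
            return runs


def while_loop(body: Callable[[], None], should_continue: Callable[[], bool]) -> int:
    """while semantics: test before every run. Returns the run count."""
    runs = 0
    while should_continue():
        body()
        runs += 1
    return runs


def foreach_with_offset(rows: List[str]) -> List[str]:
    """foreach emulated with do_while and an offset, like ${dag.offset}."""
    seen: List[str] = []
    offset = 0  # 0 in the first iteration, as with ${dag.offset}

    def body() -> None:
        seen.append(rows[offset])

    def should_continue() -> bool:
        nonlocal offset
        offset += 1  # now equals the loop count just completed
        return offset < len(rows)  # same test as loopTimes < input.length

    do_while(body, should_continue)
    return seen


if __name__ == "__main__":
    print(do_while(lambda: None, lambda: False))    # body runs once
    print(while_loop(lambda: None, lambda: False))  # body never runs
    print(foreach_with_offset(["a", "b", "c"]))
```

Because the body always runs once, `foreach_with_offset([])` raises IndexError in this model: the first iteration reads `rows[0]` before any condition is checked. This is the same reason that a do-while node cannot act as a while statement.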