This topic describes how to create a workflow, create nodes in the workflow, and configure the dependencies. After you create a workflow, you can use the DataStudio service to further compute and analyze data in the workspace.

Prerequisites

The table bank_data for storing business data and the table result_table for storing results are created in a workspace. Data is imported to the bank_data table. For more information, see Create tables and import data.

Background information

The DataStudio service in DataWorks allows you to configure node dependencies by dragging lines between nodes in a workflow. Within the workflow, you can process data and define how the processing steps depend on one another. You can create multiple workflows in a workspace. For more information, see Manage workflows.

Create a workflow

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Select the region where the required workspace resides, find the workspace, and click Data Analytics.
  4. On the Data Analytics tab, move the pointer over the Create icon and select Workflow.
  5. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice: The workflow name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
  6. Click Create.
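
The naming rule in the notice above can be checked before you type a name into the dialog box. The following is a minimal sketch, not part of DataWorks: the function name is_valid_name is hypothetical, and it assumes that "letters" means ASCII letters, which this topic does not state.

```python
import re

# Rule from the notice: 1 to 128 characters; letters, digits,
# underscores (_), and periods (.).
# Assumption: only ASCII letters are accepted.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the length and character rules."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("insert_data"))  # True
print(is_valid_name("bad name"))     # False (contains a space)
```

The same check applies to node names, which follow the same rule later in this topic.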

Create nodes and configure dependencies

In the workflow, create a zero load node named start and an ODPS SQL node named insert_data, and configure the insert_data node to depend on the start node.
Notice
  • A zero load node is a control node that is used to maintain and control its descendant nodes in a workflow. A zero load node does not affect data.
  • If other nodes depend on a zero load node and an administrator sets the zero load node to Failed, the pending descendant nodes cannot be triggered. During the O&M process, you can use a zero load node in this way to prevent errors in ancestor nodes from propagating further.
  • Typically, the root node of the workspace is used as the ancestor node of a zero load node in a workflow. The root node of a workspace is named in the Workspace name_root format.
  • DataWorks automatically creates an output name for a node in the Workspace name.Node name format. If a workspace contains two nodes with the same name, modify the output name of one of the nodes.
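
To illustrate why a failed zero load node holds back everything downstream, the following sketch walks a dependency map and lists the descendants that would stay pending. It is an illustration only and does not call any DataWorks API. The nodes start and insert_data come from this topic; export_data and the function name blocked_descendants are hypothetical.

```python
from collections import deque

# Direct children of each node. "start" and "insert_data" come from this
# tutorial; "export_data" is a hypothetical extra node for illustration.
CHILDREN = {
    "start": ["insert_data"],
    "insert_data": ["export_data"],
    "export_data": [],
}


def blocked_descendants(failed_node, children):
    """Return all descendants of failed_node in breadth-first order."""
    blocked = []
    queue = deque(children.get(failed_node, []))
    while queue:
        node = queue.popleft()
        if node not in blocked:
            blocked.append(node)
            queue.extend(children.get(node, []))
    return blocked


print(blocked_descendants("start", CHILDREN))
# ['insert_data', 'export_data']
```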

When you design a workflow, we recommend that you create a zero load node as the root node of the workflow to control the entire workflow. To design a workflow, perform the following steps:

  1. Double-click the name of a workflow to go to the configuration tab. Click Zero-Load Node and drag it to the canvas on the right.
  2. In the Create Node dialog box, set the Node Name parameter to start and click Commit.
    Notice: The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
  3. Use the same method to create an ODPS SQL node named insert_data.
  4. Drag a line to configure the start node as the parent node of the insert_data node.

Configure the parent node of the zero load node

In a workflow, a zero load node is often used to control the entire workflow and serves as the ancestor node of all nodes in the workflow.

Generally, a zero load node depends on the root node of the workspace.

  1. Double-click the name of the zero load node to go to the node configuration tab.
  2. Click Properties in the right-side navigation pane.
  3. In the Dependencies section, click Use Root Node to configure the root node of the workspace as the parent node of the zero load node.
  4. Click the Save icon in the toolbar.
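
After the root node is configured as the parent, the dependency chain is root node, then start, then insert_data. The following sketch models that chain and derives an order in which the nodes could run. It is a local illustration that uses the Python standard library, not a DataWorks API. The workspace name demo_ws and the function name run_order are hypothetical; the root node name follows the Workspace name_root format described earlier.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

WORKSPACE = "demo_ws"       # hypothetical workspace name
ROOT = f"{WORKSPACE}_root"  # root node name in the "Workspace name_root" format

# Each node maps to the set of parent nodes that it depends on.
PARENTS = {
    ROOT: set(),
    "start": {ROOT},
    "insert_data": {"start"},
}


def run_order(parents):
    """Return the nodes in an order that runs every parent before its children."""
    return list(TopologicalSorter(parents).static_order())


print(run_order(PARENTS))
# ['demo_ws_root', 'start', 'insert_data']
```

If the map contained a circular dependency, TopologicalSorter would raise graphlib.CycleError instead of returning an order.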

Edit and run the ODPS SQL node

This section uses SQL code in the ODPS SQL node insert_data to count single people who have a housing loan, grouped by education level, and to save the counts to the result_table table. Descendant nodes can then analyze and present the results.

  1. Go to the configuration tab of the ODPS SQL node and enter the following code.
    For more information about the syntax, see MaxCompute SQL overview.
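    INSERT OVERWRITE TABLE result_table  -- Overwrite the result_table table with the query result.
    SELECT education
        , COUNT(marital) AS num          -- Number of matching rows for each education level.
    FROM bank_data
    WHERE housing = 'yes'                -- Has a housing loan.
        AND marital = 'single'           -- Is single.
    GROUP BY education;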
  2. Right-click bank_data in the code and select Delete Input.
  3. Click the Save icon in the top toolbar to avoid losing your code.
  4. Click the Run icon.
    After the node is run, you can view the operational log and result in the lower part of the tab.
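
To see what the aggregation does before you run it on MaxCompute, you can try the same logic locally. The following sketch uses SQLite from the Python standard library as a stand-in, so it is only an approximation: SQLite does not support INSERT OVERWRITE, so the sketch clears result_table and then inserts. The three-column bank_data schema, the sample rows, and the function name count_single_home_loans are made up for illustration and are not the data set of this tutorial.

```python
import sqlite3


def count_single_home_loans(rows):
    """Run the tutorial's aggregation on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal schema: only the columns that the query references.
    conn.execute(
        "CREATE TABLE bank_data (education TEXT, marital TEXT, housing TEXT)"
    )
    conn.execute("CREATE TABLE result_table (education TEXT, num INTEGER)")
    conn.executemany("INSERT INTO bank_data VALUES (?, ?, ?)", rows)
    # SQLite has no INSERT OVERWRITE; clearing the table first approximates it.
    conn.execute("DELETE FROM result_table")
    conn.execute(
        """
        INSERT INTO result_table
        SELECT education, COUNT(marital) AS num
        FROM bank_data
        WHERE housing = 'yes' AND marital = 'single'
        GROUP BY education
        """
    )
    result = conn.execute(
        "SELECT education, num FROM result_table ORDER BY education"
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, not the tutorial's data set.
SAMPLE_ROWS = [
    ("university.degree", "single", "yes"),
    ("university.degree", "single", "yes"),
    ("high.school", "single", "yes"),
    ("high.school", "married", "yes"),
    ("basic.4y", "single", "no"),
]

print(count_single_home_loans(SAMPLE_ROWS))
# [('high.school', 1), ('university.degree', 2)]
```

Rows that are not single or have no housing loan are filtered out before grouping, which is why basic.4y does not appear in the result.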

Commit the workflow

  1. After you run and debug the ODPS SQL node insert_data, return to the configuration tab of the workflow.
  2. Click the Submit icon.
  3. In the Commit dialog box, select the nodes to be committed, enter your comments in the Change description field, and then select Ignore I/O Inconsistency Alerts.
  4. Click Commit.

What to do next

You have now learned how to create and commit a workflow. In the next tutorial, you will learn how to create a sync node to export data to different types of data stores. For more information, see Create a sync node.