This topic describes how to create a workflow, create nodes in the workflow, and configure the dependencies among the nodes. After the configuration is completed, you can use the Data Analytics feature to further compute and analyze data in the workspace.

Prerequisites

The bank_data table for storing business data and the result_table table for storing data analytics results are created in the workspace. Data is imported to the bank_data table. For more information, see Create tables and import data.

Background information

The Data Analytics feature of DataWorks allows you to drag and drop nodes in a workflow and configure dependencies among the nodes. You can then process data based on the workflow and the dependencies that it defines.

Currently, you can create various types of nodes. For example, you can create ODPS SQL nodes, ODPS Script nodes, ODPS Spark nodes, PyODPS nodes, ODPS MR nodes, Shell nodes, and zero load nodes.

Create a workflow

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and click Workflow.
  3. In the Create Workflow dialog box that appears, set Workflow Name and Description.
  4. Click Create.

Create nodes and configure dependencies among the nodes

This section describes how to create a zero load node named start and an ODPS SQL node named insert_data in the workflow, and configure the insert_data node to depend on the start node.
Notice
  • A zero load node is a control node used to maintain and control its descendant nodes in a workflow. A zero load node does not generate any data.
  • If other nodes depend on a zero load node and an administrator manually sets the zero load node to Failed, the pending descendant nodes cannot be triggered. During O&M, an administrator can disable a zero load node in this way to prevent errors in ancestor nodes from spreading to descendant nodes.
  • Typically, the root node of the workspace is used as the ancestor node of a zero load node in a workflow. The root node of a workspace is named in the format Workspace name_root, where Workspace name is the name of the workspace.

We recommend that you create a zero load node as the root node of a workflow to control the entire workflow.
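The following Python sketch illustrates why a zero load node at the top of a workflow controls everything below it. It is only a model of the behavior described in the notice, not DataWorks code: the EDGES dictionary, the blocked_descendants function, and the report node are hypothetical names introduced for illustration.

```python
from collections import deque

# Hypothetical workflow edges: ancestor -> list of direct descendants.
# "start" stands for the zero load node; "report" is an illustrative
# extra node that does not appear in this tutorial.
EDGES = {
    "start": ["insert_data"],
    "insert_data": ["report"],
    "report": [],
}

def blocked_descendants(edges, failed_node):
    """Return every node downstream of failed_node, in breadth-first order."""
    blocked = []
    seen = {failed_node}
    queue = deque([failed_node])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                blocked.append(child)
                queue.append(child)
    return blocked

# If the zero load node is set to Failed, nothing below it is triggered.
print(blocked_descendants(EDGES, "start"))  # ['insert_data', 'report']
```

Because every other node is a descendant of start in this model, marking start as Failed blocks the whole workflow, which is the control point that the recommendation relies on.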

  1. Double-click the name of the workflow to go to the dashboard of the workflow. Move the pointer over Zero-Load Node and drag it to the development panel on the right.
  2. In the Create Node dialog box that appears, set Node Name to start and click Commit.
  3. Repeat steps 1 and 2 to create an ODPS SQL node and name it insert_data.
  4. Draw a line to connect the nodes and set the start node as the ancestor node of the insert_data node.
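The dependency drawn in step 4 can be pictured as a small directed graph. The sketch below uses the Python standard library (graphlib, Python 3.9 or later) to show the run order that such a dependency implies. It is an illustration only and does not call any DataWorks API; the dependencies and run_order names are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each node to the set of nodes it depends on (its ancestors).
dependencies = {
    "start": set(),            # zero load node, no ancestor inside the workflow yet
    "insert_data": {"start"},  # ODPS SQL node depends on start
}

# static_order() yields nodes so that every ancestor comes before its descendants.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['start', 'insert_data']
```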

Configure the ancestor node of the zero load node

In a workflow, a zero load node is often used to control the entire workflow and serves as the ancestor node of all nodes in the workflow. Generally, the zero load node in a workflow depends on the root node of the workspace.

  1. Double-click the name of the zero load node. On the page that appears, click the Properties tab in the right-side navigation pane.
  2. In the Dependencies section, click Use Root Node and set the root node of the workspace as the ancestor node of the zero load node.
  3. Click the Save icon in the upper-left corner to save the configuration.
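After this configuration, the ancestor chain of insert_data ends at the workspace root node. The sketch below shows that chain in Python under two assumptions: the workspace name demo_workspace is made up, and each node has a single ancestor, which is a simplification of real workflows. The helper names root_node_name and ancestors are hypothetical.

```python
def root_node_name(workspace_name):
    """Build the root node name in the Workspace name_root format."""
    return f"{workspace_name}_root"

def ancestors(node, parent_of):
    """Walk from a node up to the top of its (single-parent) ancestor chain."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain

workspace = "demo_workspace"  # hypothetical workspace name
parent_of = {
    "insert_data": "start",               # set by drawing the line in the workflow
    "start": root_node_name(workspace),   # set by clicking Use Root Node
}

print(ancestors("insert_data", parent_of))  # ['start', 'demo_workspace_root']
```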

Edit code in the ODPS SQL node

This section provides a sample SQL statement for the ODPS SQL node insert_data. The statement counts the single people at each education level who have a housing loan and saves the counts to the result_table table. Descendant nodes of insert_data can then analyze and present the query result.

Enter the following SQL statement in the insert_data node. For more information about the syntax, see SQL summary.
INSERT OVERWRITE TABLE result_table  -- Insert data into the result_table table.
SELECT education
    , COUNT(marital) AS num
FROM bank_data
WHERE housing = 'yes'
    AND marital = 'single'
GROUP BY education;
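To see what the statement computes without a DataWorks workspace, you can try the same aggregation locally. The sketch below uses Python's built-in sqlite3 module with a few made-up rows; the sample values, the reduced three-column table layout, and the refresh_result_table helper are assumptions for illustration. SQLite has no INSERT OVERWRITE, so the sketch approximates it by deleting the existing rows before inserting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank_data (education TEXT, marital TEXT, housing TEXT)")
conn.execute("CREATE TABLE result_table (education TEXT, num INTEGER)")

# Made-up sample rows, not real data from the tutorial's bank_data table.
sample_rows = [
    ("university", "single", "yes"),
    ("university", "single", "yes"),
    ("high_school", "single", "yes"),
    ("high_school", "married", "yes"),  # filtered out: not single
    ("basic", "single", "no"),          # filtered out: no housing loan
]
conn.executemany("INSERT INTO bank_data VALUES (?, ?, ?)", sample_rows)

def refresh_result_table(conn):
    """Approximate INSERT OVERWRITE: clear result_table, then insert the counts."""
    with conn:  # commit both statements together
        conn.execute("DELETE FROM result_table")
        conn.execute(
            """
            INSERT INTO result_table
            SELECT education, COUNT(marital) AS num
            FROM bank_data
            WHERE housing = 'yes' AND marital = 'single'
            GROUP BY education
            """
        )

refresh_result_table(conn)
refresh_result_table(conn)  # a rerun replaces the rows instead of appending

result = conn.execute(
    "SELECT education, num FROM result_table ORDER BY education"
).fetchall()
print(result)  # [('high_school', 1), ('university', 2)]
```

Running the helper twice leaves the same two rows, which mirrors the overwrite behavior: each run of the node replaces the previous contents of result_table rather than adding to them.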

Run and debug the ODPS SQL node

  1. After you enter the SQL statement in the insert_data node, click the Save icon.
  2. Click the Run icon, and then view the operational logs and the result.

Commit the workflow

  1. After you run and debug the ODPS SQL node insert_data, go back to the workflow editing page and click the Commit icon.
  2. In the Commit dialog box that appears, select the nodes to be committed, set Description, and then select Ignore I/O Inconsistency Alerts.
  3. Click Commit.

What to do next

Now you have learned how to create and commit a workflow. You can proceed with the next tutorial. In the next tutorial, you will learn how to create a batch synchronization node to export data to different types of data stores. For more information, see Create a batch synchronization node.