This topic describes how to create a workflow, create nodes in the workflow, and configure
node dependencies. After you create a workflow, you can use the DataStudio service
to compute and analyze data in the workspace.
Prerequisites
The bank_data table for storing business data and the result_table table for storing
results are created in a workspace, and data is imported to the bank_data table. For
more information, see Create tables and import data.
Background information
The DataStudio service in DataWorks allows you to configure node dependencies by dragging
lines between nodes in a workflow. You can process data and configure node dependencies
based on the workflow. You can create multiple workflows in a workspace. For more
information, see Manage workflows.
Create a workflow
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region in which the workspace that you want
to manage resides. Find the workspace and click Data Development in the Actions column.
- On the DataStudio page, move the pointer over the icon and select Create Workflow.
- In the Create Workflow dialog box, specify Workflow Name and Description.
Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.).
- Click Create.
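The naming rule in the Notice above can be checked before you open the dialog box. The following Python snippet is a minimal sketch, not part of DataWorks: the function name `is_valid_workflow_name` is hypothetical, and the pattern assumes that "letters" and "digits" mean ASCII characters only.

```python
import re

# Assumed pattern for the stated rule: 1 to 128 characters drawn from
# letters, digits, underscores (_), and periods (.). Restricting letters and
# digits to ASCII is an assumption of this sketch, not a documented limit.
WORKFLOW_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_workflow_name(name: str) -> bool:
    """Return True if the whole name matches the assumed naming rule."""
    return WORKFLOW_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_workflow_name("bank_workflow.v1")` returns True, while an empty name or a name that contains a space is rejected.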
Create nodes and configure node dependencies
In the workflow, create a zero load node named start and an ODPS SQL node named
insert_data, and configure the insert_data node to depend on the start node.
Notice
- A zero load node is a control node that is used to maintain and control its descendant
nodes in a workflow. A zero load node does not generate data.
- If other nodes depend on a zero load node and the zero load node is set to Failed
by O&M personnel, the pending descendant nodes cannot run. During the O&M process,
a zero load node can be disabled to prevent incorrect data of ancestor nodes from
being obtained by their descendant nodes.
- In most cases, the root node of the workspace is used as the ancestor node of a zero
load node in a workflow. The root node of a workspace is named in the Workspace name_root
format.
- DataWorks automatically creates an output name for a node. The name is in the
Workspace name.Node name format. If a workspace contains two nodes with the same name,
rename one of them so that their output names do not conflict.
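The two naming formats in the list above (Workspace name_root for the root node and Workspace name.Node name for output names) can be illustrated with a short Python sketch. The workspace name `demo_ws` and all function names are hypothetical and are used only for illustration. DataWorks generates these names itself.

```python
from collections import Counter


def root_node_name(workspace: str) -> str:
    """Build the root node name in the 'Workspace name_root' format."""
    return f"{workspace}_root"


def output_name(workspace: str, node: str) -> str:
    """Build a node output name in the 'Workspace name.Node name' format."""
    return f"{workspace}.{node}"


def duplicate_output_names(workspace: str, node_names: list[str]) -> list[str]:
    """Return the output names that more than one node would produce."""
    counts = Counter(output_name(workspace, name) for name in node_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

With the hypothetical workspace `demo_ws`, `output_name("demo_ws", "insert_data")` yields `demo_ws.insert_data`, and `duplicate_output_names("demo_ws", ["start", "insert_data", "start"])` reports `demo_ws.start`, which signals that one of the two start nodes should be renamed.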
When you design a workflow, we recommend that you create a zero load node as the root
node of the workflow to control the entire workflow. To design a workflow, perform
the following steps:
- On the left side of the Scheduled Workflow page, double-click the name of the workflow
that you created below Business Flow. On the configuration tab that appears, choose
General > Zero-Load Node.
You can also drag Zero-Load Node to the canvas on the right side to go to the Create Node dialog box.
- In the Create Node dialog box, set the Node Name parameter to start and click Commit.
Notice The node name must be a maximum of 128 characters in length and can contain letters,
digits, underscores (_), and periods (.).
- Use the same method to create an ODPS SQL node named insert_data.
- Drag a line from the start node to the insert_data node to configure the start node as the ancestor node of the insert_data node.
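The dependency configured in the last step can be pictured as a small directed graph. The following Python sketch uses the standard-library `graphlib` module to derive the order in which the two nodes would run. It only illustrates the dependency relationship and does not reflect how DataWorks schedules nodes internally. The `run_order` helper is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Map each node to the set of ancestor nodes that it depends on.
dependencies = {
    "start": set(),
    "insert_data": {"start"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the nodes in an order where every ancestor precedes its descendants."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)  # ["start", "insert_data"]
```

If the dependencies formed a cycle, for example if start also depended on insert_data, `static_order` would raise `graphlib.CycleError`.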
Configure the ancestor node of the zero load node
In a workflow, a zero load node is used to control the entire workflow and serves
as the ancestor node of all nodes in the workflow.
In most cases, a zero load node depends on the root node of the workspace.
- Double-click the name of the zero load node to go to the node configuration tab.
- Click Properties in the right-side navigation pane.
- In the Dependencies section, click Add Root Node to configure the root node of the
workspace as the ancestor node of the zero load node.
- Save and commit the node.
Notice You must specify Rerun and Parent Nodes on the Properties tab before you commit the zero load node.
- Click the icon in the top toolbar to save the node.
- Click the icon in the top toolbar to commit the node.
- In the Commit Node dialog box, enter your comments in the Change description field.
- Click OK.
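After this configuration, the dependency chain runs from the workspace root node to the zero load node and then to the ODPS SQL node. The following Python sketch shows why the zero load node can control the whole workflow: every node reachable from it is a descendant that cannot run if the zero load node is set to Failed. The workspace name `demo_ws` and the `descendants` helper are hypothetical and are used only for illustration.

```python
def descendants(children: dict[str, list[str]], node: str) -> set[str]:
    """Collect every node that directly or indirectly depends on the given node."""
    seen: set[str] = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


# Map each node to its direct descendant nodes.
# "demo_ws_root" follows the 'Workspace name_root' format for a hypothetical workspace.
children = {
    "demo_ws_root": ["start"],
    "start": ["insert_data"],
    "insert_data": [],
}

# Nodes that would be left pending if the start node were set to Failed.
blocked = descendants(children, "start")  # {"insert_data"}
```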
Edit and run the ODPS SQL node
This section describes how to write SQL code in the ODPS SQL node insert_data to count
the single people at each education level who have mortgage loans, and how to save the
query result. Descendant nodes can use the query result for further analysis or presentation.
- Go to the configuration tab of the ODPS SQL node and enter the following code.
For more information about the syntax, see Overview of MaxCompute SQL.
INSERT OVERWRITE TABLE result_table -- Insert data into the result_table table.
SELECT education
, COUNT(marital) AS num
FROM bank_data
WHERE housing = 'yes'
AND marital = 'single'
GROUP BY education;
- Right-click bank_data in the code and select Delete Input.
The bank_data table is not generated by an auto-triggered node. For more information
about how to create a table and import data into the table, see Create tables and import
data. If the SELECT statement in the code of a node specifies a table that is not generated
by an auto-triggered node, right-click the name of the table and select Delete Input.
Alternatively, add a rule comment at the top of the code so that the system does not
automatically parse the dependency for that table.

Note Scheduling dependencies ensure that a node can obtain the table data generated by
its ancestor node that is scheduled to run. However, if the ancestor node of a node
is not scheduled to run, the system cannot monitor the generation of the latest table
data by the ancestor node. If a node uses a SELECT statement to query data of a table
that is not generated by an auto-triggered node, you must manually delete the dependency
of the node that is automatically generated by the SELECT statement.
- Click the icon in the top toolbar to save the code. This prevents code loss.
- Click the icon to run the node. After the node is run, you can view the operational log
and result in the lower part of the tab.
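To check the logic of the query in this section before you run it on MaxCompute, you can reproduce it locally. The following Python sketch uses the standard-library `sqlite3` module with an in-memory database. It rests on several assumptions: the three-column schema is a simplified stand-in for the real bank_data table, the sample rows are made up, and because SQLite has no INSERT OVERWRITE statement, the sketch emulates it by clearing result_table before the insert.

```python
import sqlite3


def count_single_mortgage_holders(rows):
    """Run a SQLite equivalent of the tutorial query on the given sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Simplified, assumed schemas; the real tables may have other columns.
        conn.execute(
            "CREATE TABLE bank_data (education TEXT, marital TEXT, housing TEXT)"
        )
        conn.execute("CREATE TABLE result_table (education TEXT, num INTEGER)")
        conn.executemany("INSERT INTO bank_data VALUES (?, ?, ?)", rows)

        # Emulate INSERT OVERWRITE: delete the old rows, then insert the new ones.
        conn.execute("DELETE FROM result_table")
        conn.execute(
            """
            INSERT INTO result_table
            SELECT education, COUNT(marital) AS num
            FROM bank_data
            WHERE housing = 'yes' AND marital = 'single'
            GROUP BY education
            """
        )
        return conn.execute(
            "SELECT education, num FROM result_table ORDER BY education"
        ).fetchall()
    finally:
        conn.close()


# Made-up sample data: (education, marital, housing).
sample_rows = [
    ("university.degree", "single", "yes"),
    ("university.degree", "single", "yes"),
    ("high.school", "single", "yes"),
    ("high.school", "married", "yes"),
    ("basic.9y", "single", "no"),
]

result = count_single_mortgage_holders(sample_rows)
# result == [("high.school", 1), ("university.degree", 2)]
```

Only rows that are both single and have a housing loan are counted, so the married high.school row and the basic.9y row without a loan do not appear in the result.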
Commit a workflow
- After you run and debug the ODPS SQL node named insert_data, return to the configuration tab of the workflow.
- Click the icon in the top toolbar to commit the workflow.
- In the Commit dialog box, select the node that you want to commit, enter your comments in the Change description field, and then select Ignore I/O Inconsistency Alerts.
- Click Submit.
After the workflow is committed, you can view the node status in the node list of the
workflow. If the icon is displayed to the left of the node name, the node is committed.
If the icon is not displayed, the node is not committed.
What to do next
You have learned how to create and commit a workflow. In the next tutorial, you create
a synchronization node to export data to different types of data sources. For more
information, see Create a sync node.