In DataWorks, a workflow organizes different types of nodes by business category so that you can develop code for each part of your business.

DataWorks provides a dashboard for each type of node in a workflow, as well as tools that help you optimize and manage the nodes in each workflow. This simplifies development and management.

Workflow structure

A workspace supports multiple types of computing engines and multiple workflows. A workflow is a collection of various types of nodes that are closely associated with each other. DataWorks automatically generates a directed acyclic graph (DAG) so that you can view the workflow. Supported node types include data integration nodes, data analytics nodes, tables, resources, functions, and algorithms.

Each type of node has its own folder, in which you can create subfolders. To facilitate management, we recommend that you create a maximum of four levels of subfolders. If you need more than four levels, the workflow is probably too complex. In that case, we recommend that you split it into two or more workflows and add them to one solution.
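The DAG that DataWorks generates determines the order in which nodes run: a node starts only after all of its upstream nodes finish. The following sketch illustrates this ordering in plain Python using the standard-library topological sorter. The node names and dependencies are made up for illustration; this is not DataWorks' scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each node maps to the set of nodes it depends on.
workflow = {
    "sync_orders": set(),               # data integration node, no upstream
    "clean_orders": {"sync_orders"},    # ODPS SQL node
    "daily_report": {"clean_orders"},   # ODPS SQL node
}

# static_order() yields the nodes so that every node appears
# after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['sync_orders', 'clean_orders', 'daily_report']
```

Because this example is a simple chain, only one execution order is possible; in a wider DAG, independent branches can run in parallel.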

Create a workflow

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and click Workflow.
  3. In the Create Workflow dialog box that appears, set the Workflow Name and Description parameters.
  4. Click Create.

Workflow nodes

Note We recommend that you create no more than 100 nodes in a workflow.
A workflow consists of nodes of the following types:
  • Data Integration

    Double-click Data Integration under a workflow to view all the data integration nodes created in the workflow. For more information, see Batch Sync node and Real-Time Sync node.

  • MaxCompute

    The MaxCompute computing engine allows you to create and view data analytics nodes such as ODPS SQL, SQL script template, ODPS Spark, PyODPS, ODPS Script, and ODPS MR nodes. You can also create and view MaxCompute tables, resources, and functions.

    • Data Analytics

      Right-click MaxCompute under a workflow and select Create to create a data analytics node of a specific type. For more information, see ODPS SQL node, SQL script template, ODPS Spark node, PyODPS node, ODPS Script node, and ODPS MR node.

    • Table

      Right-click MaxCompute under a workflow and choose Create > Table to create a table. You can also view all the tables created in the current MaxCompute computing engine. For more information, see Table.

    • Resource

      Right-click MaxCompute under a workflow, choose Create > Resource, and click a specific resource type to create a resource. You can also view all the resources created in the current MaxCompute computing engine. For more information, see Resource.

    • Function

      Right-click MaxCompute under a workflow and choose Create > Function to create a function. You can also view all the functions created in the current MaxCompute computing engine. For more information, see Function.

  • E-MapReduce

The E-MapReduce computing engine allows you to create data analytics nodes such as EMR Hive, EMR MR, EMR Spark SQL, and EMR Spark nodes. You can also create and view E-MapReduce resources.

    Note The EMR module is available on the DataStudio page only after you bind an E-MapReduce computing engine to the current workspace. For more information, see Configure a workspace.
    • Data Analytics

      Click EMR under a workflow. Right-click Data Analytics and select Create to create a data analytics node of a specific type. For more information, see EMR Hive node, EMR MR node, EMR Spark SQL node, and EMR Spark node.

    • Resource

      Click EMR under a workflow. Right-click Resource and select Create to create a resource of a specific type. You can also view all the resources created in the current E-MapReduce computing engine. For more information, see Resource.

  • Algorithm

    Right-click Algorithm under a workflow and choose Create > PAI Experiment to create a Machine Learning experiment node. You can also view all the Machine Learning experiment nodes created in the current workflow. For more information, see Machine Learning experiment node.

  • General

    Right-click General under a workflow and select Create to create a node of a specific type. For more information, see OSS Object Inspection node, for-each node, do-while node, MERGE node, Branch node, Assignment node, Shell node, Zero-load node, and Cross-tenant collaboration node.

    Note If you are using DataWorks Basic Edition, you can only create Cross-tenant collaboration nodes and OSS Object Inspection nodes. You must upgrade DataWorks to a higher edition to create other types of nodes. You can click Upgrade Now to upgrade the current DataWorks edition.
  • UserDefined

    Right-click UserDefined under a workflow and select Create to create a node of a specific type. For more information, see Hologres node, Data Lake Analytics node, AnalyticDB for MySQL node, and AnalyticDB for PostgreSQL node.
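The control-flow nodes listed under General (for-each, Branch, and Assignment) cooperate in a way that a plain-Python analogy can make concrete. The sketch below is illustrative only: the function names and the branching condition are invented, and this is not how DataWorks executes these nodes internally.

```python
# Plain-Python analogy for three General control-flow nodes (illustrative only).

def assignment_node():
    # An Assignment node produces a value that downstream nodes can consume,
    # here a hypothetical list of data partitions.
    return ["2024-01-01", "2024-01-02", "2024-01-03"]

def branch_node(partition):
    # A Branch node routes execution down one of several paths
    # based on a condition evaluated against its input.
    return "path_a" if partition.endswith("01-01") else "path_b"

def for_each_node(partitions):
    # A for-each node runs its inner logic once per element of the input list.
    return [branch_node(p) for p in partitions]

result = for_each_node(assignment_node())
print(result)  # ['path_a', 'path_b', 'path_b']
```

In an actual workflow, the Assignment node's output is passed to the for-each node through node context parameters rather than a function return value.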

View all workflows

On the Data Analytics page, right-click Business Flow and select All Workflows to view all workflows created in the current workspace.

Click a workflow. The dashboard of the workflow appears.

Dashboard for each node type

DataWorks provides a dashboard for each type of node in a workflow. On the dashboard, each node is presented by a card that offers operation and optimization suggestions, so that you can intelligently manage nodes.

For example, the card of each data analytics node provides two indicators to show whether baseline-based monitoring and event notification are enabled for the node. This allows you to understand the status of each node.

Double-click a folder in a workflow to open the dashboard for that node type.

Commit a workflow

  1. Go to the dashboard of a workflow and click the Commit icon in the toolbar.
  2. In the Commit dialog box that appears, select the nodes to be committed, set Description, and then select Ignore I/O Inconsistency Alerts.
  3. Click Commit.
Note If a node has been committed and its code has not changed since, you cannot select the node again. In this case, enter a description and click Commit. The changes are committed automatically.