You can use workflows to organize nodes by business type, so that you can develop code for each business type separately. This topic describes how to create, design, commit, and view a workflow, and how to modify or delete multiple nodes in a workflow at a time.

Background information

A workspace supports various types of compute engines and can contain multiple workflows. A workflow is a collection of multiple types of objects. The object types include Data Integration, Data Analytics, table, resource, function, and algorithm.

Each type of object corresponds to an independent folder. You can create subfolders in the folder. To facilitate the management of objects, we recommend that you create no more than four levels of subfolders. If you create more than four levels of subfolders, your workflow becomes excessively complex. In this case, we recommend that you split your workflow into two or more workflows and add the workflows to the same solution to improve work efficiency.
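
The four-level recommendation is easy to check mechanically. The following Python sketch is illustrative only. It assumes that folder paths are written as slash-separated strings that start with the object-type folder (for example, `Data Analytics/sales/daily`). The path format and the function names are hypothetical and are not part of DataWorks.

```python
MAX_SUBFOLDER_LEVELS = 4  # recommended upper bound stated in this topic


def subfolder_depth(path: str) -> int:
    """Count the subfolder levels below the object-type folder.

    The first segment (for example, "Data Analytics") is the object-type
    folder itself, so it is not counted as a subfolder.
    """
    segments = [segment for segment in path.split("/") if segment]
    return max(len(segments) - 1, 0)


def folders_too_deep(paths: list[str]) -> list[str]:
    """Return the paths that exceed the recommended subfolder depth."""
    return [path for path in paths if subfolder_depth(path) > MAX_SUBFOLDER_LEVELS]


if __name__ == "__main__":
    tree = [
        "Data Analytics/sales/daily",
        "Data Analytics/sales/daily/region/east/shop",
    ]
    for path in folders_too_deep(tree):
        print(f"Consider splitting the workflow: {path}")
```

In this example, only the second path is reported because it has five subfolder levels. A workflow that accumulates such paths is a candidate for splitting into several workflows within the same solution.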

Design the organization structure

Note
  • If no compute engines are available in your workspace, or no folders of a compute engine type appear in the directory tree, check whether the service for that compute engine type is activated and whether your workspace is associated with a compute engine instance of that type on the Workspace Management page. The folder of a compute engine type appears in the directory tree of a workflow only after a compute engine instance of that type is associated with the current workspace. For more information about how to associate a workspace with compute engine instances, see Configure a workspace.
  • If you cannot use specific features or cannot find the entry point for creating an object, your account may not have development permissions, or your DataWorks edition may not be the required one. On the User Management page, check whether you are using an Alibaba Cloud account with development permissions, or whether your RAM user is assigned the developer or workspace administrator role. Also check your DataWorks edition.

Create a workflow

In DataStudio, data development is implemented by using the components such as nodes in workflows. Before you create a node, create a workflow.

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and click Workflow.
  3. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Create.
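
If you generate workflow names in scripts, you can check them against the naming rule in the Notice above before you create the workflow. The following Python sketch encodes that rule as a regular expression. Restricting letters to ASCII is an assumption of this sketch, and the DataWorks console remains the authority on which names are accepted.

```python
import re

# Naming rule as stated in this topic: 1 to 128 characters consisting of
# letters, digits, underscores (_), and periods (.).
# Assumption: only ASCII letters are treated as letters here.
WORKFLOW_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_workflow_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented naming rule."""
    return WORKFLOW_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sales_daily.v2", "", "sales-daily", "x" * 129]:
        print(repr(candidate[:20]), is_valid_workflow_name(candidate))
```

`re.fullmatch` is used instead of `re.match` so that a name with a trailing invalid character, such as a hyphen or a newline, is rejected as a whole.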

Design a workflow

Code development is implemented in workflows. To develop code in a workflow, you can create a node under a folder of a compute engine type in the directory tree. Alternatively, double-click a workflow to open its configuration tab, drag components such as nodes of different compute engine types to the canvas, and connect the components to form a directed acyclic graph (DAG). When you design a workflow, take note of the following items:
  • We recommend that you create no more than 100 nodes in a workflow.
  • In the DAG, you can draw a line between two nodes to configure dependencies between the two nodes. You can also open the Properties panel on the configuration tab of a node and configure node dependencies in the panel. For more information, see Logic of same-cycle scheduling dependencies.
  • If you create a node in the directory tree of a workflow, the node dependencies can be configured based on the lineage in the code. For more information, see Logic of same-cycle scheduling dependencies.
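
To illustrate how lineage-based dependencies and the DAG requirement fit together, the following Python sketch describes each node only by the tables it reads and writes. It makes a node depend on any other node that writes a table it reads, then uses Kahn's topological sort to confirm that the resulting graph has no cycle. This is a simplified model, not the way DataWorks parses code lineage, and all node and table names are hypothetical.

```python
from collections import deque

MAX_NODES_PER_WORKFLOW = 100  # recommended upper bound stated in this topic

# Hypothetical node description: {"inputs": {...tables}, "outputs": {...tables}}
NodeIO = dict[str, set[str]]


def infer_dependencies(nodes: dict[str, NodeIO]) -> dict[str, set[str]]:
    """Make each node depend on every other node that writes a table it reads."""
    writers: dict[str, set[str]] = {}
    for name, io in nodes.items():
        for table in io["outputs"]:
            writers.setdefault(table, set()).add(name)
    deps: dict[str, set[str]] = {name: set() for name in nodes}
    for name, io in nodes.items():
        for table in io["inputs"]:
            deps[name] |= writers.get(table, set()) - {name}
    return deps


def topological_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a run order by using Kahn's algorithm; raise ValueError on a cycle."""
    indegree = {name: len(upstream) for name, upstream in deps.items()}
    downstream: dict[str, list[str]] = {name: [] for name in deps}
    for name, upstream in deps.items():
        for parent in upstream:
            downstream[parent].append(name)
    ready = deque(sorted(name for name, degree in indegree.items() if degree == 0))
    order: list[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in sorted(downstream[name]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependencies contain a cycle; a workflow must be a DAG")
    return order


def exceeds_node_limit(nodes: dict[str, NodeIO]) -> bool:
    """Return True if the workflow has more nodes than this topic recommends."""
    return len(nodes) > MAX_NODES_PER_WORKFLOW


if __name__ == "__main__":
    workflow = {
        "ods_sync": {"inputs": set(), "outputs": {"ods_orders"}},
        "dwd_clean": {"inputs": {"ods_orders"}, "outputs": {"dwd_orders"}},
        "ads_report": {"inputs": {"dwd_orders"}, "outputs": {"ads_daily"}},
    }
    print(topological_order(infer_dependencies(workflow)))
    print(exceeds_node_limit(workflow))
```

If two nodes each read a table that the other writes, `topological_order` raises an error. Such a graph cannot be scheduled as a DAG, so one of the dependencies must be removed.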

Design the business logic

DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data without the need to run complex commands on compute engines. You can also use the general nodes of DataWorks to design complex logic.

In a workflow, you can configure components such as data integration nodes and data analytics nodes.
  • You can configure data integration nodes including batch sync nodes and real-time sync nodes to synchronize data between databases.
  • You can configure data analytics nodes to cleanse data, and add the resources and create the functions that they require in a visualized mode.
Note
  • For more information about the supported types of nodes that encapsulate the capabilities of different compute engines and the supported features for development in DataWorks, see Overview.
  • For more information about how to configure scheduling dependencies, see Configure basic properties.

Commit a workflow

In a workspace in standard mode, you can only develop and test nodes in the development environment on the DataStudio page. To release code to the production environment, you can commit multiple nodes in the workflow at a time and then deploy them on the Deploy page.

  1. After you design a workflow, click the Submit icon in the toolbar.
  2. In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Change description field. Then, decide whether to select Ignore I/O Inconsistency Alerts based on your business requirements. If you do not select this option and the system detects that the input and output that you configured do not match those identified by code lineage analysis, an error message is displayed. For more information, see When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node. What do I do?
  3. Click Commit.
    Note If you have modified the code or properties of a node and committed the node on its configuration tab, you cannot select the node in the Commit dialog box. If you have modified the code or properties of a node but have not committed the node on its configuration tab, you can select the node in the Commit dialog box.
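
The selection rule in the preceding Note and the Ignore I/O Inconsistency Alerts option can be modeled as follows. This Python sketch is a hypothetical model of the documented behavior, not DataWorks code. `NodeState`, its fields, and both functions are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical snapshot of a node when the Commit dialog box opens."""

    name: str
    # True if the code or properties were modified but not yet committed
    # from the node's own configuration tab.
    pending_changes: bool
    # Inputs and outputs configured in the node properties.
    declared_io: set[str] = field(default_factory=set)
    # Inputs and outputs identified by code lineage analysis.
    lineage_io: set[str] = field(default_factory=set)


def selectable_nodes(nodes: list[NodeState]) -> list[str]:
    """Only nodes with uncommitted changes can be selected in the dialog box."""
    return [node.name for node in nodes if node.pending_changes]


def commit(nodes: list[NodeState], selected: list[str], ignore_io_alerts: bool = False) -> list[str]:
    """Commit the selected nodes, or raise if any of them fails a check."""
    by_name = {node.name: node for node in nodes}
    allowed = set(selectable_nodes(nodes))
    for name in selected:
        if name not in allowed:
            raise ValueError(f"{name} cannot be selected in the Commit dialog box")
        node = by_name[name]
        # Mirrors the error that appears when the alerts are not ignored.
        if not ignore_io_alerts and node.declared_io != node.lineage_io:
            raise ValueError(f"{name}: input and output do not match the code lineage")
    return list(selected)
```

Passing `ignore_io_alerts=True` skips the mismatch check, which corresponds to selecting Ignore I/O Inconsistency Alerts. Committing either all selected nodes or none is a simplification of this sketch.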

View all workflows

In the Scheduled Workflow pane, right-click Business Flow and select All Workflows to view all the workflows in the current workspace.
Click the card of a workflow. The configuration tab of the workflow appears.

Manage workflows by using the solution feature

You can include one or more workflows in a solution. Solutions have the following benefits:
  • A solution can contain multiple workflows.
  • A workflow can be added to multiple solutions.
  • Workspace members can collaboratively develop and manage all solutions in a workspace.
If you manage workflows by using solutions, you can perform the following operations:
  • Add a workflow to a solution.
  • To add multiple workflows to a solution at a time, you can right-click a solution, select Edit, and then modify the Workflows parameter in the Change Solution dialog box.
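
Because a solution can contain multiple workflows and a workflow can belong to multiple solutions, the relationship is many-to-many. The following Python sketch models it with two mirrored indexes. `SolutionRegistry` and its methods are hypothetical and only illustrate the rules listed above.

```python
class SolutionRegistry:
    """Hypothetical many-to-many model of solutions and workflows."""

    def __init__(self) -> None:
        self._workflows_by_solution: dict[str, set[str]] = {}
        self._solutions_by_workflow: dict[str, set[str]] = {}

    def add_workflow(self, solution: str, workflow: str) -> None:
        """Add one workflow to one solution; the workflow keeps its other solutions."""
        self._workflows_by_solution.setdefault(solution, set()).add(workflow)
        self._solutions_by_workflow.setdefault(workflow, set()).add(solution)

    def set_workflows(self, solution: str, workflows: list[str]) -> None:
        """Replace all workflows of a solution at once, like editing the Workflows parameter."""
        for workflow in self._workflows_by_solution.get(solution, set()):
            self._solutions_by_workflow[workflow].discard(solution)
        self._workflows_by_solution[solution] = set()
        for workflow in workflows:
            self.add_workflow(solution, workflow)

    def workflows(self, solution: str) -> list[str]:
        """List the workflows in a solution."""
        return sorted(self._workflows_by_solution.get(solution, set()))

    def solutions(self, workflow: str) -> list[str]:
        """List the solutions that contain a workflow."""
        return sorted(self._solutions_by_workflow.get(workflow, set()))
```

Keeping both indexes in sync lets you answer "which workflows are in this solution" and "which solutions use this workflow" without scanning every solution.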

Modify or delete multiple nodes at the same time

To modify or delete multiple nodes of the same type in the current workspace at a time, such as all batch sync nodes, use the filter parameters on the Node tab to find the nodes, and then modify or delete them. The filter parameters include Node type, Business processes, and Scheduling Resource Group.
Note You can modify only the owners and resource groups for scheduling of multiple nodes at a time.
  1. In the Scheduled Workflow pane, click the Node tab icon in the upper part to go to the Node tab of the Batch Operation-Data Development tab.
  2. Modify or delete nodes.
    1. Configure filter conditions such as the node name, node ID, node type, and workflow to find the nodes that you want to modify or delete.
    2. Select some or all nodes.
    3. Modify or delete the nodes.
      • To modify the selected nodes, click Modify responsible person or Modify scheduling Resource Group.

        If you set the Mandatory modification parameter to Yes in the dialog box that appears, you can modify all the selected nodes. If you set this parameter to No, you can modify only the nodes that are locked by you.

      • To delete the selected nodes, choose More > Delete in the lower part of the Node tab.

        If you set the Force delete parameter to Yes in the Delete node dialog box, you can delete all the selected nodes. If you set this parameter to No, you can delete only the nodes that are locked by you.
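
The filter-then-act flow and the effect of the Mandatory modification and Force delete parameters can be summarized in a short Python sketch. The `Node` record, its field names, and the node type strings are hypothetical. The sketch models only the rule described above: Yes acts on every selected node, and No acts only on the nodes locked by the current user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record as listed on the Node tab."""

    node_id: int
    name: str
    node_type: str
    workflow: str
    owner: str
    resource_group: str
    locked_by: str  # user who currently holds the node's edit lock


def find_nodes(nodes: list[Node], node_type: Optional[str] = None,
               workflow: Optional[str] = None, name_contains: Optional[str] = None) -> list[Node]:
    """Apply filter conditions; a condition left as None matches every node."""
    return [
        node for node in nodes
        if (node_type is None or node.node_type == node_type)
        and (workflow is None or node.workflow == workflow)
        and (name_contains is None or name_contains in node.name)
    ]


def batch_targets(selected: list[Node], current_user: str, force: bool) -> list[Node]:
    """force=True models setting Mandatory modification or Force delete to Yes."""
    if force:
        return list(selected)
    return [node for node in selected if node.locked_by == current_user]


def modify_owner(selected: list[Node], new_owner: str, current_user: str, force: bool = False) -> list[int]:
    """Change the owner of the affected nodes and return their IDs."""
    targets = batch_targets(selected, current_user, force)
    for node in targets:
        node.owner = new_owner
    return [node.node_id for node in targets]


def delete_nodes(all_nodes: list[Node], selected: list[Node], current_user: str, force: bool = False) -> list[Node]:
    """Return the nodes that remain after the affected nodes are deleted."""
    doomed = {node.node_id for node in batch_targets(selected, current_user, force)}
    return [node for node in all_nodes if node.node_id not in doomed]
```

With `force=False`, a node that another user has locked is silently skipped rather than modified or deleted, which is why setting the parameter to No can leave part of the selection unchanged.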

Export a common workflow for replication

You can use the node group feature to quickly group all nodes in a workflow as a node group and then reference the node group in a new workflow. For more information, see Node group.

Export multiple workflows from a DataWorks workspace at a time and import them to other workspaces of DataWorks or to other open source engines

If you want to export multiple workflows in a workspace from DataWorks at a time and import them to other workspaces of DataWorks or to other open source engines, you can use the Migration Assistant service of DataWorks. For more information, see Overview.