You can use a workflow to organize nodes by business type, so that you can develop code for each business type separately. This topic describes how to create, design, commit, and view a workflow and how to modify or delete multiple nodes in a workflow at a time.

Background information

A workspace supports various types of compute engines and can contain multiple workflows. A workflow is a collection of multiple types of objects. The object types include Data Integration, Data Analytics, table, resource, function, and algorithm.

Each type of object corresponds to an independent folder. You can create subfolders in the folder. To facilitate the management of objects, we recommend that you create no more than four levels of subfolders. If you create more than four levels of subfolders, your workflow becomes excessively complex. In this case, we recommend that you split your workflow into two or more workflows and add the workflows to the same solution to improve work efficiency.

Design the organizational structure

You can organize your business based on workspaces, solutions, and workflows. You can plan and group workspaces based on enterprise departments, business projects, and data warehouse layers.
Concept: Workspace
  • Description: You can specify administrators and members for each workspace based on your business requirements. The role settings of members and parameters for a compute engine instance are different among workspaces. For more information about workspace planning, see Plan workspaces.
  • Purpose: Workspaces are basic units for managing permissions in DataWorks. You can create workspaces based on the organizational structure of your company. You can use a workspace to manage development permissions and O&M permissions. Workspace members can collaborate to develop and manage the code for all nodes in a workspace.
Concept: Solution
  • Description: A solution is a group of workflows that are dedicated to a specific business goal. A workflow can be added to multiple solutions. After you develop a solution and add a workflow to the solution, other users can reference and modify the workflow in their solutions or workflows for collaborative development.
  • Purpose: You can use a solution for business integration.
Concept: Workflow
  • Description: A workflow is an abstract business entity that allows you to develop code based on your business requirements. Workflows and nodes in different workspaces are separately developed. Workflows can be displayed in a directory tree or in a panel. The display modes enable you to organize code from the business perspective and show the resource classification and business logic in a more efficient manner.
    • The directory tree allows you to organize your code by node type.
    • The panel shows the business logic in a workflow.
  • Purpose: A workflow is a basic unit for code development and resource management.
DataStudio works based on nodes in a workflow. You can create one or more nodes in a specific workflow in the panel. In each workflow, nodes are grouped by engine type. In the section of a specific engine, nodes are classified into data synchronization nodes, tables, resources, and functions. These components can be used to meet a specific business goal. Only the components that are used in a workflow are displayed in the workflow.
  • To use DataStudio, you must create a workflow.
  • If you change the code for a node in the production environment, you must modify node parameters on the DataStudio page. Then, commit and deploy the node.
Note
  • If no compute engine is available in your workspace or the compute engine that you want to use is not displayed in the directory tree, check whether the service corresponding to the compute engine type is activated and whether the compute engine is associated with your workspace on the Workspace Management page. Only the compute engines that are associated with the workspace are displayed in the directory tree. For more information about how to associate a compute engine with a workspace, see Configure a workspace.
  • If you cannot use specific features or cannot find an entry used to create an object, go to the User Management page to check whether you have developer permissions. You have developer permissions if you use an Alibaba Cloud account or you log on to the DataWorks console as a RAM user that is assigned the developer role or workspace administrator role. You can also check whether the DataWorks edition that you adopted meets the requirements.

Create a workflow

In DataStudio, data development is implemented by using components such as nodes in workflows. Before you create a node, you must create a workflow.

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click DataStudio in the Actions column.
  2. Move the pointer over the Create icon and click Workflow.
  3. In the Create Workflow dialog box, set the Workflow Name and Description parameters.
    Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Create.
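
The naming rule in step 3 can be checked before you enter a name in the dialog box. The following is a minimal Python sketch, not part of DataWorks: `is_valid_workflow_name` is a hypothetical helper, and it assumes that "letters" means ASCII letters only.

```python
import re

# Assumption: "letters" means ASCII letters. The pattern allows 1 to 128
# characters drawn from letters, digits, underscores, and periods.
WORKFLOW_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_workflow_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule described above."""
    return WORKFLOW_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_workflow_name("sales_report.v2"))  # True
print(is_valid_workflow_name("sales report"))     # False: contains a space
```

If the console accepts a wider set of letters than this sketch assumes, the character class would need to be widened accordingly.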

Design a workflow

Code development is implemented in workflows. To develop code in a workflow, you can create a node under the folder of a compute engine type in the directory tree. Alternatively, double-click the workflow and, on the workflow configuration tab, drag components such as nodes of different compute engine types to the canvas and connect them to form a directed acyclic graph (DAG).
When you design a workflow, take note of the following items:
  • We recommend that you create no more than 100 nodes in a workflow.
    Note If the total number of nodes in a workflow exceeds 1,000, the DAG of the workflow cannot be viewed.
  • In the DAG, you can draw a line between two nodes to configure dependencies between the two nodes. You can also open the Properties panel on the configuration tab of a node and configure node dependencies in the panel. For more information, see Logic of same-cycle scheduling dependencies.
  • If you create a node in the directory tree of a workflow, the node dependencies can be configured based on the lineage in the code. For more information, see Logic of same-cycle scheduling dependencies.
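
To make the DAG idea concrete, the following Python sketch models node dependencies as a mapping from each node to its upstream nodes and derives an order in which the nodes could run. It is an illustration only, not how DataWorks schedules nodes: `plan_workflow` and the node names are hypothetical, and the two thresholds simply restate the node counts mentioned above.

```python
import warnings
from graphlib import TopologicalSorter  # standard library, Python 3.9+

RECOMMENDED_MAX_NODES = 100  # recommended upper bound per workflow
DAG_VIEW_MAX_NODES = 1000    # beyond this, the DAG cannot be viewed


def plan_workflow(upstream: dict[str, set[str]]) -> list[str]:
    """Return node names in an order that respects the dependencies.

    `upstream` maps each node to the set of nodes it depends on.
    Raises graphlib.CycleError (a ValueError) if the graph has a cycle.
    """
    nodes = set(upstream)
    for parents in upstream.values():
        nodes |= parents
    if len(nodes) > DAG_VIEW_MAX_NODES:
        warnings.warn("more than 1,000 nodes: the DAG cannot be viewed")
    elif len(nodes) > RECOMMENDED_MAX_NODES:
        warnings.warn("more than the recommended 100 nodes in one workflow")
    return list(TopologicalSorter(upstream).static_order())


order = plan_workflow({
    "clean_orders": {"sync_orders"},   # clean_orders depends on sync_orders
    "daily_report": {"clean_orders"},  # daily_report depends on clean_orders
})
print(order)  # ['sync_orders', 'clean_orders', 'daily_report']
```

A dependency loop such as `{"a": {"b"}, "b": {"a"}}` raises an error in this sketch, which mirrors the requirement that the graph be acyclic.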

Design the business logic

DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data without the need to run complex commands on compute engines. You can also use the general nodes of DataWorks to design complex logic.

In a workflow, you can configure components such as data integration nodes and data analytics nodes.
  • You can configure data integration nodes including batch synchronization nodes and real-time synchronization nodes to synchronize data between databases.
  • You can configure data analytics nodes for data cleansing. You can also add required resources and create required functions in a visualized mode.
Note
  • For more information about the supported types of nodes that encapsulate the capabilities of different compute engines and the supported features for development in DataWorks, see Select a data development node.
  • For more information about how to configure scheduling dependencies, see Configure basic properties.

Commit a workflow

In a workspace in standard mode, the DataStudio page only allows you to develop and test nodes in the development environment. To commit the code to the production environment, you can commit multiple nodes in the workflow at a time and deploy them on the Deploy page.

  1. After you design a workflow, click the Submit icon in the toolbar.
  2. In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Change description field. Then, decide whether to select Ignore I/O Inconsistency Alerts based on your business requirements. If you do not select Ignore I/O Inconsistency Alerts, an error message is displayed when the system determines that the inputs and outputs that you configured do not match those identified in code lineage analysis. For more information, see When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node. What do I do?
  3. Click Commit.
    Note If you have modified the code or properties of a node and committed the node on its configuration tab, you cannot select the node in the Commit dialog box. If you have modified the code or properties of a node but have not committed the node on its configuration tab, you can select the node in the Commit dialog box.
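
The selection rule in the preceding note and the effect of Ignore I/O Inconsistency Alerts can be summarized in a small Python model. This is a sketch under stated assumptions, not the DataWorks implementation: the `Node` fields, `selectable`, and `commit` are hypothetical names, and inputs and outputs are merged into one set per node for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    modified: bool          # code or properties were changed
    committed_on_tab: bool  # already committed on its configuration tab
    declared_io: set[str] = field(default_factory=set)  # I/O that you configured
    lineage_io: set[str] = field(default_factory=set)   # I/O found by lineage analysis


def selectable(nodes: list[Node]) -> list[Node]:
    """Nodes with changes that were not yet committed on their own tab."""
    return [n for n in nodes if n.modified and not n.committed_on_tab]


def commit(nodes: list[Node], ignore_io_alerts: bool = False) -> list[str]:
    """Return the names of the committed nodes, or raise on an I/O mismatch."""
    for n in nodes:
        if not ignore_io_alerts and n.declared_io != n.lineage_io:
            raise ValueError(f"{n.name}: configured I/O does not match code lineage")
    return [n.name for n in nodes]


nodes = [
    Node("sync_orders", modified=True, committed_on_tab=True),
    Node("clean_orders", modified=True, committed_on_tab=False,
         declared_io={"ods_orders"}, lineage_io={"ods_orders", "dim_user"}),
]
print([n.name for n in selectable(nodes)])                  # ['clean_orders']
print(commit(selectable(nodes), ignore_io_alerts=True))     # ['clean_orders']
```

Without `ignore_io_alerts=True`, the second call would raise an error because the configured and lineage-derived sets differ.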

View all workflows

In the Scheduled Workflow pane, right-click Business Flow and select All Workflows to view all the workflows in the current workspace.
Click the card of a workflow. The configuration tab of the workflow appears.

Manage workflows by using the solution feature

You can include one or more workflows in a solution. Solutions have the following benefits:
  • A solution can contain multiple workflows.
  • A workflow can be added to multiple solutions.
  • Workspace members can collaboratively develop and manage all solutions in a workspace.
If you manage workflows by using solutions, you can perform the following operations:
  • Add a workflow to a solution.
  • Add multiple workflows to a solution at a time. To do so, right-click a solution, select Edit, and then modify the Workflows parameter in the Change Solution dialog box.
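
The relationship described above is many-to-many: a solution can contain multiple workflows, and a workflow can be added to multiple solutions. The following Python sketch shows one way to model that relationship. `SolutionIndex` and the solution and workflow names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class SolutionIndex:
    """Tracks which workflows belong to which solutions (many-to-many)."""

    def __init__(self) -> None:
        self._by_solution: dict[str, set[str]] = defaultdict(set)

    def add(self, solution: str, *workflows: str) -> None:
        """Add one or more workflows to a solution at a time."""
        self._by_solution[solution].update(workflows)

    def workflows_in(self, solution: str) -> set[str]:
        """Return a copy of the workflows in the given solution."""
        return set(self._by_solution.get(solution, set()))

    def solutions_containing(self, workflow: str) -> set[str]:
        """Return every solution that includes the given workflow."""
        return {s for s, wfs in self._by_solution.items() if workflow in wfs}


index = SolutionIndex()
index.add("retail", "orders", "inventory")  # several workflows at once
index.add("finance", "orders")              # the same workflow in a second solution
print(sorted(index.solutions_containing("orders")))  # ['finance', 'retail']
```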

Modify or delete multiple nodes at a time

If you want to modify or delete multiple nodes of the same type, such as all batch synchronization nodes, in the current workspace at a time, you can use the parameters on the Node tab to find the nodes and modify or delete the nodes. The parameters include Node type, Business processes, and Scheduling Resource Group.
Note You can modify only the owners and resource groups for scheduling of multiple nodes at a time.
  1. On the DataStudio page, click the Nodes icon in the upper-right corner of the Scheduled Workflow pane to go to the Node tab.
  2. Modify or delete nodes.
    1. Configure filter conditions such as the node name, node ID, node type, and workflow to find the nodes that you want to modify or delete.
    2. Select some or all of the nodes.
    3. Modify or delete the nodes.
      • To modify the selected nodes, click Modify responsible person or Modify scheduling Resource Group. You can modify only the owners and resource groups for scheduling of multiple nodes at a time.

        If you set the Mandatory modification parameter to Yes in the dialog box that appears, you can modify all the selected nodes. If you set this parameter to No, you can modify only the nodes that are locked by yourself.

      • To delete the selected nodes, choose More > Delete in the lower part of the Node tab.

        If you set the Force delete parameter to Yes in the Delete node dialog box, you can delete all the selected nodes. If you set this parameter to No, you can delete only the nodes that are locked by yourself.
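
The Mandatory modification and Force delete parameters follow the same rule: Yes applies the operation to every selected node, and No applies it only to the nodes that you have locked. The following Python sketch expresses that rule. It is an illustration under assumptions: the `Node` fields and `affected_nodes` are hypothetical names, not a DataWorks API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    node_type: str
    locked_by: str  # user who currently holds the lock on the node


def affected_nodes(selected: list[Node], current_user: str, force: bool) -> list[Node]:
    """Return the nodes that a batch modify or delete would apply to.

    force=True  corresponds to setting the parameter to Yes: all selected nodes.
    force=False corresponds to No: only nodes locked by the current user.
    """
    if force:
        return list(selected)
    return [n for n in selected if n.locked_by == current_user]


selected = [
    Node("sync_orders", "batch_sync", locked_by="alice"),
    Node("sync_users", "batch_sync", locked_by="bob"),
]
print([n.name for n in affected_nodes(selected, "alice", force=False)])  # ['sync_orders']
print([n.name for n in affected_nodes(selected, "alice", force=True)])   # ['sync_orders', 'sync_users']
```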

Export a common workflow for replication

You can use the node group feature to quickly group all nodes in a workflow as a node group and then reference the node group in a new workflow. For more information, see Create and reference a node group.

Export multiple workflows from a DataWorks workspace at a time and import them to other DataWorks workspaces or open source engines

If you want to export multiple workflows in a workspace from DataWorks at a time and import them to other DataWorks workspaces or open source engines, you can use the Migration Assistant service of DataWorks. For more information, see Overview.