You can use workflows to organize nodes by business type and develop code for each business type in one place. This topic describes how to create, design, commit, and view a workflow, and how to modify or delete multiple nodes in a workflow at a time.
A workspace supports various types of compute engines and can contain multiple workflows. A workflow is a collection of multiple types of objects. The object types include Data Integration, Data Analytics, table, resource, function, and algorithm.
Each type of object corresponds to an independent folder. You can create subfolders in the folder. To facilitate the management of objects, we recommend that you create no more than four levels of subfolders. If you create more than four levels of subfolders, your workflow becomes excessively complex. In this case, we recommend that you split your workflow into two or more workflows and add the workflows to the same solution to improve work efficiency.
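As a minimal sketch of the nesting guideline above, the following Python snippet checks a folder path against the recommended four-level limit. The paths and function names are illustrative only, not part of any DataWorks API.

```python
# Minimal sketch: check subfolder nesting depth against the recommended
# four-level limit. Paths and helper names are hypothetical examples.

def folder_depth(path: str) -> int:
    """Count subfolder levels under an object-type root folder."""
    return len([part for part in path.strip("/").split("/") if part])

def check_depth(path: str, max_levels: int = 4) -> bool:
    """Return True if the nesting stays within the recommended limit."""
    return folder_depth(path) <= max_levels

print(check_depth("sales/daily/reports"))  # three levels: within the limit
print(check_depth("a/b/c/d/e"))            # five levels: consider splitting the workflow
```

If a path fails this kind of check, the guideline above suggests splitting the workflow rather than nesting deeper.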
Design the organization structure
- If no compute engines are available in your workspace, or no folders of a compute engine type appear in the directory tree, check on the Workspace Management page whether the service for that compute engine type is activated and whether your workspace is associated with a compute engine instance of that type. The folder for a compute engine type appears in the directory tree of a workflow only after an instance of that type is associated with the current workspace. For more information about how to associate a workspace with compute engine instances, see Configure a workspace.
- If you cannot use specific features or cannot find the entry point for creating an object, your account may lack development permissions, or your DataWorks edition may not be the required edition. On the User Management page, check whether your account is an Alibaba Cloud account with development permissions or whether your RAM user is assigned the developer or workspace administrator role. Also check your DataWorks edition.
Create a workflow
In DataStudio, data development is implemented by using the components such as nodes in workflows. Before you create a node, create a workflow.
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
- Move the pointer over the icon and click Workflow.
- In the Create Workflow dialog box, set the Workflow Name and Description parameters.
  Notice: The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Create.
Design a workflow
- We recommend that you create no more than 100 nodes in a workflow.
- In the DAG, you can draw a line between two nodes to configure dependencies between the two nodes. You can also open the Properties panel on the configuration tab of a node and configure node dependencies in the panel. For more information, see Logic of same-cycle scheduling dependencies.
- If you create a node in the directory tree of a workflow, the node dependencies can be configured based on the lineage in the code. For more information, see Logic of same-cycle scheduling dependencies.
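The dependency rules above can be sketched in code: a workflow is a directed acyclic graph (DAG), and the lines you draw between nodes are its edges. The following hypothetical Python snippet validates such a graph against the two constraints stated above: no dependency cycles, and no more than the recommended 100 nodes. Node names are made up for illustration.

```python
# Sketch: model same-cycle dependencies as a DAG, reject cycles, and
# enforce the recommended 100-node limit. Node names are hypothetical.
from collections import defaultdict, deque

MAX_NODES = 100  # recommended upper bound per workflow

def validate_workflow(edges):
    """edges: list of (upstream, downstream) pairs drawn between nodes.
    Returns a valid execution order, or raises ValueError if the graph
    has a cycle or exceeds the recommended node count."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for up, down in edges:
        graph[up].append(down)
        indegree[down] += 1
        nodes.update((up, down))
    if len(nodes) > MAX_NODES:
        raise ValueError(f"workflow has {len(nodes)} nodes; keep it under {MAX_NODES}")
    # Kahn's algorithm: repeatedly schedule nodes with no unmet dependencies.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for downstream in graph[node]:
            indegree[downstream] -= 1
            if indegree[downstream] == 0:
                queue.append(downstream)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected; a workflow must be a DAG")
    return order

print(validate_workflow([("sync_orders", "clean_orders"),
                         ("clean_orders", "report_daily")]))
```

A cycle (for example, two nodes that each depend on the other) has no valid execution order, which is why the scheduler requires a DAG.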
Design the business logic
DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data without the need to run complex commands on compute engines. You can also use the general nodes of DataWorks to design complex logic.
- You can configure data integration nodes including batch sync nodes and real-time sync nodes to synchronize data between databases.
- You can configure data analytics nodes for data cleansing, and add the required resources and create the required functions in a visualized manner.
- For more information about the supported types of nodes that encapsulate the capabilities of different compute engines and the supported features for development in DataWorks, see Overview.
- For more information about how to configure scheduling dependencies, see Configure basic properties.
Commit a workflow
In a workspace in standard mode, you can use the DataStudio page only to develop and test nodes in the development environment. To deploy code to the production environment, commit multiple nodes in the workflow at a time and then deploy them on the Deploy page.
- After you design a workflow, click the icon in the toolbar.
- In the Commit dialog box, select the nodes that you want to commit, and enter your comments in the Change description field. Then, determine whether to select Ignore I/O Inconsistency Alerts based on your business requirements. If you do not select Ignore I/O Inconsistency Alerts, an error message is displayed when the system detects that the input and output that you configured do not match those identified by code lineage analysis. For more information, see When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node. What do I do?
- Click Commit.
  Note: If you have modified the code or properties of a node and have already committed the node on its configuration tab, you cannot select the node in the Commit dialog box. If you have modified the code or properties of a node but have not committed it on its configuration tab, you can select the node in the Commit dialog box.
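The I/O consistency check described above can be sketched as a set comparison: the inputs and outputs configured for a node are compared with those found by lineage analysis of its code. The following Python snippet is a hypothetical illustration; the table names and the shape of the check are assumptions, not the actual DataWorks implementation.

```python
# Hypothetical sketch of the commit-time I/O consistency check: compare
# the configured inputs/outputs of a node with those found by lineage
# analysis of its code. Table names are illustrative only.

def io_consistent(configured_inputs, configured_outputs,
                  lineage_inputs, lineage_outputs):
    """Return the mismatches such a check would flag (all empty if consistent)."""
    return {
        "missing_inputs": set(lineage_inputs) - set(configured_inputs),
        "extra_inputs": set(configured_inputs) - set(lineage_inputs),
        "missing_outputs": set(lineage_outputs) - set(configured_outputs),
        "extra_outputs": set(configured_outputs) - set(lineage_outputs),
    }

issues = io_consistent(
    configured_inputs={"ods_orders"},
    configured_outputs={"dwd_orders"},
    lineage_inputs={"ods_orders", "ods_users"},  # the code also reads ods_users
    lineage_outputs={"dwd_orders"},
)
print({k: v for k, v in issues.items() if v})
```

In this sketch, the node's code reads a table that is not declared as an input, which is the kind of mismatch that triggers the alert unless you select Ignore I/O Inconsistency Alerts.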
View all workflows
Manage workflows by using the solution feature.
- A solution can contain multiple workflows.
- A workflow can be added to multiple solutions.
- Workspace members can collaboratively develop and manage all solutions in a workspace.
- Add a workflow to a solution. To add multiple workflows to a solution at a time, right-click the solution, select Edit, and then modify the Workflows parameter in the Change Solution dialog box.
Modify or delete multiple nodes at the same time
- In the Scheduled Workflow pane, click the icon in the upper part to go to the Node tab of the Batch Operation-Data Development tab.
- Modify or delete nodes.
- Configure filter conditions such as the node name, node ID, node type, and workflow to find the nodes that you want to modify or delete.
- Select some or all nodes.
- Modify or delete the nodes.
- To modify the selected nodes, click Modify responsible person or Modify scheduling Resource Group. You can modify only the owners and the resource groups for scheduling of multiple nodes at a time.
If you set the Mandatory modification parameter to Yes in the dialog box that appears, you can modify all the selected nodes. If you set this parameter to No, you can modify only the nodes that are locked by you.
- To delete the selected nodes, choose the delete option in the lower part of the Node tab. If you set the Force delete parameter to Yes in the Delete node dialog box, you can delete all the selected nodes. If you set this parameter to No, you can delete only the nodes that are locked by you.
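The batch operation above amounts to filtering nodes by attributes and applying one change to every match. The following Python sketch models this with a hypothetical in-memory node list; the `Node` record, its field names, and the filter parameters are illustrative assumptions, not a real DataWorks API.

```python
# Hypothetical sketch of batch operations: filter nodes by workflow or node
# type, then change the owner or the resource group for scheduling in one
# pass. The Node records and field names are illustrative, not a real API.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Node:
    name: str
    node_type: str
    workflow: str
    owner: str
    resource_group: str

def batch_modify(nodes, *, workflow=None, node_type=None,
                 new_owner=None, new_resource_group=None):
    """Apply owner/resource-group changes to every node matching the filters."""
    result = []
    for node in nodes:
        matches = ((workflow is None or node.workflow == workflow) and
                   (node_type is None or node.node_type == node_type))
        if matches:
            if new_owner is not None:
                node = replace(node, owner=new_owner)
            if new_resource_group is not None:
                node = replace(node, resource_group=new_resource_group)
        result.append(node)
    return result

nodes = [
    Node("sync_orders", "Batch Sync", "wf_sales", "alice", "default"),
    Node("clean_orders", "ODPS SQL", "wf_sales", "alice", "default"),
    Node("report_hr", "ODPS SQL", "wf_hr", "bob", "default"),
]
updated = batch_modify(nodes, workflow="wf_sales", new_owner="carol")
print([(n.name, n.owner) for n in updated])
```

Only the two nodes in the filtered workflow change owner; nodes outside the filter are returned unchanged, which mirrors selecting some nodes on the Node tab before clicking Modify responsible person.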
Export a common workflow for replication
You can use the node group feature to quickly group all nodes in a workflow as a node group and then reference the node group in a new workflow. For more information, see Node group.
Export multiple workflows from a DataWorks workspace at a time and import them to other workspaces of DataWorks or to other open source engines
If you want to export multiple workflows from a DataWorks workspace at a time and import them into other DataWorks workspaces or into other open source engines, use the Migration Assistant service of DataWorks. For more information, see Overview.