You can use workflows to organize nodes by business type so that code is developed and maintained per business type. This topic describes how to create, design, commit, and view a workflow, and how to modify or delete multiple nodes in a workflow at a time.
A workspace supports various types of compute engines and can contain multiple workflows. A workflow is a collection of multiple types of objects. The object types include Data Integration, Data Analytics, table, resource, function, and algorithm.
Each type of object corresponds to an independent folder. You can create subfolders in the folder. To facilitate the management of objects, we recommend that you create no more than four levels of subfolders. If you create more than four levels of subfolders, your workflow becomes excessively complex. In this case, we recommend that you split your workflow into two or more workflows and add the workflows to the same solution to improve work efficiency.
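The four-level recommendation above can be checked mechanically. The following is a minimal sketch, not a DataWorks feature: the helper names and the slash-separated folder paths are hypothetical, and the first path segment is assumed to be the object-type folder.

```python
# Hypothetical helper for illustration; DataWorks does not expose this function.
MAX_SUBFOLDER_LEVELS = 4  # recommended limit from the text above


def subfolder_levels(path: str) -> int:
    """Count subfolder levels below the object-type folder.

    The first segment is the object-type folder (for example
    "Data Analytics"); each further segment is one subfolder level.
    """
    segments = [s for s in path.split("/") if s]
    return max(len(segments) - 1, 0)


def too_deep(paths):
    """Return the paths that exceed the recommended nesting depth."""
    return [p for p in paths if subfolder_levels(p) > MAX_SUBFOLDER_LEVELS]


# Illustrative paths only.
paths = [
    "Data Analytics/ods/orders",
    "Data Analytics/a/b/c/d/e",
]
print(too_deep(paths))
```

A non-empty result suggests that the workflow is a candidate for being split into several workflows in the same solution.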
Design the organizational structure
|Workspace||You can specify administrators and members for each workspace based on your business requirements. The role settings of members and parameters for a compute engine instance are different among workspaces. For more information about workspace planning, see Plan workspaces.||Workspaces are basic units for managing permissions in DataWorks. You can create workspaces based on the organizational structure of your company. You can use a workspace to manage development permissions and O&M permissions. Workspace members can collaborate to develop and manage the code for all nodes in a workspace.|
|Solution||A solution is a group of workflows that are dedicated to a specific business goal. A workflow can be added to multiple solutions. After you develop a solution and add a workflow to the solution, other users can reference and modify the workflow in their solutions or workflows for collaborative development.||You can use a solution for business integration.|
|Workflow||A workflow is an abstract business entity that allows you to develop code based on your business requirements. Workflows and nodes in different workspaces are developed separately. Workflows can be displayed in a directory tree or in a panel. The display modes enable you to organize code from the business perspective and show the resource classification and business logic in a more efficient manner.||A workflow is a basic unit for code development and resource management.|
- To use DataStudio, you must create a workflow.
- If you want to change the code of a node in the production environment, you must make the change on the DataStudio page and then commit and deploy the node.
- If no compute engine is available in your workspace or the compute engine that you want to use is not displayed in the directory tree, check whether the service corresponding to the compute engine type is activated and whether the compute engine is associated with your workspace on the Workspace Management page. Only the compute engines that are associated with the workspace are displayed in the directory tree. For more information about how to associate a compute engine with a workspace, see Configure a workspace.
- If you cannot use specific features or cannot find the entry point used to create an object, go to the User Management page to check whether you have developer permissions. You have developer permissions if you use an Alibaba Cloud account or if you log on to the DataWorks console as a RAM user that is assigned the developer role or the workspace administrator role. You can also check whether your DataWorks edition meets the requirements.
Create a workflow
In DataStudio, data development is implemented by using components such as nodes in workflows. Before you create a node, create a workflow.
- Go to the DataStudio page.
  - Log on to the DataWorks console.
  - In the left-side navigation pane, click Workspaces.
  - In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click DataStudio in the Actions column.
- Move the pointer over the icon and click Workflow.
- In the Create Workflow dialog box, set the Workflow Name and Description parameters.
Notice: The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Create.
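If workflow names are generated by a script or a naming convention, the naming rule in the preceding step can be checked before the name is entered in the console. The following is a minimal sketch: the function name is hypothetical, and treating "letters" as ASCII letters only is an assumption, because the console may accept a wider range.

```python
import re

# Pattern derived from the naming rule above: 1 to 128 characters made up of
# letters, digits, underscores (_), and periods (.).
# Assumption: only ASCII letters are allowed in this sketch.
_WORKFLOW_NAME = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_workflow_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented rule."""
    return _WORKFLOW_NAME.fullmatch(name) is not None


print(is_valid_workflow_name("sales_etl.v2"))  # valid characters and length
print(is_valid_workflow_name("bad name"))      # contains a space
```

`fullmatch` is used instead of `match` so that a valid prefix followed by a disallowed character is rejected.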
Design a workflow
- We recommend that you create no more than 100 nodes in a workflow.
Note: If the total number of nodes in a workflow exceeds 1,000, the DAG of the workflow cannot be viewed.
- In the DAG, you can draw a line between two nodes to configure dependencies between the two nodes. You can also open the Properties panel on the configuration tab of a node and configure node dependencies in the panel. For more information, see Logic of same-cycle scheduling dependencies.
- If you create a node in the directory tree of a workflow, the node dependencies can be configured based on the lineage in the code. For more information, see Logic of same-cycle scheduling dependencies.
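The design guidance above combines two ideas: a workflow has size thresholds, and its node dependencies form a directed acyclic graph (DAG). The following sketch models both outside DataWorks. The node names and helper functions are hypothetical, and the ordering uses the Python standard library `graphlib` module (Python 3.9 or later) rather than any DataWorks API.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

RECOMMENDED_MAX_NODES = 100  # recommendation stated above
DAG_VIEW_MAX_NODES = 1000    # above this, the DAG cannot be viewed


def check_size(node_count: int) -> str:
    """Classify a workflow by its node count."""
    if node_count > DAG_VIEW_MAX_NODES:
        return "dag-not-viewable"
    if node_count > RECOMMENDED_MAX_NODES:
        return "above-recommendation"
    return "ok"


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the nodes with every ancestor placed before its descendants.

    `dependencies` maps each node to the set of nodes it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Illustrative three-node chain: sync -> clean -> load.
deps = {"load": {"clean"}, "clean": {"sync"}, "sync": set()}
print(check_size(len(deps)), run_order(deps))

try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency detected")
```

Drawing a line between two nodes in the DAG corresponds to adding one entry to the dependency mapping in this model.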
Design the business logic
DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data without the need to run complex commands on compute engines. You can also use the general nodes of DataWorks to design complex logic.
- You can configure data integration nodes including batch synchronization nodes and real-time synchronization nodes to synchronize data between databases.
- You can configure data analytics nodes for data cleansing. You can also add required resources and create required functions in a visualized mode.
- For more information about the supported types of nodes that encapsulate the capabilities of different compute engines and the supported features for development in DataWorks, see Select a data development node.
- For more information about how to configure scheduling dependencies, see Configure basic properties.
Commit a workflow
In a workspace in standard mode, the DataStudio page only allows you to develop and test nodes in the development environment. To commit the code to the production environment, you can commit multiple nodes in the workflow at a time and deploy them on the Deploy page.
- After you design a workflow, click the icon in the toolbar.
- In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Change description field. Then, determine whether to select Ignore I/O Inconsistency Alerts based on your business requirements. If you do not select Ignore I/O Inconsistency Alerts, an error message is displayed when the system determines that the input and output that you set do not match those identified in code lineage analysis. For more information, see When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node. What do I do?
- Click Commit.
Note: If you have modified the code or properties of a node and committed the node on its configuration tab, you cannot select the node in the Commit dialog box. If you have modified the code or properties of a node but have not committed the node on its configuration tab, you can select the node in the Commit dialog box.
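The I/O inconsistency check described in the commit steps is, in essence, a comparison between the inputs or outputs that you configured and those found by code lineage analysis. The following sketch illustrates only that comparison. The function name, message texts, and table names are hypothetical, and the lineage analysis itself is performed by DataWorks and is not reproduced here.

```python
def commit_check(configured: set[str], parsed: set[str],
                 ignore_alerts: bool = False) -> list[str]:
    """Return error messages; an empty list means the commit can proceed.

    configured: names set in the node properties (hypothetical model)
    parsed: names identified from the code (hypothetical model)
    ignore_alerts: mirrors the Ignore I/O Inconsistency Alerts option
    """
    if ignore_alerts:
        return []
    errors = []
    for name in sorted(parsed - configured):
        errors.append(f"in code but not configured: {name}")
    for name in sorted(configured - parsed):
        errors.append(f"configured but not in code: {name}")
    return errors


# Illustrative table names only.
configured = {"ods.orders"}
parsed = {"ods.orders", "ods.users"}
print(commit_check(configured, parsed))
print(commit_check(configured, parsed, ignore_alerts=True))
```

Selecting the option suppresses the alert but does not resolve the mismatch, so the configured dependencies may still differ from what the code reads and writes.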
View all workflows
Manage workflows by using the solution feature
- A solution can contain multiple workflows.
- A workflow can be added to multiple solutions.
- Workspace members can collaboratively develop and manage all solutions in a workspace.
- Add a workflow to a solution.
- Add multiple workflows to a solution at a time. To do so, right-click a solution, select Edit, and then modify the Workflows parameter in the Change Solution dialog box.
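The relationship between solutions and workflows described above is many-to-many. As a rough mental model only, it can be sketched with two index maps. The class, method, solution, and workflow names below are all hypothetical and do not correspond to a DataWorks API.

```python
from collections import defaultdict


class SolutionCatalog:
    """Toy model: a solution holds many workflows, and a workflow
    can belong to many solutions."""

    def __init__(self):
        self._by_solution = defaultdict(set)
        self._by_workflow = defaultdict(set)

    def add(self, solution: str, *workflows: str) -> None:
        """Add one or more workflows to a solution at a time."""
        for workflow in workflows:
            self._by_solution[solution].add(workflow)
            self._by_workflow[workflow].add(solution)

    def workflows_in(self, solution: str) -> list[str]:
        return sorted(self._by_solution.get(solution, ()))

    def solutions_of(self, workflow: str) -> list[str]:
        return sorted(self._by_workflow.get(workflow, ()))


catalog = SolutionCatalog()
catalog.add("retail", "orders_etl", "user_profile")
catalog.add("marketing", "user_profile")
print(catalog.workflows_in("retail"))
print(catalog.solutions_of("user_profile"))
```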
Modify or delete multiple nodes at a time
- On the DataStudio page, click the icon in the upper-right corner of the Scheduled Workflow pane to go to the Node tab.
- Modify or delete nodes.
  - Configure filter conditions such as the node name, node ID, node type, and workflow to find the nodes that you want to modify or delete.
  - Select some or all of the nodes.
  - Modify or delete the nodes.
    - To modify the selected nodes, click Modify responsible person or Modify scheduling Resource Group. You can modify only the owners and resource groups for scheduling of multiple nodes at a time. If you set the Mandatory modification parameter to Yes in the dialog box that appears, you can modify all the selected nodes. If you set this parameter to No, you can modify only the nodes that are locked by yourself.
    - To delete the selected nodes, use the delete option in the lower part of the Node tab. If you set the Force delete parameter to Yes in the Delete node dialog box, you can delete all the selected nodes. If you set this parameter to No, you can delete only the nodes that are locked by yourself.
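The effect of the Mandatory modification and Force delete parameters can be summarized as a filter over the selected nodes. The following sketch assumes a hypothetical `Node` record with a `locked_by` field. It illustrates the rule stated above and is not how DataWorks implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    locked_by: Optional[str]  # hypothetical field: user who holds the lock


def affected_nodes(selected: list[Node], user: str, force: bool) -> list[Node]:
    """Return the nodes that a batch modify or delete applies to.

    force=True  models Yes: all selected nodes are affected.
    force=False models No: only nodes locked by `user` are affected.
    """
    if force:
        return list(selected)
    return [node for node in selected if node.locked_by == user]


# Illustrative data only.
selected = [
    Node("sync_orders", locked_by="alice"),
    Node("clean_orders", locked_by="bob"),
    Node("load_orders", locked_by=None),
]
print([n.name for n in affected_nodes(selected, "alice", force=False)])
print([n.name for n in affected_nodes(selected, "alice", force=True)])
```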
Export a common workflow for replication
You can use the node group feature to quickly group all nodes in a workflow as a node group and then reference the node group in a new workflow. For more information, see Create and reference a node group.
Export multiple workflows from a DataWorks workspace at a time and import them to other DataWorks workspaces or open source engines
If you want to export multiple workflows in a workspace from DataWorks at a time and import them to other DataWorks workspaces or open source engines, you can use the Migration Assistant service of DataWorks. For more information, see Overview.