You can use a workflow to organize nodes by business type so that code is developed and managed per business type. This topic describes how to create, design, commit, and view a workflow and how to modify or delete multiple nodes in a workflow at a time.
Background information
A workspace supports various types of compute engines and can contain multiple workflows. A workflow is a collection of multiple types of objects. The object types include Data Integration, Data Analytics, table, resource, function, and algorithm.
Each type of object corresponds to an independent folder. You can create subfolders in the folder. To facilitate the management of objects, we recommend that you create no more than four levels of subfolders. If you create more than four levels of subfolders, your workflow becomes excessively complex. In this case, we recommend that you split your workflow into two or more workflows and add the workflows to the same solution to improve work efficiency.
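The four-level recommendation can be checked mechanically if you keep a list of folder paths. The following Python snippet is a minimal sketch and not a DataWorks API. The path strings, the `/` separator, and the names `subfolder_depth` and `too_deep` are assumptions made for illustration.

```python
MAX_SUBFOLDER_LEVELS = 4  # recommended limit from this topic


def subfolder_depth(path: str) -> int:
    """Return the number of subfolder levels below the object-type folder.

    Assumes a hypothetical path layout of
    "<object type folder>/<subfolder>/<subfolder>/...".
    """
    parts = [p for p in path.split("/") if p]
    # The first part is the object-type folder itself, not a subfolder.
    return max(len(parts) - 1, 0)


def too_deep(paths: list[str]) -> list[str]:
    """Return the paths that exceed the recommended subfolder depth."""
    return [p for p in paths if subfolder_depth(p) > MAX_SUBFOLDER_LEVELS]


# Hypothetical folder paths used only for this example.
paths = [
    "Data Analytics/orders/daily",
    "Data Analytics/orders/daily/eu/de/berlin",
]
flagged = too_deep(paths)
print(flagged)
```

A path that is flagged this way is a candidate for moving into a separate workflow in the same solution.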
Design the organizational structure
Concept | Description | Purpose |
---|---|---|
Workspace | You can specify administrators and members for each workspace based on your business requirements. The role settings of members and parameters for a compute engine instance are different among workspaces. For more information about workspace planning, see Create workspaces. | Workspaces are basic units for managing permissions in DataWorks. You can create workspaces based on the organizational structure of your company. You can use a workspace to manage development permissions and O&M permissions. Workspace members can collaborate to develop and manage the code for all nodes in a workspace. |
Solution | A solution is a group of workflows that are dedicated to a specific business goal. A workflow can be added to multiple solutions. After you develop a solution and add a workflow to the solution, other users can reference and modify the workflow in their solutions or workflows for collaborative development. | You can use a solution for business integration. |
Workflow | A workflow is an abstract business entity that allows you to develop code based on your business requirements. Workflows and nodes in different workspaces are separately developed. Workflows can be displayed in a directory tree or in a panel. The display modes enable you to organize code from the business perspective and show the resource classification and business logic in a more efficient manner. | A workflow is a basic unit for code development and resource management. |

- To use DataStudio, you must create a workflow.
- To change the code for a node in the production environment, you must first modify the node parameters on the DataStudio page, and then commit and deploy the node.
- If no compute engine is available in your workspace or the compute engine that you want to use is not displayed in the directory tree, check whether the service corresponding to the compute engine type is activated and whether the compute engine is associated with your workspace on the Workspace Management page. Only the compute engines that are associated with the workspace are displayed in the directory tree. For more information about how to associate a compute engine with a workspace, see Configure a workspace.
- If you cannot use specific features or cannot find an entry used to create an object, go to the User Management page to check whether you have development permissions. You have development permissions if you use an Alibaba Cloud account or you log on to the DataWorks console as a RAM user that is assigned the developer role or workspace administrator role. You can also check whether the DataWorks edition that you adopted meets the requirements.
Create a workflow
In DataStudio, data is developed by using components such as nodes in workflows. You must create a workflow before you can create a node.
Design a workflow

- We recommend that you create no more than 100 nodes in a workflow.
Note: If the total number of nodes in a workflow exceeds 1,000, the DAG of the workflow cannot be viewed.
- In the DAG, you can draw a line between two nodes to configure dependencies between the two nodes. You can also open the Properties panel on the configuration tab of a node and configure node dependencies in the panel. For more information, see Logic of same-cycle scheduling dependencies.
- If you create a node in the directory tree of a workflow, the node dependencies can be configured based on the lineage in the code. For more information, see Logic of same-cycle scheduling dependencies.
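The node-count guidance and the dependency lines in the DAG can be modeled outside DataWorks to reason about a workflow design. The following Python snippet is a minimal sketch under stated assumptions: the node names, the `dependencies` mapping, and the `check_size` helper are hypothetical, and the snippet does not call any DataWorks API. It uses the standard-library `graphlib.TopologicalSorter` to derive an order in which each node runs after the nodes it depends on.

```python
from graphlib import TopologicalSorter

RECOMMENDED_MAX_NODES = 100  # recommendation from this topic
DAG_VIEW_MAX_NODES = 1000    # above this, the DAG cannot be viewed

# Hypothetical workflow: each key is a node, and each value is the set of
# ancestor nodes that the node depends on.
dependencies = {
    "sync_orders": set(),
    "clean_orders": {"sync_orders"},
    "daily_report": {"clean_orders"},
}


def check_size(node_count: int) -> str:
    """Classify a workflow by the node limits described in this topic."""
    if node_count > DAG_VIEW_MAX_NODES:
        return "dag_not_viewable"
    if node_count > RECOMMENDED_MAX_NODES:
        return "above_recommendation"
    return "ok"


# static_order() yields every node after all of its ancestors and raises
# graphlib.CycleError if the dependencies contain a cycle.
run_order = list(TopologicalSorter(dependencies).static_order())
size_status = check_size(len(dependencies))
print(run_order, size_status)
```

A cycle in the mapping raises `graphlib.CycleError`, which makes this kind of sketch a quick way to spot circular dependencies in a planned design.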
Design the business logic
DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data without the need to run complex commands on compute engines. You can also use the general nodes of DataWorks to design complex logic.
- You can configure data integration nodes including batch synchronization nodes and real-time synchronization nodes to synchronize data between databases.
- You can configure data analytics nodes for data cleansing. You can also add required resources and create required functions in a visualized mode.
- For more information about the supported types of nodes that encapsulate the capabilities of different compute engines and the supported features for development in DataWorks, see Select a data development node.
- For more information about how to configure scheduling dependencies, see Configure basic properties.
Commit a workflow
In a workspace in standard mode, the DataStudio page allows you to develop and test nodes only in the development environment. To move the code to the production environment, commit multiple nodes in the workflow at a time and then deploy them on the Deploy page.
View all workflows


Manage workflows by using the solution feature
- A solution can contain multiple workflows.
- A workflow can be added to multiple solutions.
- Workspace members can collaboratively develop and manage all solutions in a workspace.
- Add a workflow to a solution.
- Add multiple workflows to a solution at a time. To do so, right-click a solution, select Edit, and then modify the Workflows parameter in the Change Solution dialog box.
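Because a solution can contain multiple workflows and a workflow can be added to multiple solutions, the relationship is many-to-many. The following Python snippet is a minimal sketch of that relationship. The solution names, the workflow names, and the `solutions_by_workflow` helper are hypothetical and are not part of DataWorks.

```python
from collections import defaultdict

# Hypothetical membership: solution name -> names of its workflows.
solutions = {
    "retail_reporting": {"orders_etl", "customer_profile"},
    "finance_close": {"orders_etl", "ledger_rollup"},
}


def solutions_by_workflow(solutions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the mapping to list the solutions each workflow belongs to."""
    index: dict[str, set[str]] = defaultdict(set)
    for solution, workflows in solutions.items():
        for workflow in workflows:
            index[workflow].add(solution)
    return dict(index)


index = solutions_by_workflow(solutions)
print(sorted(index["orders_etl"]))
```

The inverted index shows which solutions are affected when a shared workflow such as the hypothetical `orders_etl` is modified.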
Modify or delete multiple nodes at a time
- On the DataStudio page, click the icon in the upper-right corner of the Scheduled Workflow pane to go to the Node tab.
- Modify or delete nodes.
Export a common workflow for replication
You can use the node group feature to quickly group all nodes in a workflow as a node group and then reference the node group in a new workflow. For more information, see Create and reference a node group.
Export multiple workflows from a DataWorks workspace at a time and import them to other DataWorks workspaces or open source engines
If you want to export multiple workflows in a workspace from DataWorks at a time and import them to other DataWorks workspaces or open source engines, you can use the Migration Assistant service of DataWorks. For more information, see Overview.