DataWorks provides two types of workflows: auto triggered workflows and manually triggered workflows. Nodes in an auto triggered workflow are scheduled to run on a regular basis, whereas nodes in a manually triggered workflow must be triggered to run. This topic describes how to use an auto triggered workflow.
Background information
Workflows are an automated management tool for data processing pipelines. You can drag different types of nodes onto the configuration tab of a workflow to easily configure scheduling dependencies between the nodes. This accelerates the construction of a data processing pipeline and improves task development efficiency.
This topic describes how to use an auto triggered workflow. For information about how to use a manually triggered workflow, see Manually triggered workflow.
For information about differences between auto triggered workflows and manually triggered workflows, see Feature description.
Supported scheduling dependencies
You configure scheduling dependencies for an auto triggered workflow in much the same way as for a common node. Nodes in an auto triggered workflow, or the workflow as a whole, can depend on or be depended on by other nodes. The following dependencies are supported:
A workflow as a whole can be depended on by other independent tasks or workflows.
A workflow as a whole can depend on other independent tasks or workflows.
Tasks in a workflow can depend on other independent tasks or workflows.
Tasks in a workflow can be depended on by other independent tasks or workflows.
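For illustration only, the following minimal sketch models these four patterns as directed edges in a dependency graph. It is not a DataWorks API, and all task and workflow names are hypothetical.

```python
# A minimal sketch (not a DataWorks API): the four supported dependency
# patterns, modeled as directed "upstream -> downstream" edges.
# All names below are hypothetical.
dependencies = [
    ("workflow_a", "task_report"),                 # a workflow is depended on by an independent task
    ("task_ingest", "workflow_a"),                 # a workflow depends on an independent task
    ("task_ingest", "workflow_a.task_clean"),      # a task in a workflow depends on an independent task
    ("workflow_a.task_aggregate", "task_export"),  # a task in a workflow is depended on by an independent task
]

# A downstream item runs only after its upstream item succeeds.
for upstream, downstream in dependencies:
    print(f"{downstream} waits for {upstream}")
```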
State changes of auto triggered workflows in the running process
You can specify a scheduling time for an auto triggered workflow. The scheduling time affects when the nodes in the workflow run. If a node depends on the auto triggered workflow, the running of that node is also affected by the scheduling time of the workflow. In scheduling scenarios, the status of an auto triggered workflow is determined by the status of the tasks in the workflow.
Special scenarios:
If an instance is frozen or suspended in a workflow, the entire workflow instance enters the failed state.
If a data backfill instance generated for a task in a workflow is frozen, the workflow instance enters the successful state.
If a task cannot be run in data backfill scenarios, the workflow to which the task belongs enters the failed state.
A time difference exists between the time when the instance status changes and the actual time when a failure event is generated.
If a merge node exists in a workflow, ancestor nodes of the merge node may fail. In this case, you can check whether the workflow is in the successful state based on whether the merge node is in the successful state.
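For reference, the following minimal sketch summarizes the special scenarios above as a simple lookup. It only restates the rules described in this section and is not a DataWorks API.

```python
# A minimal sketch (not a DataWorks API): the workflow instance state
# that results from each special scenario described above.
SPECIAL_SCENARIO_STATES = {
    "instance frozen or suspended in the workflow": "failed",
    "data backfill instance of a task frozen": "successful",
    "task cannot run in data backfill scenarios": "failed",
}

print(SPECIAL_SCENARIO_STATES["data backfill instance of a task frozen"])  # successful
```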
Execution time and parameter replacement for nodes in an auto triggered workflow
You do not need to configure a scheduling cycle for nodes in a workflow. You only need to configure the delayed execution time for each node, which indicates the amount of time by which the start of the node is delayed relative to the scheduling time of the workflow.
The actual running time of a node in a workflow is calculated based on the delayed execution time configured for the node and the scheduling time configured for the workflow.
The values of scheduling parameters for nodes in a workflow are assigned based on the scheduling time of the workflow, not the delayed run time of each node.
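As an illustration of these rules, the following minimal sketch computes the actual run time of a node from a hypothetical workflow scheduling time and node delay. It is not a DataWorks API, and all values are hypothetical.

```python
from datetime import datetime, timedelta

# A minimal sketch (not a DataWorks API): the actual run time of a node
# in a workflow is the workflow scheduling time plus the node's delayed
# execution time. All values below are hypothetical.
workflow_scheduling_time = datetime(2024, 1, 1, 2, 0)  # workflow scheduled at 02:00
node_delay = timedelta(minutes=30)                     # node delayed by 30 minutes

node_run_time = workflow_scheduling_time + node_delay
print(node_run_time)  # 2024-01-01 02:30:00

# Scheduling parameters are still resolved from the workflow scheduling
# time (02:00 here), not from the node's delayed run time (02:30).
```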
Limits
Only new-version Data Studio supports auto triggered workflows.
Create a workflow
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and go to Data Studio from the Actions column.
In the left-side navigation pane of the Data Studio page, click the icon to open the DATA STUDIO pane. In the Workspace Directories section, click the icon on the right side and select Create Workflow. In the popover that appears, specify a workflow name and press Enter. The configuration tab of the workflow appears.
Note: The first time you perform operations in the Workspace Directories section of the DATA STUDIO pane, you can directly click Create Workflow to create a workflow.
Design the workflow
Go to the configuration tab of the workflow in the editing state.
After you create a workflow, the configuration tab of the workflow appears and the workflow enters the editing state by default. Alternatively, you can click the name of your workflow in the Workspace Directories section to go to the configuration tab of the workflow and then click Edit Workflow.
Design the workflow.
On the left side of the configuration tab of the workflow, select node types based on the types of tasks that you want to develop, and drag the nodes onto the canvas on the right. Then, drag lines between the nodes to configure their scheduling dependencies.
Note: DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to develop data in a visualized manner, without the need to run complex commands on the compute engines. You can also use the general nodes of DataWorks to design complex logic.
A workflow can contain a maximum of 200 nodes. However, a workflow that contains a large number of nodes may not run properly. We recommend that you create no more than 100 nodes in a workflow.
Save the workflow.
After you design the workflow, click Save in the top toolbar of the configuration tab of the workflow to save the workflow.
Develop the workflow
Go to the configuration tab of a node in the workflow.
On the configuration tab of the workflow, move the pointer over a desired node and click Open Node to go to the configuration tab of the node.
Develop the node.
On the configuration tab of the node, edit node code. Take note of the following items during code development:
The code syntax depends on the node type that you selected. Different types of tasks differ in scheduling configuration. For more information, see Node development.
You can enable Copilot, the intelligent programming assistant, to obtain code completion suggestions and improve development efficiency.
For most node types, you can define variables in the ${Variable name} format. This way, you can assign different scheduling parameters to the variables as values to facilitate task code debugging, as shown in the sketch after this list.
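As a minimal sketch of this substitution behavior: Python's string.Template uses the same ${...} placeholder syntax, which makes it a convenient analogy. The table name and parameter value below are hypothetical, and this is only an analogy, not the DataWorks implementation.

```python
from string import Template

# A minimal sketch (an analogy, not the DataWorks implementation):
# DataWorks replaces ${Variable name} placeholders in node code with
# the values you assign. Python's string.Template uses the same syntax.
node_code = Template("SELECT * FROM sales WHERE ds = '${bizdate}';")  # hypothetical table

# During debugging, you assign a constant in the Script Parameters section;
# in scheduling, the value is resolved from the workflow scheduling time.
print(node_code.substitute(bizdate="20240101"))
# SELECT * FROM sales WHERE ds = '20240101';
```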
Debug and run the node.
Configure debugging parameters. After you edit the code, you can click the Debugging Configurations tab in the right-side navigation pane of the configuration tab of the node to configure debugging parameters.
In the Computing Resource section, specify a computing resource for tasks on the node to be debugged.
In the DataWorks Configurations section, specify a resource group that is used to run tasks in DataWorks.
If you defined variables in the ${Variable name} format in the node code, assign constants to the variables in the Script Parameters section.
Debug and run the node. After the configuration is complete, click Run in the top toolbar of the configuration tab of the node to run the node based on the debugging parameters configured on the Debugging Configurations tab.
Deploy the workflow
You can refer to the following procedure to define scheduling settings for a workflow and nodes in the workflow, and deploy the workflow to the production environment for periodic scheduling.
Configure scheduling settings for nodes in the workflow.
The procedure of configuring scheduling settings for nodes in a workflow is basically the same as that for common nodes. For more information, see Scheduling dependencies. Take note of the following items when you configure scheduling settings for nodes in a workflow:
You do not need to separately configure a scheduling time for nodes in a workflow. Instead, you can configure the delayed execution time for nodes in a workflow. The delayed execution time indicates the duration by which the running of a node lags behind the running of the related workflow.
The values of variables used in code of nodes in a workflow are assigned based on the scheduling time of the workflow.
Configure scheduling settings for the workflow.
Configure scheduling parameters, scheduling time, and scheduling dependencies for the workflow.
Deploy the workflow.
Click the icon in the top toolbar of the configuration tab of the workflow. On the DEPLOY tab, click Start Deployment to Production Environment. The workflow is deployed based on the check and deployment process. For more information, see Node/workflow release.
What to do next: Workflow O&M
After an auto triggered workflow is deployed, the auto triggered workflow is scheduled on a regular basis. You can view the status of the auto triggered workflow in Operation Center in the production environment and perform O&M operations on the auto triggered workflow. For more information, see Overview and Backfill data and view data backfill instances (new version).