Triggered workflows

DataWorks supports two types of workflow scheduling: Recurring Schedule and Triggered Schedule. A recurring workflow runs at a fixed time, whereas a triggered workflow runs on demand when an external signal arrives, such as a manual action, an API call, or an event message. This lets data processing respond to events as they happen instead of waiting for the next scheduled run.

Function introduction

A triggered schedule is an on-demand scheduling mode: rather than running automatically at a fixed time, a triggered workflow starts only when it receives an external signal. This flexibility makes it suitable for scenarios that require programmatic integration or a response to external events.

The following three trigger methods are supported:

Manual trigger: You can manually run a workflow in the DataWorks console.
OpenAPI trigger: An external system can start a workflow by calling a DataWorks OpenAPI operation.
Event trigger: A preconfigured trigger listens for specific events, such as a file upload to OSS or a message arriving in a message queue such as Kafka, and automatically starts the workflow. Event triggers take effect only after the workflow is published to the production environment.

Inner nodes of a triggered workflow, such as PyODPS and Shell nodes, are configured the same way as in a recurring workflow, except that you do not need to configure a scheduling period.

Quotas and limits

Number of nodes: A single workflow supports a maximum of 400 inner nodes. We recommend that you keep the number of nodes under 100 to simplify workflow display and maintenance.
Concurrent instances: The maximum value for Maximum Parallel Instances is 100,000.
Event triggering: A workflow can be triggered automatically by an event only after it is published to the production environment (Operation Center).
Configuration limits: For node-level scheduling, you can configure only Priority, not Priority Weighting Policy.
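The quotas above can be checked before publishing. The following is a minimal Python sketch that validates a hypothetical workflow description against them. The dictionary layout (`nodes`, `max_parallel_instances`, `weighting_policy`) is an assumption made for illustration and is not a DataWorks export format.

```python
from __future__ import annotations

# Limits taken from the quota list above.
MAX_INNER_NODES = 400
RECOMMENDED_INNER_NODES = 100
MAX_PARALLEL_INSTANCES = 100_000


def check_workflow_quotas(spec: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a hypothetical workflow description."""
    errors: list[str] = []
    warnings: list[str] = []

    nodes = spec.get("nodes", [])
    if len(nodes) > MAX_INNER_NODES:
        errors.append(f"{len(nodes)} inner nodes exceeds the limit of {MAX_INNER_NODES}")
    elif len(nodes) >= RECOMMENDED_INNER_NODES:
        warnings.append(f"{len(nodes)} inner nodes; keeping it under "
                        f"{RECOMMENDED_INNER_NODES} is recommended")

    # In this sketch, None (or a missing key) stands for Unlimited.
    parallel = spec.get("max_parallel_instances")
    if parallel is not None and parallel > MAX_PARALLEL_INSTANCES:
        errors.append(f"max_parallel_instances {parallel} exceeds {MAX_PARALLEL_INSTANCES}")

    # Node-level scheduling supports Priority only, not Priority Weighting Policy.
    for node in nodes:
        if "weighting_policy" in node:
            errors.append(f"node {node.get('name')!r}: Priority Weighting Policy "
                          "cannot be set at node level")
    return errors, warnings


spec = {"nodes": [{"name": "load"}, {"name": "clean", "weighting_policy": "downward"}],
        "max_parallel_instances": None}
print(check_workflow_quotas(spec))
```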

Access the feature

Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.
In the navigation pane on the left, click the icon that opens the Project Folder. Then, to the right of Project Folder, click the create icon and choose New Workflow to open the New Workflow page.
The first time you use the Project Folder, you can click the New Workflow button to create a workflow.

Create a triggered workflow

On the New Workflow page, set Scheduling Type to Triggered Schedule.
Enter a Name for the workflow and click Confirm.
(Optional) On the right side of the canvas, click Schedule Configuration. Click Scheduling Policy and select a trigger from the drop-down list to associate with the current workflow.

Design a triggered workflow

Orchestrate nodes
In the left-side pane of the workflow canvas, select the required node type based on your task. Drag the node to the canvas and manually connect the nodes to configure their dependencies.
The node configuration is similar to that of a recurring workflow, but you do not need to configure a scheduling period.
Configure workflow scheduling parameters
On the right side of the workflow canvas, click Schedule Configuration. In the Scheduling Parameters section, click Add Parameter to define scheduling parameters for the workflow. These parameters are scoped to the current workflow and can be referenced by all of its inner nodes.
Note
If an inner node in the workflow is configured with a scheduling parameter that has the same name as a workflow scheduling parameter, the node-level parameter takes precedence.
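The precedence rule in the note amounts to a dictionary merge in which node-level values win. A minimal Python sketch, with hypothetical parameter names and values:

```python
def effective_parameters(workflow_params: dict, node_params: dict) -> dict:
    """Merge workflow-scoped and node-level parameters; node values override."""
    merged = dict(workflow_params)  # start from the workflow-level defaults
    merged.update(node_params)      # same-name node parameters take precedence
    return merged


workflow_params = {"biz_date": "20240101", "region": "cn-hangzhou"}  # hypothetical
node_params = {"region": "cn-shanghai"}                              # hypothetical
print(effective_parameters(workflow_params, node_params))
# {'biz_date': '20240101', 'region': 'cn-shanghai'}
```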
Associate a trigger (Optional)
To automatically trigger a workflow with an event, first configure a trigger in the Operation Center. Then, on the right side of the workflow canvas, go to Schedule Configuration > Scheduling Policy and select the created trigger to associate with the workflow.

Configure priority and concurrency (Advanced Configuration)

When multiple workflows or tasks are triggered at the same time and system resources become a bottleneck, you can use Priority and Priority Weighting Policy to control the order in which they obtain resources, so that the most important tasks run first.

Ensure core business continuity: You can set a higher priority for core business workflows so they always run before other non-core workflows.

Reduce the duration of critical processes: Within a single workflow instance, you can influence the execution order of nodes using the Priority Weighting Policy. For example, with the Downward weighting policy, nodes on the critical path that have more upstream dependencies are assigned a higher dynamic weight. This allows them to be executed first, which shortens the runtime of the entire workflow.

| Configuration item | Description |
| --- | --- |
| Priority | Defines the absolute priority level of a workflow instance in the scheduling queue. Available levels are 1, 3, 5, 7, and 8; a higher number indicates a higher priority. High-priority tasks or workflows always obtain scheduling resources before low-priority ones. |
| Priority Weighting Policy | Defines how the weights of inner nodes (tasks) are calculated dynamically within the same priority level. Nodes with higher weights get execution opportunities first. No weighting: all nodes have a fixed baseline weight. Downward weighting: the weight of a node is adjusted dynamically; the more upstream dependencies a node has, the higher its weight. This helps nodes on the critical path of a directed acyclic graph (DAG) run first. The weight is calculated as `Initial weight value + Sum of priorities of all upstream nodes`. |
| Maximum Parallel Instances | Controls the maximum number of instances of this workflow that can run at the same time, for concurrency control and resource protection. When the number of running instances reaches the limit, subsequently triggered instances enter a waiting state. You can set this to Unlimited or to a custom value of up to 100,000. Note: If the limit exceeds the capacity of the resource group, the actual concurrency is bounded by the physical limit of the resource group. |
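To make the downward weighting formula concrete, the following Python sketch computes node weights for a small hypothetical DAG and then orders ready nodes by priority and, within the same priority, by weight. It assumes that "all upstream nodes" means every direct and indirect ancestor of a node and that the initial weight value is 0. The scheduler's internals are not documented here, so treat this only as an illustration of the formula.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def downward_weights(upstream: dict[str, list[str]],
                     priority: dict[str, int],
                     initial_weight: int = 0) -> dict[str, int]:
    """weight = initial_weight + sum of priorities of all direct and indirect upstream nodes."""
    ancestors: dict[str, set[str]] = {}
    # static_order() yields every node after all of its upstream nodes.
    for node in TopologicalSorter(upstream).static_order():
        found: set[str] = set()
        for parent in upstream.get(node, []):
            found.add(parent)
            found |= ancestors[parent]
        ancestors[node] = found
    return {node: initial_weight + sum(priority[a] for a in anc)
            for node, anc in ancestors.items()}


def dispatch_order(ready: list[str], priority: dict[str, int],
                   weights: dict[str, int]) -> list[str]:
    """Higher priority first; within the same priority, higher weight first."""
    return sorted(ready, key=lambda n: (priority[n], weights[n]), reverse=True)


# Hypothetical DAG: node -> list of its direct upstream nodes.
upstream = {
    "extract_a": [],
    "extract_b": [],
    "join": ["extract_a", "extract_b"],
    "report": ["join"],
    "side_task": [],
}
priority = {node: 5 for node in upstream}  # every node at the same priority level
weights = downward_weights(upstream, priority)
print(weights["join"], weights["report"])                       # 10 15
print(dispatch_order(["side_task", "join"], priority, weights))  # ['join', 'side_task']
```

Because all nodes share one priority level here, the weight alone decides the order, and `join`, which sits on the path to `report`, is dispatched before the independent `side_task`.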

The DataWorks priority system follows a hierarchical override rule: Runtime specification > Node-level configuration > Workflow-level configuration.

Workflow-level configuration (Baseline): Set in the workflow's Scheduling Policy; serves as the default for all nodes.
Node-level configuration (Local): On the Schedule Configuration > Scheduling Policy page of an individual node in a workflow, you can set a higher Priority for that node. This setting overrides the workflow-level configuration.
Runtime configuration (Temporary): When you manually trigger a run in the Operation Center, you can use the Reset Priority At Runtime switch to specify a temporary configuration. This configuration has the highest precedence, applies only to the current run, and does not modify any permanent configuration.
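The override rule amounts to "use the most specific value that is set." The Python sketch below models it; the function name and signature are hypothetical. Because the function only computes a value and never stores it, it also mirrors the fact that a runtime priority affects only the current run.

```python
from typing import Optional

VALID_PRIORITIES = {1, 3, 5, 7, 8}


def effective_priority(workflow_priority: int,
                       node_priority: Optional[int] = None,
                       runtime_priority: Optional[int] = None) -> int:
    """Runtime specification > node-level configuration > workflow-level configuration."""
    chosen = next(p for p in (runtime_priority, node_priority, workflow_priority)
                  if p is not None)
    if chosen not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(VALID_PRIORITIES)}, got {chosen}")
    return chosen


print(effective_priority(5))                                       # 5 (workflow baseline)
print(effective_priority(5, node_priority=7))                      # 7 (node override)
print(effective_priority(5, node_priority=7, runtime_priority=8))  # 8 (runtime override)
```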

Develop a triggered workflow

Develop nodes
On the node editing page, you can edit the node code. Note the following points during code development:
- The code syntax depends on the node type that you choose. Different task types have different configurations. For more information about the configurations, see Node development.
- You can enable the Copilot intelligent programming assistant to obtain intelligent code completion suggestions and improve development efficiency.
- For most node types, you can define variables in the ${variable_name} format. This lets you debug the code quickly by substituting different values in the debug configuration.
- When the workflow runs as a scheduled task, an inner node can retrieve the value of a workflow parameter by using the ${workflow.Parameter_name} format.
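As a rough model of the naming convention above, the Python sketch below replaces `${name}` placeholders with node-level values and `${workflow.name}` placeholders with workflow parameters. DataWorks performs the real substitution itself; the function, the parameter names, and the choice to leave unknown placeholders untouched are assumptions of this sketch.

```python
import re

# Matches ${name} and ${workflow.name}; names may contain letters, digits, _ and dots.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def render(code: str, node_params: dict, workflow_params: dict) -> str:
    """Substitute placeholders; unknown placeholders are left as-is in this sketch."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("workflow."):
            value = workflow_params.get(name[len("workflow."):])
        else:
            value = node_params.get(name)
        return match.group(0) if value is None else str(value)

    return PLACEHOLDER.sub(replace, code)


sql = "SELECT * FROM orders WHERE ds = '${biz_date}' AND region = '${workflow.region}';"
print(render(sql, {"biz_date": "20240101"}, {"region": "cn-hangzhou"}))
# SELECT * FROM orders WHERE ds = '20240101' AND region = 'cn-hangzhou';
```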
Use parameters passed by the trigger (Optional)
A trigger can pass event information, such as file paths or message content, to the workflow as event parameters that inner nodes can use.
The following example uses a Kafka trigger. The message body format of a trigger is shown in the Message Format Example section of the trigger details. A sample Kafka message body is shown below:
```
{
  "headers": {
    "headers": [],
    "isReadOnly": false
  },
  "partition": 2,
  "offset": 1,
  "topic": "demo-topic",
  "key": "demo-key",
  "value": "{\"number\":100,\"name\":\"EventBridge\"}",
  "timestamp": 1713852706576
}
```
In an inner node of a workflow, you can use the ${workflow.triggerMessage} parameter to obtain the complete message body. You can also use ${workflow.triggerMessage.xxx} to obtain the value of a specific field within the message body. The parameters are automatically replaced when the trigger starts the task. For example:
```
${workflow.triggerMessage}  # Get the entire message body
${workflow.triggerMessage.key}  # Get the value of the key field from the JSON. Result: demo-key
${workflow.triggerMessage.value}  # Get the value of the value field from the JSON. Result: {"number":100,"name":"EventBridge"}
```
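In the sample message, the `value` field is itself a JSON-encoded string. One way to read individual fields from it in a Python-based node, such as a PyODPS 3 node, is to decode it in code. The minimal sketch below assumes the substituted text contains no single quotes or backslashes, so it can sit inside a single-quoted Python string literal; the helper name is hypothetical.

```python
import json


def parse_trigger_value(raw_value: str) -> dict:
    """Decode the Kafka `value` field, which the sample message carries as a JSON string."""
    return json.loads(raw_value)


# In the node code you would write:
#     raw_value = '${workflow.triggerMessage.value}'
# The placeholder is replaced when the trigger starts the task. With the
# sample message above, the line becomes:
raw_value = '{"number":100,"name":"EventBridge"}'

payload = parse_trigger_value(raw_value)
print(payload["number"], payload["name"])  # 100 EventBridge
```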

Debug a triggered workflow

Debug a node
After editing the code, click Debug Configuration on the right side of the node editing page to configure debug parameters, such as the resource group and script parameters. After the configuration is complete, click the Run button in the toolbar. The node will run using the parameters you specified in Debug Configuration.
Debug a workflow
To debug a triggered workflow in Data Studio, click the Run icon on the toolbar above the workflow canvas. In the dialog box that appears, fill in the Trigger Message Body to simulate an event.
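When filling in the Trigger Message Body, it can help to generate a body with the same shape as the sample Kafka message shown earlier. The Python sketch below does that. It assumes the debug dialog accepts the same structure as the trigger's Message Format Example, and the function name is hypothetical.

```python
import json
import time
from typing import Optional


def build_kafka_trigger_body(topic: str, key: str, value: dict,
                             partition: int = 0, offset: int = 0,
                             timestamp_ms: Optional[int] = None) -> str:
    """Return a JSON message body shaped like the sample Kafka message."""
    body = {
        "headers": {"headers": [], "isReadOnly": False},
        "partition": partition,
        "offset": offset,
        "topic": topic,
        "key": key,
        # As in the sample, `value` is a JSON-encoded string, not a nested object.
        "value": json.dumps(value, separators=(",", ":")),
        # Milliseconds since the epoch; defaults to the current time.
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
    }
    return json.dumps(body, indent=2)


print(build_kafka_trigger_body("demo-topic", "demo-key",
                               {"number": 100, "name": "EventBridge"},
                               partition=2, offset=1, timestamp_ms=1713852706576))
```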

Publish a triggered workflow

After you debug the workflow, click the Publish button in the toolbar to open the publishing panel, and then click Start Publishing To Production. The workflow is then published through the release check process. For more information, see Publish a node or workflow.

Run a triggered workflow

After a workflow is published to the Operation Center, it enters a standby state and waits for a signal. You can run a triggered workflow using one of the following methods.

Event trigger

Perform the action that matches the trigger's event type on the object that the trigger monitors. For example, for an OSS trigger, upload a file to the OSS bucket configured in the trigger. For a message queue trigger, send a message to the topic or queue configured in the trigger.
After the trigger receives the event message, it starts the triggered workflow as a One-time Task in the Operation Center.
Go to Operation Center > One-time Task O&M > One-time Instance to view and manage executed triggered workflow instances. You can check the instance log to confirm whether the workflow was successfully triggered and executed.
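For a Kafka trigger, a test event can be produced from any Kafka client. The sketch below uses the open-source kafka-python library (`KafkaProducer`) and encodes the value as compact JSON, matching the sample message earlier in this topic. The endpoint, topic, and key are placeholders, and a managed Kafka instance may require additional security settings (for example, SASL or SSL) that are not shown.

```python
import json


def encode_event(value: dict) -> bytes:
    """Compact JSON, UTF-8 encoded, like the `value` field in the sample message."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def send_test_event(bootstrap_servers: str, topic: str, key: str, value: dict) -> tuple:
    """Send one message and return (partition, offset) once the broker acknowledges it."""
    # Imported here so this module loads even where kafka-python is not installed
    # (pip install kafka-python).
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    try:
        future = producer.send(topic, key=key.encode("utf-8"), value=encode_event(value))
        metadata = future.get(timeout=10)  # blocks until acknowledged, or raises
        return metadata.partition, metadata.offset
    finally:
        producer.close()


# Example call (placeholder endpoint; add SASL/SSL settings if your instance needs them):
# send_test_event("<kafka-endpoint>:9092", "demo-topic", "demo-key",
#                 {"number": 100, "name": "EventBridge"})
```

After the message is sent, the resulting instance can be checked under One-time Instance as described in the steps above.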

Manual trigger

Go to Operation Center > One-time Task O&M > Triggered Workflow. Find the triggered workflow that you want to run and click Run. You can then configure runtime parameters, such as the run scope, data timestamp, and trigger message body.

Other operations

Clone a triggered workflow

You can use the clone feature to quickly create a new workflow from an existing one. The clone operation copies the workflow's inner nodes (including Code, Debug Configuration, and Schedule Configuration), the dependencies between nodes, and the workflow's Schedule Configuration.

In the Project Folder on the left, right-click the triggered workflow that you want to clone.
In the pop-up menu, select Clone. The Clone window opens.
In the window, you can optionally modify the Name and storage Path for the triggered workflow. Click Confirm to start cloning.
While cloning, you can monitor the Current Progress, Duration, and Number Of Completed Nodes in the pop-up window.
Once cloning is complete, you can view the generated triggered workflow in the Project Folder.
To add nodes to the cloned workflow, clone an existing node or drag a new inner node onto the canvas.

Triggered workflow version management

You can use the version management feature to revert a triggered workflow to a specified historical version. The feature also lets you view and compare versions so that you can analyze differences and make adjustments.

In the Project Folder on the left, double-click the triggered workflow to open the workflow canvas.
Click Version on the right side of the workflow canvas. On the Version page, you can view and manage Development Records and Publishing Records.
- View a version:
  1. On the Development Records or Publishing Records tab, find the desired triggered workflow version.
  2. In the Operation column, click View. On the details page, you can view the Code and Schedule Configuration of the triggered workflow.
    Note
    You can view the Schedule Configuration information in Code Editor or visualization mode, and switch between them in the upper-right corner of the Schedule Configuration tab.
- Compare versions:
  You can compare versions of a triggered workflow on the Development Records or Publishing Records tab. The following example demonstrates this operation on the Development Records tab.
  - Compare versions in the development or publishing environment: On the Development Records tab, select two versions and click the Select And Compare button at the top. You can then compare the code and schedule configurations between the two triggered workflow versions.
  - Compare versions between the development environment and the publishing or build environment:
    1. On the Development Records tab, locate a specific version of the triggered workflow.
    2. Click the Compare button in the Operation column. In the Please Select Content To Compare window, select a version from Publishing Records or Build Records to compare.
- Revert to a previous version:
  You can revert a triggered workflow only to a historical version listed in Development Records. On the Development Records tab, find the target version and click the Revert button in the Operation column to revert the workflow to that version.
  Note
  When you revert a workflow, the system restores the target version and creates a new version record.

More operations

After a triggered workflow is published, you can perform O&M operations in the Operation Center. For more information, see One-time Task O&M.