A workflow automates data processing: you drag and drop task nodes onto a canvas to form a visual DAG, then define their dependencies and scheduling to build reliable data pipelines.
Key concepts
A workflow is a core orchestration unit in DataWorks. It organizes task nodes, such as SQL, Shell, Python, data synchronization, and Check nodes, into a directed acyclic graph (DAG) with clear dependencies so that they can be scheduled and run as a unit. Multiple workflows can also be combined to support complex business scenarios.
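To make the DAG idea concrete, the following Python sketch models a hypothetical "Daily Active User Analysis" workflow as a mapping from each node to its upstream dependencies, then uses the standard library's graphlib to derive the order in which the nodes could run. The node names are hypothetical, and the sketch only illustrates DAG semantics; it is not how DataWorks stores or schedules workflows.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical "Daily Active User Analysis" workflow. Each key is a node and
# each value is the set of upstream nodes it depends on. Names are illustrative.
DAU_WORKFLOW = {
    "sync_user_logs": set(),                                    # data synchronization node
    "sync_user_profile": set(),                                 # data synchronization node
    "clean_logs_sql": {"sync_user_logs"},                       # SQL node
    "dau_report_sql": {"clean_logs_sql", "sync_user_profile"},  # SQL node
    "notify_shell": {"dau_report_sql"},                         # Shell node
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; a node appears only after all of its upstream nodes.

    Nodes in the same wave do not depend on each other, so they could run in
    parallel. Raises CycleError if the dependencies do not form a DAG.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # detects cycles before anything is scheduled
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable output order
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking downstream nodes
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DAU_WORKFLOW), start=1):
        print(f"wave {number}: {wave}")
    try:
        execution_waves({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("rejected: the dependencies contain a cycle")
```

For the sample graph, the two synchronization nodes form the first wave, followed by clean_logs_sql, dau_report_sql, and notify_shell in turn. The cyclic graph is rejected, which is why a workflow's dependencies must be acyclic.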
By integrating separate tasks into a structured process, workflows shift focus from individual task management to managing an entire data pipeline. Core benefits:
-
Abstract and visualize development processes
Encapsulate dependent nodes, such as SQL and Shell tasks, into a business-oriented workflow, for example one for "Daily Active User Analysis". The resulting DAG makes the technical lineage clear, helps non-technical stakeholders understand the data flow, and aligns business goals with the technical implementation.
-
Atomic management for development and O&M
A workflow is the smallest unit for changes and operations. You submit and deploy it as a whole, and run O&M tasks such as testing, rerunning, and data backfill at the workflow level. This prevents production issues caused by partial modifications and ensures end-to-end consistency and stability.
-
Define boundaries for team collaboration
In a multi-team environment, a workflow clarifies ownership and responsibilities. For example, the trading team can own the trading data workflow, and the product team can own the product data workflow. This enables permission isolation and issue tracking. Standardized outputs also support efficient, decoupled collaboration between upstream and downstream teams.
Workflow type comparison
DataWorks recommends two workflow types:
-
Scheduled workflow: Runs automatically on a fixed schedule, such as hourly, daily, or weekly. Scheduling rules trigger the workflow, and the scheduled time controls when each node runs. Suitable for recurring data processing.
-
Event-triggered workflow: Runs on demand when an external signal triggers it, such as a manual operation, an OpenAPI call, or an event message, rather than on a fixed schedule. Suitable for real-time processing or responding to external events.
| Feature | Scheduled workflow | Event-triggered workflow | Manual workflow (not recommended) |
|---|---|---|---|
| Scheduling method | Triggered by scheduled time and dependencies | Manual, event, or API triggering | Manual execution |
| Use case | Daily/hourly/weekly/monthly batch processing | Real-time processing, on-demand execution, external integration | Ad hoc tasks (legacy compatibility) |
| Parameter priority | Node > Workflow > Workspace | Node > Workflow > Workspace | Workflow > Node |
| Typical example | Daily T+1 reporting at midnight | Automatic processing when an OSS file arrives | One-time data fix |
-
An event-triggered workflow that has no bound trigger can also be run manually, which lets it gradually replace manual workflows.
-
Manual workflows are mainly used for compatibility with the legacy data development mode. We do not recommend using them for new projects.
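The "Parameter priority" row in the comparison table determines which value wins when the same parameter name is defined at more than one level. The following Python sketch expresses that lookup order with collections.ChainMap, which returns the value from the first mapping that contains a key. The scope names, parameter names, and values are hypothetical, and the sketch models only the precedence rule, not how DataWorks stores or substitutes parameters.

```python
from collections import ChainMap

# Lookup order per workflow type, highest priority first, taken from the
# "Parameter priority" row of the comparison table. The table lists no
# workspace scope for manual workflows, so this sketch omits it there.
PRIORITY = {
    "scheduled": ("node", "workflow", "workspace"),
    "event_triggered": ("node", "workflow", "workspace"),
    "manual": ("workflow", "node"),
}


def resolve_parameters(workflow_type: str, scopes: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return the effective parameters; on a name clash, the higher-priority scope wins.

    `scopes` maps a scope name ("node", "workflow", "workspace") to that
    scope's parameters. Scopes missing from `scopes` are treated as empty.
    """
    order = PRIORITY[workflow_type]
    # ChainMap searches its mappings left to right and returns the first match.
    return dict(ChainMap(*(scopes.get(name, {}) for name in order)))


if __name__ == "__main__":
    # Hypothetical parameter names and values.
    scopes = {
        "workspace": {"env": "prod", "retry": "1"},
        "workflow": {"retry": "3", "bizdate": "20240101"},
        "node": {"retry": "5"},
    }
    print(resolve_parameters("scheduled", scopes))  # the node's retry value wins
    print(resolve_parameters("manual", scopes))     # the workflow's retry value wins
```

With the sample values, a scheduled workflow resolves retry to the node-level "5", while a manual workflow resolves it to the workflow-level "3", matching the reversed order shown in the table.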
Quick selection guide
Answer the following three questions to quickly determine the right workflow type:
FAQ
How do I obtain the Spec template of a workflow?
Open an existing workflow on the Data Studio page, and click Show Spec in the upper-right corner of the canvas to view and copy the workflow Spec (in JSON format). You can use this Spec as a template to create or update workflows through OpenAPI.
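If you reuse a copied Spec as a template, one simple approach is to replace the values that differ between workflows with placeholders and fill them in before sending the Spec through OpenAPI. The Python sketch below assumes a {{NAME}} placeholder convention, which is purely this example's own convention and not a DataWorks feature. It makes no assumptions about the Spec's field names: it escapes each value for use inside a JSON string, checks that no placeholder is left unfilled, and parses the result to confirm that the rendered Spec is still valid JSON.

```python
import json


def render_spec(template_text: str, values: dict[str, str]) -> dict:
    """Fill {{NAME}} placeholders in a copied Spec and return the parsed JSON.

    The {{NAME}} syntax is this sketch's own convention, not part of the
    DataWorks Spec format. Each placeholder must sit inside a JSON string
    literal in the template, for example "...": "{{WORKFLOW_NAME}}".
    """
    rendered = template_text
    for key, value in values.items():
        # json.dumps(value)[1:-1] escapes quotes, backslashes, and control
        # characters so the value can be embedded in an existing JSON string.
        rendered = rendered.replace("{{" + key + "}}", json.dumps(value)[1:-1])
    if "{{" in rendered:
        raise ValueError("the Spec template still contains an unfilled placeholder")
    # json.loads raises json.JSONDecodeError if the rendered text is not valid JSON.
    return json.loads(rendered)


if __name__ == "__main__":
    # Toy template: a real Spec is larger and has its own field structure.
    template = '{"title": "{{WORKFLOW_NAME}}", "note": "{{NOTE}}"}'
    spec = render_spec(template, {"WORKFLOW_NAME": "dau_analysis_v2", "NOTE": 'copy of "DAU"'})
    print(json.dumps(spec, indent=2))
```

Parsing the rendered text catches escaping mistakes locally, before the Spec is submitted.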
Can a scheduled task automatically stop after it succeeds once?
No. A scheduled workflow keeps running on its configured schedule, and a successful run does not stop subsequent runs. If you want a task to run only once, use one of the following methods:
-
Freeze the task manually: After the task succeeds, freeze it. A frozen task no longer participates in scheduling. For more information, see Freeze a task.
-
Use an event-triggered workflow: If your business scenario requires only a single execution, an event-triggered workflow is a better fit. It is not bound to a schedule and runs only when it is triggered manually, through an API call, or by an event, so it naturally runs once and then stops.
References
Select the relevant document based on your scenario:
-
For periodic scheduling scenarios, see Scheduled workflows.
-
For event-driven scenarios, see Event-triggered workflows.
-
For O&M monitoring, see O&M monitoring.