A workflow is a visual directed acyclic graph (DAG) that you build by dragging and connecting task nodes. This topic describes how to create, run, and manage workflows in Realtime Compute for Apache Flink, including how to use data backfill to process historical partitions.
Limitations
Workflows can only schedule deployed batch jobs.
Task orchestration is in public preview. A Service-Level Agreement (SLA) is not guaranteed during the public preview. For more information, see Realtime Compute for Apache Flink Service Level Agreement (SLA). If you have questions, submit a ticket.
Task orchestration is supported only in the China (Shanghai), China (Hangzhou), China (Beijing), China (Shenzhen), China (Zhangjiakou), and Singapore regions.
Create a workflow
Prerequisites
Before you begin, ensure that:
The batch jobs you want to schedule are deployed.
You are working in a supported region (see Limitations).
Steps
Log on to the Realtime Compute for Apache Flink console.
In the Actions column of the target workspace, click Console.
In the left navigation pane, choose Operation Center > Task Orchestration.
Click Create Workflow and configure the following parameters.
| Parameter | Description |
|---|---|
| Workflow name | Must be unique within the current project. |
| Variable configuration | Defines variables for data computation based on preset values. Each variable has a name (for example, ${date}) and a value, which can be a static date, a time format, or an expression. The following system time variables are built in: <br>- system.biz.date / ${system.biz.date}: The day before the scheduled time of a daily scheduling instance. Format: yyyyMMdd. <br>- system.biz.curdate / ${system.biz.curdate}: The scheduled date of a daily scheduling instance. Format: yyyyMMdd. <br>- system.datetime / ${system.datetime}: The scheduled time of a daily scheduling instance. Format: yyyyMMddHHmmss. <br><br>Important: Variable configuration is not required for materialized table workflows. Workflow-level configurations take precedence over job-level configurations. |
| Scheduling type | Controls how the workflow is triggered. Choose based on your use case: <br>- Manual Trigger: Run the workflow on demand by clicking Run. Use this for temporary tests or one-time processing tasks where you control the timing. <br>- Recurring Schedule: Trigger the workflow automatically on a defined schedule (by minute, hour, or day). Use this for production pipelines that must run at regular intervals. <br><br>Important: Workflows that include materialized table nodes must use Recurring Schedule. |
| Scheduling cycle | Required for Recurring Schedule only. Use cron expressions to define the schedule. Examples: <br>- `0 0 */4 ? * *`: every 4 hours <br>- `0 0 2 ? * *`: daily at 02:00 <br>- `0 0 5,17 ? * MON-FRI`: at 05:00 and 17:00, Monday through Friday <br><br>For cron expression syntax, see Rules for writing cron expressions. |
| Scheduling start time | Required for Recurring Schedule only. Set this to a future time. If the start time is in the past, the workflow may miss its first scheduling window, which results in a dry run or a failure. <br><br>Important: After creating a recurring workflow, enable its Scheduling State to activate it. |
| Failure retries | The number of times to retry a failed node. By default, failed nodes are not retried. Node-level retry settings override this workflow-level setting. |
| Failure notification | The email address that receives alerts when a workflow node fails. To receive alerts via DingTalk or SMS, configure notifications through Cloud Monitor. For details, see Configure monitoring and alerting. |
| Resource queue | The default deployment target for all nodes in the workflow. For details, see Manage resource queues. <br><br>Note: This setting does not change the deployment target of the corresponding deployed batch jobs. Node-level resource queue settings override this workflow-level setting. |
| Tags | Optional tag name and value for organizing workflows. |

Click Create. The workflow node editor opens.
Configure the initial node. The editor includes an initial node by default. Click the node, configure the parameters in the Edit Node panel, and click Save.
After you create a materialized table node, a dialog box prompts you to build descendant nodes using data lineage. Descendant nodes must be partitioned tables created in Ververica Runtime (VVR) 11.0 or later with stream refresh mode and a freshness of less than 30 minutes.
Job
| Parameter | Description |
|---|---|
| Job | Select a deployed batch job from the current project. Fuzzy search is supported. |
| Node name | The display name of the node within the workflow. |
| Upstream nodes | Other nodes in the workflow that this node depends on. The initial node has no upstream dependencies. |
| Failure retries | The number of retries for this node. If set, this overrides the workflow-level retry count. |
| Status subscription | Notification settings for this node. Subscriptions are supported for Start and Fail statuses. |
| Timeout | Maximum allowed run time for the node. If exceeded, the node is marked as failed. |
| Resource queue | Deployment target for this node. If not set, defaults to the workflow-level resource queue. <br><br>Note: This does not change the deployment target of the corresponding deployed batch job. |
| Tags | Optional tag name and value for this node. |

Materialized table
| Parameter | Description |
|---|---|
| Materialized table | Select a partitioned table created in Ververica Runtime (VVR) 11.0 or later with stream refresh mode. |
| Node name | The display name of the node within the workflow. |
| Time partition | The partition field and format (for example, yyyyMMdd) for the materialized table. |
| Resource configuration | Resource allocation for scheduled backfills. Enable Auto-infer to let the system determine the appropriate concurrency automatically. |
| Upstream nodes | Other nodes in the workflow that this node depends on. The initial node has no upstream dependencies. After you create a materialized table node, descendant nodes are inferred automatically from data lineage. |
| Failure retries | The number of retries for this node. If set, this overrides the workflow-level retry count. |
| Status subscription | Notification settings for this node. Subscriptions are supported for Start and Fail statuses. |
| Timeout | Maximum allowed run time for the node. If exceeded, the node is marked as failed. |
| Resource queue | Deployment target for this node. If not set, defaults to the workflow-level resource queue. <br><br>Note: This does not change the deployment target of the corresponding materialized table. |
| Tags | Optional tag name and value for this node. |

(Optional) Click Add Node to add more nodes.
In the upper-right corner, click Save, then click OK in the confirmation dialog box.
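To make the built-in system time variables from the Variable configuration parameter concrete, the following Python sketch derives their values from a scheduled time, using only the definitions given in this topic. The function names system_time_variables and render are hypothetical helpers for illustration and are not part of the product; the service performs its own substitution.

```python
from datetime import datetime, timedelta


def system_time_variables(scheduled: datetime) -> dict:
    """Derive the built-in variables for one daily scheduling instance.

    Hypothetical helper: it only mirrors the definitions in this topic.
    """
    return {
        # Day before the scheduled time, formatted as yyyyMMdd.
        "system.biz.date": (scheduled - timedelta(days=1)).strftime("%Y%m%d"),
        # Scheduled date, formatted as yyyyMMdd.
        "system.biz.curdate": scheduled.strftime("%Y%m%d"),
        # Scheduled time, formatted as yyyyMMddHHmmss.
        "system.datetime": scheduled.strftime("%Y%m%d%H%M%S"),
    }


def render(template: str, variables: dict) -> str:
    """Replace ${name} placeholders in a string (illustrative only)."""
    for name, value in variables.items():
        template = template.replace("${" + name + "}", value)
    return template


# An instance scheduled for 02:00 on March 1, 2024.
variables = system_time_variables(datetime(2024, 3, 1, 2, 0, 0))
print(variables["system.biz.date"])     # 20240229
print(variables["system.biz.curdate"])  # 20240301
print(variables["system.datetime"])     # 20240301020000
print(render("ds = '${system.biz.date}'", variables))  # ds = '20240229'
```

Because system.biz.date is the day before the scheduled time, a run scheduled for March 1 resolves it to the last day of February.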
Run a workflow
Each workflow run generates a workflow instance, visible on the Workflow Instance List and Details tab of the workflow details page.
Manual trigger
In the Actions column of the target workflow, click Run. In the dialog box, select Manual Execution and click OK. Each click runs the workflow once. Use this method for temporary tests or immediate processing.
Recurring schedule
To start a recurring workflow at its scheduled time, enable its Scheduling State. The workflow then runs automatically at the configured intervals.
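As a rough way to reason about when a scheduling cycle fires, the following Python sketch checks whether a given time matches a cron expression. It assumes the six fields are second, minute, hour, day of month, month, and day of week, which is inferred from the example expressions in this topic rather than stated by it. It handles only the syntax those examples use (`*`, `?`, `*/n`, comma lists, ranges, and weekday names); the functions field_matches and cron_matches are hypothetical and are not the scheduler's implementation. See Rules for writing cron expressions for the authoritative syntax.

```python
from datetime import datetime

# Weekday names mapped to Python's datetime.weekday() numbering (Monday = 0).
DAY_NAMES = {"MON": 0, "TUE": 1, "WED": 2, "THU": 3, "FRI": 4, "SAT": 5, "SUN": 6}


def _value(token, names):
    """Convert a token such as '17' or 'FRI' to an integer."""
    return names[token] if token in names else int(token)


def field_matches(field, value, names=None):
    """Check one cron field against one value (subset of the syntax)."""
    names = names or {}
    if field in ("*", "?"):
        return True
    for part in field.split(","):
        if part.startswith("*/"):
            # Step counted from 0, which suits seconds, minutes, and hours.
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (_value(p, names) for p in part.split("-"))
            if low <= value <= high:
                return True
        elif _value(part, names) == value:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if the moment satisfies every field of the expression."""
    second, minute, hour, day, month, weekday = expression.split()
    return (
        field_matches(second, moment.second)
        and field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        # Only weekday names are handled here; numeric weekdays vary by dialect.
        and field_matches(weekday, moment.weekday(), DAY_NAMES)
    )


# March 1, 2024 is a Friday; March 2, 2024 is a Saturday.
print(cron_matches("0 0 */4 ? * *", datetime(2024, 3, 1, 8, 0, 0)))          # True
print(cron_matches("0 0 2 ? * *", datetime(2024, 3, 1, 2, 0, 0)))            # True
print(cron_matches("0 0 5,17 ? * MON-FRI", datetime(2024, 3, 2, 17, 0, 0)))  # False
```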
To backfill historical data or reprocess a specific partition for a past period, use the data backfill feature described below.
Data backfill
Data backfill supplements or updates data for a past time period. Use it when you need to retransmit historical data from upstream, correct dimension tables, or add new data interfaces.
Perform a data backfill
Log on to the Realtime Compute for Apache Flink console.
In the Actions column of the target workspace, click Console.
In the left navigation pane, choose Operation Center > Task Orchestration.
In the Actions column of the target workflow, click Run.
In the Run dialog box, select Data Backfill as the scheduling method and configure the following parameters.
| Parameter | Description |
|---|---|
| Time interval | The time period to backfill. The interval is passed to the workflow's time variables to refresh data in the corresponding partitions. |
| Resource queue | The queue where the backfill task runs. Defaults to default-queue. |
Click OK.
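To estimate which partitions a backfill touches, the following Python sketch lists the daily partition values covered by a time interval. It assumes daily partitions in yyyyMMdd format and an interval that includes both endpoints; this topic does not state how the service slices the interval, so treat those choices as assumptions. The function backfill_partitions is a hypothetical helper, not a product API.

```python
from datetime import date, timedelta


def backfill_partitions(start, end, fmt="%Y%m%d"):
    """List daily partition values from start to end, inclusive.

    Assumes one partition per day; adjust for other partition schemes.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime(fmt) for i in range(days + 1)]


print(backfill_partitions(date(2024, 2, 27), date(2024, 3, 1)))
# ['20240227', '20240228', '20240229', '20240301']
```

Listing the partitions before you click OK helps confirm that the interval covers exactly the data you intend to refresh.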
Manage data backfill instances
Data backfill instances are managed the same way as regular workflow instances. Click the workflow name to go to its instance page.

On the Workflow Instance List and Details tab, view the backfill instances and their details, including run times and statuses.
For more information, see Manage workflow instances and node instances.
Workflow status
The Running Status column shows an aggregated view of all instances for a workflow. For example, a workflow that runs once a day for five days generates five instances, and Running Status summarizes their combined state.
| Color | Status | Description | Action required |
|---|---|---|---|
| Purple | Queuing | The workflow is waiting to run. | None. The workflow starts automatically. |
| Blue | Running | The workflow is currently executing. | None. Wait for the run to complete. |
| Green | Success | All instances completed without errors. | None. |
| Red | Failed | One or more instances failed. | Check the instance details for error information. |
Edit a workflow
Log on to the Realtime Compute for Apache Flink console.
In the Actions column of the target workspace, click Console.
In the left navigation pane, choose Operation Center > Task Orchestration.
In the Actions column of the target workflow, click Edit Workflow. For parameter details, see Create a workflow.
Note: You cannot edit a workflow when its Scheduling State is Enabled.
What's next
To schedule Flink SQL batch tasks on DataWorks, see Flink SQL Batch node and Node scheduling configuration.
For task orchestration concepts, see Task orchestration (public preview).
To view workflow and node instance logs, see Manage workflow instances and node instances.
To isolate and manage resources with queues, see Manage resource queues.
To deploy batch jobs (SQL, JAR, and Python), see Deploy a job.