
Realtime Compute for Apache Flink: Manage workflows

Last Updated: Mar 26, 2026

A workflow is a visual directed acyclic graph (DAG) that you build by dragging and connecting task nodes. This topic describes how to create, run, and manage workflows in Realtime Compute for Apache Flink, including how to use data backfill to process historical partitions.
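In this graph, each node depends on the upstream nodes you configure for it, so the dependencies decide which run orders are valid. The following Python sketch is a hypothetical illustration, not the console's scheduler. It uses Kahn's algorithm to derive one valid sequential order from a made-up `upstream` mapping and rejects dependency cycles, which a DAG cannot contain. The node names and the `run_order` function are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List


def run_order(upstream: Dict[str, List[str]]) -> List[str]:
    """Return one valid execution order for a workflow DAG.

    `upstream` maps each node name to the nodes it depends on
    (a hypothetical stand-in for each node's "Upstream nodes" setting).
    Raises ValueError if a dependency is unknown or the graph has a cycle.
    """
    # Count unmet dependencies per node and record the reverse edges.
    pending = {node: len(deps) for node, deps in upstream.items()}
    downstream: Dict[str, List[str]] = {node: [] for node in upstream}
    for node, deps in upstream.items():
        for dep in deps:
            if dep not in upstream:
                raise ValueError(f"{node} depends on unknown node {dep}")
            downstream[dep].append(node)

    # Kahn's algorithm: start from nodes that have no upstream dependencies.
    ready = deque(node for node, count in pending.items() if count == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in downstream[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    # Any node never reached is part of a cycle.
    if len(order) != len(upstream):
        raise ValueError("dependencies contain a cycle; this is not a DAG")
    return order


if __name__ == "__main__":
    # Hypothetical workflow: two batch jobs feed one reporting job.
    nodes = {
        "load_orders": [],
        "load_users": [],
        "daily_report": ["load_orders", "load_users"],
    }
    print(run_order(nodes))  # ['load_orders', 'load_users', 'daily_report']
```

The sketch produces a single sequential order; whether the product runs independent nodes such as `load_orders` and `load_users` in parallel is not covered here.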

Limitations

  • Workflows can only schedule deployed batch jobs.

  • Task orchestration is in public preview. A Service-Level Agreement (SLA) is not guaranteed during the public preview. For more information, see Realtime Compute for Apache Flink Service Level Agreement (SLA). If you have questions, submit a ticket.

  • Task orchestration is supported only in the China (Shanghai), China (Hangzhou), China (Beijing), China (Shenzhen), China (Zhangjiakou), and Singapore regions.

Create a workflow

Prerequisites

Before you begin, ensure that:

  • The batch jobs you want to schedule are deployed

  • You are working in a supported region (see Limitations)

Steps

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, choose Operation Center > Task Orchestration.

  4. Click Create Workflow and configure the following parameters.

    Workflow name: Must be unique within the current project.
    Variable configuration: Defines variables, with preset values, for use in data computation. Each variable has a name (for example, ${date}) and a value, which can be a static date, a time format, or an expression. The following system time variables are built in:
      - system.biz.date / ${system.biz.date}: The day before the scheduled time of a daily scheduling instance. Format: yyyyMMdd.
      - system.biz.curdate / ${system.biz.curdate}: The scheduled date of a daily scheduling instance. Format: yyyyMMdd.
      - system.datetime / ${system.datetime}: The scheduled time of a daily scheduling instance. Format: yyyyMMddHHmmss.
    Important

    Variable configuration is not required for materialized table workflows. Workflow-level configurations take precedence over job-level configurations.

    Scheduling type: Controls how the workflow is triggered. Choose based on your use case:
      - Manual Trigger: Run the workflow on demand by clicking Run. Use this for temporary tests or one-time processing tasks where you control the timing.
      - Recurring Schedule: Trigger the workflow automatically on a defined schedule (by minute, hour, or day). Use this for production pipelines that must run at regular intervals.
    Important

    Workflows that include materialized table nodes must use Recurring Schedule.

    Scheduling cycle: Required for Recurring Schedule only. Define the schedule with a cron expression. Examples:
      - 0 0 */4 ? * * (every 4 hours)
      - 0 0 2 ? * * (daily at 02:00)
      - 0 0 5,17 ? * MON-FRI (at 05:00 and 17:00, Monday through Friday)
      For cron expression syntax, see Rules for writing cron expressions.
    Scheduling start time: Required for Recurring Schedule only. Set this to a future time. If the start time is in the past, the workflow may miss its first scheduling window, resulting in a dry run or a failure.
    Important

    After you create a recurring workflow, enable its Scheduling State to activate it.

    Failure retries: The number of times to retry a failed node. By default, failed nodes are not retried. Node-level retry settings override this workflow-level setting.
    Failure notification: The email address that receives alerts when a workflow node fails. To receive alerts through DingTalk or SMS, configure notifications in Cloud Monitor. For details, see Configure monitoring and alerting.
    Resource queue: The default deployment target for all nodes in the workflow. For details, see Manage resource queues.
    Note

    This setting does not change the deployment target of the corresponding deployed batch jobs. Node-level resource queue settings override this workflow-level setting.

    Tags: Optional tag name and value for organizing workflows.
  5. Click Create. The workflow node editor opens.

  6. Configure the initial node. The editor includes an initial node by default. Click the node, configure the parameters in the Edit Node panel, and click Save.

    After you create a materialized table node, a dialog box prompts you to build descendant nodes from data lineage. Descendant nodes must be partitioned tables created in Ververica Runtime (VVR) 11.0 or later, with stream refresh mode and a freshness of less than 30 minutes.

    Job

    Job: Select a deployed batch job from the current project. Fuzzy search is supported.
    Node name: The display name of the node within the workflow.
    Upstream nodes: Other nodes in the workflow that this node depends on. The initial node has no upstream dependencies.
    Failure retries: The number of retries for this node. If set, this overrides the workflow-level retry count.
    Status subscription: Notification settings for this node. Subscriptions are supported for Start and Fail statuses.
    Timeout: Maximum allowed run time for the node. If exceeded, the node is marked as failed.
    Resource queue: Deployment target for this node. If not set, defaults to the workflow-level resource queue.
    Note

    This does not change the deployment target of the corresponding deployed batch job.

    Tags: Optional tag name and value for this node.

    Materialized table

    Materialized table: Select a partitioned table created in Ververica Runtime (VVR) 11.0 or later with stream refresh mode.
    Node name: The display name of the node within the workflow.
    Time partition: The partition field and format (for example, yyyyMMdd) for the materialized table.
    Resource configuration: Resource allocation for scheduled backfills. Enable Auto-infer to let the system determine the appropriate concurrency automatically.
    Upstream nodes: Other nodes in the workflow that this node depends on. The initial node has no upstream dependencies. After you create a materialized table node, descendant nodes are inferred automatically from data lineage.
    Failure retries: The number of retries for this node. If set, this overrides the workflow-level retry count.
    Status subscription: Notification settings for this node. Subscriptions are supported for Start and Fail statuses.
    Timeout: Maximum allowed run time for the node. If exceeded, the node is marked as failed.
    Resource queue: Deployment target for this node. If not set, defaults to the workflow-level resource queue.
    Note

    This does not change the deployment target of the corresponding materialized table.

    Tags: Optional tag name and value for this node.
  7. (Optional) Click Add Node to add more nodes.

  8. In the upper-right corner, click Save, then click OK in the confirmation dialog box.
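Several workflow and node parameters share one precedence rule: a value set on a node overrides the workflow-level value, and an unset node value falls back to it. The following Python sketch models that rule for Failure retries and Resource queue. The class names, the `effective_settings` function, and the queue names are hypothetical; this is not a console or SDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowSettings:
    failure_retries: int = 0  # documented default: failed nodes are not retried
    resource_queue: Optional[str] = None


@dataclass
class NodeSettings:
    # None means "not set on the node", so the workflow-level value applies.
    failure_retries: Optional[int] = None
    resource_queue: Optional[str] = None


def effective_settings(workflow: WorkflowSettings, node: NodeSettings) -> dict:
    """Resolve a node's settings: node-level values override workflow-level values."""
    return {
        "failure_retries": (node.failure_retries
                            if node.failure_retries is not None
                            else workflow.failure_retries),
        "resource_queue": (node.resource_queue
                           if node.resource_queue is not None
                           else workflow.resource_queue),
    }


if __name__ == "__main__":
    workflow = WorkflowSettings(failure_retries=2, resource_queue="batch-queue")
    print(effective_settings(workflow, NodeSettings()))
    # {'failure_retries': 2, 'resource_queue': 'batch-queue'}
    print(effective_settings(workflow, NodeSettings(failure_retries=0, resource_queue="etl-queue")))
    # {'failure_retries': 0, 'resource_queue': 'etl-queue'}
```

The sketch checks `is not None` rather than truthiness so that an explicit node-level value of 0 retries still overrides a non-zero workflow-level value.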

Run a workflow

Each workflow run generates a workflow instance, visible on the Workflow Instance List and Details tab of the workflow details page.

Manual trigger

In the Actions column of the target workflow, click Run. In the dialog box, select Manual Execution and click OK. Each click runs the workflow once. Use this method for temporary tests or immediate processing.

Recurring schedule

To start a recurring workflow at its scheduled time, enable its Scheduling State. The workflow then runs automatically at the configured intervals.
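Before you enable Scheduling State, it can help to preview when a Scheduling cycle expression will fire. The following Python sketch expands the cron examples from the workflow parameters locally. It is an assumption-laden illustration, not the console's scheduler: it supports only a subset of Quartz-style syntax (`*`, `?`, lists, ranges, `/` steps, and day names such as `MON`), requires the seconds field to be `0`, treats `?` like `*`, and ignores an optional year field. The `parse_field` and `next_runs` names are hypothetical. For the full syntax, see Rules for writing cron expressions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set

# Quartz-style day-of-week numbering: 1 = SUN ... 7 = SAT.
DOW_NAMES = {"SUN": 1, "MON": 2, "TUE": 3, "WED": 4, "THU": 5, "FRI": 6, "SAT": 7}


def parse_field(field: str, lo: int, hi: int, names: Optional[Dict[str, int]] = None) -> Set[int]:
    """Expand one cron field (*, ?, lists, ranges, /steps) into its allowed values."""
    names = names or {}

    def value(token: str) -> int:
        # Accept either a number or a name such as MON.
        return names[token.upper()] if token.upper() in names else int(token)

    if field in ("*", "?"):
        return set(range(lo, hi + 1))  # this sketch treats "?" like "*"
    allowed: Set[int] = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base in ("*", "?"):
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = value(first), value(last)
        else:
            start = value(base)
            end = hi if step else start  # "5/10" means 5, 15, 25, ...
        allowed.update(range(start, end + 1, int(step) if step else 1))
    return allowed


def next_runs(expression: str, after: datetime, count: int) -> List[datetime]:
    """Return the next `count` fire times strictly after `after`, at minute resolution."""
    sec, minute, hour, dom, month, dow = expression.split()[:6]
    if parse_field(sec, 0, 59) != {0}:
        raise ValueError("this sketch only supports a seconds field of 0")
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    days = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)
    weekdays = parse_field(dow, 1, 7, DOW_NAMES)

    runs: List[datetime] = []
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = t + timedelta(days=366)  # give up after scanning one year
    while len(runs) < count and t < limit:
        quartz_dow = (t.weekday() + 1) % 7 + 1  # Python: Monday = 0; Quartz: SUN = 1
        if (t.minute in minutes and t.hour in hours and t.day in days
                and t.month in months and quartz_dow in weekdays):
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


if __name__ == "__main__":
    # The third Scheduling cycle example, starting from Friday, March 27, 2026 06:00.
    for run in next_runs("0 0 5,17 ? * MON-FRI", datetime(2026, 3, 27, 6, 0), 3):
        print(run.strftime("%a %Y-%m-%d %H:%M"))
    # Fri 2026-03-27 17:00
    # Mon 2026-03-30 05:00
    # Mon 2026-03-30 17:00
```

The minute-by-minute scan is deliberately simple rather than fast; it is enough to sanity-check an expression and a Scheduling start time.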

To backfill historical data or reprocess a specific partition for a past period, use the data backfill feature described below.

Data backfill

Data backfill supplements or updates data for a past time period. Use it when you need to retransmit historical data from upstream, correct dimension tables, or add new data interfaces.

Perform a data backfill

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, choose Operation Center > Task Orchestration.

  4. In the Actions column of the target workflow, click Run.

  5. In the Run dialog box, select Data Backfill as the scheduling method and configure the following parameters.

    Time interval: The time period to backfill. The interval is passed to the workflow's time variables to refresh data in the corresponding partitions.
    Resource queue: The queue where the backfill task runs. Defaults to default-queue.


  6. Click OK.
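A backfill interval reaches your jobs through the workflow's time variables. The following Python sketch shows the documented values of the built-in system time variables for daily instances and substitutes them into a query. It assumes, without confirmation from this topic, that a backfill over a date range creates one daily instance per scheduled date in the range (inclusive) at a fixed time of day. The function names, the `orders` table, and the `ds` partition column are hypothetical, and `render` is only a stand-in for the console's own variable substitution.

```python
from datetime import date, datetime, time, timedelta
from typing import Dict, List


def system_time_variables(scheduled: datetime) -> Dict[str, str]:
    """Built-in system time variables for one daily scheduling instance."""
    return {
        "system.biz.date": (scheduled - timedelta(days=1)).strftime("%Y%m%d"),  # day before
        "system.biz.curdate": scheduled.strftime("%Y%m%d"),  # scheduled date
        "system.datetime": scheduled.strftime("%Y%m%d%H%M%S"),  # scheduled time
    }


def backfill_instances(first_day: date, last_day: date, run_at: time) -> List[Dict[str, str]]:
    """Variables for one assumed daily instance per scheduled date in [first_day, last_day]."""
    if last_day < first_day:
        raise ValueError("last_day must not be earlier than first_day")
    instances = []
    day = first_day
    while day <= last_day:
        instances.append(system_time_variables(datetime.combine(day, run_at)))
        day += timedelta(days=1)
    return instances


def render(text: str, variables: Dict[str, str]) -> str:
    """Replace ${name} placeholders with values (hypothetical substitution)."""
    for name, value in variables.items():
        text = text.replace("${" + name + "}", value)
    return text


if __name__ == "__main__":
    query = "SELECT * FROM orders WHERE ds = '${system.biz.date}'"
    for variables in backfill_instances(date(2026, 3, 1), date(2026, 3, 3), time(2, 0)):
        print(render(query, variables))
    # SELECT * FROM orders WHERE ds = '20260228'
    # SELECT * FROM orders WHERE ds = '20260301'
    # SELECT * FROM orders WHERE ds = '20260302'
```

Note how the first instance crosses a month boundary: ${system.biz.date} is the day before the scheduled date, so a backfill starting on March 1 refreshes the February 28 partition first.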

Manage data backfill instances

Data backfill instances are managed the same way as regular workflow instances. Click the workflow name to go to its instance page.


On the Workflow Instance List and Details tab, view the backfill instances and their details, including run times and statuses.

For more information, see Manage workflow instances and node instances.

Workflow status

The Running Status column shows an aggregated view of all instances for a workflow. For example, a workflow that runs once a day for five days generates five instances, and Running Status summarizes their combined state.

  • Queuing (purple): The workflow is waiting to run. No action is required; the workflow starts automatically.
  • Running (blue): The workflow is currently executing. No action is required; wait for the run to complete.
  • Success (green): All instances completed without errors. No action is required.
  • Failed (red): One or more instances failed. Check the instance details for error information.
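To make the aggregation concrete, the following Python sketch summarizes several instance statuses into one Running Status value. The Success and Failed rules follow the descriptions above (every instance succeeded, or at least one failed). The relative priority of Running over Queuing is an assumption, because this topic does not state it. The `running_status` function is hypothetical.

```python
from typing import List

# Colors shown in the Running Status column.
STATUS_COLORS = {"Queuing": "Purple", "Running": "Blue", "Success": "Green", "Failed": "Red"}


def running_status(instance_statuses: List[str]) -> str:
    """Summarize instance statuses into one workflow-level status (partly assumed)."""
    if not instance_statuses:
        raise ValueError("no instances to summarize")
    unknown = set(instance_statuses) - set(STATUS_COLORS)
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    # Failed: one or more instances failed (documented).
    # Running ranked before Queuing: an assumption, not documented behavior.
    for status in ("Failed", "Running", "Queuing"):
        if status in instance_statuses:
            return status
    # Success: every instance completed without errors (documented).
    return "Success"


if __name__ == "__main__":
    # Five daily instances, the last of which failed.
    statuses = ["Success", "Success", "Success", "Success", "Failed"]
    summary = running_status(statuses)
    print(summary, STATUS_COLORS[summary])  # Failed Red
```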

Edit a workflow

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, choose Operation Center > Task Orchestration.

  4. In the Actions column of the target workflow, click Edit Workflow. For parameter details, see Create a workflow.

    Note

    You cannot edit a workflow when its Scheduling State is Enabled.

What's next