
E-MapReduce: Manage workflows

Last Updated: Apr 19, 2025

A workflow is an ordered process that consists of a series of jobs. Jobs in a workflow depend on each other and run in a specific order. To run jobs at specific points in time, you can create a workflow and configure its nodes and scheduling policy. This topic describes how to create and run a workflow.
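The dependency rule above can be pictured as a directed acyclic graph in which each job (node) runs only after all of its upstream nodes. The following minimal Python sketch computes one valid run order with Kahn's topological sort. It is an illustration under assumptions, not EMR Serverless Spark code: the `run_order` function and the dictionary that maps each node to its upstream nodes are hypothetical, and the actual scheduler may run independent nodes in parallel.

```python
from collections import deque
from typing import Dict, List


def run_order(upstream: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every node runs after all of its upstream nodes.

    `upstream` maps each node name to the names of its upstream nodes
    (a hypothetical in-memory model, not an EMR API object).
    Raises ValueError if a dependency is unknown or forms a cycle.
    """
    # Count unfinished upstream nodes and record the downstream edges.
    pending = {node: len(parents) for node, parents in upstream.items()}
    downstream: Dict[str, List[str]] = {node: [] for node in upstream}
    for node, parents in upstream.items():
        for parent in parents:
            if parent not in upstream:
                raise ValueError(f"{node!r} depends on unknown node {parent!r}")
            downstream[parent].append(node)

    # Start with nodes that have no upstream node, such as the first node.
    ready = deque(node for node, count in pending.items() if count == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # A downstream node becomes ready once all of its upstream nodes are done.
        for child in downstream[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(upstream):
        raise ValueError("the workflow contains a dependency cycle")
    return order


# Hypothetical workflow: ingest -> clean -> (report, export)
print(run_order({
    "ingest": [],
    "clean": ["ingest"],
    "report": ["clean"],
    "export": ["clean"],
}))  # ['ingest', 'clean', 'report', 'export']
```

Because `report` and `export` both depend only on `clean`, either could run first; the sketch simply keeps them in the order in which they were declared.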

Prerequisites

  • You have created a workspace. For more information, see Manage workspaces.

  • You have developed and published jobs.

Create a workflow

  1. Go to the Workflows page.

    1. Log on to the E-MapReduce console.

    2. In the left navigation bar, select EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click Workflows in the left navigation bar.

  2. On the Workflows page, click Create Workflow.

  3. In the Create Workflow panel, enter the required information and click Next.

    Parameter

    Description

    Workflow Name

    The name of the workflow. The name must be unique in a workspace.

    Resource Queue

    The default resource queue for the workflow.

    Note

    The resource queue specified for workflow nodes can override the default resource queue.

    Other Settings

    Scheduling Type

    The mode in which the workflow is run in the production environment. Valid values:

    • None (Manual): The workflow is manually run. This is the default value.

    • Scheduler: The workflow runs based on the settings of the scheduler. The workflow can be scheduled to run by minute, hour, or day.

      If you set Scheduling Type to Scheduler, you must also configure the Scheduling Cycle and Scheduling Start Time parameters.

    Scheduling Cycle

    The scheduling cycle of the workflow. This parameter determines how often the workflow is scheduled to run in the production environment. Each scheduled run is executed as a workflow instance. This parameter is required only when Scheduling Type is set to Scheduler.

    Valid values:

    • Days: The workflow runs once a day at the specified point in time.

    • Hours: The workflow runs every N hours within the specified period of time every day.

    • Minutes: The workflow runs every N minutes within the specified period of time every day.

    Scheduling Start Time

    The date and time when the workflow is scheduled to run. The default value is the current time. This parameter is required only when Scheduling Type is set to Scheduler.

    Important

    After you create a workflow whose Scheduling Type is Scheduler, you must turn on the Scheduling Status switch for the workflow on the Workflows page. The workflow is then triggered at the specified effective time.

    Number Of Retries

    The number of retries after a workflow node fails to run. By default, no retry is performed.

    Note

    The number of retries specified for a workflow node can override the value of this parameter.

    Failure Notification

    The email address to which a notification is sent after the workflow fails to run.

    Tags

    The tags that are used to identify the workflow. You can specify the key and value of each tag.

  4. Add a node in the workflow.

    1. On the Edit Workflow page, click Add Node at the bottom.

    2. In the Add Node panel that appears, configure the node parameters.

      Parameter

      Description

      Source File Path

      The job path that corresponds to the node. The job in the path must be published.

      Node Type

      The type of the node. By default, the system infers the type of the node based on the job in the corresponding path.

      Node Name

      The name of the node. The system automatically enters a node name based on the value of Source File Path. You can also specify a name based on your business requirements.

      Upstream Node

      The upstream node of the current node. The upstream node must be a node that is created in the current workflow.

      You do not need to specify an upstream node for the first node in the workflow.

      Number Of Retries

      The number of retries after the node fails to run. By default, the Number Of Retries value of the workflow is used. If neither value is set, no retry is performed.

      Timeout (seconds)

      The timeout period for a single run of the node. By default, no limit is imposed.

      Status Subscription

      The email address to which a notification is sent when the node is in the specified state.

      Tags

      The tags of the node. By default, the workflow_name and task_name tags are provided for each node.

      Resource Queue

      The resource queue that is used to run the node. By default, the resource queue that you specify for the workflow is used. You can configure a resource queue for the node to override the resource queue that you specified for the workflow.

      Important

      After you specify a resource queue for the workflow node, the specified resource queue prevails even if you modify the resource queue configured for the workflow.

      Note

      If the node runs an SQL job, you can configure the parameters in the Task Configuration section based on your business requirements. By default, task parameters are inherited from the task template, so you can modify the task template to change the default values. For more information about the parameters, see Manage configurations.

    3. Click Save.

      After the initial node is configured, you can click Add Node at the bottom of the page to add more nodes.

  5. Deploy the workflow.

    1. Click Deploy Workflow in the upper-right corner.

    2. In the Deploy dialog box, enter deployment information and click OK.
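The workflow-level Resource Queue and Number Of Retries act as defaults that a node-level value overrides. A minimal Python sketch of that resolution follows. The `WorkflowSettings` and `NodeSettings` classes, the `effective_settings` function, and the queue names are hypothetical; they only model the behavior described in the parameter tables and do not correspond to any EMR API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WorkflowSettings:
    resource_queue: str             # default queue for every node
    retries: Optional[int] = None   # None: not set, so no retry


@dataclass
class NodeSettings:
    resource_queue: Optional[str] = None  # None: use the workflow's queue
    retries: Optional[int] = None         # None: use the workflow's retries


def effective_settings(workflow: WorkflowSettings, node: NodeSettings) -> Tuple[str, int]:
    """Return the (resource queue, retries) that a node would run with."""
    # A value configured on the node takes precedence over the workflow default.
    queue = node.resource_queue if node.resource_queue is not None else workflow.resource_queue
    if node.retries is not None:
        retries = node.retries
    elif workflow.retries is not None:
        retries = workflow.retries
    else:
        retries = 0  # neither level sets a value: no retry
    return queue, retries


workflow = WorkflowSettings(resource_queue="root_queue", retries=2)
pinned = NodeSettings(resource_queue="etl_queue")  # node with its own queue
inherited = NodeSettings(retries=0)                # node that disables retries

print(effective_settings(workflow, pinned))     # ('etl_queue', 2)
print(effective_settings(workflow, inherited))  # ('root_queue', 0)

# Changing the workflow default does not affect a node that sets its own queue.
workflow.resource_queue = "new_default_queue"
print(effective_settings(workflow, pinned))     # ('etl_queue', 2)
```

In this model, `None` means "inherit", so nodes without their own queue follow later changes to the workflow default, while a node with an explicit queue keeps it, as described in the Important note for Resource Queue.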

Run a workflow

Each time a workflow runs, a workflow instance is generated. You can view the instances on the Workflow Instance List tab of the workflow details page.

Debug run

When you edit a workflow, you can run the latest version of the workflow in debug mode.

  1. On the Edit Workflow page, click Debug Run.


  2. In the Debug Run dialog box, select a resource queue for the development environment and click Run.

System scheduling

If you select Scheduler for Scheduling Type when you create a workflow and turn on the Scheduling Status switch after the workflow is created, the workflow is triggered to run at the specified effective time.
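To make the interaction between Scheduling Cycle, Scheduling Start Time, and the Scheduling Status switch concrete, here is a minimal Python sketch that lists the times at which a workflow would be triggered. It is a hypothetical model, not the EMR scheduler: the function and parameter names are made up, `start` is treated as the Scheduling Start Time, and `begin`/`end` stand in for the daily run time (Days) or the daily period (Hours and Minutes).

```python
from datetime import datetime, time, timedelta
from typing import List


def trigger_times(start: datetime, until: datetime, cycle: str, interval: int = 1,
                  begin: time = time(0, 0), end: time = time(23, 59),
                  enabled: bool = True) -> List[datetime]:
    """List hypothetical trigger times in [start, until).

    start:   plays the role of Scheduling Start Time.
    cycle:   "Days", "Hours", or "Minutes" (Scheduling Cycle).
    begin:   daily run time for "Days"; start of the daily period otherwise.
    end:     end of the daily period for "Hours" and "Minutes".
    enabled: plays the role of the Scheduling Status switch.
    """
    if not enabled:
        return []  # switch is off: the workflow is never triggered
    if cycle not in ("Days", "Hours", "Minutes"):
        raise ValueError(f"unknown scheduling cycle: {cycle!r}")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    step = {"Hours": timedelta(hours=interval),
            "Minutes": timedelta(minutes=interval)}.get(cycle)  # None for "Days"

    times: List[datetime] = []
    day = start.date()
    while day <= until.date():
        current = datetime.combine(day, begin)
        # "Days" fires once at `begin`; the other cycles repeat every `step` until `end`.
        last = current if step is None else datetime.combine(day, end)
        while current <= last:
            if start <= current < until:
                times.append(current)
            if step is None:
                break
            current += step
        day += timedelta(days=1)
    return times


# Every 6 hours between 08:00 and 20:00 on April 19, 2025.
for t in trigger_times(datetime(2025, 4, 19), datetime(2025, 4, 20), "Hours",
                       interval=6, begin=time(8, 0), end=time(20, 0)):
    print(t)
```

The sketch restarts each day at `begin` instead of stepping continuously across midnight, which matches the "within the specified period of time every day" wording of the Hours and Minutes cycles.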


Trigger run

On the Workflows page, click the name of the target workflow, click Run in the upper-right corner, and select one of the following scheduling methods to run the workflow:

  • Manual Run (default): The workflow runs immediately after you trigger it manually, independent of its scheduling settings.

  • Backfill: Runs the workflow for a historical time period. Backfill is typically used to rerun workflow instances that were not run or that failed. If you select Backfill, configure the following parameters:

    Parameter

    Description

    Business Cycle

    The system generates corresponding workflow instances based on the time range you select.

    • You can select cycles later than the current time. In this case, the backfill workflow instances automatically start to run when the current time reaches the selected time.

    • Backfill workflow instances are generated and executed only when the workflow's scheduled time falls within the selected business cycle.

    • If time variables exist in the workflow (for example, ${ds}), the system automatically replaces these variables with the time of the selected business cycle.

    Resource Queue

    By default, the resource queue of the workflow is used. You can select another available queue of the production environment from the drop-down list.

    Remarks

    You can enter descriptive information for the backfill workflow to facilitate subsequent management and troubleshooting.

    More Settings

    Failure Notification: The email addresses to which alerts are sent when the backfill workflow fails to run.
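The two Business Cycle rules, keeping only scheduled times that fall inside the selected range and substituting time variables such as ${ds}, can be sketched in Python as follows. This is a hypothetical model, not the service's implementation: the function name, the `logs` table, and the format used for ${ds} (the scheduled run's date as YYYY-MM-DD) are assumptions.

```python
from datetime import date, datetime
from string import Template
from typing import List, Tuple


def backfill_instances(scheduled: List[datetime], cycle_start: date, cycle_end: date,
                       script: str) -> List[Tuple[datetime, str]]:
    """Return (scheduled time, rendered script) for each run inside the business cycle."""
    instances = []
    for run_at in scheduled:
        # Only scheduled times inside the selected business cycle get an instance.
        if cycle_start <= run_at.date() <= cycle_end:
            # Assumption: ${ds} renders as the run's date in YYYY-MM-DD format.
            # safe_substitute leaves any other $-placeholders untouched.
            rendered = Template(script).safe_substitute(ds=run_at.strftime("%Y-%m-%d"))
            instances.append((run_at, rendered))
    return instances


# A workflow scheduled daily at 02:00, backfilled for April 2-3, 2025.
daily = [datetime(2025, 4, day, 2, 0) for day in range(1, 6)]
for run_at, sql in backfill_instances(daily, date(2025, 4, 2), date(2025, 4, 3),
                                      "SELECT * FROM logs WHERE dt = '${ds}'"):
    print(run_at, sql)
```

Only two of the five daily runs fall inside the selected range, so only two backfill instances are produced, each with its own date substituted into the script.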

View running status

You can view the running status of all workflow instances and nodes of a workflow in the Workflow Running Status and Workflow Node Running Status columns of the target workflow.

  • Status of workflow runs

    Color

    Status

    Blue

    Running

    Green

    Succeeded

    Red

    Failed

    Purple

    Pending

  • Status of workflow nodes

    Color

    Status

    Blue

    Running

    Green

    Succeeded

    Red

    Failed

    Yellow

    Retrying

    Purple

    Pending
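The two legends differ only in Yellow (Retrying), which is listed for workflow nodes but not for workflow runs. If you tally statuses in your own tooling, a minimal Python sketch of the legend could look like the following. The `RunStatus` enum and `summarize` function are hypothetical and do not correspond to an EMR API.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class RunStatus(Enum):
    """Status colors from the tables above (hypothetical model)."""
    RUNNING = "Blue"
    SUCCEEDED = "Green"
    FAILED = "Red"
    RETRYING = "Yellow"  # listed only for workflow nodes
    PENDING = "Purple"


def summarize(colors: Iterable[str], for_nodes: bool) -> Counter:
    """Count statuses by name, rejecting colors that the given column does not use."""
    counts: Counter = Counter()
    for color in colors:
        status = RunStatus(color)  # raises ValueError for an unknown color
        if status is RunStatus.RETRYING and not for_nodes:
            raise ValueError("Yellow (Retrying) applies to workflow nodes only")
        counts[status.name] += 1
    return counts


print(summarize(["Green", "Green", "Red", "Blue"], for_nodes=False))
print(summarize(["Yellow", "Purple"], for_nodes=True))
```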

References