When you need to run multiple Spark jobs in a specific order — for example, a data ingestion job followed by a transformation job and then a reporting job — use workflows to define the dependency chain and automate execution. This topic explains how to create, run, and monitor workflows in EMR Serverless Spark.
Key concepts:
Workflow: A pipeline of jobs linked by dependency relationships and run according to a schedule or on demand.
Node: A single job in the workflow. Nodes are connected by upstream/downstream relationships to define execution order.
Workflow run: A single execution of the workflow. Each run is recorded and viewable in the Workflow Runs tab.
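To make the dependency idea concrete, the following is a minimal sketch of how upstream/downstream relationships determine execution order. It is not the service's scheduler; it only illustrates the concept with Python's standard `graphlib` module. The node names (`ingest`, `transform`, `report`) are hypothetical and mirror the example in the introduction.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a node, each value is the set of its
# upstream nodes (the nodes that must complete before it runs).
upstream = {
    "ingest": set(),
    "transform": {"ingest"},
    "report": {"transform"},
}


def execution_order(upstream_map):
    """Return one valid order in which every node follows its upstream nodes."""
    return list(TopologicalSorter(upstream_map).static_order())


order = execution_order(upstream)
print(order)  # ['ingest', 'transform', 'report']
```

A node with no upstream entry corresponds to leaving Upstream Node blank for the first node in the workflow.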
Prerequisites
Before you begin, ensure that you have:
A workspace. See Manage workspaces.
Jobs that are developed and published.
Create a workflow
Go to the Workflows page.
Log on to the E-MapReduce (EMR) console.
In the left-side navigation pane, choose EMR Serverless > Spark.
On the Spark page, click the name of the workspace you want to use.
In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Workflows.
On the Workflows tab, click Create Workflow.
In the Create Workflow panel, configure the parameters and click Next.
| Parameter | Description |
|---|---|
| Name | The workflow name. Must be unique within a workspace. |
| Resource Queue | The default resource queue for the workflow. Node-level resource queues override this setting. |
| Other Settings > Scheduling Type | How the workflow runs in the production environment. Valid values: None (Manual) (manually triggered, default) and Scheduler (automatic, by minute, hour, or day). See the scheduling types table below. |
| Retries After Failure | The number of times to retry a failed node. Default: no retry. Node-level retry settings override this value. |
| Failure Notification | The email address to notify when the workflow fails. |
| Tags | Key-value pairs to identify the workflow. |
Scheduling types:
| Scheduling type | Behavior | Additional required parameters |
|---|---|---|
| None (Manual) (default) | Trigger runs manually. | None |
| Scheduler | Run automatically by minute, hour, or day. | Scheduling Time and Scheduling Started At |
When Scheduling Type is Scheduler, configure:
Scheduling Time: The run frequency. Days runs once per day at a fixed time. Hours runs every N hours within a daily window. Minutes runs every N minutes within a daily window.
Scheduling Started At: The date and time when scheduled runs begin. Defaults to the current time.
Important: After the workflow is created, turn on the Scheduling Status switch for the workflow on the Workflows tab. Without this, the workflow does not run at the scheduled time.
Add nodes to the workflow. Nodes represent jobs in the pipeline. Connect them through upstream/downstream relationships to define execution order.
On the canvas, click Add Node in the lower part of the canvas.
In the Add Node panel, configure the parameters.
| Parameter | Description |
|---|---|
| Source File Path | The path of the job to run at this node. The job must be published. |
| Node Type | Inferred automatically from the job at the specified path. |
| Node Name | Auto-filled from Source File Path. Customize as needed. |
| Upstream Node | The node that must complete before this node runs. Must be a node in the current workflow. Leave blank for the first node. |
| Number of Retries | Defaults to the workflow-level retry count. No retry by default. |
| Timeout (Seconds) | The maximum run time for a single node run. Default: no limit. |
| Subscription | The email address to notify when the node reaches a specified state. |
| Tags | The node tags. Each node includes workflow_name and task_name tags by default. |
| Resource Queue | The resource queue for this node. Defaults to the workflow resource queue. Once set at the node level, this setting persists even if you later change the workflow-level resource queue. |
Note: For SQL jobs, configure additional parameters in the Task Configuration section. Default values match the job-level configuration. See Manage default configurations.
Click Save. Repeat to add more nodes.
Publish the workflow.
In the upper-right corner, click Publish Workflow.
In the Publish dialog box, enter remarks and click OK.
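The override rules for Resource Queue and retries can be sketched as follows. This is an illustrative model only, not the console's implementation: the class names, field names, and queue names are hypothetical, and it assumes that a node with no node-level value follows the current workflow-level value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowDefaults:
    """Hypothetical model of workflow-level settings."""
    resource_queue: str
    retries: int = 0  # "no retry" is the documented default


@dataclass
class NodeSettings:
    """Hypothetical model of node-level settings; None means 'not set'."""
    name: str
    resource_queue: Optional[str] = None
    retries: Optional[int] = None


def effective_settings(node, workflow):
    """Resolve (queue, retries): a node-level value wins when it is set."""
    queue = node.resource_queue if node.resource_queue is not None else workflow.resource_queue
    retries = node.retries if node.retries is not None else workflow.retries
    return queue, retries


workflow = WorkflowDefaults(resource_queue="prod_queue", retries=1)
inherits = NodeSettings(name="ingest")
overrides = NodeSettings(name="transform", resource_queue="etl_queue", retries=3)

print(effective_settings(inherits, workflow))   # ('prod_queue', 1)
print(effective_settings(overrides, workflow))  # ('etl_queue', 3)

# Changing the workflow-level queue later does not affect a node-level queue.
workflow.resource_queue = "new_queue"
print(effective_settings(overrides, workflow))  # ('etl_queue', 3)
```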
Run a workflow
Each workflow run produces a run record. View run history on the Workflow Runs tab of the workflow details page.
Debug a workflow
Debug the latest version of a workflow before running it in production.
In the Actions column, click Edit for the workflow. On the page that appears, click Debug next to the workflow name.

In the Debug dialog box, select a development environment resource queue and click Run.
Run on a schedule
When Scheduling Type is set to Scheduler and the Scheduling Status switch is on, the workflow runs automatically at the configured time.
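As a rough illustration of the Hours and Minutes options ("every N hours or minutes within a daily window"), the sketch below lists the run times such a schedule could produce on one day. It assumes the first run falls at the start of the window and that the end of the window is inclusive; the service's actual boundary behavior is not stated here, and the function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def runs_in_daily_window(day, window_start, window_end, every):
    """List run times on `day`, repeating every `every` from window_start.

    Assumption: the window end is inclusive.
    """
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    current = datetime.combine(day, window_start)
    end = datetime.combine(day, window_end)
    runs = []
    while current <= end:
        runs.append(current)
        current += every
    return runs


# Example: every 4 hours between 08:00 and 20:00.
runs = runs_in_daily_window(date(2024, 1, 1), time(8, 0), time(20, 0), timedelta(hours=4))
print([r.strftime("%H:%M") for r in runs])  # ['08:00', '12:00', '16:00', '20:00']
```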

Run manually
On the Workflows tab, click the workflow name.
In the upper-right corner, click Run.
In the Run dialog box, set the Scheduling Method and click OK.
Scheduling Method values:
| Value | When to use | Behavior |
|---|---|---|
| Manually Run (default) | Run the workflow now, regardless of schedule. | Starts immediately. |
| Backfill | Reprocess data for a historical time range — for example, when a scheduled run was missed or a job was fixed and needs to rerun over past data. | Generates runs for each scheduling interval within the specified range. |
When you select Backfill, configure the following parameters:
| Parameter | Description |
|---|---|
| Cycle | The historical time range. A run is generated for each scheduling interval that falls within this range. The range can be earlier than the current time. Time variables such as ${ds} are automatically replaced with the corresponding cycle time. |
| Resource Queue | Defaults to the workflow's configured resource queue. Select a different production queue if needed. |
| Remarks | A description to help you manage and troubleshoot the run. |
| More > Failure Notification | The email address to notify if backfilling fails. |
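The backfill behavior (one run per scheduling interval in the Cycle range, with time variables replaced by the cycle time) can be sketched as below. This is a conceptual model, not the service's code. It assumes an inclusive range and assumes that ${ds} renders as a yyyy-mm-dd date; both are assumptions, so check the variable reference for the actual format. The function names and the SQL text are hypothetical.

```python
from datetime import datetime, timedelta


def backfill_cycles(start, end, interval):
    """Return the cycle time of each scheduling interval in [start, end].

    Assumption: both ends of the range are inclusive.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    cycles = []
    current = start
    while current <= end:
        cycles.append(current)
        current += interval
    return cycles


def render(template, cycle):
    """Replace ${ds} with the cycle date (assumed format: yyyy-mm-dd)."""
    return template.replace("${ds}", cycle.strftime("%Y-%m-%d"))


# Example: a daily workflow backfilled over three days.
cycles = backfill_cycles(datetime(2024, 3, 1), datetime(2024, 3, 3), timedelta(days=1))
for cycle in cycles:
    print(render("SELECT * FROM logs WHERE dt = '${ds}'", cycle))
```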
Check workflow run status
The Workflow Runs Status column shows the status of each workflow run. The Workflow Node Runs Status column shows the status of individual nodes within a run. For details about run records and node-level run logs, see Manage workflow runs and workflow node runs.

Workflow run status
| Color | Status |
|---|---|
| Blue | Running |
| Green | Succeeded |
| Red | Failed |
| Purple | Pending |
Workflow node status
| Color | Status |
|---|---|
| Blue | Running |
| Green | Succeeded |
| Red | Failed |
| Yellow | Retrying |
| Purple | Pending |
What's next
For workflow concepts and terminology, see Terms.
To view and manage workflow run records and node run details, see Manage workflow runs and workflow node runs.