In a data development project of E-MapReduce (EMR), you can define a group of dependent
jobs, and then create a workflow to allow the jobs to run in sequence based on their
dependencies. An EMR workflow can be represented as a directed acyclic graph (DAG)
that allows big data jobs to run in parallel. You can schedule workflows or view the
running status of workflows in the EMR console.
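To make the DAG idea concrete, the following minimal sketch groups jobs into "waves" in which every job's dependencies are already finished, so jobs in the same wave could run in parallel. It uses Python's standard `graphlib` module (Python 3.9 or later), not an EMR API. The function name `execution_waves` and the job names are hypothetical and serve only as illustration.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group jobs into waves; jobs in one wave have no unfinished dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Hypothetical job names; each key maps to the set of jobs it depends on.
example = {
    "extract_a": set(),
    "extract_b": set(),
    "load_hive": {"extract_a", "extract_b"},
    "report": {"load_hive"},
}
print(execution_waves(example))
# [['extract_a', 'extract_b'], ['load_hive'], ['report']]
```

In this example, the two extract jobs do not depend on each other and land in the same wave, while `load_hive` and `report` must wait for their upstream jobs.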
Prerequisites
- You have logged on to the Alibaba Cloud EMR console by using your Alibaba Cloud account.
Note: Only an Alibaba Cloud account can be used to create projects, add project members, and associate clusters with projects. The Create Project button and the Users and Cluster Settings pages are visible only when you log on to the EMR console by using your Alibaba Cloud account.
- A project is created. For more information, see Manage projects.
- Jobs are edited. For more information, see Edit jobs.
Create a workflow
Perform the following steps to create a workflow:
- Click the Data Platform tab.
- In the Projects section of the page that appears, find your project and click Workflows in the Actions column.
- In the Workflows pane on the left of the page that appears, right-click the folder in which you want to create the workflow and select Create Workflow.
- In the Create Workflow dialog box, specify Workflow Name, Description, Select Resource Group, and Target Cluster. For Target Cluster, select one of the following options:
  - Select Existing Cluster: When the workflow is executed, the jobs run on the cluster that you selected.
  - Create Cluster from Template: When the workflow is executed, the jobs run on a temporary cluster that is created from the cluster template that you selected. The cluster is automatically released when the workflow ends. For more information, see Manage cluster templates.
  Note: Only the clusters that are associated with the project are displayed in the Select Existing Cluster drop-down list. To select a different cluster, disassociate the existing clusters from the project first. For more information, see Manage projects.
- Click OK.
Edit a workflow
Perform the following steps to edit a workflow:
- Drag job nodes of the required types onto the workflow canvas. Associate each job node with a job of the same type.
- Drag a line from the bottom center of a job node to another job node to connect the two nodes based on the dependencies between their jobs. Arrows indicate the direction in which the workflow runs.
- After the job nodes are connected, drag the END widget from the Controller Node section to the canvas and connect the job nodes at the end of the workflow to the END widget. This completes the design of the workflow.
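The last step above can be checked mechanically. The following is a minimal sketch, not EMR's own validation, that lists the job nodes that have no outgoing connection and therefore still need to be connected to the END widget. The function name `nodes_missing_end`, the edge representation, and the node names are assumptions made for illustration.

```python
def nodes_missing_end(nodes, edges, end="END"):
    """Return the job nodes that have no outgoing connection.

    nodes: iterable of node names, including the END widget.
    edges: iterable of (source, target) pairs, one per arrow on the canvas.
    """
    has_outgoing = {source for source, _ in edges}
    return sorted(n for n in nodes if n != end and n not in has_outgoing)


# Hypothetical design: job_a fans out to job_b and job_c, but only job_b reaches END.
nodes = ["job_a", "job_b", "job_c", "END"]
edges = [("job_a", "job_b"), ("job_a", "job_c"), ("job_b", "END")]
print(nodes_missing_end(nodes, edges))  # ['job_c']
```

An empty result means that every job node at the end of the workflow is connected to END.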

When you edit a workflow, you can click Lock in the upper-right corner to lock the workflow. After the workflow is locked, only you can edit or run it. Other members of the project can edit the workflow only after it is unlocked.
Note: Only the RAM user who locked the workflow and the Alibaba Cloud account can unlock it.
Configure workflow scheduling
You can enable workflow scheduling and configure the related parameters. The workflow then runs periodically based on these settings, and its jobs are delivered to the specified cluster. Perform the following steps to configure the parameters on the Basic Attributes, Scheduling Settings, and Alert Settings tabs in the Workflow Scheduling pane:
- Click the Data Platform tab.
- In the Projects section of the page that appears, find your project and click Workflows in the Actions column.
- On the workflow design page, click Configure.
- On the Basic Attributes tab of the Workflow Scheduling pane, modify the workflow description, resource group,
and the cluster used to run the jobs in the workflow as needed.
- After the basic attributes are modified, click the Scheduling Settings tab to configure the parameters related to workflow scheduling.
| Parameter | Description |
| --- | --- |
| Scheduling Status | Start or stop workflow scheduling. After you select Start for Scheduling Status, Scheduling appears in the upper-right corner of the workflow editing canvas, which indicates that the workflow is being scheduled. |
| Time-based Scheduling | Specify Start Time and Recurrence for scheduling the workflow. During the specified period, the workflow runs based on the recurrence settings. |
| Dependency-based Scheduling | Select the workflow that the current workflow depends on. The current workflow runs only after that workflow ends. Specify Project, and then select the workflow from the Dependent Workflow drop-down list. |
- Click the Alert Settings tab to configure alert parameters.
| Parameter | Description |
| --- | --- |
| Execution Failed | Specifies whether to send a notification to a specific alert contact group or DingTalk alert group if the workflow fails. |
| Actions on Failures | Specifies whether to send a notification to a specific alert contact group or DingTalk alert group if a job node in the workflow fails to run. |
| Executed | Specifies whether to send a notification to a specific alert contact group or DingTalk alert group if the workflow succeeds. |
| Action on Startup Timeout | Specifies whether to send a notification to a specific alert contact group or DingTalk alert group if a job node in the workflow does not start within 30 minutes after it is delivered to a cluster. |
| Node execution timed out | Specifies whether to send a notification to a specific alert contact group or DingTalk alert group if the running time of a job node exceeds the expected maximum running duration in the job configuration. |
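The following minimal sketch illustrates two of the settings above: how a start time and a recurrence translate into trigger times, and when the two timeout alerts would apply to a job node. It is plain Python, not EMR code. It assumes that the recurrence is a fixed interval and that the scheduling period has an end time. The function names, the `end` parameter, and the `max_runtime` parameter are hypothetical. Only the 30-minute startup limit comes from the table.

```python
from datetime import datetime, timedelta

# 30 minutes is the startup limit given for Action on Startup Timeout.
STARTUP_TIMEOUT = timedelta(minutes=30)


def trigger_times(start, interval, end):
    """List trigger times from start to end (both inclusive), one per interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += interval
    return times


def node_timeout_alerts(delivered_at, started_at, now, max_runtime=None):
    """Return the timeout alerts that would apply to one job node at time `now`."""
    alerts = []
    if started_at is None:
        # The node was delivered to a cluster but has not started yet.
        if now - delivered_at > STARTUP_TIMEOUT:
            alerts.append("Action on Startup Timeout")
    elif max_runtime is not None and now - started_at > max_runtime:
        # The node has run longer than the limit in the job configuration.
        alerts.append("Node execution timed out")
    return alerts


start = datetime(2024, 1, 1, 2, 0)
for t in trigger_times(start, timedelta(days=1), start + timedelta(days=2)):
    print(t)  # one trigger per day at 02:00
print(node_timeout_alerts(start, None, start + timedelta(minutes=45)))
# ['Action on Startup Timeout']
```

Whether a notification is actually sent still depends on the corresponding alert setting being turned on.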
Run a workflow
You can specify the business time of a workflow. Time variables in the jobs of the workflow are calculated from the specified business time. The business time is also used to rerun workflow instances for a specific period of time. You can rerun one or more workflow instances at a time. If no time variables are configured for your jobs, you can select Execute.
- Click the Data Platform tab.
- In the Projects section of the page that appears, find your project and click Workflows in the Actions column.
- On the page that appears, select a specific workflow and click Run in the upper-right corner.
- Configure runtime parameters.
  - Execute: Run the workflow immediately. The time that you specify is used as the business time of the workflow, and time-related variables are calculated based on it.
  - Run Periodically: Specify Start Time and Recurrence. If you turn on Skip Successful Nodes, the system skips the workflow instances that already succeeded at their business times and runs only the instances that failed at other business times.
    To run multiple workflow instances at a time, you must specify the start time and recurrence. The trigger time of each scheduling rule is used as the business time of the corresponding workflow instance, and time-related variables are calculated based on that business time. A single run supports a maximum of 100 points in time.
- Click OK.
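The Run Periodically behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's implementation: the function `plan_runs` and its parameters are hypothetical, business times are modeled as `datetime` values, and exceeding the limit is modeled as an error. Only the 100-point limit and the skip behavior come from the text.

```python
from datetime import datetime, timedelta

MAX_POINTS = 100  # maximum number of points in time per run, as stated above


def plan_runs(business_times, succeeded, skip_successful=True):
    """Return the business times for which a workflow instance would run."""
    if len(business_times) > MAX_POINTS:
        raise ValueError(f"at most {MAX_POINTS} points in time are supported per run")
    if not skip_successful:
        return list(business_times)
    # Skip instances that already succeeded at their business time.
    return [t for t in business_times if t not in succeeded]


first = datetime(2024, 3, 1)
days = [first + timedelta(days=i) for i in range(5)]  # five daily business times
already_ok = {days[0], days[3]}  # instances that succeeded earlier
for t in plan_runs(days, already_ok):
    print(t.date())
# 2024-03-02
# 2024-03-03
# 2024-03-05
```

With `skip_successful=False`, all five business times would be run again.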
View the running details of workflows
After you run a workflow, you can perform the following steps to view the running
details of the workflow:
- Click the Records tab in the lower part of the workflow page to view the running status of workflow instances.
- Find your workflow instance and click Details in the Actions column to go to the scheduling center. There, you can view the details of the workflow instance and pause, resume, stop, or rerun it. For more information, see Scheduling center.
  - View Details: View the details and running status of the workflow instance.
  - Stop Workflow: Stops all running job nodes.
  - Suspend current workflow: Running job nodes continue to run, but subsequent job nodes in the workflow do not start.
  - Recovery Workflow: Resumes a suspended workflow instance.
  - Rerun Workflow: Reruns the workflow instance. You can choose whether to rerun only the failed job nodes or all job nodes from the START node.
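The difference between these operations can be sketched as a small state model. The state names (`PENDING`, `RUNNING`, `HELD`, `STOPPED`, `FAILED`, `SUCCESS`), the function names, and the node names are hypothetical labels chosen for this illustration and are not taken from the EMR console. The sketch models only what the descriptions above state. For example, it leaves nodes that have not started unchanged when a workflow is stopped, because the text does not say what happens to them.

```python
def apply_action(node_states, action):
    """Return new node states after a control action on a workflow instance."""
    if action not in {"stop", "suspend", "resume"}:
        raise ValueError(f"unknown action: {action}")
    updated = {}
    for node, state in node_states.items():
        if action == "stop" and state == "RUNNING":
            updated[node] = "STOPPED"  # running job nodes stop
        elif action == "suspend" and state == "PENDING":
            updated[node] = "HELD"  # later nodes do not start; running ones continue
        elif action == "resume" and state == "HELD":
            updated[node] = "PENDING"  # held nodes become eligible to start again
        else:
            updated[node] = state
    return updated


def nodes_to_rerun(node_states, failed_only):
    """Pick nodes for a rerun: only the failed nodes, or every node from START."""
    if failed_only:
        return [n for n, s in node_states.items() if s == "FAILED"]
    return list(node_states)


states = {"extract": "SUCCESS", "transform": "RUNNING", "load": "PENDING"}
print(apply_action(states, "suspend"))
# {'extract': 'SUCCESS', 'transform': 'RUNNING', 'load': 'HELD'}
```

Suspending keeps `transform` running and only holds back `load`, whereas stopping would change `transform` to `STOPPED`.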
Operations that can be performed on workflows
In the Workflows pane, you can right-click a specific workflow and perform the following operations:
- Clone Workflow: Clone a workflow with the same design to the same folder.
Note: The scheduling settings of the original workflow are not cloned.
- Rename Workflow: Rename a workflow.
- Delete Workflow: Delete a workflow. A running workflow cannot be deleted.