In E-MapReduce (EMR), jobs are the executable units of data development within a project. This topic covers the full job lifecycle: creating a job, configuring its settings, adding annotations, running it, and managing it.
Prerequisites
Before you begin, ensure that you have:
A project created in EMR. For more information, see Manage projects.
Create a job
To create a job, you need at minimum:
A name and job type (the job type cannot be changed after creation)
A folder within the project to place the job
Steps:
Go to the Data Platform tab.
Log on to the Alibaba Cloud EMR console.
In the top navigation bar, select the region where your cluster resides and select a resource group.
Click the Data Platform tab.
In the Projects section, find the project you want to manage and click Edit Job in the Actions column.
In the Edit Job pane on the left, right-click the target folder and select Create Job.
Right-clicking a folder also gives you the options to Create Subfolder, Rename Folder, or Delete Folder.
In the Create Job dialog box, specify Name and Description, and select a job type from the Job Type drop-down list. EMR supports the following job types: Shell, Hive, Hive SQL, Spark, Spark SQL, Spark Shell, Spark Streaming, MapReduce, Sqoop, Pig, Flink, Streaming SQL, Presto SQL, and Impala SQL.
Click OK.
After the job is created, configure and edit it as needed.
Configure a job
Open the Job Settings panel by clicking Job Settings in the upper-right corner of the job page. The panel has four tabs.
For type-specific configuration (how to develop and write scripts for each job type), see Jobs.
Basic settings
| Parameter | Description |
|---|---|
| Name | The name of the job. |
| Job Type | The type of the job. |
| Retries | The number of retries allowed if the job fails. Valid values: 0–5. |
| Actions on Failures | The action to take when the job fails. Pause suspends the current workflow. Run Next Job continues to the next job in the workflow. |
| Use Latest Job Content and Parameters | Controls which content and parameters are used when a failed job is rerun. When turned off, the rerun uses the original content and parameters. When turned on, the rerun uses the latest content and parameters. |
| Description | The description of the job. Click Edit on the right to modify it. |
| Resources | JAR packages and user-defined functions (UDFs) required to run the job. Click the icon on the right to add resources. Upload resources to Object Storage Service (OSS) before adding them here. |
| Configuration Parameters | Variables referenced in the job script as ${Variable name}. Add variables as key-value pairs by clicking the icon on the right. Select Password to hide a value. Configure time variables based on the scheduling start time — see Configure job time and date. |
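
To illustrate how `${Variable name}` references relate to the key-value pairs configured under Configuration Parameters, the following is a minimal Python sketch. It is not EMR's implementation. The helper name `render_script`, the sample script, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re
from typing import Dict

# Hypothetical helper, not part of EMR: shows how ${name} placeholders
# could be resolved from configured key-value pairs.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_script(script: str, variables: Dict[str, str]) -> str:
    """Replace each ${name} with its configured value.

    Assumption: placeholders without a matching key are left unchanged.
    """
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, script)


# Hypothetical job script and variable values.
script = "echo \"loading ${table} for ${ds}\""
rendered = render_script(script, {"table": "orders", "ds": "2024-01-01"})
print(rendered)
```

In this sketch, a time variable such as `${yyyy-MM-dd}` would pass through unchanged unless a key with that exact name were supplied. How EMR resolves time variables is described in Configure job time and date.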
Advanced settings
| Section | Parameter | Description |
|---|---|---|
| Mode | Job Submission Node | Where the job is submitted. Worker Node submits the job to YARN through a launcher, and YARN allocates resources to run it. Header/Gateway Node runs the job as a standalone process on the node it is allocated to, outside YARN monitoring. For details on the trade-offs, see Job submission modes. |
| Mode | Estimated Maximum Duration | The maximum allowed run time of the job, in seconds. Valid values: 0–10800. |
| Environment Variables | — | Environment variables passed to the job at runtime. These are equivalent to exporting variables in the job script before execution. |
| Scheduling Parameters | Queue, Memory (MB), vCores, Priority, Run By | Resource allocation parameters for the YARN scheduler. If left blank, the Hadoop cluster defaults apply. The Memory (MB) parameter sets the memory quota for the launcher, not the job itself. |
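
The settings tables give explicit ranges for two numeric parameters: Retries (0–5) and Estimated Maximum Duration (0–10800 seconds). If you keep job settings in your own tooling before entering them in the console, a pre-check along these lines may catch out-of-range values early. This is a sketch only. The `JobSettings` class and its field names are hypothetical and do not correspond to any EMR API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobSettings:
    # Hypothetical container; field names are illustrative only.
    retries: int = 0
    max_duration_seconds: int = 0


def validate(settings: JobSettings) -> List[str]:
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    # Retries: valid values are 0-5 per the Basic settings table.
    if not 0 <= settings.retries <= 5:
        errors.append(f"Retries must be 0-5, got {settings.retries}")
    # Estimated Maximum Duration: 0-10800 seconds per the Advanced settings table.
    if not 0 <= settings.max_duration_seconds <= 10800:
        errors.append(
            f"Estimated Maximum Duration must be 0-10800 s, "
            f"got {settings.max_duration_seconds}"
        )
    return errors


ok = validate(JobSettings(retries=3, max_duration_seconds=3600))
bad = validate(JobSettings(retries=6, max_duration_seconds=20000))
print(ok, bad)
```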
Shared libraries
In the Dependent Libraries section, specify the libraries that the job depends on at run time, such as the library files for its data sources. Enter each library as a reference string, for example, sharedlibs:streamingsql:datasources-bundle:2.0.0. Separate multiple libraries with commas.
The sharedlibs annotation (see Add annotations) is only valid for Streaming SQL jobs.
Alert settings
| Parameter | Description |
|---|---|
| Execution Failed | Send a notification to an alert contact group or a DingTalk alert group if the job fails. |
| Action on Startup Timeout | Send a notification if the job startup times out. |
| Job execution timed out. | Send a notification if the job execution times out. |
Add annotations
Annotations let you set job parameters directly in the job script, without using the Job Settings panel.
Annotation parameters take precedence over parameters set in the Job Settings panel. If the same parameter is configured in both places, the annotation value takes effect.
Use the following format. Start annotations at the beginning of the line — do not indent the !!!. Add one annotation per line.
!!! @<Annotation name>: <Annotation content>
The following table lists all supported annotations.
| Annotation | Description | Example |
|---|---|---|
| rem | Adds a comment. | !!! @rem: This is a comment. |
| env | Sets an environment variable. | !!! @env: ENV_1=ABC |
| var | Defines a custom variable. | !!! @var: var1="value1"<br>!!! @var: var2=${yyyy-MM-dd} |
| resource | Adds a resource file from OSS. | !!! @resource: oss://bucket1/dir1/file.jar |
| sharedlibs | Adds dependency libraries (Streaming SQL jobs only). Separate multiple libraries with commas. | !!! @sharedlibs: sharedlibs:streamingsql:datasources-bundle:1.7.0,... |
| scheduler.queue | Specifies the YARN queue to submit the job to. | !!! @scheduler.queue: default |
| scheduler.vmem | Specifies the memory required to run the job, in MiB. | !!! @scheduler.vmem: 1024 |
| scheduler.vcores | Specifies the number of vCores required. | !!! @scheduler.vcores: 1 |
| scheduler.priority | Specifies the job priority. Valid values: 1–100. | !!! @scheduler.priority: 1 |
| scheduler.user | Specifies the user who submits the job. | !!! @scheduler.user: root |
Invalid annotations — unknown annotation names or incorrectly formatted content — are automatically skipped.
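
The format rules above (no indentation before `!!!`, one annotation per line, unknown or malformed annotations skipped) can be sketched as a small Python parser. This is an illustration of the documented rules, not EMR's parser. The regular expression, the exact whitespace handling, and the function name `parse_annotations` are assumptions.

```python
import re
from typing import List, Tuple

# Annotation names taken from the table of supported annotations.
KNOWN_ANNOTATIONS = {
    "rem", "env", "var", "resource", "sharedlibs",
    "scheduler.queue", "scheduler.vmem", "scheduler.vcores",
    "scheduler.priority", "scheduler.user",
}

# Assumed pattern: "!!!" at column 0, a space, "@name", a colon, then content.
_ANNOTATION = re.compile(r"^!!! @([A-Za-z.]+):\s*(.*)$")


def parse_annotations(script: str) -> List[Tuple[str, str]]:
    """Collect (name, content) pairs, skipping anything invalid."""
    found = []
    for line in script.splitlines():
        match = _ANNOTATION.match(line)
        if match is None:
            continue  # not an annotation, or indented / malformed
        name, content = match.group(1), match.group(2).strip()
        if name not in KNOWN_ANNOTATIONS or not content:
            continue  # unknown name or empty content is skipped
        found.append((name, content))
    return found


# Hypothetical job script used only to exercise the sketch.
sample = "\n".join([
    "!!! @rem: This is a comment.",
    "!!! @env: ENV_1=ABC",
    "!!! @scheduler.queue: default",
    "   !!! @scheduler.vcores: 1",   # indented, so it is skipped
    "!!! @unknown: value",           # unknown name, so it is skipped
    "echo \"hello\"",
])
parsed = parse_annotations(sample)
print(parsed)
```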
Run a job
On the job page, click Run in the upper-right corner.
In the Run Job dialog box, select a resource group and the cluster, then click OK.
View run details in the lower panel of the job page:
Log tab: operational logs for the current run.

Records tab: execution records across all job instances.
Click Details in the Action column of a job instance to open the Scheduling Center tab, where you can inspect instance-level details.
More operations
Right-click a job in the Edit Job pane to access the following operations.
| Operation | Description |
|---|---|
| Clone Job | Copies the job's configuration and creates a new job in the same folder. |
| Rename Job | Renames the job. |
| Delete Job | Deletes the job. A job can only be deleted if it is not associated with a workflow, or if the associated workflow is neither running nor being scheduled. |
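
The deletion rule in the table can be expressed as a simple predicate. The sketch below is only a restatement of that rule in Python. The `Workflow` class and its fields are hypothetical and do not reflect how EMR models workflows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    # Hypothetical model of the workflow state relevant to deletion.
    running: bool
    scheduled: bool


def can_delete_job(workflow: Optional[Workflow]) -> bool:
    """A job is deletable if it has no workflow, or if its workflow
    is neither running nor being scheduled."""
    if workflow is None:
        return True
    return not workflow.running and not workflow.scheduled


print(can_delete_job(None))
print(can_delete_job(Workflow(running=False, scheduled=True)))
```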
Job submission modes
When you submit a Spark job, EMR uses a launcher process (spark-submit) to start it. This launcher typically consumes more than 600 MiB of memory. The Memory (MB) parameter in Job Settings sets the memory allocated to the launcher.
EMR supports two submission modes. Choose based on cluster stability requirements and master node resource usage.
| Mode | How it works | When to use |
|---|---|---|
| Worker Node | The launcher runs on a core node inside a YARN container, monitored by YARN. | Preferred for most workloads. Reduces resource pressure on the master node and keeps job processes under YARN management. |
| Header/Gateway Node | The launcher runs on the master node as a standalone process, outside YARN monitoring. | Use only if YARN-managed submission is not suitable. Running many jobs in this mode risks master node stability. |
Memory breakdown
The total memory consumed by a job instance is:
Memory consumed by a job instance = Memory consumed by the launcher + Memory consumed by a job
For Spark jobs specifically:
Memory consumed by a job = Memory consumed by the spark-submit logical module (not the process) + Memory consumed by the driver + Memory consumed by the executor
The process that hosts the driver depends on how Spark is launched on YARN.
| Spark launch mode | Where spark-submit and driver run | Notes |
|---|---|---|
| yarn-client mode (LOCAL submission) | Driver runs in the same process as spark-submit. | The process runs on the master node, outside YARN monitoring. |
| yarn-client mode (YARN submission) | Same process as above, but runs on a core node. | The process occupies a YARN container and is monitored by YARN. |
| yarn-cluster mode | Driver runs in a separate process from spark-submit. | The driver occupies a YARN container. |
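
As a worked example of the memory breakdown, the two formulas can be written as small Python functions. All figures below are hypothetical values chosen for illustration, and the sketch treats the executor figure as a single total, which is an assumption.

```python
def spark_job_memory_mib(spark_submit_mib: int, driver_mib: int, executor_mib: int) -> int:
    """Memory consumed by a Spark job:
    spark-submit logical module + driver + executor."""
    return spark_submit_mib + driver_mib + executor_mib


def job_instance_memory_mib(launcher_mib: int, job_mib: int) -> int:
    """Memory consumed by a job instance: launcher + job."""
    return launcher_mib + job_mib


# Hypothetical figures, in MiB.
job_mib = spark_job_memory_mib(spark_submit_mib=256, driver_mib=1024, executor_mib=4096)
total_mib = job_instance_memory_mib(launcher_mib=640, job_mib=job_mib)
print(job_mib, total_mib)  # 5376 6016
```

With these example values, the launcher accounts for roughly a tenth of the instance total, which is why the Memory (MB) setting for the launcher is worth sizing separately from the job's own Spark memory settings.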
What's next
Configure job time and date: Set up time variables for scheduled jobs.
Manage projects: Associate clusters, add members, and configure global variables for your project.