E-MapReduce: Edit a job

Last Updated: Mar 26, 2026

In E-MapReduce (EMR), jobs are the executable units of data development within a project. This topic covers the full job lifecycle: creating a job, configuring its settings, adding annotations, running it, and managing it.

Prerequisites

Before you begin, ensure that you have:

Create a job

To create a job, you need at minimum:

  • A name and job type (the job type cannot be changed after creation)

  • A folder within the project in which to place the job

Steps:

  1. Go to the Data Platform tab.

    1. Log on to the Alibaba Cloud EMR console.

    2. In the top navigation bar, select the region where your cluster resides and select a resource group.

    3. Click the Data Platform tab.

  2. In the Projects section, find the project you want to manage and click Edit Job in the Actions column.

  3. In the Edit Job pane on the left, right-click the target folder and select Create Job.

    Right-clicking a folder also gives you the options to Create Subfolder, Rename Folder, or Delete Folder.
  4. In the Create Job dialog box, specify Name and Description, and select a job type from the Job Type drop-down list. EMR supports the following job types: Shell, Hive, Hive SQL, Spark, Spark SQL, Spark Shell, Spark Streaming, MapReduce, Sqoop, Pig, Flink, Streaming SQL, Presto SQL, and Impala SQL.

  5. Click OK.

After the job is created, configure and edit it as needed.

Configure a job

Open the Job Settings panel by clicking Job Settings in the upper-right corner of the job page. The panel has four tabs.

For type-specific configuration (how to develop and write scripts for each job type), see Jobs.

Basic settings

Parameter | Description
Name | The name of the job.
Job Type | The type of the job.
Retries | The number of retries allowed if the job fails. Valid values: 0–5.
Actions on Failures | The action to take when the job fails. Pause suspends the current workflow. Run Next Job continues to the next job in the workflow.
Use Latest Job Content and Parameters | Controls which content and parameters are used when a failed job is rerun. When turned off, the rerun uses the original content and parameters. When turned on, the rerun uses the latest content and parameters.
Description | The description of the job. Click Edit on the right to modify it.
Resources | JAR packages and user-defined functions (UDFs) required to run the job. Click the icon on the right to add resources. Upload resources to Object Storage Service (OSS) before adding them here.
Configuration Parameters | Variables referenced in the job script as ${Variable name}. Add variables as key-value pairs by clicking the icon on the right. Select Password to hide a value. Configure time variables based on the scheduling start time; see Configure job time and date.
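
The following Python sketch illustrates the idea behind Configuration Parameters: each ${Variable name} placeholder in the job script is replaced with the value of the matching key. It is not EMR's implementation. The function name render_script, the sample script, and the choice to leave unmatched placeholders (such as time expressions) untouched are assumptions made for illustration.

```python
import re

# Matches ${name}; the name may contain any character except a closing brace.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_script(script: str, variables: dict[str, str]) -> str:
    """Replace each ${name} in the script with its configured value.

    Assumption: placeholders without a matching key (for example a time
    expression such as ${yyyy-MM-dd}) are left as-is in this sketch.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, script)


# Hypothetical script and key-value pair.
script = "hadoop fs -ls ${input_dir}/${yyyy-MM-dd}"
print(render_script(script, {"input_dir": "oss://bucket1/dir1"}))
# prints: hadoop fs -ls oss://bucket1/dir1/${yyyy-MM-dd}
```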

Advanced settings

Section | Parameter | Description
Mode | Job Submission Node | Where the job is submitted. Worker Node submits the job to YARN via a launcher; YARN allocates resources to run it. Header/Gateway Node runs the job as a process on the allocated node. For details on the trade-offs, see Job submission modes.
Mode | Estimated Maximum Duration | The maximum allowed run time of the job, in seconds. Valid values: 0–10800.
Environment Variables | - | Environment variables passed to the job at runtime. These are equivalent to exporting variables in the job script before execution.
Scheduling Parameters | Queue, Memory (MB), vCores, Priority, Run By | Resource allocation parameters for the YARN scheduler. If left blank, the Hadoop cluster defaults apply. The Memory (MB) parameter sets the memory quota for the launcher, not the job itself.
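
As a rough illustration of the valid ranges listed in the settings tables (Retries: 0–5, Estimated Maximum Duration: 0–10800 seconds), the Python sketch below checks a settings dictionary before it is entered in the console. The dictionary keys and the function validate_settings are hypothetical names, not part of any EMR API.

```python
# Ranges come from the settings tables; the key names are hypothetical.
VALID_RANGES = {
    "retries": (0, 5),
    "estimated_max_duration_s": (0, 10800),
}


def validate_settings(settings: dict[str, int]) -> list[str]:
    """Return one message per value that falls outside its valid range."""
    errors = []
    for key, (low, high) in VALID_RANGES.items():
        if key not in settings:
            continue  # unset values are not checked in this sketch
        value = settings[key]
        if not low <= value <= high:
            errors.append(f"{key}={value} is outside {low}-{high}")
    return errors


print(validate_settings({"retries": 3, "estimated_max_duration_s": 3600}))
print(validate_settings({"retries": 6}))
```

The first call returns an empty list; the second reports that the retry count exceeds the upper bound.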

Shared libraries

In the Dependent Libraries section, specify the dependency libraries required by the job. Job execution depends on library files related to data sources. Enter each library as a reference string — for example, sharedlibs:streamingsql:datasources-bundle:2.0.0. Separate multiple libraries with commas.

The sharedlibs annotation (see Add annotations) is only valid for Streaming SQL jobs.
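
The reference-string format can be pictured with a small Python sketch that splits a comma-separated list into its colon-separated parts. The field names (prefix, namespace, name, version) are guesses based on the example string only; the source does not name these fields.

```python
from typing import NamedTuple


class LibraryRef(NamedTuple):
    # Field names are assumptions inferred from the example string.
    prefix: str
    namespace: str
    name: str
    version: str


def parse_library_refs(value: str) -> list[LibraryRef]:
    """Split a comma-separated list of library reference strings."""
    refs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        parts = item.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated fields, got {item!r}")
        refs.append(LibraryRef(*parts))
    return refs


refs = parse_library_refs("sharedlibs:streamingsql:datasources-bundle:2.0.0")
print(refs[0].name, refs[0].version)
```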

Alert settings

Parameter | Description
Execution Failed | Send a notification to an alert contact group or a DingTalk alert group if the job fails.
Action on Startup Timeout | Send a notification if the job startup times out.
Job execution timed out. | Send a notification if the job execution times out.

Add annotations

Annotations let you set job parameters directly in the job script, without using the Job Settings panel.

Important

Annotation parameters take precedence over parameters set in the Job Settings panel. If the same parameter is configured in both places, the annotation value takes effect.
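
The precedence rule can be sketched in Python as a dictionary merge in which annotation values are applied last. The keys and values below are hypothetical, and this is only an illustration of the rule, not how EMR resolves settings internally.

```python
def effective_settings(panel: dict[str, str], annotations: dict[str, str]) -> dict[str, str]:
    """Merge settings so that annotation values win over panel values."""
    # In a dict display, later mappings override earlier ones.
    return {**panel, **annotations}


panel = {"queue": "default", "vcores": "1"}   # hypothetical Job Settings values
annotations = {"queue": "etl"}                # hypothetical annotation value
print(effective_settings(panel, annotations))
# prints: {'queue': 'etl', 'vcores': '1'}
```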

Use the following format. Start annotations at the beginning of the line — do not indent the !!!. Add one annotation per line.

!!! @<Annotation name>: <Annotation content>

The following table lists all supported annotations.

Annotation | Description | Example
rem | Adds a comment. | !!! @rem: This is a comment.
env | Sets an environment variable. | !!! @env: ENV_1=ABC
var | Defines a custom variable. | !!! @var: var1="value1"
 | | !!! @var: var2=${yyyy-MM-dd}
resource | Adds a resource file from OSS. | !!! @resource: oss://bucket1/dir1/file.jar
sharedlibs | Adds dependency libraries (Streaming SQL jobs only). Separate multiple libraries with commas. | !!! @sharedlibs: sharedlibs:streamingsql:datasources-bundle:1.7.0,...
scheduler.queue | Specifies the YARN queue to submit the job to. | !!! @scheduler.queue: default
scheduler.vmem | Specifies the memory required to run the job, in MiB. | !!! @scheduler.vmem: 1024
scheduler.vcores | Specifies the number of vCores required. | !!! @scheduler.vcores: 1
scheduler.priority | Specifies the job priority. Valid values: 1–100. | !!! @scheduler.priority: 1
scheduler.user | Specifies the user who submits the job. | !!! @scheduler.user: root

Invalid annotations — unknown annotation names or incorrectly formatted content — are automatically skipped.
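
To make the format rules concrete, here is a minimal Python sketch of how annotation lines might be recognized: the !!! must start the line, one annotation per line, and unknown names are skipped. The regular expression, the treatment of indented lines as ordinary script text, and the function name parse_annotations are assumptions for illustration; EMR's actual parser may differ.

```python
import re

# Annotation names from the table of supported annotations.
KNOWN_ANNOTATIONS = {
    "rem", "env", "var", "resource", "sharedlibs",
    "scheduler.queue", "scheduler.vmem", "scheduler.vcores",
    "scheduler.priority", "scheduler.user",
}

# "!!!" must be the first characters of the line, followed by "@name: content".
_ANNOTATION = re.compile(r"^!!!\s+@([\w.]+):\s*(.*)$")


def parse_annotations(script: str) -> list[tuple[str, str]]:
    """Return (name, content) pairs for every valid annotation line."""
    found = []
    for line in script.splitlines():
        match = _ANNOTATION.match(line)
        if match is None:
            continue  # not an annotation line (or indented)
        name, content = match.group(1), match.group(2).strip()
        if name not in KNOWN_ANNOTATIONS or not content:
            continue  # invalid annotations are skipped
        found.append((name, content))
    return found


# Hypothetical Shell job script.
script = """!!! @rem: This is a comment.
!!! @env: ENV_1=ABC
  !!! @scheduler.queue: indented, so not recognized
!!! @unknown: skipped
!!! @scheduler.vcores: 1
echo hello
"""
print(parse_annotations(script))
```

In this sketch, only the rem, env, and scheduler.vcores lines are returned; the indented line and the unknown annotation are ignored.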

Run a job

  1. On the job page, click Run in the upper-right corner.

  2. In the Run Job dialog box, select a resource group and the cluster, then click OK.

  3. View run details in the lower panel of the job page:

    • Log tab: operational logs for the current run.

    • Records tab: execution records across all job instances.

    • Click Details in the Action column of a job instance to open the Scheduling Center tab, where you can inspect instance-level details.

More operations

Right-click a job in the Edit Job pane to access the following operations.

Operation | Description
Clone Job | Copies the job's configuration and creates a new job in the same folder.
Rename Job | Renames the job.
Delete Job | Deletes the job. A job can only be deleted if it is not associated with a workflow, or if the associated workflow is neither running nor being scheduled.

Job submission modes

When you submit a Spark job, EMR uses a launcher process (spark-submit) to start it. This launcher typically consumes more than 600 MiB of memory. The Memory (MB) parameter in Job Settings sets the memory allocated to the launcher.

EMR supports two submission modes. Choose based on cluster stability requirements and master node resource usage.

Mode | How it works | When to use
Worker Node | The launcher runs on a core node inside a YARN container, monitored by YARN. | Preferred for most workloads. Reduces resource pressure on the master node and keeps job processes under YARN management.
Header/Gateway Node | The launcher runs on the master node as a standalone process, outside YARN monitoring. | Use only if YARN-managed submission is not suitable. Running many jobs in this mode risks master node stability.

Memory breakdown

The total memory consumed by a job instance is:

Memory consumed by a job instance = Memory consumed by the launcher + Memory consumed by a job

For Spark jobs specifically:

Memory consumed by a job = Memory consumed by the spark-submit logical module (not the process) + Memory consumed by the driver + Memory consumed by the executor
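
The two formulas above can be worked through with a short Python sketch. All numbers are hypothetical and in MiB, and the executor term is treated here as the total across all executors, which is an assumption.

```python
def job_instance_memory_mib(launcher: int, spark_submit_module: int,
                            driver: int, executors_total: int) -> int:
    """Apply the two formulas above to a Spark job (all values in MiB)."""
    # Memory consumed by a job = spark-submit module + driver + executor
    job = spark_submit_module + driver + executors_total
    # Memory consumed by a job instance = launcher + job
    return launcher + job


# Hypothetical figures: a 640 MiB launcher, a 256 MiB spark-submit module,
# a 1024 MiB driver, and two executors of 2048 MiB each.
total = job_instance_memory_mib(640, 256, 1024, 2 * 2048)
print(total)
# prints: 6016
```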

The process that hosts the driver depends on how Spark is launched on YARN.

Spark launch mode | Where spark-submit and driver run | Notes
yarn-client mode (LOCAL submission) | Driver runs in the same process as spark-submit. | The process runs on the master node, outside YARN monitoring.
yarn-client mode (YARN submission) | Same process as above, but runs on a core node. | The process occupies a YARN container and is monitored by YARN.
yarn-cluster mode | Driver runs in a separate process from spark-submit. | The driver occupies a YARN container.

What's next