You can create jobs to develop tasks in a project. E-MapReduce (EMR) supports the following types of jobs in data development: Shell, Hive, Hive SQL, Spark, Spark SQL, Spark Shell, Spark Streaming, MapReduce, Sqoop, Pig, Flink, Streaming SQL, Presto SQL, and Impala SQL.
Create a job
- Log on to the Alibaba Cloud EMR console with your Alibaba Cloud account.
- In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
- Click the Data Platform tab.
- In the Projects section of the page that appears, find the project that you want to edit and click Edit Job in the Actions column.
- In the Edit Job pane on the left, right-click the folder on which you want to perform operations and select Create Job.
- In the Create Job dialog box, specify Name and Description, and select a specific job type from the Job Type drop-down list. Note The job type cannot be changed after the job is created.
- Click OK. Note You can also right-click the folder and select Create Subfolder, Rename Folder, or Delete Folder to perform the corresponding operation.
Run a job
In the Edit Job pane, click a job. In the upper-right corner of the page, click Run.
After you run a job, view the operational log on the Records tab in the lower part of the job page.
Click Details in the Action column that corresponds to a job instance to go to the Scheduling Center tab. On this tab, you can view detailed information about the job instance.
Available operations on jobs
- Clone Job: Clone an existing job, including its configuration, within the same folder.
- Rename Job: Rename a job.
- Delete Job: Delete a job. You can delete a job only when the job is not associated with a workflow or the associated workflow is not running or being scheduled.
Job submission modes
The spark-submit process, which is the launcher in a data development module, is used to submit Spark jobs. This process typically occupies more than 600 MiB of memory. The Memory (MB) parameter in the Job Settings panel specifies the size of the memory allocated to the launcher.
- Header/Gateway Node: In this mode, the spark-submit process runs on the master node and is not monitored by YARN. Because each spark-submit process requests a large amount of memory, running many jobs in this mode consumes a large portion of the master node's resources and can make the cluster unstable.
- Worker Node: In this mode, the spark-submit process runs on a core node, occupies a YARN container, and is monitored by YARN. This mode reduces the resource usage on the master node.
- If a Spark application is launched in yarn-client mode, the driver runs in the same process as spark-submit. Therefore, if you submit the job in Header/Gateway Node (LOCAL) mode, the driver runs on the master node and is not monitored by YARN. If you submit the job in Worker Node (YARN) mode, the driver runs on a core node, occupies a YARN container, and is monitored by YARN.
- If Spark applications are launched in yarn-cluster mode, the driver runs in a separate process and occupies a YARN container. In this case, the driver and spark-submit run in different processes.
In summary, the job submission mode determines whether the spark-submit process runs on the master node or a core node, and whether it is monitored by YARN. Whether the driver and spark-submit run in the same process depends on the launch mode of the Spark application: yarn-client or yarn-cluster.
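The distinction above can be sketched with two spark-submit invocations. This is an illustration only: the main class and JAR path are placeholders, not values from this document.

```shell
# yarn-client mode: the driver runs inside this spark-submit process.
# The job submission mode therefore decides where the driver runs:
# the master node (Header/Gateway Node) or a core node (Worker Node).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  /path/to/my-app.jar

# yarn-cluster mode: the driver runs in a separate process inside a
# YARN container, so the driver and spark-submit are different processes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```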
Job annotations
When you edit the content of a job, you can set job parameters by adding annotation lines in the following format. Add one annotation per line:
!!! @<Annotation name>: <Annotation content>
The following annotations are supported:
| Annotation | Description |
| --- | --- |
| rem | Adds a comment. |
| env | Adds an environment variable. |
| var | Adds a custom variable. |
| resource | Adds a resource file. |
| sharedlibs | Adds dependency libraries. This annotation is valid only in Streaming SQL jobs. Separate multiple dependency libraries with commas (,). |
| scheduler.queue | Specifies the queue to which the job is submitted. |
| scheduler.vmem | Specifies the memory required to run the job. Unit: MiB. |
| scheduler.vcores | Specifies the number of vCores required to run the job. |
| scheduler.priority | Specifies the priority of the job. Valid values: 1 to 100. |
| scheduler.user | Specifies the user who submits the job. |
- Invalid annotations, such as unknown annotations or annotations whose content is in an invalid format, are automatically skipped.
- Job parameters specified in annotations take precedence over job parameters specified in the Job Settings panel. If a parameter is specified both in an annotation and in the Job Settings panel, the parameter setting specified in the annotation takes effect.
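For example, the content of a Shell job might begin with annotation lines like the following. The comment text, queue name, and resource values here are illustrative assumptions, not values taken from this document:

```shell
!!! @rem: nightly cleanup job
!!! @scheduler.queue: default
!!! @scheduler.vmem: 2048
!!! @scheduler.vcores: 2
!!! @scheduler.priority: 10

echo "cleanup started"
```

Because annotations take precedence, the memory, vCores, and priority set above would override the corresponding values in the Job Settings panel.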