Configure a Hadoop MapReduce job in the EMR Data Platform to submit MapReduce workloads to your cluster.
Prerequisites
Before you begin, ensure that you have:
A project created in E-MapReduce (EMR). See Manage projects.
Step 1: Open the job editor
Log on to the Alibaba Cloud EMR console.
In the top navigation bar, select the region where your cluster resides and select a resource group.
Click the Data Platform tab.
In the Projects section, find your project and click Edit Job in the Actions column.
Step 2: Create a MapReduce job
In the Edit Job pane on the left, right-click the folder where you want to create the job and select Create Job.
In the Create Job dialog box, fill in the following fields:
Field          Description
Name           Enter a name for the job.
Description    (Optional) Enter a description.
Job Type       Select MR to create a Hadoop MapReduce job.

Click OK.
Step 3: Configure the job content
In the Content field, enter the command-line parameters for your job. Start from the argument that follows hadoop jar; do not include hadoop jar itself.
The full command format is:
hadoop jar <jar-file-path> [MainClass] -D <key>=<value> ...
In the Content field, enter everything after hadoop jar:
<jar-file-path> [MainClass] -D <key>=<value> ...
Example: Sleep job (no data input or output)
The sleep job submits mapper and reducer tasks that sleep for a specified period, without reading or writing data. In Hadoop 2.6.0, it is packaged in hadoop-mapreduce-client-jobclient-2.6.0-tests.jar.
Full submission command:
hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100
Content field entry (omit hadoop jar):
/path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100
The parameters -m 3 -r 3 -mt 100 -rt 100 configure 3 mappers, 3 reducers, and a 100 ms sleep time for each task.
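If you already have a full submission command that works on the command line, the Content field entry can be derived mechanically by removing the leading hadoop jar. The following is a minimal sketch of that step in POSIX shell. The function name to_content_field is hypothetical and is not part of EMR or Hadoop; it only manipulates the string and does not submit anything.

```shell
#!/bin/sh
# Hypothetical helper (not an EMR or Hadoop command): print the text to
# paste into the Content field, given a full "hadoop jar ..." command.
to_content_field() {
  # Remove a leading "hadoop jar " if present; everything else is unchanged.
  printf '%s\n' "${1#hadoop jar }"
}

# The sleep job command from this section, with its placeholder path kept as-is.
full_cmd='hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100'
to_content_field "$full_cmd"
```

A string that does not start with hadoop jar passes through unchanged, so running the helper on a value that is already in Content field form is harmless.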
Example: Job with OSS input and output paths
For jobs that read or write data, specify the input and output paths. EMR supports both Hadoop Distributed File System (HDFS) and OSS paths. To use OSS, specify OSS URIs for the paths, as in the following example:
jar ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=320000 oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input
Step 4: Save the job
Click Save.
What's next
Schedule the job in a workflow to automate execution.
Associate your project with a cluster to run the job.