This topic describes how to configure a Hadoop MapReduce job.

Prerequisites

A project is created. For more information, see Manage projects.

Procedure

  1. Go to the Data Platform tab.
    1. Log on to the Alibaba Cloud EMR console by using your Alibaba Cloud account.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Data Platform tab.
  2. In the Projects section, find your project and click Edit Job in the Actions column.
  3. Create a Hadoop MapReduce job.
    1. In the Edit Job pane on the left, right-click the folder in which you want to create the job and select Create Job.
    2. In the Create Job dialog box, specify Name and Description, and select MR from the Job Type drop-down list.
      The MR option creates a Hadoop MapReduce job. A Hadoop MapReduce job is submitted with the following command syntax:
      hadoop jar xxx.jar [MainClass] -D xxx ...
    3. Click OK.
  4. Edit job content.
    1. Specify the command line parameters required to submit the job in the Content field.
      Omit the leading hadoop jar and start with the first parameter that follows it: enter the path of the JAR package that is used to run the job, and then specify [MainClass] and any other command line parameters.
      For example, suppose that you want to submit a Hadoop sleep job. This job does not read or write data. It only submits mapper and reducer tasks to the Hadoop cluster, and each task sleeps for a period of time during its execution. In Hadoop 2.6.0, this job is packaged in hadoop-mapreduce-client-jobclient-2.6.0-tests.jar. On the command line, you would run the following command to submit the job:
      hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100
      To configure this job in EMR, enter the same command without hadoop jar in the Content field:
      /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100
      Note: To reference a file that is stored in OSS, click + Enter an OSS path in the lower part of the page. In the OSS File dialog box, set File Prefix to OSSREF and specify File Path. The system automatically completes the OSS path of the file that is used by the Hadoop MapReduce job.
    2. Click Save.
      The sleep job in the preceding example does not involve data input or output. To configure a job that reads data and produces processing results, such as a wordcount job, you must also specify the data input and output paths.
      In EMR, a job can read data from and write data to HDFS or OSS. To use OSS, set the input and output paths to OSS paths. The following sample content runs randomtextwriter, which writes generated text to an OSS path:
      ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=320000 oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input
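
The relationship between a full hadoop jar command and the value of the Content field can be sketched in a few lines of shell. This is only an illustration under stated assumptions: to_emr_content is a hypothetical helper and is not part of EMR or Hadoop, and the Output directory in the wordcount example is an assumed placeholder that does not appear in this topic. The JAR path and the Input path are taken from the sample above.

```shell
#!/bin/sh
# Hypothetical helper (not an EMR or Hadoop command): print the value to
# enter in the Content field by dropping a leading "hadoop jar " from a
# full command line. Any other input is printed unchanged.
to_emr_content() {
  case "$1" in
    "hadoop jar "*) printf '%s\n' "${1#hadoop jar }" ;;
    *)              printf '%s\n' "$1" ;;
  esac
}

# The sleep job from the procedure above.
to_emr_content "hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100"

# A wordcount job that reads from and writes to OSS.
# JAR and INPUT come from the sample in this topic; OUTPUT is an assumed
# placeholder and must point to a directory that does not exist yet.
JAR="ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar"
INPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input"
OUTPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Output"
printf '%s wordcount %s %s\n' "$JAR" "$INPUT" "$OUTPUT"
```

The first printed line matches the sleep job content shown in the procedure. The second line is one possible Content value for a wordcount job that takes the randomtextwriter output as its input.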