
Hadoop MapReduce

Last Updated: May 04, 2018

Hadoop MapReduce job configuration

  1. Log on to the Alibaba Cloud E-MapReduce console and open the Job List page.

  2. Click Create a job in the upper-right corner to open the job creation page.

  3. Enter the job name.

  4. Select the Hadoop job type to create a Hadoop MapReduce job. A job of this type is submitted to Hadoop in the background with a command of the following form:

    hadoop jar xxx.jar [MainClass] -Dxxx ....
  5. In the Parameters box, enter the command line parameters required to submit this job. The content must start with the first parameter after hadoop jar: enter the path of the jar package that runs the job first, followed by MainClass and any other command line parameters.

    Suppose you want to submit a Hadoop sleep job. This job does not read or write any data. It submits mapper and reducer tasks to the cluster and succeeds once each task has slept for a while. In Hadoop (for example, hadoop-2.6.0), this job is packaged in hadoop-mapreduce-client-jobclient-2.6.0-tests.jar of the Hadoop release. To submit this job from the command line, run:

    hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100

    To configure this job in E-MapReduce, enter the following in the Parameters box on the configuration page:

    /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100

    Note that the jar path here is an absolute path on the E-MapReduce host machine, which can cause problems. Users may store jar packages under any path, and because clusters are created and released, jar packages stored on a cluster are released along with it and become unavailable. Therefore, follow these steps:

    1. Upload your jar packages to an OSS bucket for storage. When configuring Hadoop parameters, click Select OSS path to select the jar package to run from the OSS directory. The system completes the OSS address of the jar package automatically. Switch the prefix of the jar path to ossref (click Switch resource type) to guarantee that E-MapReduce downloads the jar package properly.

    2. Click OK. The OSS path of the package is entered in the Parameters box automatically. When the job is submitted, the system locates the corresponding jar package automatically.

    3. Append any other command line parameters that the job needs after the OSS path of the jar package.

  6. Select the policy for failed operations.

  7. Click OK to complete job configuration.
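The rule in step 5, that the Parameters box holds everything after hadoop jar, can be sketched as a small shell helper. This is an illustration only: emr_parameters is a hypothetical function name, not part of E-MapReduce or Hadoop, and it just prints the text you would paste into the Parameters box.

```shell
#!/bin/sh
# Hypothetical helper (not an E-MapReduce or Hadoop tool): given a full
# "hadoop jar ..." command line, print the part that goes in the Parameters box.
emr_parameters() {
    # The command must begin with the two words "hadoop jar".
    if [ "$1" != "hadoop" ] || [ "$2" != "jar" ]; then
        echo "expected a command starting with 'hadoop jar'" >&2
        return 1
    fi
    # Drop "hadoop jar"; the jar path and its arguments remain.
    shift 2
    printf '%s\n' "$*"
}

# The sleep job from step 5.
emr_parameters hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100
```

The helper only strips the leading words; it does not check that the jar exists or rewrite a local path to an OSS path, which you still do in the console as described in step 5.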

In the preceding example, the sleep job has no input or output data. If a job must read input data and write output results (such as wordcount), you must specify the data input and output paths. You can read and write data on the HDFS of the E-MapReduce cluster or on OSS. To read or write data on OSS, enter the input and output paths as OSS paths. For example:

  jar ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=320000 oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input
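
A wordcount job that reads the text generated above and writes its result back to OSS might be assembled as follows. This is a sketch under assumptions: the Output directory name is invented for illustration, and the string shows only the jar path and its arguments, so any prefix the console adds when you select an OSS path is not reproduced here.

```shell
#!/bin/sh
# Jar and input locations are taken from the example above.
JAR="ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar"
INPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input"
# Assumed output directory; it does not appear in the original example.
OUTPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Output"

# wordcount takes the input path followed by the output path.
PARAMS="$JAR wordcount $INPUT $OUTPUT"
printf '%s\n' "$PARAMS"
```

Keeping the paths in variables makes it easy to point the same job at HDFS instead, by replacing the oss:// values with HDFS paths.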