Hadoop MapReduce

Last Updated: Dec 28, 2017

  1. Log on to the Alibaba Cloud E-MapReduce console and open the Job List page.

  2. Click Create a job in the upper-right corner of the page to open the job creation page.

  3. Enter the job name.

  4. Select the Hadoop job type to create a Hadoop MapReduce job. A job of this type is a Hadoop job submitted in the background with a command of the following form:

    hadoop jar xxx.jar [MainClass] -Dxxx ....
  5. Fill in Parameters with the command-line parameters required to submit the job. The content of this box must start with the first parameter after “hadoop jar”: enter the path of the jar package that runs the job first, followed by [MainClass] and any other command-line parameters you want to pass.

    For example, suppose you want to submit a Hadoop sleep job. This job reads and writes no data; it succeeds simply by submitting mapper and reducer tasks to the cluster and having each task sleep for a while. In Hadoop (for example, hadoop-2.6.0), this job is packaged in hadoop-mapreduce-client-jobclient-2.6.0-tests.jar of the Hadoop release. Submitted from the command line, the command is:

    hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100

    To configure this job in E-MapReduce, enter the following in the Parameters box on the configuration page:

    /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100

    Note that the jar path here is an absolute path on the E-MapReduce host machine, which causes a problem: users may store jar packages under any path, and because the packages live on the cluster, they are released along with the cluster and become unavailable. To avoid this, follow these steps:

    1. Upload your jar packages to an OSS bucket for storage. When configuring the Hadoop parameters, click Select OSS path and select the jar package to run from the OSS directory. The system fills in the OSS address of the jar package automatically. Switch the prefix of the jar path to “ossref” (click Switch resource type) so that E-MapReduce can download the jar package correctly.

    2. Click OK. The OSS path of the package is filled in the Parameters box automatically. When the job is submitted, the system locates the jar package through this path.

    3. Append any other command-line parameters for the job after the OSS path of the jar package.

  6. Select the policy for failed operations.

  7. Click OK to finish defining the job configuration.
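
The rule in step 5 (drop the leading “hadoop jar”, and use the “ossref” prefix for a jar stored on OSS) can be sketched in shell as follows. The helper names `to_emr_params` and `to_ossref` are hypothetical and exist only for this illustration; they are not part of E-MapReduce or Hadoop. The console's Switch resource type button performs the prefix switch for you.

```shell
# Hypothetical helper: turn a full "hadoop jar ..." command line into
# the text expected in the Parameters box (everything after "hadoop jar ").
to_emr_params() {
  printf '%s\n' "${1#hadoop jar }"
}

# Hypothetical helper: switch an oss:// jar path to the ossref:// prefix.
# Paths that do not start with oss:// are returned unchanged.
to_ossref() {
  case "$1" in
    oss://*) printf 'ossref://%s\n' "${1#oss://}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# The sleep job command from step 5.
CMD="hadoop jar /path/to/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar sleep -m 3 -r 3 -mt 100 -rt 100"
PARAMS=$(to_emr_params "$CMD")
echo "$PARAMS"
```

The printed line is the same text shown for the Parameters box in step 5.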

In the example above, the sleep job has no data input or output. If a job reads data and writes processing results (for example, wordcount), you must specify the data input and output paths. You can read and write data on the HDFS of the E-MapReduce cluster as well as on OSS. To read or write data on OSS, specify OSS paths as the input and output paths. For example:

```shell
jar ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=320000 oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input
```
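
As a further illustration of OSS input and output paths, a wordcount job could take the directory written by the randomtextwriter command above as its input. The sketch below only assembles the text for the Parameters box. The `Output` directory is a hypothetical path chosen for this illustration and does not appear in the original example; `wordcount` is the example program of that name in hadoop-mapreduce-examples.

```shell
# Jar and input path are taken from the randomtextwriter example.
JAR="ossref://emr/checklist/jars/chengtao/hadoop/hadoop-mapreduce-examples-2.6.0.jar"
INPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Input"

# Assumption: hypothetical output directory, not from the original document.
OUTPUT="oss://emr/checklist/data/chengtao/hadoop/Wordcount/Output"

# wordcount takes the input path first and the output path last.
PARAMS="$JAR wordcount $INPUT $OUTPUT"
echo "$PARAMS"
```

Hadoop normally refuses to start a job whose output directory already exists, so choose an output path that has not been used yet.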
