MapReduce

Last Updated: Jun 22, 2016

This section describes how to quickly run the example program "MapReduce WordCount" once the MaxCompute console has been installed. If you use Maven, you can search for "odps-sdk-mapred" in the Maven repository to find the available versions of the Java SDK. The dependency configuration is as follows:

    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>odps-sdk-mapred</artifactId>
        <version>0.20.7</version>
    </dependency>

Notes:

  • Compiling and running MapReduce programs requires JDK 1.6.
  • To install the ODPS console quickly, refer to Quick Start. For how to use the ODPS client, refer to Console.

The procedure is described step by step below.

1. Create the input and output tables. For the SQL syntax for creating tables, refer to CREATE TABLE.

    -- Create the input table and the output table
    CREATE TABLE wc_in (key STRING, value STRING);
    CREATE TABLE wc_out (key STRING, cnt BIGINT);

2. Upload the data using Tunnel Commands:

    -- Upload the example data
    tunnel u kv.txt wc_in

The contents of kv.txt are as follows:

    238,val_238
    186,val_86
    186,val_86
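To illustrate how each line of kv.txt maps onto the two columns of wc_in, the following plain-Java sketch splits each line on the comma, assuming (as the sample file suggests) that a comma separates the fields and each line is one record. The class and method names are hypothetical; this is not part of Tunnel or the ODPS SDK, only a model of the mapping. It sticks to Java 6 language features to match the JDK requirement above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Tunnel or the ODPS SDK): models how the lines
// of kv.txt map onto the columns of wc_in (key STRING, value STRING).
// Assumes ',' separates the two fields and each line is one record.
public class KvParser {

    // One row of wc_in.
    public static final class Row {
        public final String key;
        public final String value;

        public Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Splits every non-empty line into exactly two fields.
    public static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<Row>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines such as a trailing newline
            }
            String[] fields = line.split(",", -1);
            if (fields.length != 2) {
                throw new IllegalArgumentException("expected 2 fields, got: " + line);
            }
            rows.add(new Row(fields[0], fields[1]));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("238,val_238", "186,val_86", "186,val_86");
        for (Row row : parse(sample)) {
            System.out.println("key=" + row.key + ", value=" + row.value);
        }
    }
}
```

For example, the line "186,val_86" becomes a row with key "186" and value "val_86".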

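Alternatively, you can insert a row directly with an SQL statement. Because the subquery (select count(*) from wc_in) always returns exactly one row, the following statement inserts exactly one row: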

    insert into table wc_in select '238','val_238' from (select count(*) from wc_in) a;

3. Write the MapReduce program and compile it.

ODPS provides an Eclipse plug-in that helps users develop MapReduce programs quickly and debug them locally.

Create a MaxCompute project in Eclipse, then write the MapReduce program. Once the program passes local debugging, upload the compiled program to ODPS. For details, refer to MapReduce Eclipse Plug-in.
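The WordCount source itself is not shown in this section. As a rough, SDK-free illustration of the logic such a program is assumed to implement (the map step emits one word for every column value of every wc_in row, and the reduce step sums the occurrences of each word into a (key, cnt) row of wc_out), the plain-Java sketch below simulates map, shuffle and reduce in memory. It does not use the ODPS MapReduce API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of word count; NOT the ODPS MapReduce API.
// Assumption: every column value of every input row counts as one word.
public class LocalWordCount {

    // Map phase: emit one word per column value (each occurrence counts as 1).
    static List<String> map(List<String[]> rows) {
        List<String> words = new ArrayList<String>();
        for (String[] row : rows) {
            for (String column : row) {
                words.add(column);
            }
        }
        return words;
    }

    // Shuffle + reduce phase: group identical words and sum their counts.
    // A TreeMap keeps the keys sorted, similar to sorted reducer input.
    static Map<String, Long> reduce(List<String> words) {
        Map<String, Long> counts = new TreeMap<String, Long>();
        for (String word : words) {
            Long current = counts.get(word);
            counts.put(word, current == null ? 1L : current + 1L);
        }
        return counts;
    }

    public static Map<String, Long> wordCount(List<String[]> rows) {
        return reduce(map(rows));
    }

    public static void main(String[] args) {
        // The three rows of kv.txt, as stored in wc_in (key, value).
        List<String[]> wcIn = Arrays.asList(
                new String[] {"238", "val_238"},
                new String[] {"186", "val_86"},
                new String[] {"186", "val_86"});
        // Each entry corresponds to one (key, cnt) row of wc_out.
        for (Map.Entry<String, Long> e : wordCount(wcIn).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Applied to the kv.txt sample, this simulation prints the pairs 186/2, 238/1, val_238/1 and val_86/2. These are what the sketch computes under its stated assumption; the rows the actual WordCount job writes to wc_out depend on how that program is implemented.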

4. Add the JAR package to the project as a resource (in this example, the package is named "word-count-1.0.jar"):

    add jar word-count-1.0.jar;

5. Run the jar command on the MaxCompute console:

    jar -resources word-count-1.0.jar -classpath /home/resources/word-count-1.0.jar com.taobao.jingfan.WordCount wc_in wc_out;
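In the jar command, the arguments after the main class (here wc_in and wc_out) are handed to that class's main method. A driver such as com.taobao.jingfan.WordCount would therefore typically read the input and output table names from args[0] and args[1]. The sketch below shows only that argument handling in plain Java; the class name, nested type and usage message are hypothetical, and the actual job configuration through the ODPS SDK is deliberately omitted.

```java
// Hypothetical driver skeleton: shows only how "wc_in wc_out" reach main().
// The real job setup (input/output tables, mapper/reducer classes) is omitted.
public class WordCountArgs {

    // Holds the two table names passed after the main class.
    public static final class Tables {
        public final String input;
        public final String output;

        Tables(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    // Validates and extracts <in_table> <out_table> from the program arguments.
    public static Tables parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <in_table> <out_table>");
        }
        return new Tables(args[0], args[1]);
    }

    public static void main(String[] args) {
        Tables tables = parse(args);
        System.out.println("input=" + tables.input + ", output=" + tables.output);
    }
}
```

Failing fast on a wrong argument count gives a clear usage message instead of an ArrayIndexOutOfBoundsException later in the job setup.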

6. Check the result on the ODPS console:

    select * from wc_out;

Note:

  • If the Java program uses other resources, make sure to list them in the -resources parameter. For details about the jar command, refer to Jar Commands.