MapReduce

Last Updated: May 07, 2018

This section assumes that the MaxCompute console is installed and walks through the ‘MapReduce WordCount’ example. Maven users can search for “odps-sdk-mapred” in the Maven repository to get the required SDK (several versions are available). Configure the dependency as follows:

  <dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-mapred</artifactId>
    <version>0.20.7</version>
  </dependency>

Note:

  • JDK 1.6 is required to compile and run MapReduce programs.
  • To install the MaxCompute console, see Quick Start. To use the MaxCompute client, see Console.

Procedure

  1. Create the input and output tables. For the syntax of the table creation statement, see CREATE TABLE:

    -- Create the input table and the output table
    CREATE TABLE wc_in (key STRING, value STRING);
    CREATE TABLE wc_out (key STRING, cnt BIGINT);
  2. Use Tunnel commands to upload the data:

    -- Upload the example data
    tunnel u kv.txt wc_in

    The content of kv.txt is as follows:

    238,val_238
    186,val_86
    186,val_86

    You can also insert data directly using the INSERT statement as follows:

    insert into table wc_in select '238','val_238' from (select count(*) from wc_in) a;
  3. Write the MapReduce program and compile it.

    MaxCompute provides an Eclipse development plug-in that helps you develop MapReduce programs quickly and debug them locally.

    First create a MaxCompute project in Eclipse, and then write the MapReduce program in it. After local debugging succeeds, upload the compiled program to MaxCompute. For more information, see MapReduce Eclipse Plug-in.

  4. Add the JAR package to the project (in this example, the JAR package is named “word-count-1.0.jar”):

    add jar word-count-1.0.jar;
  5. Run the jar command on the MaxCompute console:

    jar -resources word-count-1.0.jar -classpath /home/resources/word-count-1.0.jar
        com.taobao.jingfan.WordCount wc_in wc_out;
  6. Check the result on the MaxCompute console:

    select * from wc_out;

    Note:

    If the Java program uses other resources, you must specify them with the ‘-resources’ parameter. For more information about JAR commands, see Jar Commands.
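
Step 3 does not show the WordCount program itself. The following is a minimal local sketch of the counting logic, written with the Java standard library only. It does not use the MaxCompute SDK, and the class and method names (WordCountSketch, map, reduce, wordCount) are hypothetical. It assumes that the job treats every column value of an input record as a word, emits a (word, 1) pair for each one, and sums the pairs per word. A real job would implement the same two phases with the mapper and reducer classes from odps-sdk-mapred and read from wc_in rather than from strings.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch only: simulates the two phases of a word count in memory.
// Assumption: each comma-separated column of a record counts as one word.
class WordCountSketch {

    // "Map" phase: emit one (word, 1) pair per column of the record.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String column : line.split(",")) {
            pairs.add(new AbstractMap.SimpleEntry<>(column.trim(), 1L));
        }
        return pairs;
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    // A TreeMap keeps the words sorted so the result is deterministic.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    // Runs the map phase over all records, then reduces the collected pairs.
    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Same three records as kv.txt in step 2.
        List<String> kv = Arrays.asList("238,val_238", "186,val_86", "186,val_86");
        System.out.println(wordCount(kv)); // prints {186=2, 238=1, val_238=1, val_86=2}
    }
}
```

Under the same assumption, each entry of the resulting map corresponds to one (key, cnt) row of the kind that wc_out is defined to hold.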
