This topic describes how to write a MapReduce program by using MaxCompute Studio, package it into a JAR file, and then run a MapReduce job on the MaxCompute client. A WordCount MapReduce job is used as the example.

Prerequisites

Make sure that the following requirements are met:

  • The MaxCompute client is installed and configured.

    For more information about how to install and configure the MaxCompute client, see Install and configure the MaxCompute client.

  • MaxCompute Studio is installed and connected to the MaxCompute project that you want to use.

    For more information about how to install MaxCompute Studio and connect it to a MaxCompute project, see Install MaxCompute Studio and Manage project connections.

  • The source data file is prepared and saved to your computer.

    The sample file used in this topic is data.txt, which contains a single line: hello,odps. Create such a file and save it to the bin directory of the MaxCompute client.
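If you prefer to create the sample file programmatically, the following minimal JDK-only sketch writes data.txt with the content hello,odps. The class name PrepareSampleData and its optional directory argument are hypothetical. Pass the bin directory of your MaxCompute client as the argument, or copy the file there afterward.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Writes the one-line sample file used in this topic (hypothetical helper). */
final class PrepareSampleData {

    /** Sample content from this topic: two comma-separated words on one line. */
    static final String SAMPLE_LINE = "hello,odps";

    /** Creates or overwrites data.txt in the given directory and returns its path. */
    static Path writeSample(Path dir) throws IOException {
        Path file = dir.resolve("data.txt");
        Files.write(file, (SAMPLE_LINE + "\n").getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the client's bin directory;
        // without an argument, the file is written to the current working directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Wrote " + writeSample(dir).toAbsolutePath());
    }
}
```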

Precautions

If you use Maven to develop a MapReduce program, search for odps-sdk-mapred, odps-sdk-commons, and odps-sdk-core in the Maven Central Repository to find the available versions of MaxCompute SDK for Java. Add the following dependencies to the pom.xml file:
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-mapred</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-commons</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-core</artifactId>
    <version>0.36.4-public</version>
</dependency>
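After you add the dependencies, you can confirm that the SDK classes resolve on your classpath. The following sketch is an assumption-based check and is not part of the SDK: the class SdkClasspathCheck is hypothetical, and the list of class names is assumed from the imports that MaxCompute MapReduce programs such as WordCount commonly use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Reports which MaxCompute SDK classes cannot be loaded from the current classpath.
 * The class names below are an assumed list of classes that WordCount-style programs import.
 */
final class SdkClasspathCheck {

    static final List<String> SDK_CLASSES = Arrays.asList(
            "com.aliyun.odps.mapred.MapperBase",
            "com.aliyun.odps.mapred.ReducerBase",
            "com.aliyun.odps.mapred.JobClient",
            "com.aliyun.odps.data.Record",
            "com.aliyun.odps.data.TableInfo");

    /** Returns the names in classNames that cannot be loaded. */
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: resolve the class without running static initializers.
                Class.forName(name, false, SdkClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class whose own dependencies are missing.
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> absent = missing(SDK_CLASSES);
        System.out.println(absent.isEmpty()
                ? "All checked SDK classes are on the classpath."
                : "Missing: " + absent);
    }
}
```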

Step 1: Develop a MapReduce program

Write, run, and debug a MapReduce program by using MaxCompute Studio.

  1. Create a MaxCompute Java module.
    1. Start IntelliJ IDEA. In the top navigation bar, choose File > New > Module.
    2. In the left-side navigation pane of the New Module dialog box, click MaxCompute Java.
    3. Specify Module SDK and click Next.
    4. Enter a module name, such as mapreduce, in the Module name field and click Finish.
  2. Write, run, and debug a WordCount MapReduce program.
    1. In the Project pane, expand your MaxCompute Java module and choose src > main > java. Then, right-click java and choose New > MaxCompute Java.
    2. In the Create new MaxCompute java class dialog box, click Driver, enter the name of the MaxCompute Java class that you want to create in the Name field, and then press Enter. In this example, the name of the MaxCompute Java class is WordCount.
    3. In the code editor for WordCount.java, write a WordCount MapReduce program that counts the occurrences of each word.
      For the complete WordCount sample code, see WordCount.
    4. In the left-side navigation pane, right-click WordCount.java and select Run.
    5. In the Run/Debug Configurations dialog box, set MaxCompute project to the MaxCompute project that you want to use.
    6. Click OK to run and debug WordCount.java. Make sure that the program compiles and runs as expected.
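To reason about what the WordCount job produces before you run it on MaxCompute, the following JDK-only sketch simulates the map, shuffle, and reduce phases locally. It does not use the MaxCompute SDK, and the class LocalWordCount is hypothetical. It assumes, consistent with the result shown in Step 3, that the mapper treats every column value of an input record as one word and emits a count of 1 for it, and that the reducer sums the counts for each word.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * JDK-only simulation of the WordCount data flow (hypothetical, no MaxCompute SDK).
 * Each input record is modeled as a String[] of column values.
 */
final class LocalWordCount {

    /** Map phase: emit (columnValue, 1) for every column of every input record. */
    static List<Map.Entry<String, Long>> map(List<String[]> records) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] record : records) {
            for (String column : record) {
                pairs.add(new AbstractMap.SimpleEntry<>(column, 1L));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce phases: group the pairs by key and sum the counts per key. */
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        // TreeMap keeps the keys sorted, similar to the sorted keys a reducer receives.
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One row of wc_in: key = "hello", value = "odps".
        List<String[]> wcIn = Arrays.asList(new String[][] {{"hello", "odps"}});
        Map<String, Long> wcOut = reduce(map(wcIn));
        wcOut.forEach((key, cnt) -> System.out.println(key + "\t" + cnt));
    }
}
```

Running main prints one tab-separated line per word, which corresponds to the rows that the select statement in Step 3 returns from wc_out.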

Step 2: Generate and upload a MapReduce JAR file

Package the compiled WordCount program into a JAR file and upload the JAR file to your MaxCompute project as a resource.

  1. In the left-side navigation pane of IntelliJ IDEA, right-click WordCount.java and select Deploy to server.
  2. In the Package a jar and submit resource dialog box, configure the parameters and click OK to package the program into a JAR file and upload it.

    For more information about the parameters, see Package the code.

    Note: If you use Maven to develop the MapReduce program, you must manually upload the JAR file to your MaxCompute project from the MaxCompute client after you package the program. For more information about how to upload a JAR file, see Add resources. Example:
    add jar mapreduce-1.0-SNAPSHOT.jar;
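Before you upload the JAR file, you may want to confirm that it contains the main class that the JAR command in Step 3 references. The following sketch uses java.util.jar.JarFile from the JDK. The class JarMainClassCheck is hypothetical, and the default JAR name and main class name are taken from this topic's example.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Checks that a packaged JAR contains the compiled main class (hypothetical helper). */
final class JarMainClassCheck {

    /** Converts a fully qualified class name to its entry name inside a JAR. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the JAR at jarPath contains the class file for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0] : "mapreduce-1.0-SNAPSHOT.jar";
        // Main class used in this topic; replace it if your class is in another package.
        String mainClass = "com.aliyun.odps.mapred.open.example.WordCount";
        System.out.println(mainClass + " found: " + containsClass(jarPath, mainClass));
    }
}
```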

Step 3: Run a MapReduce job

Use the JAR command to run a MapReduce job from the JAR file that you uploaded to your MaxCompute project.

  1. Start the MaxCompute client, or open the MaxCompute client in MaxCompute Studio.
    MaxCompute Studio has the MaxCompute client built in, so you can run client commands directly in MaxCompute Studio. For more information, see Integrate with MaxCompute client.
  2. Create input and output tables.
    The input table contains the source data of the MapReduce job. The output table contains the processing results of the MapReduce job. Example:
    --Create an input table named wc_in. 
    create table wc_in (key STRING, value STRING);
    --Create an output table named wc_out. 
    create table wc_out (key STRING, cnt BIGINT);
    For more information about the CREATE TABLE syntax, see Create a table.
  3. Run the Tunnel Upload command to upload the content of data.txt to the wc_in table.
    Example:
    tunnel upload data.txt wc_in;
    For more information about Tunnel commands, see Tunnel commands.
  4. Run the JAR command to run a MapReduce job with the uploaded JAR file.
    Example:
    jar -resources mapreduce-1.0-SNAPSHOT.jar -classpath mapreduce-1.0-SNAPSHOT.jar com.aliyun.odps.mapred.open.example.WordCount wc_in wc_out;
    • -resources mapreduce-1.0-SNAPSHOT.jar: the resources that the MapReduce job uses. In this example, the resource is the mapreduce-1.0-SNAPSHOT.jar file that you uploaded in Step 2.
    • -classpath mapreduce-1.0-SNAPSHOT.jar: the local path of the JAR file that contains the main class.
    • com.aliyun.odps.mapred.open.example.WordCount: the fully qualified name of the main class defined in the MapReduce program. If your WordCount class is in a different package, use that class name instead.
    • wc_in wc_out: the names of the input table and the output table, which are passed to the main class as arguments.
    For more information about the JAR command, see Syntax.
  5. Run the following command to view the result data that is written to the wc_out table:
    select * from wc_out;
    Command output:
    +------------+------------+
    | key        | cnt        |
    +------------+------------+
    | hello      | 1          |
    | odps       | 1          |
    +------------+------------+
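The output contains one row per word because the line hello,odps is split at the comma into the two columns of wc_in: key is hello and value is odps. The Tunnel Upload command uses a comma as its default column delimiter, which you can change with the -fd option. The following JDK-only sketch models only that split. The class TunnelLineSplit is hypothetical and does not reproduce Tunnel's other parsing options.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Models how one line of data.txt maps to the columns of wc_in (key STRING, value STRING)
 * when the line is split at a column delimiter (hypothetical helper).
 */
final class TunnelLineSplit {

    /** Splits a line at a literal delimiter; the -1 limit keeps empty trailing columns. */
    static List<String> splitColumns(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        List<String> row = splitColumns("hello,odps", ",");
        List<String> schema = Arrays.asList("key", "value");
        if (row.size() != schema.size()) {
            throw new IllegalStateException(
                    "Expected " + schema.size() + " columns, got " + row.size());
        }
        for (int i = 0; i < schema.size(); i++) {
            System.out.println(schema.get(i) + " = " + row.get(i));
        }
    }
}
```

Pattern.quote treats the delimiter literally, so a delimiter such as | is not interpreted as a regular expression.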