MaxCompute: Getting started

Last Updated: Jul 17, 2023

This topic describes how to write a MapReduce program by using MaxCompute Studio, generate a JAR file, and then run a MapReduce job on the MaxCompute client. A WordCount MapReduce job is used in this topic.

Prerequisites

  • The MaxCompute client is installed and configured.

    For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd).

  • MaxCompute Studio is installed and connected to the MaxCompute project that you want to use.

    For information about how to install MaxCompute Studio and connect it to a MaxCompute project, see Install MaxCompute Studio and Manage project connections.

  • The source data file is prepared and saved to your on-premises machine.

    This topic uses a sample file named data.txt that contains the single line hello,odps. Create this file and save it to the bin directory of the MaxCompute client.

Precautions

If you want to use Maven to develop a MapReduce program, search for odps-sdk-mapred, odps-sdk-commons, and odps-sdk-core in the Maven Central Repository to find the available versions of the SDK for Java. Configure the following dependencies in the pom.xml file:

<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-mapred</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-commons</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-core</artifactId>
    <version>0.36.4-public</version>
</dependency>

Procedure

  1. Step 1: Develop a MapReduce program

    Write, run, and debug a MapReduce program by using MaxCompute Studio.

  2. Step 2: Generate and upload a MapReduce JAR file

    Package the compiled WordCount program into a JAR file and upload the file to the MaxCompute project.

  3. Step 3: Run a MapReduce job

    Run the JAR command against the JAR file that you uploaded to your MaxCompute project to start the MapReduce job.

Step 1: Develop a MapReduce program

  1. Create a MaxCompute Java module.

    1. Start IntelliJ IDEA. In the top navigation bar, choose File > New > Module.

    2. In the left-side navigation pane of the New Module dialog box, click MaxCompute Java.

    3. Configure Module SDK and click Next.

    4. Enter a module name, such as mapreduce, in the Module name field and click Finish.

  2. Write, run, and debug a WordCount MapReduce program.

    1. In the Project pane, expand your MaxCompute Java module and choose src > main > java. Then, right-click java and choose New > MaxCompute Java.

    2. In the Create new MaxCompute java class dialog box, click Driver, enter the name of the MaxCompute Java class that you want to create in the Name field, and then press Enter. For example, you can enter WordCount as the name.

      Figure: Create a Java class

    3. In the code editor for WordCount.java, write a WordCount MapReduce program to count the number of words.

      For the complete WordCount sample code, see Sample code.

    4. In the left-side navigation pane, right-click WordCount.java and select Run.

    5. In the Run/Debug Configurations dialog box, set MaxCompute project to the MaxCompute project that you want to use.

      Figure: Configure project information

    6. Click OK to run and debug WordCount.java and confirm that the program runs as expected.
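
The counting logic that a WordCount job performs can be sketched in plain Java without the MaxCompute SDK. The sketch below is an illustration under assumptions, not the sample code referenced in this step: the class and method names are hypothetical, rows are modeled as string arrays, and each column value of a row is assumed to count as one word, which is consistent with the result shown at the end of this topic.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: none of these names come from the MaxCompute SDK.
final class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every column value of every input row.
    static List<Map.Entry<String, Long>> map(List<String[]> rows) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] row : rows) {
            for (String column : row) {
                pairs.add(new AbstractMap.SimpleEntry<>(column, 1L));
            }
        }
        return pairs;
    }

    // Reduce phase: group the pairs by word and sum the counts for each word.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys give a stable print order
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One row with two STRING columns, modeled on the hello,odps sample data.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"hello", "odps"});
        System.out.println(reduce(map(rows))); // prints {hello=1, odps=1}
    }
}
```

In a real job, the map and reduce phases run as separate distributed tasks and read from and write to tables. The sketch only mirrors the data flow in memory.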

Step 2: Generate and upload a MapReduce JAR file

  1. In the left-side navigation pane of IntelliJ IDEA, right-click WordCount.java and select Deploy to server.

  2. In the Package a jar and submit resource dialog box, configure the parameters and click OK to package and upload the script.

    Figure: Package the JAR file

    For information about the parameters, see Procedure.

    Note

    If you use Maven to develop the MapReduce program, after you package the script into a JAR file, you must manually upload the JAR file to your MaxCompute project from the MaxCompute client. For information about how to upload a JAR file, see Add resources. Sample command:

    add jar mapreduce-1.0-SNAPSHOT.jar;

Step 3: Run a MapReduce job

  1. Log on to the MaxCompute client or start the MaxCompute client in MaxCompute Studio.

    The MaxCompute client is integrated into MaxCompute Studio, so you can run it directly from MaxCompute Studio. For more information, see Integrate the MaxCompute client.

  2. Create input and output tables.

    The input table contains the source data of the MapReduce job. The output table contains the processing results of the MapReduce job. Sample commands:

    --Create an input table named wc_in. 
    create table wc_in (key STRING, value STRING);
    --Create an output table named wc_out. 
    create table wc_out (key STRING, cnt BIGINT);

    For more information about the table creation syntax, see Create a table.

  3. Run the Tunnel Upload command to insert data into the wc_in table.

    Sample command:

    tunnel upload data.txt wc_in;

    For more information about Tunnel commands, see Tunnel commands.

  4. Run the JAR command to call the uploaded JAR file and run a MapReduce job.

    Sample command:

    jar -resources mapreduce-1.0-SNAPSHOT.jar -classpath mapreduce-1.0-SNAPSHOT.jar com.aliyun.odps.mapred.open.example.WordCount wc_in wc_out;

    • -resources mapreduce-1.0-SNAPSHOT.jar: The -resources option specifies the name of the resource that is called by the MapReduce job. In this example, the resource is the mapreduce-1.0-SNAPSHOT.jar file that is uploaded in Step 2.

    • -classpath mapreduce-1.0-SNAPSHOT.jar: The -classpath option specifies the path of the JAR file that contains MainClass.

    • com.aliyun.odps.mapred.open.example.WordCount: The fully qualified name of the main class (MainClass) that is defined in the MapReduce program.

    • wc_in wc_out: The names of the input table and the output table, in that order.

    For more information about the JAR command, see Syntax.

  5. Run the following command to view the result data that is written to the wc_out table:

    select * from wc_out;

    The following result is returned:

    +------------+------------+
    | key        | cnt        |
    +------------+------------+
    | hello      | 1          |
    | odps       | 1          |
    +------------+------------+
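
To see why the single line in data.txt yields two result rows, it helps to trace how that line fills the two columns of wc_in. The following Java sketch is not Tunnel code. It assumes that the upload splits each line on commas, which is consistent with the result above, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: this does not call Tunnel, and all names here are hypothetical.
final class UploadLineSketch {

    // Split one line of a text file into column values on a literal delimiter.
    static List<String> toColumns(String line, String delimiter) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty columns.
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        List<String> columns = toColumns("hello,odps", ",");
        // The first value lands in the key column and the second in the value column.
        System.out.println("key=" + columns.get(0) + ", value=" + columns.get(1));
        // prints key=hello, value=odps
    }
}
```

Under this assumption, wc_in holds one row whose key is hello and whose value is odps. Each column value is then counted once, which matches the two rows in wc_out.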