MaxCompute: Develop a MapReduce program

Last Updated: Jul 11, 2023

This topic describes how to use MaxCompute Studio to develop a MapReduce program. The development process includes writing, debugging, packaging, uploading, and running a MapReduce program.

Prerequisites

The following prerequisites are met:

Write a MapReduce program

  1. In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > MaxCompute Java.

  2. Configure Name, select a class type (Driver, Mapper, or Reducer), and then press Enter.

    • Name: the name of the MaxCompute Java class. If you have not created a package, specify this parameter in the packagename.classname format. The system then creates the package automatically.

    • Select the Driver, Mapper, or Reducer class.

      Note

      You can select the Driver, Mapper, or Reducer class based on your business requirements.

      • Driver: the driver class of a MapReduce job. This class builds the job to run: it specifies the Mapper and Reducer classes and the task configurations. The Driver class serves as the entry class of a MapReduce job.

      • Mapper: the first stage of MapReduce data processing. In this stage, each data record is processed and the related key-value pair is generated.

      • Reducer: processes the intermediate output that is generated by the Mapper class, generates the final output, and then saves the final output in a MaxCompute table.

  3. After you create a MaxCompute Java class, develop a Java program in the editor.

    The Java template is automatically filled with the framework code. You only need to configure the input table, the output table, and the Mapper and Reducer classes.

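The division of labor among the Driver, Mapper, and Reducer classes can be illustrated without the MaxCompute SDK. The following is a minimal plain-Java sketch, not the template that MaxCompute Studio generates: the class name WordCountSketch and its map, reduce, and run methods are hypothetical stand-ins for the three roles, and in-memory collections take the place of the input and output tables.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, SDK-free illustration of the Mapper, Reducer, and Driver roles.
public class WordCountSketch {

    // Mapper role: turn one input record into (word, 1) key-value pairs.
    static List<Map.Entry<String, Long>> map(String record) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Reducer role: fold all intermediate values of one key into a final value.
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver role: wire the stages together and return the final output.
    static Map<String, Long> run(List<String> inputRecords) {
        // Group the mapper output by key (a real framework does this between stages).
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String record : inputRecords) {
            for (Map.Entry<String, Long> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Apply the reducer once per key.
        Map<String, Long> output = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            output.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Prints {hello=2, maxcompute=1, world=1}
        System.out.println(run(Arrays.asList("hello world", "hello maxcompute")));
    }
}
```

In a real MaxCompute job, the framework handles the grouping step and reads from and writes to tables; the sketch only shows which logic belongs in which class.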

Run a MapReduce program on your on-premises machine to debug the program

Run the MapReduce program locally to debug it, and check whether the results are as expected.

  1. Right-click the Java class that you wrote and select Run.

  2. In the Run/Debug Configurations dialog box, select the name of the MaxCompute project in which the MapReduce program runs.

  3. Click OK to run the MapReduce program.

    Note
    • During a local run, the system reads input data from the specified table in the warehouse directory. You can view the log output in the console.

    • To use table data from a MaxCompute project, change the endpoint and project name in the value of the MaxCompute project parameter. If the table data of that project has not been downloaded to the warehouse directory, the data is downloaded first. If the data is already downloaded, the download is skipped.

Perform unit testing to debug a MapReduce program

You can write a test case based on the WordCount unit test case in the examples folder.

Package and upload a MapReduce program

After you debug the MapReduce program that you wrote, package the MapReduce program into a JAR file and upload the file to your MaxCompute project as a resource. For more information, see Package, upload, and register a Java program.

Run a MapReduce program

Run the MapReduce program that you developed on the MaxCompute client.

  1. In the left-side navigation pane, click Project Explorer.

  2. Right-click the name of your MaxCompute project and select Open in Console.

  3. In the Console tool window, run the following command to start the MapReduce program.

    For more information about the command, see Submit a MapReduce job.

    jar -resources wordcount.jar -classpath D:\odps\clt\wordcount.jar com.aliyun.odps.examples.mr.WordCount wc_in wc_out;
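In the command above, wc_in and wc_out follow the main class name, so they are passed to the program as arguments. The following is a minimal plain-Java sketch of how a driver's main method might validate and read them, assuming (from the names) that the first argument is the input table and the second is the output table. The class JobArgsSketch and its fields are hypothetical and are not part of the MaxCompute SDK or the WordCount example.

```java
// Hypothetical helper: reads the table names that follow the main class
// name in the jar command, for example "wc_in wc_out".
public class JobArgsSketch {
    final String inputTable;
    final String outputTable;

    JobArgsSketch(String inputTable, String outputTable) {
        this.inputTable = inputTable;
        this.outputTable = outputTable;
    }

    // Assumption: exactly two arguments, input table first, output table second.
    static JobArgsSketch parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: <main class> <in_table> <out_table>");
        }
        return new JobArgsSketch(args[0], args[1]);
    }

    public static void main(String[] args) {
        JobArgsSketch parsed = parse(args);
        System.out.println("input table:  " + parsed.inputTable);
        System.out.println("output table: " + parsed.outputTable);
    }
}
```

Failing fast on a wrong argument count surfaces a mistyped command before the job is submitted.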