
Develop MapReduce

Last Updated: Apr 12, 2018

After you create a MaxCompute Java module, you can develop MapReduce (MR) programs in it.

Develop the MR program

  1. Right-click the module source code directory src > main, select New, and select MaxCompute Java.

  2. Create the Driver, Mapper, and Reducer classes.

image

  3. Set the input and output tables and the Mapper and Reducer classes. The template fills in the framework code automatically.

image
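Taking the WordCount example referenced elsewhere on this page as a reference point, the roles of the three classes can be illustrated with a minimal sketch: the Mapper emits one (word, 1) pair per token, pairs are grouped by key, and the Reducer sums each group. The code below mimics that data flow in plain Java so that it runs without the MaxCompute SDK. The class name `WordCountSketch` and its methods are hypothetical and are not the code that the template generates; the in-memory grouping in `run` only stands in for the work the framework does between the map and reduce phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, SDK-free illustration of the Mapper/Reducer/Driver roles.
public class WordCountSketch {

    // Mapper role: emit one (word, 1) pair for every whitespace-separated token in a row.
    static List<Map.Entry<String, Long>> map(String row) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String token : row.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1L));
            }
        }
        return out;
    }

    // Reducer role: sum all counts that were grouped under the same key.
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver role: wire the phases together (map, group by key, reduce).
    static Map<String, Long> run(List<String> rows) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String row : rows) {
            for (Map.Entry<String, Long> pair : map(row)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hello=2, odps=1, world=1}
        System.out.println(run(Arrays.asList("hello odps", "hello world")));
    }
}
```

Keeping the per-record logic in small methods like these is a design choice that also makes it easier to cover with unit tests before a local run.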

Debug the MR program

After the MR program is developed, test the code to check that it behaves as expected. Two methods are supported:

Unit test (UT): The examples directory contains WordCount UT examples. You can refer to them when you write your own UT.

image

Local MR run: A local run requires a specified data source. You can set up the test data source in either of two ways:

  • MaxCompute Studio uses the Tunnel Service to automatically download table data of a specific MaxCompute project to the warehouse directory. By default, 100 data records are downloaded. If you need more data for testing, use the Tunnel command in the console or the table download function of MaxCompute Studio.

  • Provide a mock project (example_project) and its table data yourself. Use example_project in the warehouse directory as a reference when you set up your own.

  1. Run the MR program. Right-click the Driver class and select Run. In the Run Configuration dialog box that appears, configure the MaxCompute project on which the MR program runs.

image

  2. Click OK. If the table data of the specified MaxCompute project has not yet been downloaded to warehouse, download it first. If you use a mock project or the table data is already downloaded, skip this step. The MR local run framework then reads the specified table data in warehouse as the MR input and runs the MR program locally. The log output and the results are displayed on the console.

image

Run the MR program in the production environment

After local debugging is complete, release the MR program to the server and run it in the MaxCompute distributed environment.

  1. Package the MR program into a JAR package and release it to the server. For more information, see How to package and release MR.

  2. Open the MaxCompute console that is integrated into MaxCompute Studio: in the Project Explorer window, right-click the project and select Open in Console. Then enter a JAR command similar to the following on the console command line:

    jar -libjars wordcount.jar -classpath D:\odps\clt\wordcount.jar com.aliyun.odps.examples.mr.WordCount wc_in wc_out;
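In the command above, the tokens that follow the driver class name (`wc_in` and `wc_out`) are passed to the driver's `main` method as its arguments. Judging by their names, they are the input and output tables. The following minimal sketch shows one way a driver might validate those arguments before it configures the job. The class `DriverArgsSketch` and its `parse` method are hypothetical and are not part of the WordCount example or the MaxCompute SDK.

```java
// Hypothetical helper showing how a driver could check its command-line arguments.
public class DriverArgsSketch {

    // Holds the two table names the driver expects.
    static final class Tables {
        final String input;
        final String output;

        Tables(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    // Validate the arguments that follow the driver class name in the jar command.
    static Tables parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: <driver class> <input_table> <output_table>");
        }
        return new Tables(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Same trailing arguments as in the jar command above.
        Tables t = parse(new String[] {"wc_in", "wc_out"});
        System.out.println("input=" + t.input + ", output=" + t.output);
    }
}
```

Failing fast on a wrong argument count gives a clear usage message in the console instead of an error later in the job setup.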