ODPS_MR task

Last Updated: Apr 10, 2018

MaxCompute (formerly known as ODPS) provides a MapReduce API. You can use its Java API to write MapReduce programs that process MaxCompute data, and then create ODPS_MR nodes to run those programs in Task Scheduling. For how to write and use a MaxCompute MapReduce program, see the WordCount sample in the MaxCompute documentation.

Create an ODPS_MR node

After you upload the MaxCompute MapReduce program to MaxCompute as a resource, you must create an ODPS_MR node to run the program. Follow these steps:

  1. On the Data Development page, click New > Create Task in the toolbar.

  2. Complete the configurations in the New Task dialog box.

  3. Click Create.

Add a resource

The JAR command is run in both the MaxCompute console and the Alibaba Cloud big data platform, so the program must be packaged first. Generate the mapreduce_examples.jar package by using the Export function of Eclipse or another tool such as Ant, and then upload the package as a MaxCompute resource.

  1. Select Resource in the left-side navigation pane, and click Upload.

  2. Complete the configurations in the Upload Resource dialog box. Note that the Upload as ODPS resource checkbox must be selected.

  3. Click Submit.

    Note:

    For notes on uploading resources, see Resource management.

    For the JAR package in the sample, see mapreduce_examples.jar.

Edit an ODPS_MR node

The content of the ODPS_MR node in this sample is as follows.

    -- Create an input table
    CREATE TABLE if not exists jingyan_wc_in (key STRING, value STRING);
    -- Create an output table
    CREATE TABLE if not exists jingyan_wc_out (key STRING, cnt BIGINT);
    -- Create the system pseudo table dual
    drop table if exists dual;
    create table dual (id bigint); -- If the project does not have the pseudo table, create the table and initialize data.
    -- Initialize data in the system pseudo table
    insert overwrite table dual select count(*) from dual;
    -- Insert sample data into the input table jingyan_wc_in
    insert overwrite table jingyan_wc_in select * from (
    select 'project','val_pro' from dual
    union all
    select 'problem','val_pro' from dual
    union all
    select 'package','val_a' from dual
    union all
    select 'pad','val_a' from dual
    ) b;
    -- Reference the newly uploaded JAR package (find the package in the Resource Management pane and double-click it to reference it).
    --@resource_reference{"mapreduce_examples.jar"}
    jar -libjars mapreduce_examples.jar -classpath ./mapreduce_examples.jar com.aliyun.odps.mapreduce.examples.WordCount jingyan_wc_in jingyan_wc_out
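
To make the expected result of this job easier to reason about, the following is a minimal plain-Java sketch of the map and reduce steps of a word count, applied to the four sample rows above. It does not use the MaxCompute SDK. The class name WordCountSketch and its methods are hypothetical, and the sketch assumes that the WordCount program emits one (word, 1) pair for every column value of each input row, which may differ from the packaged sample.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory word count, not the MaxCompute SDK.
class WordCountSketch {

    // Map step (assumption): emit one (word, 1) pair for every column value of a row.
    static List<Map.Entry<String, Long>> map(String[] row) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String value : row) {
            pairs.add(new AbstractMap.SimpleEntry<>(value, 1L));
        }
        return pairs;
    }

    // Reduce step: sum the counts of all pairs that share the same key.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    // Run the map step on every row, then reduce all emitted pairs.
    static Map<String, Long> wordCount(List<String[]> rows) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] row : rows) {
            pairs.addAll(map(row));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // The same rows that the SQL above inserts into jingyan_wc_in.
        List<String[]> rows = Arrays.asList(
            new String[] {"project", "val_pro"},
            new String[] {"problem", "val_pro"},
            new String[] {"package", "val_a"},
            new String[] {"pad", "val_a"});
        // Print one (key, cnt) line per distinct word, sorted by key.
        wordCount(rows).forEach((key, cnt) -> System.out.println(key + "\t" + cnt));
    }
}
```

Under that assumption, each key such as project appears once, while val_pro and val_a each appear twice, so the output table would hold six rows in this sketch.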

Note:

When a MapReduce job references multiple JAR resources, write the classpath in the following format: -classpath http://xxxxx1, http://xxxxx2. That is, separate the paths with a comma.

Run ODPS_MR

  1. Click Run on the Data Development page.

  2. Access IP is not required in this sample. Click Continue Running.

View results

You can query the data in the output table jingyan_wc_out by using an SQL script.
