MaxCompute (formerly known as ODPS) provides a MapReduce programming interface. You can use its Java API to write MapReduce programs that process MaxCompute data. You can create ODPS_MR nodes and use them in Task Scheduling. For how to write and use MaxCompute MapReduce programs, see the WordCount sample in the MaxCompute documentation.
After you upload a newly created MaxCompute MapReduce program to MaxCompute as a resource, you must create an ODPS_MR node to run the program. The procedure is as follows:
On the Data Development page, click New > Create Task in the toolbar.
Complete the configurations in the New Task dialog box.
Both the MaxCompute console and the Alibaba Cloud big data platform run the program through the JAR command. Therefore, first generate the mapreduce_examples.jar package by using the Export function of Eclipse or another tool such as Ant, and then upload the package as a MaxCompute resource.
Select Resource in the left-side navigation pane, and click Upload.
Complete the configurations in the Upload Resource dialog box. Note that the Upload as ODPS resource checkbox must be selected.
For notes on uploading resources, see Resource management.
For the JAR package in the sample, see mapreduce_examples.jar.
Details of the ODPS_MR node are as follows.
--Create an input table
CREATE TABLE if not exists jingyan_wc_in (key STRING, value STRING);
--Create an output table
CREATE TABLE if not exists jingyan_wc_out (key STRING, cnt BIGINT);
--Create the system pseudo table dual
drop table if exists dual;
create table dual(id bigint); --If the project does not have the pseudo table, create the table and initialize data.
--Initialize data in the system pseudo table
insert overwrite table dual select count(*) from dual;
--Insert sample data into the input table jingyan_wc_in
insert overwrite table jingyan_wc_in select * from (
select 'project','val_pro' from dual
union all
select 'problem','val_pro' from dual
union all
select 'package','val_a' from dual
union all
select 'pad','val_a' from dual
) t;
-- Reference the newly uploaded JAR package (Find the package in the Resource Management pane and double-click to reference it.)
jar -libjars mapreduce_examples.jar -classpath ./mapreduce_examples.jar com.aliyun.odps.mapreduce.examples.WordCount jingyan_wc_in jingyan_wc_out
Click Run on the Data Development page.
Access IP is not required in this sample. Click Continue Running.
After the node finishes running, you can query the data in the output table from an SQL script.
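As a minimal sketch of such a check, the following query reads the output table created earlier in this sample. It assumes the job completed successfully and wrote its results to jingyan_wc_out. The ordering and the limit of 10 rows are illustrative choices, not part of the original sample.

```sql
--Inspect the word counts written by the WordCount job (illustrative query)
--ORDER BY is paired with LIMIT, which MaxCompute SQL expects by default
select key, cnt
from jingyan_wc_out
order by cnt desc
limit 10;
```

Each returned row should pair a word taken from the input table with the number of times it occurred.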