This section describes how to quickly run the example program ‘MapReduce WordCount’ once the MaxCompute console has been installed. If you use Maven, search for “odps-sdk-mapred” in the Maven repository to find the available versions of the Java SDK. The configuration is as follows:
The procedure is described step by step below.
1. Create the input and output tables. For the full CREATE TABLE syntax, see CREATE TABLE.
-- Create the input table and the output table
CREATE TABLE wc_in (key STRING, value STRING);
CREATE TABLE wc_out (key STRING, cnt BIGINT);
2. Upload the data with Tunnel commands:
-- Upload the example data
tunnel u kv.txt wc_in
The data in kv.txt is shown as follows:
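Alternatively, you can insert data directly with an SQL statement. Because the subquery (select count(*) from wc_in) always returns exactly one row, the following statement inserts a single row into wc_in:
insert into table wc_in select '238', 'val_238' from (select count(*) from wc_in) a;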
3. Write the MapReduce program and compile it.
MaxCompute provides an Eclipse plug-in that helps you develop MapReduce programs quickly and debug them locally.
Create a MaxCompute project in Eclipse and write the MapReduce program in it. After the program passes local debugging, compile it and upload it to MaxCompute. For more information, see MapReduce Eclipse Plug-in.
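To make the map, shuffle, and reduce stages of WordCount concrete, here is a minimal local sketch that uses only the Java standard library. It does not use the odps-sdk-mapred API and is not the example program's actual source. It assumes, as a hypothetical, that the mapper emits every column value of an input record as a word with count 1, that emitted pairs are grouped by word, and that the reducer sums the counts into one (key, cnt) row, mirroring the columns of wc_out. The class name LocalWordCount and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of WordCount; NOT the odps-sdk-mapred API.
public class LocalWordCount {

    // Map stage (assumption): emit (columnValue, 1) for every column of a record.
    static List<Map.Entry<String, Long>> map(String[] record) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String column : record) {
            out.add(Map.entry(column, 1L));
        }
        return out;
    }

    // Shuffle stage: group emitted counts by word; TreeMap keeps words sorted.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce stage: sum the counts of one word, like one (key, cnt) row of wc_out.
    static long reduce(List<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce over all records and returns word -> count.
    static Map<String, Long> run(List<String[]> records) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] r : records) {
            pairs.addAll(map(r));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical (key, value) rows standing in for wc_in.
        List<String[]> rows = List.of(
            new String[] {"238", "val_238"},
            new String[] {"186", "val_86"},
            new String[] {"186", "val_86"});
        run(rows).forEach((word, cnt) -> System.out.println(word + "\t" + cnt));
    }
}
```

With the hypothetical rows above, the sketch prints one line per distinct value with its count (for example, 186 appears twice). In a real MaxCompute job the framework performs the shuffle across workers; here it is a single in-memory TreeMap.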
4. Add the JAR package to the project as a resource (in this example, the JAR package is named “word-count-1.0.jar”):
add jar word-count-1.0.jar;
5. Run the jar command on the MaxCompute console:
jar -resources word-count-1.0.jar -classpath /home/resources/word-count-1.0.jar com.taobao.jingfan.WordCount wc_in wc_out;
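In the jar command, the tokens after the main class com.taobao.jingfan.WordCount (here wc_in and wc_out) are passed to that class's main method as its arguments. The following is a minimal sketch of how a main method might validate them, assuming the program expects exactly one input table and one output table. The class TableArgs is hypothetical; it only illustrates argument handling and does not submit a MapReduce job.

```java
// Hypothetical helper: validates the arguments that follow the main class
// in the jar command, e.g. "wc_in wc_out".
public class TableArgs {
    final String inputTable;
    final String outputTable;

    TableArgs(String inputTable, String outputTable) {
        this.inputTable = inputTable;
        this.outputTable = outputTable;
    }

    // Returns the parsed table names, or throws if the argument count is wrong.
    static TableArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <in_table> <out_table>");
        }
        return new TableArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        try {
            TableArgs t = parse(args);
            System.out.println("input=" + t.inputTable + ", output=" + t.outputTable);
        } catch (IllegalArgumentException e) {
            // Print the usage message and exit with a non-zero status.
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

Failing fast on a wrong argument count keeps a mistyped command from starting a job against the wrong tables.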
6. Check the result on the MaxCompute console:
select * from wc_out;
- If the Java program uses other resources, be sure to add them with the ‘-resources’ parameter as well. For more information about the jar command, see Jar Commands.