This topic describes how to write a MapReduce program by using MaxCompute Studio, generate a JAR file, and then run a MapReduce job on the MaxCompute client. A WordCount MapReduce job is used in this topic.
Make sure that the following requirements are met:
- The MaxCompute client is installed and configured.
For more information about how to install and configure the MaxCompute client, see Install and configure the MaxCompute client.
- MaxCompute Studio is installed and connected to the MaxCompute project that you want to use.
- The source data file is prepared and saved to your computer.
The sample file in this topic is data.txt, whose content is hello,odps. You can prepare such a file and save it to the bin directory of the MaxCompute client.
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-mapred</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-commons</artifactId>
    <version>0.36.4-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-core</artifactId>
    <version>0.36.4-public</version>
</dependency>
Step 1: Develop a MapReduce program
Write, run, and debug a MapReduce program by using MaxCompute Studio.
- Create a MaxCompute Java module.
- Start IntelliJ IDEA. In the top navigation bar, choose .
- In the left-side navigation pane of the New Module dialog box, click MaxCompute Java.
- Specify Module SDK and click Next.
- Enter a module name, such as mapreduce, in the Module name field and click Finish.
- Write, run, and debug a WordCount MapReduce program.
- In the Project pane, expand your MaxCompute Java module and choose . Then, right-click java and choose .
- In the Create new MaxCompute java class dialog box, click Driver, enter the name of the MaxCompute Java class that you want to create in the Name field, and then press Enter. In this example, the name of the MaxCompute Java class is WordCount.
- In the code editor for WordCount.java, write a WordCount MapReduce program to count the number of words.
For the complete WordCount sample code, see WordCount.
- In the left-side navigation pane, right-click WordCount.java and select Run.
- In the Run/Debug Configurations dialog box, set MaxCompute project to the MaxCompute project that you want to use.
- Click OK to run and debug the WordCount.java script. Make sure that the script is compiled as expected.
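The counting logic that a WordCount job performs can be sketched in plain Java. This is a minimal illustration only: it does not use the MaxCompute MapReduce SDK, the class name WordCountSketch and its methods are hypothetical, and it assumes that every column value of an input record is treated as one word, which is consistent with the sample result shown later in this topic. For the actual program, see the WordCount sample code referenced above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordCount logic; not the MaxCompute SDK API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every column value of every record.
    static List<Map.Entry<String, Long>> map(List<String[]> records) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] record : records) {
            for (String column : record) {
                pairs.add(new AbstractMap.SimpleEntry<>(column, 1L));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts of all pairs that share the same word.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One input row, as produced by uploading a file that contains "hello,odps".
        List<String[]> wcIn = new ArrayList<>();
        wcIn.add(new String[] {"hello", "odps"});

        Map<String, Long> wcOut = reduce(map(wcIn));
        for (Map.Entry<String, Long> row : wcOut.entrySet()) {
            System.out.println(row.getKey() + "\t" + row.getValue());
        }
    }
}
```

In a real job, the map and reduce phases run as separate distributed tasks and the framework groups the pairs by key between them. The sketch collapses that into two in-memory method calls.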
Step 2: Generate and upload a MapReduce JAR file
Package the compiled WordCount.java script into a JAR file and upload the file to the MaxCompute project.
- In the left-side navigation pane of IntelliJ IDEA, right-click WordCount.java and select Deploy to server.
- In the Package a jar and submit resource dialog box, configure the parameters and click OK to package and upload the script.
For more information about the parameters, see Package the code.
Note: If you use Maven to develop the MapReduce program, you must manually upload the JAR file to your MaxCompute project from the MaxCompute client after you package the script. For more information about how to upload a JAR file, see Add resources. Example:
add jar mapreduce-1.0-SNAPSHOT.jar;
Step 3: Run a MapReduce job
Run the JAR command against the JAR file that you uploaded to your MaxCompute project to run a MapReduce job.
- Run the MaxCompute client or open the MaxCompute client in MaxCompute Studio.
The MaxCompute client is integrated into MaxCompute Studio, so you can also run it from there. For more information, see Integrate with MaxCompute client.
- Create input and output tables.
The input table contains the source data of the MapReduce job. The output table contains the processing results of the MapReduce job. Example:
For more information about the CREATE TABLE syntax, see Create a table.
--Create an input table named wc_in.
create table wc_in (key STRING, value STRING);
--Create an output table named wc_out.
create table wc_out (key STRING, cnt BIGINT);
- Run the Tunnel Upload command to insert data into the wc_in table.
For more information about Tunnel commands, see Tunnel commands.
tunnel upload data.txt wc_in;
- Run the JAR command to call the uploaded JAR file and run a MapReduce job. Example:
jar -resources mapreduce-1.0-SNAPSHOT.jar -classpath mapreduce-1.0-SNAPSHOT.jar com.aliyun.odps.mapred.open.example.WordCount wc_in wc_out;
-resources mapreduce-1.0-SNAPSHOT.jar: The -resources option specifies the name of the resource that is called by the MapReduce job. In this example, the resource is the mapreduce-1.0-SNAPSHOT.jar file that is uploaded in Step 2.
-classpath mapreduce-1.0-SNAPSHOT.jar: The -classpath option specifies the path of the JAR file that contains MainClass.
com.aliyun.odps.mapred.open.example.WordCount: the MainClass that is defined in the MapReduce program.
wc_in wc_out: the input table and the output table.
For more information about the JAR command, see Syntax.
- Run the following command to view the result data that is written to the wc_out table:
select * from wc_out;
+------------+------------+
| key        | cnt        |
+------------+------------+
| hello      | 1          |
| odps       | 1          |
+------------+------------+
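To make the structure of the JAR command in Step 3 easier to see, the following sketch assembles the same command string from its parts. The class JarCommandSketch and the method buildJarCommand are hypothetical helpers written for this illustration and are not part of the MaxCompute client or SDK. The sketch uses only the options that appear in the example above.

```java
// Hypothetical helper that shows how the parts of the JAR command fit together.
public class JarCommandSketch {

    // Builds: jar -resources <resources> -classpath <classpath> <mainClass> <tables...>;
    static String buildJarCommand(String resources, String classpath,
                                  String mainClass, String... tables) {
        StringBuilder cmd = new StringBuilder("jar");
        cmd.append(" -resources ").append(resources);   // resource uploaded to the project
        cmd.append(" -classpath ").append(classpath);   // local JAR that contains MainClass
        cmd.append(' ').append(mainClass);              // fully qualified MainClass
        for (String table : tables) {
            cmd.append(' ').append(table);              // input table, then output table
        }
        return cmd.append(';').toString();
    }

    public static void main(String[] args) {
        String jar = "mapreduce-1.0-SNAPSHOT.jar";
        System.out.println(buildJarCommand(
                jar, jar, "com.aliyun.odps.mapred.open.example.WordCount", "wc_in", "wc_out"));
    }
}
```

Running the sketch prints the command text only. You still need to run that command on the MaxCompute client for the job to be submitted.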