Write a MapReduce program in MaxCompute Studio, package it as a JAR file, and run it on the MaxCompute client. This topic walks through a WordCount example that counts word occurrences in a text file.
How it works
MaxCompute MapReduce processes data through three stages:
(input) <key, value> → map → <key, value> → combine → <key, value> → reduce → <key, value> (output)
For the WordCount example: the map stage splits each line into words and emits <word, 1> pairs; the reduce stage sums the counts per word and writes the result to the output table.
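The flow of key-value pairs through the three stages can be sketched in plain Java, without the MaxCompute SDK. The class and method names below (WordCountFlowSketch, map, combine, reduce, run) are hypothetical and only mimic the data flow in memory. Splitting words on whitespace and treating each input line as its own map task are assumptions made for illustration. A real job implements the mapper and reducer classes provided by the SDK instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the map -> combine -> reduce flow.
// It does not use the MaxCompute SDK.
final class WordCountFlowSketch {

    // Map stage: split one line into words and emit a <word, 1> pair per word.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Combine stage: pre-aggregate the pairs emitted by a single map task.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> partial = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            partial.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return partial;
    }

    // Reduce stage: sum the partial counts per word across all map tasks.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Long::sum));
        }
        return totals;
    }

    // Run the full flow, treating each input line as its own map task (an assumption).
    static Map<String, Long> run(List<String> lines) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (String line : lines) {
            partials.add(combine(map(line)));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        // prints {hello=2, mapreduce=1, odps=1}
        System.out.println(run(Arrays.asList("hello odps", "hello mapreduce")));
    }
}
```

In this sketch the combine step is optional for correctness: it only reduces the number of pairs handed to the reduce step, and the totals are the same with or without it.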
Prerequisites
Before you begin, ensure that you have:
The MaxCompute client (odpscmd) installed and configured. For details, see MaxCompute client (odpscmd).
MaxCompute Studio installed and connected to your MaxCompute project. For details, see Install MaxCompute Studio and Manage project connections.
A source data file saved to your local machine. This topic uses a file named data.txt with the content hello,odps. Save it to the bin directory of the MaxCompute client.
Maven SDK dependencies
To develop a MapReduce program with Maven, search for odps-sdk-mapred, odps-sdk-commons, and odps-sdk-core in the Maven Central Repository to find the required SDK for Java versions. This example uses version 0.36.4-public. Add the following dependencies to your pom.xml:
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-mapred</artifactId>
<version>0.36.4-public</version>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-commons</artifactId>
<version>0.36.4-public</version>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-core</artifactId>
<version>0.36.4-public</version>
</dependency>
Step 1: Develop a MapReduce program
Create a MaxCompute Java module in IntelliJ IDEA.
In the top navigation bar, choose File > New > Module.
In the New Module dialog box, select MaxCompute Java in the left-side navigation pane.
Configure Module SDK and click Next.
Enter a module name in the Module name field, for example, mapreduce, and click Finish.
Create and write the WordCount MapReduce program.
In the Project pane, expand your MaxCompute Java module and navigate to src > main > java. Right-click java and choose New > MaxCompute Java.
In the Create new MaxCompute java class dialog box, click Driver, enter a class name in the Name field, for example, WordCount, and press Enter.
In the code editor for WordCount.java, write the WordCount MapReduce logic to count word occurrences. For the complete sample code, see Sample code.
Run and debug the program.
In the Project pane, right-click WordCount.java and select Run.
In the Run/Debug Configurations dialog box, set MaxCompute project to your target project.
Click OK to run and debug the script and verify it executes as expected.
Step 2: Generate and upload a MapReduce JAR file
In the Project pane, right-click WordCount.java and select Deploy to server.
In the Package a jar and submit resource dialog box, configure the parameters and click OK to package and upload the script. For parameter details, see Procedure.
Note: If you developed the MapReduce program with Maven, manually upload the JAR file from the MaxCompute client after packaging. For details, see Add resources. Sample command:
add jar mapreduce-1.0-SNAPSHOT.jar;
Step 3: Run a MapReduce job
Log on to the MaxCompute client, or start it from within MaxCompute Studio. For details on the integrated client, see Integrate the MaxCompute client.
Create input and output tables. The input table holds the source data; the output table receives the processing results.
-- Create an input table named wc_in.
create table wc_in (key STRING, value STRING);
-- Create an output table named wc_out.
create table wc_out (key STRING, cnt BIGINT);
For table creation syntax, see Create a table.
Upload the source data file to wc_in. Confirm the file content before uploading. The data.txt file used in this example contains:
hello,odps
Run the Tunnel Upload command:
tunnel upload data.txt wc_in;
For Tunnel command reference, see Tunnel commands.
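To see how the single line in data.txt lands in the two columns of wc_in, the mapping can be sketched in plain Java. The snippet assumes that fields are separated by a comma, which is the Tunnel column delimiter unless another one is specified. The class and method names (TunnelRowSketch, toRow) are hypothetical and are not part of the Tunnel tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of how one line of data.txt maps onto the wc_in columns.
// Assumption: fields are separated by a comma.
final class TunnelRowSketch {

    // Split a line on the delimiter and pair each field with a column name.
    static Map<String, String> toRow(String line, String[] columns, String delimiter) {
        // Quote the delimiter so it is treated literally; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "Expected " + columns.length + " fields but found " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // prints {key=hello, value=odps}
        System.out.println(toRow("hello,odps", new String[] {"key", "value"}, ","));
    }
}
```

Under this assumption each column holds a single word, which is consistent with the job reporting one occurrence each of hello and odps.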
Run the JAR command to execute the MapReduce job.
jar -resources mapreduce-1.0-SNAPSHOT.jar -classpath mapreduce-1.0-SNAPSHOT.jar com.aliyun.odps.mapred.open.example.WordCount wc_in wc_out;
The command uses the following parameters:
Parameter | Description
-resources mapreduce-1.0-SNAPSHOT.jar | The resource called by the MapReduce job, which is the JAR file uploaded in Step 2
-classpath mapreduce-1.0-SNAPSHOT.jar | The path of the JAR file that contains MainClass
com.aliyun.odps.mapred.open.example.WordCount | MainClass defined in the MapReduce program
wc_in wc_out | The input table and output table
For JAR command syntax, see Syntax.
Verify the results. Run the following command to query the output table:
select * from wc_out;
Expected output:
+------------+------------+
| key        | cnt        |
+------------+------------+
| hello      | 1          |
| odps       | 1          |
+------------+------------+
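The two table names at the end of the JAR command in this step are passed to the main class as program arguments. The following is a minimal sketch of how a driver might read and validate them before configuring the job. The class and method names (DriverArgsSketch, parseTables) and the usage message are hypothetical, and the SDK job setup is deliberately left out.

```java
// Hypothetical sketch: reading the input and output table names that follow
// the main class in the JAR command. The MaxCompute job setup is omitted.
final class DriverArgsSketch {

    // Return {inputTable, outputTable}, or fail fast if the argument count is wrong.
    static String[] parseTables(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <in_table> <out_table>");
        }
        return new String[] {args[0], args[1]};
    }

    public static void main(String[] args) {
        // Simulates the trailing "wc_in wc_out" of the JAR command.
        String[] tables = parseTables(new String[] {"wc_in", "wc_out"});
        // prints input=wc_in, output=wc_out
        System.out.println("input=" + tables[0] + ", output=" + tables[1]);
    }
}
```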
What's next
To learn more about the MaxCompute client, see MaxCompute client (odpscmd).
To explore MapReduce sample code in detail, see Sample code.