DataWorks:MaxCompute MR node

Last Updated: Mar 26, 2026

Use a MaxCompute MR node to schedule and run MapReduce jobs written in Java within DataWorks, and integrate them with other nodes in your workflow.

Background

MapReduce is a distributed computing framework that combines user-written business logic with built-in components into a complete distributed program that runs concurrently on a cluster, as popularized by Hadoop. MaxCompute provides two MapReduce programming interfaces:

  • MaxCompute MapReduce: The native interface. Delivers fast execution and rapid development without exposing the file system.

  • Extended MaxCompute MapReduce (MR2): An extension that supports more complex job scheduling. Its MapReduce implementation is consistent with the native interface.

For details on both interfaces, see MapReduce.
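For orientation, the map/shuffle/reduce flow that both interfaces implement can be sketched as a short local simulation in plain Python. This is an illustration of the programming model only, not the MaxCompute SDK; the function names here (run_mapreduce, map_fn, reduce_fn) are hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Local simulation of the MapReduce model: map each record to
    (key, value) pairs, shuffle by key, reduce each key's values."""
    # Map phase: emit intermediate key/value pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))
    # Shuffle phase: group values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: aggregate each group into one output row.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count expressed in this model: emit (word, 1) per word, then sum.
rows = [("project", "val_pro"), ("problem", "val_pro")]
counts = run_mapreduce(
    rows,
    map_fn=lambda row: [(word, 1) for word in row],
    reduce_fn=lambda key, values: sum(values),
)
print(counts)  # {'project': 1, 'val_pro': 2, 'problem': 1}
```

In a real MaxCompute job, the map and reduce functions are Java classes packaged in a JAR resource, and the framework performs the shuffle across the cluster.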

Prerequisites

Before you begin, ensure that you have:

Limitations

For limits that apply to MaxCompute MR nodes, see Limits.

Develop and run a MaxCompute MR node

This section walks through a WordCount example that counts the occurrences of each string in the wc_in table and writes the results to the wc_out table.

Step 1: Develop the MR code

  1. Upload, submit, and publish the mapreduce-examples.jar resource. See Resource management.

    For details on the internal logic of mapreduce-examples.jar, see the WordCount example.
  2. In the MaxCompute MR node editor, enter the following code:

    -- Create the input table.
    CREATE TABLE IF NOT EXISTS wc_in (key STRING, value STRING);
    -- Create the output table.
    CREATE TABLE IF NOT EXISTS wc_out (key STRING, cnt BIGINT);
    -- Create the system dual table.
    DROP TABLE IF EXISTS dual;
    CREATE TABLE dual (id BIGINT); -- If the pseudo table does not exist in the workspace, you must create it and initialize its data.
    -- Initialize data in the system pseudo table.
    INSERT OVERWRITE TABLE dual SELECT count(*) FROM dual;
    -- Insert sample data into the input table wc_in.
    INSERT OVERWRITE TABLE wc_in SELECT * FROM (
      SELECT 'project','val_pro' FROM dual
      UNION ALL
      SELECT 'problem','val_pro' FROM dual
      UNION ALL
      SELECT 'package','val_a' FROM dual
      UNION ALL
      SELECT 'pad','val_a' FROM dual
    ) b;
    -- Reference the JAR resource that you uploaded. Right-click the resource name in the Resource section and choose Reference Resource to auto-generate this statement.
    --@resource_reference{"mapreduce-examples.jar"}
    jar -resources mapreduce-examples.jar -classpath ./mapreduce-examples.jar com.aliyun.odps.mapred.open.example.WordCount wc_in wc_out

    The key parameters in the jar command are:

      • --@resource_reference: Auto-generated when you right-click a resource name in the Resource section and choose Reference Resource.
      • -resources: The name of the referenced JAR resource file.
      • -classpath: The path to the JAR file. When a resource is referenced, the path is always ./ followed by the JAR filename (for example, ./mapreduce-examples.jar).
      • com.aliyun.odps.mapred.open.example.WordCount: The main class in the JAR file called at execution. This must exactly match the main class defined in the JAR file.
      • wc_in: The input table for the MapReduce job.
      • wc_out: The output table for the MapReduce job.

Multiple JAR files

If the job depends on more than one JAR file, list all paths in -classpath, separated by commas:

-classpath ./xxxx1.jar,./xxxx2.jar

Step 2: Run the MR job

  1. In the Run Configuration section on the right, set the Compute Engine Instance, Compute Quota, and Resource Group.

    To access a data source on the Public Network or in a Virtual Private Cloud (VPC), use a scheduling resource group that can connect to the data source. See Network connectivity solutions.
  2. In the toolbar, click Run. In the Parameters dialog box, select your MaxCompute data source and run the job.

Step 3: (Optional) Verify the results

Query the output table using a MaxCompute SQL node:

SELECT * FROM wc_out;

Expected output:

+------------+------------+
| key        | cnt        |
+------------+------------+
| package    | 1          |
| pad        | 1          |
| problem    | 1          |
| project    | 1          |
| val_a      | 2          |
| val_pro    | 2          |
+------------+------------+
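As a sanity check, the expected output above can be reproduced locally with a short Python sketch that simulates the job's counting logic on the sample rows (this is not the MaxCompute SDK, just the same arithmetic):

```python
from collections import Counter

# Sample rows inserted into wc_in by the preparation SQL.
wc_in = [
    ("project", "val_pro"),
    ("problem", "val_pro"),
    ("package", "val_a"),
    ("pad", "val_a"),
]

# WordCount counts every string across both columns of wc_in.
wc_out = Counter(word for row in wc_in for word in row)

for key in sorted(wc_out):
    print(key, wc_out[key])
# package 1
# pad 1
# problem 1
# project 1
# val_a 2
# val_pro 2
```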

Configure scheduling and deploy

After developing and testing the node in the editor, complete the following steps to put it into production:

  1. Configure scheduling: If the node runs periodically, set its scheduling properties. See Node scheduling configuration.

  2. Deploy the node: Deploy the node to make it available for scheduling. See Node and workflow deployment.

  3. Monitor execution: After deployment, track the job status in Operation Center. See Getting started with Operation Center.

What's next

Explore more MaxCompute MR job examples for different scenarios:

For solutions to common issues, see FAQ about MaxCompute MapReduce.