MaxCompute supports the MapReduce API. You can create and commit ODPS MR nodes that call the Java API operations of MapReduce to develop MapReduce programs for processing data in MaxCompute.

For more information about how to edit and use ODPS MR nodes, see the WordCount sample in the MaxCompute documentation.

Before you create an ODPS MR node, you must upload, commit, and deploy the resources that the node requires.

Create a resource

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. On the DataStudio page, create a JAR resource.
    You can create a JAR resource in either of the following ways:
    • Move the pointer over the Create icon and choose MaxCompute > Resource > JAR.
    • Find the target workflow, click MaxCompute, right-click Resource, and choose Create > JAR.
  3. In the Create Resource dialog box that appears, enter the resource name and select the target folder. Then, click Upload and select the target file to upload.
    Note: If multiple MaxCompute computing engines are bound to the current workspace, you must select one from the Engine Instance MaxCompute drop-down list.

    In this example, the mapreduce_example.jar file is uploaded.

    • If the selected JAR package has been uploaded from the MaxCompute client, clear Upload to MaxCompute. If you do not clear it, an error will occur during the upload process.
    • The resource name can be different from the name of the uploaded file.
    • Convention for naming resources: A resource name can contain letters (case-insensitive), digits, underscores (_), and periods (.). It must be 1 to 128 characters in length. A JAR resource name must end with .jar. A Python resource name must end with .py.
  4. Click OK.
  5. Click the Commit icon in the toolbar to commit the resource to the development environment.
  6. Deploy the resource.

    For more information, see Deploy a node.
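
The naming convention in step 3 can be checked before you upload a file. The following is a minimal sketch in plain Java, not part of DataWorks or MaxCompute: the class name ResourceNameCheck and its method are hypothetical, and the .jar suffix is compared literally, which is an assumption about how the console applies the rule.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors the naming convention described in step 3. */
final class ResourceNameCheck {
    // Letters, digits, underscores, and periods; 1 to 128 characters in total.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_.]{1,128}");

    /** Returns true if the name follows the convention and ends with .jar. */
    static boolean isValidJarResourceName(String name) {
        // Assumption: the suffix is matched literally (lowercase ".jar").
        return name != null
                && ALLOWED.matcher(name).matches()
                && name.endsWith(".jar");
    }

    public static void main(String[] args) {
        System.out.println(isValidJarResourceName("mapreduce_example.jar")); // true
        System.out.println(isValidJarResourceName("my job.jar"));            // false: contains a space
        System.out.println(isValidJarResourceName("script.py"));             // false: not a JAR name
    }
}
```

A check like this only catches naming problems early; the console still performs its own validation when the resource is created.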

Create an ODPS MR node

  1. On the DataStudio page, find the target workflow, right-click MaxCompute, and choose Create > ODPS MR.
  2. In the Create Node dialog box that appears, set parameters and click Commit to create an ODPS MR node. On the node configuration tab that appears, edit the code of the ODPS MR node.

    After you edit the code of the ODPS MR node, you can save the code and commit the node. For more information, see ODPS MR node configuration tab.

    The sample code is as follows:
    -- Create an input table.
    CREATE TABLE IF NOT EXISTS jingyan_wc_in (key STRING, value STRING);
    -- Create an output table.
    CREATE TABLE IF NOT EXISTS jingyan_wc_out (key STRING, cnt BIGINT);
    -- Recreate the dual table. Skip these statements if the current workspace already has an initialized dual table.
    DROP TABLE IF EXISTS dual;
    CREATE TABLE dual (id BIGINT);
    -- Initialize the dual table with a single row.
    INSERT OVERWRITE TABLE dual SELECT count(*) FROM dual;
    -- Insert the sample data into the jingyan_wc_in table.
    INSERT OVERWRITE TABLE jingyan_wc_in SELECT * FROM (
        SELECT 'project', 'val_pro' FROM dual
        UNION ALL
        SELECT 'problem', 'val_pro' FROM dual
        UNION ALL
        SELECT 'package', 'val_a' FROM dual
        UNION ALL
        SELECT 'pad', 'val_a' FROM dual
    ) b;
    -- Reference the uploaded JAR package. To generate the reference statement, find the JAR package in the resource list, right-click it, and select Insert Resource Path.
    -- The resource name used below must match the name of the resource that you created.
    --@resource_reference{"mapreduce-examples.jar"}
    jar -resources mapreduce-examples.jar -classpath ./mapreduce-examples.jar com.aliyun.odps.mapred.open.example.WordCount jingyan_wc_in jingyan_wc_out
    The code is described as follows:
    • --@resource_reference: references a resource. Find the target resource, right-click it, and select Insert Resource Path to generate the reference statement.
    • -resources: the name of the referenced JAR resource.
    • -classpath: the path of the JAR resource. Because the resource has already been referenced, you can enter ./ followed by the resource name.
    • com.aliyun.odps.mapred.open.example.WordCount: the main class in the JAR resource to be called when the node runs. It must be the same as the main class name in the JAR resource.
    • jingyan_wc_in: the name of the input table of the ODPS MR node. The input table is created in the preceding code.
    • jingyan_wc_out: the name of the output table of the ODPS MR node. The output table is created in the preceding code.
    • If you use multiple JAR resources in a single ODPS MR node, separate the resource paths with commas (,), for example, -classpath ./xxxx1.jar,./xxxx2.jar.
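
The structure of the jar statement described above can be sketched as a small string builder. This is a plain Java illustration, not a DataWorks or MaxCompute API: the class JarCommandBuilder and the main class name com.example.MyDriver are hypothetical, and joining the -resources list with commas in the same way as -classpath is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles the jar statement of an ODPS MR node. */
final class JarCommandBuilder {
    static String build(List<String> jarResources, String mainClass,
                        String inputTable, String outputTable) {
        if (jarResources.isEmpty()) {
            throw new IllegalArgumentException("at least one JAR resource is required");
        }
        // Assumption: multiple resource names are comma-separated, like the class path.
        String resources = String.join(",", jarResources);
        // Each class path entry is ./ followed by the resource name.
        String classpath = jarResources.stream()
                .map(name -> "./" + name)
                .collect(Collectors.joining(","));
        return String.format("jar -resources %s -classpath %s %s %s %s",
                resources, classpath, mainClass, inputTable, outputTable);
    }

    public static void main(String[] args) {
        // com.example.MyDriver is a placeholder for the main class in your JAR.
        System.out.println(build(List.of("xxxx1.jar", "xxxx2.jar"),
                "com.example.MyDriver", "jingyan_wc_in", "jingyan_wc_out"));
        // prints: jar -resources xxxx1.jar,xxxx2.jar -classpath ./xxxx1.jar,./xxxx2.jar com.example.MyDriver jingyan_wc_in jingyan_wc_out
    }
}
```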
  3. Configure the node properties.

    Click the Properties tab in the right-side navigation pane. On the Properties tab that appears, set the relevant parameters. For more information, see Properties.

  4. Commit the node.

    After the node properties are configured, click the Save icon in the upper-left corner. Then commit the node, or commit and unlock it, to the development environment.

  5. Deploy the node.

    For more information, see Deploy a node.

  6. Test the node in the production environment.
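
When you test the node, it helps to know roughly what jingyan_wc_out should contain. The following sketch simulates a word count over the four sample rows in plain Java, without the MaxCompute SDK. It assumes that the job treats every column value of every input row as one word, which is an assumption about the sample job and is not stated in this topic; the class WordCountSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical local simulation of a word count over the sample input rows. */
final class WordCountSketch {
    /** Counts each column value of each row as one word (assumed job behavior). */
    static Map<String, Long> count(String[][] rows) {
        Map<String, Long> counts = new TreeMap<>(); // sorted by key for stable output
        for (String[] row : rows) {
            for (String value : row) {
                counts.merge(value, 1L, Long::sum); // "map" emits (value, 1); "reduce" sums
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same four rows that the sample code inserts into jingyan_wc_in.
        String[][] rows = {
            {"project", "val_pro"},
            {"problem", "val_pro"},
            {"package", "val_a"},
            {"pad", "val_a"},
        };
        count(rows).forEach((key, cnt) -> System.out.println(key + "\t" + cnt));
    }
}
```

Under that assumption, each of the four key values appears once and val_a and val_pro each appear twice. If the output table differs, check how the main class in your JAR tokenizes its input.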