MaxCompute supports the MapReduce API. In DataWorks, you can create and commit ODPS MR nodes that call the MapReduce Java API. You can use these nodes to develop MapReduce programs that process data in MaxCompute.

Prerequisites

Required resources are uploaded and committed.

For more information about how to edit and use an ODPS MR node, see WordCount.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a JAR resource.
    1. On the Data Development tab, move the pointer over the Create icon and choose MaxCompute > Resource > JAR.
      Alternatively, you can click a workflow in the Business process section, right-click MaxCompute, and then choose New > Resource > JAR.
    2. In the New resource dialog box, set the Resource Name and Destination folder parameters.
      Note
      • If multiple MaxCompute compute engines are bound to the current workspace, you must select one from the MaxCompute Engine instance drop-down list.
      • If the JAR package you select has already been uploaded by using the MaxCompute client, clear Upload as an ODPS resource. Otherwise, an error occurs during the upload.
      • The resource name can be different from the name of the uploaded file.
      • A resource name can contain letters, digits, underscores (_), and periods (.), and is not case-sensitive. It must be 1 to 128 characters in length. A JAR resource name must end with .jar, and a Python resource name must end with .py.
    3. Click Upload, select a local JAR package, and then click Open.
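      In this example, the mapreduce_example.jar package is uploaded. The sample code in step 4 references the resource as mapreduce-examples.jar. If your Resource Name is different, use your Resource Name wherever that name appears in the code.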
    4. Click Confirm.
    5. Click the Save and Commit icons in the toolbar to save the resource and commit it to the development environment.
  3. Create an ODPS MR node.
    1. On the Data Development tab, move the pointer over the Create icon and choose MaxCompute > ODPS MR.
      Alternatively, you can click a workflow in the Business process section, right-click MaxCompute, and then choose New > ODPS MR.
    2. In the New node dialog box, set the Node name and Destination folder parameters.
      Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click Submit.
  4. On the node configuration tab, enter the following sample code:
    -- Create an input table.
    CREATE TABLE IF NOT EXISTS jingyan_wc_in (key STRING, value STRING);
    -- Create an output table.
    CREATE TABLE IF NOT EXISTS jingyan_wc_out (key STRING, cnt BIGINT);
    -- Create a helper table named dual. The DROP statement removes any existing dual table in the current workspace.
    DROP TABLE IF EXISTS dual;
    CREATE TABLE dual (id BIGINT);
    -- Initialize the dual table with exactly one row, so that each SELECT ... FROM dual below returns one row.
    INSERT OVERWRITE TABLE dual SELECT COUNT(*) FROM dual;
    -- Insert the sample data into the jingyan_wc_in table.
    INSERT OVERWRITE TABLE jingyan_wc_in SELECT * FROM (
      SELECT 'project', 'val_pro' FROM dual
      UNION ALL
      SELECT 'problem', 'val_pro' FROM dual
      UNION ALL
      SELECT 'package', 'val_a' FROM dual
      UNION ALL
      SELECT 'pad', 'val_a' FROM dual
    ) b;
    -- Reference the JAR resource: in the resource list, right-click the JAR resource and select Reference Resources.
    --@resource_reference{"mapreduce-examples.jar"}
    jar -resources mapreduce-examples.jar -classpath ./mapreduce-examples.jar com.aliyun.odps.mapred.open.example.WordCount jingyan_wc_in jingyan_wc_out
    Code description:
    • --@resource_reference: references a resource. Find the resource to be referenced in the resource list, right-click it, and then select Reference Resources to generate the reference statement.
    • -resources: the name of the referenced JAR resource.
    • -classpath: the path of the referenced JAR resource. Because the resource has already been referenced, enter only ./ followed by the resource name, for example, ./mapreduce-examples.jar.
    • com.aliyun.odps.mapred.open.example.WordCount: the fully qualified name of the main class that the node calls when it runs. It must match the name of the main class in the JAR resource.
    • jingyan_wc_in: the name of the input table of the ODPS MR node. The input table is created by using the preceding code.
    • jingyan_wc_out: the name of the output table of the ODPS MR node. The output table is created by using the preceding code.
    • If you reference multiple JAR resources in an ODPS MR node, separate the resource paths with commas (,), for example, -classpath ./xxxx1.jar,./xxxx2.jar.
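    If you generate ODPS MR node scripts programmatically, the rules above can be captured in a small helper. The following Java sketch is hypothetical: MrScriptBuilder and build are illustrative names, not part of any DataWorks or MaxCompute API. It assumes one --@resource_reference line per referenced resource and a comma-separated list for -resources; only the comma rule for -classpath comes from the description above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: assembles the script lines of an ODPS MR node that runs one main class.
class MrScriptBuilder {

    static String build(List<String> jarResources, String mainClass, String inputTable, String outputTable) {
        if (jarResources.isEmpty()) {
            throw new IllegalArgumentException("At least one JAR resource must be referenced");
        }
        StringBuilder script = new StringBuilder();
        // Assumption: one reference line per resource, as produced by Reference Resources.
        for (String jar : jarResources) {
            script.append("--@resource_reference{\"").append(jar).append("\"}\n");
        }
        // Assumption: -resources takes a comma-separated list of resource names.
        String resources = String.join(",", jarResources);
        // -classpath takes comma-separated ./<resource name> paths.
        String classpath = jarResources.stream()
                .map(jar -> "./" + jar)
                .collect(Collectors.joining(","));
        script.append("jar -resources ").append(resources)
              .append(" -classpath ").append(classpath)
              .append(' ').append(mainClass)
              .append(' ').append(inputTable)
              .append(' ').append(outputTable);
        return script.toString();
    }

    public static void main(String[] args) {
        // Reproduces the last two lines of the sample code in this step.
        System.out.println(build(
                List.of("mapreduce-examples.jar"),
                "com.aliyun.odps.mapred.open.example.WordCount",
                "jingyan_wc_in",
                "jingyan_wc_out"));
    }
}
```

    With a single resource, the output matches the --@resource_reference and jar lines of the sample code. Passing List.of("xxxx1.jar", "xxxx2.jar") yields -classpath ./xxxx1.jar,./xxxx2.jar, as in the multiple-resource example above.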
  5. On the node configuration tab, click Scheduling configuration in the right-side navigation pane and set the scheduling properties for the node. For more information, see Basic properties.
  6. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  7. Test the node. For more information, see View auto triggered nodes.
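After the test run, you can check the contents of jingyan_wc_out. The following Java sketch only simulates the word count locally on the four sample rows; it does not call the MaxCompute MapReduce API. It assumes that the WordCount example treats every column value of every input row as a word, so both the key and value columns are counted. The class name WordCountSimulation is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of the word count; not the MaxCompute MapReduce API.
class WordCountSimulation {

    // Map phase: emit (word, 1) for every column value of every row.
    // Reduce phase: sum the counts for each word. TreeMap keeps keys sorted for readable output.
    static Map<String, Long> count(List<String[]> rows) {
        Map<String, Long> counts = new TreeMap<>();
        for (String[] row : rows) {
            for (String column : row) {
                counts.merge(column, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same four rows that the sample code inserts into jingyan_wc_in.
        List<String[]> rows = List.of(
                new String[] {"project", "val_pro"},
                new String[] {"problem", "val_pro"},
                new String[] {"package", "val_a"},
                new String[] {"pad", "val_a"});
        count(rows).forEach((key, cnt) -> System.out.println(key + "\t" + cnt));
    }
}
```

Under that assumption, the simulation reports a count of 2 for val_pro and val_a and a count of 1 for project, problem, package, and pad. The row order of the actual output table is not guaranteed.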