This topic describes the ODPS MR node. MaxCompute supports MapReduce programming APIs: you can use the Java API provided by MapReduce to write MapReduce programs that process data in MaxCompute, and then create ODPS MR nodes that run these programs in Task Scheduling.

For more information about how to write and use MapReduce programs, see the WordCount examples in the MaxCompute documentation.
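To make the WordCount idea concrete before packaging anything, the sketch below reproduces only its data flow in plain Java: a map step emits a (word, 1) pair for every word, and a reduce step sums the pairs per word. It deliberately does not use the MaxCompute MapReduce SDK, so the class and method names (WordCountSketch, map, reduce, run) are hypothetical and are not the SDK's API; a real ODPS MR job implements the same logic with the SDK's mapper and reducer classes and reads from and writes to MaxCompute tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the WordCount data flow (map -> shuffle -> reduce).
// Hypothetical class; it does NOT use the MaxCompute MapReduce SDK.
final class WordCountSketch {

    // Map step: split one input line into words and emit (word, 1) for each word.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // a blank line splits into a single empty token
                out.add(Map.entry(word, 1L));
            }
        }
        return out;
    }

    // Shuffle + reduce step: group the pairs by word and sum their counts.
    // TreeMap keeps keys sorted, similar to the key sorting done before reduce.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    // Runs the map step over every line, then reduces all emitted pairs.
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello world", "hello odps")));
        // prints {hello=2, odps=1, world=1}
    }
}
```

In an actual ODPS MR job, the class that plays the role of main here is the main class you later name in the node code.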

To use an ODPS MR node, you must first upload and publish the JAR resource that the node uses, and then create the ODPS MR node.

Create a resource instance

  1. Right-click Business Flow under Data Development, and select Create Business Flow.

  2. Right-click Resource, and select Create Resource > JAR.

  3. In the Create Resource dialog box, enter a resource name that follows the naming rules, set the resource type to JAR, and select a local JAR package.

    Note
    • If this JAR package has already been uploaded through the ODPS client, clear the Upload to ODPS check box. Otherwise, an error is reported during the upload.
    • The resource name does not have to match the name of the uploaded file.
    • The resource name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). Resource names are case-insensitive. The file extension must be .jar for a JAR resource and .py for a Python resource.
  4. Click Submit to submit the resource to the development scheduling server.

  5. Publish the resource.

    For more information about the operation, see Release management.
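The naming rules in the Note under step 3 can be checked locally before you upload a file. The sketch below is a hypothetical helper, not a DataWorks API; the console performs its own validation. Two details are assumptions drawn from the case-insensitivity rule: the extension check ignores case, and two names that differ only in case are treated as the same resource.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical pre-upload check mirroring the resource naming rules.
// Not a DataWorks API; the console still validates names itself.
final class ResourceNameCheck {

    enum ResourceType { JAR, PYTHON }

    // 1 to 128 characters; only letters, digits, underscores, and periods.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_.]{1,128}");

    static boolean isValid(String name, ResourceType type) {
        if (name == null || !ALLOWED.matcher(name).matches()) {
            return false;
        }
        // Assumption: because names are case-insensitive, ".JAR" counts as ".jar".
        String lower = name.toLowerCase(Locale.ROOT);
        String extension = (type == ResourceType.JAR) ? ".jar" : ".py";
        return lower.endsWith(extension);
    }

    // Assumption: names differing only in case refer to the same resource.
    static boolean sameResource(String a, String b) {
        return a.equalsIgnoreCase(b);
    }
}
```

For example, base_test.jar passes as a JAR resource, while base-test.jar fails because hyphens are not in the allowed character set.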

Create an ODPS MR node

  1. Right-click Business Flow under Data Development, and select Create Business Flow.

  2. Right-click Data Development, and select Create Data Development Node > ODPS MR.

  3. Edit the node code. Double-click the new ODPS MR node to open its code editor.

    An example of the node code:
    jar -resources base_test.jar -classpath ./base_test.jar com.taobao.edp.odps.brandnormalize.Word.NormalizeWordAll
    The parameters are as follows:
    • -resources base_test.jar specifies the name of the referenced JAR resource.
    • -classpath specifies the path of the JAR package.
    • com.taobao.edp.odps.brandnormalize.Word.NormalizeWordAll is the main class in the JAR package that is called during execution. It must match the main class name in the JAR package.

    When one MR job references multiple JAR resources, separate the paths in -classpath with commas (,), for example: -classpath ./xxxx1.jar,./xxxx2.jar.

  4. Configure node scheduling.

    Click Schedule on the right of the node editing area to open the Node Scheduling Configuration page. For more information, see Scheduling configuration.

  5. Submit the node.

    After you complete the configuration, click Save in the upper-left corner of the page or press Ctrl+S to submit (and unlock) the node in the development environment.

  6. Publish a node task.

    For more information about the operation, see Release management.

  7. Test in the production environment.

    For more information about the operation, see Cyclic task.
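The node code from step 3 has a fixed shape: the referenced JAR resources, the classpath entries (comma-separated when there are several JARs), and the main class. The hypothetical helper below assembles that line from a list of resource names and reads the classpath entries back out. It assumes each classpath entry is ./ followed by the resource name, as in the example, and that -resources also takes a comma-separated list when several JARs are referenced; check both against the MaxCompute jar command reference. The main class name com.example.Main used in the checks is a placeholder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that assembles the ODPS MR node code line.
// Assumes classpath entries are "./<resource name>" and that both
// -resources and -classpath take comma-separated lists.
final class MrNodeCommand {

    // Builds: jar -resources r1.jar,r2.jar -classpath ./r1.jar,./r2.jar <mainClass>
    static String build(List<String> jarResources, String mainClass) {
        if (jarResources.isEmpty()) {
            throw new IllegalArgumentException("at least one JAR resource is required");
        }
        StringJoiner resources = new StringJoiner(",");
        StringJoiner classpath = new StringJoiner(",");
        for (String jar : jarResources) {
            resources.add(jar);
            classpath.add("./" + jar); // resource path relative to the working directory
        }
        return "jar -resources " + resources + " -classpath " + classpath + " " + mainClass;
    }

    // Returns the comma-separated entries that follow -classpath, or an empty list.
    static List<String> classpathEntries(String command) {
        String[] tokens = command.trim().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].equals("-classpath")) {
                return Arrays.asList(tokens[i + 1].split(","));
            }
        }
        return List.of();
    }
}
```

With the single resource base_test.jar and the main class from step 3, build returns exactly the example node code shown there.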