
DataWorks: CDH MR node

Last Updated: Mar 26, 2026

Run MapReduce (MR) jobs on an Alibaba Cloud CDH cluster by creating a CDH MR node in DataWorks. Upload a compiled JAR package as a resource, configure the compute resource and resource group, then run and schedule the node for recurring execution.

Prerequisites

Before you begin, ensure that you have:

Root account users can skip the RAM user step. Grant the Workspace Administrator role with caution because it carries extensive permissions.

Create a CDH JAR resource

Upload your compiled JAR package to DataWorks so the CDH MR node can reference it during execution.

  1. Go to Resource management, click Click to Upload, and select the JAR package from your local machine.

  2. Set the following fields:

    Field            Description
    Storage Path     The path in DataWorks where the resource is stored
    Data Source      The data source associated with this resource
    Resource Group   The resource group used to manage and run the resource
  3. Click Save.
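Before uploading, it can help to confirm that the JAR actually contains the main class you plan to name in the node command (for example, cn.apache.hadoop.onaliyun.examples.EmrWordCount in the example later on this page). The following is a minimal sketch that uses only java.util.jar from the Java standard library. The class name JarMainClassCheck is hypothetical and is not part of DataWorks or CDH. It only checks that the compiled .class entry exists in the JAR; it does not verify that the class has a usable main method or that its Hadoop dependencies are present on the cluster.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical pre-upload check (not a DataWorks or CDH tool): confirms that a
 * local JAR contains the main class you intend to reference in the node command.
 */
public class JarMainClassCheck {

    /** Converts a fully qualified class name, such as a.b.C, to its JAR entry path a/b/C.class. */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the JAR at jarPath contains the compiled top-level class. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarMainClassCheck <jar-path> <main-class>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "found " : "missing ") + args[1]);
        // Exit code 0 if the class entry exists, 1 if it does not.
        System.exit(found ? 0 : 1);
    }
}
```

With Java 11 or later you can run the single source file directly, for example: java JarMainClassCheck.java onaliyun_mr_wordcount-1.0-SNAPSHOT.jar cn.apache.hadoop.onaliyun.examples.EmrWordCount. A non-zero exit code means the class entry was not found.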

Create a node

See Create a node for instructions.

Develop the node

Reference your JAR package in the node editor and add the command to run the MapReduce job.

  1. Open the CDH MR node. The code editor opens.

  2. In the Resource Management pane on the left, right-click the JAR resource and select Reference Resource. DataWorks inserts a reference statement in the following format:

    ##@resource_reference{"<jar-filename>"}
  3. Below the reference statement, add the command to run your MapReduce job. Use the following pattern:

    <jar-filename> <main-class> <input-path> <output-path>

    Example:

    ##@resource_reference{"onaliyun_mr_wordcount-1.0-SNAPSHOT.jar"}
    onaliyun_mr_wordcount-1.0-SNAPSHOT.jar cn.apache.hadoop.onaliyun.examples.EmrWordCount oss://onaliyun-bucket-2/cdh/datas/wordcount02/inputs oss://onaliyun-bucket-2/cdh/datas/wordcount02/outputs

    The bucket name and paths in this example are for illustration. Replace them with your actual OSS bucket and paths.
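To make the script format from steps 2 and 3 concrete, the following Java sketch splits a node script into its referenced JAR names and the command tokens <jar-filename> <main-class> <input-path> <output-path>. It is a hypothetical illustration of the format only, not how DataWorks itself parses node code. It assumes that the command is the first non-blank line that is not a reference statement, and the isReferenced check reflects the example above, where the JAR named in the command matches the file referenced with ##@resource_reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not DataWorks code) that splits a CDH MR node script into
 * its resource references and the command
 * <jar-filename> <main-class> <input-path> <output-path>.
 */
public class MrNodeScript {

    // Matches a whole line of the form ##@resource_reference{"<jar-filename>"}.
    private static final Pattern REFERENCE =
            Pattern.compile("##@resource_reference\\{\"([^\"]+)\"\\}");

    /** The parsed command line: JAR, main class, and the job's own arguments. */
    static final class Command {
        final String jar;
        final String mainClass;
        final List<String> jobArgs; // for example, the input path and output path

        Command(String jar, String mainClass, List<String> jobArgs) {
            this.jar = jar;
            this.mainClass = mainClass;
            this.jobArgs = jobArgs;
        }
    }

    /** Returns the JAR file names referenced in the script, in order of appearance. */
    static List<String> references(String script) {
        List<String> refs = new ArrayList<>();
        for (String line : script.split("\\R")) {
            Matcher m = REFERENCE.matcher(line.trim());
            if (m.matches()) {
                refs.add(m.group(1));
            }
        }
        return refs;
    }

    /** Assumes the command is the first non-blank line that is not a reference statement. */
    static Command command(String script) {
        for (String line : script.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || REFERENCE.matcher(trimmed).matches()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            if (tokens.length < 2) {
                throw new IllegalArgumentException(
                        "expected <jar-filename> <main-class> ...: " + trimmed);
            }
            List<String> jobArgs = new ArrayList<>(Arrays.asList(tokens).subList(2, tokens.length));
            return new Command(tokens[0], tokens[1], jobArgs);
        }
        throw new IllegalArgumentException("no command line found");
    }

    /** True if the JAR named in the command is also referenced by the script. */
    static boolean isReferenced(String script) {
        return references(script).contains(command(script).jar);
    }
}
```

Applied to the example script above, command(...) yields the JAR onaliyun_mr_wordcount-1.0-SNAPSHOT.jar, the main class cn.apache.hadoop.onaliyun.examples.EmrWordCount, and two job arguments (the OSS input and output paths), which the main class receives as its own arguments.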

Run and debug the node

Configure the compute resource and resource group, then run the node to verify the job executes correctly.

  1. In the Run Configuration section, set the following fields:

    Field              Description
    Compute Resource   Select the CDH cluster you registered in DataWorks.
    Resource Group     Select a scheduling resource group that has network connectivity to the data source. See Network connectivity solutions for how to connect a resource group to a data source.
  2. On the toolbar, click Run.

What's next

  • Schedule the node: To run the node on a recurring schedule, configure its Time Property and related scheduling properties in the Scheduling configuration panel on the right. See Node scheduling configuration.

  • Publish the node: Click the image icon to publish the node to the production environment. Only published nodes are scheduled for execution. See Publish a node.

  • Monitor runs: After publishing, monitor scheduled runs in the O&M Center. See Getting started with Operation Center.