
DataWorks:Create a CDH MR node

Last Updated:Mar 26, 2026

Use a CDH (Cloudera's Distribution Including Apache Hadoop) MapReduce (MR) node in DataWorks DataStudio to run MapReduce jobs on ultra-large datasets stored in your CDH cluster.

Prerequisites

Before you begin, ensure that you have:

Warning

The Workspace Administrator role grants broader permissions than needed for task development. Assign it only when strictly necessary.

Limitations

CDH MR tasks can run on serverless resource groups or on old-version exclusive resource groups for scheduling. We recommend that you use serverless resource groups.

Step 1: Create a CDH MR node

  1. Log on to the DataWorks console. In the top navigation bar, select a region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace and click Go to Data Development.

  2. On the DataStudio page, find your workflow, right-click its name, and choose Create Node > CDH > CDH MR.

  3. In the Create Node dialog box, set the Engine Instance, Path, and Name parameters.

  4. Click Confirm.

Step 2: Create and reference a CDH JAR resource

CDH MR nodes run JAR files uploaded to DataStudio as CDH JAR resources. Create the resource first, then reference it from your node.

Create the CDH JAR resource

  1. In the DataStudio file tree, find your workflow and click CDH.

  2. Right-click Resource and choose Create Resource > CDH JAR.

  3. In the Create Resource dialog box, click Upload and select your JAR file.


Reference the CDH JAR resource in your node

  1. Open the configuration tab of your CDH MR node.

  2. Under Resource in the CDH folder, right-click your resource name and select Insert Resource Path. A clause in the ##@resource_reference{""} format appears at the top of the editor, confirming the resource is referenced.


  3. Write your job command below the resource reference clause. The following example runs a word count job:

    ##@resource_reference{"onaliyun_mr_wordcount-1.0-SNAPSHOT.jar"}
    onaliyun_mr_wordcount-1.0-SNAPSHOT.jar cn.apache.hadoop.onaliyun.examples.EmrWordCount oss://onaliyun-bucket-2/cdh/datas/wordcount02/inputs oss://onaliyun-bucket-2/cdh/datas/wordcount02/outputs

    The command uses these parameters:

    - JAR file name: the name of the uploaded CDH JAR resource. Example: onaliyun_mr_wordcount-1.0-SNAPSHOT.jar
    - Main class: the fully qualified name of the MapReduce main class. Example: cn.apache.hadoop.onaliyun.examples.EmrWordCount
    - Input path: the OSS path to the input data directory. Example: oss://onaliyun-bucket-2/cdh/datas/wordcount02/inputs
    - Output path: the OSS path where the job writes its results. Example: oss://onaliyun-bucket-2/cdh/datas/wordcount02/outputs

    Replace the JAR file name, main class, and input/output paths with your actual values.
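For reference, the map-and-reduce logic that a word count main class such as EmrWordCount typically implements can be sketched in plain Python. This is an illustration of the algorithm only, not DataWorks or Hadoop API code; the actual job runs as compiled Java inside your JAR on the CDH cluster.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# In the real job, the input lines come from the OSS input path and the
# resulting counts are written to the OSS output path.
lines = ["hello hadoop", "hello cdh"]
print(reduce_phase(map_phase(lines)))
```

In the actual MapReduce job the framework shuffles and groups the mapper output by key between the two phases, so each reducer receives all counts for a given word.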

Note

Do not add comments to CDH MR node code.

Step 3: Configure scheduling properties

To have DataWorks run the task on a schedule, click Properties in the right-side navigation pane and configure the task's scheduling properties.

Important

Configure both Rerun and Parent Nodes on the Properties tab before committing the task.

Step 4: Debug the task

  1. (Optional) Select a resource group and configure parameters. Click the Run with Parameters icon in the toolbar. In the Parameters dialog box, select the resource group to use for debugging. If your code uses scheduling parameters, assign values to them for the debug run. See Differences in scheduling parameter value assignment among run modes.

  2. Save and run the task. Click the Save icon to save the task, then click the Run icon to run it.

  3. (Optional) Run smoke testing. Smoke testing verifies the task in the development environment before or after you commit it. See Perform smoke testing.

What's next

Commit and deploy the task:

  1. Click the Save icon to save the task.

  2. Click the Submit icon to commit the task.

  3. In the Submit dialog box, enter a Change description and click Confirm.

If your workspace is in standard mode, deploy the task to the production environment after committing. Click Deploy in the top navigation bar. See Deploy tasks.

Monitor the task in Operation Center:

Click Operation Center in the upper-right corner of the node's configuration tab to view the task in the production environment. See View and manage auto triggered tasks.

To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. See Operation Center overview.