DataWorks: Create a CDH MR node

Last Updated: Jun 20, 2026

In DataWorks, you can create CDH MR (MapReduce) nodes to process large-scale datasets.

Prerequisites

  • A workflow is created in DataStudio.

    In DataStudio, development tasks are organized into workflows. You must create a workflow before you can create a node. For more information, see Create a workflow.

  • A CDH cluster is created and registered with your DataWorks workspace.

    You must register your CDH cluster with a DataWorks workspace before creating CDH nodes and tasks. For more information, see Bind a CDH compute resource in the old version of DataStudio.

  • (Optional) If you are using a RAM user, the user must be added to the workspace and assigned the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions, so assign it with caution. For more information on adding members, see Add members to a workspace.

  • A serverless resource group is purchased and configured. The configuration includes binding the resource group to your workspace and setting up the network. For more information, see Use a serverless resource group.

Limitations

You can run this type of task on a serverless resource group or an old-version exclusive resource group for scheduling. We recommend using a serverless resource group.

Step 1: Create a CDH MR node

  1. Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

  2. Right-click a workflow and choose Create Node > cdh > CDH MR.

  3. In the Create Node dialog box, configure the node's engine instance, path, and name.

  4. Click OK. The node editor opens, where you can develop and configure the task.

Step 2: Create and reference a CDH JAR resource

DataWorks allows you to upload resources from your local machine to DataStudio and then reference them in your nodes.

  1. Create a CDH JAR resource.

    In the corresponding workflow, right-click cdh > Resources and choose Create Resource > CDH JAR. In the Create Resource dialog box, click Click Upload and select the file that you want to upload.

    In the dialog box, the Storage path defaults to /user/admin/lib. If Kerberos authentication is enabled, you must grant the current user write permissions on this directory. The uploaded JAR package cannot exceed 50 MB in size, and its Name must end with the .jar suffix.

  2. Reference the CDH JAR resource.

    1. Open the CDH MR node that you created to go to its editing page.

    2. Under cdh > Resources, find the resource that you want to reference (for example, onaliyun_mr_wordcount-1.0-SNAPSHOT.jar). Right-click the resource name and choose Insert Resource Path.

      Insert Resource Path adds a statement in the ##@resource_reference{""} format to the code editor, which indicates that the resource is referenced. In the following example, the line after the reference statement specifies the JAR package, the main class, the input path, and the output path, in that order. Replace the resource package, bucket name, and paths with your actual values.

      ##@resource_reference{"onaliyun_mr_wordcount-1.0-SNAPSHOT.jar"}
      onaliyun_mr_wordcount-1.0-SNAPSHOT.jar cn.apache.hadoop.onaliyun.examples.EmrWordCount oss://onaliyun-bucket-2/cdh/datas/wordcount02/inputs oss://onaliyun-bucket-2/cdh/datas/wordcount02/outputs

      Note: Comments are not supported in the code editor for CDH MR nodes.
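The upload constraints and the command layout described in this step can be checked locally before you open DataStudio. The following Python sketch is illustrative only and is not meant to be pasted into the node editor. The helper names check_jar_resource and build_node_code are hypothetical and are not part of DataWorks, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption. The sketch validates a JAR name and size, and then assembles the two lines of node code in the order used by the example above.

```python
from typing import List

# Assumption: the documented "50 MB" limit is read as 50 * 1024 * 1024 bytes.
MAX_JAR_BYTES = 50 * 1024 * 1024


def check_jar_resource(name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not name.endswith(".jar"):
        problems.append("resource name must end with .jar")
    if size_bytes > MAX_JAR_BYTES:
        problems.append("JAR package exceeds 50 MB")
    return problems


def build_node_code(jar_name: str, main_class: str,
                    input_path: str, output_path: str) -> str:
    """Assemble the reference statement followed by the command line."""
    reference = '##@resource_reference{"%s"}' % jar_name
    command = " ".join([jar_name, main_class, input_path, output_path])
    return reference + "\n" + command


if __name__ == "__main__":
    jar = "onaliyun_mr_wordcount-1.0-SNAPSHOT.jar"
    # Hypothetical file size, used only to exercise the check.
    print(check_jar_resource(jar, 12 * 1024 * 1024))
    print(build_node_code(
        jar,
        "cn.apache.hadoop.onaliyun.examples.EmrWordCount",
        "oss://onaliyun-bucket-2/cdh/datas/wordcount02/inputs",
        "oss://onaliyun-bucket-2/cdh/datas/wordcount02/outputs",
    ))
```

In practice, Insert Resource Path generates the reference statement for you. The builder is only useful for reviewing the command line that follows it.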

Step 3: Configure task scheduling

If you need to run the task on a recurring schedule, click Scheduling in the right-side pane to configure its scheduling properties.

Step 4: Debug the code

  1. (Optional) Select a runtime resource group and assign values to custom parameters.

  2. Save and run the task.

    In the toolbar, click the Save icon to save the code, and then click the Run icon to run the task.

  3. (Optional) Perform smoke testing.

    You can run smoke testing in the development environment either when you commit the node or after the node is committed. For more information, see Perform smoke testing.
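When you debug a job such as the word-count example from Step 2, a reference result for a small sample input makes it easier to judge whether the output is correct. The following Python sketch is a local approximation and does not reproduce the logic of the EmrWordCount class. It assumes whitespace tokenization and assumes that the job writes tab-separated word and count pairs, one per line. This topic confirms neither assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def reference_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count whitespace-separated tokens (assumed tokenization)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def parse_job_output(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word<TAB>count' lines (assumed output format)."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t")
        result[word] = int(count)
    return result


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real job run.
    sample_input = ["hello world", "hello cdh"]
    sample_output = ["cdh\t1", "hello\t2", "world\t1"]
    matches = reference_word_count(sample_input) == parse_job_output(sample_output)
    print("output matches reference:", matches)
```

Comparing dictionaries rather than raw lines keeps the check independent of the order in which the job writes its results.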

Next steps

  1. Commit and deploy the task.

    1. In the toolbar, click the Save icon to save the node.

    2. In the toolbar, click the Commit icon to commit the task.

    3. In the Commit Node dialog box, enter a Change Description.

    4. Click OK.

    In a standard mode workspace, you must deploy the task to the production environment after you commit it. In the top menu bar, click Deploy. For more information, see Deploy tasks.

  2. View scheduled tasks.

    1. In the upper-right corner of the editor, click Operation Center to open Operation Center in the production environment.

    2. View the scheduled tasks that are running. For more information, see Manage scheduled tasks.

    To view more details about scheduled tasks, click Operation Center in the top menu bar. For more information, see Operation Center overview.