
DataWorks:CDH Hive nodes

Last Updated:Nov 18, 2025

If you have a Cloudera Distribution for Hadoop (CDH) cluster, you can use CDH Hive nodes in DataWorks to run Hive tasks, such as data query jobs and batch data processing. This topic describes how to configure and use CDH Hive nodes.

Prerequisites

  • An Alibaba Cloud CDH cluster is created and attached to a DataWorks workspace. For more information, see Data Development (New): Attach a CDH computing resource.

    Important

    The Hive component must be installed on the CDH cluster, and the Hive connection information must be configured when you attach the cluster.

  • (Optional) If you use a RAM user, the user must be added to the corresponding workspace for task development and granted the Developer or Workspace Administrator role. The Workspace Administrator role has extensive permissions and must be granted with caution. For more information about adding members, see Add members to a workspace.

    Note

    If you use an Alibaba Cloud account, you can skip this step.

  • A Hive data source is configured in DataWorks and passes the connectivity test. For more information, see Data Source Management.

Limits

You can run this type of task on Serverless resource groups (recommended) or legacy exclusive resource groups.

Create a node

For more information, see Create a node.

Develop the node

In the SQL editing area, develop the code for the node. In your code, define variables in the ${variable_name} format. Then, in the Scheduling Parameters subsection of the Scheduling Configurations pane on the right side of the node editing page, assign values to the variables. This lets you dynamically pass parameters to the code in scheduling scenarios. For more information, see Supported formats for scheduling parameters. The following is an example.

SHOW TABLES;

SELECT * FROM userinfo;
-- You can use this with scheduling parameters.
SELECT '${var}';
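A common use of scheduling parameters in Hive tasks is to filter a date-partitioned table by business date. The following sketch assumes a hypothetical table user_log with a partition column dt; in Scheduling Parameters, you would assign the ${bizdate} variable a date expression so that each scheduled run substitutes the corresponding date at run time.

-- Hypothetical example: the table user_log and partition column dt
-- are illustrative only and must exist in your own Hive metastore.
SELECT uid, action
FROM user_log
WHERE dt = '${bizdate}';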

Test the node

  1. In the Computing Resources section of Debug Configuration, you can configure the Computing Resource and Resource Group.

    1. Set Computing Resource to the name of the CDH cluster that you registered in DataWorks.

    2. Set Resource Group to the scheduling resource group that passed the connectivity test with the data source. For more information, see Network connectivity solutions.

  2. Click Run Job on the toolbar at the top of the node editing page.

What to do next

  • Node scheduling: To run a node in a project folder periodically, set a Scheduling Policy in the Scheduling Configuration pane on the right side of the node editing page and configure the scheduling properties.

  • Publish a node: If the node needs to run in the production environment, click the publish icon in the toolbar to publish it. A node in the project folder runs on a schedule only after it is published to the production environment.

  • Node O&M: After the node is published, you can view the status of the auto triggered task in Operation Center. For more information, see Get started with Operation Center.