Use a CDH Hive node in DataWorks to run Hive tasks, such as data queries or batch processing jobs, on your deployed CDH cluster. This topic describes how to configure and use a CDH Hive node.
Prerequisites
Before you begin, ensure that you have:
An Alibaba Cloud CDH cluster bound to a DataWorks workspace, with the Hive component installed and Hive connection information configured. For more information, see Data Studio: Associate a CDH computing resource.
A Hive data source configured in DataWorks that has passed the connectivity test. For more information, see Data Source Management.
(RAM users only) A RAM user account that has been added to the workspace with the Developer or Workspace Administrator role. For more information, see Add members to a workspace.
The Workspace Administrator role has extensive permissions. Grant it with caution. Root account users can skip this step.
Supported resource groups
Run this node type using one of the following resource groups:
| Resource group | Description |
|---|---|
| Serverless Resource Group (recommended) | — |
| Exclusive Resource Group for Scheduling | — |
Create a node
For instructions, see Create a node.
Write task code
Write your Hive SQL in the node editor. To pass values dynamically at runtime, define variables using the ${variable_name} syntax, then assign values to those variables under Scheduling Configuration > Scheduling Parameters.
SHOW TABLES;
SELECT * FROM userinfo;
-- You can use this with scheduling parameters.
SELECT '${var}';
For supported expressions and value sources, see Sources and expressions of scheduling parameters.
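To make the effect of a ${variable_name} placeholder concrete, the following is a minimal Python sketch of plain text substitution. It is only an illustration, not how DataWorks resolves scheduling parameters internally. The `render` helper and the sample value `20240101` are hypothetical, and the sketch assumes that the value assigned to the variable is inserted into the SQL text as-is before the statement runs.

```python
from string import Template

# Node code taken from the sample above; ${var} is the placeholder.
NODE_SQL = "SELECT '${var}';"


def render(sql: str, params: dict) -> str:
    """Replace each ${name} placeholder with its value from params.

    Hypothetical helper for illustration only. Raises KeyError when a
    placeholder has no assigned value, which mirrors forgetting to
    assign the variable under Scheduling Parameters.
    """
    return Template(sql).substitute(params)


# Assumed example value; in DataWorks the value would come from the
# scheduling parameter assigned to the variable named var.
rendered = render(NODE_SQL, {"var": "20240101"})
print(rendered)
```

Because the substitution is textual, the quotation marks around '${var}' in the node code remain part of the statement, so the value reaches Hive as a string literal.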
Run the node
In the Scheduling Configuration tab, go to the Run Configuration section and configure the following settings:
| Setting | What to select | Notes |
|---|---|---|
| Compute Resource | The CDH cluster registered in DataWorks | Select the cluster name you registered when binding the CDH computing resource. |
| Resource Group | The scheduling resource group that passed the connectivity test | For connectivity options, see Network connectivity solutions. |

In the toolbar at the top of the node editor, click Run.
What's next
Node scheduling configuration: Set up recurring runs by configuring Time Property and other scheduling properties in the Scheduling Configuration panel.
Publish a node: Click the publish icon to publish the node to the production environment. Only published nodes are scheduled.
Task O&M: Monitor scheduled runs in O&M Center after publishing. For more information, see Getting started with Operation Center.