
DataWorks: CDH Impala node

Last Updated: Mar 26, 2026

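A CDH Impala node lets you write and run Impala SQL scripts in DataWorks. Impala executes queries on its own MPP engine instead of compiling them into batch jobs as Hive does, so it typically returns results faster than a CDH Hive node, especially for interactive queries. This guide walks through creating, developing, debugging, and publishing a CDH Impala node.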

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud CDH cluster bound to a DataWorks workspace. For details, see Data Studio: Associate a CDH computing resource.

    Important

    The Impala component must be installed on your CDH cluster, and its connection information must be configured when you bind the cluster.

  • (Optional, RAM users only) Your RAM user has been added to the workspace with the Developer or Workspace Administrator role. Grant the Workspace Administrator role with caution because it carries extensive permissions. For details, see Add members to a workspace. If you use a root account, skip this step.

  • A Hive data source configured in DataWorks with a successful connectivity test. For details, see Data Source Management.

Create a node

For instructions, see Create a node.

Develop a node

Write your task code in the SQL editor. To pass dynamic values at runtime, define variables in your SQL using the ${VariableName} format. Then assign values to each variable in Scheduling Configuration > Scheduling Parameters on the right side of the node editor. DataWorks substitutes those values when the node runs. For more information, see Sources and expressions of scheduling parameters.

Example:

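-- List the tables in the current database.
SHOW TABLES;

-- Query all rows in the userinfo table.
SELECT * FROM userinfo;

-- ${var} is replaced at run time with the value assigned in Scheduling Parameters.
SELECT '${var}';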
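As a further sketch of variable substitution, the script below filters a single date partition. It assumes that userinfo is partitioned by a STRING column named dt holding yyyymmdd values, and that Scheduling Parameters contains the assignment bizdate=$bizdate, where $bizdate is the DataWorks data timestamp in yyyymmdd format. The column name dt and the variable name bizdate are illustrative, not required.

```sql
-- Assumption: userinfo is partitioned by a STRING column dt holding yyyymmdd values.
-- Assumption: Scheduling Parameters contains the assignment bizdate=$bizdate.
-- At run time, DataWorks replaces ${bizdate} with the data timestamp, for example 20260325.
SELECT COUNT(*) AS row_count
FROM userinfo
WHERE dt = '${bizdate}';
```

The value is substituted as plain text before Impala parses the script, so, as with SELECT '${var}', the single quotes around ${bizdate} are what turn the substituted value into a string literal.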

Debug a node

  1. In Run Configuration > Compute Resource, configure the following:

    • Compute Resource: Select the CDH cluster you registered in DataWorks.
    • Resource Group: Select a Scheduling Resource Group that has passed the connectivity test with your data source. For details, see Network connectivity solutions.
  2. On the toolbar at the top of the node editor, click Run.
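If a run fails, it can help to separate connectivity problems from problems in your own SQL. The minimal check sketched below calls Impala's built-in version() and current_database() functions, which read no tables; if it succeeds, the selected compute resource and resource group can reach the Impala service of the CDH cluster.

```sql
-- Returns the Impala build string; success confirms that the resource group
-- can reach the Impala service configured for the CDH cluster.
SELECT version();

-- Shows which database unqualified table names such as userinfo resolve to.
SELECT current_database();
```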

What's next

  • Node scheduling configuration: To run the node automatically on a recurring schedule, configure Time Property and related scheduling properties in the Scheduling Configuration panel on the right side of the node editor.

  • Publish a node: To move the node to the production environment, click the publish icon in the toolbar. Only nodes published to the production environment are scheduled.

  • Getting started with Operation Center: After publishing, monitor scheduled runs in Operation Center.
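Once Time Property is configured, the node runs once per scheduling cycle. For a daily node, a common pattern is to write exactly one partition per run, so that rerunning an instance replaces that day's data instead of duplicating it. The sketch below assumes hypothetical tables ods_userinfo and dws_user_daily, both partitioned by a STRING column dt, and the assignment bizdate=$bizdate in Scheduling Parameters; adapt the names to your own schema.

```sql
-- Hypothetical target table, partitioned by business date.
CREATE TABLE IF NOT EXISTS dws_user_daily (
  user_count BIGINT
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- INSERT OVERWRITE replaces only the partition for this run's date, so
-- rerunning the same instance does not duplicate rows.
-- ods_userinfo is a hypothetical source table partitioned by dt.
INSERT OVERWRITE TABLE dws_user_daily PARTITION (dt = '${bizdate}')
SELECT COUNT(*)
FROM ods_userinfo
WHERE dt = '${bizdate}';
```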