DataWorks: Create a CDH Impala node

Last Updated: Jun 20, 2026

In DataWorks, you can use a CDH Impala node to write and run Impala SQL scripts. Compared with CDH Hive nodes, CDH Impala nodes provide faster query performance. This topic describes how to create and use a CDH Impala node.

Prerequisites

You have created a CDH cluster and registered it with DataWorks.

Before you create CDH-related nodes and develop CDH tasks, you must register your CDH cluster with a DataWorks workspace. For more information, see Bind a CDH compute resource in the old version of DataStudio.

Limitations

You can run this type of task on a serverless resource group or an old-version exclusive resource group for scheduling. We recommend using a serverless resource group.

Step 1: Create a CDH Impala node

  1. Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

  2. Right-click a workflow and choose Create Node > CDH > CDH Impala.

    Note

    You can also hover over the New button at the top and follow the on-screen instructions to create a CDH node.

  3. In the Create Node dialog box, enter a Name for the node and click OK.

Step 2: Develop an Impala task

Double-click the created node to open the task development page.

(Optional) Select a CDH engine instance

If multiple CDH clusters are registered with your workspace, select the one you need from the Engine Instance CDH drop-down list at the top of the page, for example, CDH production + test environment. If only one cluster is registered, no selection is needed. To access a domain that has an IP address allowlist, you must use an exclusive resource group for scheduling.

Simple example

Enter the task code in the SQL editor. Example:

-- List the tables in the current database.
SHOW TABLES;
-- Query all rows and columns of the userinfo table.
SELECT * FROM userinfo;
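A few other read-only statements are often useful when you first explore data from the node. The following is a minimal sketch that assumes only that the userinfo table from the example above exists in the current database. Nothing about its schema is assumed.

```sql
-- Assumes the userinfo table from the example above exists.
-- Show the column names and data types of the table.
DESCRIBE userinfo;

-- Count the rows without returning the data itself.
SELECT COUNT(*) AS row_count FROM userinfo;

-- Preview a small sample instead of returning the whole table.
SELECT * FROM userinfo LIMIT 10;
```

Adding a LIMIT clause while you explore keeps the result set small, which is usually preferable to an unbounded SELECT * on a large table.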

Using scheduling parameters

DataWorks provides scheduling parameters that dynamically pass values to your code during scheduled runs. Define variables in your code in the ${Variable name} format, and then assign values to these variables under Scheduling Settings > Parameter. For information about the supported formats for scheduling parameters, see Supported formats of scheduling parameters.

SELECT '${var}'; -- Use with scheduling parameters.
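A common use of scheduling parameters is to filter data by business date on each scheduled run. The sketch below is illustrative only. It assumes that userinfo has a string partition column named dt (a hypothetical column) and that a variable named bizdate has been assigned the built-in $bizdate value, which typically resolves to the data timestamp in yyyymmdd format, under Scheduling Settings > Parameter.

```sql
-- Hypothetical: dt is an assumed partition column on userinfo.
-- Assumed assignment in Scheduling Settings > Parameter: bizdate=$bizdate
SELECT COUNT(*) AS daily_rows
FROM userinfo
WHERE dt = '${bizdate}';
```

Under these assumptions, the ${bizdate} placeholder is replaced with the assigned value for each scheduled run, so the same code reads a different partition each day without being edited.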

Step 3: Configure task scheduling

If you need to run the task on a recurring schedule, click Scheduling in the right-side pane to configure its scheduling properties.

Step 4: Debug the code

  1. (Optional) Select a runtime resource group and assign values to custom parameters.

  2. Save and run the SQL statements.

    In the toolbar, click the Save icon to save the SQL statements, and then click the Run icon to run the task.

  3. (Optional) Perform smoke testing.

    You can perform smoke testing in the development environment when you commit the node or after you commit it. For more information, see Perform smoke testing.

Next steps

  1. Commit and deploy the task.

    1. In the toolbar, click the Save icon to save the node.

    2. In the toolbar, click the Commit icon to commit the task.

    3. In the Commit Node dialog box, enter a Change Description.

    4. Click OK.

    In a standard mode workspace, you must deploy the task to the production environment after you commit it. In the top menu bar, click Deploy. For more information, see Deploy tasks.

  2. View scheduled tasks.

    1. In the upper-right corner of the editor, click O&M Personnel to open the production environment's Operation Center.

    2. View the scheduled tasks that are running. For more information, see Manage scheduled tasks.

    To view more details about scheduled tasks, click Operation Center in the top menu bar. For more information, see Operation Center overview.