
DataWorks:Create a CDH Impala node

Last Updated: Mar 25, 2026

CDH Impala nodes let you run Impala SQL queries directly in DataWorks DataStudio against a Cloudera's Distribution Including Apache Hadoop (CDH) cluster. Compared with CDH Hive nodes, CDH Impala nodes provide higher query performance, making them a better choice for interactive and ad hoc queries.

Prerequisites

Before you begin, ensure that you have:

  1. Registered a CDH cluster to your DataWorks workspace. The node runs against this cluster, and you select it when you develop the task.

  2. Created a workflow in DataStudio. The CDH Impala node is created in a workflow.

Limitations

CDH Impala tasks run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups.

Step 1: Create a CDH Impala node

  1. Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.

  2. In the Scheduled Workflow pane, right-click the workflow you want, then choose Create Node > CDH > CDH Impala.

    Alternatively, hover over the Create icon at the top of the Scheduled Workflow pane and create a CDH Impala node from there.
  3. In the Create Node dialog box, enter a Name and click Confirm.

Step 2: Develop an Impala task

Double-click the node name to open its configuration tab.

Select a CDH cluster (optional)

If multiple CDH clusters are registered to your workspace, select one from the Engine Instance CDH drop-down list. If only one cluster is registered, skip this step.


Write SQL code

In the SQL editor, enter your Impala SQL. For example:

SHOW TABLES;

-- userinfo is a sample table; replace it with a table in your cluster.
SELECT * FROM userinfo;
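If you are starting from an empty database, a minimal end-to-end sketch might look like the following. The userinfo table and its columns are illustrative and not part of any DataWorks or CDH default:

```sql
-- Illustrative only: create a small table, load one row, and query it.
CREATE TABLE IF NOT EXISTS userinfo (
  user_id   BIGINT,
  user_name STRING
)
STORED AS PARQUET;

INSERT INTO userinfo VALUES (1, 'alice');

-- Impala executes this interactively, which is why CDH Impala nodes
-- suit ad hoc queries better than CDH Hive nodes.
SELECT user_id, user_name FROM userinfo;
```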

Use scheduling parameters

DataWorks scheduling parameters let you inject dynamic values into your SQL at runtime. Define a variable in your code using ${variable} syntax, then assign a value in the Scheduling Parameter section on the Properties tab.

SELECT '${var}'; -- Replace var with a scheduling parameter value at runtime.

For supported formats, see Supported formats of scheduling parameters.
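As a concrete pattern, suppose the Scheduling Parameter section on the Properties tab defines a variable named bizdate with the built-in value $bizdate; the node could then filter a date partition as follows. The dt column is hypothetical:

```sql
-- Assumes Properties > Scheduling Parameter defines: bizdate = $bizdate,
-- which DataWorks replaces at runtime with the data timestamp (yyyymmdd).
SELECT *
FROM userinfo
WHERE dt = '${bizdate}';
```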

Step 3: Configure scheduling properties

To run the task on a periodic schedule, click Properties in the right-side navigation pane and configure the node's scheduling properties, such as its schedule, scheduling dependencies, and the resource group for scheduling.

Step 4: Debug the task

  1. (Optional) Select a resource group and assign values to scheduling parameters. Click the Run with Parameters icon in the top toolbar to open the Parameters dialog box. Select the resource group to use for debugging. If your SQL uses scheduling parameters, assign concrete values to each variable for this debug run. For details on how values are assigned in different run modes, see What are the differences in the value assignment logic of scheduling parameters among the Run, Run with Parameters, and Perform Smoke Testing in Development Environment modes?

  2. Save and run the SQL. Click the Save icon to save, then click the Run icon to execute the SQL.

  3. (Optional) Run smoke testing. Smoke testing validates the task logic in the development environment before or after committing. See Perform smoke testing.

What's next

Commit and deploy the task

  1. Click the Save icon to save.

  2. Click the Submit icon to commit.

  3. In the Submit dialog box, enter a Change description and click Confirm.

If your workspace is in standard mode, deploy the task to the production environment after committing: click Deploy in the top navigation bar of DataStudio. See Deploy tasks.

View the task in Operation Center

  1. Click Operation Center in the upper-right corner of the node configuration tab to go to Operation Center in the production environment.

  2. Locate your task and check its run status, logs, and rerun history. See View and manage auto triggered tasks.

For a full overview of Operation Center, see Overview.