CDH Impala nodes let you run Impala SQL queries directly in DataWorks DataStudio against a Cloudera's Distribution Including Apache Hadoop (CDH) cluster. Compared with CDH Hive nodes, CDH Impala nodes provide higher query performance, which makes them a better choice for interactive and ad hoc queries.
Prerequisites
Before you begin, ensure that you have:
Registered a CDH cluster with your DataWorks workspace. See DataStudio (old version): Associate a CDH computing resource.
Limitations
CDH Impala tasks run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups.
Step 1: Create a CDH Impala node
Go to the DataStudio page. Log on to the DataWorks console and select the desired region in the top navigation bar. In the left-side navigation pane, choose Data Development and O&M > Data Development. Then select your workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane, right-click the workflow you want, then choose Create Node > CDH > CDH Impala.
Alternatively, hover over the Create icon at the top of the Scheduled Workflow pane and create a CDH Impala node from there.
In the Create Node dialog box, enter a Name and click Confirm.
Step 2: Develop an Impala task
Double-click the node name to open its configuration tab.
Select a CDH cluster (optional)
If multiple CDH clusters are registered to your workspace, select one from the Engine Instance CDH drop-down list. If only one cluster is registered, skip this step.

Write SQL code
In the SQL editor, enter your Impala SQL. For example:
SHOW tables;
SELECT * FROM userinfo;
Use scheduling parameters
DataWorks scheduling parameters let you inject dynamic values into your SQL at runtime. Define a variable in your code using ${variable} syntax, then assign a value in the Scheduling Parameter section on the Properties tab.
SELECT '${var}'; -- Replace var with a scheduling parameter value at runtime.
For supported formats, see Supported formats of scheduling parameters.
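As an illustration, a daily scheduled task can use a variable to read one date partition per run. The userinfo table comes from the earlier example; the dt partition column and the bizdate variable name are assumptions for this sketch, not fixed names.

```sql
-- In the node's SQL code, reference the variable (dt and bizdate are illustrative):
SELECT *
FROM userinfo
WHERE dt = '${bizdate}';

-- On the Properties tab, assign the variable a value, for example:
--   bizdate = $[yyyymmdd-1]
-- At run time, DataWorks replaces ${bizdate} with the previous day's
-- date in yyyymmdd format, so each run reads exactly one partition.
```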
Step 3: Configure scheduling properties
To run the task on a periodic schedule, click Properties in the right-side navigation pane and configure the following:
Basic properties — configure basic properties for the task. See Configure basic properties.
Time properties and dependencies — scheduling cycle, rerun behavior, and upstream node dependencies. See Configure time properties and Configure same-cycle scheduling dependencies.
Important: Configure the Rerun and Parent Nodes parameters before committing the task.
Resource properties — the resource group that runs the task. If the task needs to access the internet or a virtual private cloud (VPC), select a resource group that has the required network connectivity. See Configure the resource property and Network connectivity solutions.
Step 4: Debug the task
(Optional) Select a resource group and assign values to scheduling parameters. Click the Run with Parameters icon in the top toolbar to open the Parameters dialog box. Select the resource group to use for debugging. If your SQL uses scheduling parameters, assign concrete values to each variable for this debug run. For details on how values are assigned in different run modes, see What are the differences in the value assignment logic of scheduling parameters among the Run, Run with Parameters, and Perform Smoke Testing in Development Environment modes?
Save and run the SQL. Click the save icon in the top toolbar, then click the run icon to execute the SQL.
(Optional) Run smoke testing. Smoke testing validates the task logic in the development environment before or after committing. See Perform smoke testing.
What's next
Commit and deploy the task
Click the save icon in the top toolbar to save the task. Click the commit icon to commit it. In the Submit dialog box, enter a Change description and click Confirm.
If your workspace is in standard mode, deploy the task to the production environment after committing: click Deploy in the top navigation bar of DataStudio. See Deploy tasks.
View the task in Operation Center
Click Operation Center in the upper-right corner of the node configuration tab to go to Operation Center in the production environment.
Locate your task and check its run status, logs, and rerun history. See View and manage auto triggered tasks.
For a full overview of Operation Center, see Overview.