A CDH (Cloudera's Distribution Including Apache Hadoop) Presto node lets you run distributed SQL queries against real-time data in your CDH environment directly from DataWorks DataStudio. Use this workflow to create the node, write Presto SQL, configure scheduling, and debug and deploy your task.
Prerequisites
Before you begin, ensure that you have:
- A workflow created in DataStudio. See Create a workflow.
- A CDH cluster registered to your DataWorks workspace. See Register a CDH or CDP cluster to DataWorks.
- A serverless resource group purchased and configured — associated with your workspace and with network access set up. See Create and use a serverless resource group.
- (RAM users only) The RAM user added to the workspace with the Development role. The Workspace Administrator role also works but grants broader permissions than needed — assign it with caution. See Add workspace members and assign roles to them.
Limitations
CDH Presto tasks run on serverless resource groups or old-version exclusive resource groups. We recommend that you run tasks on serverless resource groups.
Step 1: Create a CDH Presto node
Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.
On the DataStudio page, find the desired workflow, right-click the workflow name, and choose Create Node > CDH > CDH Presto.
Alternatively, move the pointer over the Create icon at the top of the Scheduled Workflow pane and create a CDH node as prompted.
In the Create Node dialog box, configure the Name parameter and click Confirm.
Step 2: Develop a Presto task
Double-click the node name to open its configuration tab, then perform the following operations.
Select a CDH compute engine instance (optional)
If multiple CDH clusters are registered to your workspace, select one from the Engine Instance CDH drop-down list. If only one CDH cluster is registered, skip this step.

Write SQL code
In the SQL editor, enter your Presto SQL statements. Example:

```sql
show tables;
select * from userinfo;
```

Use scheduling parameters
DataWorks scheduling parameters let you substitute dynamic values into task code at run time. Define variables in your SQL using the ${Variable} format, then assign values in the Scheduling Parameter section of the Properties tab.
```sql
select '${var}'; -- Replace var with a scheduling parameter value.
```

For supported formats, see Supported formats of scheduling parameters.
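For example, a common pattern is to filter a partitioned table by the data timestamp. The table name `ods_log` and partition column `dt` below are hypothetical; the sketch assumes a variable named `datetime` that you assign the built-in `$bizdate` value (the data timestamp in yyyymmdd format) in the Scheduling Parameter section of the Properties tab:

```sql
-- Hypothetical partitioned table; dt is assumed to hold yyyymmdd date strings.
-- In the Scheduling Parameter section, assign: datetime=$bizdate
select count(*)
from ods_log
where dt = '${datetime}'; -- Substituted with the data timestamp at run time.
```

At run time, DataWorks replaces `${datetime}` with the assigned value before the statement is submitted to Presto, so the query only scans the partition for that day.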
Step 3: Configure task scheduling properties
Click Properties in the right-side navigation pane to configure how and when the task runs.
| Configuration area | What to set | Reference |
|---|---|---|
| Basic properties | Basic task settings | Configure basic properties |
| Scheduling cycle and rerun | Run frequency, rerun policy, and parent node dependencies | Configure time properties |
| Scheduling dependencies | Same-cycle dependencies between nodes | Configure same-cycle scheduling dependencies |
| Resource properties | Resource group assignment for scheduling | Configure the resource property |
Configure Rerun and Parent Nodes on the Properties tab before you commit the task.
If the node needs access to the internet or a virtual private cloud (VPC), select the resource group for scheduling connected to the node. See Network connectivity solutions.
Step 4: Debug task code
(Optional) Select a resource group and assign values to custom parameters.
Click the run icon in the top toolbar. In the Parameters dialog box, select the resource group to use for debugging. If your task code uses scheduling parameters, assign values to those variables for the debug run. See Differences in value assignment logic among Run, Run with Parameters, and Perform Smoke Testing modes.
Save and run the SQL statements. Click the save icon to save the task, then click the run icon to run it.
(Optional) Perform smoke testing. You can perform smoke testing on the task in the development environment when you commit the task or after you commit the task. See Perform smoke testing.
What's next
Commit and deploy the task:
Click the save icon to save the task, then click the commit icon to commit it. In the Submit dialog box, fill in the Change description field and click Confirm.
If your workspace is in standard mode, deploy the task to the production environment: click Deploy in the top navigation bar of DataStudio. See Deploy tasks.
View and monitor the task:
Click Operation Center in the upper-right corner of the node configuration tab to go to Operation Center in the production environment.
View your scheduled task. See View and manage auto triggered tasks.
To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.