A CDH Presto node runs SQL on Presto, a distributed SQL query engine, so you can perform real-time data analytics on the CDH cluster bound to your DataWorks workspace.
Prerequisites
Before you begin, ensure that you have:
An Alibaba Cloud CDH cluster registered in DataWorks. For setup instructions, see Data Studio: Associate a CDH computing resource.
Important: The Presto component must be installed on the CDH cluster, and its settings must be configured when you bind the cluster to your workspace.
(Optional) If you use a RAM user, the RAM user has been added to the workspace with the Developer or Workspace Administrator role. The Workspace Administrator role grants extensive permissions, so assign it with caution. For details, see Add members to a workspace. If you use an Alibaba Cloud account (root account), skip this step.
A Hive data source configured in DataWorks that has passed the connectivity test. For setup instructions, see Data Source Management.
Create a node
See Create a node for instructions.
Write SQL
Write your Presto SQL in the SQL editor. A minimal example:
-- List the tables available to the query
SHOW TABLES;
-- Return every row of the userinfo table
SELECT * FROM userinfo;
Use scheduling parameters
The editor supports scheduling parameters in the ${variable_name} format. Reference the variable in your code, then assign its value in Scheduling configuration > Scheduling parameters in the right-side panel. This lets you pass dynamic values to scheduled runs without modifying the code.
-- ${var} is resolved at runtime from Scheduling parameters
SELECT '${var}';
For the full list of supported parameter formats and expressions, see Sources and expressions of scheduling parameters.
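A common use of scheduling parameters is to read only the partition that matches a run's date. The sketch below is illustrative and rests on assumptions: the table userinfo and its string partition column dt (in yyyymmdd format) are hypothetical, and it assumes you assign bizdate=$bizdate under Scheduling parameters. Confirm the exact expressions your workspace supports in Sources and expressions of scheduling parameters.

```sql
-- Sketch only: userinfo and its partition column dt are hypothetical.
-- Assumed assignment in Scheduling parameters: bizdate=$bizdate
-- At run time, ${bizdate} is replaced with a date string such as 20240101,
-- so the placeholder is quoted to compare it with the string column dt.
SELECT *
FROM userinfo
WHERE dt = '${bizdate}'
LIMIT 100;
```

Filtering on the partition column keeps each scheduled run scoped to a single day of data instead of scanning the whole table.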
Run and debug
In Run Configuration > Compute resource, set the following fields:
Compute resource: Select your registered CDH cluster.
Resource group: Select a scheduling resource group that has passed the data source connectivity test. If none is available, see Network connectivity solutions.
Then click Run on the toolbar.
Next steps
Schedule the node: To run the node on a recurring schedule, configure Time Property and related properties in the Scheduling configuration panel. See Node scheduling configuration.
Publish to production: Click the publish icon to publish the node. Only published nodes run on a schedule in the production environment. See Publish a node.
Monitor runs: After publishing, track scheduled runs in Operation Center. See Getting started with Operation Center.