Use an E-MapReduce (EMR) Trino node in DataWorks to run interactive SQL queries across multiple data sources — such as Hive, MySQL, and others — without moving data between systems. For background on Trino, see Trino.
Prerequisites
Before you begin, make sure that you have:
An Alibaba Cloud EMR cluster registered with your DataWorks workspace. Tasks cannot be created until the cluster is bound. See Bind an EMR compute engine in the legacy DataStudio.
A serverless resource group purchased, bound to your workspace, and configured with network access. See Use serverless resource groups.
A workflow created in DataStudio. All nodes must belong to a workflow. See Create a workflow.
(Optional) If developing as a RAM user, the RAM user added to the workspace with the Develop or Workspace Administrator role. The Workspace Administrator role carries extensive permissions — assign it with caution. See Add members to a workspace.
Limitations
EMR Trino tasks run only on a serverless resource group.
To manage metadata for a DataLake or custom cluster — including real-time metadata display, audit logs, data lineage, and EMR data governance tasks — configure the EMR-HOOK on the cluster first. See Use Hive extensions to record data lineage and access history.
If Lightweight Directory Access Protocol (LDAP) authentication is enabled for Trino, download the keystore file from the /etc/taihao-apps/trino-conf directory on the cluster's master node. Then upload it in the DataWorks console: More > Management Center > Cluster Management > Account Mappings > Edit Account Mappings > Upload Keystore File.
A single query run returns a maximum of 10,000 records and 10 MB of data.
Step 1: Create an EMR Trino node
Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O&M > Data Development, select your workspace from the drop-down list, and click Go to Data Development.
In DataStudio, right-click the target workflow and choose Create Node > EMR > EMR Trino.
In the Create Node dialog box, enter a Name, select an Engine Instance, a Node Type, and a Path, then click Confirm.
Node names can contain uppercase letters, lowercase letters, Chinese characters, digits, underscores (_), and periods (.).
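The naming rule above can be checked before you open the Create Node dialog box. The following is a minimal sketch, not DataWorks' own validation: the function name is hypothetical, "Chinese characters" is approximated by one Unicode block, and any length limit the console may enforce is not modeled.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); DataWorks may accept a wider range.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.\u4e00-\u9fff]+")


def is_valid_node_name(name: str) -> bool:
    """Return True if name uses only the character classes listed above."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_node_name("trino_daily.v2"))  # True
print(is_valid_node_name("trino-daily"))     # False: hyphens are not listed
```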
Step 2: Develop an EMR Trino task
Double-click the node to open the task development page.
Select an EMR cluster instance (optional)
If multiple EMR clusters are registered with your workspace, select the target cluster from the drop-down at the top of the node configuration page. If only one cluster is registered, it is selected automatically.

Configure connectors
Trino accesses data sources through connectors. Configure the appropriate connector before writing queries:
Hive: Configure the built-in Hive connector. See Hive connector.
MySQL: Configure the built-in MySQL connector. See MySQL connector.
Other data sources: See Configure connectors.
Write SQL
All Trino queries use a three-part path: <catalog>.<schema>.<table>. The catalog maps to the data source, the schema to the database, and the table to a specific table within that schema.
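To make the three-part path concrete, the sketch below splits a fully qualified name into its catalog, schema, and table parts. It is an illustration only: TableRef and parse_table_ref are hypothetical names, and quoted identifiers that contain dots are deliberately not handled.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str  # data source, for example "hive" or "mysql"
    schema: str   # database within that data source
    table: str    # table within that schema


def parse_table_ref(path: str) -> TableRef:
    """Split '<catalog>.<schema>.<table>' into its three parts.

    Simplification: quoted identifiers that contain dots are not handled.
    """
    parts = path.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<table>, got {path!r}")
    return TableRef(*parts)


ref = parse_table_ref("mysql.rt_data.rt_user")
print(ref.catalog, ref.schema, ref.table)  # mysql rt_data rt_user
```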
Enter your query in the editor. The following examples cover the most common patterns:
-- Query a Hive table
SELECT * FROM hive.default.hive_table;
-- Query a MySQL table
SELECT * FROM mysql.rt_data.rt_user;
-- Join a Hive table and a MySQL table
SELECT DISTINCT a.id, a.name, b.rt_name
FROM hive.default.hive_table a
INNER JOIN mysql.rt_data.rt_user b ON a.id = b.id;
-- Query a Hive table using a scheduling parameter
SELECT * FROM hive.default.${table_name};
DataWorks scheduling parameters let you pass dynamic values to your SQL at runtime. Define variables using the ${variable_name} format, then assign values in the Properties > Scheduling Parameter section of the right-side pane. See Supported formats for scheduling parameters and Configure and use scheduling parameters.
Run the SQL task
Two run modes are available:
Mode | When to use | Behavior |
Run | Routine execution using saved parameter values | Runs the task with the currently configured scheduling parameters |
Advanced Run | One-off runs where you need to override parameter values | Opens a dialog to select a resource group and set parameter values for this run only |
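The difference between the two modes comes down to which value fills a ${variable_name} placeholder. The sketch below models this locally with Python's string.Template, which happens to use the same ${name} syntax. It does not reflect how DataWorks implements substitution, and the parameter values and the render helper are hypothetical.

```python
from string import Template
from typing import Optional

# SQL as written in the node editor, with a ${variable_name} placeholder.
sql_template = Template("SELECT * FROM hive.default.${table_name};")

# Hypothetical saved value, standing in for the Scheduling Parameter setting.
saved_params = {"table_name": "hive_table"}

# Hypothetical one-off override, as an Advanced Run might supply.
override_params = {"table_name": "hive_table_backfill"}


def render(template: Template, saved: dict, overrides: Optional[dict] = None) -> str:
    """Fill placeholders, letting overrides win over saved values."""
    values = {**saved, **(overrides or {})}
    # substitute() raises KeyError if a placeholder has no value.
    return template.substitute(values)


run_sql = render(sql_template, saved_params)
advanced_run_sql = render(sql_template, saved_params, override_params)
print(run_sql)           # SELECT * FROM hive.default.hive_table;
print(advanced_run_sql)  # SELECT * FROM hive.default.hive_table_backfill;
```

Rendering the statement locally like this can help catch a missing or misspelled variable before the task is run.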
To run the task:
Click the Run with Parameters icon. In the Parameters dialog box, select your scheduling resource group and click Run.
- The scheduling resource group must have passed a network connectivity test with the compute resources. See Network connectivity solutions.
- Each query returns a maximum of 10,000 records with a total size limit of 10 MB.
Click the Save icon to save your SQL.
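Because a query run returns at most 10,000 records, it can be useful to make that cap explicit in the statement itself rather than rely on the platform limit. The following is a rough sketch under stated assumptions: with_row_cap and the capped alias are hypothetical, the input is a single SELECT statement, and the 10 MB size limit is not addressed.

```python
ROW_CAP = 10_000  # per-query record limit stated above


def with_row_cap(sql: str, cap: int = ROW_CAP) -> str:
    """Wrap a single SELECT statement so it returns at most cap rows."""
    # Drop surrounding whitespace and the trailing semicolon before nesting.
    inner = sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) AS capped LIMIT {cap};"


print(with_row_cap("SELECT * FROM hive.default.hive_table;"))
# SELECT * FROM (SELECT * FROM hive.default.hive_table) AS capped LIMIT 10000;
```

If the inner query has an ORDER BY clause, the outer query may not preserve that ordering, so add the LIMIT directly to the original statement when order matters.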
Configure advanced settings (optional)
Adjust SQL execution behavior in the Advanced Settings section of the right-side pane:
Parameter | Description | Default |
 | Controls how multiple SQL statements are executed. Set to | |
 | Applies to test runs in the development environment. Set to | |
Step 3: Configure task scheduling
Click Scheduling Configuration in the right-side pane and configure the scheduling properties. Configure the Rerun Property and Upstream Dependent Node before submitting. For full scheduling options, see Overview.
Step 4: Submit and deploy
Click the Save icon to save the node.
Click the Submit icon to submit the task. In the Submit dialog box, enter a Change description and choose whether to require a code review.
- Configure the Rerun and Parent Nodes properties before submitting.
- When code review is enabled, a reviewer must approve the submitted code before it can be deployed. This prevents unverified changes from reaching the production environment. See Code review.
For workspaces in standard mode, click Deploy in the upper-right corner after submission to deploy the task to the production environment. See Deploy tasks.
What's next
After deployment, click Operation Center in the upper-right corner to monitor the task's scheduling status. See Manage periodic tasks.
Troubleshooting
The node run fails with a connection timeout

Root cause: The resource group cannot reach the EMR cluster due to a network connectivity issue.
Resolution:
Go to the computing resource list page to initialize the resource.
Find the affected resource and click Re-initialize.
Wait for initialization to complete and verify that the status shows success before rerunning the task.

