This topic answers common questions about Trino on E-MapReduce (EMR).
Background
Deployment
Connectivity
Configuration
Troubleshooting
What is the relationship between Trino and Presto?
Trino was formerly known as PrestoSQL, an open-source project from Starburst. Starburst was founded by the original creators of the Presto project after they left Facebook. In December 2020, PrestoSQL was renamed Trino. Facebook continues to maintain the original project, PrestoDB. The two projects have diverged in development direction but share similar basic syntax and usage.
In EMR V3.44.0 and V5.10.0, PrestoSQL is renamed Trino. In earlier EMR versions, the console still displays PrestoSQL, but Trino is the engine actually running.
What are the differences across Trino versions?
See the Trino release notes for a full changelog. Each version brings performance improvements, so select a later version when creating a cluster.
How is Trino deployed on EMR? Does Trino support high availability?
Trino uses a coordinator-worker architecture. The coordinator node runs on the master-1-1 node, and worker nodes run on core or task nodes.
Trino does not support high availability (HA). In an HA cluster, the coordinator is still deployed only on master-1-1. If you don't need to run Trino alongside Hadoop components, select only Trino when creating the cluster and leave High Service Availability off — this avoids resource waste.
How does Trino connect to DLF? Can connectors still access the Hive metastore after DLF is enabled?
DLF integration at cluster creation
- EMR V3.44.x and earlier, V5.10.x and earlier (DataLake clusters): When creating the cluster, select the Hive service and set Metadata to DLF Unified Metadata. The Hive, Apache Iceberg, Apache Hudi, and Delta Lake connectors can then connect to Data Lake Formation (DLF). If you don't select the Hive service, configure the metadata storage center manually. For details, see Configure a metadata storage center for data in data lakes.
- EMR V3.45.0 and later, V5.11.0 and later: Select DLF Unified Metadata for Metadata when creating a Trino cluster.
Accessing the Hive metastore alongside DLF
After DLF is enabled, default connectors such as the Hive connector lose access to the Hive metastore. To configure multiple Hive metastores or MySQL instances — or to add a connector type not included by default — use EMR's five placeholder connectors (Connector 1 through Connector 5). For each placeholder connector, set the connector.name parameter to the connector type you need, then add the corresponding configuration items. This lets you access the Hive metastore or any other supported connector type.
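As an illustration of the placeholder approach, the following is a minimal sketch of the configuration items for Connector 1 when it is repurposed as an additional Hive catalog. The metastore address is a hypothetical placeholder, and the property names follow the open-source Trino Hive connector, so confirm them against the Trino version in your cluster.

```properties
# Connector 1 (placeholder) repurposed as a Hive connector.
connector.name=hive
# Hypothetical metastore address; replace with the Thrift URI of your Hive metastore.
hive.metastore.uri=thrift://metastore.example.internal:9083
```

A MySQL instance would be configured in the same way, typically with connector.name=mysql plus the connection-url, connection-user, and connection-password items.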
How does Trino access OSS? Is OSS-HDFS supported?
JindoSDK is installed by default in EMR clusters and supports password-free access. Use standard mode to access Object Storage Service (OSS) or query Hive tables stored in OSS.
OSS-HDFS is supported in Trino clusters.
How do I access the Trino web UI?
With Knox installed: Use the Knox proxy address. See Knox for details.
Without Knox: Access the web UI at {Public IP address}:{HTTP port}. The HTTP port is set by the http-server.http.port parameter; the default is 9090. If access fails, verify that port 9090 is open in the security group for your cluster.
High-security clusters: The default HTTP port is disabled. Add the following to the config.properties file on the master node group, then access the web UI using the method above.
web-ui.authentication.type=fixed
web-ui.user=trino
What the web UI shows: The web UI displays recently executed queries, including SQL statements and execution plans. By default, Trino retains up to 100 recent queries. Queries with successful results are overwritten by newer ones, while information about abnormal queries is kept longer. To store more queries, increase the query.max-history parameter (default: 100).
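For example, to keep more queries visible in the web UI, the setting might look like the sketch below. The value 500 is an arbitrary illustration, not a recommendation.

```properties
# config.properties on the coordinator
# Illustrative value only; the default is 100.
query.max-history=500
```

A larger history is likely to consume more coordinator memory, so raise the value gradually. Like other Trino settings, it is read at startup, so expect to restart the coordinator after changing it.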
I didn't install Trino when creating my cluster. What should I know before adding it later?
DataLake clusters: Add the Trino service directly if the cluster has sufficient remaining resources.
Hadoop clusters: Check whether any cluster services were upgraded before adding Trino, as upgrades can cause compatibility issues.
- JindoSDK major version upgrades: Rerun the Presto or Trino upgrade script and copy the upgraded JindoSDK to the corresponding connector directory in the Trino installation path.
- EMR V3.39.1: Check the service logs. If they report a missing Delta Lake class, copy the following file to each node:
  Source: /opt/apps/ecm/service/deltalake/0.6.1-3.3/package/deltalake-0.6.1-3.3/presto-delta/delta-standalone-assembly-0.2.0.jar
  Destination: /usr/lib/presto-current/plugin/delta
Why don't my configuration changes take effect?
Trino configuration files are stored in /etc/emr/trino-conf. Check whether the changes on the Elastic Compute Service (ECS) instance match what you saved in the EMR console.
If the config file does not contain your changes:
Verify that your changes were saved, deployed, and correctly formatted. Note that modifying a configuration item for a specific node group or single node overrides the cluster-level default for that item — the cluster default no longer applies to that node.
If the config file contains your changes but they aren't active:
Trino loads configuration only on startup. Restart all Trino nodes for the changes to take effect.
Why can't Trino restart after I add a configuration item?
Error:
Error: Configuration property 'xxxxx' was not used
Trino validates all configuration items strictly on startup. This error means the configuration item was added to the wrong file, doesn't exist in that version of Trino, or is incorrectly formatted. Check the server.log file for details, correct the configuration, or roll back the change.
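A common cause is placing a connector property in the server configuration file, or the reverse. The sketch below shows where two properties mentioned in this topic would normally live, assuming the standard Trino layout in which server settings go in config.properties and each catalog has its own file, such as hive.properties.

```properties
# config.properties: server-level settings
http-server.http.port=9090

# hive.properties: Hive connector (catalog) settings
connector.name=hive
hive.hdfs.impersonation.enabled=true
```

Under this assumption, moving hive.hdfs.impersonation.enabled into config.properties would be expected to trigger the "was not used" error at startup.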
Queries are stuck or worker nodes are failing
Errors:
Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes.
No handle resolver for connector: hive ... Unrecognized token 'io': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
These errors indicate that worker nodes are overloaded or a worker process was terminated by the system. Adjust memory-related configuration parameters or reduce the number of concurrent queries.
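Which parameters matter depends on the workload. As a sketch, the following open-source Trino memory properties are typical starting points. The values are hypothetical and must be sized to your instance types and to the JVM heap set in jvm.config.

```properties
# config.properties
# Maximum user memory a single query may use across the whole cluster (illustrative value).
query.max-memory=40GB
# Maximum user memory a single query may use on one node (illustrative value).
query.max-memory-per-node=8GB
```

Keep query.max-memory-per-node comfortably below the JVM heap size of each worker. If it is set too close to the heap, the worker is likely to fail at startup or remain prone to out-of-memory termination.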
How do I view Trino logs?
Trino log files are stored at /mnt/disk1/log/trino/var/log/. The server.log file contains output and exception stack traces.
To view the error details for a specific query, add the --debug flag when starting the Trino client — this prints the exception stack directly to the console.
The Trino coordinator communicates with worker nodes over HTTP. An HTTP exception on the coordinator often means the error originated on a worker node. If server.log on the coordinator shows no other errors, check the server.log on each worker node individually.
Queries fail or return no data
Work through the following diagnostic steps:
- Verify the data source using another engine. Run the same query with Hive or Spark. If those engines also fail, the data source is the issue. Check connectivity and data integrity.
- Check metadata validity. If only Trino fails, verify that the table metadata is correct and up to date.
- Check data access permissions. If metadata is valid but the query returns no results for a table that contains data, check permissions:
  - Hadoop Distributed File System (HDFS) proxyuser enabled: Set hive.hdfs.impersonation.enabled=true in the Trino Hive connector configuration.
  - Ranger enabled: Verify that the required permissions are correctly configured in Ranger.
  - After a cluster scale-out: Verify that the new node group or node has the necessary permissions to access the required files.
Columns are in the wrong order in Hudi or Delta Lake tables
Check that hive.parquet.use-column-names=true is set in the hive.properties file for the Hive connector in Presto or Trino. With this setting, Parquet columns are matched by name rather than by position.
"Cannot query xxx table" when using the Hive connector on Iceberg, Hudi, or Delta Lake tables
Trino provides dedicated connectors for Apache Iceberg, Apache Hudi, and Delta Lake. Use the dedicated connector for each table format rather than the Hive connector.
If your job requires the Hive connector, use Trino's Table Redirection feature to forward queries to the appropriate connector. Add the following parameters to the Hive connector configuration:
hive.iceberg-catalog-name=iceberg
hive.delta-lake-catalog-name=delta-lake
hive.hudi-catalog-name=hudi
With these settings, the Hive connector automatically redirects queries on Iceberg, Delta Lake, and Hudi tables to their respective connectors.
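The value of each parameter is the name of the target catalog, not a connector type. As a hypothetical example, if the Iceberg catalog in your cluster were named iceberg_lake instead of iceberg, the Hive connector configuration would point to that name. The catalog name iceberg_lake is an assumption used for illustration only.

```properties
# hive.properties
# Redirect Iceberg tables to a catalog with a non-default (hypothetical) name.
hive.iceberg-catalog-name=iceberg_lake
```

If the named catalog does not exist, redirected queries can be expected to fail, so check the catalog list (for example with SHOW CATALOGS) before changing these values.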