This topic provides answers to some frequently asked questions about Trino.
What is the difference and relationship between Trino and Presto?
What is the deployment mode of Trino on EMR? Does Trino support the HA mode?
How does Trino access OSS? Is the OSS-HDFS service supported by Trino?
How do I access the web UI of Trino? What information does the web UI of Trino provide?
Why are the columns in the wrong order when I query data in a Hudi or Delta table?
Why am I unable to restart Trino after I add a configuration item?
What is the difference and relationship between Trino and Presto?
Trino, formerly known as PrestoSQL, is an open source project that was created by the original founders of the Presto project after they left Facebook; these founders later established Starburst. In 2021, PrestoSQL was rebranded as Trino. Presto is still maintained by Facebook and is commonly referred to as PrestoDB. The split between PrestoDB and PrestoSQL has led to different development directions, but the basic syntax and usage of the two services remain similar.
In E-MapReduce (EMR) V3.44.0 and V5.10.0, PrestoSQL was renamed Trino. In earlier EMR versions, the service is still displayed as PrestoSQL in the console, but Trino is actually used.
What are the differences among the versions of Trino?
For more information about the changes in each version of Trino, see Release notes of Trino. The performance of Trino is optimized as the version is updated. Therefore, we recommend that you select a later version.
What is the deployment mode of Trino on EMR? Does Trino support the HA mode?
Trino uses a standard coordinator-worker architecture. The coordinator is deployed on the master-1-1 node, and workers are deployed on core or task nodes.
Trino does not support high availability. Even in a high-availability cluster, Trino deploys the coordinator only on the master-1-1 node. Therefore, if you do not require a hybrid deployment with Hadoop, select only Trino when you create a cluster and do not turn on High Service Availability. This helps prevent resource waste.
How does Trino connect to DLF? Can a connector access the Hive metastore after DLF is enabled for your cluster? What do I do if the connectors that are provided by EMR are insufficient?
If you create a DataLake cluster of a minor version earlier than EMR V3.45.0 or V5.11.0, select the Hive service, and set Metadata to DLF Unified Metadata, you can use a connector such as the Hive, Iceberg, Hudi, or Delta Lake connector to connect to Data Lake Formation (DLF). If you do not select the Hive service for the cluster, complete the configurations based on your business requirements. For more information, see Configure a metadata storage center for data in data lakes. For DataLake clusters of V3.45.0 or a later minor version and of V5.11.0 or a later minor version, you can set Metadata to DLF Unified Metadata when you create a Trino cluster.
If you enable DLF for a cluster, default connectors such as the Hive connector cannot access the Hive metastore. If you want to configure multiple Hive metastores and MySQL instances, or the provided connectors do not include the connector that you want to use, you can use one of the five placeholder connectors that EMR provides: Connector 1 to Connector 5. Set the connector.name parameter of the selected placeholder connector to configure it as a Hive connector or another connector type based on your business requirements, and then add the other configuration items by referring to the topic of that connector type. This way, you can use the connector to access the Hive metastore.
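For example, if Connector 1 is used to access an additional Hive metastore, its configuration might look like the following sketch. The connector.name value follows the Hive connector convention described above; the metastore address is only an assumed placeholder that you must replace with your own:

connector.name=hive
hive.metastore.uri=thrift://<your-metastore-host>:9083

Depending on your Trino or Presto version, the connector name may need to be hive-hadoop2 instead of hive.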
How does Trino access OSS? Is the OSS-HDFS service supported by Trino?
By default, JindoSDK is installed in EMR clusters and supports password-free access. You can use the standard mode to access Object Storage Service (OSS) or query Hive tables stored in OSS.
Trino clusters support the OSS-HDFS service.
How do I access the web UI of Trino? What information does the web UI of Trino provide?
If Knox is installed in your cluster, you can use the Knox proxy address to access the web UI of Trino. For more information, see Knox.
If Knox is not installed in your cluster, you can use a public address in the format of http://{Public IP address}:{HTTP port number} to access the web UI of Trino. You can obtain the HTTP port number from the value of the http-server.http.port parameter. The default HTTP port number is 9090. If you fail to access the web UI of Trino, check whether port 9090 is enabled for the security group to which your cluster belongs.
If your cluster is a high-security cluster, the default HTTP port cannot be used. To access the web UI of Trino, ensure that you have the required network permissions. Then, add the following configurations to the config.properties file of the master node group and use the preceding method to access the web UI of Trino.
web-ui.authentication.type=fixed
web-ui.user=trino
On the web UI of Trino, you can view information about queries that Trino recently executed, including SQL statements and execution plans. Trino stores only up to 200 of the most recent queries. Queries that completed successfully are overwritten by new queries first, and information about abnormal queries is retained for a longer period. You can increase the query.max-history parameter to store more queries. The default value of the query.max-history parameter is 100.
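For example, to retain more queries on the web UI, you can add the following configuration item to the config.properties file. The value 1000 is only an illustrative choice:

query.max-history=1000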
I do not select the Trino service when I create a cluster. What do I need to take note of when I want to add Trino to the cluster?
For a DataLake cluster, you can directly add the Trino service to the cluster if the remaining resources of the cluster are sufficient to use Trino.
For a Hadoop cluster, check whether you upgraded specific services of the cluster. If you upgraded specific services of the cluster, an error may occur after you add the Trino service to the cluster. In this case, you can use the following methods to resolve the issue:
If you performed separate upgrades for JindoSDK, especially major version upgrades, you must rerun the upgrade script of Presto or Trino and copy the upgraded JindoSDK to the corresponding connector in the installation path of Trino.
For clusters of EMR V3.39.1, check the logs of specific services. In most cases, the logs indicate that a class related to Delta Lake cannot be found. In this case, you must copy the /opt/apps/ecm/service/deltalake/0.6.1-3.3/package/deltalake-0.6.1-3.3/presto-delta/delta-standalone-assembly-0.2.0.jar file to the /usr/lib/presto-current/plugin/delta path on each node.
Why do modified configurations not take effect?
The configuration file of Trino resides in /etc/emr/trino-conf. Check whether the configuration file on the Elastic Compute Service (ECS) instance reflects the changes that you made in the EMR console, that is, whether the file contains the added or modified configuration items. You can use the following methods to troubleshoot the issue:
The configuration file contains no added or modified configuration items: In this case, you must check whether the modified configurations are saved, whether the configurations are deployed, and whether the modified configurations meet your requirements.
Important: If you modify the configuration of a node group or a single node by using a configuration item, the default configuration of the cluster no longer takes effect.
The configuration file contains the added or modified configuration items: If the modified configuration file exists on the nodes, check whether all Trino nodes have been restarted. Trino loads the new configurations only after the restart is complete.
Why do queries get stuck? Why do my worker nodes fail?
If the following error message is returned, the worker nodes are overloaded and cannot provide services, or the worker nodes were restarted:
Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes.
You may also see an error message similar to the following:
No handle resolver for connector: hive ... Unrecognized token 'io': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
In addition, a process on a worker node may be terminated by the system. Modify the configurations as needed, especially the memory-related configurations, or limit the number of concurrent queries.
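As a starting point for the memory-related tuning mentioned above, you can adjust the following configuration items in the config.properties file. The values are illustrative assumptions and must be tuned to the specifications of your nodes:

query.max-memory=16GB
query.max-memory-per-node=4GB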
How do I view the logs of Trino?
By default, log files of Trino are stored in the /mnt/disk1/log/trino/var/log/ path. Standard output and exception stacks are stored in the server.log file.
If you only want to view the error details of a query, you can start the Trino client with the --debug option to display the exception stack.
The coordinator node of Trino communicates with worker nodes over HTTP. If an HTTP exception is reported on the coordinator node, an error may occur on a worker node. In this case, if no other exceptions are found, you need to troubleshoot the worker nodes one by one.
Why am I unable to query data? Why do queries fail?
You can troubleshoot the issue based on the following instructions:
Use other engines such as Hive and Spark to access or query data. If you cannot access data as expected, check whether the data source is connected and whether the data is intact.
If you cannot access or query data only by using Trino, check whether the metadata information is valid.
If the metadata information is valid but no result is returned when you query a table that contains data, check whether you have data access permissions first.
If the proxyuser mechanism is enabled for the Hadoop Distributed File System (HDFS) where the data resides, you need to set the hive.hdfs.impersonation.enabled parameter to true for Trino.
If you enable Trino in Ranger, check whether the related permissions are correctly configured in Ranger.
If you scale out the cluster, check whether the node group or node has the permissions or capabilities to access the required files.
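For example, the impersonation setting mentioned above is added to the hive.properties file as follows:

hive.hdfs.impersonation.enabled=true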
Why are the columns in the wrong order when I query data in a Hudi or Delta table?
You need to check whether the hive.parquet.use-column-names parameter is set to true in the hive.properties file of Presto.
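For example, you can add the following line to the hive.properties file. Note that the parameter is spelled hive.parquet.use-column-names:

hive.parquet.use-column-names=true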
Why am I unable to restart Trino after I add a configuration item?
If the server.log file contains the error message Error: Configuration property 'xxxxx' was not used, the configuration item was added in the wrong location, or specific configuration requirements are not met. Trino validates configuration items strictly. For example, if an added configuration item does not exist, is not correctly configured, or is placed in the wrong file, the configuration cannot be identified and Trino fails to start. Check whether the new configuration is correct, or perform a rollback operation.
Why is the error "Cannot query xxx table" reported when I use a Hive connector to query data in an Iceberg, Hudi, or Delta Lake table?
Trino provides separate connectors for Iceberg, Hudi, and Delta Lake. We recommend that you query data in an Iceberg, Hudi, or Delta Lake table by using the Iceberg, Hudi, or Delta Lake connector. If a Hive connector is required for your job, use the Table Redirection feature to forward queries to the Iceberg, Hudi, or Delta Lake connector.
For example, you can configure the following parameters in the hive.properties file of Trino. This way, queries against Iceberg, Delta Lake, and Hudi tables are redirected to the appropriate connectors.
hive.iceberg-catalog-name=iceberg
hive.delta-lake-catalog-name=delta-lake
hive.hudi-catalog-name=hudi