This topic describes how to manually configure and connect Presto to LindormDFS of Lindorm.
Background
Presto is an open source distributed SQL query engine that is suitable for interactive queries and analysis. You can use Presto to query data at scales that range from gigabytes to petabytes. Presto is designed for online queries. You can use Presto to query data in Apache Hive, Apache Cassandra, and ApsaraDB RDS. You can also use Presto to query proprietary data stores.
This topic shows you how to use Presto to connect to a Hive metastore so that you can query data stored on LindormDFS. If you want to use Presto together with LindormDFS, configure the required dependencies as described in this topic.
Before you begin
Before you configure Presto, complete the following preparations. This way, you can use Presto to perform read and write operations on LindormDFS.
Activate LindormDFS. For more information, see Activate LindormDFS.
Install Java Development Kit (JDK) 1.8 or later on compute nodes.
Install Apache Hive in a Hadoop cluster. For more information, see Connect Apache Hive to LindormDFS.
Download the Presto installation package and the presto-cli-xxx-executable.jar file.
Download Presto from the official website. In this topic, Presto 0.241 is used.
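The following commands are a minimal sketch of verifying the JDK installation and downloading the two packages. The Maven Central URLs are assumptions based on how Presto 0.241 artifacts are typically published; verify them against the official download page before you use them.
# Confirm that JDK 1.8 or later is on the PATH.
java -version
# Download the Presto server package and the CLI (URLs are assumptions; adjust as needed).
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.241/presto-server-0.241.tar.gz
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.241/presto-cli-0.241-executable.jar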
Configure Presto
You can perform the following steps to configure Presto. For more information, see Deploy Presto in the official documentation.
Decompress the Presto installation package to a specified directory.
tar -zxvf presto-server-0.241.tar.gz -C /usr/local/
Create an etc directory in the directory to which the Presto installation package was decompressed.
mkdir /usr/local/presto-server-0.241/etc
Configure a node properties file.
Create a file named etc/node.properties.
vim /usr/local/presto-server-0.241/etc/node.properties
Add the following content to the etc/node.properties file:
node.environment=test
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/tmp/presto/data
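The node.id value must be unique for each node and should remain the same across restarts. As a minimal sketch, assuming the uuidgen utility is installed, you can generate a value and paste it into etc/node.properties:
# Generate a unique identifier to use as node.id (the output differs on every run).
uuidgen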
Configure a JVM config file.
Create a file named etc/jvm.config.
vim /usr/local/presto-server-0.241/etc/jvm.config
Add the following content to the etc/jvm.config file:
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
Configure a config properties file.
In this topic, a coordinator and a worker are deployed on the same Presto server. For information about how to deploy the coordinator and workers on different servers, see the Presto official documentation.
Create a file named etc/config.properties.
vim /usr/local/presto-server-0.241/etc/config.properties
Add the following content to the etc/config.properties file:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://xx.xx.xx.xx:8080 # Replace xx.xx.xx.xx with the IP address of the Presto server.
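After you start the Presto server in the verification steps later in this topic, you can confirm that the coordinator is listening on the configured discovery URI. The following is a minimal check, assuming the default port 8080 configured above:
# Query the coordinator's info endpoint; a JSON response indicates that the server is up.
curl http://xx.xx.xx.xx:8080/v1/info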
Configure a log level.
Create a file named etc/log.properties.
vim /usr/local/presto-server-0.241/etc/log.properties
Add the following content to the etc/log.properties file:
com.facebook.presto=INFO
Configure a catalog properties file.
Create a folder named etc/catalog.
mkdir /usr/local/presto-server-0.241/etc/catalog
Create a file named etc/catalog/hive.properties.
vim /usr/local/presto-server-0.241/etc/catalog/hive.properties
Add the following content to the etc/catalog/hive.properties file:
connector.name=hive
hive.metastore.uri=thrift://xxxx:9083 # Replace xxxx with the IP address of the Hive metastore server that you want to connect to.
hive.config.resources=/usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml,/usr/local/hadoop-2.7.3/etc/hadoop/hdfs-site.xml # Specify the paths of the core-site.xml and hdfs-site.xml files in the Hadoop cluster.
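Presto can connect only if the Hive metastore service is running and port 9083 is reachable from the Presto server. The following is a minimal sketch, assuming $HIVE_HOME is set on the Hive server and the nc utility is available on the Presto server:
# On the Hive server: start the metastore service if it is not already running.
nohup $HIVE_HOME/bin/hive --service metastore &
# On the Presto server: check that the metastore port is reachable. Replace xxxx with the metastore IP address.
nc -z xxxx 9083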
Copy the presto-cli-xxx-executable.jar file to the bin directory in the Presto installation directory, rename it presto, and then grant the execute permission on the file.
cp ~/presto-cli-0.241-executable.jar /usr/local/presto-server-0.241/bin/
mv /usr/local/presto-server-0.241/bin/presto-cli-0.241-executable.jar /usr/local/presto-server-0.241/bin/presto
chmod +x /usr/local/presto-server-0.241/bin/presto
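To confirm that the CLI was copied and made executable correctly, you can run it with the --help option, which prints the available options without connecting to a server:
# Sanity check: the CLI should print its usage information.
/usr/local/presto-server-0.241/bin/presto --help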
Verify the Presto configuration
Create test data and load the test data into Hive.
Create test data.
echo -e "tt1\ntt2\ntt1\ntt2\ntt3\ntt4\ntt4\ntt5\ntt6" > ~/test.txtUpload the test data to LindormDFS.
$HADOOP_HOME/bin/hadoop fs -mkdir /presto $HADOOP_HOME/bin/hadoop fs -put test.txt /presto/Create a table named test_data and load the test data into the table.
hive> create external table test_data(word string) row format delimited fields terminated by '\n' stored as textfile location '/presto';Check whether the data is properly loaded.
hive> select * from test_data;If the information similar to the following command output appears, the data is properly loaded.

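You can also confirm that the test file landed on LindormDFS by reading it back directly, a minimal check:
# List and print the uploaded file on LindormDFS.
$HADOOP_HOME/bin/hadoop fs -ls /presto/
$HADOOP_HOME/bin/hadoop fs -cat /presto/test.txt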
Use Presto to connect to Hive. Then, you can use Presto to query data on LindormDFS and run a WordCount query.
Start the Presto server.
/usr/local/presto-server-0.241/bin/launcher start
Use Presto to connect to Hive.
/usr/local/presto-server-0.241/bin/presto --server localhost:8080 --catalog hive --schema default
Query data on LindormDFS.
presto:default> select * from test_data;
Run WordCount.
presto:default> select word, count(*) from test_data group by word;
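With the sample data above, the WordCount query should return a count of 2 for tt1, tt2, and tt4 and a count of 1 for tt3, tt5, and tt6; the row order may vary. When you are done, you can exit the CLI and stop the server, as sketched below:
presto:default> quit
# Stop the Presto server on the host when you are finished.
/usr/local/presto-server-0.241/bin/launcher stop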