
Connect Presto to LindormDFS

Last Updated: Jul 09, 2021

This topic describes how to manually configure and connect Presto to LindormDFS of ApsaraDB for Lindorm (Lindorm).

Background

Presto is an open source distributed SQL query engine designed for interactive queries and analysis on data volumes that range from gigabytes to petabytes. You can use Presto to query data online in sources such as Apache Hive, Apache Cassandra, and ApsaraDB RDS, as well as in proprietary data stores.

Note

This topic shows you how to connect Presto to a Hive metastore so that Presto can query data stored on LindormDFS. Before you use Presto with LindormDFS, configure the required dependencies.

Before you begin

Complete the following preparations so that Presto can read data from and write data to LindormDFS.

  1. Activate LindormDFS. For more information, see Activate LindormDFS.

  2. Install Java Development Kit (JDK) 1.8 or later on compute nodes.

  3. Install Apache Hive in a Hadoop cluster. For more information, see Connect Apache Hive to LindormDFS.

  4. Download the Presto installation package and the presto-cli-xxx-executable.jar file.

    Download Presto from the official website. In this topic, Presto 0.241 is used.
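Presto 0.241 release artifacts are published to Maven Central. The following sketch builds the download URLs from a version variable; the Maven Central paths are an assumption here, so verify them against the official Presto download page before use:

```shell
# Sketch: build download URLs for the Presto server package and CLI.
# The Maven Central paths below are assumptions; verify them on the
# official Presto download page before running the downloads.
PRESTO_VERSION=0.241
BASE_URL="https://repo1.maven.org/maven2/com/facebook/presto"
SERVER_URL="$BASE_URL/presto-server/$PRESTO_VERSION/presto-server-$PRESTO_VERSION.tar.gz"
CLI_URL="$BASE_URL/presto-cli/$PRESTO_VERSION/presto-cli-$PRESTO_VERSION-executable.jar"
echo "$SERVER_URL"
echo "$CLI_URL"
# wget "$SERVER_URL"   # uncomment to download the server package
# wget "$CLI_URL"      # uncomment to download the CLI
```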

Configure Presto

You can perform the following steps to configure Presto. For more information, see Deploy Presto in the official documentation.

  1. Decompress the Presto installation package to a specified directory.

    tar -zxvf presto-server-0.241.tar.gz -C /usr/local/
  2. Create an etc directory in the directory to which the Presto installation package was decompressed.

    mkdir /usr/local/presto-server-0.241/etc
  3. Configure a node properties file.

    1. Create a file named etc/node.properties.

      vim /usr/local/presto-server-0.241/etc/node.properties
    2. Add the following content to the etc/node.properties file:

      node.environment=test
      node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
      node.data-dir=/tmp/presto/data
  4. Configure a JVM config file.

    1. Create a file named etc/jvm.config.

      vim /usr/local/presto-server-0.241/etc/jvm.config
    2. Add the following content to the etc/jvm.config file:

      -server
      -Xmx8G
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:+UseGCOverheadLimit
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:+ExitOnOutOfMemoryError
  5. Configure a config properties file.

    In this topic, the coordinator and a worker are deployed on the same Presto server. For information about how to deploy the coordinator and workers on separate servers, see the official Presto documentation.

    1. Create a file named etc/config.properties.

      vim /usr/local/presto-server-0.241/etc/config.properties
    2. Add the following content to the etc/config.properties file:

      coordinator=true
      node-scheduler.include-coordinator=true
      http-server.http.port=8080
      query.max-memory=5GB
      query.max-memory-per-node=1GB
      query.max-total-memory-per-node=2GB
      discovery-server.enabled=true
      # Replace xx.xx.xx.xx with the IP address of the Presto server.
      discovery.uri=http://xx.xx.xx.xx:8080
  6. Configure a log level.

    1. Create a file named etc/log.properties.

      vim /usr/local/presto-server-0.241/etc/log.properties
    2. Add the following content to the etc/log.properties file:

      com.facebook.presto=INFO
  7. Configure a catalog properties file.

    1. Create a folder named etc/catalog.

      mkdir /usr/local/presto-server-0.241/etc/catalog
    2. Create a file named etc/catalog/hive.properties.

      vim /usr/local/presto-server-0.241/etc/catalog/hive.properties
    3. Add the following content to the etc/catalog/hive.properties file:

      connector.name=hive
      # Replace xxxx with the IP address of the Hive metastore server to which you want to connect.
      hive.metastore.uri=thrift://xxxx:9083
      # Specify the paths of the core-site.xml and hdfs-site.xml files of the Hadoop cluster.
      hive.config.resources=/usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml,/usr/local/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
  8. Copy the presto-cli-xxx-executable.jar file to the bin directory in the Presto installation directory and rename it presto. Then, make the file executable.

    cp ~/presto-cli-0.241-executable.jar /usr/local/presto-server-0.241/bin/
    mv /usr/local/presto-server-0.241/bin/presto-cli-0.241-executable.jar /usr/local/presto-server-0.241/bin/presto
    chmod +x /usr/local/presto-server-0.241/bin/presto
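The file-creation steps above can also be scripted. The following sketch writes the node.properties and log.properties files with a unique node.id (important when you later add worker nodes, because node.id must differ per node). ETC_DIR defaults to a hypothetical demo directory; point it at /usr/local/presto-server-0.241/etc on a real server:

```shell
# Sketch: generate etc/node.properties and etc/log.properties from the steps above.
# ETC_DIR defaults to a demo directory (an assumption for illustration);
# set it to /usr/local/presto-server-0.241/etc in practice.
ETC_DIR=${ETC_DIR:-/tmp/presto-etc-demo}
mkdir -p "$ETC_DIR/catalog"

# node.id must be unique for every node in the cluster; fall back to the
# kernel's UUID source if uuidgen is not installed.
NODE_ID=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)

cat > "$ETC_DIR/node.properties" <<EOF
node.environment=test
node.id=$NODE_ID
node.data-dir=/tmp/presto/data
EOF

cat > "$ETC_DIR/log.properties" <<EOF
com.facebook.presto=INFO
EOF
```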

Verify the Presto configuration

  1. Create test data and load the test data into Hive.

    1. Create test data.

      echo -e "tt1\ntt2\ntt1\ntt2\ntt3\ntt4\ntt4\ntt5\ntt6" > ~/test.txt
    2. Upload the test data to LindormDFS.

      $HADOOP_HOME/bin/hadoop fs -mkdir /presto
      $HADOOP_HOME/bin/hadoop fs -put ~/test.txt /presto/
    3. Create a table named test_data and load the test data into the table.

      hive> create external table test_data(word string) row format delimited fields terminated by '\n' stored as textfile location '/presto';
    4. Check whether the data is properly loaded.

      hive> select * from test_data;

      If information similar to the following output appears, the data is loaded:

      Command output
  2. Use Presto to connect to Hive. Then, you can use Presto to query data on LindormDFS and run WordCount.

    1. Start the Presto server.

      /usr/local/presto-server-0.241/bin/launcher start
    2. Use Presto to connect to Hive.

      1. Start the Presto CLI and connect to the hive catalog.

        /usr/local/presto-server-0.241/bin/presto --server localhost:8080 --catalog hive --schema default
      2. Query data on LindormDFS.

        presto:default> select * from test_data;
        Command output
      3. Run WordCount.

        presto:default> select word, count(*) from test_data group by word;
        Command output
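As a quick sanity check, the expected WordCount result can be reproduced locally from the same test data with standard shell tools, independent of the Presto cluster:

```shell
# Sketch: compute the expected WordCount result locally from the test data.
# printf is used instead of echo -e for portability across shells.
printf 'tt1\ntt2\ntt1\ntt2\ntt3\ntt4\ntt4\ntt5\ntt6\n' > /tmp/test.txt
sort /tmp/test.txt | uniq -c
# Expected counts: tt1=2, tt2=2, tt3=1, tt4=2, tt5=1, tt6=1
```

The counts printed here should match the result of the group-by query that Presto returns.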