
Lindorm: Use Apache Hive to access LindormDFS

Last Updated: Jun 26, 2024

This topic describes how to use Apache Hive to access LindormDFS.

Prerequisites

  • LindormDFS is activated for your Lindorm instance. For more information, see Activate LindormDFS.

  • Java Development Kits (JDKs) are installed on the compute engine nodes. The JDK version must be 1.8 or later.

  • Apache Derby is downloaded from the official website. Apache Derby V10.13.1.1 is used in this topic as an example.

  • The compressed Apache Hive package is downloaded from the official website. Apache Hive V2.3.7 is used in this topic as an example.
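The JDK prerequisite above can be verified with a small script. The version-parsing logic below is only a sketch: it assumes the common `java -version` output format, such as "1.8.0_292" for the legacy scheme or "11.0.2" for the modern scheme.

```shell
# Sketch: check that the installed JDK satisfies the 1.8-or-later requirement.
jdk_ok() {
  local v="$1" major minor
  major=${v%%.*}                      # "1" for 1.x versions, "11" for 11.x
  if [ "$major" = "1" ]; then
    minor=${v#1.}
    minor=${minor%%.*}
    [ "$minor" -ge 8 ] 2>/dev/null    # legacy scheme: 1.8 and later pass
  else
    [ "$major" -ge 8 ] 2>/dev/null    # modern scheme: 9, 10, 11, ... pass
  fi
}

# Example: check the JVM found on the current PATH.
v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
if jdk_ok "$v"; then
  echo "JDK $v meets the requirement"
else
  echo "JDK ${v:-<not found>} does not meet the requirement"
fi
```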

Configure Apache Derby

  1. Decompress the Apache Derby package to the specified directory.

     tar -zxvf db-derby-10.13.1.1-bin.tar.gz -C /usr/local/
  2. Modify the /etc/profile configuration file and configure environment variables.

    1. Run the following command to open the configuration file /etc/profile:

      vim /etc/profile
    2. Add the following lines to the end of the file:

      export DERBY_HOME=/usr/local/db-derby-10.13.1.1-bin
      export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar

      Then, run the source /etc/profile command for the changes to take effect.
    3. Create a directory that is used to store the data.

      mkdir $DERBY_HOME/data
    4. Start the Apache Derby network server.

      nohup /usr/local/db-derby-10.13.1.1-bin/bin/startNetworkServer &
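To confirm that the Derby network server started successfully, you can probe its port. The helper below is a sketch that relies on bash's /dev/tcp pseudo-device; 127.0.0.1 and port 1527 are the Derby defaults assumed throughout this topic.

```shell
# Sketch: probe a TCP port to confirm the Derby network server is up.
port_open() {
  # Returns 0 if a TCP connection to $1:$2 succeeds.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 1527; then
  echo "Derby network server is reachable on port 1527"
else
  echo "Derby network server is NOT reachable on port 1527"
fi
```

If the port is not reachable, check the nohup.out file written by the startNetworkServer command for startup errors.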

Configure Apache Hive

  1. Decompress the Apache Hive package to the specified directory.

    tar -zxvf apache-hive-2.3.7-bin.tar.gz -C /usr/local/
  2. Modify the /etc/profile configuration file and configure environment variables.

    1. Run the following command to open the configuration file /etc/profile:

      vim /etc/profile
    2. Add the following line to the end of the file:

      export HIVE_HOME=/usr/local/apache-hive-2.3.7-bin

      Then, run the source /etc/profile command for the changes to take effect.
  3. Modify the hive-env.sh file.

    1. Run the following command to open the hive-env.sh file. If the file does not exist, create it from the hive-env.sh.template file in the conf directory first.

      vim /usr/local/apache-hive-2.3.7-bin/conf/hive-env.sh
    2. Modify the hive-env.sh file. The following example shows the modified content.

      # The heap size of the JVM started by the hive shell script can be controlled via
      export HADOOP_HEAPSIZE=1024
      
      # Set HADOOP_HOME to point to a specific hadoop install directory
      HADOOP_HOME=/usr/local/hadoop-2.7.3
      
      # Hive Configuration Directory can be controlled by:
      export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.7-bin/conf
      
      # Folder containing extra libraries required for hive compilation/execution can be controlled by:
      export HIVE_AUX_JARS_PATH=/usr/local/apache-hive-2.3.7-bin/lib
  4. Modify the hive-site.xml file.

    1. Run the following command to open the hive-site.xml file:

      vim /usr/local/apache-hive-2.3.7-bin/conf/hive-site.xml
    2. Modify the hive-site.xml file. The following example shows the modified content.

      <configuration>
        <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>/user/hive/warehouse</value>
          <description>location of default database for the warehouse</description>
        </property>
        <property>
          <name>hive.exec.scratchdir</name>
          <value>/tmp/hive</value>
          <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
        </property>
        <property>
          <name>hive.metastore.schema.verification</name>
          <value>false</value>
          <description>
          Enforce metastore schema version consistency.
          True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
                schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                proper metastore schema migration. (Default)
          False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
          </description>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:derby://127.0.0.1:1527/metastore_db;create=true</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>
        <property>
          <name>datanucleus.schema.autoCreateAll</name>
          <value>true</value>
        </property>
      </configuration>
  5. Create a jpox.properties file.

    1. Run the following command to open the jpox.properties file:

      vim  /usr/local/apache-hive-2.3.7-bin/conf/jpox.properties
    2. Add the following content to the file:

      javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
      org.jpox.validateTables = false
      org.jpox.validateColumns = false
      org.jpox.validateConstraints = false
      org.jpox.storeManagerType = rdbms
      org.jpox.autoCreateSchema = true
      org.jpox.autoStartMechanismMode = checked
      org.jpox.transactionIsolation = read_committed
      javax.jdo.option.DetachAllOnCommit = true
      javax.jdo.option.NontransactionalRead = true
      javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
      javax.jdo.option.ConnectionURL = jdbc:derby://127.0.0.1:1527/metastore_db;create=true
      javax.jdo.option.ConnectionUserName = APP
      javax.jdo.option.ConnectionPassword = mine
  6. Create the required directories for Apache Hive.

    Check whether the /user/hive/warehouse and /tmp/hive directories that are specified in the hive-site.xml file exist in LindormDFS:

    $HADOOP_HOME/bin/hadoop fs -ls /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -ls /tmp/hive

    If the directories do not exist, create them and grant users the write permissions:

    $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp/hive
    $HADOOP_HOME/bin/hadoop fs -chmod 775 /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -chmod 775 /tmp/hive
  7. Modify the io.tmpdir settings.

    Replace the value of each ${system:java.io.tmpdir} field in the hive-site.xml file with a local path, such as /usr/local/apache-hive-2.3.7-bin/iotmp. Then, create the directory and grant the write permissions:

    mkdir /usr/local/apache-hive-2.3.7-bin/iotmp
    chmod 777 /usr/local/apache-hive-2.3.7-bin/iotmp

    In addition, replace each ${system:user.name} field with ${user.name}. For example, change the following configuration:

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/usr/local/apache-hive-2.3.7-bin/iotmp/${system:user.name}</value>
        <description>Local scratch space for Hive jobs</description>
    </property> 

    The following sample code shows the modified configuration:

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/usr/local/apache-hive-2.3.7-bin/iotmp/${user.name}</value>
        <description>Local scratch space for Hive jobs</description>
    </property>
  8. Start the Apache Hive services.

     Start the Hive metastore service first, and then start the HiveServer2 service:

     nohup /usr/local/apache-hive-2.3.7-bin/bin/hive --service metastore &
     nohup /usr/local/apache-hive-2.3.7-bin/bin/hive --service hiveserver2 &
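Before you verify the deployment, it can help to confirm that hive-site.xml actually declares every property configured in this section. The following sketch is a simple grep-based check, not a full XML parse; the file path in the usage comment is the one assumed in this topic.

```shell
# Sketch: grep-based sanity check that a hive-site.xml file declares
# every property set in this topic. Returns non-zero if any is missing.
check_hive_site() {
  local f="$1" p missing=0
  for p in hive.metastore.warehouse.dir \
           hive.exec.scratchdir \
           hive.metastore.schema.verification \
           javax.jdo.option.ConnectionURL \
           datanucleus.schema.autoCreateAll; do
    if ! grep -q "<name>$p</name>" "$f" 2>/dev/null; then
      echo "missing property: $p"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (path assumed from this topic):
# check_hive_site /usr/local/apache-hive-2.3.7-bin/conf/hive-site.xml \
#   && echo "hive-site.xml looks complete"
```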

Verify Apache Hive

  1. Create a table in the Apache Hive shell.

    create table test (f1 INT, f2 STRING);
  2. Write data to the table.

    insert into test values (1,'2222');
  3. Check whether the data has been written to LindormDFS.

    ${HADOOP_HOME}/bin/hadoop fs -ls /user/hive/warehouse/test.db/test

    If the command output lists the data files of the test table, the data is written to LindormDFS.