
Use Apache Hive to access LindormDFS

Last Updated: Jul 09, 2021

This topic describes how to use Apache Hive to access LindormDFS.

Preparations

  1. Activate the LindormDFS service. For more information, see Activate the LindormDFS service.

  2. Install a Java Development Kit (JDK) on each compute node. The JDK version must be 1.8 or later. You can verify the version as shown in the example after this list.

  3. Download Apache Derby from the official website. The version of Apache Derby used in this topic is V10.13.1.1.

  4. Download the compressed Apache Hive package from the official website. The Apache Hive version used in this topic is V2.3.7.
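
   You can run the following command on each compute node to check the installed JDK. The output should report version 1.8 or later:

     java -version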

Configure Apache Derby

  1. Extract the Apache Derby package to the specified directory.

     tar -zxvf db-derby-10.13.1.1-bin.tar.gz -C /usr/local/
  2. Modify the profile file in the /etc path and configure environment variables.

    1. Run the following command to open the profile file in the /etc path:

      vim /etc/profile
    2. Add the following information to the end of the content in the file:

      export DERBY_HOME=/usr/local/db-derby-10.13.1.1-bin
      export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
    3. Create a directory in which Derby stores its data.

      mkdir $DERBY_HOME/data
    4. Start the Apache Derby network server.

      nohup /usr/local/db-derby-10.13.1.1-bin/bin/startNetworkServer &
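
     After you start the server, you can apply the new environment variables to the current shell and check that the Derby network server is listening. The following commands are a minimal check that assumes the default Derby port 1527:

       source /etc/profile
       $DERBY_HOME/bin/NetworkServerControl ping -h 127.0.0.1 -p 1527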

Configure Apache Hive

  1. Extract the Apache Hive package to the specified directory.

    tar -zxvf apache-hive-2.3.7-bin.tar.gz -C /usr/local/
  2. Modify the profile file in the /etc path and configure the environment variables.

    1. Run the following command to open the profile file in the /etc path:

      vim /etc/profile
    2. Add the following information to the end of the content in the file:

      export HIVE_HOME=/usr/local/apache-hive-2.3.7-bin
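
     After you save the file, run the following command so that the new variable takes effect in the current shell:

       source /etc/profile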
  3. Modify the hive-env.sh file.

    1. Run the following command to open the hive-env.sh file:

      vim /usr/local/apache-hive-2.3.7-bin/conf/hive-env.sh
    2. Modify the hive-env.sh file, as shown in the following example:

      # The heap size of the jvm started by hive shell script can be controlled via
      export HADOOP_HEAPSIZE=1024
      
      # Set HADOOP_HOME to point to a specific hadoop install directory
      HADOOP_HOME=/usr/local/hadoop-2.7.3
      
      # Hive Configuration Directory can be controlled by:
      export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.7-bin/conf
      
      # Folder containing extra libraries required for hive compilation/execution can be controlled by:
      export HIVE_AUX_JARS_PATH=/usr/local/apache-hive-2.3.7-bin/lib
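
    Note: If the conf/hive-env.sh file does not exist, you can create it from the template that is included in the Hive package before you edit it:

      cp /usr/local/apache-hive-2.3.7-bin/conf/hive-env.sh.template /usr/local/apache-hive-2.3.7-bin/conf/hive-env.sh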
  4. Modify the hive-site.xml file.

    1. Run the following command to open the hive-site.xml file:

      vim /usr/local/apache-hive-2.3.7-bin/conf/hive-site.xml
    2. Modify the hive-site.xml file, as shown in the following example:

      <configuration>
       <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>/user/hive/warehouse</value>
          <description>location of default database for the warehouse</description>
       </property>
       <property>
          <name>hive.exec.scratchdir</name>
          <value>/tmp/hive</value>
          <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
        </property>
         <property>
         <name>hive.metastore.schema.verification</name>
         <value>false</value>
          <description>
          Enforce metastore schema version consistency.
          True: Verify that version information stored in metastore matches with one from Hive jars.  Also disable automatic
                schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                proper metastore schema migration. (Default)
          False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
          </description>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:derby://127.0.0.1:1527/metastore_db;create=true</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>
        <property>
          <name>datanucleus.schema.autoCreateAll</name>
          <value>true</value>
        </property>
      </configuration>
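
    Because the metastore connects to Derby over the network (jdbc:derby://...), the Derby client JDBC driver that contains org.apache.derby.jdbc.ClientDriver must be on the Hive classpath. One way to provide it, assuming the paths used in this topic, is to copy the driver from the Derby package into the Hive lib directory:

      cp /usr/local/db-derby-10.13.1.1-bin/lib/derbyclient.jar /usr/local/apache-hive-2.3.7-bin/lib/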
  5. Create a jpox.properties file.

    1. Run the following command to open the jpox.properties file:

      vim /usr/local/apache-hive-2.3.7-bin/conf/jpox.properties
    2. Modify the jpox.properties file, as shown in the following example:

      javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
      org.jpox.validateTables = false
      org.jpox.validateColumns = false
      org.jpox.validateConstraints = false
      org.jpox.storeManagerType = rdbms
      org.jpox.autoCreateSchema = true
      org.jpox.autoStartMechanismMode = checked
      org.jpox.transactionIsolation = read_committed
      javax.jdo.option.DetachAllOnCommit = true
      javax.jdo.option.NontransactionalRead = true
      javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
      javax.jdo.option.ConnectionURL = jdbc:derby://127.0.0.1:1527/metastore_db;create=true
      javax.jdo.option.ConnectionUserName = APP
      javax.jdo.option.ConnectionPassword = mine
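
    If you prefer to initialize the metastore schema explicitly instead of relying on automatic schema creation, you can use the schematool utility that ships with Hive. It reads the connection settings from the hive-site.xml file:

      /usr/local/apache-hive-2.3.7-bin/bin/schematool -dbType derby -initSchema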
  6. Create directories for Apache Hive.

    Check whether the /user/hive/warehouse and /tmp/hive paths that are specified in the hive-site.xml file exist:

    $HADOOP_HOME/bin/hadoop fs -ls /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -ls /tmp/hive
    If the paths are not found, run the following commands to create the directories and grant users the write permissions:
    $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp/hive
    $HADOOP_HOME/bin/hadoop fs -chmod 775 /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -chmod 775 /tmp/hive
  7. Modify the io.tmpdir directory.

    Replace the value of each ${system:java.io.tmpdir} field in the hive-site.xml file with an existing local path. In this example, the /usr/local/apache-hive-2.3.7-bin/iotmp directory is created and used as the replacement value.

    mkdir /usr/local/apache-hive-2.3.7-bin/iotmp
    chmod 777 /usr/local/apache-hive-2.3.7-bin/iotmp

    Also replace each ${system:user.name} field with ${user.name}. The following code shows the configuration before the modification:

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/usr/local/apache-hive-2.3.7-bin/iotmp/${system:user.name}</value>
        <description>Local scratch space for Hive jobs</description>
    </property> 

    The following code shows the configuration after the modification:

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/usr/local/apache-hive-2.3.7-bin/iotmp/${user.name}</value>
        <description>Local scratch space for Hive jobs</description>
    </property>
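
    If the hive-site.xml file contains many ${system:java.io.tmpdir} and ${system:user.name} fields, you can replace all of them at a time. The following commands are a sketch that assumes GNU sed and the paths used in this topic:

      sed -i 's!${system:java.io.tmpdir}!/usr/local/apache-hive-2.3.7-bin/iotmp!g' /usr/local/apache-hive-2.3.7-bin/conf/hive-site.xml
      sed -i 's!${system:user.name}!${user.name}!g' /usr/local/apache-hive-2.3.7-bin/conf/hive-site.xml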
  8. Start the Apache Hive metastore and HiveServer2 services.

     nohup /usr/local/apache-hive-2.3.7-bin/bin/hive --service metastore &
     nohup /usr/local/apache-hive-2.3.7-bin/bin/hive --service hiveserver2 &
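
     After the services start, you can check that HiveServer2 accepts connections. The following command is a minimal check that assumes the default HiveServer2 port 10000:

       /usr/local/apache-hive-2.3.7-bin/bin/beeline -u jdbc:hive2://127.0.0.1:10000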

Verify the Apache Hive service

  1. Create a table in the Apache Hive shell.

    hive> create table test (f1 INT, f2 STRING);
  2. Write data to the table.

    hive> insert into test values (1,'2222');
  3. Check whether the data is written to LindormDFS.

    ${HADOOP_HOME}/bin/hadoop fs -ls /user/hive/warehouse/test
    If the command output lists the data files of the test table, the data is written to LindormDFS.
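
    You can also query the table to confirm that Hive can read the data back. For example:

    /usr/local/apache-hive-2.3.7-bin/bin/hive -e "select * from test;"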