This topic describes how to connect E-MapReduce (EMR) Hive to ApsaraDB for HBase. Once the connection is set up, you can use Hive to analyze the data in HBase tables.

Note: ApsaraDB for HBase will be integrated with Spark. After that integration is available, we recommend that you use Spark to analyze HBase data.

Preparations

  • Purchase a Pay-As-You-Go EMR cluster and configure it for your scenario. Note: Make sure that ApsaraDB for HBase and the EMR cluster are in the same VPC. We recommend that you do not enable High Availability for the cluster.
  • Add the IP addresses of all nodes in the EMR cluster to the whitelist of ApsaraDB for HBase.
  • Obtain the endpoint of the ZooKeeper service that is built into ApsaraDB for HBase. You can view it in the ApsaraDB for HBase console.
  • Contact the Alibaba Cloud team to have the HDFS ports of ApsaraDB for HBase opened for you.

Procedures

  1. Modify Hive configurations
    • Go to the Hive configuration directory /etc/ecm/hive-conf/.
    • Edit the hbase-site.xml file and set the hbase.zookeeper.quorum property to the endpoint of the ZooKeeper service that is built into ApsaraDB for HBase.
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hb-bp1mhyea7754bpigt-001.hbase.rds.aliyuncs.com,hb-bp1mhyea7754bpigt-002.hbase.rds.aliyuncs.com,hb-bp1mhyea7754bpigt-003.hbase.rds.aliyuncs.com</value>
      </property>
  2. Connect to an HBase table using a Hive table
    Create a table in Hive using the HBase storage handler. Because the table is not an external table, the table named in hbase.table.name is created in ApsaraDB for HBase as well.
    1. Start the Hive command-line interface (CLI).

    2. Use the following statement to create a table in Hive.
      CREATE TABLE hive_hbase_table(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
      TBLPROPERTIES ("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");
    3. In Hive, insert data into the table. The data is written to the HBase table.

    4. In the HBase shell, verify that the HBase table has been created and that the data has been inserted into it.


    5. Write data to the HBase table using the put command in the HBase shell.

      Then select all data from the table in Hive to check that the row written by put is visible.

    6. Delete the table in Hive using the drop command. The corresponding table in HBase is deleted as well.

      To verify this, view the contents of the table in HBase using the scan command. An error message indicates that the table does not exist.

      Note

      To connect to an existing HBase table, use a Hive external table. Deleting a Hive external table does not delete the corresponding HBase table.

    7. Create a table in ApsaraDB for HBase and write test data to the table using the put command.

    8. In Hive, create an external table that maps to the HBase table, and then select all data from it.

    9. Delete the Hive external table and verify that the corresponding HBase table still exists.
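
The Hive statements for steps 3 to 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact commands of the original walkthrough: the inserted values, the external table name hbase_table, its column family f, and its columns col1 and col2 are hypothetical placeholders. The HBase shell commands appear only as comments, because they are run in the HBase shell and not in the Hive CLI.

```sql
-- Step 3: insert a row through Hive. The values are illustrative.
INSERT INTO TABLE hive_hbase_table VALUES (1, 'hello');

-- Step 4 (HBase shell, not HiveQL):
--   list
--   scan 'hive_hbase_table'

-- Step 5 (HBase shell, not HiveQL), with illustrative values:
--   put 'hive_hbase_table', '2', 'cf1:val', 'world'
-- Then read the table back in Hive.
SELECT * FROM hive_hbase_table;

-- Step 6: dropping the non-external Hive table also drops the HBase table.
DROP TABLE hive_hbase_table;
-- HBase shell check (expected to report that the table does not exist):
--   scan 'hive_hbase_table'

-- Step 7 (HBase shell, not HiveQL). The table name, column family,
-- and column qualifiers are hypothetical:
--   create 'hbase_table', 'f'
--   put 'hbase_table', '1122', 'f:col1', 'hello'
--   put 'hbase_table', '1122', 'f:col2', 'hbase'

-- Step 8: map the existing HBase table as a Hive external table.
CREATE EXTERNAL TABLE hbase_table(key int, col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:col1,f:col2")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");

SELECT * FROM hbase_table;

-- Step 9: dropping the external table removes only the Hive metadata.
DROP TABLE hbase_table;
-- HBase shell check (the table should still be listed and scannable):
--   scan 'hbase_table'
```

Each entry in hbase.columns.mapping corresponds by position to one Hive column, so the mapping must contain as many entries as the table has columns. The :key entry maps the Hive column to the HBase row key.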


Summary

For more information about operating on HBase from Hive, see HBase Integration. The operations in this topic use Hive installed on an Alibaba Cloud EMR cluster. The procedure is similar for Hive installed on a custom MapReduce cluster built on ECS instances. Note: The hbase-site.xml file used by Hive may contain configuration items that differ from those of ApsaraDB for HBase. To connect Hive to ApsaraDB for HBase, you only need to set the hbase.zookeeper.quorum property.