
E-MapReduce: Use Hive to access HBase data in EMR

Last Updated: Mar 13, 2024

You can use a Hive internal table or a Hive external table to access HBase data in your E-MapReduce (EMR) cluster. This topic describes both methods and the additional configuration that is required when Kerberos authentication is enabled for HBase.

Prerequisites

  • A custom cluster is created, and Hive, HBase, and ZooKeeper are selected for the cluster. For more information, see Create a cluster.

  • You have logged on to the cluster. For more information, see Log on to a cluster.

Use a Hive internal table to access HBase data

If the HBase table does not exist yet, you can create an internal table in Hive. Hive then automatically creates an HBase table whose schema matches the Hive internal table. In this example, an internal table named hive_hbase_table is created in Hive and then used to access the data in HBase.

  1. Run the following command to open the Hive CLI:
    hive
  2. Create an internal table in Hive and query data in the table.

    1. Run the following command to create an internal table in Hive:

      create table hive_hbase_table(key int, value string)
      stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      with serdeproperties("hbase.columns.mapping" = ":key,cf1:val")
      tblproperties("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");
      Note

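      HBaseStorageHandler lets Hive store the data of the internal table in HBase and read it back. The hbase.columns.mapping property maps the key column to the HBase row key (:key) and the value column to the val column in the cf1 column family. The hbase.table.name property specifies the name of the HBase table.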

    2. Run the following command to insert data into the internal table:

      insert into hive_hbase_table values(212,'bab');
    3. Run the following command to query data in the table:

      select * from hive_hbase_table;

      The following information is returned:

      OK
      212 bab
      Time taken: 0.337 seconds, Fetched: 1 row(s)
  3. After you exit the Hive CLI, run the following command to open the HBase CLI:

    hbase shell
  4. Run the following command to check whether a table exists in HBase:

    describe 'hive_hbase_table'

    The command returns the description of the hive_hbase_table table, including its cf1 column family.

    Note

    The preceding output shows that Hive has created the hive_hbase_table table in HBase.

  5. Run the following command to check whether the table in HBase contains the same data as the Hive internal table:

    scan 'hive_hbase_table'

    The following output is returned:

    ROW                                           COLUMN+CELL
     212                                          column=cf1:val, timestamp=****, value=bab
    1 row(s) in 0.2320 seconds
    Note

    The preceding output shows that the HBase table contains the row that you inserted into the Hive internal table. This confirms that you can use Hive to write data to and read data from HBase.
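The internal-table steps can also be run without opening the interactive CLIs, for example from a script on the node that you logged on to. The following sketch writes the same HiveQL and HBase shell commands to files and runs them with hive -f and hbase shell -n only when the variable RUN_ON_CLUSTER is set to 1. RUN_ON_CLUSTER is an assumption of this sketch, and the -n (non-interactive) option of hbase shell assumes an HBase version that supports it.

```shell
#!/usr/bin/env bash
# Sketch: run the internal-table example non-interactively.
# Assumes hive and hbase are on the PATH of the cluster node.
# RUN_ON_CLUSTER is a hypothetical switch used only by this sketch.
set -euo pipefail

workdir="$(mktemp -d)"

# HiveQL from step 2: create the HBase-backed table, insert a row, and read it back.
cat > "$workdir/internal_table.hql" <<'EOF'
create table hive_hbase_table(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf1:val")
tblproperties("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");
insert into hive_hbase_table values(212,'bab');
select * from hive_hbase_table;
EOF

# HBase shell commands from steps 4 and 5: confirm that the table and the row exist in HBase.
cat > "$workdir/check_hbase.txt" <<'EOF'
describe 'hive_hbase_table'
scan 'hive_hbase_table'
EOF

# Execute the files only when explicitly requested on a cluster node.
if [ "${RUN_ON_CLUSTER:-0}" = "1" ]; then
  hive -f "$workdir/internal_table.hql"
  # -n makes the HBase shell read commands from stdin and exit instead of staying interactive.
  hbase shell -n < "$workdir/check_hbase.txt"
fi
```

For example, save the sketch as a file and run it with RUN_ON_CLUSTER=1 bash followed by that file name. Without the variable, the script only writes the two command files to a temporary directory, which lets you review them first.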

Use a Hive external table to access HBase data

To use Hive to access an existing HBase table, create an external table in Hive and map it to the HBase table. In this example, the HBase table is named hbase_table.

  1. After you exit the Hive CLI, run the following command to open the HBase CLI:

    hbase shell
  2. Create a table in HBase, insert data into it, and query the data.

    1. Run the following command to create a table named hbase_table that has a column family named f:

      create 'hbase_table','f'
    2. Run the following command to insert a row whose row key is 1122 into the table:

      put 'hbase_table','1122','f:col1','hello'
    3. Run the following command to query data in the table:

      scan 'hbase_table'

      The following information is returned:

      ROW                                                COLUMN+CELL
       1122                                              column=f:col1, timestamp=1627027165760, value=hello
      1 row(s) in 0.0170 seconds
  3. After you exit the HBase CLI, run the following command to open the Hive CLI:

    hive
  4. Run the following command to create an external table named hbase_table in Hive and map it to the HBase table. The key column maps to the HBase row key, and the col1 and col2 columns map to the col1 and col2 columns in the f column family:

    create external table hbase_table(key int, col1 string, col2 string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties("hbase.columns.mapping" = ":key,f:col1,f:col2")
    tblproperties("hbase.table.name" = "hbase_table", "hbase.mapred.output.outputtable" = "hbase_table");
  5. Run the following command to query data in the hbase_table table in Hive:

    select * from hbase_table;

    The following output is returned:

    1122  hello NULL
    Note

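    The preceding output shows that the hbase_table external table in Hive returns the data that is stored in the HBase table. The col2 column is NULL because no value was written to the f:col2 column in HBase. This indicates that you have used Hive to access HBase data.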
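Hive pairs the columns of the external table with the entries of hbase.columns.mapping by position. With the row key written explicitly as :key, the mapping for this table reads :key,f:col1,f:col2, so key maps to the row key, col1 to f:col1, and col2 to f:col2. The following sketch prints that pairing. pair_columns is a hypothetical helper written for this example, not part of Hive or HBase, and it requires :key to be listed explicitly so that both lists have the same length.

```shell
#!/usr/bin/env bash
# Sketch: show how Hive columns line up with hbase.columns.mapping entries by position.
# pair_columns is a hypothetical helper for illustration only; Hive does not ship it.
set -euo pipefail

pair_columns() {
  local columns="$1" mapping="$2"
  local -a cols maps
  # Split both comma-separated lists into arrays.
  IFS=',' read -r -a cols <<< "$columns"
  IFS=',' read -r -a maps <<< "$mapping"
  # This helper expects one mapping entry per column, with :key listed explicitly.
  if [ "${#cols[@]}" -ne "${#maps[@]}" ]; then
    echo "column count (${#cols[@]}) does not match mapping count (${#maps[@]})" >&2
    return 1
  fi
  # Print each Hive column next to the HBase row key or column it maps to.
  local i
  for i in "${!cols[@]}"; do
    echo "${cols[$i]} -> ${maps[$i]}"
  done
}

pair_columns "key,col1,col2" ":key,f:col1,f:col2"
```

The last line prints key -> :key, col1 -> f:col1, and col2 -> f:col2, one pair per line.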

Access HBase data when Kerberos authentication is enabled for EMR HBase

If Kerberos authentication is enabled for HBase in your EMR cluster, you must configure the Kerberos-related HBase parameters in Hive before you map Hive tables to HBase tables and query data from them. Use one of the following methods to configure the parameters:

  • Configure the parameters in the Hive CLI

    set hbase.security.authentication=kerberos;
    set hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
    set hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
    set hbase.zookeeper.quorum=master-1-1;
  • Configure the parameters by using environment variables

    env HIVE_OPTS="-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.zookeeper.quorum=master-1-1" hive
Note

To obtain the value of ${CLUSTER_ID}, log on to a node in your cluster and run the hostname command. In the return value, take the string that follows the period (.), which starts with c-, up to the end of the return value, and convert it to uppercase. The result is the value of ${CLUSTER_ID}. For example, for a hypothetical hostname master-1-1.c-abc123, ${CLUSTER_ID} is C-ABC123.
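The hostname-to-${CLUSTER_ID} conversion in the note can be scripted and combined with the HIVE_OPTS method. The following sketch assumes, as the note describes, that the hostname has the form <node>.c-<id>. The helpers cluster_id_from_hostname and hive_opts_for_cluster and the RUN_ON_CLUSTER switch are hypothetical names used only in this example.

```shell
#!/usr/bin/env bash
# Sketch: derive ${CLUSTER_ID} from the hostname and start Hive with the
# Kerberos-related HBase parameters passed through HIVE_OPTS.
# cluster_id_from_hostname, hive_opts_for_cluster, and RUN_ON_CLUSTER are hypothetical.
set -euo pipefail

# Keep the part of the hostname after the first period (it should start with c-)
# and convert it to uppercase.
cluster_id_from_hostname() {
  local host="$1"
  local suffix="${host#*.}"
  case "$suffix" in
    c-*) printf '%s\n' "$suffix" | tr '[:lower:]' '[:upper:]' ;;
    *) echo "unexpected hostname format: $host" >&2; return 1 ;;
  esac
}

# Build the same -hiveconf options as the HIVE_OPTS example for a given cluster ID.
hive_opts_for_cluster() {
  local principal="hbase/_HOST@EMR.$1.COM"
  printf '%s\n' "-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.master.kerberos.principal=${principal} -hiveconf hbase.regionserver.kerberos.principal=${principal} -hiveconf hbase.zookeeper.quorum=master-1-1"
}

# Start the Hive CLI only when explicitly requested on a cluster node.
if [ "${RUN_ON_CLUSTER:-0}" = "1" ]; then
  CLUSTER_ID="$(cluster_id_from_hostname "$(hostname)")"
  HIVE_OPTS="$(hive_opts_for_cluster "$CLUSTER_ID")" hive
fi
```

Setting HIVE_OPTS as a prefix of the hive command has the same effect as the env HIVE_OPTS=... hive form: the variable applies only to that Hive session.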

References

For information about how to use Hive to access data in ApsaraDB for HBase, see Use Hive to access data in ApsaraDB for HBase.