You can use a Hive internal or external table to access HBase data in your E-MapReduce (EMR) cluster. This topic describes both methods and the extra configuration that is required when Kerberos authentication is enabled for HBase.
Prerequisites
A custom cluster is created, and Hive, HBase, and ZooKeeper are selected for the cluster. For more information, see Create a cluster.
You have logged on to the cluster. For more information, see Log on to a cluster.
Use a Hive internal table to access HBase data
If no table is created in HBase, you can create an internal table in Hive. This way, a table that has the same schema as the Hive internal table is automatically created in HBase. In this example, an internal table is created in Hive to access HBase data.
Run the following command to open the Hive CLI:
hive
Create an internal table in Hive and query data in the table.
Run the following command to create an internal table in Hive:
create table hive_hbase_table(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf1:val")
tblproperties("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");
Note: HBaseStorageHandler is used to store the internal table and read HBase data.
Run the following command to insert data into the internal table:
insert into hive_hbase_table values(212,'bab');
Run the following command to query data in the table:
select * from hive_hbase_table;
The following information is returned:
OK
212	bab
Time taken: 0.337 seconds, Fetched: 1 row(s)
After you exit the Hive CLI, run the following command to open the HBase CLI:
hbase shell
Run the following command to check whether a table exists in HBase:
describe 'hive_hbase_table'
The output shown in the following figure is returned.
Note: The preceding output shows that a table is created in HBase by Hive.
Run the following command to check whether the table in HBase contains the same data as the Hive internal table:
scan 'hive_hbase_table'
The following output is returned:
ROW                COLUMN+CELL
 212               column=cf1:val, timestamp=****, value=bab
1 row(s) in 0.2320 seconds
Note: The preceding output shows that the table in HBase contains the same data as the Hive internal table. This indicates that you have used Hive to access HBase data.
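The internal-table steps above can also be run without opening either CLI interactively. The following is a minimal sketch, assuming that the hive and hbase commands are on the PATH of the node you are logged on to. The helper name hive_sql is introduced here for illustration only. The sketch relies on hive -e to run a quoted query string and on hbase shell -n to run the HBase shell in non-interactive mode.

```shell
# Sketch only: hive_sql is a hypothetical helper, not part of EMR.
# It prints the HiveQL statements from the steps above.
hive_sql() {
  cat <<'EOF'
create table hive_hbase_table(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf1:val")
tblproperties("hbase.table.name" = "hive_hbase_table",
              "hbase.mapred.output.outputtable" = "hive_hbase_table");
insert into hive_hbase_table values(212,'bab');
select * from hive_hbase_table;
EOF
}

if command -v hive >/dev/null 2>&1 && command -v hbase >/dev/null 2>&1; then
  # Create the table, insert one row, and query it from Hive.
  hive -e "$(hive_sql)"
  # Confirm from the HBase side that the same row is visible.
  echo "scan 'hive_hbase_table'" | hbase shell -n
else
  # Outside a cluster node, only show what would be run.
  echo "hive or hbase not found; statements that would run:"
  hive_sql
fi
```

Because the create statement has no existence check, running the sketch a second time fails at the first statement unless the table is dropped first.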
Use a Hive external table to access HBase data
If you want to use Hive to access an existing HBase table named hbase_table, you can create an external table in Hive and establish a mapping between the Hive external table and the HBase table to access data in the HBase table.
After you exit the Hive CLI, run the following command to open the HBase CLI:
hbase shell
Create a table in HBase and query data in the table.
Run the following command to create a table named hbase_table that has the column family f in HBase:
create 'hbase_table','f'
Run the following command to insert data into the HBase table:
put 'hbase_table','1122','f:col1','hello'
Run the following command to query data in the table:
scan 'hbase_table'
The following information is returned:
ROW                COLUMN+CELL
 1122              column=f:col1, timestamp=1627027165760, value=hello
1 row(s) in 0.0170 seconds
After you exit the HBase CLI, run the following command to open the Hive CLI:
hive
Run the following command to create an external table named hbase_table in Hive and establish a mapping between the Hive external table and the HBase table:
create external table hbase_table(key int, col1 string, col2 string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,f:col1,f:col2")
tblproperties("hbase.table.name" = "hbase_table", "hbase.mapred.output.outputtable" = "hbase_table");
Run the following command to query data in the hbase_table table in Hive:
select * from hbase_table;
The following output is returned:
1122 hello NULL
Note: The preceding output shows that the hbase_table external table contains the same data as the HBase table. The col2 column is NULL because the row has no f:col2 cell in HBase. This indicates that you have used Hive to access HBase data.
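The hbase.columns.mapping property pairs Hive columns with HBase columns by position. The following sketch prints the pairing for the external table so that you can check it before you run the create statement. The function name show_mapping is hypothetical, and the sketch assumes that the row key is written explicitly as :key, so that both lists have the same number of entries.

```shell
# Sketch only: show_mapping is a hypothetical helper for checking a mapping.
# $1: comma-separated Hive column names
# $2: the hbase.columns.mapping value (":key" or "family:qualifier" entries)
show_mapping() {
  awk -v h="$1" -v b="$2" 'BEGIN {
    n = split(h, hive_col, ",")
    m = split(b, hbase_col, ",")
    if (n != m) {
      # With an explicit :key entry, both lists must have the same length.
      print "column count mismatch: " n " Hive vs " m " HBase"
      exit 1
    }
    # Entry i of the mapping belongs to Hive column i.
    for (i = 1; i <= n; i++) print hive_col[i] " -> " hbase_col[i]
  }'
}

show_mapping "key,col1,col2" ":key,f:col1,f:col2"
# prints:
# key -> :key
# col1 -> f:col1
# col2 -> f:col2
```

Read this way, the key column receives the HBase row key (1122), col1 receives the f:col1 cell (hello), and col2 has no matching cell in the sample row.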
Enable Kerberos authentication for EMR HBase
If you enable Kerberos authentication for EMR HBase, you must configure the parameters that are related to Kerberos authentication before you associate Hive with HBase tables and use Hive to query data from the tables. You can use one of the following methods to configure the parameters:
Configure the parameters in the Hive CLI
set hbase.security.authentication=kerberos;
set hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.zookeeper.quorum=master-1-1;
Configure the parameters by using environment variables
env HIVE_OPTS="-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.zookeeper.quorum=master-1-1" hive
To obtain the value of ${CLUSTER_ID}, log on to a node in your cluster and run the hostname command. In the returned value, take the string that starts with c- after the period (.) and runs to the end of the value, and convert the string to uppercase. The converted string is the value of ${CLUSTER_ID}.
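The extraction rule for ${CLUSTER_ID} and the HIVE_OPTS value can be combined in a small script. The following is a sketch under stated assumptions: the function names cluster_id_from_hostname and kerberos_hive_opts are hypothetical, the sample hostname is made up for illustration, and the ZooKeeper quorum master-1-1 is copied from the example above and may differ in your cluster.

```shell
# Sketch only: both function names are hypothetical.

# Keep the part of the hostname that starts with "c-" after a period,
# through the end of the string, and convert it to uppercase.
cluster_id_from_hostname() {
  case "$1" in
    *.c-*) printf 'c-%s\n' "${1#*.c-}" | tr '[:lower:]' '[:upper:]' ;;
    *) echo "no '.c-' part in hostname: $1" >&2; return 1 ;;
  esac
}

# Build the -hiveconf options from the example above for a cluster ID.
kerberos_hive_opts() {
  principal="hbase/_HOST@EMR.$1.COM"
  opts="-hiveconf hbase.security.authentication=kerberos"
  opts="$opts -hiveconf hbase.master.kerberos.principal=$principal"
  opts="$opts -hiveconf hbase.regionserver.kerberos.principal=$principal"
  opts="$opts -hiveconf hbase.zookeeper.quorum=master-1-1"
  printf '%s\n' "$opts"
}

# Demonstration with a made-up hostname.
cluster_id_from_hostname "master-1-1.c-0123456789abcdef"
# prints: C-0123456789ABCDEF

# On a cluster node, the two helpers could be combined like this:
# HIVE_OPTS="$(kerberos_hive_opts "$(cluster_id_from_hostname "$(hostname)")")" hive
```

The first function returns a non-zero status when the hostname contains no ".c-" part, so a short hostname is not silently turned into a wrong principal.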
References
For information about how to use Hive to access data in ApsaraDB for HBase, see Use Hive to access data in ApsaraDB for HBase.