Use a Hive internal table or external table to query HBase data in your E-MapReduce (EMR) cluster from the Hive command-line interface (CLI).
-
Internal table (also called managed table): Hive creates the corresponding table in HBase automatically. Use this when no HBase table exists yet.
-
External table: Hive maps to an existing HBase table without taking ownership. Use this when the HBase table already contains data.
Prerequisites
Before you begin, ensure that you have:
-
A custom cluster with Hive, HBase, and ZooKeeper selected. See Create a cluster
-
Logged on to the cluster. See Log on to a cluster
Use an internal table to access HBase data
When no table exists in HBase, create an internal table in Hive. Hive automatically creates a table with the same schema in HBase and keeps them in sync.
-
Open the Hive CLI:
hive -
Create the internal table:
Mapping value Meaning :keyHBase row key. Exactly one column must use this mapping. column-family:column-nameA specific column in a column family. Example: cf1:valmaps to columnvalin column familycf1.create table hive_hbase_table(key int, value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties("hbase.columns.mapping" = ":key,cf1:val") tblproperties("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");HBaseStorageHandlerhandles storage and data reads for the internal table. Thehbase.columns.mappingproperty maps each Hive column to an HBase column using the following syntax: In this example,:keymaps to the HBase row key andcf1:valmaps to columnvalin column familycf1. -
Insert a row:
insert into hive_hbase_table values(212,'bab'); -
Query the table in Hive:
select * from hive_hbase_table;Expected output:
OK 212 bab Time taken: 0.337 seconds, Fetched: 1 row(s) -
Exit the Hive CLI, then open the HBase shell to verify that Hive created the corresponding table:
hbase shell -
Check the table structure in HBase:
describe 'hive_hbase_table'The output shows that a table is created in HBase by Hive.

-
Scan the table to confirm the data matches:
scan 'hive_hbase_table'Expected output:
ROW COLUMN+CELL 212 column=cf1:val, timestamp=****, value=bab 1 row(s) in 0.2320 secondsThe HBase table contains the same row inserted through Hive, confirming the integration is working.
Use an external table to access HBase data
When an HBase table already exists, create an external table in Hive and map it to the HBase table. Hive reads the data without taking ownership.
-
Open the HBase shell:
hbase shell -
Create a table and insert a row:
create 'hbase_table','f' put 'hbase_table','1122','f:col1','hello' -
Verify the data:
scan 'hbase_table'Expected output:
ROW COLUMN+CELL 1122 column=f:col1, timestamp=1627027165760, value=hello 1 row(s) in 0.0170 seconds -
Exit the HBase shell, then open the Hive CLI:
hive -
Create an external table in Hive that maps to the HBase table:
create external table hbase_table(key int,col1 string,col2 string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties("hbase.columns.mapping" = "f:col1,f:col2") tblproperties("hbase.table.name" = "hbase_table", "hbase.mapred.output.outputtable" = "hbase_table"); -
Query the external table:
select * from hbase_table;Expected output:
1122 hello NULLThe hbase_table external table contains the same data as the HBase table. This indicates that you have used Hive to access HBase data.
col2isNULLbecause no value was inserted forf:col2in HBase.
Configure Kerberos authentication
If Kerberos authentication is enabled for EMR HBase, set the following parameters before running Hive queries against HBase tables. Use either of the following methods.
Method 1: Set parameters in the Hive CLI
Run these commands at the start of your Hive session:
set hbase.security.authentication=kerberos;
set hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.zookeeper.quorum=master-1-1;
Method 2: Set parameters using environment variables
Pass the parameters when launching Hive:
env HIVE_OPTS="-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.zookeeper.quorum=master-1-1" hive
${CLUSTER_ID}, log on to a node in your cluster and run hostname. Extract the substring starting from c- after the first period (.) to the end of the output, then convert it to uppercase.What's next
To query data from ApsaraDB for HBase using Hive, see Use Hive to access data in ApsaraDB for HBase.