All Products
Search
Document Center

E-MapReduce:Access EMR HBase data from Hive

Last Updated:Mar 26, 2026

Use a Hive internal table or external table to query HBase data in your E-MapReduce (EMR) cluster from the Hive command-line interface (CLI).

  • Internal table (also called managed table): Hive creates the corresponding table in HBase automatically. Use this when no HBase table exists yet.

  • External table: Hive maps to an existing HBase table without taking ownership. Use this when the HBase table already contains data.

Prerequisites

Before you begin, ensure that you have:

Use an internal table to access HBase data

When no table exists in HBase, create an internal table in Hive. Hive automatically creates a table with the same schema in HBase and keeps them in sync.

  1. Open the Hive CLI:

    hive
  2. Create the internal table:

    Mapping value Meaning
    :key HBase row key. Exactly one column must use this mapping.
    column-family:column-name A specific column in a column family. Example: cf1:val maps to column val in column family cf1.
    create table hive_hbase_table(key int, value string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties("hbase.columns.mapping" = ":key,cf1:val")
    tblproperties("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");

    HBaseStorageHandler handles storage and data reads for the internal table. The hbase.columns.mapping property maps each Hive column to an HBase column using the following syntax: In this example, :key maps to the HBase row key and cf1:val maps to column val in column family cf1.

  3. Insert a row:

    insert into hive_hbase_table values(212,'bab');
  4. Query the table in Hive:

    select * from hive_hbase_table;

    Expected output:

    OK
    212 bab
    Time taken: 0.337 seconds, Fetched: 1 row(s)
  5. Exit the Hive CLI, then open the HBase shell to verify that Hive created the corresponding table:

    hbase shell
  6. Check the table structure in HBase:

    describe 'hive_hbase_table'

    The output shows that a table is created in HBase by Hive.

    describe

  7. Scan the table to confirm the data matches:

    scan 'hive_hbase_table'

    Expected output:

    ROW                                           COLUMN+CELL
     212                                          column=cf1:val, timestamp=****, value=bab
    1 row(s) in 0.2320 seconds

    The HBase table contains the same row inserted through Hive, confirming the integration is working.

Use an external table to access HBase data

When an HBase table already exists, create an external table in Hive and map it to the HBase table. Hive reads the data without taking ownership.

  1. Open the HBase shell:

    hbase shell
  2. Create a table and insert a row:

    create 'hbase_table','f'
    put 'hbase_table','1122','f:col1','hello'
  3. Verify the data:

    scan 'hbase_table'

    Expected output:

    ROW                                                COLUMN+CELL
     1122                                              column=f:col1, timestamp=1627027165760, value=hello
    1 row(s) in 0.0170 seconds
  4. Exit the HBase shell, then open the Hive CLI:

    hive
  5. Create an external table in Hive that maps to the HBase table:

    create external table hbase_table(key int,col1 string,col2 string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties("hbase.columns.mapping" = "f:col1,f:col2")
    tblproperties("hbase.table.name" = "hbase_table", "hbase.mapred.output.outputtable" = "hbase_table");
  6. Query the external table:

    select * from hbase_table;

    Expected output:

    1122  hello NULL

    The hbase_table external table contains the same data as the HBase table. This indicates that you have used Hive to access HBase data. col2 is NULL because no value was inserted for f:col2 in HBase.

Configure Kerberos authentication

If Kerberos authentication is enabled for EMR HBase, set the following parameters before running Hive queries against HBase tables. Use either of the following methods.

Method 1: Set parameters in the Hive CLI

Run these commands at the start of your Hive session:

set hbase.security.authentication=kerberos;
set hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM;
set hbase.zookeeper.quorum=master-1-1;

Method 2: Set parameters using environment variables

Pass the parameters when launching Hive:

env HIVE_OPTS="-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.master.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.regionserver.kerberos.principal=hbase/_HOST@EMR.${CLUSTER_ID}.COM -hiveconf hbase.zookeeper.quorum=master-1-1" hive
Note To get the value of ${CLUSTER_ID}, log on to a node in your cluster and run hostname. Extract the substring starting from c- after the first period (.) to the end of the output, then convert it to uppercase.

What's next

To query data from ApsaraDB for HBase using Hive, see Use Hive to access data in ApsaraDB for HBase.