E-MapReduce: Use Hive to access data in ApsaraDB for HBase

Last Updated: Mar 27, 2026

To run multi-table join queries on ApsaraDB for HBase data, connect Hive in your E-MapReduce (EMR) cluster to ApsaraDB for HBase. This topic walks you through the configuration and two integration patterns: mapping an external table to an existing HBase table, and creating a managed table that Hive controls end-to-end.

Prerequisites

Before you begin, ensure that you have:

  • A DataLake cluster. For more information, see Create a cluster.

  • An ApsaraDB for HBase cluster created in the same virtual private cloud (VPC) as your EMR cluster. For more information, see Purchase a cluster.

Note

This topic uses ApsaraDB for HBase Standard Edition V2.0. The ApsaraDB for HBase Performance-enhanced Edition (Lindorm) is not supported.

Step 1: Add Hive configurations

Configure the hbase.zookeeper.quorum property in Hive so that it can locate your ApsaraDB for HBase cluster.

  1. Go to the Configure tab.

    1. Log on to the EMR console.

    2. In the top navigation bar, select a region and a resource group based on your business requirements.

    3. On the EMR on ECS page, find your cluster and click Services in the Actions column.

    4. On the Services tab, find the Hive service and click Configure.

    5. Click the hbase-site.xml tab.

  2. Click Add Configuration Item. In the Add Configuration Item dialog box, add the following configuration item and click OK. For more information, see the Add configuration items section of the "Manage configuration items" topic.

    Configuration item: hbase.zookeeper.quorum

    Description: The ZooKeeper address of your ApsaraDB for HBase cluster in the VPC. To get this address, go to the cluster details page in the ApsaraDB for HBase console and open the Database Connection page.

    Example: hb-xxxx-master1-001.hbase.rds.aliyuncs.com:2181,hb-xxxx-master2-001.hbase.rds.aliyuncs.com:2181,hb-xxxx-master3-001.hbase.rds.aliyuncs.com:2181
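A typo in the quorum string is easy to miss once it is pasted into the console. The following Python sketch checks that a value has the comma-separated host:port shape shown in the example. The function name parse_quorum is hypothetical, and the check assumes that every entry carries an explicit port, as the example address does.

```python
def parse_quorum(value):
    """Split a quorum string into (host, port) pairs, rejecting malformed entries."""
    entries = []
    for part in value.split(","):
        # rpartition splits on the last colon, so the port is always the tail.
        host, sep, port = part.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {part!r}")
        entries.append((host, int(port)))
    return entries


# Placeholder address copied from the example above; replace hb-xxxx with your own.
quorum = (
    "hb-xxxx-master1-001.hbase.rds.aliyuncs.com:2181,"
    "hb-xxxx-master2-001.hbase.rds.aliyuncs.com:2181,"
    "hb-xxxx-master3-001.hbase.rds.aliyuncs.com:2181"
)
for host, port in parse_quorum(quorum):
    print(host, port)
```

The sketch only validates the format. It does not test whether the hosts are reachable from the EMR cluster.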

Step 2: Check existing HBase tables

Connect to your ApsaraDB for HBase cluster (see Use HBase Shell to access an ApsaraDB for HBase Standard Edition cluster) and run the list command to check whether hive_hbase_table or hbase_table already exists. The examples in this topic create tables with these names.

Map an external table to an existing HBase table

Use an external table when the HBase table already exists and you want to query or write to it through Hive without Hive managing the table's lifecycle. Dropping the Hive external table leaves the HBase table untouched.

  1. Create the HBase table and insert sample data using HBase Shell.

    create 'hbase_table','f'
    put 'hbase_table','1122','f:col1','hello'
    put 'hbase_table','1122','f:col2','hbase'
  2. In the Hive CLI, create an external table that maps to the HBase table.

    CREATE EXTERNAL TABLE hbase_table(key int, col1 string, col2 string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:col1,f:col2")
      TBLPROPERTIES (
        "hbase.table.name" = "hbase_table",
        "hbase.mapred.output.outputtable" = "hbase_table"
      );
  3. Query data from the external table.

    SELECT * FROM hbase_table;

    Expected output:

    1122    hello   hbase
  4. (Optional) Verify that dropping the Hive table does not affect HBase data. Drop the Hive external table:

    DROP TABLE hbase_table;

    Then run the list command in HBase Shell. The hbase_table table still exists, confirming that dropping an external table does not cascade to ApsaraDB for HBase.
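The hbase.columns.mapping property in step 2 pairs Hive columns with HBase columns by position. As a rough illustration, the Python sketch below performs that pairing for the example table, with the row key written explicitly as :key. The helper name pair_columns and its checks are assumptions made for this sketch. It is not Hive's own parser, and it is deliberately stricter than Hive may be in practice.

```python
def pair_columns(hive_columns, mapping):
    """Pair Hive column names with HBase mapping entries by position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    # Assumption of this sketch: the row key must be spelled out as ":key".
    if entries.count(":key") != 1:
        raise ValueError("mapping must contain exactly one :key entry")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    for entry in entries:
        # Every non-key entry is expected in family:qualifier form.
        if entry != ":key" and ":" not in entry:
            raise ValueError(f"expected family:qualifier, got {entry!r}")
    return list(zip(hive_columns, entries))


# Columns of the external table hbase_table and the column family f used above.
pairs = pair_columns(["key", "col1", "col2"], ":key,f:col1,f:col2")
for hive_column, hbase_column in pairs:
    print(f"{hive_column} -> {hbase_column}")
```

Running a check like this before CREATE EXTERNAL TABLE can catch a column-count mismatch early, before the statement is submitted to Hive.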

Create a managed table

Use a managed table when you want Hive to control the table's full lifecycle. Hive creates the corresponding HBase table when you create the Hive table and deletes it when you drop the Hive table.

  1. Enter the Hive CLI.

    hive
  2. Create the managed table.

    CREATE TABLE hive_hbase_table(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
      TBLPROPERTIES (
        "hbase.table.name" = "hive_hbase_table",
        "hbase.mapred.output.outputtable" = "hive_hbase_table"
      );
  3. Insert data through Hive and verify it is readable.

    INSERT INTO hive_hbase_table VALUES (212, 'bab');
    SELECT * FROM hive_hbase_table;
  4. Write data directly through HBase Shell and verify it appears in Hive. In HBase Shell, run:

    put 'hive_hbase_table','132','cf1:val','acb'

    Back in Hive, query the table:

    SELECT * FROM hive_hbase_table;

    Expected output:

    132     acb
    212     bab
  5. (Optional) Verify that dropping the Hive table also removes the HBase table. Drop the managed table in Hive:

    DROP TABLE hive_hbase_table;

    Then run the following command in HBase Shell to confirm the table no longer exists:

    scan 'hive_hbase_table'

    The scan returns an error indicating that hive_hbase_table does not exist, confirming that dropping a managed table cascades to ApsaraDB for HBase.
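The row order in the expected output of step 4 (132 before 212) follows from how HBase stores rows: sorted by the raw bytes of the row key. The Python sketch below imitates that ordering. It assumes the key column is written in its string form, which is the storage handler's default when no binary storage type is configured. The function name scan_order is hypothetical.

```python
def scan_order(row_keys):
    """Return row keys in byte-wise lexicographic order, as an HBase scan would."""
    # Python compares bytes objects byte by byte, which matches this ordering.
    return sorted(row_keys)


# The two row keys from the managed-table example, plus 1122 for contrast.
keys = [b"212", b"132", b"1122"]

print(scan_order(keys))                 # byte order: 1122, 132, 212
print(sorted(int(k) for k in keys))     # numeric order: 132, 212, 1122
```

Because the two orders differ once keys have different lengths, queries that rely on numeric key order should sort explicitly in Hive rather than depend on scan order.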