ApsaraDB for HBase Performance-enhanced Edition allows you to connect to the database service by using Hive. However, Hive does not call ApsaraDB for HBase through the standard operations such as GET or PUT. Instead, it calls internal classes of ApsaraDB for HBase. Therefore, you must replace the existing JAR files in the hive/lib directory rather than add the alihbase-connector JAR file to that directory.

1. Delete the JAR files whose names start with hbase in the hive/lib directory. The following figure shows the JAR files to be deleted, which are highlighted in red. We recommend that you do not delete hive-hbase-handler-{version}.jar. This JAR file contains the logic that Hive uses to connect to ApsaraDB for HBase.
2. Download all JAR files whose names start with alihbase- and put them in the hive/lib directory. You do not need to replace JAR files that already exist. If your deployment has other dependencies, handle them according to your actual situation.
3. If you have specified the dependency by setting the --auxpath parameter in hive/.hiverc or when you start Hive, replace the loaded JAR file with the new JAR file whose name starts with alihbase.
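Steps 1 and 2 can be scripted. The following is a minimal sketch, not an official tool: the function name swap_hbase_jars, the backup directory name, and both directory arguments are assumptions made for illustration. It moves the hbase* JAR files aside instead of deleting them so that the change can be undone.

```shell
#!/bin/sh
# Sketch: swap the stock HBase client JARs in hive/lib for the alihbase-* JARs.
# Arguments (both are paths you supply; nothing here is fixed by the product):
#   $1 = Hive lib directory, for example "$HIVE_HOME/lib"
#   $2 = directory that holds the downloaded alihbase-*.jar files
swap_hbase_jars() {
    hive_lib=$1
    ali_jars=$2
    # Assumed backup location, outside hive/lib so Hive does not load it.
    backup="$hive_lib/../hbase-jars-backup"
    mkdir -p "$backup" || return 1

    # Step 1: move away every JAR whose name starts with "hbase".
    # hive-hbase-handler-*.jar starts with "hive", so this glob never matches it.
    for jar in "$hive_lib"/hbase*.jar; do
        [ -e "$jar" ] || continue      # the glob matched nothing
        mv "$jar" "$backup/" || return 1
    done

    # Step 2: copy in the alihbase-* JARs, skipping any that already exist.
    for jar in "$ali_jars"/alihbase-*.jar; do
        [ -e "$jar" ] || continue
        name=$(basename "$jar")
        [ -e "$hive_lib/$name" ] || cp "$jar" "$hive_lib/" || return 1
    done
}

# Example call (paths are placeholders):
#   swap_hbase_jars "$HIVE_HOME/lib" "$HOME/alihbase-jars"
```

Step 3 (the --auxpath setting) is not covered by this sketch and still has to be checked by hand.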

Prerequisites

Retrieve an endpoint

For more information, see Connect to a cluster. If you want to connect to the service over a public network, use the public endpoint in the API operation.

Retrieve the username and password

For more information, see Connect to a cluster. By default, both the username and the password are root. If you disable the Access Control List (ACL) feature on the cluster management page, the username and password are not required.

Add the IP addresses of the Hive servers to the ApsaraDB for HBase whitelist

All IP addresses of Hive servers used to connect to ApsaraDB for HBase must be added to the whitelist of the ApsaraDB for HBase cluster. Otherwise, Hive clients cannot connect to ApsaraDB for HBase. For more information, see Configure the whitelist.

Configure connection parameters in Hive

You can configure the parameters for connecting to ApsaraDB for HBase in Hive in either of two ways. The first is to specify the connection parameters in the hive-site.xml file. Add the following configuration to this file:

    <configuration>
        <!--
        The public endpoint or Virtual Private Cloud (VPC) internal endpoint used to connect to the cluster. You can retrieve the endpoint on the Database Connection page in the console.
        -->
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>ld-xxxx-proxy-hbaseue.hbaseue.xxx.rds.aliyuncs.com:30020</value>
        </property>
        <!--
        By default, both the username and the password are set to root. You can change the username and the password.
        -->
        <property>
            <name>hbase.client.username</name>
            <value>root</value>
        </property>
        <property>
            <name>hbase.client.password</name>
            <value>root</value>
        </property>
    </configuration>

You can also specify the parameters by running the following commands on the Hive client:

    set hbase.zookeeper.quorum=ld-xxxx-proxy-hbaseue.hbaseue.xxx.rds.aliyuncs.com:30020;
    set hbase.client.username=root;
    set hbase.client.password=root;

After you specify the parameters, you can create Hive external tables for ApsaraDB for HBase.
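If you prefer not to retype the session-level set commands each time, one option is to keep them in an initialization file and pass it to the Hive CLI with the -i option. The following is a rough sketch under stated assumptions: the function name write_hive_init and the environment variable names HBASE_ENDPOINT, HBASE_USER, and HBASE_PASSWORD are invented for illustration and are not defined by Hive or ApsaraDB for HBase.

```shell
#!/bin/sh
# Sketch: write the three connection settings to an init file for `hive -i`.
# HBASE_ENDPOINT, HBASE_USER and HBASE_PASSWORD are hypothetical variable names.
#   $1 = path of the init file to create
write_hive_init() {
    out=$1
    if [ -z "$HBASE_ENDPOINT" ]; then
        echo "HBASE_ENDPOINT is not set" >&2
        return 1
    fi
    (
        umask 077   # the file holds a password, so keep it private to the owner
        {
            # Statements in a Hive script file end with a semicolon.
            printf 'set hbase.zookeeper.quorum=%s;\n' "$HBASE_ENDPOINT"
            printf 'set hbase.client.username=%s;\n' "${HBASE_USER:-root}"
            printf 'set hbase.client.password=%s;\n' "${HBASE_PASSWORD:-root}"
        } > "$out"
    )
}

# Example call (the endpoint and file path are placeholders):
#   HBASE_ENDPOINT=ld-xxxx-proxy-hbaseue.hbaseue.xxx.rds.aliyuncs.com:30020
#   write_hive_init "$HOME/hbase-init.hql" && hive -i "$HOME/hbase-init.hql"
```

The username and password fall back to root only because that is the default described above. Change them if you have changed the defaults on the cluster.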

How to use Hive

If the ApsaraDB for HBase table that you want to manage does not exist, you can run the command for creating a table in Hive. A Hive table and ApsaraDB for HBase table are created and automatically associated with each other.

  • Launch the Hive CLI.
  • Create an ApsaraDB for HBase table.

    CREATE TABLE hive_hbase_table(key int, value string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");

  • Insert data into the ApsaraDB for HBase table by using the Hive CLI.

    insert into hive_hbase_table values(212,'bab');

  • View the ApsaraDB for HBase table. You can see that the table has been created and the data has been inserted.
  • Write data to the ApsaraDB for HBase table and check the data in Hive.
  • Query the data in Hive.
  • Delete the table in Hive. The associated table in ApsaraDB for HBase is also deleted.
  • Query the table in ApsaraDB for HBase. An error message is returned indicating that the table does not exist.

If the ApsaraDB for HBase table already exists, you can associate it with an external table in Hive. If you delete the external table, the ApsaraDB for HBase table is not deleted.

  • Create an ApsaraDB for HBase table and use PUT to insert data into the table.
  • Create an external table in Hive that is associated with the ApsaraDB for HBase table, and query the data in Hive.
  • Delete the external table in Hive. The associated ApsaraDB for HBase table still exists.
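
The external-table case differs from the managed-table statement shown earlier mainly in the EXTERNAL keyword. The following sketch prints such a statement so that it can be saved and run with hive -f. The function name external_table_ddl is made up for illustration, and the column mapping assumes that the existing ApsaraDB for HBase table has a column family cf1 with a qualifier val, as in the earlier example. Adjust the mapping to match your table.

```shell
#!/bin/sh
# Sketch: print HiveQL that maps an existing HBase table to a Hive external table.
#   $1 = name of the Hive external table to create
#   $2 = name of the existing ApsaraDB for HBase table
# Assumption: the HBase table has column family "cf1" with qualifier "val".
external_table_ddl() {
    cat <<EOF
CREATE EXTERNAL TABLE $1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "$2");
EOF
}

# Example call (table names and file path are placeholders):
#   external_table_ddl hbase_ext_table existing_hbase_table > ext.hql
#   hive -f ext.hql
```

Because the table is declared EXTERNAL, dropping it in Hive removes only the Hive metadata, which matches the behavior described in the last step above.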

For more information, visit https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration.

Note: Hive cannot read HFiles through snapshots of ApsaraDB for HBase Performance-enhanced Edition. Instead, you can create a table in Hive and associate it with an ApsaraDB for HBase table, or create an external table in Hive and associate it with an existing ApsaraDB for HBase table. With either method, you can query the data in ApsaraDB for HBase from Hive.