This topic describes how to use Hive to access an ApsaraDB for HBase Performance-enhanced Edition cluster.
Prerequisites
The version of an ApsaraDB for HBase Performance-enhanced Edition cluster is 2.4.3 or later. For more information about how to view or update the version of an ApsaraDB for HBase Performance-enhanced Edition cluster, see Minor version updates.
The IP address of a client is added to the whitelist of the ApsaraDB for HBase Performance-enhanced Edition cluster. For more information about how to add a client to the whitelist of the ApsaraDB for HBase Performance-enhanced Edition cluster, see Configure IP address allowlists and security groups.
The endpoint (Java API endpoint) of the ApsaraDB for HBase Performance-enhanced Edition cluster is available in the ApsaraDB for HBase console.
Usage notes
To access the ApsaraDB for HBase Performance-enhanced Edition cluster over the Internet, replace the open source HBase client with the ApsaraDB for HBase client before you perform the data access operation. For more information, see Upgrade ApsaraDB for HBase SDK for Java.
If applications are deployed on an Elastic Compute Service (ECS) instance, and you want to access the ApsaraDB for HBase Performance-enhanced Edition cluster over a virtual private cloud (VPC), make sure that the ApsaraDB for HBase Performance-enhanced Edition cluster and the ECS instance meet the following requirements to ensure network connectivity:
The ApsaraDB for HBase cluster and the ECS instance are deployed in the same region. We recommend that you deploy the cluster and the instance in the same zone to reduce network latency.
The ApsaraDB for HBase Performance-enhanced Edition cluster and the ECS instance belong to the same VPC.
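Before you configure Hive, it can help to confirm from the ECS instance that the Java API endpoint is reachable at the TCP level. The following Python sketch illustrates one way to do this. The helper names split_endpoint and is_reachable are hypothetical and not part of any ApsaraDB for HBase SDK, and the sketch assumes the endpoint is a host:port string. A successful connection shows only that the port can be reached. It does not verify the client version or the state of the cluster.

```python
import socket


def split_endpoint(endpoint):
    """Split a 'host:port' endpoint string into (host, port)."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def is_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = split_endpoint(endpoint)
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

To use the sketch, pass the Java API endpoint copied from the console to is_reachable. A False result usually points to a region, VPC, or whitelist mismatch.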
Set connection parameters in Hive
Method 1: Set connection parameters in the hive-site.xml configuration file.
<configuration>
  <!-- The Java API endpoint of the cluster. You can obtain the endpoint on the Database Connection page in the ApsaraDB for HBase console. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020</value>
  </property>
</configuration>
Method 2: Run a set command in the Hive client. A parameter set this way applies only to the current session.
-- The Java API endpoint of the cluster. You can obtain the endpoint on the Database Connection page in the ApsaraDB for HBase console.
set hbase.zookeeper.quorum=ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020;
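Both methods set the same key, hbase.zookeeper.quorum, to the Java API endpoint of the cluster. The following Python sketch renders both forms from a single endpoint string so that they stay consistent. The function names are hypothetical, and the sketch assumes that only this one property needs to be set.

```python
import xml.etree.ElementTree as ET

QUORUM_KEY = "hbase.zookeeper.quorum"


def hive_site_property(endpoint):
    """Render the <property> element for hive-site.xml (Method 1)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = QUORUM_KEY
    ET.SubElement(prop, "value").text = endpoint
    return ET.tostring(prop, encoding="unicode")


def hive_set_statement(endpoint):
    """Render the session-level set statement (Method 2)."""
    return f"set {QUORUM_KEY}={endpoint};"
```

Paste the output of hive_site_property inside the <configuration> element of hive-site.xml, or run the output of hive_set_statement in the Hive client.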
Procedure
Currently, Hive cannot directly read the underlying files of the ApsaraDB for HBase Performance-enhanced Edition cluster. However, you can read data from and write data to ApsaraDB for HBase Performance-enhanced Edition tables by using external tables.
Start the Hive client and set the endpoint of the ApsaraDB for HBase Performance-enhanced Edition cluster that you want to access.
hive
-- The Java API endpoint of the cluster.
set hbase.zookeeper.quorum=ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020;
Use HBase Shell to create a table named hive_hbase_table with the column family cf1 in the ApsaraDB for HBase Performance-enhanced Edition cluster.
create 'hive_hbase_table','cf1'
Create an external table in Hive.
CREATE EXTERNAL TABLE hive_hbase_table(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_hbase_table", "hbase.mapred.output.outputtable" = "hive_hbase_table");
If OK is returned, the external table is created.
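The hbase.columns.mapping value pairs Hive columns with HBase columns by position: the first entry, :key, binds the Hive column key to the HBase row key, and cf1:val binds the Hive column value to the qualifier val in the column family cf1. The following Python sketch illustrates that pairing. It is a simplified model, not the HBaseStorageHandler implementation. The function name pair_columns is hypothetical, and the sketch ignores other mapping forms such as :timestamp, whole-family mappings, and type suffixes.

```python
def pair_columns(hive_columns, columns_mapping):
    """Pair Hive column names with HBase mapping entries by position."""
    entries = [entry.strip() for entry in columns_mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            # The special entry :key denotes the HBase row key.
            pairs[column] = "row key"
        else:
            family, _, qualifier = entry.partition(":")
            pairs[column] = f"family={family}, qualifier={qualifier}"
    return pairs
```

For the table in this topic, pair_columns(["key", "value"], ":key,cf1:val") pairs key with the row key and value with cf1:val.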
Insert data from each side and verify that it is visible on the other side.
Insert data into the external table in Hive and check whether the data appears in the ApsaraDB for HBase Performance-enhanced Edition cluster.
Insert data:
insert into hive_hbase_table values(212,'bab');
Check whether the data is inserted in the ApsaraDB for HBase Performance-enhanced Edition cluster:
scan 'hive_hbase_table'
Sample result:
ROW      COLUMN+CELL
 212     column=cf1:val, timestamp=2023-03-13T15:35:10.270, value=bab
Insert data into the hive_hbase_table table of the ApsaraDB for HBase Performance-enhanced Edition cluster and check whether the data appears in Hive.
Insert data:
put 'hive_hbase_table','213','cf1:val','dadsadasda'
Check whether the data is inserted in Hive:
SELECT * FROM hive_hbase_table;
Sample result:
212    bab
213    dadsadasda