How to create an HBase external table - MaxCompute - Alibaba Cloud Documentation Center

After you connect MaxCompute to the ApsaraDB for HBase service, you can create an HBase external table to link to a table in an ApsaraDB for HBase Standard Edition or Performance-enhanced Edition cluster. This lets you synchronize data from HBase to MaxCompute for subsequent extract, transform, and load (ETL) processing, perform federated computing, or export MaxCompute data to tables in HBase. This topic describes how to create an HBase external table.

Background information

ApsaraDB for HBase is a distributed database that is compatible with the standard HBase access protocol. It is available in the following editions: ApsaraDB for HBase Standard Edition, ApsaraDB for HBase Performance-enhanced Edition, and ApsaraDB for HBase Serverless Edition. MaxCompute supports access only to ApsaraDB for HBase Standard Edition and ApsaraDB for HBase Performance-enhanced Edition clusters.

By default, MaxCompute is not connected to the ApsaraDB for HBase service. Before you can create an HBase external table that links to a table in an ApsaraDB for HBase Standard Edition or Performance-enhanced Edition cluster, you must establish a network connection between MaxCompute and the virtual private cloud (VPC) where ApsaraDB for HBase is located. You must also configure the ApsaraDB for HBase whitelist. For more information, see Configure a whitelist.

Note

ApsaraDB for HBase Performance-enhanced Edition is developed based on the Lindorm kernel. You can also create an ApsaraDB for HBase Performance-enhanced Edition instance in the Lindorm console. To create an HBase external table in this scenario, see Lindorm external tables. This topic describes how to create an HBase external table for an HBase instance created in the ApsaraDB for HBase console.

Prerequisites

Before you create an HBase external table, make sure that the following prerequisites are met:

You have created an ApsaraDB for HBase Standard Edition or Performance-enhanced Edition cluster and completed the environment configuration.
For more information about how to purchase an ApsaraDB for HBase Standard Edition cluster and configure its environment, see Quick Start for ApsaraDB for HBase Standard Edition.
For more information about how to purchase an ApsaraDB for HBase Performance-enhanced Edition cluster and configure its environment, see Quick Start for ApsaraDB for HBase Performance-enhanced Edition.
You have created a table in the HBase cluster using an HBase client.
For more information about how to create HBase tables, see Use HBase Shell.
This topic uses a table named `mf_hbase_test_t` as an example. The table is created in an ApsaraDB for HBase Standard Edition or Performance-enhanced Edition cluster using the following commands.
```
-- Create an HBase table.
create 'mf_hbase_test_t', 'f'
-- Insert data. In this example, 1 is the RowKey, f is the column family, col1 is the column name, and hello is the column value.
put 'mf_hbase_test_t', '1', 'f:col1', 'hello'
```
You have created the target MaxCompute project to associate with the HBase table.
For more information about how to create a MaxCompute project, see Create a MaxCompute project.
You have installed the MaxCompute client.
For more information about how to install and configure the MaxCompute client, see Install and configure the MaxCompute client.
You have established a network connection between MaxCompute and ApsaraDB for HBase using a VPC connection solution.
For more information about VPC connection solutions, see Network connection process.

Limits

You can create HBase external tables only in regions that support VPC connections between MaxCompute and ApsaraDB for HBase. The supported regions are China (Beijing), China (Shanghai), China (Zhangjiakou), China (Ulanqab), China (Hangzhou), China (Shenzhen), China (Hong Kong), China East 2 Finance Cloud (Zone F), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia). For more information about VPC access, see VPC access solution (leased line connection).
After the network is connected, MaxCompute can access only the VPC specified by the VPC ID. To access another region or another VPC in the same region, you must establish a connection between the specified VPC and the other VPCs.
HBase external tables support access only to HBase 2.x clusters.
HBase external tables do not support EMR HBase clusters.
HBase external tables do not support the cluster property.

Usage notes

When you read data from a source HBase table, the following rules apply: If the source table has fewer columns than the external table, the missing columns are filled with NULL values. If the source table has more columns than the external table, the extra columns are ignored.

Syntax for creating an HBase external table

When you create an external table, you must specify `HBaseStorageHandler` in the DDL statement and configure the connection properties for the ApsaraDB for HBase Standard Edition or Performance-enhanced Edition cluster. Table and column names are not case-sensitive during table creation or querying, and you cannot force case conversion. The following syntax is used to create an external table.

-- Enable Hive compatibility mode.
set odps.sql.hive.compatible = true;
-- If the HBase cluster is a Performance-enhanced Edition, you must also configure the odps.sql.hbase.version parameter to switch the HBase version to lindorm.
set odps.sql.hbase.version=lindorm;
-- Create an HBase external table.
create external table if not exists <mc_hbase_external>
(
  <col1_name> <data_type>,
  <col2_name> <data_type>,
  ......
)
-- The handler for the HBase data source.
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties (
                      'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>')
tblproperties (
-- The hbase.table.name and hbase.zookeeper.quorum parameters are required.
'hbase.table.name'='<hbase_table_name>',
'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>',
-- If the HBase cluster is a Performance-enhanced Edition, the hbase.client.username and hbase.client.password parameters are also required.
'hbase.client.username'='****',
'hbase.client.password'='****',
-- The hbase.zookeeper.property.clientPort, mcfed.zookeeper.session.timeout, and hbase.client.retries.number parameters are optional.
-- For Standard Edition clusters, the default value of hbase.zookeeper.property.clientPort is 2181.
-- For Performance-enhanced Edition clusters, the default value of hbase.zookeeper.property.clientPort is 30020.
['hbase.zookeeper.property.clientPort'='{2181|30020}',]
['mcfed.zookeeper.session.timeout'='<value>',]
['hbase.client.retries.number'='<value>',]
'networklink'='<networklink_name>'
);

mc_hbase_external: Required. The name of the HBase external table to create.
col_name: Required. The name of a column in the HBase external table.
data_type: Required. The data type of the column.
with serdeproperties:
- 'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>': Required. Specifies the column mappings between the HBase external table and the table in the HBase cluster.
  - The RowKey (:key) must be the first item.
  - The column format is col-family:col-name1,col-family:col-name2. Only one column family can be specified per job.
hbase://VPC ip|host:port: Required. The ZooKeeper Master that is configured on HBase. This is the ZooKeeper endpoint for ApsaraDB for HBase Standard Edition or the Java API endpoint for ApsaraDB for HBase Performance-enhanced Edition. You can also specify the IP address, IP address range, and port of the VPC where the associated HBase table is located. Examples: 192.0.2.1-192.0.2.10:9000, 192.0.2.20:70050 or 192.0.2.20/16:2181. You can obtain this information from the ApsaraDB for HBase console. On the Clusters page, click the target cluster name. In the navigation pane on the left, click Database Connection to view the Connection Information.
- For a distributed data source, you must confirm the frontend IP address and port, and the DataNode IP addresses and ports that you need to access. For an HBase external table, you must provide the IP addresses and ports for ZooKeeper (2181 or 30020), HMaster (16000), and RegionServer (16020). The default ZooKeeper port is 2181 for ApsaraDB for HBase Standard Edition and 30020 for ApsaraDB for HBase Performance-enhanced Edition.
- Because this field supports IP address ranges and IP mapping consumes resources, the number of IP addresses mapped at a time is limited to 2,000 by default.
tblproperties:
- 'hbase.table.name'='<hbase_table_name>': Required. The name of the table in the HBase cluster to which you want to connect. You must create this table in advance using an HBase client.
- 'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>': Required. The IP address or domain name of the ZooKeeper for HBase. This is the ZooKeeper endpoint for ApsaraDB for HBase Standard Edition or the Java API endpoint for ApsaraDB for HBase Performance-enhanced Edition.
- hbase.client.username: Required. This parameter is required only for ApsaraDB for HBase Performance-enhanced Edition clusters. You can obtain the username from the ApsaraDB for HBase console. On the Clusters page, click the target cluster name. In the navigation pane on the left, click Database Connection and find the Default Username in the Connection Information section.
- hbase.client.password: Required. This parameter is required only for ApsaraDB for HBase Performance-enhanced Edition clusters. You can obtain the password from the ApsaraDB for HBase console. On the Clusters page, click the target cluster name. In the navigation pane on the left, click Database Connection and find the Default Password in the Connection Information section.
- 'hbase.zookeeper.property.clientPort'='{2181|30020}': Optional. The ZooKeeper port for HBase. The default port is 2181 for ApsaraDB for HBase Standard Edition and 30020 for ApsaraDB for HBase Performance-enhanced Edition.
- 'mcfed.zookeeper.session.timeout'='<value>': Optional. The ZooKeeper session timeout period. The unit is milliseconds. A value of 30000 is recommended.
- 'hbase.client.retries.number'='<value>': Optional. The number of connection retries for the HBase client.
- 'networklink'='<networklink_name>': Required. The MaxCompute network connection for the VPC where the HBase instance is located.
Note
Configuration items that start with mcfed.zookeeper and hbase are related to ZooKeeper configuration.

Usage example

Log on to the MaxCompute client and run the CREATE EXTERNAL TABLE statement to create an HBase external table.

Sample commands:

ApsaraDB for HBase Standard Edition

set odps.sql.hive.compatible = true;
create external table if not exists mf_hbase_test_t_ext
(
  key  int,
  col1 string,
  col2 string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties (
                      'hbase.columns.mapping'=':key,f:col1,f:col2')
tblproperties (
'hbase.table.name'='mf_hbase_test_t',
'hbase.zookeeper.quorum'='hb-2zef****-master1-001.hbase.rds.aliyuncs.com,hb-2zef****-master2-001.hbase.rds.aliyuncs.com,hb-2zef****-master3-001.hbase.rds.aliyuncs.com',
'hbase.zookeeper.property.clientPort'='2181',
'networklink'='networklink4hbase');

ApsaraDB for HBase Performance-enhanced Edition

set odps.sql.external.net.vpc=true;
set odps.sql.hive.compatible = true;
set odps.sql.hbase.version=lindorm;
CREATE EXTERNAL TABLE mf_hbase_test_t_ext
(
  key  int,
  col1 string,
  col2 string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties (
                      'hbase.columns.mapping'=':key,f:col1,f:col2')
tblproperties (
'hbase.table.name'='mf_hbase_test_t',
'hbase.zookeeper.quorum'='hb-2zef****-master1-001.hbase.rds.aliyuncs.com,hb-2zef****-master2-001.hbase.rds.aliyuncs.com,hb-2zef****-master3-001.hbase.rds.aliyuncs.com',
'hbase.zookeeper.property.clientPort'='2181',
'networklink'='networklink4hbase');

Run the select command to view the association result and confirm that the association is successful.

Sample command:

select * from mf_hbase_test_t_ext;

The following result is returned.

+------+------------+------------+
| key  | col1       | col2       |
+------+------------+------------+
| 1    | hello      | NULL       |
+------+------------+------------+

Run the INSERT OVERWRITE statement to write data to the HBase table.
Sample command:
```
INSERT OVERWRITE TABLE mf_hbase_test_t_ext SELECT '2', 'hbase', 'mftestinsert';
```

Run the SELECT statement to verify the written data.

Sample command:

select * from mf_hbase_test_t_ext;

The following result is returned.

+------+------------+--------------------+
| key  | col1       | col2               |
+------+------------+--------------------+
| 1    | hello      | NULL               |
| 2    | hbase      | mftestinsert       |
+------+------------+--------------------+