HBase development manual

Last Updated: Nov 03, 2016

To get the most out of HBase, we recommend the following configurations when you create a cluster:

  • Select the zone where the application servers accessing HBase are located.

  • Select four or more hardware nodes, including the master and core nodes. E-MapReduce will create namenode, datanode, journalnode, hmaster, regionserver and zookeeper roles on these nodes.

  • Select server models with 4 cores and 16 GB, 8 cores and 32 GB, or higher. Lower configurations may make the HBase cluster unstable.

  • Select SSD cloud disks as data disks for better cost effectiveness. For workloads with a large storage volume and infrequent access, you can select basic cloud disks.

  • Configure the data capacity according to your actual demand.

  • HBase clusters support resizing.

HBase configurations

When you create an HBase cluster, you can use Software Configuration on the creation page to modify the default HBase parameters to suit your application scenario, as shown below:

    {
      "configurations": [
        {
          "classification": "hbase-site",
          "properties": {
            "hbase.hregion.memstore.flush.size": "268435456",
            "hbase.regionserver.global.memstore.size": "0.5",
            "hbase.regionserver.global.memstore.lowerLimit": "0.6"
          }
        }
      ]
    }

Some default configurations of the HBase cluster are as follows:

    Key                                              Value
    zookeeper.session.timeout                        180000
    hbase.regionserver.global.memstore.size          0.35
    hbase.regionserver.global.memstore.lowerLimit    0.3
    hbase.hregion.memstore.flush.size                128MB
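
To make these values concrete, the following is a minimal sketch of the arithmetic behind them. It relies on hbase.regionserver.global.memstore.size being a fraction of the region server heap. The 8 GiB heap is a hypothetical figure chosen only for illustration, and the class and method names are not part of HBase.

```java
public class MemstoreSizing {
    static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / MIB;
    }

    /** Heap bytes that all memstores on one region server may use together. */
    public static long globalMemstoreBytes(long heapBytes, double globalMemstoreSize) {
        return (long) (heapBytes * globalMemstoreSize);
    }

    public static void main(String[] args) {
        long heapBytes = 8L * 1024 * MIB; // hypothetical 8 GiB region server heap
        long flushSize = 268435456L;      // hbase.hregion.memstore.flush.size from the sample
        double globalSize = 0.5;          // hbase.regionserver.global.memstore.size from the sample

        long budget = globalMemstoreBytes(heapBytes, globalSize);
        System.out.println("flush size: " + toMiB(flushSize) + " MiB");
        System.out.println("global memstore budget: " + toMiB(budget) + " MiB");
        System.out.println("full memstores that fit: " + budget / flushSize);
    }
}
```

The sample flush size of 268435456 bytes is 256 MiB, twice the 128MB default, so fewer full memstores fit in the same budget unless the global fraction is raised as well.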

HBase access

Note:

  • For better network performance, we recommend that you access HBase clusters created through E-MapReduce from an ECS instance in the same zone.

  • The ECS instance accessing an HBase cluster must be in the same security group as the HBase cluster; otherwise, access may fail. When you create a Hadoop/Spark/Hive cluster in E-MapReduce to access HBase, make sure that you select the same security group as the HBase cluster.
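
When access fails, a quick TCP probe can help tell a network or security group problem from an HBase one. The sketch below tries to open a connection to each ZK address passed on the command line. It assumes ZooKeeper listens on its default client port, 2181; the class name and the 3-second timeout are illustrative choices, not values from this manual.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkProbe {
    /** Returns true if a TCP connection to host:port opens within timeoutMs. */
    public static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        int zkPort = 2181; // assumed: ZooKeeper's default client port
        // Pass the ZK intranet addresses of the cluster as arguments.
        for (String host : args) {
            System.out.println(host + ":" + zkPort
                    + " reachable=" + reachable(host, zkPort, 3000));
        }
    }
}
```

A result of reachable=false from an ECS instance only shows that the port could not be opened from that instance; checking the security group is a reasonable first step.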

After you create the HBase cluster through the E-MapReduce console, you can start to use the HBase service. The procedure is as follows:

  1. Get the master IP address and cluster ZK address. On the cluster details page of the E-MapReduce console, you can view the IP address of the master node of the cluster and ZK access address (intranet IP address), as shown in the figure below:

    [Figure: hbasedevelop]

    For master nodes with a public IP address enabled, see How to Log in to Master Node to view the web UI of HMaster (localhost:16010).

  2. Connect to the master node of the cluster through SSH to use HBase Shell. After connecting, switch to the HDFS user and access the cluster through HBase Shell. For more about HBase Shell, see Apache HBase Official Website.

    [root@emr-header-1 ~]# su hdfs
    [hadoop@emr-header-1 root]$ hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.1.1, r374488, Fri Aug 21 09:18:22 CST 2015
    hbase(main):001:0>

  3. Access the cluster using HBase Shell from another ECS node (within the same security group). Download an HBase-1.x package from the official website of Apache HBase (Download link). After unzipping the package, modify conf/hbase-site.xml and add the ZK addresses of the cluster, as shown below:

    <configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>$ZK_IP1,$ZK_IP2,$ZK_IP3</value>
      </property>
    </configuration>

    Then you can access the cluster with the bin/hbase shell command.

    If the ECS was created through E-MapReduce, you only need to modify /etc/emr/hbase-conf/hbase-site.xml without downloading HBase-1.x packages.

  4. Access the HBase cluster through the Java API. First, add the Maven dependency:

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.1.1</version>
    </dependency>

    Configure the correct ZK address to connect to the cluster.

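    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    Configuration config = HBaseConfiguration.create();
    // Replace the placeholders with the ZK addresses of the cluster
    config.set(HConstants.ZOOKEEPER_QUORUM, "$ZK_IP1,$ZK_IP2,$ZK_IP3");
    Connection connection = ConnectionFactory.createConnection(config);
    try {
        Table table = connection.getTable(TableName.valueOf("myLittleHBaseTable"));
        try {
            // Do table operation
        } finally {
            if (table != null) table.close();
        }
    } finally {
        // Always release the connection, even if a table operation fails
        connection.close();
    }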
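
Before starting a client, it can be useful to confirm that an hbase-site.xml file actually carries the ZK addresses added in step 3. The following is a rough sketch that reads one property from the file content using only the JDK XML parser. The class name, method name, and the 10.0.0.x addresses are hypothetical, and the sketch assumes every property element has both a name and a value child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlReader {
    /** Returns the value of the named property, or null if it is absent. */
    public static String property(String siteXml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String key = prop.getElementsByTagName("name").item(0).getTextContent().trim();
            if (key.equals(name)) {
                return prop.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file content; in practice, read conf/hbase-site.xml instead.
        String siteXml = "<configuration><property>"
                + "<name>hbase.zookeeper.quorum</name>"
                + "<value>10.0.0.1,10.0.0.2,10.0.0.3</value>"
                + "</property></configuration>";
        String quorum = property(siteXml, "hbase.zookeeper.quorum");
        // Print one ZK address per line.
        for (String host : quorum.split(",")) {
            System.out.println(host);
        }
    }
}
```

When hbase-site.xml is on the classpath, HBaseConfiguration.create() loads it, so a check like this is only a convenience for diagnosing a missing or misspelled quorum entry.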

For more about development, see Apache HBase Official Website.

Example

Prerequisites

The ECS instance accessing the HBase cluster must be in the same security group as the HBase cluster.

Spark access to HBase

See spark-hbase-connector.

Hadoop access to HBase

See HBase MapReduce examples.

Hive access to HBase

Only Hive in clusters running E-MapReduce 1.2.0 or later can access the HBase cluster. The steps are as follows:

  1. Log in to the Hive cluster, modify the hosts file, and add the line below:

    $zk_ip emr-cluster    # $zk_ip is the IP address of a ZK node of the HBase cluster

  2. For specific Hive operations, see Hive HBase Integration.
