E-MapReduce (EMR) lets you configure HBase cluster settings at creation time and access HBase through the shell or big data frameworks.
Prerequisites
Before you begin, ensure that you have:
- An EMR cluster with the HBase service added. See Create a cluster.
Configure an HBase cluster
Customize HBase settings when you create a cluster. On the Software Settings step, expand Advanced Settings, turn on Custom Software Settings, and provide your configuration in the following JSON format:
{
  "configurations": [
    {
      "classification": "hbase-site",
      "properties": {
        "hbase.hregion.memstore.flush.size": "268435456",
        "hbase.regionserver.global.memstore.size": "0.5",
        "hbase.regionserver.global.memstore.lowerLimit": "0.45"
      }
    }
  ]
}
The following table lists the default values for common HBase parameters.
| Parameter | Default value | When to adjust |
|---|---|---|
| zookeeper.session.timeout | 180000 | Increase if you see frequent ZooKeeper session timeouts under heavy load. |
| hbase.regionserver.global.memstore.size | 0.35 | Increase (for example, to 0.5) for write-heavy workloads to keep more data in memory before flushing. Keep lowerLimit below this value. |
| hbase.regionserver.global.memstore.lowerLimit | 0.3 | Set to a value lower than hbase.regionserver.global.memstore.size to provide headroom before forced flushes. |
| hbase.hregion.memstore.flush.size | 128 MB | Increase for workloads with short bursts of writes so that all writes stay in memory during the burst and are flushed together, reducing I/O overhead. The value in the JSON example (268435456) equals 256 MB. |
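Before applying custom values, it helps to confirm that they respect the relationships described above. The following Python sketch checks a candidate hbase-site properties map against this page's guidance; the check logic is illustrative, not an HBase API:

```python
# Sanity-check memstore-related hbase-site values before applying them.
# The constraint (lowerLimit below global memstore size) follows the table above.

def check_memstore_settings(props):
    """Return a list of problems found in an hbase-site properties dict."""
    problems = []
    size = float(props.get("hbase.regionserver.global.memstore.size", 0.35))
    lower = float(props.get("hbase.regionserver.global.memstore.lowerLimit", 0.3))
    flush = int(props.get("hbase.hregion.memstore.flush.size", 128 * 1024 * 1024))
    if lower >= size:
        problems.append("lowerLimit must be below global.memstore.size")
    if flush < 1024 * 1024:
        problems.append("flush.size looks too small; it is specified in bytes")
    return problems

# A hypothetical configuration: 256 MB flush size with a 0.5 / 0.45 memstore split.
props = {
    "hbase.hregion.memstore.flush.size": "268435456",   # 256 * 1024 * 1024 bytes
    "hbase.regionserver.global.memstore.size": "0.5",
    "hbase.regionserver.global.memstore.lowerLimit": "0.45",
}
print(check_memstore_settings(props))  # an empty list means the values are consistent
```

An empty result means the values can be pasted into the custom software settings JSON as-is; any reported problem should be fixed first, since an inverted lowerLimit/size pair defeats the flush headroom the parameters are meant to provide.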
Access HBase Shell
- Connect to the master node of your cluster over SSH. See Log on to a cluster.
- Start HBase Shell:

  hbase shell

  A successful start produces output similar to the following. The SLF4J binding warnings are expected and do not affect functionality.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/opt/apps/ecm/service/hbase/1.4.9-1.0.0/package/hbase-1.4.9-1.0.0/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/opt/apps/ecm/service/hadoop/2.8.5-1.5.3/package/hadoop-2.8.5-1.5.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  HBase Shell
  Use "help" to get list of supported commands.
  Use "exit" to quit this interactive shell.
  Version 1.4.9, r8214a16c5d80f077abf1aa01bb312851511a2b15, Thu Jan 31 20:35:22 CST 2019
  hbase(main):001:0>
- Run basic HBase Shell commands to verify the connection and explore your data.

  Create a table

  The following command creates a table named contacts with two column families: personal and office.

  hbase(main):001:0> create 'contacts', 'personal', 'office'

  Write data

  hbase(main):002:0> put 'contacts', '1000', 'personal:name', 'John Dole'
  hbase(main):003:0> put 'contacts', '1000', 'personal:phone', '1-425-000-0001'
  hbase(main):004:0> put 'contacts', '1000', 'office:phone', '1-425-000-0002'
  hbase(main):005:0> put 'contacts', '1000', 'office:address', '1111 San Gabriel Dr.'

  Read data

  hbase(main):006:0> get 'contacts', '1000'

  Scan the table

  hbase(main):007:0> scan 'contacts'

  Delete the table

  hbase(main):008:0> disable 'contacts'
  hbase(main):009:0> drop 'contacts'

  Run help to see all available commands, or exit to quit HBase Shell.
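HBase Shell can also run commands non-interactively from a script file, which is convenient for repeating the walkthrough above without typing each command. A minimal sketch, assuming the same contacts table (the file path is illustrative):

```shell
# Write the walkthrough commands to a file; passing the file to `hbase shell`
# runs the commands in order and then exits.
cat > /tmp/contacts_demo.txt <<'EOF'
create 'contacts', 'personal', 'office'
put 'contacts', '1000', 'personal:name', 'John Dole'
scan 'contacts'
disable 'contacts'
drop 'contacts'
EOF

# On the cluster's master node, run:
#   hbase shell /tmp/contacts_demo.txt
cat /tmp/contacts_demo.txt
```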
Access HBase from big data frameworks
Use Spark to access HBase
Use the spark-hbase-connector library to read from and write to HBase tables in Spark jobs.
Use Hadoop to access HBase
Use MapReduce to process HBase data. See HBase MapReduce examples for complete code samples.
Use Hive to access HBase
- Log in to the master node of the Hive cluster and add the ZooKeeper node's IP address to the hosts file. Replace $zk_ip with the actual IP address of the ZooKeeper node in the HBase cluster.

  $zk_ip emr-cluster

- Follow the Hive HBase Integration guide to complete the integration.
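The hosts entry from the first step is a single line of the form "<ip> emr-cluster". As a sketch (192.0.2.10 is a documentation placeholder, not a real address):

```shell
ZK_IP="192.0.2.10"                 # placeholder; use your HBase cluster's ZooKeeper IP
HOSTS_ENTRY="$ZK_IP emr-cluster"
echo "$HOSTS_ENTRY"

# On the Hive master node, append it to the hosts file as root:
#   echo "$HOSTS_ENTRY" >> /etc/hosts
```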