HBase is a column-oriented distributed storage system that provides high reliability, high performance, and high scalability. This topic describes how to use HBase in E-MapReduce (EMR).

Background information

The following figure shows the architecture of HBase in EMR.

[Figure: Architecture of HBase in EMR]

You can use JindoFS or Object Storage Service (OSS) as the storage backend of HBase.

HBase has the following benefits:
  • Processes terabytes or even petabytes of data.
  • Provides high throughput.
  • Supports efficient random reads of large amounts of data.
  • Provides high scalability.
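
The random-read benefit listed above follows from the fact that HBase keeps rows sorted by row key, so a single row can be located without scanning the whole table. The following Python sketch is only a simplified illustration of that idea and is not the HBase API or its actual implementation. The class name SortedRowStore, the sample row keys, and the "family:qualifier" column names are hypothetical.

```python
from bisect import bisect_left


class SortedRowStore:
    """Toy model of a table whose rows are kept sorted by row key."""

    def __init__(self, rows):
        # rows: dict mapping row key -> {"family:qualifier": value}
        self._keys = sorted(rows)
        self._values = [rows[key] for key in self._keys]

    def get(self, row_key):
        """Random read of one row by binary search over the sorted keys."""
        i = bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._values[i]
        return None

    def scan(self, start_key, stop_key):
        """Range read: rows with start_key <= key < stop_key, in key order."""
        lo = bisect_left(self._keys, start_key)
        hi = bisect_left(self._keys, stop_key)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


# Hypothetical sample data: row keys and column names are made up.
store = SortedRowStore({
    "user#0003": {"info:name": "carol"},
    "user#0001": {"info:name": "alice", "info:city": "Hangzhou"},
    "user#0002": {"info:name": "bob"},
})

print(store.get("user#0002"))
print(store.scan("user#0001", "user#0003"))
```

In this sketch, a lookup costs a binary search instead of a full scan, and a range read returns neighboring row keys together. A real HBase cluster additionally splits the sorted key space into regions that are served by RegionServers.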

HBase consists of the following components:
  • HMaster: deployed on the master node of an EMR cluster.
    Note: In a high-availability (HA) EMR cluster, two HMasters are deployed.
  • RegionServer: deployed on the core and task nodes of an EMR cluster.

For more information about Apache HBase, visit the Apache HBase official website.

Use HBase