This topic describes how to use JindoFS as the storage backend of HBase.

Background information

HBase is a real-time database in the Hadoop ecosystem and provides high write performance. In E-MapReduce (EMR), you can use JindoFS or Object Storage Service (OSS) as the storage backend of HBase. JindoFS and OSS are more flexible than HDFS.
Note We recommend that you use EMR clusters of V3.36.0 or later.

Configuration of JindoFS

In the example in this topic, a namespace named emr-jfs is created for an EMR V3.36.0 cluster, and an OSS bucket named oss-bucket is configured for the namespace. The following settings are used; a verification sketch follows the list:
  • jfs.namespaces=emr-jfs
  • jfs.namespaces.emr-jfs.oss.uri=oss://<oss-bucket>/oss-dir
  • jfs.namespaces.emr-jfs.mode=block
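If you want to confirm that the namespace is accessible after the configuration takes effect, you can list its root directory by using the standard Hadoop FileSystem API. The following Java sketch is only an illustration: it assumes that it runs on a cluster node where the JindoFS SDK and the cluster configuration files are on the classpath, and the jfs://emr-jfs/ path matches the example configuration above.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JfsNamespaceCheck {
    public static void main(String[] args) throws Exception {
        // Load the cluster configuration, such as core-site.xml, from the classpath.
        Configuration conf = new Configuration();

        // Obtain a FileSystem handle for the JindoFS namespace configured in this topic.
        FileSystem fs = FileSystem.get(URI.create("jfs://emr-jfs/"), conf);

        // List the namespace root to confirm that the jfs:// scheme resolves correctly.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}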

Specify a storage path for HBase

To specify a storage path for HBase, set the hbase.rootdir parameter in the hbase-site configuration file to a JindoFS or OSS path, and set the hbase.wal.dir parameter in the same file to a local HDFS path. This way, the HDFS service of your EMR cluster is used to store write-ahead logging (WAL) files. Before you release the cluster, you must disable the tables in the cluster and make sure that all updates in the WAL files have been written to HFiles, as shown in the sketch after the following table.

Parameter       Description
hbase.rootdir   The root directory of HBase in JindoFS or OSS. Set this parameter to jfs://emr-jfs/hbase-root-dir.
                Note emr-jfs is the namespace that you configured.
hbase.wal.dir   The local HDFS path in which the WAL files of HBase are stored. Valid values:
                  • For a high-availability (HA) cluster, set this parameter to hdfs://emr-cluster/hbase.
                  • For a non-HA cluster, set this parameter to hdfs://emr-header-1:9000/hbase.
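Before you release the cluster, you can disable the tables from the HBase shell or by using the HBase Admin API. The following Java sketch is one possible approach: it iterates over all user tables, flushes in-memory updates to HFiles, and then disables each table. It assumes that it runs on a cluster node with the default client configuration.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DisableTablesBeforeRelease {
    public static void main(String[] args) throws Exception {
        // Connect by using the hbase-site configuration on the classpath.
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            for (TableName tableName : admin.listTableNames()) {
                if (admin.isTableEnabled(tableName)) {
                    // Flush in-memory updates to HFiles so that no WAL entries remain
                    // unpersisted, then disable the table before the cluster is released.
                    admin.flush(tableName);
                    admin.disableTable(tableName);
                }
            }
        }
    }
}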

Create a cluster

When you create a cluster, turn on Custom Software Settings and add custom software configurations. For more information about how to create a cluster, see Create a cluster.
For example, to use JindoFS as the storage backend of HBase, add the following custom configurations. You must replace oss-bucket and the OSS path with your actual OSS bucket name and OSS path.
[
  {
       "ServiceName":"SMARTDATA",
       "FileName":"namespace",
       "ConfigKey":"jfs.namespaces",
       "ConfigValue":"emr-jfs"
  },
  {
       "ServiceName":"SMARTDATA",
       "FileName":"namespace",
       "ConfigKey":"jfs.namespaces.emr-jfs.oss.uri",
       "ConfigValue":"oss://oss-bucket/jindoFS"
  },
  {
       "ServiceName":"SMARTDATA",
       "FileName":"namespace",
       "ConfigKey":"jfs.namespaces.emr-jfs.mode",
       "ConfigValue":"block"
  },
  {
       "ServiceName":"HBASE",
       "FileName":"hbase-site",
       "ConfigKey":"hbase.rootdir",
       "ConfigValue":"jfs://emr-jfs/hbase-root-dir"
  },
  {
       "ServiceName":"HBASE",
       "FileName":"hbase-site",
       "ConfigKey":"hbase.wal.dir",
       "ConfigValue":"hdfs://emr-cluster/hbase"
  }
]
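After the cluster is created, you can run a quick read/write check to confirm that HBase works against the JindoFS-backed root directory. The following Java sketch assumes that a table named test_table with a column family cf already exists, for example created from the HBase shell; both names are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class JindoFsHBaseSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect by using the hbase-site configuration on the classpath, in which
        // hbase.rootdir points to jfs://emr-jfs/hbase-root-dir and hbase.wal.dir points to HDFS.
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("test_table"))) {
            // Write one row. The update is logged in the WAL on HDFS and is later
            // flushed to HFiles in the JindoFS root directory.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            table.put(put);

            // Read the row back to confirm that the write succeeded.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))));
        }
    }
}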

FAQ

  • Problem description: When TableSnapshotInputFormat is used in a MapReduce program to read HBase data, the following error message is returned:
    java.lang.IllegalArgumentException: Wrong FS: jfs://emr-jfs/tmp/..., expected: hdfs://emr-header-1.cluster-*:9000
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:666)
        at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:875)
  • Cause: An open source MapReduce program that is developed based on HBase has a defect. When the program reads data, it checks whether the HBase data path belongs to the same file system as the one specified by the fs.defaultFS parameter. Because the HBase root directory is on JindoFS (jfs://) while fs.defaultFS points to HDFS, the check fails and the error is returned.
  • Solution:
    • You can use ExportSnapshot to read and export HBase data. For an example, see the sketch after this list.
    • If you use TableSnapshotInputFormat to read HBase data, go to the HDFS service page of your cluster in the Alibaba Cloud EMR console and click the Configure tab. Then, on the core-site tab in the Service Configuration section, change the value of the fs.defaultFS parameter to a root directory whose prefix is jfs. For example, you can change the value to jfs://emr-jfs/, which is the namespace configured in the sample code in this topic.
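The ExportSnapshot tool is usually invoked from the command line as hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot. The following Java sketch drives the same tool through ToolRunner; the snapshot name my_snapshot and the destination path jfs://emr-jfs/hbase-export are placeholders that you must replace with your own values.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

public class ExportSnapshotExample {
    public static void main(String[] args) throws Exception {
        // Export an existing snapshot to a JindoFS directory instead of reading it
        // with TableSnapshotInputFormat. Both the snapshot name and the destination
        // path are placeholders.
        int exitCode = ToolRunner.run(
                HBaseConfiguration.create(),
                new ExportSnapshot(),
                new String[]{
                        "-snapshot", "my_snapshot",
                        "-copy-to", "jfs://emr-jfs/hbase-export"
                });
        System.exit(exitCode);
    }
}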