OSS-HDFS (JindoFS service) is compatible with the Hadoop Distributed File System (HDFS) API, which lets HBase on an E-MapReduce (EMR) cluster store both data files and write-ahead log (WAL) files in OSS instead of on local disks. This decouples storage from computing, so you can:
Keep HBase data persistent outside the cluster, so you can release the cluster without losing data.
Size the cluster for compute requirements rather than storage requirements.
Store WAL files in OSS-HDFS so that a new cluster pointed at the same root directory can recover in-flight writes.
Prerequisites
Before you begin, ensure that you have:
An EMR cluster running EMR V3.42.0 or later, or EMR V5.8.0 or later. See Create a cluster.
OSS-HDFS enabled for an OSS bucket with access permissions granted. See Enable OSS-HDFS and grant access permissions.
Configure HBase to use OSS-HDFS
Step 1: Connect to the EMR cluster
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
Click the EMR cluster that you created.
Click the Nodes tab, then click the plus icon on the left side of the node group.
Click the ID of the ECS instance. On the Instances page, click Connect next to the instance ID.
For other connection methods (SSH key pair or SSH password on Windows or Linux), see Log on to a cluster.
Step 2: Set the HBase root directory to OSS-HDFS
In the hbase-site configuration file, set hbase.rootdir to the OSS-HDFS path of your bucket:
hbase.rootdir = oss://<bucket-name>.<endpoint>/<hbase-root-dir>
Replace the placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| <bucket-name> | Name of your OSS bucket with OSS-HDFS enabled | my-hbase-bucket |
| <endpoint> | OSS-HDFS endpoint for your bucket's region | - |
| <hbase-root-dir> | Path within the bucket for the HBase root directory | hbase-root |
After the change, HBase writes WAL files to OSS-HDFS.
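As a minimal sketch of how the placeholders combine, the following Python helper assembles the hbase.rootdir value and renders it as a standard Hadoop-style property entry. The function names `build_rootdir` and `as_property_xml` are illustrative, and the endpoint `oss-hdfs.example.invalid` is a made-up stand-in, not a real OSS-HDFS endpoint. Substitute the endpoint for your bucket's region.

```python
from xml.sax.saxutils import escape


def build_rootdir(bucket: str, endpoint: str, root_dir: str) -> str:
    """Compose oss://<bucket-name>.<endpoint>/<hbase-root-dir>."""
    root_dir = root_dir.strip("/")
    for name, value in (("bucket", bucket), ("endpoint", endpoint), ("root_dir", root_dir)):
        # Reject empty values and values that still contain <...> placeholders.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{name} looks unfilled: {value!r}")
    return f"oss://{bucket}.{endpoint}/{root_dir}"


def as_property_xml(name: str, value: str) -> str:
    """Render one property in the usual hbase-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with your region's OSS-HDFS endpoint.
    rootdir = build_rootdir("my-hbase-bucket", "oss-hdfs.example.invalid", "hbase-root")
    print(as_property_xml("hbase.rootdir", rootdir))
```

Checking for leftover angle brackets catches the common mistake of pasting the template path without replacing a placeholder.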
Before you release the cluster, disable all HBase tables and confirm that every update recorded in the WAL files has been flushed to HFiles. Otherwise, data can be lost.
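One way to prepare for release, sketched here under assumptions, is to generate a short HBase shell script that flushes and then disables each table. The helper name `pre_release_commands` and the table names are hypothetical. `flush` and `disable` are standard HBase shell commands, but the flush-then-disable order is an assumption of this sketch rather than a documented EMR procedure.

```python
def pre_release_commands(tables: list[str]) -> str:
    """Return HBase shell commands that flush, then disable, each table."""
    lines = []
    for table in tables:
        # Guard against names that would break the single-quoted shell argument.
        if not table or "'" in table:
            raise ValueError(f"unsupported table name: {table!r}")
        lines.append(f"flush '{table}'")    # write in-memory edits to HFiles
        lines.append(f"disable '{table}'")  # take the table offline
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table names; list your own tables here.
    print(pre_release_commands(["orders", "users"]))
```

The printed commands could be saved to a file and passed to the HBase shell, or pasted into an interactive session. Verify that each table reports as disabled before you release the cluster.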