Use HBase snapshots to back up your E-MapReduce (EMR) HBase cluster data and restore it to another cluster via Object Storage Service (OSS).
How it works
HBase snapshots capture a point-in-time view of a table without copying data, so the operation completes almost instantly and has minimal impact on cluster performance. The snapshot references the underlying HFiles, and as long as the snapshot exists, those files are preserved even if the original data is later deleted.
To move a snapshot between clusters, you export it to an OSS bucket as an intermediate store. The destination cluster then imports the snapshot from OSS and restores the data.
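The steps in the rest of this topic can be summarized as a four-command pipeline. The sketch below only prints the commands so you can review them before running each one on the appropriate cluster; the OSS URI keeps the same `$accessKeyId`, `$accessKeySecret`, and `$bucket` placeholders used later in this topic, which you must replace with your own values.

```shell
# Placeholders -- substitute your own table, snapshot name, and OSS details.
TABLE=test
SNAPSHOT=test_snapshot
OSS_DIR='oss://$accessKeyId:$accessKeySecret@$bucket.oss-cn-hangzhou-internal.aliyuncs.com/hbase/snapshot/test'

# 1. Source cluster: snapshot the table (metadata only, near-instant).
echo "hbase snapshot create -n $SNAPSHOT -t $TABLE"

# 2. Source cluster: copy the snapshot metadata and HFiles to OSS.
echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $SNAPSHOT -copy-to $OSS_DIR"

# 3. Destination cluster: pull the snapshot from OSS into local HDFS.
echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $SNAPSHOT -copy-from $OSS_DIR -copy-to /hbase/"

# 4. Destination cluster, in HBase Shell: materialize the table.
echo "restore_snapshot '$SNAPSHOT'"
```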
Prerequisites
Before you begin, ensure that you have:
Two Hadoop clusters with the HBase and ZooKeeper services installed. For setup instructions, see Create a cluster.
SSH access to the master node of each cluster. For instructions, see Connect to the master node of an EMR cluster in SSH mode.
An OSS bucket accessible from both clusters via the internal endpoint.
Back up and restore an HBase cluster
Step 1: Prepare test data
Log on to the master node of the source cluster using SSH.
Open HBase Shell.
hbase shell
Create a table.
create 'test','cf'
Add data to the table.
put 'test','a','cf:c1',1
put 'test','a','cf:c2',2
put 'test','b','cf:c1',3
put 'test','b','cf:c2',4
put 'test','c','cf:c1',5
put 'test','c','cf:c2',6
Exit HBase Shell.
exit
Step 2: Create a snapshot
Create a snapshot of the table.
hbase snapshot create -n test_snapshot -t test
Open HBase Shell to verify that the snapshot was created.
hbase shell
List snapshots.
list_snapshots
The output is similar to the following:
SNAPSHOT                     TABLE + CREATION TIME
 test_snapshot               test (Tue Aug 18 14:35:28 +0800 2020)
1 row(s) in 0.2450 seconds

=> ["test_snapshot"]
Exit HBase Shell.
exit
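A completed snapshot is stored as a small set of manifest files under the `.hbase-snapshot` directory of the HBase root directory. Assuming the default root of `/hbase` (verify `hbase.rootdir` in `hbase-site.xml` if your cluster differs), the sketch below assembles and prints the `hdfs dfs -ls` command you can run on the source cluster to inspect it.

```shell
# Assumption: hbase.rootdir is /hbase (the default); check hbase-site.xml if unsure.
HBASE_ROOT=/hbase
SNAPSHOT=test_snapshot
MANIFEST_DIR="$HBASE_ROOT/.hbase-snapshot/$SNAPSHOT"

# Run the printed command on the source cluster to list the snapshot manifest files.
echo "hdfs dfs -ls $MANIFEST_DIR"
```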
Step 3: Export the snapshot to OSS
Export the snapshot to your OSS bucket using the internal endpoint.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_snapshot -copy-to oss://$accessKeyId:$accessKeySecret@$bucket.oss-cn-hangzhou-internal.aliyuncs.com/hbase/snapshot/test
Step 4: Import the snapshot to the destination cluster
Log on to the master node of the destination cluster using SSH.
Import the snapshot from OSS to the local HDFS.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_snapshot -copy-from oss://$accessKeyId:$accessKeySecret@$bucket.oss-cn-hangzhou-internal.aliyuncs.com/hbase/snapshot/test -copy-to /hbase/
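ExportSnapshot runs as a MapReduce job, so the copy in either direction can be tuned with its standard `-mappers` (number of parallel copy tasks) and `-bandwidth` (throughput cap per task, in MB/s) options. The values in the sketch below are illustrative only, and the command is printed rather than executed.

```shell
# Illustrative tuning values -- adjust to your cluster size and network capacity.
MAPPERS=8        # number of parallel copy tasks
BANDWIDTH=50     # per-task throughput cap, in MB/s
OSS_DIR='oss://$accessKeyId:$accessKeySecret@$bucket.oss-cn-hangzhou-internal.aliyuncs.com/hbase/snapshot/test'

echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_snapshot -copy-from $OSS_DIR -copy-to /hbase/ -mappers $MAPPERS -bandwidth $BANDWIDTH"
```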
Step 5: Restore data from the snapshot
Open HBase Shell on the destination cluster.
hbase shell
Restore the table from the snapshot. If a table named test already exists on the destination cluster, run disable 'test' first, because restore_snapshot requires the target table to be disabled.
restore_snapshot 'test_snapshot'
Verify the restored data.
scan 'test'
The output is similar to the following:
ROW                COLUMN+CELL
 a                 column=cf:c1, timestamp=1472992081375, value=1
 a                 column=cf:c2, timestamp=1472992090434, value=2
 b                 column=cf:c1, timestamp=1472992104339, value=3
 b                 column=cf:c2, timestamp=1472992099611, value=4
 c                 column=cf:c1, timestamp=1472992112657, value=5
 c                 column=cf:c2, timestamp=1472992118964, value=6
3 row(s) in 0.0540 seconds
Step 6: Clone a new table from the snapshot
Use clone_snapshot to create a new, independently writable table from the snapshot. Because the new table initially references the same underlying HFiles, no data is copied up front.
Clone the snapshot into a new table.
clone_snapshot 'test_snapshot','test_2'
Verify the data in the new table.
scan 'test_2'
The output is similar to the following:
ROW                COLUMN+CELL
 a                 column=cf:c1, timestamp=1472992081375, value=1
 a                 column=cf:c2, timestamp=1472992090434, value=2
 b                 column=cf:c1, timestamp=1472992104339, value=3
 b                 column=cf:c2, timestamp=1472992099611, value=4
 c                 column=cf:c1, timestamp=1472992112657, value=5
 c                 column=cf:c2, timestamp=1472992118964, value=6
3 row(s) in 0.0540 seconds
When you no longer need the snapshot, delete it in HBase Shell by running delete_snapshot 'test_snapshot', which releases the HFiles that the snapshot pins.