Configure and recover JindoFS metadata with Raft-RocksDB-Tablestore - E-MapReduce

Available from EMR V3.27.0, Raft-RocksDB-Tablestore provides high-availability metadata storage for JindoFS Namespace Service. A three-node Raft cluster distributes metadata across three master nodes using local RocksDB. Optionally, EMR asynchronously replicates metadata to a Tablestore instance, creating an off-cluster backup that survives cluster termination and enables recovery to a new cluster.

How it works

A single Raft instance runs across three master nodes. Each node uses local RocksDB to persist metadata. The Raft protocol synchronizes writes across all three nodes, so a single node failure does not cause data loss.

When Tablestore is configured as the remote backend, EMR asynchronously uploads metadata from local RocksDB to Tablestore in real time. This creates a complete off-cluster replica.

Prerequisites

Before you begin, ensure that you have:

A Tablestore high-performance instance with the transaction feature enabled. For instructions, see Create instances.
An E-MapReduce (EMR) cluster with three master nodes. For instructions, see Create a cluster.

Configure a Raft instance as the local storage backend

Log on to the Alibaba Cloud EMR console.
In the top navigation bar, select the region where your cluster resides and select a resource group.
Click the Cluster Management tab.
Find your cluster and click Details in the Actions column.
In the left-side navigation pane, choose Cluster Service > SmartData.
From the Actions drop-down list in the upper-right corner, select Stop All Components.
Configure Namespace Service parameters. For details, see Use JindoFS in block storage mode.
Go to the namespace configuration tab:
1. In the left-side navigation pane, choose Cluster Service > SmartData.
2. Click the Configure tab.
3. In the Service Configuration section, click the namespace tab.

Configure the following parameters:

Parameter	Description	Required value
`namespace.backend.type`	Backend storage type for Namespace Service. Set to `raft`. Valid values: `rocksdb`, `ots`, `raft`. Default: `rocksdb`.	`raft`
`namespace.backend.raft.initial-conf`	Addresses of the three master nodes where the Raft instance runs. This value is fixed.	`emr-header-1:8103:0,emr-header-2:8103:0,emr-header-3:8103:0`
`jfs.namespace.server.rpc-address`	Addresses of the three master nodes for client access to the Raft instance. This value is fixed.	`emr-header-1:8101,emr-header-2:8101,emr-header-3:8101`

(Optional) Configure Tablestore as the remote asynchronous backup. Without Tablestore, metadata exists only in local RocksDB across the three master nodes. If all three nodes are lost or the cluster is terminated, metadata cannot be recovered. Configure Tablestore to create an off-cluster replica that enables recovery to a new cluster. Configure the following parameters on the namespace tab:

Warning

Set namespace.backend.raft.async.ots.enabled to true before SmartData service initialization completes. Once metadata has been written to local RocksDB, this setting has no effect.

Parameter	Description	Required value
`namespace.ots.instance`	Name of the Tablestore instance.	`emr-jfs` (example)
`namespace.ots.accessKey`	AccessKey ID used to access the Tablestore instance.	Your AccessKey ID
`namespace.ots.accessSecret`	AccessKey secret used to access the Tablestore instance.	Your AccessKey secret
`namespace.ots.endpoint`	Endpoint of the Tablestore instance. Use a VPC endpoint for lower latency.	`http://emr-jfs.cn-hangzhou.vpc.tablestore.aliyuncs.com` (example)
`namespace.backend.raft.async.ots.enabled`	Enable asynchronous upload to Tablestore. Set to `true`.	`true`

Save the configuration:
1. In the upper-right corner of the Service Configuration section, click Save.
2. In the Confirm Changes dialog box, enter a description and turn on Auto-update Configuration.
3. Click OK.
From the Actions drop-down list in the upper-right corner, select Start All Components.

Recover metadata from a Tablestore instance

When Tablestore is configured as the remote backend, a complete replica of JindoFS metadata is stored in Tablestore. After stopping or releasing the original EMR cluster, you can recover the metadata to a new cluster and regain access to the original files.

Step 1: Prepare the original cluster for shutdown

(Optional) Record the current file and folder counts for later verification:

hadoop fs -count jfs://test/

Example output:

      1596      1482809                 25 jfs://test/
(folders)      (files)

Stop all jobs running on the original cluster.
Wait 30–120 seconds for EMR to sync all metadata to Tablestore, then verify sync status:
```
jindo jfs -metaStatus -detail
```
The sync is complete when the LEADER node shows _synced=1 in the output:
```
...
LEADER   emr-header-1:8103   _synced=1
FOLLOWER emr-header-2:8103   ...
FOLLOWER emr-header-3:8103   ...
```
Stop or release the original EMR cluster. Make sure no clusters are accessing the Tablestore instance before proceeding.

Step 2: Create a new cluster and configure recovery mode

Create an EMR cluster in the same region as the Tablestore instance, then stop all SmartData components. Follow the steps in Configure a Raft instance as the local storage backend.

On the namespace tab, configure the following parameters:

Parameter	Description	Required value
`namespace.backend.raft.async.ots.enabled`	Enable asynchronous upload to Tablestore. Set to `false` during recovery.	`false`
`namespace.backend.raft.recovery.mode`	Enable metadata recovery from Tablestore. Set to `true`.	`true`

Save the configuration:
1. In the upper-right corner of the Service Configuration section, click Save.
2. In the Confirm Changes dialog box, enter a description and turn on Auto-update Configuration.
3. Click OK.
From the Actions drop-down list in the upper-right corner, select Start All Components.

Step 3: Monitor recovery progress

After SmartData starts, EMR automatically pulls metadata from Tablestore into local Raft-RocksDB. Track progress with:

jindo jfs -metaStatus -detail

Recovery is complete when the LEADER node state is FINISH:

...
LEADER   emr-header-1:8103   state=FINISH
FOLLOWER emr-header-2:8103   ...
FOLLOWER emr-header-3:8103   ...

During recovery, the cluster is read-only. Write operations return:

java.io.IOException: ErrorCode : 25021 , ErrorMsg: Namespace is under recovery mode, and is read-only.

Step 4: Verify data integrity

While the cluster is still in read-only mode, verify that file and folder counts match the original cluster:

hadoop fs -count jfs://test/

Expected output (matches the counts recorded in step 1):

      1596      1482809                 25 jfs://test/

Read files to confirm data is accessible:

hadoop fs -cat jfs://test/testfile
hadoop fs -ls jfs://test/

Step 5: Exit recovery mode and resume normal operation

On the namespace tab, update the following parameters:

Parameter	Description	Required value
`namespace.backend.raft.async.ots.enabled`	Re-enable asynchronous upload to Tablestore.	`true`
`namespace.backend.raft.recovery.mode`	Disable recovery mode.	`false`

Save the configuration using the same procedure as before.
Restart the cluster:
1. Click the Cluster Management tab.
2. Find the cluster. In the Actions column, click More and select Restart.