Namespace Service of JindoFS supports different metadata storage backends. By default, RocksDB is used. This topic describes how to configure RocksDB as the metadata storage backend.

Configure RocksDB as the metadata storage backend
- Go to the SmartData service.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where your cluster resides. Select the resource group as required. By default, all resources of the account appear.
- Click the Cluster Management tab.
- On the Cluster Management page that appears, find the target cluster and click Details in the Actions column.
- In the left-side navigation pane, click Cluster Service and then SmartData.
- Go to the namespace tab for the SmartData service.
- Click the Configure tab.
- Click the namespace tab in the Service Configuration section.
- Set namespace.backend.type to rocksdb.
- Save the configurations.
- In the upper-right corner of the Service Configuration section, click Save.
- In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
- Click OK.
- Select Restart Jindo Namespace Service from the Actions drop-down list in the upper-right corner.
- Optional: Configure a Tablestore instance as the remote asynchronous storage backend.
You can bind an EMR cluster to a Tablestore instance and use the Tablestore instance as an additional storage medium for Namespace Service of JindoFS. EMR asynchronously uploads metadata from the local RocksDB to the Tablestore instance in real time.
Configure the parameters described in the following table on the namespace tab.

| Parameter | Description | Example |
| --- | --- | --- |
| namespace.ots.instance | The name of the Tablestore instance. | emr-jfs |
| namespace.ots.accessKey | The AccessKey ID that is used to access the Tablestore instance. | kkkkkk |
| namespace.ots.accessSecret | The AccessKey secret that is used to access the Tablestore instance. | XXXXXX |
| namespace.ots.endpoint | The endpoint of the Tablestore instance. We recommend that you use a VPC endpoint. | http://emr-jfs.cn-hangzhou.vpc.tablestore.aliyuncs.com |
| namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. Set this parameter to true. Make sure that the initialization of the SmartData service is not completed. | true |

Note: If the initialization is completed, the setting does not take effect because metadata has been generated in the local RocksDB.
- Save the configurations.
- In the upper-right corner of the Service Configuration section, click Save.
- In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
- Click OK.
- Select Start All Components from the Actions drop-down list in the upper-right corner.
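Because a VPC endpoint is recommended for namespace.ots.endpoint, it can help to check the value before you save it. The following is a minimal sketch, not part of JindoFS or EMR: the function name is_vpc_endpoint is hypothetical, and the check assumes that VPC endpoints end in .vpc.tablestore.aliyuncs.com, which is inferred only from the example value in the table.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with JindoFS or EMR).
# Assumption: a VPC endpoint ends in ".vpc.tablestore.aliyuncs.com",
# as in the example value for namespace.ots.endpoint.
is_vpc_endpoint() {
  case "$1" in
    http://*.vpc.tablestore.aliyuncs.com | https://*.vpc.tablestore.aliyuncs.com)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example value taken from the parameter table.
endpoint="http://emr-jfs.cn-hangzhou.vpc.tablestore.aliyuncs.com"
if is_vpc_endpoint "$endpoint"; then
  echo "VPC endpoint: $endpoint"
else
  echo "Not a VPC endpoint: $endpoint" >&2
fi
```

The check is purely textual. It does not verify that the instance exists or that the cluster can reach it.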
Recover metadata from a Tablestore instance
If you have configured a Tablestore instance as the remote asynchronous storage backend for your EMR cluster, a complete replica of JindoFS metadata is stored in the Tablestore instance. After you stop or release the EMR cluster, you can recover JindoFS metadata from the Tablestore instance to a new EMR cluster. This way, you can access the original files from the new EMR cluster.
- Prepare for the recovery.
- Optional: Collect the metadata statistics of the original EMR cluster. The metadata statistics indicate the numbers of files and folders.
    hadoop fs -count jfs://test/
Information similar to the following example is returned:
    1596 1482809 25 jfs://test/
The number of folders is 1596, and the number of files is 1,482,809.
- Stop the jobs that are running on the original EMR cluster. You may need to wait for 30 to 120 seconds for EMR to synchronize all metadata of the cluster to the Tablestore instance. Run the following command to view the metadata status:
    jindo jfs -metaStatus
If the command output contains _synced=1, the latest metadata is synchronized to the Tablestore instance.
[Figure: sample output of jindo jfs -metaStatus]
- Stop or release the original EMR cluster and make sure that no clusters are accessing the Tablestore instance.
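The wait for _synced=1 described above can be scripted instead of checked by hand. The following is a minimal sketch under stated assumptions: wait_for_pattern is a hypothetical helper, not a JindoFS command, and it relies only on the fact that the output of jindo jfs -metaStatus contains the string _synced=1 after synchronization is complete.

```shell
#!/bin/sh
# wait_for_pattern PATTERN ATTEMPTS INTERVAL COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until its output contains PATTERN
# (matched as a fixed string), and gives up after ATTEMPTS tries.
wait_for_pattern() {
  local pattern="$1"
  local attempts="$2"
  local interval="$3"
  shift 3
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" 2>/dev/null | grep -qF -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage on a node of the original cluster. The 24 attempts and the
# 5-second interval are arbitrary choices that cover the 120-second
# upper bound mentioned above:
#   wait_for_pattern "_synced=1" 24 5 jindo jfs -metaStatus \
#     && echo "metadata is synchronized"
```

If the function returns a non-zero status, do not release the cluster. Run jindo jfs -metaStatus manually and inspect the output first.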
- Create an EMR cluster.
The cluster must reside in the same region as the Tablestore instance. Suspend all SmartData components.
- Configure parameters for metadata recovery.
Configure the parameters described in the following table on the namespace tab.

| Parameter | Description | Required value |
| --- | --- | --- |
| namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. | false |
| namespace.backend.rocksdb.recovery.mode | Specifies whether to enable metadata recovery from Tablestore. Valid values: true and false. | true |

- Save the configurations.
- In the upper-right corner of the Service Configuration section, click Save.
- In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
- Click OK.
- Select Start All Components from the Actions drop-down list in the upper-right corner.
- After the SmartData service of the new EMR cluster is activated, EMR automatically recovers metadata from the Tablestore instance to the local Raft-RocksDB. You can run the following command to view the recovery progress:
    jindo jfs -metaStatus
If the state is FINISH, the recovery is completed, as shown in the following figure.
[Figure: sample output of jindo jfs -metaStatus showing the FINISH state]
- Optional: Check whether the numbers of files and folders of the new EMR cluster are consistent with those of the original EMR cluster.
The new EMR cluster is in recovery mode and is read-only.
    # Count the numbers of files and folders for the new EMR cluster. The results are consistent with those of the original EMR cluster.
    [hadoop@emr-header-1 ~]$ hadoop fs -count jfs://test/
    1596 1482809 25 jfs://test/
    # Use the CAT or GET command to read data from the files.
    [hadoop@emr-header-1 ~]$ hadoop fs -cat jfs://test/testfile
    this is a test file
    # View file directories.
    [hadoop@emr-header-1 ~]$ hadoop fs -ls jfs://test/
    Found 3 items
    drwxrwxr-x - root root 0 2020-03-25 14:54 jfs://test/emr-header-1.cluster-50087
    -rw-r----- 1 hadoop hadoop 5 2020-03-25 14:50 jfs://test/haha-12096RANDOM.txt
    -rw-r----- 1 hadoop hadoop 20 2020-03-25 15:07 jfs://test/testfile
    # Remove a file. A read-only error is returned.
    [hadoop@emr-header-1 ~]$ hadoop fs -rm jfs://test/testfile
    java.io.IOException: ErrorCode : 25021 , ErrorMsg: Namespace is under recovery mode, and is read-only.
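The consistency check can also be done with a small script if you saved the hadoop fs -count output of the original cluster before you released it. The following is a minimal sketch: counts_match is a hypothetical helper, and it assumes the standard column order of hadoop fs -count output (folder count, file count, content size, path).

```shell
#!/bin/sh
# Hypothetical helper: compares two lines of `hadoop fs -count` output
# (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) and succeeds only if the
# folder counts and the file counts are identical.
counts_match() {
  local old_dirs old_files new_dirs new_files rest
  read -r old_dirs old_files rest <<EOF
$1
EOF
  read -r new_dirs new_files rest <<EOF
$2
EOF
  [ "$old_dirs" = "$new_dirs" ] && [ "$old_files" = "$new_files" ]
}

# Sample values taken from the output shown above. On a real cluster,
# capture them with: recovered=$(hadoop fs -count jfs://test/)
original="1596 1482809 25 jfs://test/"
recovered="1596 1482809 25 jfs://test/"
if counts_match "$original" "$recovered"; then
  echo "folder and file counts match"
else
  echo "folder or file counts differ" >&2
fi
```

Only the first two columns are compared, so a difference in content size alone is not reported.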
- Enable asynchronous upload to Tablestore and disable recovery from Tablestore.
Configure the parameters described in the following table on the namespace tab.

| Parameter | Description | Required value |
| --- | --- | --- |
| namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. | true |
| namespace.backend.rocksdb.recovery.mode | Specifies whether to enable metadata recovery from Tablestore. Valid values: true and false. | false |

- Restart the new EMR cluster.
- Click the Cluster Management tab.
- On the Cluster Management page, find the cluster. In the Actions column of this cluster, click More and select Restart.
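The recovery procedure uses two combinations of namespace.backend.rocksdb.async.ots.enabled and namespace.backend.rocksdb.recovery.mode, and the two values are easy to mix up. The following is a minimal sketch of a sanity check: phase_for_settings and the phase names it prints are hypothetical and are not part of JindoFS.

```shell
#!/bin/sh
# phase_for_settings ASYNC_OTS_ENABLED RECOVERY_MODE
# Hypothetical helper: maps the two parameter values to the phase of the
# recovery procedure in which this topic uses them. The phase names are
# illustrative labels only.
phase_for_settings() {
  case "$1:$2" in
    false:true) echo "recovery" ;;      # recover metadata from Tablestore (read-only)
    true:false) echo "async-upload" ;;  # upload metadata to Tablestore after recovery
    *)
      echo "not-covered"                # combination not described in this topic
      return 1
      ;;
  esac
}

phase_for_settings false true
phase_for_settings true false
```

Any other combination returns a non-zero status, which only means that this topic does not describe it.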