Namespace Service of JindoFS supports different metadata storage backends. By default, RocksDB is used. This topic describes how to configure RocksDB as the metadata storage backend.

Background information

RocksDB cannot be configured in high availability mode. If you need to configure a metadata storage backend in high availability mode, we recommend that you use Raft instances. For more information, see Use Raft-RocksDB-Tablestore to store metadata.
The following figure shows the structure of a single RocksDB instance for Namespace Service.

Configure RocksDB as the metadata storage backend

  1. Go to the SmartData service.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides. Select the resource group as required. By default, all resources of the account appear.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page that appears, find the target cluster and click Details in the Actions column.
    5. In the left-side navigation pane, click Cluster Service and then SmartData.
  2. Go to the namespace tab for the SmartData service.
    1. Click the Configure tab.
    2. Click the namespace tab in the Service Configuration section.
  3. Set namespace.backend.type to rocksdb.
  4. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  5. Select Restart Jindo Namespace Service from the Actions drop-down list in the upper-right corner.
  6. Optional: Configure a Tablestore instance as the remote asynchronous storage backend.

    You can bind an EMR cluster to a Tablestore instance and use the Tablestore instance as an additional storage medium for Namespace Service of JindoFS. EMR asynchronously uploads metadata from the local RocksDB to the Tablestore instance in real time.

    Configure the parameters described in the following table on the namespace tab.
    | Parameter | Description | Example |
    | --- | --- | --- |
    | namespace.ots.instance | The name of the Tablestore instance. | emr-jfs |
    | namespace.ots.accessKey | The AccessKey ID that is used to access the Tablestore instance. | kkkkkk |
    | namespace.ots.accessSecret | The AccessKey secret that is used to access the Tablestore instance. | XXXXXX |
    | namespace.ots.endpoint | The endpoint of the Tablestore instance. We recommend that you use a VPC endpoint. | http://emr-jfs.cn-hangzhou.vpc.tablestore.aliyuncs.com |
    | namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. Set this parameter to true before the initialization of the SmartData service is completed. Note: If the initialization is already completed, the setting does not take effect because metadata has already been generated in the local RocksDB. | true |
  7. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  8. Select Start All Components from the Actions drop-down list in the upper-right corner.
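
The parameters in steps 3 and 6 are entered in the console, so a typo is easy to miss. The following is a minimal sketch of a local sanity check, not an EMR or JindoFS tool. It assumes that you keep the planned values in a key=value text file of your own (a hypothetical convenience file), and the function name check_namespace_settings is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "key=value" lines on stdin and checks that the
# parameters listed in this topic are present. It only inspects text and
# never contacts EMR, JindoFS, or Tablestore.
check_namespace_settings() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
    {
      eq = index($0, "=")
      if (eq == 0) next                 # ignore lines without "="
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      seen[key] = val
    }
    END {
      n = split("namespace.ots.instance namespace.ots.accessKey namespace.ots.accessSecret namespace.ots.endpoint", required, " ")
      bad = 0
      for (i = 1; i <= n; i++) {
        if (seen[required[i]] == "") { print "missing: " required[i]; bad = 1 }
      }
      if (seen["namespace.backend.type"] != "rocksdb") {
        print "namespace.backend.type must be rocksdb"; bad = 1
      }
      if (seen["namespace.backend.rocksdb.async.ots.enabled"] != "true") {
        print "namespace.backend.rocksdb.async.ots.enabled must be true"; bad = 1
      }
      if (!bad) print "ok"
      exit bad
    }
  '
}
```

For example, `check_namespace_settings < planned-settings.txt` prints ok when every value is present, or one line per problem otherwise. The file name is a placeholder. Because the file contains an AccessKey secret, keep it out of version control and delete it after use.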

Recover metadata from a Tablestore instance

If you have configured a Tablestore instance as the remote asynchronous storage backend for your EMR cluster, a complete replica of JindoFS metadata is stored in the Tablestore instance. After you stop or release the EMR cluster, you can recover JindoFS metadata from the Tablestore instance to a new EMR cluster. This way, you can access the original files from the new EMR cluster.

  1. Prepare for the recovery.
    1. Optional: Collect the metadata statistics of the original EMR cluster. The metadata statistics indicate the numbers of files and folders.
      hadoop fs -count jfs://test/
      Information similar to the following example is returned:
       1596      1482809                 25 jfs://test/

      The number of folders is 1596, and the number of files is 1,482,809.

    2. Stop the jobs that are running on the original EMR cluster. You may need to wait for 30 to 120 seconds for EMR to synchronize all metadata of the cluster to the Tablestore instance. Run the following command to view the metadata status. If the command output contains _synced=1, the latest metadata is synchronized to the Tablestore instance.
      jindo jfs -metaStatus
      Information similar to that shown in the following figure is returned.
    3. Stop or release the original EMR cluster and make sure that no clusters are accessing the Tablestore instance.
  2. Create an EMR cluster.
    Create an EMR cluster that resides in the same region as the Tablestore instance. Suspend all SmartData components.
  3. Configure parameters for metadata recovery.
    Configure the parameters described in the following table on the namespace tab.
    | Parameter | Description | Required value |
    | --- | --- | --- |
    | namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. | false |
    | namespace.backend.rocksdb.recovery.mode | Specifies whether to enable metadata recovery from Tablestore. Valid values: true and false. | true |
  4. Save the configurations.
    1. In the upper-right corner of the Service Configuration section, click Save.
    2. In the Confirm Changes dialog box, specify Description and turn on Auto-update Configuration.
    3. Click OK.
  5. Select Start All Components from the Actions drop-down list in the upper-right corner.
  6. After the SmartData service of the new EMR cluster starts, EMR automatically recovers metadata from the Tablestore instance to the local RocksDB. You can run the following command to view the recovery progress:
    jindo jfs -metaStatus
    If the state is FINISH, the recovery is complete, as shown in the following figure.
  7. Optional: Check whether the numbers of files and folders of the new EMR cluster are consistent with those of the original EMR cluster.
    The new EMR cluster is in recovery mode and is read-only.
    # Count the numbers of files and folders for the new EMR cluster. The results are consistent with those of the original EMR cluster.
    [hadoop@emr-header-1 ~]$ hadoop fs -count jfs://test/
            1596      1482809                 25 jfs://test/
    
    # Use the cat or get command to read data from a file.
    [hadoop@emr-header-1 ~]$ hadoop fs -cat jfs://test/testfile
    this is a test file
    # View file directories.
    [hadoop@emr-header-1 ~]$ hadoop fs -ls jfs://test/
    Found 3 items
    drwxrwxr-x   - root   root            0 2020-03-25 14:54 jfs://test/emr-header-1.cluster-50087
    -rw-r-----   1 hadoop hadoop          5 2020-03-25 14:50 jfs://test/haha-12096RANDOM.txt
    -rw-r-----   1 hadoop hadoop         20 2020-03-25 15:07 jfs://test/testfile
    
    # Remove a file. A read-only error is returned.
    [hadoop@emr-header-1 ~]$ hadoop fs -rm jfs://test/testfile
    java.io.IOException: ErrorCode : 25021 , ErrorMsg: Namespace is under recovery mode, and is read-only.
  8. Enable asynchronous upload to Tablestore and disable recovery from Tablestore.
    Configure the parameters described in the following table on the namespace tab.
    | Parameter | Description | Required value |
    | --- | --- | --- |
    | namespace.backend.rocksdb.async.ots.enabled | Specifies whether to enable asynchronous upload to Tablestore. Valid values: true and false. | true |
    | namespace.backend.rocksdb.recovery.mode | Specifies whether to enable metadata recovery from Tablestore. Valid values: true and false. | false |
  9. Restart the new EMR cluster.
    1. Click the Cluster Management tab.
    2. On the Cluster Management page, find the cluster. In the Actions column of this cluster, click More and select Restart.
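
The manual checks in the recovery procedure (waiting for _synced=1, waiting for the FINISH state, and comparing the file and folder counts) can be scripted. The following is a minimal sketch under stated assumptions, not a tool that ships with JindoFS or EMR. The helper names wait_for_status, count_summary, and same_counts are illustrative. The sketch only assumes that the marker strings named in this topic appear somewhere in the output of jindo jfs -metaStatus, because the exact output format is not reproduced here.

```shell
#!/usr/bin/env bash
# Minimal sketch; the helper names are illustrative and are not part of
# JindoFS, EMR, or Hadoop.

# Polls a command until its output contains a literal marker string.
# Usage: wait_for_status <marker> <timeout_seconds> <interval_seconds> <command...>
# Use a positive interval whenever the timeout is greater than 0.
wait_for_status() {
  local marker="$1" timeout="$2" interval="$3" waited=0 out
  shift 3
  while :; do
    out=$("$@" 2>/dev/null) || true
    case "$out" in
      *"$marker"*) return 0 ;;          # marker found in the output
    esac
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Prints "<folders> <files>" from the first data line of `hadoop fs -count`
# output, whose columns are DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
count_summary() {
  awk 'NF >= 4 { print $1, $2; exit }'
}

# Succeeds when two saved `hadoop fs -count` outputs report the same numbers
# of folders and files.
# Usage: same_counts <output_from_original_cluster> <output_from_new_cluster>
same_counts() {
  local a b
  a=$(count_summary < "$1")
  b=$(count_summary < "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, on the original cluster, `wait_for_status '_synced=1' 120 10 jindo jfs -metaStatus` checks every 10 seconds for up to about 120 seconds, which matches the wait time mentioned in the preparation step. On the new cluster, the same helper can be called with the marker FINISH. To compare counts, save the output of `hadoop fs -count jfs://test/` from each cluster to a file and pass both files to same_counts.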