This topic describes the cache mode of JindoFileSystem (JindoFS) and its scenarios.

Overview

In cache mode, JindoFS stores data files as objects in Object Storage Service (OSS) and caches the data and metadata of these files in the local cluster as they are accessed. This accelerates read and write operations on data and metadata. The cache mode also provides multiple policies for synchronizing metadata from OSS.
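
The following is a minimal, generic sketch of the caching idea described above. It is not the JindoFS implementation. A plain dict stands in for OSS, another dict stands in for the local cluster cache, and the class name CachedStore is hypothetical. The write path is an assumption: it is modeled as write-through, so that the backing store stays the place where data files live, as the overview states.

```python
from typing import Dict, MutableMapping


class CachedStore:
    """Hypothetical illustration: a backing store (standing in for OSS) plus a
    local cache (standing in for the cluster) that is filled on first access."""

    def __init__(self, backend: MutableMapping[str, bytes]) -> None:
        self.backend = backend              # source of truth, e.g. objects in OSS
        self.local: Dict[str, bytes] = {}   # local copies kept in the cluster
        self.backend_reads = 0              # counts reads that reached the backend

    def read(self, path: str) -> bytes:
        if path not in self.local:
            # Cache miss: fetch from the backing store and keep a local copy.
            self.backend_reads += 1
            self.local[path] = self.backend[path]
        # Served from the local copy on every read after the first.
        return self.local[path]

    def write(self, path: str, data: bytes) -> None:
        # Assumed write-through: persist to the backing store first, then cache.
        self.backend[path] = data
        self.local[path] = data


oss = {"logs/a.txt": b"hello"}  # data that existed in the backing store beforehand
store = CachedStore(oss)
store.read("logs/a.txt")        # first read goes to the backend
store.read("logs/a.txt")        # second read is served from the local cache
print(store.backend_reads)      # prints 1
```

Because existing objects are read from the backing store on a cache miss, data written before the cache existed stays readable without migration.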

Scenarios

The cache mode is compatible with native OSS semantics. Because JindoFS stores data files as objects in OSS and caches data and metadata in the local cluster, it remains compatible with the OSS client, E-MapReduce OssFileSystem, and other applications that interact with OSS. You can also access data that already existed in OSS before you configured JindoFS, without migrating or converting it. Meanwhile, the local caches accelerate read and write operations on data and metadata.

Configure JindoFS

You can set all JindoFS-related parameters in Bigboot, as shown in the following figure.

Note
  • The parameters framed in red in the preceding figure are required.
  • JindoFS supports multiple namespaces. A namespace named test is used in this topic.
| Parameter | Description | Example |
| --- | --- | --- |
| jfs.namespaces | The namespaces supported by JindoFS. Separate multiple namespaces with commas (,). | test |
| jfs.namespaces.test.uri | The storage back end of the test namespace. Note: You can set the value to a directory in an OSS bucket. In this case, the directory serves as the root directory in which the test namespace reads and writes data. In most cases, set the value to an OSS bucket so that paths in the namespace are the same as paths in OSS. | oss://oss-bucket/ |
| jfs.namespaces.test.mode | The storage mode of the test namespace. | cache |
| jfs.namespaces.test.oss.access.key | The AccessKey ID used to access the OSS bucket that serves as the storage back end. Note: For better performance and stability, we recommend that you use, as the storage back end, an OSS bucket that is in the same region and belongs to the same account as the E-MapReduce cluster. In this case, the cluster can access the bucket without an AccessKey ID and AccessKey secret. | xxxx |
| jfs.namespaces.test.oss.access.secret | The AccessKey secret used to access the OSS bucket that serves as the storage back end. | |
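
The parameters above are entered in Bigboot in the console. The sketch below only illustrates two points from the table. First, per-namespace keys are named by inserting the namespace name into jfs.namespaces.<name>.*, one set per entry in the comma-separated jfs.namespaces value. Second, the uri value acts as the namespace root. The helper names parse_namespaces, namespace_config, and to_backend_path are hypothetical. Mapping a path onto the root by plain prefix concatenation is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def parse_namespaces(value: str) -> List[str]:
    """Split a comma-separated jfs.namespaces value, ignoring empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def namespace_config(namespace: str, uri: str, mode: str = "cache",
                     access_key: Optional[str] = None,
                     access_secret: Optional[str] = None) -> Dict[str, str]:
    """Build the per-namespace keys listed in the table (hypothetical helper)."""
    prefix = f"jfs.namespaces.{namespace}"
    conf = {f"{prefix}.uri": uri, f"{prefix}.mode": mode}
    # The AccessKey pair can be omitted when the bucket is in the same region
    # and account as the cluster, per the note in the table.
    if access_key is not None and access_secret is not None:
        conf[f"{prefix}.oss.access.key"] = access_key
        conf[f"{prefix}.oss.access.secret"] = access_secret
    return conf


def to_backend_path(uri: str, path: str) -> str:
    """Map a namespace-relative path onto the configured root (assumed prefixing)."""
    return uri.rstrip("/") + "/" + path.lstrip("/")


conf = {"jfs.namespaces": "test"}
for ns in parse_namespaces(conf["jfs.namespaces"]):
    conf.update(namespace_config(ns, "oss://oss-bucket/"))
print(to_backend_path("oss://oss-bucket/", "/warehouse/t1"))  # prints oss://oss-bucket/warehouse/t1
```

With a bucket root such as oss://oss-bucket/, a namespace path and its OSS path coincide. With a directory root, every path gains that directory as a prefix.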

Save and deploy the JindoFS configuration, and then restart Namespace Service in SmartData so that JindoFS can be used.

Configure the metadata synchronization policy

In cache mode, some data may already exist in OSS before you configure JindoFS. After JindoFS is configured, this data and its metadata are synchronized to JindoFS for future access, and JindoFS caches them in the local cluster. You can configure a policy that controls how JindoFS synchronizes metadata from OSS. Two types of policies are available: the interval policy and the loading policy.

  • Interval policy:

    You can set the namespace.sync.interval parameter to specify the synchronization interval. The default value is -1, which indicates that JindoFS does not synchronize metadata from OSS.

    • If you set this parameter to 0, JindoFS synchronizes metadata from OSS each time data is accessed.
    • If you set this parameter to a value greater than 0, JindoFS synchronizes metadata from OSS at an interval of the specified number of seconds.
      Note For example, if you set this parameter to 5, JindoFS synchronizes metadata from OSS every 5 seconds.
  • Loading policy:

    You can set the namespace.sync.loadtype parameter to specify the loading policy. Valid values:

    • never: JindoFS never synchronizes metadata from OSS.
    • once: JindoFS synchronizes metadata from OSS only once. This is the default value.
    • always: JindoFS synchronizes metadata from OSS each time data is accessed.

    Note The namespace.sync.loadtype parameter takes effect only when the namespace.sync.interval parameter is not specified.
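
The two policies and the precedence rule between them can be summarized as a single decision: should metadata be re-read from OSS on this access? The sketch below is hypothetical and is not JindoFS code. Its assumptions are:

- should_sync_metadata is an invented name.
- A namespace.sync.interval that is not specified is modeled as None.
- Any negative interval is treated like the documented -1.
- The interval policy is modeled as a staleness check made on access, rather than as a background timer.

```python
from typing import Optional

VALID_LOADTYPES = {"never", "once", "always"}


def should_sync_metadata(interval: Optional[int], loadtype: str,
                         seconds_since_last_sync: Optional[float]) -> bool:
    """Hypothetical model of the metadata synchronization rules.

    interval: namespace.sync.interval in seconds, or None if not specified.
    loadtype: namespace.sync.loadtype; consulted only when interval is None.
    seconds_since_last_sync: None if metadata has never been synchronized.
    """
    if interval is not None:
        if interval < 0:   # -1 (assumed: any negative value): never synchronize
            return False
        if interval == 0:  # 0: synchronize on every access
            return True
        # > 0: synchronize once at least `interval` seconds have passed
        return seconds_since_last_sync is None or seconds_since_last_sync >= interval
    if loadtype not in VALID_LOADTYPES:
        raise ValueError(f"invalid namespace.sync.loadtype: {loadtype!r}")
    if loadtype == "never":
        return False
    if loadtype == "once":
        return seconds_since_last_sync is None  # only if never synchronized
    return True  # "always": synchronize on every access


print(should_sync_metadata(5, "once", 3.0))       # prints False
print(should_sync_metadata(None, "once", None))   # prints True
```

Checking the interval first mirrors the note that namespace.sync.loadtype takes effect only when namespace.sync.interval is not specified.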