This topic describes the cache mode of JindoFileSystem (JindoFS) and its use scenarios.

Overview

In cache mode, JindoFS stores data files as objects in Object Storage Service (OSS) and caches their data and metadata in the local cluster on demand, as the files are accessed. This accelerates read and write operations on data and metadata. The cache mode also provides multiple policies that you can use to synchronize metadata as required.

Scenarios

The cache mode is compatible with the original OSS semantics. Because data files are stored as regular objects in OSS, JindoFS remains compatible with the OSS client, E-MapReduce (EMR) OssFileSystem, and other applications that interact with OSS. You can also access data that was stored in OSS before you configured JindoFS, without the need to migrate or convert the data. The local caches accelerate read and write operations on data and metadata.

Configure JindoFS

You can configure all parameters related to JindoFS in Bigboot, as shown in the following figures.

Figure 1. Update a ConfigMap
Figure 2. Add Configuration Item
Note
  • The parameters framed in red in the preceding figure are required.
  • JindoFS supports multiple namespaces. A namespace named test is used in this topic.
| Parameter | Description | Example |
| --- | --- | --- |
| jfs.namespaces | The namespaces supported by JindoFS. Separate multiple namespaces with commas (,). | test |
| jfs.namespaces.test.uri | The storage backend of the test namespace. Note: You can set the value to a directory in an OSS bucket. In this case, the directory serves as the root directory in which the test namespace reads and writes data. In most cases, set the value to an OSS bucket so that paths are the same as those in OSS. | oss://oss-bucket/ |
| jfs.namespaces.test.mode | The storage mode of the test namespace. Set this parameter to cache. | cache |
| jfs.namespaces.test.oss.access.key | The AccessKey ID used to access the OSS bucket that serves as the storage backend. Note: We recommend that you store data in an OSS bucket that is in the same region and under the same account as your EMR cluster. This ensures high performance and stability. In this case, you do not need to configure the AccessKey ID and AccessKey secret because the OSS bucket allows password-free access from the EMR cluster. | xxxx |
| jfs.namespaces.test.oss.access.secret | The AccessKey secret used to access the OSS bucket that serves as the storage backend. | |
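
The parameters above follow a fixed naming pattern per namespace. As a minimal sketch of that pattern, the following Python snippet assembles the key-value pairs for one cache-mode namespace. The helper name build_namespace_config is hypothetical and is not part of JindoFS or Bigboot. The snippet only illustrates which keys are required and which are optional. It does not write anything to the cluster.

```python
def build_namespace_config(namespace, uri, access_key=None, access_secret=None):
    """Return the key-value pairs for one cache-mode namespace.

    Hypothetical helper for illustration only; JindoFS does not ship it.
    """
    prefix = f"jfs.namespaces.{namespace}"
    config = {
        "jfs.namespaces": namespace,   # comma-separated if you have several
        f"{prefix}.uri": uri,          # OSS bucket or a directory in it
        f"{prefix}.mode": "cache",     # this topic covers the cache mode
    }
    # The AccessKey pair can be omitted when the bucket is in the same
    # region and under the same account as the EMR cluster.
    if access_key is not None and access_secret is not None:
        config[f"{prefix}.oss.access.key"] = access_key
        config[f"{prefix}.oss.access.secret"] = access_secret
    return config


config = build_namespace_config("test", "oss://oss-bucket/")
for key, value in config.items():
    print(f"{key} = {value}")
```

In this sketch, the AccessKey entries are added only when both values are provided, which mirrors the recommendation to rely on password-free access for a bucket in the same region and account.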

Save and deploy the JindoFS configuration. Restart Namespace Service in SmartData to use JindoFS.
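
Because the URI of a namespace serves as its root directory, each path in the namespace corresponds to one object path under that URI. The following Python sketch illustrates this correspondence. It assumes that paths in the namespace use the form jfs://<namespace>/<path>, which is not stated in this topic, and the function name to_oss_path is hypothetical. Treat it as a model of the mapping, not as the JindoFS implementation.

```python
def to_oss_path(jfs_path, namespace, backend_uri):
    """Map a namespace path to the OSS path it is stored under.

    Assumes the jfs://<namespace>/<path> form; illustrative only.
    """
    prefix = f"jfs://{namespace}/"
    if not jfs_path.startswith(prefix):
        raise ValueError(f"{jfs_path!r} is not in namespace {namespace!r}")
    relative = jfs_path[len(prefix):]
    # The backend URI acts as the root directory of the namespace.
    return backend_uri.rstrip("/") + "/" + relative


# Backend set to a whole bucket: the path matches the path in OSS.
print(to_oss_path("jfs://test/data/part-0", "test", "oss://oss-bucket/"))

# Backend set to a directory: that directory becomes the root.
print(to_oss_path("jfs://test/data/part-0", "test", "oss://oss-bucket/warehouse/"))
```

The first call shows why setting the URI to a bucket keeps paths the same as those in OSS. The second call shows how a directory in the bucket becomes the root of the namespace.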

Configure a metadata synchronization policy

In cache mode, data may already exist in OSS before you configure JindoFS. After JindoFS is configured, this data and its metadata are synchronized to JindoFS for later access. Data is synchronized on demand: it is cached in the local cluster each time it is accessed. For metadata, JindoFS supports two types of synchronization policies: interval policy and loading policy.

  • Interval policy:

    You can set the namespace.sync.interval parameter to specify the synchronization interval. The default value is -1, which indicates that JindoFS does not synchronize metadata from OSS.

    • If you set this parameter to 0, JindoFS synchronizes metadata from OSS each time data is accessed.
    • If you set this parameter to a value greater than 0, JindoFS synchronizes metadata from OSS at intervals of the set value in units of seconds.
      Note For example, if you set this parameter to 5, JindoFS synchronizes metadata from OSS every 5 seconds.
  • Loading policy:

    You can set the namespace.sync.loadtype parameter to specify the loading policy. Valid values:

    • never: JindoFS never synchronizes metadata from OSS.
    • once: JindoFS synchronizes metadata from OSS only once. This is the default value.
    • always: JindoFS synchronizes metadata from OSS each time data is accessed.

    Note The namespace.sync.loadtype parameter takes effect only when you do not specify the namespace.sync.interval parameter.
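
The way the two policies interact can be sketched as a small decision function. The following Python snippet is an illustrative model of the rules described above, not the JindoFS implementation. The function name sync_due and its arguments are hypothetical. The snippet represents an unspecified namespace.sync.interval as None, and it treats an explicitly set negative interval as "do not synchronize", which is an interpretation of the -1 default.

```python
def sync_due(interval, loadtype, last_sync, now):
    """Decide whether metadata should be synchronized from OSS.

    interval:  value of namespace.sync.interval in seconds, or None if
               the parameter is not specified.
    loadtype:  value of namespace.sync.loadtype ("never", "once", "always").
    last_sync: time of the previous synchronization, or None if none yet.
    now:       current time in seconds.
    """
    if interval is None:
        # The loading policy applies only when no interval is specified.
        if loadtype == "never":
            return False
        if loadtype == "once":
            return last_sync is None
        if loadtype == "always":
            return True
        raise ValueError(f"unknown loadtype: {loadtype!r}")
    if interval < 0:
        return False   # assumption: an explicit -1 disables synchronization
    if interval == 0:
        return True    # synchronize on every access
    # Positive interval: synchronize once the interval has elapsed.
    return last_sync is None or now - last_sync >= interval


# Interval of 5 seconds: the loading policy is ignored.
print(sync_due(5, "never", last_sync=0.0, now=5.0))
# No interval specified: the default loading policy "once" applies.
print(sync_due(None, "once", last_sync=12.0, now=100.0))
```

The first branch reflects the note that namespace.sync.loadtype takes effect only when namespace.sync.interval is not specified.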