Object Storage Service: Use JindoFuse to access OSS-HDFS

Last Updated: Apr 25, 2024

You can use JindoFuse to map OSS-HDFS to a local file system and then perform operations on objects in OSS-HDFS, such as reading, writing, and deleting objects, by using standard POSIX file system calls. JindoFuse is a POSIX-compatible tool that allows you to access open source distributed file systems. JindoFuse also allows AI applications to directly use OSS-HDFS for data storage and processing.

Prerequisites

OSS-HDFS is enabled for a bucket and permissions are granted to access OSS-HDFS. For more information, see Enable OSS-HDFS and grant access permissions.

Preparations

You can use one of the following methods to access OSS-HDFS:

  • If you want to access OSS-HDFS by using an Alibaba Cloud EMR cluster, make sure that an EMR cluster of version 3.44.0 or later (for EMR 3.x) or 5.10.0 or later (for EMR 5.x) is created. EMR clusters that meet these version requirements are integrated with JindoFuse by default. For more information, see Create a cluster.

  • If you do not want to access OSS-HDFS by using an Alibaba Cloud EMR cluster, make sure that JindoSDK 4.6.2 or later is installed and deployed. For more information, see Deploy JindoSDK in an environment other than EMR.

Procedure

  1. Configure environment variables.

    • If you want to access OSS-HDFS by using an Alibaba Cloud EMR cluster, skip this step and proceed to Step 2.

    • If you do not want to access OSS-HDFS by using an Alibaba Cloud EMR cluster, perform the following steps to configure JindoFuse:

      1. Connect to the ECS instance. For more information, see Connect to an instance.

      2. Modify environment variables.

        In this example, jindosdk-x.x.x is installed in the /root/ directory, where x.x.x indicates the version number of JindoSDK. Modify the environment variables based on the actual path in which JindoSDK is installed.

        export JINDOSDK_HOME=/root/jindosdk-x.x.x
        export HADOOP_CLASSPATH=`hadoop classpath`:${JINDOSDK_HOME}/lib/*
        export JINDOSDK_CONF_DIR=/root/jindosdk-x.x.x/conf
        export PATH=$PATH:$JINDOSDK_HOME/bin
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JINDOSDK_HOME}/lib/native
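
        After you set the variables, you can optionally confirm that the jindo-fuse binary is reachable and that the native libraries are in place. This quick check is not part of the official procedure and assumes that the binary ships in ${JINDOSDK_HOME}/bin:

        # Verify that jindo-fuse is on the PATH and that the native libraries exist.
        which jindo-fuse
        ls ${JINDOSDK_HOME}/lib/native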
      3. Configure the jindosdk.cfg configuration file.

        1. Create a configuration file named jindosdk.cfg in the conf/ directory of JindoSDK.

        2. Add the following configuration items to the jindosdk.cfg configuration file:

          [common]
          logger.dir = /tmp/fuse-log

          [jindosdk]
          # In this example, the China (Hangzhou) region is used. Specify your actual region.
          fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
          # Configure the AccessKey ID and AccessKey secret that are used to access OSS-HDFS.
          fs.oss.accessKeyId = LTAI********
          fs.oss.accessKeySecret = KZo1********
  2. Mount OSS-HDFS.

    1. Run the following command to create a mount point:

      mkdir -p <mount_point>
    2. Run the following command to mount OSS-HDFS:

      jindo-fuse <mount_point> -ouri=<oss_path>

      Set -ouri to the OSS-HDFS (dls) path that you want to map. The path can be the root directory or a subdirectory of the bucket. After you run the command, a daemon process starts in the background and mounts the specified <oss_path> to the mount point of the local file system, which is specified by <mount_point>.
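
      For example, the following command mounts the root directory of a bucket at a local directory. The bucket name examplebucket, the China (Hangzhou) region, and the /mnt/oss mount point are placeholders that you must replace with your own values:

      jindo-fuse /mnt/oss -ouri=oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/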

      For more information about the mount options that you can configure when you mount OSS-HDFS, see Appendix 2: Mount options.

    3. Run the following command to check whether OSS-HDFS is mounted:

      ps -ef | grep jindo-fuse

      If the following result is returned, OSS-HDFS is mounted:

      root      2162     1  0 13:21 ?        00:00:00 jindo-fuse <mount_point> -ouri=<oss_path>
      root      2714  2640  0 13:39 pts/0    00:00:00 grep --color=auto jindo-fuse
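
      As an additional generic check that is not specific to JindoFuse, you can confirm that the mount point is registered with the operating system. The /mnt/oss mount point below is a placeholder:

      mount | grep /mnt/oss
      df -h /mnt/oss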
  3. Use JindoFuse to perform read and write operations on objects in OSS-HDFS.

    • Create a directory

      mkdir /mnt/oss/dir1
    • List all subdirectories in the /mnt/oss/ directory

      ls /mnt/oss/
    • Write an object

      echo "hello world" > /mnt/oss/dir1/hello.txt
    • Read an object

      cat /mnt/oss/dir1/hello.txt
    • Delete a directory

      rm -rf /mnt/oss/dir1/
  4. Optional. Unmount OSS-HDFS.

    You can unmount OSS-HDFS by using one of the following methods:

    • Manually unmount OSS-HDFS

      umount <mount_point>
    • Automatically unmount OSS-HDFS

      -oauto_unmount

      If you specify the -oauto_unmount option when you mount OSS-HDFS, the mount point is automatically unmounted after the jindo-fuse process exits, for example, after you stop the process by running a command such as killall jindo-fuse.
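
      The following sketch shows this workflow. The mount point, bucket name, and region are placeholders that you must replace with your own values:

      jindo-fuse /mnt/oss -ouri=oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/ -oauto_unmount
      # ... read and write objects through the mount point ...
      killall jindo-fuse   # the mount point is unmounted automatically after the process exits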

FAQ

How do I troubleshoot JindoFuse errors?

If you call API operations by using JindoSDK directly, detailed error messages are returned when errors occur. If you use JindoFuse, only the generic error messages of the operating system are displayed. Example:

ls: /mnt/oss/: Input/output error

To identify the cause of an error, you must find the jindosdk.log file in the path that is specified by the logger.dir configuration item of JindoSDK. The following message is a common authentication error message that may appear when you use JindoFuse:

EMMDD HH:mm:ss jindofs_connectivity.cpp:13] Please check your Endpoint/Bucket/RoleArn.
Failed test connectivity, operation: mkdir, errMsg: [RequestId]: 618B8183343EA53531C62B74 [HostId]: oss-cn-shanghai-internal.aliyuncs.com [ErrorMessage]: [E1010]HTTP/1.1 403 Forbidden ...

If the preceding error message appears, check whether the endpoint, the bucket, and the role ARN are properly configured. For more information, see Connect non-EMR clusters to OSS-HDFS.
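
To quickly locate recent errors, you can inspect the log file from the shell. The following example assumes that logger.dir is set to /tmp/fuse-log, as in the configuration example in this topic:

grep -iE "error|failed" /tmp/fuse-log/jindosdk.log | tail -n 20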

If a program error occurs, submit a ticket.

Appendix 1: Supported operations

The following POSIX-based API operations are supported by JindoFuse:

  • getattr(): Queries the attributes of an object. This operation is similar to the ls command.
  • mkdir(): Creates a directory. This operation is similar to the mkdir command.
  • rmdir(): Deletes a directory. This operation is similar to the rm -rf command.
  • unlink(): Deletes an object. This operation is similar to the unlink command.
  • rename(): Renames an object or a directory. This operation is similar to the mv command.
  • read(): Reads data in sequence.
  • pread(): Reads data at random.
  • write(): Writes data in sequence.
  • pwrite(): Writes data at random.
  • flush(): Flushes data from the memory to the kernel cache.
  • fsync(): Flushes data from the memory to disks.
  • release(): Closes an object.
  • readdir(): Reads a directory.
  • create(): Creates an object.
  • open() with O_APPEND: Opens an object in append mode.
  • open() with O_TRUNC: Opens an object in overwrite mode.
  • ftruncate(): Truncates an opened object.
  • truncate(): Truncates a closed object. This operation is similar to the truncate -s command.
  • lseek(): Specifies the read and write position in an open object.
  • chmod(): Modifies the permissions on an object. This operation is similar to the chmod command.
  • access(): Queries the permissions on an object.
  • utimes(): Modifies the access time and modification time of an object.
  • setxattr(): Modifies an xattr attribute of an object.
  • getxattr(): Queries an xattr attribute of an object.
  • listxattr(): Lists the xattr attributes of an object.
  • removexattr(): Deletes an xattr attribute of an object.
  • lock(): Supports POSIX locks. This operation is similar to the fcntl command.
  • fallocate(): Pre-allocates physical space to an object.
  • symlink(): Creates a symbolic link. The symbolic link is available only in OSS-HDFS and does not support cache acceleration.
  • readlink(): Reads a symbolic link.
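
Many of these operations map directly to standard shell utilities when they are run against the mounted file system. The following sketch exercises a few of them and assumes that OSS-HDFS is mounted at /mnt/oss and that the dir1/hello.txt object from the procedure above exists:

stat /mnt/oss/dir1/hello.txt               # getattr()
chmod 644 /mnt/oss/dir1/hello.txt          # chmod()
truncate -s 0 /mnt/oss/dir1/hello.txt      # truncate()
ln -s hello.txt /mnt/oss/dir1/hello.link   # symlink()
readlink /mnt/oss/dir1/hello.link          # readlink()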

Appendix 2: Mount options

The following options can be configured when you use JindoFuse to mount objects from OSS-HDFS to a local file system. Only the uri option is required; all other options are optional.

  • uri (required): The OSS-HDFS (dls) path that you want to map. The path can be the root directory of the bucket, such as -ouri=oss://bucket.endpoint/, or a subdirectory of the bucket, such as -ouri=oss://bucket.endpoint/subdir. Example: -ouri=oss://examplebucket.cn-beijing.oss-dls.aliyuncs.com/
  • f: Starts the JindoFuse process in the foreground. By default, the JindoFuse process starts in the background as a daemon. If you use this option, we recommend that you enable terminal logs. Example: -f
  • d: Enables the debug mode. In debug mode, the JindoFuse process starts in the foreground. If you use this option, we recommend that you enable terminal logs. Example: -d
  • auto_unmount: Automatically unmounts the mount point after the JindoFuse process exits. Example: -oauto_unmount
  • ro: Mounts objects from OSS-HDFS in read-only mode. After you enable this option, you cannot perform write operations. Example: -oro
  • direct_io: Reads and writes objects without using the page cache. Example: -odirect_io
  • kernel_cache: Uses the kernel cache to optimize read performance. Example: -okernel_cache
  • auto_cache: Enables automatic caching. This is the default behavior. Unlike kernel_cache, auto_cache automatically flushes the cache if the size or modification time of an object changes. Example: -oauto_cache
  • entry_timeout: The period, in seconds, for which an object name is cached when the object is read. This option is used to optimize performance. The value 0 specifies that the object name is not cached. Default value: 0.1. Example: -oentry_timeout=60
  • attr_timeout: The period, in seconds, for which object attributes are cached. This option is used to optimize performance. The value 0 specifies that object attributes are not cached. Default value: 0.1. Example: -oattr_timeout=60
  • negative_timeout: The period, in seconds, for which an object name is cached if the object fails to be read. This option is used to optimize performance. The value 0 specifies that the object name is not cached. Default value: 0.1. Example: -onegative_timeout=0
  • jindo_entry_size: The number of directory entries that are cached. This option is used to optimize readdir performance. The value 0 specifies that directory entries are not cached. Default value: 5000. Example: -ojindo_entry_size=5000
  • jindo_attr_size: The number of object attributes that are cached. This option is used to optimize getattr performance. The value 0 specifies that object attributes are not cached. Default value: 50000. Example: -ojindo_attr_size=50000
  • max_idle_threads: The maximum number of idle threads. Default value: 10. Example: -omax_idle_threads=10
  • metrics_port: The HTTP port on which metrics are exposed, for example, at http://localhost:9090/brpc_metrics. Default value: 9090. Example: -ometrics_port=9090
  • enable_pread: Calls the pread operation to read objects. Example: -oenable_pread
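
Multiple mount options can be combined in a single command. The following sketch mounts a bucket in read-only mode with automatic unmounting and longer metadata caching. The bucket name, region, and mount point are placeholders that you must replace with your own values:

jindo-fuse /mnt/oss -ouri=oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/ -oro -oauto_unmount -oentry_timeout=60 -oattr_timeout=60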

Appendix 3: Configuration items

The following configuration items can be set in the jindosdk.cfg configuration file. The node in parentheses indicates the configuration section to which each item belongs.

  • logger.dir (common): The directory in which logs are stored. Default value: /tmp/jindodata-log.
  • logger.sync (common): The mode in which logs are written. Valid values:
    • true: writes logs in synchronous mode.
    • false (default): writes logs in asynchronous mode.
  • logger.consolelogger (common): Specifies whether to display logs in the terminal. Valid values:
    • true: displays logs.
    • false (default): does not display logs.
  • logger.level (common): Returns logs whose levels are greater than or equal to the value of this configuration item.
    • If terminal logs are enabled, valid values range from 0 to 6 and map to the following log levels:
      • 0: TRACE
      • 1: DEBUG
      • 2 (default): INFO
      • 3: WARN
      • 4: ERROR
      • 5: CRITICAL
      • 6: OFF
    • If terminal logs are disabled, a value that is less than or equal to 1 indicates WARN, and a value that is greater than 1 indicates INFO.
  • logger.verbose (common): Returns verbose logs whose levels are greater than or equal to the value of this configuration item. Valid values: 0 to 99. Default value: 0. The value 0 specifies that no verbose logs are returned.
  • logger.cleaner.enable (common): Specifies whether to enable log cleanup. Valid values:
    • true: enables log cleanup.
    • false (default): disables log cleanup.
  • fs.oss.endpoint (jindosdk): The endpoint that is used to access OSS-HDFS. Example: cn-hangzhou.oss-dls.aliyuncs.com.
  • fs.oss.accessKeyId (jindosdk): The AccessKey ID that is used to access OSS-HDFS.
  • fs.oss.accessKeySecret (jindosdk): The AccessKey secret that is used to access OSS-HDFS.
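
For reference, the following jindosdk.cfg sketch combines several of the preceding items. The endpoint, AccessKey pair, and log settings are examples that you must replace with your own values:

[common]
logger.dir = /tmp/fuse-log
logger.level = 2
logger.cleaner.enable = true

[jindosdk]
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
fs.oss.accessKeyId = LTAI********
fs.oss.accessKeySecret = KZo1********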