
Object Storage Service: Quick start: Connect a non-EMR cluster to the OSS-HDFS service

Last Updated: Mar 20, 2026

The OSS-HDFS service (also known as the JindoFS service) is fully compatible with Hadoop Distributed File System (HDFS) interfaces and supports directory-level operations. With the Jindo software development kit (SDK), Apache Hadoop applications—including MapReduce, Hive, Spark, and Flink—can read from and write to OSS-HDFS directly.

This guide walks you through deploying JindoSDK on an ECS instance and running basic file operations against the OSS-HDFS service.

If you use an Alibaba Cloud EMR cluster, see Quick start for connecting to the OSS-HDFS service from an EMR cluster instead.

Prerequisites

Before you begin, ensure that you have:

  - An ECS instance that you can connect to.
  - A bucket for which the OSS-HDFS service is enabled.

Set up JindoSDK

Step 1: Connect to the ECS instance

Connect to your ECS instance.

Step 2: Download and decompress JindoSDK

  1. Download the JindoSDK JAR package. For the download link, see JindoSDK download on GitHub.

  2. Decompress the package. The following example uses jindosdk-x.x.x-linux.tar.gz, where x.x.x is the version number. Replace it with the actual filename.

       tar zxvf jindosdk-x.x.x-linux.tar.gz

Step 3: Configure environment variables

Set JINDOSDK_HOME to the directory where you decompressed the package, then add the JindoSDK libraries to HADOOP_CLASSPATH.

export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
Important

Deploy the JindoSDK installation directory and set these environment variables on every node that needs to access the OSS-HDFS service.
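The export commands above only affect the current shell session. One way to persist them is to keep the exports in a profile snippet that is sourced on login; the following is a minimal sketch, where the snippet path and the x.x.x version placeholder are illustrative:

```shell
# Write the JindoSDK environment variables to a dedicated profile snippet.
# The snippet path and the x.x.x version are example placeholders.
cat > /tmp/jindosdk-env.sh <<'EOF'
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
EOF

# Load the variables into the current session. To make them permanent,
# source the same snippet from ~/.bashrc or /etc/profile.d/ on every node.
. /tmp/jindosdk-env.sh
echo "$JINDOSDK_HOME"
```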

Step 4: Configure the OSS-HDFS implementation classes

Open core-site.xml:

vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties to register JindoSDK as the OSS file system implementation:

<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
        <description>Registers JindoSDK as the AbstractFileSystem implementation for the oss:// scheme.</description>
    </property>

    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
        <description>Registers JindoSDK as the FileSystem implementation for the oss:// scheme.</description>
    </property>
</configuration>

Step 5: Configure authentication

Add your AccessKey pair to core-site.xml. For the permissions required, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.

<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>xxx</value>
        <description>AccessKey ID for authenticating requests to OSS-HDFS.</description>
    </property>

    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>xxx</value>
        <description>AccessKey Secret for authenticating requests to OSS-HDFS.</description>
    </property>
</configuration>

Step 6: Configure the endpoint

To access a bucket through the OSS-HDFS service, you must specify an endpoint. In this quick start, the endpoint is carried in the access path, which uses the following format:

oss://<Bucket>.<Endpoint>/<Object>

For example:

oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt

JindoSDK uses the endpoint in the access path to access the corresponding OSS-HDFS service API.
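To illustrate how the parts of the path relate, the example path above can be split into its bucket, endpoint, and object components with plain shell parameter expansion (a sketch for illustration only; it is not part of the configuration):

```shell
# Example OSS-HDFS path in the format oss://<Bucket>.<Endpoint>/<Object>
path="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt"

rest="${path#oss://}"     # strip the scheme
authority="${rest%%/*}"   # examplebucket.cn-hangzhou.oss-dls.aliyuncs.com
object="${rest#*/}"       # exampleobject.txt
bucket="${authority%%.*}" # examplebucket
endpoint="${authority#*.}" # cn-hangzhou.oss-dls.aliyuncs.com

echo "bucket=$bucket endpoint=$endpoint object=$object"
```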

Run basic operations

Use HDFS Shell commands to read and write files on the OSS-HDFS service.

Upload a file

The following example uploads examplefile.txt from the local root directory to examplebucket:

hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/

Download a file

The following example downloads exampleobject.txt from examplebucket to the local /tmp directory:

hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/

For more HDFS Shell commands, see Access the OSS-HDFS service using Hadoop shell commands.

Appendix: Performance tuning

The following configuration items are supported in JindoSDK 4.0 and later. Add them to core-site.xml to tune performance for your workload.

fs.oss.tmp.data.dirs (default: /tmp/)
    Directories for temporary files written by the client. Separate multiple directories with commas. In a multi-user environment, set read and write permissions on each directory.

fs.oss.retry.count (default: 5)
    Number of retries after a failed request to OSS.

fs.oss.timeout.millisecond (default: 30000)
    Timeout for OSS requests, in milliseconds.

fs.oss.connection.timeout.millisecond (default: 3000)
    Timeout for connecting to OSS, in milliseconds.

fs.oss.upload.thread.concurrency (default: 5)
    Number of concurrent threads for uploading a single file to OSS.

fs.oss.upload.queue.size (default: 5)
    Queue size for concurrent upload tasks to OSS.

fs.oss.upload.max.pending.tasks.per.stream (default: 16)
    Maximum number of concurrent upload tasks per process.

fs.oss.download.queue.size (default: 5)
    Queue size for concurrent download tasks from OSS.

fs.oss.download.thread.concurrency (default: 16)
    Maximum number of concurrent download tasks per process.

fs.oss.read.readahead.buffer.size (default: 1048576)
    Read-ahead buffer size, in bytes.

fs.oss.read.readahead.buffer.count (default: 4)
    Number of concurrent read-ahead buffers.

To apply any of these settings, add the corresponding <property> block to core-site.xml:

<configuration>
    <property>
        <name>fs.oss.retry.count</name>
        <value>5</value>
        <description>Number of retries after a failed request to OSS.</description>
    </property>
</configuration>