The OSS-HDFS service (also known as the JindoFS service) is fully compatible with Hadoop Distributed File System (HDFS) interfaces and supports directory-level operations. With the Jindo software development kit (SDK), Apache Hadoop applications—including MapReduce, Hive, Spark, and Flink—can read from and write to OSS-HDFS directly.
This guide walks you through deploying JindoSDK on an ECS instance and running basic file operations against the OSS-HDFS service.
If you use an Alibaba Cloud EMR cluster, see Quick start for connecting to the OSS-HDFS service from an EMR cluster instead.
Prerequisites
Before you begin, ensure that you have:
Permissions: An Alibaba Cloud account has access by default. If you use a RAM user, grant the RAM user the required permissions first. For details, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.
An ECS instance: Purchase and create an ECS instance if you don't already have one.
A Hadoop environment: Set up a Hadoop runtime environment on the instance. For details, see Create a Hadoop runtime environment.
OSS-HDFS enabled: Enable the OSS-HDFS service for the target bucket and authorize access. For details, see Enable the OSS-HDFS service.
Set up JindoSDK
Step 1: Connect to the ECS instance
Step 2: Download and decompress JindoSDK
Download the JindoSDK JAR package. For the download link, see JindoSDK download on GitHub.
Decompress the package. The following example uses jindosdk-x.x.x-linux.tar.gz, where x.x.x is the version number. Replace it with the actual filename.
tar zxvf jindosdk-x.x.x-linux.tar.gz
Step 3: Configure environment variables
Set JINDOSDK_HOME to the directory where you decompressed the package, then add the JindoSDK libraries to HADOOP_CLASSPATH.
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
Deploy the installation directory and set these environment variables on all required nodes.
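To make these variables survive new shell sessions, you can also append them to a shell profile. The following is a minimal sketch, assuming the SDK was extracted to /usr/lib/jindosdk-x.x.x-linux (replace x.x.x with your actual version) and using the current user's ~/.bashrc; on shared nodes you might prefer a system-wide profile instead:

```shell
# Sketch: set the variables for the current shell, then persist them in
# ~/.bashrc for future sessions. The installation path is the example
# location from this guide; replace x.x.x with the actual version.
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
cat >> "$HOME/.bashrc" <<'EOF'
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
EOF
```

Note that the `*` glob stays literal inside a variable assignment; Hadoop expands it when it builds the classpath.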
Step 4: Configure the OSS-HDFS implementation class
Open core-site.xml:
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following properties to register JindoSDK as the OSS file system implementation:
<configuration>
<property>
<name>fs.AbstractFileSystem.oss.impl</name>
<value>com.aliyun.jindodata.oss.JindoOSS</value>
<description>Registers JindoSDK as the AbstractFileSystem implementation for the oss:// scheme.</description>
</property>
<property>
<name>fs.oss.impl</name>
<value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
<description>Registers JindoSDK as the FileSystem implementation for the oss:// scheme.</description>
</property>
</configuration>
Step 5: Configure authentication
Add your AccessKey pair to core-site.xml. For the permissions required, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.
<configuration>
<property>
<name>fs.oss.accessKeyId</name>
<value>xxx</value>
<description>AccessKey ID for authenticating requests to OSS-HDFS.</description>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>xxx</value>
<description>AccessKey Secret for authenticating requests to OSS-HDFS.</description>
</property>
</configuration>
Step 6: Configure the endpoint
You must configure an endpoint to access an OSS bucket. Use the following path format:
oss://<Bucket>.<Endpoint>/<Object>
For example:
oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
JindoSDK uses the endpoint in the access path to access the corresponding OSS-HDFS service API.
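To illustrate the format, the example path can be assembled from its three parts. The bucket, endpoint, and object names below are the placeholders used throughout this guide:

```shell
# Assemble an OSS-HDFS access path from bucket, endpoint, and object key.
# All three values are the example placeholders from this guide.
BUCKET="examplebucket"
ENDPOINT="cn-hangzhou.oss-dls.aliyuncs.com"   # region-specific OSS-HDFS endpoint
OBJECT="exampleobject.txt"
OSS_PATH="oss://${BUCKET}.${ENDPOINT}/${OBJECT}"
echo "$OSS_PATH"
# → oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
```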
Run basic operations
Use HDFS Shell commands to read and write files on the OSS-HDFS service.
Upload a file
The following example uploads examplefile.txt from the local root directory to examplebucket:
hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
Download a file
The following example downloads exampleobject.txt from examplebucket to the local /tmp directory:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
For more HDFS Shell commands, see Access the OSS-HDFS service using Hadoop shell commands.
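Beyond -put and -get, other common operations follow the same path pattern. The following sketch assumes the same example bucket and endpoint as above; the directory name is hypothetical, and the commands require a live OSS-HDFS bucket with the configuration from the earlier steps:

```shell
# Create a directory (exampledir/ is a hypothetical path):
hdfs dfs -mkdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampledir/
# List the contents of the bucket root:
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
# Delete an object:
hdfs dfs -rm oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
```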