OSS-HDFS (JindoFS) is fully compatible with Hadoop Distributed File System (HDFS) API operations and supports directory-level operations. JindoSDK allows Apache Hadoop-based computing and analysis applications, such as MapReduce, Hive, Spark, and Flink, to access OSS-HDFS in the same way that they access HDFS. This topic describes how to deploy JindoSDK on an Elastic Compute Service (ECS) instance and then perform basic operations related to OSS-HDFS.
Prerequisites
An Elastic Compute Service (ECS) instance is created. For more information, see Create an instance.
A Hadoop environment is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
OSS-HDFS is enabled for a bucket and permissions are granted to access OSS-HDFS. For more information, see Enable OSS-HDFS and grant access permissions.
Procedure
Connect to the ECS instance. For more information, see Connect to an instance.
Download and decompress the JindoSDK JAR package. For more information, visit GitHub.
Decompress the JindoSDK JAR package.
The following sample command provides an example on how to decompress a package named jindosdk-x.x.x-linux.tar.gz. If you use another version of JindoSDK, replace the package name with the name of the corresponding JAR package.
tar zxvf jindosdk-x.x.x-linux.tar.gz
Note: x.x.x indicates the version number of the JAR package.
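The following sketch shows one way to combine the download and decompression on the ECS instance. The download URL placeholder, the sudo usage, and the /usr/lib target directory are assumptions for illustration; copy the actual download URL for the version that you need from the GitHub releases page.
# Download the package. Replace <download-URL> with the URL of your JindoSDK version from the GitHub releases page.
wget <download-URL>
# Decompress the package. x.x.x indicates the version number.
tar zxvf jindosdk-x.x.x-linux.tar.gz
# Move the decompressed directory to the location that is used in the environment variable step below.
sudo mv jindosdk-x.x.x-linux /usr/lib/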
Configure environment variables.
Configure JINDOSDK_HOME.
The following sample code provides an example based on the assumption that the package is decompressed to the /usr/lib/jindosdk-x.x.x-linux directory:
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
Configure HADOOP_CLASSPATH.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
Important: Specify the installation directory of the package and configure environment variables on all required nodes.
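The export commands above take effect only in the current shell session. If you want the environment variables to persist across sessions on each node, one common approach is to append them to a shell profile, as in the following sketch. The use of ~/.bashrc and the /usr/lib/jindosdk-x.x.x-linux path are assumptions; adjust them to your environment.
# Persist the variables for future sessions. Replace x.x.x with your JindoSDK version.
echo 'export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux' >> ~/.bashrc
echo 'export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*' >> ~/.bashrc
source ~/.bashrc
# Optionally check that the JindoSDK directory appears on the Hadoop classpath.
hadoop classpath | tr ':' '\n' | grep -i jindo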
Configure the implementation class of OSS-HDFS and specify the AccessKey pair that you want to use to access the bucket.
Run the following command to open the Hadoop configuration file core-site.xml:
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Configure the JindoSDK DLS implementation classes in the core-site.xml file.
<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
</configuration>
In the core-site.xml file, configure the AccessKey ID and AccessKey secret that are used to access the bucket for which OSS-HDFS is enabled.
<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>xxx</value>
    </property>
</configuration>
Configure the endpoint of OSS-HDFS.
You must configure the endpoint when you use OSS-HDFS to access buckets in Object Storage Service (OSS). We recommend that you configure the access path in the oss://<Bucket>.<Endpoint>/<Object> format. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt. After you configure the access path, JindoSDK calls the corresponding OSS-HDFS operation based on the endpoint that is specified in the access path.
You can also configure the endpoint of OSS-HDFS by using other methods. The endpoints configured by using different methods take effect in a specific order of precedence. For more information, see the Appendix 1: Other methods used to configure the endpoint of OSS-HDFS section of this topic.
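After the implementation classes, the AccessKey pair, and the endpoint-based access path are configured, a simple listing command can serve as a quick connectivity check. The following sketch assumes the example bucket and region that are used throughout this topic; a successful listing, even an empty one, indicates that the configuration is resolved correctly.
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/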
Run HDFS Shell commands to perform common operations that are related to OSS-HDFS.
Upload local files
Run the following command to upload a local file named examplefile.txt in the local root directory to a bucket named examplebucket:
hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
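The hdfs dfs -put command also accepts local directories. The following sketch uploads a hypothetical local directory named exampledir, together with its contents, to the same bucket:
hdfs dfs -put exampledir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/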
Create directories
Run the following command to create a directory named dir/ in a bucket named examplebucket:
hdfs dfs -mkdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/
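To create nested directories in a single command, you can use the standard -p option of hdfs dfs -mkdir. The nested path in the following sketch is an assumption for illustration:
hdfs dfs -mkdir -p oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/subdir1/subdir2/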
Query objects or directories
Run the following command to query the objects or directories in a bucket named examplebucket:
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
Query the size of objects or directories
Run the following command to query the size of all objects or directories in a bucket named examplebucket:
hdfs dfs -du oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
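If you want an aggregated, human-readable total instead of a per-entry listing, you can combine the standard -s and -h options of hdfs dfs -du, as in the following sketch:
hdfs dfs -du -s -h oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/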
Query the content of an object
Run the following command to query the content of an object named localfile.txt in a bucket named examplebucket:
hdfs dfs -cat oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/localfile.txt
Important: The content of the queried object is displayed on the screen in plain text. If the content is encoded, use the HDFS API for Java to read and decode the content.
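If the object is not plain text, a simple alternative to reading it with the HDFS API for Java is to download the object and inspect it locally. The following sketch is an illustration that is not part of the original procedure; it uses the same example object and the standard Linux file utility:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/localfile.txt /tmp/
file /tmp/localfile.txt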
Copy objects or directories
Run the following command to copy a directory named subdir1 from the root directory of a bucket named examplebucket to a directory named subdir2 in the same bucket. The subdir1 directory, the objects in it, and the structure and content of its subdirectories remain unchanged.
hdfs dfs -cp oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/subdir1 oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/subdir2/subdir1
Move objects or directories
Run the following command to move a root directory named srcdir in a bucket named examplebucket, together with the objects and subdirectories in it, to another root directory named destdir:
hdfs dfs -mv oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/srcdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/destdir
Download objects
Run the following command to download an object named exampleobject.txt from a bucket named examplebucket to the /tmp directory on your computer:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
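The hdfs dfs -get command also works on directories. The following sketch downloads the dir/ directory that is created earlier in this topic, together with its contents, to the /tmp directory on your computer:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/ /tmp/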
Delete objects or directories
Run the following command to delete a directory named destfolder/ and all objects in the directory from a bucket named examplebucket:
hdfs dfs -rm -r oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/destfolder/