When you need to manage files in Object Storage Service (OSS) or OSS-HDFS from an EMR cluster, JindoSDK lets you use the same standard Hadoop Shell commands for both. Only the endpoint in the path changes.
Prerequisites
Before you begin, ensure that you have:
An OSS bucket with the appropriate access permissions
JindoSDK available in your environment (see Environment setup)
Environment setup
| Environment | Setup required |
|---|---|
| EMR cluster | JindoSDK is pre-installed. No additional setup needed. |
| Non-EMR environment | Install JindoSDK first. See Deploy JindoSDK in an environment other than EMR. |
OSS-HDFS version requirements:
| Environment | Minimum version |
|---|---|
| EMR cluster | EMR V3.42.0 or a later minor version, or EMR V5.8.0 or a later minor version |
| Non-EMR environment | JindoSDK V4.X or later |
URI format reference
The commands for OSS and OSS-HDFS are identical. The only difference is the endpoint embedded in the path.
| Storage | Endpoint pattern | Example path |
|---|---|---|
| OSS-HDFS | <region>.oss-dls.aliyuncs.com | oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/ |
All examples in this topic use the OSS-HDFS endpoint format.
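The path layout above can be assembled from its parts. The following is a minimal sketch, assuming a hypothetical helper named oss_hdfs_uri (it is not part of JindoSDK or Hadoop) that takes a bucket name, a region ID, and an optional object path:

```shell
# Hypothetical helper, not part of JindoSDK or Hadoop: build an OSS-HDFS URI
# of the form oss://<bucket>.<region>.oss-dls.aliyuncs.com/<path>.
oss_hdfs_uri() {
  _uri_bucket="$1"
  _uri_region="$2"
  _uri_path="${3:-}"
  # Strip one leading slash so exactly one "/" follows the endpoint.
  _uri_path="${_uri_path#/}"
  printf 'oss://%s.%s.oss-dls.aliyuncs.com/%s\n' "$_uri_bucket" "$_uri_region" "$_uri_path"
}
```

For example, `oss_hdfs_uri examplebucket cn-shanghai dir/` prints the path used for the directory examples in this topic, and the result can be passed to any of the commands below.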
Rename operations in object storage take time proportional to the amount of data involved. Commands such as -put, -cp, and -mv can be significantly slower on large directories compared to HDFS.
Commands
Upload a file
hadoop fs -put examplefile.txt oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/
This uploads examplefile.txt from the current local working directory to examplebucket.
Useful options:
| Option | Description |
|---|---|
| -f | Overwrite the destination if it already exists |
| -d | Skip the intermediate ._COPYING_ temp file. Use this flag when uploading to object storage to avoid unnecessary rename operations |
Tip: Always use -d when uploading to OSS or OSS-HDFS. Without it, Hadoop creates a ._COPYING_ temp file and renames it on completion, which adds latency proportional to file size.
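The two options can be combined in one call. The sketch below wraps the upload in a hypothetical function, oss_put. The HADOOP_BIN variable is also an assumption of this example, added only so the command can be swapped out for a dry run:

```shell
# Hypothetical wrapper: upload a local file, overwriting the destination (-f)
# and writing directly without the ._COPYING_ temp file (-d).
# HADOOP_BIN is an assumption of this sketch; it defaults to the hadoop command.
oss_put() {
  "${HADOOP_BIN:-hadoop}" fs -put -f -d "$1" "$2"
}
```

Calling `oss_put examplefile.txt oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/` runs the upload. Setting `HADOOP_BIN=echo` first prints the arguments instead of executing them, which is a quick way to check the command before touching the bucket.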
Create a directory
hadoop fs -mkdir oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/dir/
This creates a directory named dir/ in examplebucket.
List files or directories
hadoop fs -ls oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/
This lists all files and directories in examplebucket.
Useful options:
| Option | Description |
|---|---|
| -R | Recursively list all subdirectories |
| -h | Display file sizes in human-readable format (KB, MB, GB) |
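A recursive listing can be piped into standard text tools. As a sketch, the hypothetical count_files function below counts regular files in `hadoop fs -ls -R` output. It assumes the usual listing layout, in which each entry begins with a permissions string that starts with `-` for files and `d` for directories:

```shell
# Hypothetical helper: count regular files in `hadoop fs -ls -R` output read
# from standard input. Assumes the first field is the permissions string.
count_files() {
  awk '$1 ~ /^-/ { n++ } END { print n + 0 }'
}

# Usage (requires a configured cluster):
#   hadoop fs -ls -R oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/ | count_files
```

Directory entries and any summary lines are ignored because their first field does not start with `-`.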
Check disk usage
hadoop fs -du oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/
This reports the sizes of all files and directories in examplebucket.
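To find what takes up the most space, the report can be sorted by size. The following sketch uses a hypothetical helper, largest_entries, and assumes that the first column of the `-du` output is the size in bytes (that is, `-du` is run without `-h`):

```shell
# Hypothetical helper: print the N largest entries (default 5) from
# `hadoop fs -du` output read from standard input.
# Assumes the first column is the size in bytes.
largest_entries() {
  sort -k1,1nr | head -n "${1:-5}"
}

# Usage (requires a configured cluster):
#   hadoop fs -du oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/ | largest_entries 10
```

To get a single total for the path instead of one line per entry, `-du` also accepts the `-s` option.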
-du on large buckets can be slow when used against object storage. Avoid running it frequently on buckets with many objects.
View file content
hadoop fs -cat oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/localfile.txt
This displays the content of localfile.txt in plain text.
If the file content is encoded, use the HDFS API for Java to read and decode it. The -cat command does not decode encoded content.
Copy files or directories
hadoop fs -cp oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/subdir1 \
oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/subdir2/subdir1
This copies subdir1 to subdir2/subdir1, preserving the directory structure, file locations, and content of all subdirectories.
-cp on large directories is slow against object storage because rename operations are proportional to the amount of data moved.
Move files or directories
hadoop fs -mv oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/srcdir \
oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/destdir
This moves srcdir and all its files and subdirectories to destdir.
-mv can be slow for large directories in object storage.
Download a file
hadoop fs -get oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
This downloads exampleobject.txt from examplebucket to /tmp/ on the local machine.
Delete files or directories
hadoop fs -rm -r oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/destfolder/
This deletes destfolder/ and all files within it from examplebucket. The -r option is required because -rm on its own does not remove directories.
Useful options:
| Option | Description |
|---|---|
| -r | Recursively delete a non-empty directory and its contents |
| -skipTrash | Permanently delete without moving to the trash directory |
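Because a recursive delete is hard to undo, it can help to guard it. The sketch below is one possible approach, not a JindoSDK feature: a hypothetical safe_rm_r function that refuses any URI with no object path after the endpoint, so the bucket root is never passed to -rm -r by accident. The HADOOP_BIN variable is an assumption of this example and defaults to the hadoop command.

```shell
# Hypothetical guard around a recursive delete.
# Accepts only oss:// URIs that contain at least one non-slash character
# after the first "/" following the endpoint; rejects the bucket root.
safe_rm_r() {
  case "$1" in
    oss://*/*[!/]*)
      "${HADOOP_BIN:-hadoop}" fs -rm -r "$1"
      ;;
    *)
      echo "refusing to delete: $1" >&2
      return 1
      ;;
  esac
}
```

With this in place, `safe_rm_r oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/destfolder/` runs the delete, while the same call with the bare bucket path returns a non-zero status. Add -skipTrash to the wrapped command only if you want deletions to bypass the trash directory.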