OSS-HDFS (JindoFS) is a cloud-native data lake storage service that is fully compatible with the Hadoop Distributed File System (HDFS) API. It supports cache-based acceleration, Ranger authentication, and POSIX. Use it as the backend storage for your E-MapReduce (EMR) Hive or Spark workloads to improve performance in big data extract, transform, and load (ETL) scenarios and to smoothly migrate data from HDFS to OSS-HDFS.
OSS-HDFS is available on EMR V3.42 or later minor versions and EMR V5.8.0 or later minor versions.
Prerequisites
Before you begin, make sure you have:
An EMR cluster. See Create a cluster.
OSS-HDFS enabled on your bucket, with access permissions granted. See Enable OSS-HDFS and grant access permissions.
Access control
OSS-HDFS access from an EMR cluster is controlled through RAM roles. The default role AliyunECSInstanceForEMRRole already has the oss:PostDataLakeStorageFileOperation permission through its attached policy AliyunECSInstanceForEMRRolePolicy — no additional configuration is needed.
If your cluster uses a custom role instead of the default role, grant the custom role the oss:PostDataLakeStorageFileOperation permission before proceeding.
OSS-HDFS endpoint format
All Hive and Spark commands that reference OSS-HDFS use the following URI format:
    oss://<bucket-name>.<oss-hdfs-endpoint>/<path>
For example:
    oss://my-bucket.cn-hangzhou.oss-dls.aliyuncs.com/warehouse/
Get the endpoint from the Overview page of your bucket in the OSS console.
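Assembling the URI by hand makes it easy to drop the dot between the bucket name and the endpoint or to double the slash before the path. The following shell sketch shows one way to build it consistently. The function name `oss_hdfs_uri` is hypothetical, and the bucket, endpoint, and path are the example values shown above. The optional listing step assumes the `hadoop` client on the cluster node is already configured for OSS-HDFS; it is skipped where `hadoop` is not installed.

```shell
# Hypothetical helper: build an OSS-HDFS URI from bucket, endpoint, and path.
# The body runs in a subshell, so its variables do not leak into the caller.
oss_hdfs_uri() (
  bucket=$1; endpoint=$2; path=${3:-}
  # Strip leading slashes so the URI has exactly one separator before the path.
  while [ "${path#/}" != "$path" ]; do path=${path#/}; done
  printf 'oss://%s.%s/%s\n' "$bucket" "$endpoint" "$path"
)

# Example values from this page; replace them with your own.
uri="$(oss_hdfs_uri my-bucket cn-hangzhou.oss-dls.aliyuncs.com warehouse/)"
echo "$uri"

# Optional sanity check: list the path, but only where the hadoop client exists.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$uri" || echo "cannot list $uri; check the endpoint and RAM permissions" >&2
fi
```

If the listing fails on a cluster that uses a custom role, the missing oss:PostDataLakeStorageFileOperation permission described under Access control is one likely cause.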

Use OSS-HDFS in Hive
The following steps show how to create a Hive database and table backed by OSS-HDFS, insert data, and verify the result. The same Hive URI format applies to Spark — replace <bucket-name>.<oss-hdfs-endpoint> with your actual endpoint in any Spark job that reads from or writes to OSS-HDFS.
Log on to the EMR cluster. See Log on to a cluster.
Open the Hive CLI:
    hive
Create a database in OSS-HDFS:
    CREATE DATABASE IF NOT EXISTS dw LOCATION 'oss://<bucket-name>.<oss-hdfs-endpoint>/<path>';
Replace the following placeholders:
    Placeholder            Description
    <bucket-name>          Name of your OSS bucket
    <oss-hdfs-endpoint>    OSS-HDFS endpoint obtained from the OSS console, for example cn-hangzhou.oss-dls.aliyuncs.com
    <path>                 Directory path within the bucket
Switch to the new database:
    USE dw;
Create a Hive table in the database:
    CREATE TABLE IF NOT EXISTS employee (
      eid INT,
      name STRING,
      salary STRING,
      destination STRING
    ) COMMENT 'Employee details';
Verify that the table location points to OSS-HDFS:
    DESC FORMATTED employee;
The Location field in the output confirms that the table resides in OSS-HDFS:
    Location: oss://****.cn-hangzhou.oss-dls.aliyuncs.com/dw/employee
    Table Type: MANAGED_TABLE
Insert a row:
    INSERT INTO employee (eid, name, salary, destination) VALUES (1, 'liu hua', '100.0', '');
EMR generates a job to execute the insert.
Query the inserted data:
    SELECT * FROM employee WHERE eid = 1;
Expected output:
    OK
    1    liu hua    100.0
    Time taken: 12.379 seconds, Fetched: 1 row(s)
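The interactive session above can also be run non-interactively, which is convenient for scheduled ETL jobs. The following is a minimal sketch rather than a documented procedure. The `LOCATION` value reuses the example bucket and endpoint from the endpoint format section with an assumed `dw` path, and the temporary script file is an arbitrary choice. It relies on the Hive CLI's `-f` option, which runs the statements in a file.

```shell
# Assumed example values; replace with your bucket, endpoint, and path.
DB_LOCATION="oss://my-bucket.cn-hangzhou.oss-dls.aliyuncs.com/dw"
SQL_FILE="$(mktemp)"

# Write the statements from the walkthrough to a script file.
# The here-document expands ${DB_LOCATION}; the single quotes stay literal.
cat > "$SQL_FILE" <<EOF
CREATE DATABASE IF NOT EXISTS dw LOCATION '${DB_LOCATION}';
USE dw;
CREATE TABLE IF NOT EXISTS employee (
  eid INT,
  name STRING,
  salary STRING,
  destination STRING
) COMMENT 'Employee details';
INSERT INTO employee (eid, name, salary, destination) VALUES (1, 'liu hua', '100.0', '');
SELECT * FROM employee WHERE eid = 1;
EOF

# Run the script only where the Hive CLI is available (for example, on an EMR node).
if command -v hive >/dev/null 2>&1; then
  hive -f "$SQL_FILE"
else
  echo "hive not found; generated script is at $SQL_FILE"
fi
```

The `CREATE ... IF NOT EXISTS` statements are safe to rerun, but the `INSERT` adds another row on every run, so drop it or guard it in a recurring job. The `spark-sql` CLI also accepts `-f`, so the same file may work for Spark SQL. That path is untested here, and statement compatibility depends on the Spark version.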
What's next
To learn more about OSS-HDFS capabilities, see Overview.
To use OSS-HDFS with other endpoint formats, see Appendix 1: Other methods used to configure the endpoint of OSS-HDFS.