OSS-HDFS supports RootPolicy, which maps a custom hdfs:// namespace to an OSS-HDFS bucket. Without RootPolicy, migrating existing Spark or Hive jobs to OSS-HDFS requires updating every task that references an hdfs:// path. With RootPolicy configured, jobs continue to use hdfs://<custom-namespace>/ with no code changes — OSS-HDFS transparently routes all reads and writes to oss://<bucket>.<endpoint>/.
Prerequisites
Before you begin, ensure that you have:
A Serverless Spark workspace. See Create a workspace.
An EMR on ECS cluster with OSS-HDFS enabled. See Create a cluster.
Configure RootPolicy
If RootPolicy is already configured for your cluster's OSS-HDFS service, skip this section and go to Use RootPolicy.
Step 1: Set up JindoSDK configuration
Connect to an ECS instance. See Connect to an ECS instance.
Go to the bin directory of the installed JindoSDK JAR file.
cd jindosdk-x.x.x/bin/
Replace x.x.x with the actual version number of your JindoSDK JAR file.
Create a configuration file named jindosdk.cfg with the following content.
[common]
logger.dir = /tmp/jindo/
logger.sync = false
logger.consolelogger = false
logger.level = 0
logger.verbose = 0
logger.cleaner.enable = true
hadoopConf.enable = false

[jindosdk]
# Replace with the endpoint for your region. This example uses China (Hangzhou).
fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
# AccessKey credentials for accessing OSS-HDFS.
fs.oss.accessKeyId = <your-access-key-id>
fs.oss.accessKeySecret = <your-access-key-secret>
Export the JINDOSDK_CONF_DIR environment variable, pointing to the directory that contains jindosdk.cfg.
export JINDOSDK_CONF_DIR=<absolute-path-to-jindosdk-cfg-directory>
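If you would rather not paste AccessKey credentials into the file by hand, Step 1 can be scripted. The following is a minimal sketch, not part of JindoSDK: the function names and the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variable names are hypothetical, and the settings are copied from the jindosdk.cfg content shown above.

```python
import os
import tempfile
from pathlib import Path

# Settings copied from the [common] section of the jindosdk.cfg example.
COMMON_SETTINGS = {
    "logger.dir": "/tmp/jindo/",
    "logger.sync": "false",
    "logger.consolelogger": "false",
    "logger.level": "0",
    "logger.verbose": "0",
    "logger.cleaner.enable": "true",
    "hadoopConf.enable": "false",
}


def render_jindosdk_cfg(endpoint: str, access_key_id: str, access_key_secret: str) -> str:
    """Return the text of jindosdk.cfg for the given endpoint and credentials."""
    lines = ["[common]"]
    lines += [f"{key} = {value}" for key, value in COMMON_SETTINGS.items()]
    lines += [
        "",
        "[jindosdk]",
        f"fs.oss.endpoint = {endpoint}",
        f"fs.oss.accessKeyId = {access_key_id}",
        f"fs.oss.accessKeySecret = {access_key_secret}",
    ]
    return "\n".join(lines) + "\n"


def write_jindosdk_cfg(conf_dir: str, cfg_text: str) -> Path:
    """Write jindosdk.cfg into conf_dir and return the path of the file."""
    directory = Path(conf_dir).resolve()
    directory.mkdir(parents=True, exist_ok=True)
    cfg_path = directory / "jindosdk.cfg"
    cfg_path.write_text(cfg_text)
    # The file holds credentials, so restrict it to the current user.
    cfg_path.chmod(0o600)
    return cfg_path


if __name__ == "__main__":
    # Hypothetical variable names; the placeholders are used when they are unset.
    text = render_jindosdk_cfg(
        endpoint="cn-hangzhou.oss-dls.aliyuncs.com",
        access_key_id=os.environ.get("OSS_ACCESS_KEY_ID", "<your-access-key-id>"),
        access_key_secret=os.environ.get("OSS_ACCESS_KEY_SECRET", "<your-access-key-secret>"),
    )
    # A temporary directory is used here only so the sketch runs anywhere.
    path = write_jindosdk_cfg(tempfile.mkdtemp(), text)
    print(f"export JINDOSDK_CONF_DIR={path.parent}")
```

The script prints the export line for the directory it wrote to. In practice you would pass a permanent directory instead of a temporary one.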
Step 2: Run SetRootPolicy
Run the SetRootPolicy command to register an hdfs:// namespace for a bucket:
jindo admin -setRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/

| Parameter | Description | Example |
|---|---|---|
| bucket_name | Name of the bucket with OSS-HDFS enabled | my-bucket |
| dls_endpoint | OSS-HDFS endpoint for the bucket's region | cn-hangzhou.oss-dls.aliyuncs.com |
| your_ns_name | Custom namespace for the hdfs:// prefix. Supports any non-empty string. The current version supports only the root directory. | test |
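To show how the three parameters combine, here is a small sketch that assembles the SetRootPolicy command line from the example values in the table. The helper names are hypothetical, and rejecting a namespace that contains a slash is an assumption based on the root-directory-only restriction; the sketch only builds the command and does not run it.

```python
import shlex
from typing import List


def oss_root_uri(bucket_name: str, dls_endpoint: str) -> str:
    """Return the oss:// root URI of a bucket, for example oss://my-bucket.<endpoint>/."""
    return f"oss://{bucket_name}.{dls_endpoint}/"


def set_root_policy_command(bucket_name: str, dls_endpoint: str, ns_name: str) -> List[str]:
    """Build the argument list for jindo admin -setRootPolicy."""
    # The namespace must be non-empty. Rejecting "/" is an assumption drawn
    # from the note that only the root directory is supported.
    if not ns_name or "/" in ns_name:
        raise ValueError("ns_name must be a non-empty string without '/'")
    return [
        "jindo",
        "admin",
        "-setRootPolicy",
        oss_root_uri(bucket_name, dls_endpoint),
        f"hdfs://{ns_name}/",
    ]


if __name__ == "__main__":
    argv = set_root_policy_command("my-bucket", "cn-hangzhou.oss-dls.aliyuncs.com", "test")
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(argv))
```

With the table's example values, the printed command registers hdfs://test/ for oss://my-bucket.cn-hangzhou.oss-dls.aliyuncs.com/.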
Optional: avoid repeating the endpoint in every command
Add fs.oss.endpoint to core-site.xml so you don't need to include <dls_endpoint> each time.
Method 1 (applies to all buckets):
<configuration>
<property>
<name>fs.oss.endpoint</name>
<value><dls_endpoint></value>
</property>
</configuration>
Method 2 (applies to a specific bucket):
<configuration>
<property>
<name>fs.oss.bucket.<bucket_name>.endpoint</name>
<value><dls_endpoint></value>
</property>
</configuration>
To enable access policies for multiple buckets, separate the oss:// addresses with commas in the fs.accessPolicies.discovery value, which you set in Step 3.
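For the multi-bucket case, the comma-separated value can be generated rather than typed. The sketch below is illustrative only: the helper names and the bucket names bucket-a and bucket-b are hypothetical, and joining without spaces around the commas is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def discovery_value(buckets: Iterable[Tuple[str, str]]) -> str:
    """Join (bucket_name, dls_endpoint) pairs into one comma-separated value."""
    uris = [f"oss://{bucket}.{endpoint}/" for bucket, endpoint in buckets]
    if not uris:
        raise ValueError("at least one bucket is required")
    # Assumption: addresses are joined with a bare comma, no surrounding spaces.
    return ",".join(uris)


def discovery_property(value: str) -> str:
    """Render the fs.accessPolicies.discovery property as a core-site.xml fragment."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.accessPolicies.discovery"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    endpoint = "cn-hangzhou.oss-dls.aliyuncs.com"
    value = discovery_value([("bucket-a", endpoint), ("bucket-b", endpoint)])
    print(discovery_property(value))
```

Paste the printed property element into the configuration element of core-site.xml.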
Step 3: Configure access policies in core-site.xml
Add the following properties to core-site.xml:
<configuration>
<property>
<name>fs.accessPolicies.discovery</name>
<value>oss://<bucket_name>.<dls_endpoint>/</value>
</property>
<property>
<name>fs.AbstractFileSystem.hdfs.impl</name>
<value>com.aliyun.jindodata.hdfs.HDFS</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>com.aliyun.jindodata.hdfs.JindoHdfsFileSystem</value>
</property>
</configuration>
Step 4: Verify the configuration
Run the following command to confirm that RootPolicy is working:
hadoop fs -ls hdfs://<your_ns_name>/
A successful configuration returns output similar to:
drwxr-x--x - hdfs hadoop 0 2023-01-05 12:27 hdfs://<your_ns_name>/apps
drwxrwxrwx - spark hadoop 0 2023-01-05 12:27 hdfs://<your_ns_name>/spark-history
drwxrwxrwx - hdfs hadoop 0 2023-01-05 12:27 hdfs://<your_ns_name>/tmp
drwxrwxrwx - hdfs hadoop 0 2023-01-05 12:27 hdfs://<your_ns_name>/user
Step 5: Restart dependent services
Restart services such as Hive and Spark. After the restart, jobs can access OSS-HDFS using the hdfs://<your_ns_name>/ prefix.
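To repeat the Step 4 check after the restart, the listing can be inspected by a script. This is a rough sketch with hypothetical function names. It assumes the eight-column layout of the sample output in Step 4 and only parses text you pass in; it does not invoke hadoop itself.

```python
from typing import List


def listed_paths(ls_output: str) -> List[str]:
    """Extract the path column from hadoop fs -ls output."""
    paths = []
    for line in ls_output.splitlines():
        # Entry lines have 8 fields: permissions, replication, owner, group,
        # size, date, time, path. Summary or blank lines have fewer and are skipped.
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[0][0] in "d-l":
            paths.append(fields[7])
    return paths


def namespace_is_listed(ls_output: str, ns_name: str) -> bool:
    """Return True if at least one entry is listed and all use hdfs://<ns_name>/."""
    prefix = f"hdfs://{ns_name}/"
    paths = listed_paths(ls_output)
    # An empty bucket root yields no entries, so this check reports False for it.
    return bool(paths) and all(path.startswith(prefix) for path in paths)


if __name__ == "__main__":
    sample = (
        "drwxr-x--x - hdfs hadoop 0 2023-01-05 12:27 hdfs://test/apps\n"
        "drwxrwxrwx - hdfs hadoop 0 2023-01-05 12:27 hdfs://test/tmp\n"
    )
    print(namespace_is_listed(sample, "test"))
```

Pass in the text captured from hadoop fs -ls hdfs://<your_ns_name>/ and the namespace you registered.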
How routing works
Once RootPolicy is active, any path under hdfs://<your_ns_name>/ is transparently routed to the corresponding path in oss://<bucket_name>.<dls_endpoint>/. For example:
A job writes to hdfs://<your_ns_name>/user/data/file.parquet, and OSS-HDFS receives the write at oss://<bucket_name>.<dls_endpoint>/user/data/file.parquet.
A Hive table at hdfs://<your_ns_name>/user/hive/warehouse/my_table/ reads from oss://<bucket_name>.<dls_endpoint>/user/hive/warehouse/my_table/.
The original oss:// URI remains valid after RootPolicy is configured. Both URI forms access the same data, so you can mix them across different jobs or tools without data inconsistency.
The !hadoop fs command in a Serverless Spark Notebook does not currently support RootPolicy. Use the original oss:// address with !hadoop fs instead.
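The path mapping described in this section can be written down as a simple prefix substitution. The function below is a client-side illustration only, with a hypothetical name. It is not how OSS-HDFS implements routing, but it is handy for working out the oss:// address to use where the hdfs:// form is not supported.

```python
def to_oss_uri(hdfs_uri: str, ns_name: str, bucket_name: str, dls_endpoint: str) -> str:
    """Translate hdfs://<ns_name>/<path> to oss://<bucket_name>.<dls_endpoint>/<path>."""
    hdfs_root = f"hdfs://{ns_name}/"
    oss_root = f"oss://{bucket_name}.{dls_endpoint}/"
    # Accept the bare namespace without a trailing slash as the root directory.
    if hdfs_uri == hdfs_root.rstrip("/"):
        return oss_root
    if not hdfs_uri.startswith(hdfs_root):
        raise ValueError(f"{hdfs_uri!r} is not under {hdfs_root!r}")
    # Everything after the namespace root is carried over unchanged.
    return oss_root + hdfs_uri[len(hdfs_root):]


if __name__ == "__main__":
    print(
        to_oss_uri(
            "hdfs://test/user/data/file.parquet",
            ns_name="test",
            bucket_name="my-bucket",
            dls_endpoint="cn-hangzhou.oss-dls.aliyuncs.com",
        )
    )
    # prints oss://my-bucket.cn-hangzhou.oss-dls.aliyuncs.com/user/data/file.parquet
```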
Manage RootPolicy
List all registered namespaces for a bucket
jindo admin -listAccessPolicies oss://<bucket_name>.<dls_endpoint>/
Delete a registered namespace
jindo admin -unsetRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/
Use RootPolicy
All three scenarios below (Notebook session, SQL session, and batch job) require the same Spark configuration to enable RootPolicy. Add the following properties to the Spark Configuration field when creating a session or task:
spark.hadoop.fs.accessPolicies.discovery oss://<bucket_name>.cn-<region>.oss-dls.aliyuncs.com
spark.hadoop.fs.AbstractFileSystem.hdfs.impl com.aliyun.jindodata.hdfs.v3.HDFS
spark.hadoop.fs.hdfs.impl com.aliyun.jindodata.hdfs.v3.JindoDistributedFileSystem
Scenario 1: Notebook session
On the EMR Serverless Spark page, click Session Manager in the left-side navigation pane.
On the Notebook Sessions page, click Create Notebook Session.
On the Create Notebook Session page, add the Spark configuration above.
In a Data Development Notebook task, use the hdfs://<ns_name>/ prefix directly in your SQL statements.
# Create table
spark.sql("""
CREATE TABLE default.my_orc_table (
    id INT,
    name STRING,
    age INT
)
LOCATION 'hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table/'
""")
# Insert data
spark.sql("""INSERT INTO table default.my_orc_table(id, name, age) VALUES (1, 'Alice', 30)""")
# Query
spark.sql("SELECT * FROM default.my_orc_table").show()
Replace <ns_name> with the custom namespace you set in the SetRootPolicy command.
The !hadoop fs command in Notebook does not currently support RootPolicy. Use the original oss:// address with !hadoop fs instead.
Scenario 2: SQL session
On the EMR Serverless Spark page, click Sessions in the left-side navigation pane.
On the SQL Session page, click Create SQL Session.
On the Create SQL Session page, add the Spark configuration above.
In a Data Development SparkSQL task, use the hdfs://<ns_name>/ prefix directly in your SQL statements.
-- Create table
CREATE TABLE default.my_orc_table1 (
    id INT,
    name STRING,
    age INT
)
LOCATION "hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table1/";
-- Insert data
INSERT INTO table default.my_orc_table1(id, name, age) VALUES (1, 'Alice', 30);
-- Query
SELECT * FROM default.my_orc_table1;
Replace <ns_name> with the custom namespace you set in the SetRootPolicy command.
Scenario 3: Batch job
Upload the SQL file. This example uses the sample SQL file test.sql.
On the EMR Serverless Spark page, click Artifacts in the left-side navigation pane.
On the Managed File Directory page, click Upload File.
In the Upload File dialog box, click the upload area to select a local file, or drag and drop the file directly.
Create and configure a batch task.
On the EMR Serverless Spark page, click Development in the left-side navigation pane.
On the Catalog tab, click the Create icon.
In the dialog box, enter a Name, select SQL under the Application (Batch) type, and then click OK.
On the Task Configuration page, add the Spark configuration above.
Click Run. The results appear as shown below.
