E-MapReduce: Access OSS-HDFS through RootPolicy

Last Updated: Mar 26, 2026

OSS-HDFS supports RootPolicy, which maps a custom hdfs:// namespace to an OSS-HDFS bucket. Without RootPolicy, migrating existing Spark or Hive jobs to OSS-HDFS requires updating every task that references an hdfs:// path. With RootPolicy configured, jobs continue to use hdfs://<custom-namespace>/ with no code changes, and OSS-HDFS transparently routes all reads and writes to oss://<bucket>.<endpoint>/.

Prerequisites

Before you begin, ensure that you have:

Configure RootPolicy

Note

If RootPolicy is already configured for your cluster's OSS-HDFS service, skip this section and go to Use RootPolicy.

Step 1: Set up JindoSDK configuration

  1. Connect to an ECS instance. See Connect to an ECS instance.

  2. Go to the bin directory of the JindoSDK installation.

    cd jindosdk-x.x.x/bin/

    Replace x.x.x with the version number of your JindoSDK installation.

  3. Create a configuration file named jindosdk.cfg with the following content.

    [common]
    logger.dir = /tmp/jindo/
    logger.sync = false
    logger.consolelogger = false
    logger.level = 0
    logger.verbose = 0
    logger.cleaner.enable = true
    hadoopConf.enable = false
    
    [jindosdk]
    # Replace with the endpoint for your region. This example uses China (Hangzhou).
    fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
    # AccessKey credentials for accessing OSS-HDFS.
    fs.oss.accessKeyId = <your-access-key-id>
    fs.oss.accessKeySecret = <your-access-key-secret>

  4. Export the JINDOSDK_CONF_DIR environment variable, pointing to the directory that contains jindosdk.cfg.

    export JINDOSDK_CONF_DIR=<absolute-path-to-jindosdk-cfg-directory>
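
When the same settings must be rolled out to several nodes, the configuration file from step 3 can be generated by a script instead of being written by hand. The following is a minimal Python sketch, not part of JindoSDK: the helper name render_jindosdk_cfg is hypothetical, and the keys and values are copied from the example above.

```python
import configparser
import io

# Logging defaults copied from the [common] section shown above.
COMMON_SETTINGS = {
    "logger.dir": "/tmp/jindo/",
    "logger.sync": "false",
    "logger.consolelogger": "false",
    "logger.level": "0",
    "logger.verbose": "0",
    "logger.cleaner.enable": "true",
    "hadoopConf.enable": "false",
}


def render_jindosdk_cfg(endpoint, access_key_id, access_key_secret):
    """Return the text of a jindosdk.cfg file (hypothetical helper)."""
    # Disable interpolation so a '%' in a value is written literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key case as written; configparser lowercases keys by default.
    parser.optionxform = str
    parser["common"] = COMMON_SETTINGS
    parser["jindosdk"] = {
        "fs.oss.endpoint": endpoint,
        "fs.oss.accessKeyId": access_key_id,
        "fs.oss.accessKeySecret": access_key_secret,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


cfg_text = render_jindosdk_cfg(
    "cn-hangzhou.oss-dls.aliyuncs.com",
    "<your-access-key-id>",
    "<your-access-key-secret>",
)
print(cfg_text)
```

Write the returned text to jindosdk.cfg in the directory that JINDOSDK_CONF_DIR points to. Overriding optionxform matters here because keys such as fs.oss.accessKeyId are mixed case.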

Step 2: Run SetRootPolicy

Run the SetRootPolicy command to register an hdfs:// namespace for a bucket:

jindo admin -setRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/

| Parameter | Description | Example |
| --- | --- | --- |
| bucket_name | Name of the bucket with OSS-HDFS enabled | my-bucket |
| dls_endpoint | OSS-HDFS endpoint for the bucket's region | cn-hangzhou.oss-dls.aliyuncs.com |
| your_ns_name | Custom namespace for the hdfs:// prefix. Supports any non-empty string. The current version supports only the root directory. | test |
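
To see how the three parameters fit into the command, the following Python sketch composes the argument list from the example values above. The helper name build_set_root_policy_cmd is hypothetical; it only assembles the command string shown in this step and does not call JindoSDK.

```python
import shlex


def build_set_root_policy_cmd(bucket_name, dls_endpoint, ns_name):
    """Compose the SetRootPolicy command as an argument list (hypothetical helper)."""
    if not ns_name:
        # The namespace must be a non-empty string.
        raise ValueError("your_ns_name must be a non-empty string")
    return [
        "jindo",
        "admin",
        "-setRootPolicy",
        f"oss://{bucket_name}.{dls_endpoint}/",
        f"hdfs://{ns_name}/",
    ]


cmd = build_set_root_policy_cmd(
    "my-bucket", "cn-hangzhou.oss-dls.aliyuncs.com", "test"
)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run(cmd, check=True) on a node where the jindo command is available.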

Optional: avoid repeating the endpoint in every command

Add fs.oss.endpoint to core-site.xml so you don't need to include <dls_endpoint> each time.

Method 1 — applies to all buckets:

<configuration>
  <property>
    <name>fs.oss.endpoint</name>
    <value><dls_endpoint></value>
  </property>
</configuration>

Method 2 — applies to a specific bucket:

<configuration>
  <property>
    <name>fs.oss.bucket.<bucket_name>.endpoint</name>
    <value><dls_endpoint></value>
  </property>
</configuration>

To enable access policies for multiple buckets, list the oss:// addresses in the value of the fs.accessPolicies.discovery property (set in Step 3), separated by commas.

Step 3: Configure access policies in core-site.xml

Add the following properties to core-site.xml:

<configuration>
    <property>
        <name>fs.accessPolicies.discovery</name>
        <value>oss://<bucket_name>.<dls_endpoint>/</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.hdfs.impl</name>
        <value>com.aliyun.jindodata.hdfs.HDFS</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.aliyun.jindodata.hdfs.JindoHdfsFileSystem</value>
    </property>
</configuration>
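
If several buckets are involved, the property block can be generated rather than edited by hand. The following Python sketch builds the same three properties with the standard library and joins multiple oss:// addresses with commas. The helper name build_core_site and the bucket names bucket-a and bucket-b are hypothetical, and the output is a fragment to merge into an existing core-site.xml, not a complete file. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_core_site(bucket_uris):
    """Build the Step 3 <configuration> element for one or more oss:// URIs."""
    properties = {
        # Multiple bucket addresses are joined with commas.
        "fs.accessPolicies.discovery": ",".join(bucket_uris),
        "fs.AbstractFileSystem.hdfs.impl": "com.aliyun.jindodata.hdfs.HDFS",
        "fs.hdfs.impl": "com.aliyun.jindodata.hdfs.JindoHdfsFileSystem",
    }
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print with the same four-space indentation used in this step.
    ET.indent(configuration, space="    ")
    return configuration


root = build_core_site([
    "oss://bucket-a.cn-hangzhou.oss-dls.aliyuncs.com/",
    "oss://bucket-b.cn-hangzhou.oss-dls.aliyuncs.com/",
])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```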

Step 4: Verify the configuration

Run the following command to confirm that RootPolicy is working:

hadoop fs -ls hdfs://<your_ns_name>/

A successful configuration returns output similar to:

drwxr-x--x   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/apps
drwxrwxrwx   - spark hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/spark-history
drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/tmp
drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/user
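
The check can also be scripted. The following Python sketch parses listing text in the format shown above and keeps the entries whose paths fall under the namespace. The function names are hypothetical, the namespace test is the example value from Step 2, and the "Found 4 items" header line is an assumption about what the hadoop fs -ls command prints before the entries.

```python
def parse_ls_line(line):
    """Split one `hadoop fs -ls` entry into named fields."""
    perms, _replication, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "permissions": perms,
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }


def entries_under_namespace(ls_output, ns_name):
    """Return parsed entries whose path starts with hdfs://<ns_name>/."""
    prefix = f"hdfs://{ns_name}/"
    entries = [
        parse_ls_line(line)
        for line in ls_output.splitlines()
        # Skip blank lines and the assumed "Found N items" header.
        if line.strip() and not line.startswith("Found ")
    ]
    return [entry for entry in entries if entry["path"].startswith(prefix)]


SAMPLE_OUTPUT = """\
Found 4 items
drwxr-x--x   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://test/apps
drwxrwxrwx   - spark hadoop          0 2023-01-05 12:27 hdfs://test/spark-history
drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://test/tmp
drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://test/user
"""

entries = entries_under_namespace(SAMPLE_OUTPUT, "test")
for entry in entries:
    print(entry["owner"], entry["path"])
```

In practice, the listing text would come from running the command above, for example with subprocess.run and capture_output=True.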

Step 5: Restart dependent services

Restart the services that read this configuration, such as Hive and Spark. After the restart, jobs can access OSS-HDFS using the hdfs://<your_ns_name>/ prefix.

How routing works

Once RootPolicy is active, any path under hdfs://<your_ns_name>/ is transparently routed to the corresponding path in oss://<bucket_name>.<dls_endpoint>/. For example:

  • A job writes to hdfs://<your_ns_name>/user/data/file.parquet, and OSS-HDFS receives the write at oss://<bucket_name>.<dls_endpoint>/user/data/file.parquet.

  • A Hive table at hdfs://<your_ns_name>/user/hive/warehouse/my_table/ reads from oss://<bucket_name>.<dls_endpoint>/user/hive/warehouse/my_table/.

The original oss:// URI remains valid after RootPolicy is configured. Both URI forms access the same data, so you can mix them across different jobs or tools without data inconsistency.
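
The correspondence between the two URI forms can be expressed as a small function. The following Python sketch only illustrates the mapping described above and is not how OSS-HDFS implements routing. The function name to_oss_uri is hypothetical, and the values are the example values from Step 2.

```python
from urllib.parse import urlsplit


def to_oss_uri(hdfs_uri, ns_name, bucket_name, dls_endpoint):
    """Return the oss:// URI that an hdfs://<ns_name>/ path corresponds to."""
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs" or parts.netloc != ns_name:
        raise ValueError(f"{hdfs_uri!r} is not under hdfs://{ns_name}/")
    # The namespace root maps to the bucket root, so the path is unchanged.
    return f"oss://{bucket_name}.{dls_endpoint}{parts.path or '/'}"


print(to_oss_uri(
    "hdfs://test/user/data/file.parquet",
    ns_name="test",
    bucket_name="my-bucket",
    dls_endpoint="cn-hangzhou.oss-dls.aliyuncs.com",
))
# prints oss://my-bucket.cn-hangzhou.oss-dls.aliyuncs.com/user/data/file.parquet
```

A helper like this is useful when a tool needs the original oss:// address for a path that a job refers to by its hdfs:// form.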

Note

The !hadoop fs command in Notebook does not currently support RootPolicy. Use the original oss:// address with !hadoop fs instead.

Manage RootPolicy

List all registered namespaces for a bucket

jindo admin -listAccessPolicies oss://<bucket_name>.<dls_endpoint>/

Delete a registered namespace

jindo admin -unsetRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/

Use RootPolicy

All three scenarios below (Notebook session, SQL session, and batch job) require the same Spark configuration to use RootPolicy. Add the following properties to the Spark Configuration field when you create the session or task:

spark.hadoop.fs.accessPolicies.discovery      oss://<bucket_name>.cn-<region>.oss-dls.aliyuncs.com
spark.hadoop.fs.AbstractFileSystem.hdfs.impl  com.aliyun.jindodata.hdfs.v3.HDFS
spark.hadoop.fs.hdfs.impl                     com.aliyun.jindodata.hdfs.v3.JindoDistributedFileSystem

Scenario 1: Notebook session

  1. On the EMR Serverless Spark page, click Session Manager in the left-side navigation pane.

  2. On the Notebook Sessions page, click Create Notebook Session.

  3. On the Create Notebook Session page, add the Spark configuration above.

  4. In a Data Development Notebook task, use the hdfs://<ns_name>/ prefix directly in your SQL statements.

    # Create table
    spark.sql("""CREATE TABLE default.my_orc_table (
        id INT,
        name STRING,
        age INT
    )
    location 'hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table/'""")
    
    # Insert data
    spark.sql("""INSERT INTO table default.my_orc_table(id, name, age) VALUES (1, 'Alice', 30)""")
    
    # Query
    spark.sql("SELECT * FROM default.my_orc_table").show()

    Replace <ns_name> with the custom namespace you set in the SetRootPolicy command.

Note

The !hadoop fs command in Notebook does not currently support RootPolicy. Use the original oss:// address with !hadoop fs instead.

Scenario 2: SQL session

  1. On the EMR Serverless Spark page, click Sessions in the left-side navigation pane.

  2. On the SQL Session page, click Create SQL Session.

  3. On the Create SQL Session page, add the Spark configuration above.

  4. In a Data Development SparkSQL task, use the hdfs://<ns_name>/ prefix directly in your SQL statements.

    -- Create table
    CREATE TABLE default.my_orc_table1 (
        id INT,
        name STRING,
        age INT
    )
    location "hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table1/";
    
    -- Insert data
    INSERT INTO table default.my_orc_table1(id, name, age)
    VALUES (1, 'Alice', 30);
    
    -- Query
    SELECT * FROM default.my_orc_table1;

    Replace <ns_name> with the custom namespace you set in the SetRootPolicy command.

Scenario 3: Batch job

  1. Upload the SQL file. This example uses the sample SQL file test.sql.

    1. On the EMR Serverless Spark page, click Artifacts in the left-side navigation pane.

    2. On the Managed File Directory page, click Upload File.

    3. In the Upload File dialog box, click the upload area to select a local file, or drag and drop the file directly.

  2. Create and configure a batch task.

    1. On the EMR Serverless Spark page, click Development in the left-side navigation pane.

    2. On the Catalog tab, click the Create icon.

    3. In the dialog box, enter a Name, select SQL under the Application (Batch) type, and then click OK.

    4. On the Task Configuration page, add the Spark configuration above.

  3. Click Run to execute the task and view the results.

FAQ

!hadoop fs command fails with a namespace configuration error in Notebook

  • Issue: Running the !hadoop fs -ls hdfs://<ns_name>/ command in a Notebook task returns the following error:

    init failed, Caused by error 30004: Invalid argument: Neither fs.hdfs.test.dfs.ha.namenodes nor dfs.ha.namenodes.test is configured for HA namenodes
    ERROR: code=1002, message=ERROR: failed to init filesystem.

  • Cause: The !hadoop fs command in Notebook does not currently support RootPolicy.

  • Solution: Use the original OSS-HDFS address (oss://<bucket_name>.<dls_endpoint>/) with the !hadoop fs command in Notebook.