All Products
Search
Document Center

E-MapReduce:Access OSS-HDFS using RootPolicy

Last Updated:Jun 24, 2026

The OSS-HDFS service supports RootPolicy. With RootPolicy, you can set a custom prefix for the OSS-HDFS service. This feature allows Serverless Spark to directly operate on data in OSS-HDFS without modifying existing jobs that use the hdfs:// prefix.

Prerequisites

  • You have created a Serverless Spark workspace. For more information, see Create a workspace.

  • You have created an EMR on ECS cluster with the OSS-HDFS service enabled. For more information, see Create a cluster.

Configure RootPolicy

Note

If you have already configured RootPolicy for the OSS-HDFS service on your cluster, skip this section and proceed to Using RootPolicy.

  1. Configure environment variables.

    1. Connect to an ECS instance. For more information, see Connect to an ECS instance.

    2. Go to the bin directory of the installed JindoSDK JAR package.

      cd jindosdk-x.x.x/bin/
      Note

      x.x.x indicates the version number of the JindoSDK JAR package.

    3. Create a configuration file named jindosdk.cfg, and then add the following parameters to the configuration file.

      [common] Retain the following default configurations. 
      logger.dir = /tmp/jindo/
      logger.sync = false
      logger.consolelogger = false
      logger.level = 0
      logger.verbose = 0
      logger.cleaner.enable = true
      hadoopConf.enable = false
      
      [jindosdk] Specify the following parameters. 
      <!-- In this example, the China (Hangzhou) region is used. Specify your actual region.  -->
      fs.oss.endpoint = cn-hangzhou.oss-dls.aliyuncs.com
      <! -- Configure the AccessKey ID and AccessKey secret that is used to access OSS-HDFS.  -->
      fs.oss.accessKeyId = yourAccessKeyId    
      fs.oss.accessKeySecret = yourAccessKeySecret                                        
    4. Configure environment variables.

      export JINDOSDK_CONF_DIR=<JINDOSDK_CONF_DIR>

      Set <JINDOSDK_CONF_DIR> to the absolute path of the jindosdk.cfg configuration file.

  2. Configure RootPolicy.

    Run the following SetRootPolicy command to specify a registered address that contains a custom prefix for a bucket:

    jindo admin -setRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/

    The following table describes the parameters in the SetRootPolicy command.

    Parameter

    Description

    bucket_name

    The name of the bucket for which OSS-HDFS is enabled.

    dls_endpoint

    The endpoint of the region in which the bucket for which OSS-HDFS is enabled. Example: cn-hangzhou.oss-dls.aliyuncs.com.

    If you do not want to repeatedly add the <dls_endpoint> parameter to the SetRootPolicy command each time you run RootPolicy, you can use one of the following methods to add configuration items to the core-site.xml file of Hadoop:

    • Method 1:

      <configuration>
          <property>
              <name>fs.oss.endpoint</name>
              <value><dls_endpoint></value>
          </property>
      </configuration>
    • Method 2:

      <configuration> 
       <property>
              <name>fs.oss.bucket.<bucket_name>.endpoint</name>
              <value><dls_endpoint></value>
          </property>
      </configuration>

    your_ns_name

    The custom nsname that is used to access OSS-HDFS. A non-empty string is supported, such as test. The current version supports only the root directory.

  3. Configure Access Policy discovery address and Scheme implementation class.

    You must configure the following parameters in the core-site.xml file of Hadoop:

    <configuration>
        <property>
            <name>fs.accessPolicies.discovery</name>
            <value>oss://<bucket_name>.<dls_endpoint>/</value>
        </property>
        <property>
            <name>fs.AbstractFileSystem.hdfs.impl</name>
            <value>com.aliyun.jindodata.hdfs.HDFS</value>
        </property>
        <property>
            <name>fs.hdfs.impl</name>
            <value>com.aliyun.jindodata.hdfs.JindoHdfsFileSystem</value>
        </property>
    </configuration>

    If you want to configure Access Policy discovery addresses and Scheme implementation classes for multiple buckets, separate the buckets with commas (,).

  4. Run the following command to check whether RootPolicy is successfully configured:

    hadoop fs -ls hdfs://<your_ns_name>/

    If the following results are returned, RootPolicy is successfully configured:

    drwxr-x--x   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/apps
    drwxrwxrwx   - spark hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/spark-history
    drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/tmp
    drwxrwxrwx   - hdfs  hadoop          0 2023-01-05 12:27 hdfs://<your_ns_name>/user
  5. Use a custom prefix to access OSS-HDFS.

    After you restart services such as Hive and Spark, you can access OSS-HDFS by using a custom prefix.

  6. Optional. Use RootPolicy for other purposes.

    • List all registered addresses that contain a custom prefix specified for a bucket

      Run the following listAccessPolicies command to list all registered addresses that contain a custom prefix specified for a bucket:

      jindo admin -listAccessPolicies oss://<bucket_name>.<dls_endpoint>/
    • Delete all registered addresses that contain a custom prefix specified for a bucket:

      Run the following unsetRootPolicy command to delete all registered addresses that contain a custom prefix specified for a bucket:

      jindo admin -unsetRootPolicy oss://<bucket_name>.<dls_endpoint>/ hdfs://<your_ns_name>/

Using RootPolicy

Scenario 1: Use in a Notebook session

  1. Configure Spark settings.

    1. On the EMR Serverless Spark page, click Sessions in the left-side navigation pane.

    2. On the Notebook Session page, click Create Notebook Session.

    3. On the Create Notebook Session page, configure the following Spark configuration .

    spark.hadoop.fs.accessPolicies.discovery      oss://<bucketname>.cn-<region>.oss-dls.aliyuncs.com
    spark.hadoop.fs.AbstractFileSystem.hdfs.impl  com.aliyun.jindodata.hdfs.v3.HDFS
    spark.hadoop.fs.hdfs.impl                     com.aliyun.jindodata.hdfs.v3.JindoDistributedFileSystem
  2. In a Data Development Notebook task, use the custom prefix set by RootPolicy for the OSS-HDFS service.

    -- Create a table
    spark.sql("""CREATE TABLE default.my_orc_table (
        id INT,
        name STRING,
        age INT
    ) 
    location 'hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table/'""")
    -- Insert data
    spark.sql("""INSERT INTO table default.my_orc_table(id, name, age) VALUES (1, 'Alice', 30)""");
    -- Query data
    spark.sql("SELECT * FROM default.my_orc_table").show()
    Note

    In the CREATE TABLE statement, replace <ns_name> with the custom prefix that you use to access the OSS-HDFS service.

Note

The !hadoop fs command in Notebook does not support RootPolicy.

Scenario 2: Use in an SQL session

  1. Configure Spark settings.

    1. On the EMR Serverless Spark page, click Sessions in the left-side navigation pane.

    2. On the SQL Session page, click Connect to SQL Session.

    3. On the Connect to SQL Session page, configure the following Spark Configuration.

    spark.hadoop.fs.accessPolicies.discovery      oss://<bucketname>.cn-<region>.oss-dls.aliyuncs.com
    spark.hadoop.fs.AbstractFileSystem.hdfs.impl  com.aliyun.jindodata.hdfs.v3.HDFS
    spark.hadoop.fs.hdfs.impl                     com.aliyun.jindodata.hdfs.v3.JindoDistributedFileSystem
  2. In a Data Development SparkSQL task, use the custom prefix set by RootPolicy for the OSS-HDFS service.

    -- Create a table
    CREATE TABLE default.my_orc_table1 (
        id INT,
        name STRING,
        age INT
    ) 
    location "hdfs://<ns_name>/user/hive/warehouse/ads_user_info_1d_emr/my_orc_table1/";
    -- Insert data
    INSERT INTO table default.my_orc_table1(id, name, age)
    VALUES (1, 'Alice', 30);
    -- Query data
    select * from default.my_orc_table1;
    Note

    In the CREATE TABLE statement, replace <ns_name> with the custom prefix that you use to access the OSS-HDFS service.

Scenario 3: Use in a batch job

  1. Upload a file. This topic uses the test.sql file as an example.

    1. On the EMR Serverless Spark page, click Artifacts in the left-side navigation pane.

    2. On the Managed File Directory page, click Upload File.

    3. In the Upload File dialog box, click the upload area to select a local file, or drag the file to the upload area.

  2. Configure Spark settings.

    1. On the EMR Serverless Spark page, click Development in the left-side navigation pane.

    2. On the Development tab, click the image (New) icon.

    3. In the dialog box that appears, enter a Name, select SQL Job under Application, and then click OK.

    4. On the Task Configuration page, configure the following Spark Configuration.

      spark.hadoop.fs.accessPolicies.discovery      oss://<bucketname>.cn-<region>.oss-dls.aliyuncs.com
      spark.hadoop.fs.AbstractFileSystem.hdfs.impl  com.aliyun.jindodata.hdfs.v3.HDFS
      spark.hadoop.fs.hdfs.impl                     com.aliyun.jindodata.hdfs.v3.JindoDistributedFileSystem
  3. On the Data Development page, click Run. Set the engine version to esr-4.3.0 (Spark 3.5.2, Scala 2.12) and disable Fusion Acceleration. For resource configuration, set driver.cores to 2 CPU, driver.memory to 8 GB, executor.cores to 1 CPU, and executor.memory to 4 GB. After you submit the job, check the run history to confirm that the job status is Succeeded.

FAQ

The !hadoop fs command fails when you use a custom prefix set by RootPolicy.

  • Symptom: When you run the !hadoop fs -ls hdfs://<ns_name>/ command in a Notebook task, the following error message is returned:

    init failed, Caused by error 30004: Invalid argument: Neither fs.hdfs.test.dfs.ha.namenodes nor dfs.ha.namenodes.test is configured for HA namenodes
    ERROR: code=1002, message=ERROR: failed to init filesystem.
  • Cause: The !hadoop fs command in a notebook does not currently support RootPolicy.

  • Solution: When using the !hadoop fs command in a notebook, use the original OSS-HDFS address instead.