
E-MapReduce: FAQ

Last Updated: Nov 29, 2025

This topic answers frequently asked questions about EMR Serverless Spark.

DLF compatibility

What do I do if a "java.net.UnknownHostException" error occurs when I read data?

  • Symptom

    When you run an SQL query in Development to read data from a Data Lake Formation (DLF) 1.0 data table, an UnknownHostException error occurs.


  • Cause

    This error typically occurs because the host name in the table's location cannot be resolved from the Serverless Spark environment, so the query against the data table fails.

  • Solution

    The configuration method depends on whether the Hadoop Distributed File System (HDFS) cluster is configured in high availability (HA) mode.
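
    In either case, you can first check which host name the table location points to by running a statement similar to the following in Development. The database and table names below are hypothetical.

      -- Hypothetical table; the LOCATION clause in the output shows which host name
      -- (or HA nameservice) the table location points to.
      SHOW CREATE TABLE my_db.my_table;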

    • Accessing HDFS paths without HA

      When the location of a table points to an HDFS path on a cluster that does not have HA enabled, you only need to ensure that the domain name in the location is resolvable. By default, master-1-1.<cluster-id>.<region-id>.emr.aliyuncs.com is directly accessible. For other domain names, you must add mappings first. For more information, see Manage Domain Names.

    • Accessing HDFS paths with HA

      If the table location points to an HDFS path on a cluster with HA enabled, first configure the domain name mappings, and then create a configuration file named hdfs-site.xml and save it to the /etc/spark/conf path. For more information, see Manage custom configurations. This step ensures that the Java Runtime or Fusion Runtime can access the data. The following is a sample file; base its content on the hdfs-site.xml file of your EMR on ECS cluster.

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.nameservices</name>
          <value>hdfs-cluster</value>
        </property>
        <property>
          <name>dfs.ha.namenodes.hdfs-cluster</name>
          <value>nn1,nn2,nn3</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
          <value>master-1-1.<cluster-id>.<region-id>.emr.aliyuncs.com:<port></value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
          <value>master-1-2.<cluster-id>.<region-id>.emr.aliyuncs.com:<port></value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.hdfs-cluster.nn3</name>
          <value>master-1-3.<cluster-id>.<region-id>.emr.aliyuncs.com:<port></value>
        </property>
        <property>
          <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
          <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
      </configuration>
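
      After the mappings and the hdfs-site.xml file are in place, a table whose location references the nameservice defined above (hdfs://hdfs-cluster/... in this sample) should be queryable from Development. The following check uses a hypothetical table name.

      -- Hypothetical table whose location uses the HA nameservice, for example
      -- hdfs://hdfs-cluster/user/hive/warehouse/my_db.db/my_table
      SELECT * FROM my_db.my_table LIMIT 10;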

OSS compatibility

How do I access OSS resources across accounts?

When you run Spark jobs in EMR Serverless Spark, you can use two methods to access Object Storage Service (OSS) resources that belong to a different Alibaba Cloud account. You can grant permissions at the workspace level or configure settings at the task or session level.

  • Workspace level

    Configure the bucket policy for the target OSS bucket to grant the execution role of your Serverless Spark workspace read and write permissions. Follow these steps:

    1. Go to the bucket policy page of the OSS console.

      1. Log on to the OSS console.

      2. In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.

      3. In the navigation pane on the left, choose Permission Control > Bucket Policy.

    2. On the Bucket Policy page, on the Add in GUI tab, click Authorize.

    3. In the Authorize panel, configure the parameters, and then click OK.

      • Applied To: Select Whole Bucket.

      • Authorized User: Select Other Accounts. Set Principal to arn:sts::<uid>:assumed-role/<role-name>/*. In this value:

        • Replace <uid> with the ID of the Alibaba Cloud account to which the Serverless Spark workspace belongs.

        • Replace <role-name> with the name of the execution role of the Serverless Spark workspace. The name is case-sensitive. To view the execution role, go to the EMR Serverless Spark workspace list page and click Details in the Actions column of the target workspace. The default role is AliyunEMRSparkJobRunDefaultRole.

      Configure other parameters as needed. For more information, see Configure a bucket policy using the GUI.
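
    For reference, the authorization that the GUI generates corresponds roughly to a bucket policy statement like the one below. This is a sketch only: the exact action list depends on the permissions that you select in the panel, and <bucket-owner-uid> and <bucketName> are placeholders for the bucket owner's account ID and the bucket name.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": ["arn:sts::<uid>:assumed-role/<role-name>/*"],
          "Action": ["oss:GetObject", "oss:PutObject", "oss:ListObjects", "oss:GetBucketInfo"],
          "Resource": [
            "acs:oss:*:<bucket-owner-uid>:<bucketName>",
            "acs:oss:*:<bucket-owner-uid>:<bucketName>/*"
          ]
        }
      ]
    }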

  • Task and session level

    When you create a task or session, add the following configurations in the Spark Configurations section to access the target OSS bucket.

    spark.hadoop.fs.oss.bucket.<bucketName>.endpoint <endpoint>
    spark.hadoop.fs.oss.bucket.<bucketName>.credentials.provider com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider
    spark.hadoop.fs.oss.bucket.<bucketName>.accessKeyId <accessID>
    spark.hadoop.fs.oss.bucket.<bucketName>.accessKeySecret <accessKey>

    Replace the following information as needed.

    • <bucketName>: The name of the OSS bucket that you want to access.

    • <endpoint>: The OSS endpoint of the region in which the bucket is located.

    • <accessID>: The AccessKey ID of the Alibaba Cloud account used to access the OSS data.

    • <accessKey>: The AccessKey secret of the Alibaba Cloud account used to access the OSS data.
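
    For example, with a hypothetical bucket named other-account-bucket in the China (Hangzhou) region, the configuration might look like the following. The endpoint and AccessKey values are illustrative placeholders; when the workspace and the bucket are in the same region, the internal OSS endpoint is typically used.

    spark.hadoop.fs.oss.bucket.other-account-bucket.endpoint oss-cn-hangzhou-internal.aliyuncs.com
    spark.hadoop.fs.oss.bucket.other-account-bucket.credentials.provider com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider
    spark.hadoop.fs.oss.bucket.other-account-bucket.accessKeyId <yourAccessKeyId>
    spark.hadoop.fs.oss.bucket.other-account-bucket.accessKeySecret <yourAccessKeySecret>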

S3 compatibility

How do I access S3?

When you create a task or session, add the following configurations in the Spark Configurations section to access S3.

spark.hadoop.fs.s3.impl com.aliyun.jindodata.s3.JindoS3FileSystem
spark.hadoop.fs.AbstractFileSystem.s3.impl com.aliyun.jindodata.s3.S3
spark.hadoop.fs.s3.bucket.<bucketName>.accessKeyId <accessID>
spark.hadoop.fs.s3.bucket.<bucketName>.accessKeySecret <accessKey> 
spark.hadoop.fs.s3.bucket.<bucketName>.endpoint <endpoint>
spark.hadoop.fs.s3.credentials.provider com.aliyun.jindodata.s3.auth.SimpleCredentialsProvider

Replace the following information as needed.

  • <bucketName>: The name of the S3 bucket that you want to access.

  • <endpoint>: The S3 endpoint of the region in which the bucket is located.

  • <accessID>: The AccessKey ID of the account used to access the S3 data.

  • <accessKey>: The AccessKey secret of the account used to access the S3 data.
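
After these configurations are added, you can verify access with a simple query against an s3:// path in Development. The path below is hypothetical, and the data is assumed to be in Parquet format.

-- Hypothetical path; change the data source keyword (parquet, csv, orc, ...) to match your data format.
SELECT * FROM parquet.`s3://<bucketName>/path/to/data/` LIMIT 10;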

OBS compatibility

How do I access OBS?

When you create a task or session, add the following configurations in the Spark Configurations section to access OBS.

spark.hadoop.fs.obs.impl com.aliyun.jindodata.obs.JindoObsFileSystem
spark.hadoop.fs.AbstractFileSystem.obs.impl com.aliyun.jindodata.obs.OBS
spark.hadoop.fs.obs.bucket.<bucketName>.accessKeyId <accessID>
spark.hadoop.fs.obs.bucket.<bucketName>.accessKeySecret <accessKey> 
spark.hadoop.fs.obs.bucket.<bucketName>.endpoint <endpoint>
spark.hadoop.fs.obs.credentials.provider com.aliyun.jindodata.obs.auth.SimpleCredentialsProvider

Replace the following information as needed.

  • <bucketName>: The name of the OBS bucket that you want to access.

  • <endpoint>: The OBS endpoint of the region in which the bucket is located.

  • <accessID>: The AccessKey ID of the account used to access the OBS data.

  • <accessKey>: The AccessKey secret of the account used to access the OBS data.
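
As with S3, once the configurations are in place you can verify access with a query against an obs:// path. The path below is hypothetical.

-- Hypothetical path; change the data source keyword to match your data format.
SELECT * FROM parquet.`obs://<bucketName>/path/to/data/` LIMIT 10;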