
E-MapReduce:Manage custom configuration files

Last Updated: Nov 29, 2025

The custom configuration file feature allows you to create personalized configurations based on specific requirements and flexibly control the task execution environment. It supports multiple file formats (such as XML and JSON), ensures the security and consistency of configurations, and can be directly applied to various tasks (such as batch processing and sessions).

Prerequisites

You have created a workspace. For more information, see Workspace management.

Create a custom configuration file

  1. Go to the configuration management page.

    1. Log on to the E-MapReduce console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click Configuration Management in the left-side navigation pane.

  2. On the Configuration page, click the Custom Configuration Files tab.

  3. Click Create Custom Configuration File.

  4. On the Create Custom Configuration File page, configure the parameters and click Create.

    Parameter       Description
    ---------       -----------
    Path            Select the storage path of the file.
    File Name       Specify the file name and extension. Select .txt, .xml, or .json based on the file type.
    File Content    Enter the configuration content. Make sure that the content complies with the format requirements of the selected file type.
    Description     Add a description of the file purpose to facilitate subsequent management and maintenance.

    Note

    The system predefines some key configuration files. The names and content of these files are maintained by the system. You cannot directly modify their names or overwrite their content. The file names that cannot be changed include the following: spark-defaults.conf, kyuubi-defaults.conf, executorPodTemplate.yaml, spark-pod-template.yaml, driver_log4j.xml, executor_log4j.xml, session_log4j.xml, spark.properties, and syncer_log4j.xml.

After you create a custom configuration file, you can click Edit or Delete in the Actions column to modify or remove the file.
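Because the File Content parameter must match the format implied by the file extension, it can help to validate the content locally before you create the file in the console. The following is a minimal sketch; the helper name is illustrative and not part of the product:

```python
import json
import xml.etree.ElementTree as ET

def validate_config_content(file_name: str, content: str) -> bool:
    """Check that the content parses for the declared extension.

    .json content must be valid JSON, .xml content must be well-formed XML,
    and .txt content is accepted as-is. Other extensions are rejected.
    """
    if file_name.endswith(".json"):
        try:
            json.loads(content)
            return True
        except ValueError:
            return False
    if file_name.endswith(".xml"):
        try:
            ET.fromstring(content)
            return True
        except ET.ParseError:
            return False
    return file_name.endswith(".txt")

# A well-formed XML snippet passes; a truncated one does not.
validate_config_content("my-conf.xml", "<configuration/>")
validate_config_content("my-conf.xml", "<configuration>")
```

Running such a check before pasting content into the console avoids creating a file that downstream tasks cannot parse.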

Use custom configuration files

Example 1: Enable Ranger authentication in Spark Thrift Server

This topic uses enabling Ranger authentication for the Serverless Spark Thrift Server as an example to demonstrate how to create and use a custom configuration file.

  1. Create a custom configuration file.

    Create a configuration file named ranger-spark-security.xml and save it to the path /etc/spark/conf. The following example shows the data in the file.

    <configuration>
      <property>
        <name>ranger.plugin.spark.policy.cache.dir</name>
        <value>/opt/emr-hive/policycache</value>
      </property>
      <property>
        <name>ranger.plugin.spark.ambari.cluster.name</name>
        <value>serverless-spark</value>
      </property>
      <property>
        <name>ranger.plugin.spark.service.name</name>
        <value>emr-hive</value>
      </property>
      <property>
        <name>ranger.plugin.spark.policy.rest.url</name>
        <value>http://<ranger_admin_ip>:<ranger_admin_port></value>
      </property>
      <property>
        <name>ranger.plugin.spark.policy.source.impl</name>
        <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
      </property>
      <property>
        <name>ranger.plugin.spark.super.users</name>
        <value>root</value>
      </property>
    </configuration>

    <ranger_admin_ip> and <ranger_admin_port> specify the internal IP address and port number of Ranger Admin. Configure the parameters based on your business requirements. If you connect to the Ranger service that is deployed in an EMR on ECS cluster, set the <ranger_admin_ip> parameter to the internal IP address of the master node and the <ranger_admin_port> parameter to 6080.

  2. Configure Spark Thrift Server.

    Before you enable Ranger authentication for a Spark Thrift Server session, you must first stop the session. Then, select the created connection name from the Network Connection drop-down list and add the following configuration items to Spark Configuration. After you modify the Spark Thrift Server, restart it for the modification to take effect.

    spark.emr.serverless.user.defined.jars     /opt/ranger/ranger-spark.jar
    spark.sql.extensions                       org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
  3. Test the connection.

    Use Spark Beeline to test the connection. For more connection methods, see Connect to Spark Thrift Server. If you access a database, table, or other resource without the required permissions, the system returns a permission error similar to the following:

    Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test] does not have [update] privilege on [database=default/table=students/column=name]
    	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:46)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:262)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:166)
    	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
    	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:41)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:166)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:161)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:422)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:175)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
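The ranger-spark-security.xml file in step 1 can also be generated programmatically, which avoids typos in the property names when you manage several workspaces. The following is a minimal Python sketch; the helper name is illustrative, and the admin address arguments stand in for the <ranger_admin_ip> and <ranger_admin_port> placeholders above:

```python
import xml.etree.ElementTree as ET

# Ranger plugin properties from the example above. The policy REST URL
# contains {ip}/{port} placeholders filled in by the helper below.
RANGER_PROPERTIES = {
    "ranger.plugin.spark.policy.cache.dir": "/opt/emr-hive/policycache",
    "ranger.plugin.spark.ambari.cluster.name": "serverless-spark",
    "ranger.plugin.spark.service.name": "emr-hive",
    "ranger.plugin.spark.policy.rest.url": "http://{ip}:{port}",
    "ranger.plugin.spark.policy.source.impl":
        "org.apache.ranger.admin.client.RangerAdminRESTClient",
    "ranger.plugin.spark.super.users": "root",
}

def build_ranger_security_xml(admin_ip: str, admin_port: int) -> str:
    """Render the <configuration> document for ranger-spark-security.xml."""
    root = ET.Element("configuration")
    for name, value in RANGER_PROPERTIES.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value.format(ip=admin_ip, port=admin_port)
    return ET.tostring(root, encoding="unicode")

# Example: an EMR on ECS master node address with the default Ranger Admin port.
print(build_ranger_security_xml("192.168.0.1", 6080))
```

Paste the generated document into the File Content field when you create the custom configuration file.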

Example 2: Access OSS files from tasks migrated from EMR on ECS

When you migrate from EMR on ECS to Serverless Spark, non-SQL tasks that access Alibaba Cloud OSS or OSS-HDFS may fail if the task code does not initialize the FileSystem through SparkContext#hadoopConfiguration. In this case, you must manually configure the implementation class and access parameters for the OSS/OSS-HDFS file system. Serverless Spark does not inject the core-site.xml file by default, so without this configuration the migrated tasks throw errors such as UnsupportedFileSystemException because OSS/OSS-HDFS is not supported.

On the Configuration page, click the Custom Configuration Files tab, create a configuration file named core-site.xml, and save it to the path /etc/spark/conf. The following example shows the code in the file:

<?xml version="1.0" ?>
<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.OSS</value>
    </property>
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-<region>-internal.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
    <property>
        <name>fs.oss.credentials.provider</name>
        <value>com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>The AccessKey ID that is used to access OSS or OSS-HDFS.</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>The AccessKey secret that is used to access OSS or OSS-HDFS.</value>
    </property>
</configuration>

Replace <region> with the ID of the region where your OSS bucket resides, such as hangzhou.
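The core-site.xml template above can likewise be rendered with the region and AccessKey values filled in before you upload it. The following is a minimal sketch; the helper name is illustrative, and the credential arguments correspond to the AccessKey placeholders in the template:

```python
import xml.etree.ElementTree as ET

# Fixed implementation-class properties from the core-site.xml example above.
CORE_SITE_ITEMS = {
    "fs.AbstractFileSystem.oss.impl": "com.aliyun.jindodata.oss.OSS",
    "fs.oss.impl": "com.aliyun.jindodata.oss.JindoOssFileSystem",
    "fs.oss.credentials.provider":
        "com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider",
}

def build_core_site(region: str, access_key_id: str, access_key_secret: str) -> str:
    """Render core-site.xml with the region endpoint and credentials filled in."""
    items = dict(CORE_SITE_ITEMS)
    items["fs.oss.endpoint"] = f"oss-cn-{region}-internal.aliyuncs.com"
    items["fs.oss.accessKeyId"] = access_key_id
    items["fs.oss.accessKeySecret"] = access_key_secret
    root = ET.Element("configuration")
    for name, value in items.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0" ?>\n' + ET.tostring(root, encoding="unicode")
```

Note that this approach embeds the AccessKey pair in plaintext, exactly as the template above does; handle the resulting file with the same care as any other credential store.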