Custom configuration files let you control the runtime environment for EMR Serverless Spark jobs and sessions. Use them when a framework requires an XML file at a specific path at runtime, or when you are migrating non-SQL jobs from EMR on ECS and the job code does not initialize the file system through SparkContext#hadoopConfiguration.
For key-value Spark properties that do not require a dedicated file, add them directly in the Spark Configuration field instead.
Prerequisites
Before you begin, make sure you have:
A workspace. For more information, see Workspace management.
Create a custom configuration file
Log on to the E-MapReduce console.
In the left-side navigation pane, go to the Spark page.
On the Spark page, click the target workspace name.
On the EMR Serverless Spark page, click Configuration Management in the left-side navigation pane.
On the Configuration page, click the Custom Configuration Files tab, and then click Create Custom Configuration File.
Configure the parameters and click Create.
| Parameter | Description |
|---|---|
| Path | The storage path for the file. |
| File name | The file name and extension. Select .txt, .xml, or .json based on the file type. |
| File content | The configuration content. Make sure the content complies with the format requirements of the selected file type. |
| Description | A description of the file's purpose, to help with ongoing management. |

Note: The system predefines a set of key configuration files whose names and content are managed internally. You cannot rename or overwrite them. The locked file names are: spark-defaults.conf, kyuubi-defaults.conf, executorPodTemplate.yaml, spark-pod-template.yaml, driver_log4j.xml, executor_log4j.xml, session_log4j.xml, spark.properties, and syncer_log4j.xml.
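The naming rules for custom configuration files can be checked locally before you open the console. The following is a minimal sketch, not part of the product: the function name check_file_name is hypothetical, and it assumes that names are compared case-sensitively, which this topic does not state.

```python
from pathlib import PurePosixPath

# System-managed file names listed in this topic.
RESERVED_NAMES = {
    "spark-defaults.conf",
    "kyuubi-defaults.conf",
    "executorPodTemplate.yaml",
    "spark-pod-template.yaml",
    "driver_log4j.xml",
    "executor_log4j.xml",
    "session_log4j.xml",
    "spark.properties",
    "syncer_log4j.xml",
}

# Extensions offered for the File name parameter.
ALLOWED_EXTENSIONS = {".txt", ".xml", ".json"}


def check_file_name(file_name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks usable."""
    problems = []
    # Assumption: the comparison is case-sensitive.
    if file_name in RESERVED_NAMES:
        problems.append(f"{file_name} is a system-managed file name")
    if PurePosixPath(file_name).suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"{file_name} does not end in .txt, .xml, or .json")
    return problems
```

For example, check_file_name("core-site.xml") returns an empty list, while check_file_name("driver_log4j.xml") reports that the name is system-managed.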
After the file is created, click Edit or Delete in the Actions column to modify or remove it.
Examples
The following examples show two common scenarios for custom configuration files. Both create an XML file at /etc/spark/conf, which Serverless Spark picks up at job startup.
Example 1: Enable Ranger authentication for Spark Thrift Server
This example configures Ranger authentication for a Spark Thrift Server session.
Step 1: Create the Ranger security configuration file
Create a configuration file named ranger-spark-security.xml and save it to /etc/spark/conf. Use the following content:
<configuration>
<property>
<name>ranger.plugin.spark.policy.cache.dir</name>
<value>/opt/emr-hive/policycache</value>
</property>
<property>
<name>ranger.plugin.spark.ambari.cluster.name</name>
<value>serverless-spark</value>
</property>
<property>
<name>ranger.plugin.spark.service.name</name>
<value>emr-hive</value>
</property>
<property>
<name>ranger.plugin.spark.policy.rest.url</name>
<value>http://<ranger_admin_ip>:<ranger_admin_port></value>
</property>
<property>
<name>ranger.plugin.spark.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
<property>
<name>ranger.plugin.spark.super.users</name>
<value>root</value>
</property>
</configuration>

Replace the placeholders with values from your environment:
| Placeholder | Description |
|---|---|
| <ranger_admin_ip> | Internal IP address of Ranger Admin. If Ranger is deployed in an EMR on ECS cluster, use the internal IP address of the master node. |
| <ranger_admin_port> | Port number of Ranger Admin. For EMR on ECS deployments, use 6080. |
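If you prefer to generate the file content rather than edit the placeholders by hand, the substitution can be sketched as follows. This is an illustrative helper only: the function name build_ranger_config is hypothetical, the default port of 6080 applies to EMR on ECS deployments as noted in the table, and the address used in the usage note is a documentation-only example. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_ranger_config(ranger_admin_ip: str, ranger_admin_port: int = 6080) -> str:
    """Render ranger-spark-security.xml content with the placeholders filled in."""
    properties = {
        "ranger.plugin.spark.policy.cache.dir": "/opt/emr-hive/policycache",
        "ranger.plugin.spark.ambari.cluster.name": "serverless-spark",
        "ranger.plugin.spark.service.name": "emr-hive",
        "ranger.plugin.spark.policy.rest.url":
            f"http://{ranger_admin_ip}:{ranger_admin_port}",
        "ranger.plugin.spark.policy.source.impl":
            "org.apache.ranger.admin.client.RangerAdminRESTClient",
        "ranger.plugin.spark.super.users": "root",
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")
```

Calling build_ranger_config("192.0.2.10") returns text that you can paste into the File content field. Because the output is produced by an XML serializer, it is well-formed by construction.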
Step 2: Configure the Spark Thrift Server session
Stop the Spark Thrift Server session before making changes. Select the connection name from the Network Connection drop-down list, and add the following entries in Spark Configuration:
spark.emr.serverless.user.defined.jars /opt/ranger/ranger-spark.jar
spark.sql.extensions org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension

Restart the Spark Thrift Server for the changes to take effect.
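Before you paste the entries into Spark Configuration, you can confirm that both keys are present and each is separated from its value by whitespace. The following is a rough sketch under that assumption about the entry format; parse_spark_conf and missing_entries are hypothetical helper names, not part of any Spark or EMR API.

```python
# Keys that Step 2 of this example adds to Spark Configuration.
REQUIRED_ENTRIES = (
    "spark.emr.serverless.user.defined.jars",
    "spark.sql.extensions",
)


def parse_spark_conf(text: str) -> dict[str, str]:
    """Parse 'key value' lines into a dict, skipping blank lines."""
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the first run of whitespace (spaces or tabs).
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def missing_entries(conf: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_ENTRIES if not conf.get(key)]
```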
Step 3: Verify the configuration
Use Spark Beeline to connect and run a query. For connection instructions, see Connect to Spark Thrift Server.
If a user accesses a resource without sufficient privilege, Ranger returns a permission error similar to the following:
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test] does not have [update] privilege on [database=default/table=students/column=name]
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:46)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:262)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:166)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:41)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:166)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:161)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Example 2: Access OSS or OSS-HDFS after migrating from EMR on ECS
Problem: When you migrate non-SQL jobs from EMR on ECS to Serverless Spark, jobs that access OSS or OSS-HDFS fail with UnsupportedFileSystemException. Serverless Spark does not inject core-site.xml by default, so the OSS and OSS-HDFS file system implementations are not registered unless the job code initializes the file system through SparkContext#hadoopConfiguration.
Solution: Create a configuration file named core-site.xml and save it to /etc/spark/conf:
<?xml version="1.0" ?>
<configuration>
<property>
<name>fs.AbstractFileSystem.oss.impl</name>
<value>com.aliyun.jindodata.oss.OSS</value>
</property>
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-<region>-internal.aliyuncs.com</value>
</property>
<property>
<name>fs.oss.impl</name>
<value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
</property>
<property>
<name>fs.oss.credentials.provider</name>
<value>com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider</value>
</property>
<property>
<name>fs.oss.accessKeyId</name>
<value>The AccessKey ID used to access OSS or OSS-HDFS.</value>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>The AccessKey secret used to access OSS or OSS-HDFS.</value>
</property>
</configuration>

Replace <region> with your OSS region, for example, hangzhou.
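A quick local check can catch two common mistakes before you save the file: a missing property and a placeholder such as <region> left in place, which makes the XML malformed. The following is a minimal sketch using only the Python standard library; the helper names load_hadoop_conf and missing_keys are hypothetical, and the required-key list simply mirrors the properties in the core-site.xml example.

```python
import xml.etree.ElementTree as ET

# Property names taken from the core-site.xml example in this topic.
REQUIRED_KEYS = (
    "fs.AbstractFileSystem.oss.impl",
    "fs.oss.endpoint",
    "fs.oss.impl",
    "fs.oss.credentials.provider",
    "fs.oss.accessKeyId",
    "fs.oss.accessKeySecret",
)


def load_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Parse Hadoop-style <configuration> XML into a name -> value dict.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed,
    for example when an angle-bracket placeholder was not replaced.
    """
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def missing_keys(xml_text: str) -> list[str]:
    """Return the required property names that are absent or empty."""
    conf = load_hadoop_conf(xml_text)
    return [key for key in REQUIRED_KEYS if not conf.get(key)]
```

This check validates structure only. It does not verify that the endpoint is reachable or that the AccessKey pair is valid.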