Limits
- Only engine versions esr-2.4.1 (Spark 3.3.1, Scala 2.12) and later, and esr-3.0.1 (Spark 3.4.3, Scala 2.12) and later, support configuring Ranger authorization for Spark Thrift Server sessions.
- Only engine versions esr-3.1.0 and later support Method 1 (using the built-in Ranger plugin).
Notes
Ranger provides permission management (authorization) services. Authenticating users requires an additional service such as LDAP. For more information, see Configure and Enable LDAP Authentication for a Spark Thrift Server.
Prerequisites
You have created a Spark Thrift Server session. For more information, see Manage Spark Thrift Server Sessions.
Procedure
1. Network preparation
Before starting the configuration, ensure network connectivity is configured so Serverless Spark can communicate with your virtual private cloud (VPC). This allows the Ranger plugin to connect to the Ranger Admin service and retrieve permissions. For detailed instructions, see Network Connectivity Between EMR Serverless Spark and Other VPCs.
2. Configure the Ranger plugin
To enable Ranger authorization for a Spark Thrift Server session, first stop the session. Then select the established connection in the Network Connectivity drop-down list, add the required configuration items in Spark Configuration, and restart the session for the changes to take effect.
Method 1: use the built-in Ranger plugin
This method is applicable only to engine versions esr-3.1.0 and later.
spark.ranger.plugin.enabled true
spark.jars /opt/ranger/ranger-spark.jar
ranger.plugin.spark.policy.rest.url http://<ranger_admin_ip>:<ranger_admin_port>
Replace <ranger_admin_ip> and <ranger_admin_port> with the internal IP address and port of your Ranger Admin service. For Alibaba Cloud EMR on ECS clusters, use the internal IP address of the master node for <ranger_admin_ip> and 6080 for <ranger_admin_port>.
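Before restarting the session, you can optionally check from a host in the connected VPC that the Ranger Admin service is reachable. The following is a minimal sketch, not an exact command for your environment: the placeholders must be replaced with your own values, and <service_name> (the name of the Ranger service definition, for example ranger_spark) is an assumption.

```
# Hypothetical reachability check; replace all placeholders with your values.
curl -s -o /dev/null -w "%{http_code}\n" \
  "http://<ranger_admin_ip>:<ranger_admin_port>/service/plugins/policies/download/<service_name>"
```

Any HTTP status code in the response (for example 200, 401, or 404) shows that the Ranger Admin service is reachable; a connection timeout points to a network connectivity problem.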
Method 2: use a custom Ranger plugin
To customize the Ranger plugin, upload the custom plugin to OSS and specify the JAR and class names using the parameters below.
spark.jars oss://<bucket>/path/to/user-ranger-spark.jar
spark.ranger.plugin.class <class_name>
spark.ranger.plugin.enabled true
ranger.plugin.spark.policy.rest.url http://<ranger_admin_ip>:<ranger_admin_port>
Update the following information based on your environment:
- spark.jars: the OSS path of the custom plugin JAR.
- spark.ranger.plugin.class: the class name of the Spark extension provided by the custom Ranger plugin.
- <ranger_admin_ip> and <ranger_admin_port>: the internal IP address and port of the Ranger Admin service. When connecting to the Ranger service on an Alibaba Cloud EMR on ECS cluster, use the internal IP address of the master node for <ranger_admin_ip> and 6080 for <ranger_admin_port>.
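As a concrete illustration of Method 2, the settings might look like the following. This is a sketch, not a recommendation: it assumes you build and upload the Apache Kyuubi Spark AuthZ plugin yourself, and the OSS path is a placeholder.

```
spark.jars oss://<bucket>/plugins/kyuubi-spark-authz-shaded.jar
spark.ranger.plugin.class org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
spark.ranger.plugin.enabled true
ranger.plugin.spark.policy.rest.url http://<ranger_admin_ip>:<ranger_admin_port>
```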
3. (Optional) Configure Ranger Audit
Ranger allows you to choose a service to store audit logs, such as Solr or HDFS. By default, Serverless Spark does not enable Ranger audit. If required, add Ranger audit parameters in Spark Configuration.
For instance, to configure a connection to Solr in EMR, include the following configuration in Spark Configuration.
xasecure.audit.is.enabled true
xasecure.audit.destination.solr true
xasecure.audit.destination.solr.urls http://<solr_ip>:<solr_port>/solr/ranger_audits
xasecure.audit.destination.solr.user <user>
xasecure.audit.destination.solr.password <password>
The parameters are described as follows:
- xasecure.audit.is.enabled: specifies whether Ranger audit is enabled.
- xasecure.audit.destination.solr: specifies whether audit logs are stored in Solr.
- xasecure.audit.destination.solr.urls: the URL of the Solr service. Replace <solr_ip> with the IP address of the Solr service and <solr_port> with the corresponding port. Fill in the other parts of the URL as required by your setup.
- xasecure.audit.destination.solr.user and xasecure.audit.destination.solr.password: if basic authentication is enabled for Solr, provide the username and password with these two parameters. For EMR on ECS clusters, you can find the values of xasecure.audit.destination.solr.urls, xasecure.audit.destination.solr.user, and xasecure.audit.destination.solr.password in the ranger-spark-audit.xml file of the Ranger-plugin service.
After successful configuration, audit logs for user access can be viewed on the Ranger UI's Access tab. For instructions on accessing the Ranger UI, see Access the Web Interface of Open Source Components Through the Console.
Audit logs can be viewed on the Ranger UI only if they are stored in Solr. Logs written to HDFS or other destinations that the Ranger UI does not support cannot be viewed through the UI.
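For reference, a sketch of the corresponding configuration for storing audit logs in HDFS instead of Solr is shown below. The HDFS address and path are placeholders; the property names follow the standard Ranger audit framework.

```
xasecure.audit.is.enabled true
xasecure.audit.destination.hdfs true
xasecure.audit.destination.hdfs.dir hdfs://<namenode_ip>:<namenode_port>/ranger/audit
```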
4. Connection test
If you use Spark Beeline to access a resource for which you do not have the required permissions, an error similar to the following is reported.
0: jdbc:hive2://pre-emr-spark-gateway-cn-hang> create table test(id int);
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test] does not have [create] privilege on [database=testdb/table=test]
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:44)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:230)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:230)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:225)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
- When verifying permissions, take Ranger's default permissions into account. For example, all users can switch to and create databases, and resource owners have full access to their own resources. To test permissions accurately, verify User B's permissions on resources created by User A, not on resources that User B owns.
- If the Ranger Admin service is configured incorrectly, SQL statements may run without triggering permission errors. This indicates that the configuration has not taken effect.
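Putting these notes together, the following is a sketch of a two-user permission test run in Spark Beeline. The user names, database, and table are hypothetical.

```
-- As userA: create a database and a table (userA becomes the owner).
CREATE DATABASE testdb;
CREATE TABLE testdb.test (id INT);

-- Reconnect as userB, who has no privileges on testdb.test in Ranger:
SELECT * FROM testdb.test;
-- A permission-denied error here confirms that Ranger authorization is in
-- effect; if the statement succeeds without a grant, recheck the Ranger
-- Admin configuration.
```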