
E-MapReduce: Configure Ranger authentication for a Spark Thrift Server

Last Updated: Mar 13, 2025

Apache Ranger is a centralized framework for permission management. With the Ranger plugin integrated into Spark, you can enforce fine-grained access control over the databases, tables, and columns that are accessed through Spark SQL, thereby enhancing data security. You can configure the Ranger plugin for a Spark Thrift Server session to enforce data access control.

Limits

  • Only engine versions esr-2.4.1 (Spark 3.3.1, Scala 2.12) and later, and esr-3.0.1 (Spark 3.4.3, Scala 2.12) and later, support Ranger authentication for Spark Thrift Server sessions.

  • Only engine versions esr-3.1.0 and later support Method 1: use the built-in Ranger plugin.

Notes

Ranger primarily provides authorization (permission management) services; user authentication requires an additional service such as LDAP. For more information, see Configure and Enable LDAP Authentication for a Spark Thrift Server.

Prerequisites

You have created a Spark Thrift Server session. For more information, see Manage Spark Thrift Server Sessions.

Procedure

1. Network preparation

Before starting the configuration, ensure network connectivity is configured so Serverless Spark can communicate with your virtual private cloud (VPC). This allows the Ranger plugin to connect to the Ranger Admin service and retrieve permissions. For detailed instructions, see Network Connectivity Between EMR Serverless Spark and Other VPCs.

2. Configure the Ranger plugin

To enable Ranger authentication for a Spark Thrift Server session, first stop the session. Then, select the established connection from the Network Connectivity drop-down list and add the required configuration items in Spark Configuration. After you finish editing, restart the session for the changes to take effect.

Method 1: use the built-in Ranger plugin

Important

This method is applicable only to engine versions esr-3.1.0 and later.

spark.ranger.plugin.enabled                true
spark.jars                                 /opt/ranger/ranger-spark.jar
ranger.plugin.spark.policy.rest.url        http://<ranger_admin_ip>:<ranger_admin_port>

Replace <ranger_admin_ip> and <ranger_admin_port> with the internal IP address and port of your Ranger Admin service. For an Alibaba Cloud EMR on ECS cluster, set <ranger_admin_ip> to the internal IP address of the master node and <ranger_admin_port> to 6080.
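
For example, assuming a hypothetical master node internal IP address of 192.168.0.10, the complete configuration would look like this:

spark.ranger.plugin.enabled                true
spark.jars                                 /opt/ranger/ranger-spark.jar
ranger.plugin.spark.policy.rest.url        http://192.168.0.10:6080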

Method 2: use a custom Ranger plugin

To use a custom Ranger plugin, upload the plugin JAR to OSS and specify the JAR path and class name by using the following parameters.

spark.jars                                 oss://<bucket>/path/to/user-ranger-spark.jar
spark.ranger.plugin.class                  <class_name>
spark.ranger.plugin.enabled                true
ranger.plugin.spark.policy.rest.url        http://<ranger_admin_ip>:<ranger_admin_port>

Update the following information based on your situation:

  • spark.jars: Enter the OSS path to the custom JAR.

  • spark.ranger.plugin.class: Enter the class name of the Spark extension for the custom Ranger plugin, as shown in the sketch after this list.

  • <ranger_admin_ip> and <ranger_admin_port>: Enter the internal IP address and port of the Ranger Admin service. For an Alibaba Cloud EMR on ECS cluster, set <ranger_admin_ip> to the internal IP address of the master node and <ranger_admin_port> to 6080.
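
For reference, the following is a minimal sketch assuming the custom plugin is the Kyuubi Spark AuthZ plugin, whose extension class is org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension and which matches the AccessControlException shown in the connection test below. The OSS path and IP address are placeholders that you must replace with your own values.

spark.jars                                 oss://mybucket/jars/kyuubi-spark-authz.jar
spark.ranger.plugin.class                  org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
spark.ranger.plugin.enabled                true
ranger.plugin.spark.policy.rest.url        http://192.168.0.10:6080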

3. (Optional) Configure Ranger audit

Ranger lets you choose a service, such as Solr or HDFS, to store audit logs. By default, Serverless Spark does not enable Ranger audit. If you require audit, add the Ranger audit parameters in Spark Configuration.

For instance, to configure a connection to Solr in EMR, include the following configuration in Spark Configuration.

xasecure.audit.is.enabled                  true
xasecure.audit.destination.solr            true
xasecure.audit.destination.solr.urls       http://<solr_ip>:<solr_port>/solr/ranger_audits
xasecure.audit.destination.solr.user       <user>
xasecure.audit.destination.solr.password   <password>

The parameters are as follows:

  • xasecure.audit.is.enabled: Indicates whether Ranger audit is enabled.

  • xasecure.audit.destination.solr: Determines if audit logs are stored in Solr.

  • xasecure.audit.destination.solr.urls: Specifies the URL of the Solr service. Replace <solr_ip> with the IP address of the Solr service and <solr_port> with its port. Adjust the rest of the URL as required by your setup.

  • xasecure.audit.destination.solr.user and xasecure.audit.destination.solr.password: If basic authentication is enabled for Solr, specify the username and password.

    For EMR on ECS, you can find the values of xasecure.audit.destination.solr.urls, xasecure.audit.destination.solr.user, and xasecure.audit.destination.solr.password in the ranger-spark-audit.xml file of the Ranger-plugin service, as sketched below.

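    The excerpt below is a hypothetical sketch of the relevant entries in ranger-spark-audit.xml, in the standard Hadoop property format; the host, port, and username are placeholders.

    <property>
      <name>xasecure.audit.destination.solr.urls</name>
      <value>http://192.168.0.10:8983/solr/ranger_audits</value>
    </property>
    <property>
      <name>xasecure.audit.destination.solr.user</name>
      <value>ranger_solr_user</value>
    </property>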

After successful configuration, audit logs for user access can be viewed on the Ranger UI's Access tab. For instructions on accessing the Ranger UI, see Access the Web Interface of Open Source Components Through the Console.

Note

Audit logs can be viewed on the Ranger UI only if they are stored in Solr. Logs stored in HDFS or in other destinations that the Ranger UI does not support cannot be viewed through the UI.


4. Connection test

If you access resources for which you do not have the required permissions through Spark Beeline, an error similar to the following is returned.

0: jdbc:hive2://pre-emr-spark-gateway-cn-hang> create table test(id int);
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test] does not have [create] privilege on [database=testdb/table=test]
	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:44)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:230)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:230)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:225)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:239)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Note

  • When verifying permissions, take Ranger's default permissions into account. For example, all users can switch to and create databases, and resource owners have full access to their own resources. To accurately test permissions, verify User B's permissions on resources created by User A, not on User B's own resources; see the sketch after this list.

  • If the Ranger Admin service is configured incorrectly, SQL statements may execute without any permission errors. This indicates that the Ranger configuration has not taken effect.
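
A minimal sketch of the cross-user check described above, assuming two hypothetical LDAP users user_a and user_b and a database testdb on which user_a has the create privilege:

-- Connected to the Spark Thrift Server as user_a (the resource owner):
CREATE TABLE testdb.orders (id INT);

-- Connected as user_b, before any Ranger policy grants access:
-- this should fail with an AccessControlException.
SELECT * FROM testdb.orders;

-- After a Ranger policy grants user_b the select privilege on testdb.orders,
-- the same query should succeed.
SELECT * FROM testdb.orders;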