This topic describes how to integrate Spark with Ranger and how to configure related permissions.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.
Note: Select an EMR version later than V3.26.0 when you create the cluster.

Background information

You can integrate Spark with Ranger and use Ranger to control permissions for Spark SQL. The integration takes effect only for Spark SQL queries that are executed through Spark Thrift Server. For example, after the integration, you can connect to Spark Thrift Server by using the Beeline client or Spark JDBC and submit a Spark SQL job.
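
As a rough illustration of the Beeline path mentioned above, the following Python sketch only assembles a Beeline command line and prints it. The host name emr-header-1, the port 10001, and the helper name beeline_command are assumptions made for this example and are not taken from this topic. Replace them with the values of your own cluster.

```python
import shlex


def beeline_command(host, port, user, query, database="default"):
    """Build the argument list for a Beeline call against Spark Thrift Server.

    Hypothetical helper for illustration only; it does not run anything.
    """
    # Spark Thrift Server is reached through a HiveServer2-style JDBC URL.
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # -u: JDBC URL, -n: user name, -e: SQL statement to execute.
    return ["beeline", "-u", url, "-n", user, "-e", query]


# Assumed values: host, port, and user depend on your cluster.
cmd = beeline_command("emr-header-1", 10001, "foo", "SELECT a FROM testdb.test")

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The statement is submitted as the user passed with -n, so in this sketch Ranger would evaluate the query against the policies that apply to user foo.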

Integrate Spark SQL with Ranger

  1. Integrate Hive with Ranger.
    Spark SQL and Hive share permission configurations in Ranger. To control Spark SQL permissions by using Ranger, you must integrate Hive with Ranger. For more information, see Hive.
  2. Enable Spark in Ranger.
    1. On the Cluster Management page, find the target cluster and click Details in the Actions column.
    2. In the left-side navigation pane, click Cluster Service and then RANGER.
    3. Select Enable Spark from the Actions drop-down list in the upper-right corner.
    4. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
  3. After the task is complete, restart Spark Thrift Server.
    1. In the left-side navigation pane, click Cluster Service and then Spark.
    2. Select Restart ThriftServer from the Actions drop-down list in the upper-right corner.
    3. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.

Permission configuration example

Grant user foo the Select permission on column a of the testdb.test table.

  1. On the Ranger web UI, click the emr-hive service.
  2. Click Add New Policy in the upper-right corner.
  3. Configure permissions.
    Parameter       Description
    Policy Name     The name of the policy. You can specify a custom name.
    database        The name of the Hive database, such as testdb.
    table           The name of the table, such as test.
    Hive Column     The name of the column, such as a. Set this parameter to an asterisk (*) to match all columns.
    Select Group    The user group to which the policy applies.
    Select User     The user to whom the policy applies, such as foo.
    Permissions     The permissions to grant, such as select.
  4. Click Add.
    After the policy is added, the authorization is complete. User foo can then query column a of the testdb.test table.
    Note: After you add, remove, or modify a policy, it can take up to one minute for the change to take effect.
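
The fields in the form above can also be pictured as one policy document. The following Python sketch builds such a document for the example in this topic, assuming the JSON layout used by the public REST API of Ranger. This topic describes the web UI only, so treat the layout, the helper name select_policy, and the policy name foo-select-testdb-test-a as assumptions made for illustration.

```python
import json


def select_policy(service, name, database, table, column, user):
    """Return a Ranger-style policy document that grants select on one column.

    Hypothetical helper; the field names assume the Ranger policy JSON layout.
    """
    return {
        "service": service,  # the Ranger service, emr-hive in this topic
        "name": name,        # Policy Name
        "resources": {
            "database": {"values": [database]},  # database
            "table": {"values": [table]},        # table
            "column": {"values": [column]},      # Hive Column
        },
        "policyItems": [
            {
                "users": [user],  # Select User
                "accesses": [{"type": "select", "isAllowed": True}],  # Permissions
            }
        ],
    }


# The policy name is made up for this example.
policy = select_policy("emr-hive", "foo-select-testdb-test-a", "testdb", "test", "a", "foo")
print(json.dumps(policy, indent=2))
```

The sketch only prints the document and does not send it anywhere. It is meant to show how Policy Name, database, table, Hive Column, Select User, and Permissions relate to one another.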