E-MapReduce: Enable Spark in Ranger and configure related permissions

Last Updated: Oct 30, 2023

This topic describes how to enable Spark in Ranger and how to configure related permissions.

Prerequisites

A DataLake cluster is created and the Ranger service is selected for the cluster. For more information about how to create a cluster, see Create a cluster.

Precautions

  • After you enable Spark in Ranger, the Spark plug-in of Ranger is loaded into Spark Thrift Server. Permission verification is therefore triggered only for Spark SQL jobs that are submitted through Spark Thrift Server. Spark jobs that are submitted by any other method bypass permission verification.

    • Access methods that require permission verification

      • Use the Beeline client to access Spark Thrift Server.

      • Use a JDBC URL to access Spark Thrift Server.

    • Access methods that do not require permission verification

      • Use the Spark SQL client to submit Spark SQL jobs.

      • Use Spark-Submit to submit Spark jobs.

      • Use the PySpark client to submit Spark SQL jobs.

  • This topic does not apply to high-security clusters, that is, clusters for which Kerberos authentication is enabled.

  • When you use Kyuubi to access data lake tables, you cannot configure Spark permissions in Ranger.

  • For data lake tables, Spark permissions can be configured in Ranger only for the Hudi and Delta formats, and only in clusters of the following versions:

    • EMR V3.42.0 or a later minor version

    • EMR V5.8.0 or a later minor version
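The precautions above can be read as a small preflight check. The following Python sketch is illustrative only: the method names, the function names, and the version-string format are assumptions made for this example, and "a later minor version" is interpreted as a higher minor number within the same major version.

```python
import re

# Access methods from the precautions above. Only the first group goes
# through Spark Thrift Server, where the Ranger plug-in is loaded.
VERIFIED_METHODS = {"beeline", "jdbc"}
UNVERIFIED_METHODS = {"spark-sql", "spark-submit", "pyspark"}

# Minimum minor version per major version for Hudi and Delta tables.
# Assumption: other major versions are treated as unsupported.
MIN_MINOR_FOR_LAKE_TABLES = {3: 42, 5: 8}


def triggers_permission_check(method: str) -> bool:
    """Return True if the access method is verified by Ranger."""
    name = method.strip().lower()
    if name in VERIFIED_METHODS:
        return True
    if name in UNVERIFIED_METHODS:
        return False
    raise ValueError(f"unknown access method: {method!r}")


def supports_lake_table_permissions(emr_version: str) -> bool:
    """Check a version string such as 'EMR V3.42.0' or 'EMR-5.8.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", emr_version)
    if match is None:
        raise ValueError(f"cannot parse EMR version: {emr_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    required = MIN_MINOR_FOR_LAKE_TABLES.get(major)
    return required is not None and minor >= required


if __name__ == "__main__":
    for method in ("beeline", "jdbc", "spark-sql", "spark-submit", "pyspark"):
        print(method, triggers_permission_check(method))
    for version in ("EMR V3.42.0", "EMR-3.41.1", "EMR-5.8.0"):
        print(version, supports_lake_table_permissions(version))
```

A check like this only mirrors the documented rules. It does not query the cluster, so the actual cluster version and access path still need to be confirmed in the EMR console.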

Enable Spark in Ranger

  1. Go to the Services tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

  2. Enable Spark in Ranger.

    1. On the Services tab of the page that appears, click Status in the Ranger-plugin section.

    2. In the Service Overview section of the Status tab, turn on enableSpark3.

      Note

      In this example, Spark 3 is used.

    3. In the Confirm message, click OK.

  3. Restart Spark Thrift Server.

    1. On the Services tab, click Status in the Spark3 section.

    2. In the Components section, find SparkThriftServer and click Restart in the Actions column.

    3. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    4. In the Confirm message, click OK.
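After Spark Thrift Server restarts, one way to confirm that permission verification is active is to run a statement through Beeline as a user who has not been granted access. The Python sketch below only assembles such a Beeline command line. The host name `master-1-1`, the port `10001`, and the user `test_user` are assumptions for illustration; replace them with the values of your cluster.

```python
import shlex


def thrift_server_jdbc_url(host: str, port: int = 10001,
                           database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL for Spark Thrift Server.

    The default port 10001 is an assumption; check the Spark Thrift
    Server configuration of your cluster for the actual value.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host: str, user: str, query: str,
                    port: int = 10001) -> list:
    """Return the argument list for a one-off Beeline query.

    -u is the JDBC URL, -n the user name, -e the statement to execute.
    """
    return ["beeline", "-u", thrift_server_jdbc_url(host, port),
            "-n", user, "-e", query]


if __name__ == "__main__":
    # Hypothetical host and user, for illustration only.
    cmd = beeline_command("master-1-1", "test_user", "SHOW DATABASES;")
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Because the command goes through Spark Thrift Server, a statement on an object that the user has no permission on is expected to be rejected once a matching policy exists in Ranger.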

Configure permissions

To configure permissions in Ranger, first access the web UI of Ranger. For more information, see Access the web UI of Ranger. Then, click emr-hive in the HADOOP SQL section to configure Spark permissions.

Spark in Ranger and Hive in Ranger use the same emr-hive service for permission management and use the same method for permission configuration. For more information about permission configuration, see Enable Hive in Ranger and configure related permissions.
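This topic configures policies in the Ranger web UI. For orientation, the sketch below shows roughly what a policy on the emr-hive service looks like when expressed in the JSON policy model of the Apache Ranger public REST API. The policy name, database, table, and user are hypothetical, and the code only builds and prints the structure without sending it to any server.

```python
import json


def build_select_policy(policy_name, database, table, users,
                        service="emr-hive", columns=("*",)):
    """Build a Ranger policy dict that grants SELECT on one table.

    Field names follow the Apache Ranger policy JSON model. All values
    passed in are illustrative assumptions, not values from this topic.
    """
    return {
        "service": service,
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    policy = build_select_policy("demo_select", "testdb", "test",
                                 ["test_user"])
    print(json.dumps(policy, indent=2))
```

Because Spark and Hive share the emr-hive service, a policy of this shape would apply to both engines, which is why the Hive topic referenced above covers the configuration details.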