This topic describes how to integrate Spark with Ranger and how to configure related permissions.

Prerequisites

An E-MapReduce (EMR) Hadoop cluster is created, and Ranger was selected from the optional services during cluster creation. For more information about how to create a cluster, see Create a cluster.

Background information

You can integrate Spark with Ranger and use Ranger to control permissions for Spark SQL. The integration applies only to Spark SQL queries that are executed through Spark Thrift Server. For example, you can connect to Spark Thrift Server from the Beeline client or over JDBC and submit a Spark SQL job.
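As a rough illustration of the connection path described above, the following sketch builds a JDBC URL and a Beeline command line for Spark Thrift Server. It only assembles the strings and does not open a connection. The host name emr-header-1, the port 10001, and the helper function names are assumptions made for this example; check your cluster for the actual Spark Thrift Server address. The Beeline options -u (JDBC URL), -n (user name), and -e (query) are standard Beeline flags.

```python
import shlex
from typing import List


def thrift_jdbc_url(host: str, port: int, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL, the scheme Spark Thrift Server accepts."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str, query: str) -> List[str]:
    """Return the argument list for a one-shot Beeline query."""
    # -u: JDBC URL, -n: user name to connect as, -e: SQL statement to run
    return ["beeline", "-u", url, "-n", user, "-e", query]


# Hypothetical host and port; replace them with the values for your cluster.
url = thrift_jdbc_url("emr-header-1", 10001, "testdb")
cmd = beeline_command(url, "foo", "SELECT a FROM testdb.test")

# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

Because the query is submitted through Spark Thrift Server, Ranger evaluates it against the policies that apply to the connecting user (foo in this sketch).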

Integrate Spark SQL with Ranger

  1. In the Alibaba Cloud EMR console, integrate Hive with Ranger. For more information, see Hive.
    Spark SQL and Hive share permission configurations in Ranger. Before Spark SQL can use the permission configurations of Hive, you must integrate Hive with Ranger.
  2. Enable Spark in Ranger.
    1. Go to the Cluster Management page in the Alibaba Cloud EMR console. Find your cluster and click Details in the Actions column.
    2. In the left-side navigation pane, choose Cluster Service > RANGER.
    3. On the page that appears, choose Actions > EnabledSpark in the upper-right corner.
    4. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
  3. After the task is completed, restart Spark Thrift Server.
    1. In the left-side navigation pane, choose Cluster Service > Spark.
    2. On the page that appears, choose Actions > Restart ThriftServer in the upper-right corner.
    3. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.

Example of permission configuration

For example, you can perform the following steps to grant user foo the Select permission on column a of the testdb.test table:

  1. Log on to Ranger. For more information, see Overview.
  2. Click emr-hive.
  3. Click Add New Policy in the upper-right corner.
  4. Configure permissions.
    Parameter       Description
    Policy Name     The name of the policy. You can specify a custom name.
    database        The name of the Hive database, such as testdb.
    table           The name of the table, such as test.
    Hive Column     The name of the column, such as a. You can set this parameter to an asterisk (*) to indicate all columns.
    Select Group    The user group to which you want to apply this policy.
    Select User     The user to whom you want to apply this policy, such as foo.
    Permissions     The permissions to be granted, such as Select.
  5. Click Add.
    After the policy is added, authorization is complete. User foo can query column a of the testdb.test table.
    Note: After you add, remove, or modify a policy, the change can take up to one minute to take effect.
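For readers who prefer to see the policy as data, the following sketch expresses the same grant as a JSON document. The field names follow the policy model used by the public REST API (v2) of Apache Ranger. Whether the Ranger deployment in your EMR cluster accepts this payload unchanged is an assumption, and the policy name and the helper function are hypothetical. The sketch only builds and prints the payload; it does not send it to Ranger.

```python
import json
from typing import Dict, List


def hive_select_policy(service: str, name: str, database: str, table: str,
                       columns: List[str], users: List[str]) -> Dict:
    """Build a Ranger-style policy that grants Select on the given columns."""
    return {
        "service": service,   # Ranger service the policy belongs to
        "name": name,         # policy name (hypothetical in this example)
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "users": list(users),
                "groups": [],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


# Values taken from the example above: user foo, column a of testdb.test.
policy = hive_select_policy(
    "emr-hive", "foo-select-testdb-test-a", "testdb", "test", ["a"], ["foo"]
)
print(json.dumps(policy, indent=2))
```

Each key under resources corresponds to the database, table, and Hive Column parameters in the console form, and policyItems corresponds to the Select User and Permissions parameters.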