This topic describes how to integrate Spark with Ranger and how to configure related permissions.

Background information

You can integrate Spark with Ranger and use Ranger to manage Spark SQL permissions. The integration takes effect only when Spark SQL queries are executed through Spark Thrift Server, for example, when you connect to Spark Thrift Server from the Beeline client or a Java Database Connectivity (JDBC) client and submit a Spark SQL job.
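
As a rough illustration of the Beeline path, the following Python sketch assembles a Beeline command line for Spark Thrift Server. The host name emr-header-1, port 10001, user name hadoop, and the helper beeline_command are assumptions made for this example and are not taken from this topic. Replace them with the values of your cluster.

```python
import shlex


def beeline_command(host, port, user, sql):
    """Build the argument list for a Beeline call against Spark Thrift Server."""
    # Spark Thrift Server speaks the HiveServer2 protocol, so the JDBC URL
    # uses the jdbc:hive2 scheme.
    url = f"jdbc:hive2://{host}:{port}"
    # -u: JDBC URL, -n: user name, -e: SQL statement to execute
    return ["beeline", "-u", url, "-n", user, "-e", sql]


# Hypothetical values: adjust host, port, and user to match your cluster.
cmd = beeline_command("emr-header-1", 10001, "hadoop", "SHOW DATABASES")

# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

The sketch only prints the command and does not run it. Queries submitted this way go through Spark Thrift Server, so the Ranger policies described in this topic apply to them.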

Prerequisites

An E-MapReduce (EMR) Hadoop cluster is created, and Ranger was selected from the optional services during cluster creation. For more information, see Create a cluster.

Usage notes

This topic does not apply to high-security clusters, which refer to clusters with Kerberos authentication enabled.

Integrate Spark SQL with Ranger

  1. In the Alibaba Cloud EMR console, integrate Hive with Ranger. For more information, see Hive.
    Spark SQL and Hive share permission configurations in Ranger. Spark SQL can use the permission configurations of Hive only after Hive is integrated with Ranger.
  2. Enable Spark in Ranger.
    1. Go to the Cluster Management page in the Alibaba Cloud EMR console. Find your cluster and click Details in the Actions column.
    2. In the left-side navigation pane, choose Cluster Service > RANGER.
    3. On the page that appears, choose Actions > EnabledSpark in the upper-right corner.

  3. After the task is completed, restart Spark Thrift Server.
    1. In the left-side navigation pane, choose Cluster Service > Spark.
    2. On the page that appears, choose Actions > Restart ThriftServer in the upper-right corner.

Example of permission configuration

For example, you can perform the following steps to grant user foo the Select permission on column a of the testdb.test table:

  1. Log on to Ranger. For more information, see Overview.
  2. Click emr-hive.
    The following figure shows the web UI of Ranger 2.1.0.
    [Figure: emr-hive service in the web UI of Ranger 2.1.0]
  3. Click Add New Policy in the upper-right corner.
  4. Configure permissions.
    Parameter      Description
    Policy Name    The name of the policy. You can customize a name.
    database       The name of the Hive database, such as testdb.
    table          The name of the table, such as test.
    Hive Column    The name of the column. You can set this parameter to an asterisk (*) to indicate all columns.
    Select Group   The user group to which you want to add this policy.
    Select User    The user to whom you want to add this policy.
    Permissions    The permissions to be granted.
  5. Click Add.
    After the policy is added, authorization is completed. User foo can query column a of the testdb.test table.
    Note: After you add, remove, or modify a policy, it takes about one minute for the change to take effect.
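
The policy configured in the preceding steps can also be pictured as a JSON document. The following Python sketch builds such a document for user foo, column a, and the testdb.test table. The field names follow the policy model of the Apache Ranger public REST API as an assumption and are not confirmed by this topic. The helper build_select_policy and the generated policy name are hypothetical, and the sketch does not send anything to Ranger.

```python
import json


def build_select_policy(service, database, table, column, user):
    """Return a dict shaped like a Ranger Hive access policy (assumed schema)."""
    return {
        "service": service,
        # Hypothetical naming scheme; any custom policy name works.
        "name": f"{database}.{table}.{column}-select-{user}",
        # Resources mirror the database, table, and Hive Column parameters.
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": [column]},
        },
        # One policy item: the selected user and the granted permission.
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


policy = build_select_policy("emr-hive", "testdb", "test", "a", "foo")
print(json.dumps(policy, indent=2))
```

Reading the document side by side with the parameter table makes it easier to see which value ends up in which field. To cover all columns, the column value would be an asterisk (*), as described for the Hive Column parameter.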