This topic describes how to integrate Hive with Ranger and how to configure related permissions.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.
Note: You must select an EMR version later than V3.26.0 when you create the cluster.

Hive access modes

You can access Hive data in three modes: HiveServer2, Hive Client, and HDFS.
  • HiveServer2
    • Scenario: Access Hive data by using HiveServer2.
    • Method: Use the Beeline client or Java Database Connectivity (JDBC) code to run Hive scripts.
    • Permission settings:

      The built-in authorization mechanism of Hive can be used for access control in this scenario. For more information about the authorization mechanism, see Hive.

      You can also grant table-level or column-level permissions in Ranger. If users also access Hive data by using Hive Client or HDFS, you must configure additional permissions for those modes.

  • Hive Client
    • Scenario: Access Hive data by using Hive Client.
    • Method: Use Hive Client to access Hive data.
    • Permission settings:

      In this scenario, Hive Client sends DDL requests such as ALTER TABLE ADD COLUMNS to the Hive metastore. Hive Client also submits MapReduce jobs to read data from HDFS.

      The built-in authorization mechanism of Hive can be used for access control in this scenario. Hive checks whether you can perform a DDL or DML operation, such as ALTER TABLE test ADD COLUMNS(b STRING), based on your read or write permissions on the HDFS path of the table referenced in the SQL statement. For more information about the authorization mechanism, see Hive.

      To control access in this scenario, configure permissions on the HDFS paths of Hive tables in Ranger and configure storage-based authorization for the Hive metastore.

      Note: In this scenario, DDL operation permissions depend on HDFS permissions. If you have the required permissions on the HDFS path of a table, you can perform DDL operations on that table, such as DROP TABLE and ALTER TABLE.
  • HDFS
    • Scenario: Access Hive data by using HDFS.
    • Method: Use an HDFS client or run HDFS code to access Hive data.
    • Permission settings:

      You must configure permissions on HDFS paths of Hive tables.

      You can use Ranger to configure the permissions. For more information, see Example of permission configuration.
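
Because the Hive Client and HDFS modes are controlled through permissions on the HDFS paths of Hive tables, it helps to know which path a table maps to. The following is a minimal sketch, assuming the table is a managed table stored under the default Hive warehouse directory /user/hive/warehouse (an assumption; check hive.metastore.warehouse.dir on your cluster, and note that tables created with a custom LOCATION do not follow this layout). The function name table_hdfs_path is hypothetical.

```python
import posixpath

# Assumption: the default Hive warehouse directory. Verify the value of
# hive.metastore.warehouse.dir on your cluster before relying on it.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def table_hdfs_path(database: str, table: str,
                    warehouse: str = DEFAULT_WAREHOUSE) -> str:
    """Return the conventional HDFS directory of a managed Hive table."""
    db = database.lower()
    tbl = table.lower()
    if db == "default":
        # Tables in the default database sit directly under the warehouse directory.
        return posixpath.join(warehouse, tbl)
    # Other databases get a "<database>.db" subdirectory.
    return posixpath.join(warehouse, db + ".db", tbl)


print(table_hdfs_path("testdb", "test"))  # /user/hive/warehouse/testdb.db/test
```

The resulting path is the resource on which you would configure HDFS permissions in Ranger for the testdb.test table.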

Integrate Hive with Ranger

  1. Enable Hive in Ranger.
    1. Log on to the EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
    5. In the left-side navigation pane, click Cluster Service and then RANGER.
    6. On the page that appears, choose Actions > EnabledHive in the upper-right corner.
    7. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
      Note: In the HiveServer2 scenario, after Hive is restarted, you must configure Hive permissions in Ranger. In the Hive Client scenario, you must use HDFS permissions for access control. For more information about how to configure HDFS permissions, see HDFS.
  2. Add the Hive service on the web UI of Ranger.
    1. Log on to Ranger. For more information, see Overview.
    2. Add the Hive service.
    3. Configure relevant parameters.
      Parameters and their descriptions:
      • Service Name: Set the value to emr-hive.
      • Username: Set the value to hadoop.
      • Password: Enter a custom password.
      • jdbc.driverClassName: The class name of the JDBC driver. Default value: org.apache.hive.jdbc.HiveDriver. Use the default value.
      • jdbc.url: Enter a value based on the cluster type.
        • Enter jdbc:hive2://emr-header-1:10000/ for a standard cluster.
        • Enter jdbc:hive2://${master1_fullhost}:10000/;principal=hive/${master1_fullhost}@EMR.$id.COM for a high-security cluster.
        Note: ${master1_fullhost} indicates the long domain name of master 1. You can log on to master 1 and run the hostname command to obtain the value of ${master1_fullhost}. The number in ${master1_fullhost} is the value of $id.
      • Add New Configurations: Add the following configuration item.
        • Name: Set the value to policy.download.auth.users.
        • Value: Set the value to hadoop for a standard cluster and hive for a high-security cluster.
    4. Click Add.
  3. Restart Hive.
    Restart Hive for the preceding settings to take effect.
    1. In the left-side navigation pane, choose Cluster Service > Hive.
    2. On the page that appears, choose Actions > Restart All Components in the upper-right corner.
    3. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
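
The values that differ between standard and high-security clusters when you add the Hive service can be summarized in code. The following is a minimal sketch that only assembles the strings described above; it does not contact Ranger or Hive. The function names, the cluster_id parameter (the value of $id), and the sample host name emr-header-1.cluster-12345 are hypothetical and used for illustration only.

```python
def hive_jdbc_url(high_security: bool = False,
                  master1_fullhost: str = "",
                  cluster_id: str = "") -> str:
    """Build the jdbc.url value for the Hive service in Ranger."""
    if not high_security:
        # Standard clusters use the fixed header host name.
        return "jdbc:hive2://emr-header-1:10000/"
    if not master1_fullhost or not cluster_id:
        raise ValueError(
            "high-security clusters need master1_fullhost and cluster_id")
    # High-security clusters add the Kerberos principal of HiveServer2.
    return (f"jdbc:hive2://{master1_fullhost}:10000/;"
            f"principal=hive/{master1_fullhost}@EMR.{cluster_id}.COM")


def policy_download_user(high_security: bool = False) -> str:
    """Return the value of policy.download.auth.users for the cluster type."""
    return "hive" if high_security else "hadoop"


print(hive_jdbc_url())
# Hypothetical host name and ID; use the output of the hostname command on master 1.
print(hive_jdbc_url(True, "emr-header-1.cluster-12345", "12345"))
print(policy_download_user(True))
```

Passing the ID explicitly, rather than parsing it out of the host name, avoids guessing the host name format.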

Example of permission configuration

For example, you can perform the following steps to grant user foo the Select permission on column a of the testdb.test table:

  1. Log on to Ranger. For more information, see Overview.
  2. Click emr-hive.
  3. Click Add New Policy in the upper-right corner.
  4. Configure permissions.
    Parameters and their descriptions:
    • Policy Name: The name of the policy. You can specify a custom name.
    • database: The name of the Hive database, such as testdb.
    • table: The name of the table, such as test.
    • Hive Column: The name of the column, such as a. You can set this parameter to an asterisk (*) to indicate all columns.
    • Select Group: The user group to which the policy applies.
    • Select User: The user to whom the policy applies, such as foo.
    • Permissions: The permissions to be granted, such as Select.
  5. Click Add.
    After the policy is added, the authorization is complete. User foo can then query column a of the testdb.test table.
    Note: After you add, remove, or modify a policy, it can take up to one minute for the configuration to take effect.
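
The same policy can also be described as data. The following is a minimal sketch of a policy document for the example above, assuming the JSON policy model used by the public REST API of Apache Ranger (fields such as service, resources, and policyItems); verify the field names against your Ranger version before use. The sketch only builds and prints the document and does not send it to Ranger. The function name select_policy and the policy name are hypothetical.

```python
import json


def select_policy(service: str, name: str, database: str,
                  table: str, column: str, user: str) -> dict:
    """Build a policy that grants one user the select permission on a column."""
    return {
        "service": service,   # the Hive service added in Ranger, such as emr-hive
        "name": name,         # hypothetical policy name
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": [column]},  # use "*" for all columns
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


policy = select_policy("emr-hive", "foo-select-testdb-test-a",
                       "testdb", "test", "a", "foo")
print(json.dumps(policy, indent=2))
```

Keeping policies as data like this makes it easier to review which user has which permission on which column before you enter the values on the web UI of Ranger.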