This topic describes how to integrate Hive with Ranger and how to configure related permissions.

Hive access modes

You can access Hive data in three modes: HiveServer2, Hive Client, and HDFS.
  • HiveServer2
    • Scenario: Access Hive data by using HiveServer2.
    • Method: Use the Beeline client or Java Database Connectivity (JDBC) code to run Hive scripts.
    • Permission settings:

      The built-in authorization mechanism of Hive can be used for access control in this scenario. For more information about the authorization mechanism, see Hive authorization.

      You can also grant table- or column-level permissions in Ranger. These permissions apply only to access through HiveServer2. If you access Hive data by using Hive Client or HDFS, you must also configure permissions on the HDFS paths of the Hive tables.

  • Hive Client
    • Scenario: Access Hive data by using Hive Client.
    • Method: Use Hive Client to access Hive data.
    • Permission settings:

      In this scenario, Hive Client sends DDL requests such as ALTER TABLE ADD COLUMNS to Hive metastore. Hive Client also submits MapReduce jobs to read data from HDFS.

      The built-in authorization mechanism of Hive can be used for access control in this scenario. Hive checks whether you can perform DDL or DML operations, such as ALTER TABLE test ADD COLUMNS(b STRING), based on your read or write permissions on the HDFS path of the target table in the SQL statement. For more information about the authorization mechanism, see Hive authorization.

      To control access in the Hive Client scenario, configure permissions on the HDFS paths of Hive tables in Ranger and configure Storage Based Authorization for Hive metastore.

      Note: In this scenario, DDL operation permissions depend on HDFS permissions. If you have the required permissions on the HDFS path of a table, you can perform DDL operations on the table, such as DROP TABLE and ALTER TABLE.
  • HDFS
    • Scenario: Access Hive data by using HDFS.
    • Method: Use an HDFS client or run HDFS code to access Hive data.
    • Permission settings:

      You must configure permissions on HDFS paths of Hive tables.

      You can use Ranger to configure the permissions. For more information, see Permission configuration example.
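
The Hive Client scenario above relies on Storage Based Authorization for Hive metastore. The following is a minimal sketch of the metastore properties that Apache Hive documents for this feature, rendered as a hive-site.xml fragment. This topic does not state which of these properties E-MapReduce sets for you, so treat the property list and the helper name render_hive_site as illustrative assumptions and check your cluster's Hive configuration before you change anything.

```python
from xml.etree import ElementTree as ET

# Metastore-side properties described in the Apache Hive documentation for
# Storage Based Authorization. Assumption: whether your cluster already sets
# them is not covered by this topic.
SBA_PROPERTIES = {
    "hive.metastore.pre.event.listeners":
        "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener",
    "hive.security.metastore.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider",
    "hive.security.metastore.authenticator.manager":
        "org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator",
}


def render_hive_site(properties):
    """Render a name-to-value mapping as a hive-site.xml <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hive_site(SBA_PROPERTIES))
```

With these properties in place, the metastore checks each DDL request against the permissions of the requesting user on the HDFS path of the table, which is why the Ranger policies on HDFS paths take effect for Hive Client.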

Integrate Hive with Ranger

  1. Enable Hive in Ranger.
    1. Log on to the Alibaba Cloud E-MapReduce console.
    2. In the top navigation bar, select the region where your cluster resides. Select the resource group as required. By default, all resources of the account appear.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page that appears, find the target cluster and click Details in the Actions column.
    5. In the left-side navigation pane, click Cluster Service and then RANGER.
    6. Select Enable Hive from the Actions drop-down list in the upper-right corner.
    7. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
      Note: After Hive is restarted, configure Hive permissions in Ranger for the HiveServer2 scenario. For the Hive Client scenario, use HDFS permissions to control access to Hive data. For more information about how to configure HDFS permissions, see HDFS.
  2. Add the Hive service on the web UI of Ranger.
    1. Log on to Ranger. For more information, see Overview.
    2. Add the Hive service.
    3. Configure required parameters.
      • Service Name: Set the value to emr-hive.
      • Username: Set the value to hadoop.
      • Password: Enter a custom password.
      • jdbc.driverClassName: The class name of the JDBC driver. Default value: org.apache.hive.jdbc.HiveDriver. Use the default value.
      • jdbc.url:
        • Enter jdbc:hive2://emr-header-1:10000/ for a standard cluster.
        • Enter jdbc:hive2://${master1_fullhost}:10000/;principal=hive/${master1_fullhost}@EMR.$id.COM for a high-security cluster.
        Note: ${master1_fullhost} indicates the long domain name of master 1. You can log on to master 1 and run the hostname command to obtain the value of ${master1_fullhost}. The number in ${master1_fullhost} is the value of $id.
      • Add New Configurations:
        • Name: Set the value to policy.download.auth.users.
        • Value: Set the value to hadoop for a standard cluster and hive for a high-security cluster.
      Note: If the connectivity test fails, you can ignore the failure.
    4. Click Add.
  3. Restart Hive.
    Restart Hive for the preceding settings to take effect.
    1. In the left-side navigation pane, click Cluster Service and then Hive.
    2. Select Restart All Components from the Actions drop-down list in the upper-right corner.
    3. In the Cluster Activities dialog box that appears, set related parameters and click OK.
      Click History in the upper-right corner to view the task progress.
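
The two jdbc.url formats listed for the Hive service can be sketched as a small helper. The function name build_hive_jdbc_url and the sample host emr-header-1.cluster-12345 are hypothetical and used only for illustration. On a real high-security cluster, take the host from the output of the hostname command on master 1 and the ID from the number in that host name.

```python
def build_hive_jdbc_url(master_host="emr-header-1", cluster_id=None, port=10000):
    """Return the jdbc.url value to enter for the Hive service in Ranger.

    cluster_id=None  -> standard cluster (short host name is enough)
    cluster_id="..." -> high-security cluster; master_host must then be the
                        long domain name of master 1 (${master1_fullhost})
    """
    base = f"jdbc:hive2://{master_host}:{port}/"
    if cluster_id is None:
        return base
    # High-security clusters append the principal of the hive service.
    return f"{base};principal=hive/{master_host}@EMR.{cluster_id}.COM"


if __name__ == "__main__":
    # Standard cluster.
    print(build_hive_jdbc_url())
    # High-security cluster; host name and ID are made-up sample values.
    print(build_hive_jdbc_url("emr-header-1.cluster-12345", "12345"))
```

Passing the ID explicitly avoids guessing it from the host name, because the short name emr-header-1 also contains a digit.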

Permission configuration example

After Hive is integrated with Ranger, you can configure Hive permissions in Ranger. For example, you can perform the following steps to grant user foo the Select permission on column a of the testdb.test table.

  1. Click emr-hive.
  2. Click Add New Policy in the upper-right corner.
  3. Configure permissions.
    • Policy Name: The name of the policy. You can specify a custom name.
    • database: The name of the Hive database, such as testdb.
    • table: The name of the table, such as test.
    • Hive Column: The name of the column, such as a. Set this parameter to an asterisk (*) to match all columns.
    • Select Group: The user group to which the policy applies.
    • Select User: The user to whom the policy applies, such as foo.
    • Permissions: The permissions to grant, such as Select.
  4. Click Add.
    After the policy is added, authorization is completed. User foo can query column a of the testdb.test table.
    Note: After you add, remove, or modify a policy, it can take up to one minute for the configuration to take effect.
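
The same policy can also be described as data. The following sketch builds a policy document in the JSON shape used by the policy model of Apache Ranger (service, resources, policyItems) for the example above. It only constructs and prints the document. The helper name build_hive_policy and the policy name foo-select-a are hypothetical, and submitting the document through the Ranger REST API is an assumption that this topic does not cover.

```python
import json


def build_hive_policy(service, policy_name, database, table, columns,
                      users, groups=(), accesses=("select",)):
    """Build a Ranger Hive policy document as a plain dictionary."""
    return {
        "service": service,            # Ranger service name, such as emr-hive
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "users": list(users),
                "groups": list(groups),
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
        ],
    }


if __name__ == "__main__":
    # Grant user foo the Select permission on column a of testdb.test.
    policy = build_hive_policy(
        service="emr-hive",
        policy_name="foo-select-a",  # hypothetical policy name
        database="testdb",
        table="test",
        columns=["a"],
        users=["foo"],
    )
    print(json.dumps(policy, indent=2))
```

Keeping the policy as data makes it easy to review or version before you enter the same values on the Ranger web UI.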