This topic describes how to integrate Hive with Ranger and how to configure related permissions.

Prerequisites

An E-MapReduce (EMR) Hadoop cluster is created, and Ranger is selected from the optional services when you create the cluster. For more information, see Create a cluster.

Hive data access methods

You can access Hive data by using HiveServer2, Hive Client, or HDFS. The permissions that you must configure depend on the access method.
  • HiveServer2
    • Scenario: Access Hive data by using HiveServer2.
    • Method: Use the Beeline client or Java Database Connectivity (JDBC) code to run Hive scripts.
    • Permission settings:

      In this scenario, you can use the built-in authorization mechanism of Hive to configure access control policies. For more information about the authorization mechanism, see Hive.

      You can grant table-level or column-level permissions in Ranger. If you want to access Hive data by using Hive Client or HDFS, you must configure additional permissions.

  • Hive Client
    • Scenario: Access Hive data by using Hive Client.
    • Method: Use Hive Client to access Hive data.
    • Permission settings:

      In this scenario, Hive Client sends DDL requests such as ALTER TABLE ADD COLUMNS to the Hive metastore. Hive Client also submits MapReduce jobs that read data from HDFS.

      In this scenario, you can use the built-in authorization mechanism of Hive to configure access control policies. Hive determines whether you can perform a DDL or DML operation, such as ALTER TABLE test ADD COLUMNS(b STRING), based on your read and write permissions on the HDFS path of the table that the SQL statement references. For more information about the authorization mechanism, see Hive.

      To manage access control when Hive Client is used to access Hive data, configure permissions on the HDFS paths of Hive tables in Ranger and configure storage-based authorization for the Hive metastore.

      Note In this scenario, permissions for DDL operations depend on HDFS permissions. To perform DDL operations such as DROP TABLE and ALTER TABLE, you must have the required permissions on the HDFS paths of the tables.
  • HDFS
    • Scenario: Access Hive data by using HDFS.
    • Method: Use an HDFS client or HDFS API code to read the data files of Hive tables directly.
    • Permission settings:

      You must configure permissions on the HDFS paths of Hive tables.

      You can use Ranger to configure the permissions. For more information, see Examples of permission configurations.
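
The Hive Client and HDFS scenarios both rely on permissions on the HDFS paths of Hive tables, and storage-based authorization is enabled on the Hive metastore through hive-site.xml. The following Python sketch renders the metastore properties that the Apache Hive documentation lists for storage-based authorization, and computes the default HDFS location of a managed table so that you know which path an HDFS policy in Ranger must cover. It assumes the default warehouse directory /user/hive/warehouse (check hive.metastore.warehouse.dir on your cluster). How you apply these properties on an EMR cluster is not covered here, so verify the values against your Hive version before use.

```python
import xml.etree.ElementTree as ET

# Metastore properties for storage-based authorization, as documented by
# Apache Hive. Verify names and values against your Hive version.
STORAGE_BASED_AUTH_PROPS = {
    "hive.metastore.pre.event.listeners":
        "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener",
    "hive.security.metastore.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider",
    "hive.security.metastore.authenticator.manager":
        "org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator",
    "hive.security.metastore.authorization.auth.reads": "true",
}


def render_hive_site(props):
    """Render a dict of properties as a hive-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def managed_table_path(database, table, warehouse="/user/hive/warehouse"):
    """Default HDFS location of a managed table (assumes the default layout).

    Tables in the default database live directly under the warehouse
    directory; other databases get a <database>.db subdirectory. Hive stores
    names in lowercase. Tables created with an explicit LOCATION clause are
    stored elsewhere, so check DESCRIBE FORMATTED for those.
    """
    if database.lower() == "default":
        return f"{warehouse}/{table.lower()}"
    return f"{warehouse}/{database.lower()}.db/{table.lower()}"


if __name__ == "__main__":
    print(render_hive_site(STORAGE_BASED_AUTH_PROPS))
    print(managed_table_path("testdb", "test"))  # /user/hive/warehouse/testdb.db/test
```

For the testdb.test table used in the examples, an HDFS policy would therefore need to cover /user/hive/warehouse/testdb.db/test, assuming that the table is a managed table in the default warehouse.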

Integrate Hive with Ranger

  1. Enable Hive in Ranger.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
    5. In the left-side navigation pane, click Cluster Service and then RANGER.
    6. On the page that appears, choose Actions > Enable Hive in the upper-right corner.
    7. Perform the following operations on the cluster:
      Click History in the upper-right corner to view the task progress.
      Note When you use HiveServer2, configure Hive permissions in Ranger after Hive is enabled in Ranger. When you use Hive Client, access control relies on HDFS permissions. For more information about how to configure HDFS permissions, see HDFS.
  2. Add the Hive service in the Ranger web UI.
    1. Log on to Ranger. For more information, see Overview.
    2. Add the Hive service.
      The following figure shows the web UI of Ranger 2.1.0.
    3. Configure the parameters.
      • Service Name: Set the value to emr-hive.
      • Username: Set the value to hadoop.
      • Password: Enter a custom password.
      • jdbc.driverClassName: The class name of the JDBC driver. Use the default value org.apache.hive.jdbc.HiveDriver.
      • jdbc.url:
        • Enter jdbc:hive2://emr-header-1:10000/ for a standard cluster.
        • Enter jdbc:hive2://${master1_fullhost}:10000/;principal=hive/${master1_fullhost}@EMR.$id.COM for a high-security cluster.
      Note ${master1_fullhost} specifies the fully qualified domain name of master 1. To obtain this value, log on to master 1 and run the hostname command. The value of $id is the same as the ID in ${master1_fullhost}.
      • Add New Configurations:
        • Name: Set the value to policy.download.auth.users.
        • Value: Set the value to hadoop for a standard cluster or to hive for a high-security cluster.
    4. Click Add.
  3. Restart Hive.
    Restart Hive to make the preceding configurations take effect.
    1. In the left-side navigation pane, choose Cluster Service > Hive.
    2. On the page that appears, choose Actions > Restart All Components in the upper-right corner.
    3. Perform the following operations on the cluster:
      1. In the Cluster Activities dialog box, specify Description and click OK.
      2. In the Confirm message, click OK.
      3. Click History in the upper-right corner to view the task progress.
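
The service parameters in step 2 can also be expressed as a Ranger service definition, which is useful if you script the setup. The following Python sketch builds a JSON body in the shape that the Apache Ranger public REST API (POST /service/public/v2/api/service) accepts for creating a service. It only prints the payload and does not send it. The password, hostname, and cluster ID are hypothetical placeholders. Whether EMR supports creating the emr-hive service through the REST API instead of the web UI is an assumption, so compare the output with a service that you create in the web UI.

```python
import json


def hive_jdbc_url(high_security=False, master1_fullhost=None, cluster_id=None):
    """Build the jdbc.url value for a standard or high-security cluster."""
    if not high_security:
        return "jdbc:hive2://emr-header-1:10000/"
    if not master1_fullhost or not cluster_id:
        raise ValueError("high-security clusters need master1_fullhost and cluster_id")
    return (f"jdbc:hive2://{master1_fullhost}:10000/;"
            f"principal=hive/{master1_fullhost}@EMR.{cluster_id}.COM")


def hive_service_payload(password, high_security=False,
                         master1_fullhost=None, cluster_id=None):
    """Assemble a Ranger service definition for the emr-hive service."""
    return {
        "name": "emr-hive",
        "type": "hive",
        "isEnabled": True,
        "configs": {
            "username": "hadoop",
            "password": password,
            "jdbc.driverClassName": "org.apache.hive.jdbc.HiveDriver",
            "jdbc.url": hive_jdbc_url(high_security, master1_fullhost, cluster_id),
            # Users that are allowed to download this service's policies.
            "policy.download.auth.users": "hive" if high_security else "hadoop",
        },
    }


if __name__ == "__main__":
    # Hypothetical values for a high-security cluster.
    payload = hive_service_payload(
        password="<your-password>",
        high_security=True,
        master1_fullhost="emr-header-1.cluster-12345",
        cluster_id="12345",
    )
    print(json.dumps(payload, indent=2))
```

Keeping the standard and high-security differences (jdbc.url and policy.download.auth.users) in one function makes it harder to mix values from the two cluster types.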

Examples of permission configurations

  • Example 1: Grant user foo the Select permission on column a of the testdb.test table
    1. Log on to Ranger. For more information, see Overview.
    2. Click emr-hive.
      The following figure shows the web UI of Ranger 2.1.0.
    3. Click Add New Policy in the upper-right corner.
    4. Configure permissions.
      • Policy Name: The name of the policy. You can specify a custom name.
      • database: The name of the Hive database. In this example, enter testdb.
      • table: The name of the table. In this example, enter test.
      • Hive Column: The name of the column. In this example, enter a. To specify all columns, enter an asterisk (*).
      • Select Group: The user groups to which the policy applies.
      • Select User: The users to whom the policy applies. In this example, select foo.
      • Permissions: The permissions that you want to grant. In this example, select the Select permission.
    5. Click Add.
      After the policy is added, the authorization is complete: user foo can query column a of the testdb.test table.
      Note After you add, remove, or modify a policy, it takes about one minute for the configuration to take effect.
  • Example 2: Configure permissions on a URL
    Note The web UI of Ranger varies based on different versions. In this example, Ranger 2.1.0 is used.
    After you configure access control policies for Hive in Ranger, Ranger also verifies permissions on the URL of an external data source, such as Object Storage Service (OSS), when you use an external table to access data in that source. To configure a policy for permissions on a URL, perform the following steps in the Policy Details section of the Create Policy dialog box:
      1. Switch the resource type from database to url.
      2. In the url field, specify the path that stores the external data.
      3. Grant read and write permissions to the user.
    If you do not want to enforce URL permission checks for external data, use the default policy whose Policy Name is all - url in the Policy Details section, and select public from the Select Group drop-down list in the Allow Conditions section. This configuration allows all users to pass the URL permission check.
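
Both examples correspond to Ranger policy objects. The following Python sketch builds policy JSON in the shape that the Apache Ranger public REST API (POST /service/public/v2/api/policy) accepts, which can help if you script the same policies. It only prints the payloads. The policy names and the OSS path oss://examplebucket/external/ are hypothetical, and setting isRecursive to true on the url resource is an assumption. Compare the output with a policy exported from your own Ranger instance before you rely on it.

```python
import json


def _resource(values, recursive=False):
    """One Ranger policy resource entry, such as a database or url."""
    return {"values": list(values), "isExcludes": False, "isRecursive": recursive}


def _policy_item(accesses, users=(), groups=()):
    """Allow condition that grants the given access types to users and groups."""
    return {
        "users": list(users),
        "groups": list(groups),
        "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        "delegateAdmin": False,
    }


def column_policy(name, database, table, columns, accesses,
                  users=(), groups=(), service="emr-hive"):
    """Policy on specific columns of a Hive table (Example 1)."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            "database": _resource([database]),
            "table": _resource([table]),
            "column": _resource(columns),
        },
        "policyItems": [_policy_item(accesses, users, groups)],
    }


def url_policy(name, urls, users=(), groups=(), accesses=("read", "write"),
               service="emr-hive"):
    """Policy on external data locations such as OSS paths (Example 2)."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        # Assumption: recursive matching so that files under the path are covered.
        "resources": {"url": _resource(urls, recursive=True)},
        "policyItems": [_policy_item(accesses, users, groups)],
    }


if __name__ == "__main__":
    # Example 1: user foo may run SELECT on column a of testdb.test.
    print(json.dumps(column_policy("foo-testdb-test-a", "testdb", "test",
                                   ["a"], ["select"], users=["foo"]), indent=2))
    # Example 2: read and write on a hypothetical OSS path for user foo.
    print(json.dumps(url_policy("foo-oss-external",
                                ["oss://examplebucket/external/"],
                                users=["foo"]), indent=2))
```

The allow condition of the all - url configuration corresponds to a policy item with groups=["public"]. In practice, you would edit the existing all - url policy instead of creating a second one.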