This topic describes how to integrate Hive with Ranger and how to configure related permissions.
Prerequisites
An E-MapReduce (EMR) Hadoop cluster is created, and Ranger is selected from the optional services when you create the cluster. For more information, see Create a cluster.
Hive data access methods
- HiveServer2
- Scenario: Access Hive data by using HiveServer2.
- Method: Use the Beeline client or Java Database Connectivity (JDBC) code to run Hive scripts.
- Permission settings:
In this scenario, you can use the built-in authorization mechanism of Hive to configure access control policies. For more information about the authorization mechanism, see Hive.
You can grant table-level or column-level permissions in Ranger. If you want to access Hive data by using Hive Client or HDFS, you must configure additional permissions.
- Hive Client
- Scenario: Access Hive data by using Hive Client.
- Method: Use Hive Client to access Hive data.
- Permission settings:
In this scenario, Hive Client sends DDL requests such as ALTER TABLE ADD COLUMNS to Hive metastore. Hive Client also submits MapReduce jobs to read data from HDFS.
In this scenario, you can use the built-in authorization mechanism of Hive to configure access control policies. Hive checks whether you can perform a DDL or DML operation, such as ALTER TABLE test ADD COLUMNS(b STRING), based on the read or write permissions granted on the HDFS path of the table specified in the SQL statement. For more information about the authorization mechanism, see Hive.
You can configure permissions on HDFS paths of Hive tables in Ranger, and you can configure storage-based authorization for Hive metastore. This way, you can manage access control policies when you use Hive Client to access Hive data.
Note: In this scenario, DDL operation permissions depend on HDFS permissions. To perform DDL operations such as DROP TABLE and ALTER TABLE, you must have the required permissions on the HDFS path of the table.
- HDFS
- Scenario: Access Hive data by using HDFS.
- Method: Use an HDFS client or run HDFS code to access Hive data.
- Permission settings:
You must configure permissions on HDFS paths of Hive tables.
You can use Ranger to configure the permissions. For more information, see Examples of permission configurations.
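For the HiveServer2 access method described above, the following is a minimal sketch of how a Beeline invocation might be assembled. The host name emr-header-1, the port 10000, and the helper name beeline_command are assumptions made for illustration; replace them with the values of your cluster. The sketch only builds the command line and does not run it.

```python
import shlex


def beeline_command(host, database, user, query, port=10000):
    """Build a Beeline argument list for a HiveServer2 JDBC connection.

    The host and port are placeholders (assumptions), not values from this topic.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # -u: JDBC URL, -n: user name, -e: query to run
    return ["beeline", "-u", url, "-n", user, "-e", query]


# Hypothetical check of the access granted to user foo on testdb.test.
cmd = beeline_command("emr-header-1", "testdb", "foo", "SELECT a FROM test")
print(shlex.join(cmd))
```

Because the request goes through HiveServer2, the table-level or column-level policies configured in Ranger decide whether the query is allowed.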
Integrate Hive with Ranger
Examples of permission configurations
- Example 1: Grant user foo the Select permission on column a of the testdb.test table
- Log on to Ranger. For more information, see Overview.
- Click emr-hive. The following figure shows the web UI of Ranger 2.1.0.
- Click Add New Policy in the upper-right corner.
- Configure permissions.
  Parameter: Description
  Policy Name: The name of the policy. You can customize a name.
  database: The name of the Hive database, such as testdb.
  table: The name of the table, such as test.
  Hive Column: The name of the column. You can set the value of this parameter to an asterisk (*) to indicate all columns.
  Select Group: The user group to which you want to add this policy.
  Select User: The user to whom you want to add this policy.
  Permissions: The permissions that you want to grant.
- Click Add.
After you add the policy, the authorization is complete. User foo can query column a of the testdb.test table.
Note: After you add, remove, or modify a policy, it takes about one minute for the configuration to take effect.
- Example 2: Configure permissions on a URL
Note: The web UI of Ranger varies by version. In this example, Ranger 2.1.0 is used.
After you configure access control policies for Hive in Ranger, permissions on the URL of an external data source are verified when you use an external table to access data in that data source, such as Object Storage Service (OSS).
To configure a policy for permissions on a URL, perform the following steps: In the Policy Details section of the Create Policy dialog box, switch from database to url, specify the path that stores the external data in the url field, and then grant read and write permissions to a user.
If you do not want to configure URL authentication for external data, use the policy whose Policy Name is the default value all - url in the Policy Details section, and select public from the Select Group drop-down list in the Allow Conditions section. With this configuration, all users pass the URL authentication.
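The two examples above are configured in the web UI. As an illustration only, the same policies could be expressed as JSON bodies in the shape used by the public v2 policy REST API of Apache Ranger. The sketch below builds those bodies without sending them. The policy names, the helper hive_policy, and the OSS path oss://example-bucket/path/to/data are hypothetical, and the field names should be checked against the Ranger version of your cluster.

```python
import json


def hive_policy(name, resources, users, access_types, service="emr-hive"):
    """Build a policy body for the Hive service (emr-hive in this topic)."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": resources,
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


# Example 1: Select permission on column a of testdb.test for user foo.
column_policy = hive_policy(
    name="testdb-test-a-select",  # hypothetical policy name
    resources={
        "database": {"values": ["testdb"]},
        "table": {"values": ["test"]},
        "column": {"values": ["a"]},
    },
    users=["foo"],
    access_types=["select"],
)

# Example 2: read and write permissions on a URL (hypothetical OSS path).
url_policy = hive_policy(
    name="external-data-url",  # hypothetical policy name
    resources={
        "url": {"values": ["oss://example-bucket/path/to/data"], "isRecursive": True}
    },
    users=["foo"],
    access_types=["read", "write"],
)

print(json.dumps(column_policy, indent=2))
print(json.dumps(url_policy, indent=2))
```

Keeping the resource block separate from the policy items mirrors the web UI, where Policy Details and Allow Conditions are filled in separately.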