Hive provides two authorization mechanisms: storage-based authorization and SQL standards-based authorization.

Background information

If users of an E-MapReduce (EMR) cluster access Hive data directly through HDFS or the Hive client, you must enforce access control on the Hive data in HDFS. This way, you can also manage the permissions for Hive SQL operations. For more information, see the Hive documentation.

Note

The two authorization mechanisms are compatible with each other. You can configure them at the same time.

The storage-based authorization mechanism takes effect in the Hive metastore service.

Go to the Configure tab for Hive

  1. Log on to the Alibaba Cloud EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  5. In the left-side navigation pane, choose Cluster Service > Hive.
  6. Click the Configure tab.

Add authorization configuration parameters

  1. In the Service Configuration section, click the hive-site tab.
  2. Click Custom Configuration in the upper-right corner and add the configuration parameters described in the following table.
    Key Value
    hive.metastore.pre.event.listeners org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
    hive.security.metastore.authorization.manager org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
    hive.security.metastore.authenticator.manager org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
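
If you manage configuration files directly rather than through the console, the same settings map to hive-site.xml entries like the following (the property names and values are from the table above; the file location depends on your deployment):

```xml
<!-- Storage-based authorization for the Hive metastore -->
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
```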

Restart Hive metastore

  1. In the left-side navigation pane, choose Cluster Service > Hive. On the page that appears, choose Actions > Restart Hive MetaStore in the upper-right corner.
  2. In the Cluster Activities dialog box, configure relevant parameters and click OK. In the Confirm message, click OK.
    Click History in the upper-right corner to view the task progress.

HDFS access control

For an EMR cluster deployed in Kerberos security mode, the HDFS permissions on the Hive warehouse folder are already configured.

For other clusters, perform the following steps to configure basic HDFS permissions for Hive:

  • Enable HDFS access control.
  • Configure permissions on a warehouse folder for Hive.
    hadoop fs -chmod 1771 /user/hive/warehouse
    You can also run the following command. The leading 1 in 1771 or 1777 sets the sticky bit, which ensures that users cannot delete files or folders created by other users.
    hadoop fs -chmod 1777 /user/hive/warehouse
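
The sticky-bit semantics can be sketched on a local POSIX filesystem, where chmod interprets the leading 1 the same way HDFS does (the /tmp path here is only a demo stand-in for the warehouse folder):

```shell
# Create a demo directory and apply the same mode used for the warehouse folder.
mkdir -p /tmp/warehouse_demo
chmod 1777 /tmp/warehouse_demo
# The trailing "t" in the mode string is the sticky bit: anyone may create
# entries, but only a file's owner (or the directory owner) may delete them.
ls -ld /tmp/warehouse_demo | cut -c1-10
```

This prints `drwxrwxrwt`: world-writable, with the sticky bit shown in place of the final execute bit.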

After you configure the basic permissions, you can grant permissions on the warehouse folder to related users or user groups. For example, you can grant the permissions to create tables, read data from tables, and write data into tables in the warehouse folder.

sudo su has
      # Grant user test the rwx permissions on the warehouse folder.
      hadoop fs -setfacl -m user:test:rwx /user/hive/warehouse
      # Grant group hivegrp the rwx permissions on the warehouse folder.
      hadoop fs -setfacl -m group:hivegrp:rwx /user/hive/warehouse

By default, each user can access only the data of the Hive tables that the user created in HDFS.

Verification

  • Create a table named testtbl as user test.
    hive> create table testtbl(a string);
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=test, access=WRITE, inode="/user/hive/warehouse/testtbl":hadoop:hadoop:drwxrwx--t
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)

    The preceding error indicates that user test does not have the required permissions on the warehouse folder.

    # Switch from user root to user has.
    su has
    # Configure an ACL to grant user test the rwx permissions on the warehouse folder.
     hadoop fs -setfacl -m user:test:rwx /user/hive/warehouse

    Create the table again as user test.

    hive> create table testtbl(a string);
    OK
    Time taken: 1.371 seconds
    # View the directory where the testtbl table is stored in HDFS. The data in a table created by user test can be read only by user test and the users in the hadoop group.
    hadoop fs -ls /user/hive/warehouse
    drwxr-x---   - test hadoop          0 2017-11-25 14:51 /user/hive/warehouse/testtbl
    # Insert a piece of data.
    hive> insert into table testtbl select "hz";
  • Access table testtbl as user foo.
    #drop table
    hive> drop table testtbl;
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Permission denied: user=foo, access=READ, inode="/user/hive/warehouse/testtbl":test:hadoop:drwxr-x---
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    #alter table
    hive> alter table testtbl add columns(b string);
    FAILED: SemanticException Unable to fetch table testtbl. java.security.AccessControlException: Permission denied: user=foo, access=READ, inode="/user/hive/warehouse/testtbl":test:hadoop:drwxr-x---
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
    #select
    hive> select * from testtbl;
    FAILED: SemanticException Unable to fetch table testtbl. java.security.AccessControlException: Permission denied: user=foo, access=READ, inode="/user/hive/warehouse/testtbl":test:hadoop:drwxr-x---
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)

    The preceding information shows that user foo cannot perform any operations on the table created by user test. If you want to grant operation permissions to user foo, HDFS authorization is required.

    su has
    # Grant only read permissions (r-x). To allow write operations such as ALTER, grant rwx instead.
    # Note: -R applies the ACL recursively, so user foo can also read the files inside the testtbl folder.
    hadoop fs -setfacl -R -m user:foo:r-x /user/hive/warehouse/testtbl
    # The select operation is successful.
    hive> select * from testtbl;
    OK
    hz
    Time taken: 2.134 seconds, Fetched: 1 row(s)
    Note: In most cases, you can create a user group for Hive access, grant permissions to the group, and then add users to it. All users in the group have the same data access permissions.

SQL standards-based authorization

  • Scenario

    If the users of an EMR cluster cannot use the HDFS or Hive Client to access Hive data, they can only use HiveServer2 (Beeline client or JDBC code) to run Hive scripts. In this case, the SQL standards-based authorization mechanism is required.

    For more information, see the Hive documentation.

  • Add authorization configuration parameters

    The following parameter settings take effect on HiveServer2.

    1. In the Service Configuration section of the Configure tab, click hive-site.
    2. Click Custom Configuration in the upper-right corner and add the configuration parameters described in the following table.
      Key Value
      hive.security.authorization.enabled true
      hive.users.in.admin.role hive
      hive.security.authorization.createtable.owner.grants ALL
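
    If you edit configuration files directly instead of using the console, these three parameters correspond to hive-site.xml entries like the following (values are from the table above):

```xml
<!-- SQL standards-based authorization for HiveServer2 -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.users.in.admin.role</name>
  <value>hive</value>
</property>
<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
</property>
```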
  • Restart HiveServer2
    1. In the left-side navigation pane, choose Cluster Service > Hive. On the page that appears, choose Actions > Restart HiveServer2 in the upper-right corner.
    2. In the Cluster Activities dialog box, configure relevant parameters and click OK. In the Confirm message, click OK.

      Click History in the upper-right corner to view the task progress.

  • Access control commands

    For more information, see the Hive documentation.

  • Verification
    • Access the testtbl table created by user test over the Beeline client as user foo.
      2: jdbc:hive2://emr-header-1.cluster-xxx:10> select * from testtbl;
      Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=foo, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.testtbl]] (state=42000,code=40000)
    • Grant permissions.
      Switch to user test and run the grant command to grant user foo the select operation permission.
      hive> grant select on table testtbl to user foo;
       OK
       Time taken: 1.205 seconds
    • User foo can perform the select operation.
      0: jdbc:hive2://emr-header-1.cluster-xxxxx:10> select * from testtbl;
      INFO  : OK
      +------------+--+
      | testtbl.a  |
      +------------+--+
      | hz         |
      +------------+--+
      1 row selected (0.787 seconds)
    • Revoke permissions.
      Switch back to user test and revoke the select operation permission granted to user foo.
        hive> revoke select on table testtbl from user foo;
          OK
          Time taken: 1.094 seconds
    • User foo cannot perform the select operation on the data in the testtbl table.
      0: jdbc:hive2://emr-header-1.cluster-xxxxx:10> select * from testtbl;
      Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=foo, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.testtbl]] (state=42000,code=40000)
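
For reference, the grant and revoke statements used in this verification follow Hive's SQL standards-based syntax. A sketch using the table and user from this example:

```sql
-- Grant SELECT on a table to a user.
GRANT SELECT ON TABLE testtbl TO USER foo;
-- Revoke it again; the ON TABLE clause names the object the privilege was granted on.
REVOKE SELECT ON TABLE testtbl FROM USER foo;
```

You can also grant to or revoke from a role instead of a user, and use `SHOW GRANT USER foo ON TABLE testtbl;` to inspect the current privileges.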