Ranger supports row-level filtering on Hive data. You can filter the return results of SELECT statements by row to display only the rows that meet the specified conditions. This topic describes how to filter Hive data by row. E-MapReduce (EMR) V4.9.0 is used in this example. Ranger 2.1.0 is used in EMR V4.9.0.

Prerequisites

  • An EMR cluster is created, and Ranger is selected from the optional services when you create the cluster. For more information, see Create a cluster.
  • A table whose data can be filtered by row is created.

Limits

This feature applies only to HiveServer2. For example, you can filter the return results of SELECT statements that are executed by using Beeline, Java Database Connectivity (JDBC), or Hue.

Configure a row-level filtering policy.

  1. Integrate Hive with Ranger and configure related permissions. For more information, see Hive.
  2. On the Ranger web UI, click emr-hive.
    emr-hive
  3. Click the Row Level Filter tab.
    Row Level Filter
  4. Click Add New Policy in the upper-right corner.
  5. In the Create Policy dialog box, configure the parameters.
    Create Policy
    Parameter Description Example
    Policy Name The name of a row-level data filtering policy. You can customize a policy name. test-row-filter
    Hive Database The name of a Hive database. default
    Hive Table The name of a table. test_row_filter
    Select User The user for which the row-level filtering policy is configured. testc
    Access Types The permissions to be granted. select
    Row Level Filter The function that is used to filter data. id>=10
  6. Click Add.

Filter test data

  • Scenario: The testc user executes a SELECT statement to query data in the default.test_row_filter table. Only the rows that meet the specified filter conditions are displayed.
  • Procedure
    1. Users that have the permissions to access the table but are not configured with a row-level filtering policy, such as the hadoop user, use Beeline to connect to HiveServer2 and execute the select * from default.test_row_filter; statement to query data in the default.test_row_filter table. All rows in the table can be displayed. table_all
    2. The testc user uses Beeline to connect to HiveServer2 and executes the select * from default.test_row_filter; statement to query data in the default.test_row_filter table. Only the rows that meet the condition of id ≥ 10 can be displayed. table_id