Ranger supports row-level filtering on Hive data. With a row-level filtering policy, the results returned by SELECT statements contain only the rows that meet the specified conditions. This topic describes how to filter Hive data by row.

Prerequisites

  • An E-MapReduce (EMR) cluster is created, and Ranger is selected from the optional services when you create the cluster. For more information, see Create a cluster.
  • A Hive table whose data you want to filter by row is created.

Procedure

Note: The web UI of Ranger varies based on the Ranger version. In this example, Ranger 2.1.0 is used.
  1. Integrate Hive with Ranger and configure related permissions. For more information, see Integrate Hive with Ranger.
  2. On the web UI of Ranger, click emr-hive.
  3. Create a row-level filtering policy.
    1. Click the Row Level Filter tab.
    2. Click Add New Policy in the upper-right corner.
    3. On the Create Policy page, configure the parameters. The following table describes the parameters.
      Parameter        | Description                                                                 | Example
      Policy Name      | The name of the row-level filtering policy. You can specify a custom name.  | test-row-filter
      Hive Database    | The name of the Hive database.                                              | default
      Hive Table       | The name of the Hive table.                                                 | test_row_filter
      Select User      | The user to whom you want to attach the row-level filtering policy.         | testc
      Access Types     | The permissions that you want to grant.                                     | select
      Row Level Filter | The filter expression that determines which rows are returned.              | id>=10
    4. Click Add.
  4. Optional: Test row-level filtering.
    For example, if the testc user executes the select * from default.test_row_filter; statement to query data in the default.test_row_filter table, only the rows whose id value is greater than or equal to 10 are returned.
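The effect of the example policy can be illustrated with a small local sketch. The following Python script does not call Ranger or Hive. It uses an in-memory SQLite table as a stand-in for default.test_row_filter and emulates the policy by adding the filter expression as a WHERE condition for the users that the policy covers. The name column, the sample rows, the other_user account, and the query_as helper are assumptions made for illustration. The sketch also assumes that other_user has select permission on the table and is not covered by any row-level filtering policy.

```python
import sqlite3

# The Row Level Filter expression from the example policy.
ROW_FILTER = "id >= 10"
# Users covered by the example policy (the Select User parameter).
FILTERED_USERS = {"testc"}


def query_as(user, conn):
    """Return the rows that `user` would see for a SELECT on the table.

    This only emulates the visible effect of a row-level filter:
    for covered users, the filter is added as a WHERE condition.
    """
    sql = "SELECT id, name FROM test_row_filter"
    if user in FILTERED_USERS:
        sql += f" WHERE {ROW_FILTER}"
    return conn.execute(sql + " ORDER BY id").fetchall()


# SQLite stand-in for the Hive table; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_row_filter (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_row_filter VALUES (?, ?)",
    [(1, "a"), (9, "b"), (10, "c"), (25, "d")],
)

rows_testc = query_as("testc", conn)       # filtered by the policy
rows_other = query_as("other_user", conn)  # not covered by the policy

print("testc sees:", rows_testc)
print("other_user sees:", rows_other)
```

In this sketch, testc sees only the rows whose id value is 10 or greater, while other_user sees all four rows. On a real cluster, you can run the same comparison by executing the select statement as testc and as a user that the policy does not cover.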