Ranger supports Hive data masking. You can configure a data masking policy to mask the return values of SELECT statements to hide sensitive information from users.

Background information

This feature applies only to HiveServer2 scenarios. For example, you can mask the return values of SELECT statements that are executed by using Beeline, JDBC, or Hue.

Configure a data masking policy

You can configure a data masking policy on the emr-hive tab of the Ranger web UI. Pay attention to the following points:
  • Multiple data masking methods are supported. For example, you can choose to show only the first or last four characters or use a hashing algorithm to process data.
  • Wildcards are not supported. For example, you cannot use an asterisk (*) to specify a table or column in a data masking policy.
  • Each data masking policy applies to only one column. If you want to mask data in multiple columns, configure multiple data masking policies.
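
To make the masking methods mentioned above concrete, the following Python sketch imitates their effect on a single string value. It is an illustration only, not the code that Ranger or Hive runs: the function names are hypothetical, the replacement character x follows the example later in this topic, and SHA-256 is an assumed choice of hashing algorithm.

```python
import hashlib


def show_first_n(value: str, n: int = 4, mask_char: str = "x") -> str:
    """Keep the first n characters and replace the rest with mask_char."""
    n = max(n, 0)
    return value[:n] + mask_char * max(len(value) - n, 0)


def show_last_n(value: str, n: int = 4, mask_char: str = "x") -> str:
    """Keep the last n characters and replace the rest with mask_char."""
    n = max(n, 0)
    hidden = max(len(value) - n, 0)  # number of leading characters to hide
    return mask_char * hidden + value[hidden:]


def hash_value(value: str) -> str:
    """Replace the whole value with a hex digest (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


print(show_first_n("abcdefgh"))  # abcdxxxx
print(show_last_n("abcdefgh"))   # xxxxefgh
```

Values that are shorter than n characters are returned unchanged by the first two functions, because there is nothing left to hide.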
  To configure a data masking policy, perform the following steps:
  1. Integrate Hive with Ranger and configure related permissions. For more information, see Hive.
  2. On the Ranger web UI, click emr-hive.
  3. Click the Masking tab.
  4. Click Add New Policy in the upper-right corner.
  5. Configure the parameters that are described in the following table.
    Parameter | Description | Example
    Policy Name | The name of the data masking policy. You can specify a custom name. | test_mask
    database | The name of the Hive database. | testdb1
    table | The name of the table. | testtbl
    Hive Column | The name of the column to mask. | a
    Access Types | The permissions to be granted. | SELECT
    Select Masking Option | The data masking method. | show first 4
  6. Click Add.
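
The parameters configured in step 5 can also be written down as a structured policy definition, which may help when you review or document a policy. The following Python sketch only builds and prints such a definition; it does not call Ranger. The field names, the policyType value, and the MASK_SHOW_FIRST_4 identifier are assumptions modeled on the Apache Ranger policy JSON format and should be checked against your Ranger version. The function name build_masking_policy is hypothetical, and the values are the example values used in this topic.

```python
import json


def build_masking_policy(service, name, database, table, column, users, mask_type):
    """Build a dict that describes one data masking policy for a single column."""
    return {
        "service": service,
        "name": name,
        "policyType": 1,  # assumption: 1 denotes a data masking policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": [column]},  # one column per policy, no wildcards
        },
        "dataMaskPolicyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
                "dataMaskInfo": {"dataMaskType": mask_type},  # assumed identifier
            }
        ],
    }


policy = build_masking_policy(
    service="emr-hive",
    name="test_mask",
    database="testdb1",
    table="testtbl",
    column="a",
    users=["test"],
    mask_type="MASK_SHOW_FIRST_4",
)
print(json.dumps(policy, indent=2))
```

Because each policy covers exactly one column, masking several columns means calling the function once per column, which mirrors the rule that each data masking policy applies to only one column.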

Mask test data

  • Scenario

    User test executes a SELECT statement to query data in column a of the testdb1.testtbl table. Only the first four characters of values in column a are shown.

  • Procedure
    1. Configure a data masking policy.

      For more information about the configuration steps, see Configure a data masking policy.

    2. Verify the data masking effect.
      User test uses Beeline to connect to HiveServer2 and executes the select a from testdb1.testtbl; statement.

      In the query result, only the first four characters of values in column a are shown. The other characters are replaced with x.
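
If you want to check the result programmatically rather than by eye, you can compare the values returned to user test with the unmasked values returned to a user that is not subject to the policy. The following Python sketch shows one way to express that check. The function name and the sample values are hypothetical, and the check assumes that every hidden character is replaced with x, as described in this topic. If your Hive version uses different replacement characters for uppercase letters or digits, adjust the check accordingly.

```python
def is_show_first_4_masked(original: str, masked: str, mask_char: str = "x") -> bool:
    """Return True if masked keeps the first four characters of original
    and every remaining character is mask_char."""
    if len(original) != len(masked):
        return False
    return masked[:4] == original[:4] and set(masked[4:]) <= {mask_char}


# Hypothetical (unmasked, masked) pairs collected from two query runs.
pairs = [("abcdefgh", "abcdxxxx"), ("abc", "abc")]
print(all(is_show_first_4_masked(o, m) for o, m in pairs))  # True
```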