All Products
Search
Document Center

AnalyticDB for PostgreSQL:Dynamic data masking

Last Updated:Apr 09, 2024

AnalyticDB for PostgreSQL provides the dynamic data masking feature and various masking functions. This feature allows you to mask table columns and configure masking policies for specific users. Different masking functions can be used to show masked data to different users. This protects sensitive data and improves data security. This topic describes how to use the dynamic data masking feature.

Background information

Data masking is a data anonymization method that protects database privacy by preventing sensitive data from being revealed to unauthorized users. Dynamic data masking uses custom masking policies to efficiently limit exposure of sensitive data to non-privileged users without changing the original data.

To use the dynamic data masking feature, you must configure a masking function for each sensitive column. AnalyticDB for PostgreSQL provides common masking functions that are suitable for most data masking scenarios and allows you to configure custom masking functions. After you configure a masking function for a sensitive column, the query results of the column values are shown to other users as the outputs of the masking function. For example, the mobile phone number 13900001111 is stored in a database in plaintext. A privileged user such as the table owner or a database account that has the RDS_SUPERUSER permission can view the original value, but other users can view only the masked result. Example: 139****1111.

You can use the CREATE REDACTION POLICY statement to configure masking functions for specific sensitive columns of a table and specify an expression to determine the masking conditions. After you create a masking policy, you can use the ALTER REDACTION POLICY statement to modify the policy or use the DROP REDACTION POLICY statement to delete the policy. You can use the REDACTION_POLICIES and REDACTION_COLUMNS system views to query information about the masking policies and masked columns.

Prerequisites

An AnalyticDB for PostgreSQL V6.0 instance of V6.6 or later or an AnalyticDB for PostgreSQL V7.0 instance of V7.0.3 or later is created. For information about how to query and update the minor version of an AnalyticDB for PostgreSQL instance, see Query the minor engine version and Update the minor engine version.

Note

AnalyticDB for PostgreSQL instances in Serverless mode do not support dynamic data masking.

Limits

  • Only table owners and database accounts that have the RDS_SUPERUSER permission can create, modify, and delete masking policies.

  • Table owners and database accounts that have the RDS_SUPERUSER permission can always view the plaintext data regardless of masking policies.

  • You can use the COPY TO or INSERT INTO statement to export masked data. However, calculation operations on masked data are meaningless because data masking is irreversible.

  • You can create dynamic data masking policies for standard tables, but not for system tables, external tables, temporary tables, unlogged tables, views, or materialized views.

  • Each masking policy corresponds to a table. Each masking policy defines the masking functions that you want to use for all masked columns of a table. You can configure different masking functions and masking rules for different columns of a table. You can configure only one masking function for each column.

  • Calculation operations on masked data are meaningless. We recommend that you perform masking only on necessary sensitive columns.

Usage

CREATE REDACTION POLICY

You can use the CREATE REDACTION POLICY statement to create a masking policy for a table.

Syntax

CREATE REDACTION POLICY <policy_name> ON <table_name>
    [ WHEN (<when_expression>) ]
    [ ADD COLUMN <column_name> WITH <mask_function_name> ( [ argument [, ...] ] )] [, ... ];

Parameters

Parameter

Description

policy_name

The name of the masking policy.

table_name

The name of the table for which you want to create the masking policy.

WHEN (<when_expression>)

Defines an expression based on which the masking policy takes effect. The masking policy takes effect only when the expression is set to true.

The following limits apply when you use the WHERE clause to specify the range of effective users for masking policies:

  • The result of the expression must be of the Boolean type. An expression that is similar to 1+1 is invalid.

  • You can use AND or OR to combine multiple sub-expressions.

  • You can use IN or NOT IN sub-expressions.

  • If you want a masking policy to take effect on all users, we recommend that you use the 1=1 expression to create the masking policy.

  • If you do not specify the WHEN clause, the masking policy does not take effect. You must specify the WHEN clause.

column_name

The name of the column that you want to mask in the table.

mask_function_name

The name of the masking function.

The masking function determines how to handle the masked column. The return value of the masking function must be of the same type as the masked column.

argument

The arguments of the masking function.

Note

AnalyticDB for PostgreSQL provides various masking functions. You can also use SQL or PL/pgSQL to configure custom masking functions.

Examples

Create a table. Create a custom function that masks one column of the table. Configure a masking policy that takes effect only on bob.

-- Create a table. 
CREATE TABLE test_mask(id INT, name TEXT);

-- Create a custom function. 
CREATE OR REPLACE FUNCTION mask_text(input_text text) RETURNS text AS $$
BEGIN
  RETURN REPEAT('*', CHAR_LENGTH(input_text));
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Configure a masking policy. 
CREATE REDACTION POLICY test_mask_policy ON test_mask
  WHEN (CURRENT_USER = 'bob')
  ADD COLUMN name WITH mask_text(name);

You can also use a built-in masking function of AnalyticDB for PostgreSQL to configure a masking policy. In this example, the WHEN expression is set to true, which specifies that the masking policy takes effect on all users. For more information about built-in masking functions, see the "Appendix: Masking functions" section of this topic.

CREATE REDACTION POLICY test_mask_policy ON test_mask
  WHEN (true)
  ADD COLUMN name WITH mask_full_str(name);

ALTER REDACTION POLICY

You can use the ALTER REDACTION POLICY statement to modify the masking policy of a table.

Syntax

  • Modify the expression based on which the masking policy takes effect.

    ALTER REDACTION POLICY <policy_name> ON <table_name> WHEN (<new_when_expression>);
  • Enable or disable the masking policy of a table.

    ALTER REDACTION POLICY <policy_name> ON <table_name> ENABLE | DISABLE;
  • Rename a masking policy.

    ALTER REDACTION POLICY <policy_name> ON <table_name> RENAME TO <new_policy_name>;
  • Add, modify, or remove a masked column.

    ALTER REDACTION POLICY policy_name ON table_name 
        Action;

    In the preceding statement, Action can be one of the following clauses:

    ADD COLUMN <column_name> WITH <function_name> ( arguments )
      | ALTER COLUMN <column_name> WITH <function_name> ( arguments )
      | DROP COLUMN <column_name>

Parameters

Parameter

Description

policy_name

The name of the masking policy that you want to modify.

table_name

The name of the table whose masking policy you want to modify.

new_when_expression

The new expression based on which the masking policy takes effect.

ENABLE | DISABLE

Specifies whether to enable the masking policy. Valid values:

  • ENABLE: enables the masking policy.

  • DISABLE: disables the masking policy.

new_policy_name

The name of the new masking policy.

column_name

The name of the column that you want to mask in the table.

  • If you want to add a masked column, specify a column that is not configured with a masking function.

  • If you want to modify or remove a masked column, specify an existing masked column.

function_name

The name of the masking function.

argument

The arguments of the masking function.

Examples

  • Modify the expression to allow a masking policy to take effect on specific roles.

    ALTER REDACTION POLICY mask_emp ON employees WHEN(current_user in ('alice', 'bob'));
  • Modify the expression to allow a masking policy to take effect on all users. In this case, only the table owner can view the plaintext data.

    ALTER REDACTION POLICY mask_emp ON employees WHEN(1=1);
  • Enable or disable a masking policy.

    ALTER REDACTION POLICY mask_emp ON employees DISABLE;
    ALTER REDACTION POLICY mask_emp ON employees ENABLE;
  • Rename a masking policy.

    ALTER REDACTION POLICY mask_emp ON employees RENAME TO new_mask_emp;
  • Add a masked column.

    ALTER REDACTION POLICY mask_emp_new ON emp ADD COLUMN name WITH mask_none(name);
  • Modify the masking function of a masked column.

    ALTER REDACTION POLICY mask_emp_new ON emp ALTER COLUMN name WITH mask_none(name);
  • Remove a masked column.

    ALTER REDACTION POLICY mask_emp_new ON emp DROP COLUMN name;

DROP REDACTION POLICY

You can use the DROP REDACTION POLICY statement to delete the masking policy of a table.

Syntax

DROP REDACTION POLICY [ IF EXISTS ] <policy_name> ON <table_name>;

Parameter

Parameter

Description

IF EXISTS

Specifies whether the masking policy that you want to delete exists. If the masking policy does not exist, NOTICE is returned instead of an error message.

policy_name

The name of the masking policy that you want to delete.

table_name

The name of the table whose masking policy you want to delete.

SQL statements suitable for masked users

SELECT

Dynamic data masking is transparent to users. After you configure a masking policy, the involved sensitive columns are automatically masked. You do not need to modify the business logic.

For example, you configure a masking policy for the phone column of the employees table. If the masking conditions for the phone column are met, the SELECT phone FROM employees; statement is automatically rewritten to a statement such as SELECT mask_func(phone) FROM employees;.

Calculation operations, such as equivalence condition and JOIN operations, on masked columns are performed on the masking results of the masked columns. We recommend that you do not perform calculation operations on masked columns. Sample SELECT statement:

CREATE TABLE t1(phone TEXT);
SELECT employees.phone FROM employees JOIN t1 ON t1.phone = employees.phone;

The preceding SELECT statement is rewritten to the following statement:

-- Equivalent query after data masking. 
SELECT mask_func(employees.phone) FROM employees JOIN t1 ON t1.phone = mask_func(employees.phone);

In this example, the result of the JOIN operation is not as expected. To ensure correct calculation results, you can specify the redaction.scope parameter. For more information, see the "Data masking scopes" section of this topic.

INSERT/UPDATE/DELETE

In most cases, a user does not have the INSERT, UPDATE, or DELETE permissions on a table if the table is masked to the user. This is because only masking results are obtained even if plaintext data is written to the table. We recommend that you grant masked users only the SELECT permission.

  • INSERT

    If you use the INSERT statement to insert plaintext data to a table, you can query only masked data from the table.

    -- For example, the phone column of the employees table is masked to the current user. 
    INSERT INTO employees (id, name, ssn, phone, birthday, salary, email) 
      VALUES (1, 'Sally Sample', '020-78-9345', '13900001111', '1961-02-02', 51234.34,
              'sally.sample@alibaba.com');
    
    -- Query the table after you perform an INSERT operation. 
    postgres=> SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  |   salary   |          email
    ----+--------------+-------------+-------------+------------+------------+--------------------------
      1 | Sally Sample | 020-78-9345 | 139****1111 | 1961-02-02 | $51,234.34 | sally.sample@alibaba.com
    (1 row)

    In this example, the masked user obtains the masking results of the phone column even if plaintext data is written to the table.

  • UPDATE

    In most cases, the UPDATE statement involves calculating the WHERE condition. However, the calculation results that involve masked columns may be incorrect. To ensure correct calculation results, you can specify the redaction.scope parameter. For more information, see the "Data masking scopes" section of this topic.

    -- For example, the phone column of the employees table is masked to the current user. 
    postgres=> UPDATE employees SET id = 2 WHERE phone = '13900001111';
    UPDATE 0
    
    -- Specify a data masking scope. 
    postgres=> SET redaction.scope = 'query_except_equal';
    SET
    
    -- Perform an UPDATE operation. 
    postgres=> UPDATE employees SET id = 2 WHERE phone = '13900001111';
    UPDATE 1
  • DELETE

    In most cases, the DELETE statement involves calculating the WHERE condition. If you want to execute a DELETE statement that involves a masked column, you can specify the redaction.scope parameter. For more information, see the "Data masking scopes" section of this topic.

    -- For example, the phone column of the employees table is masked to the current user. 
    postgres=> DELETE FROM employees WHERE phone = '13900001111';
    DELETE 0
    
    postgres=> SET redaction.scope = 'query_except_equal';
    SET
    
    postgres=> DELETE FROM employees WHERE phone = '13900001111';
    DELETE 1

View masking policies

You can use the REDACTION_POLICIES and REDACTION_COLUMNS system views to query information about the masking policies and masked columns in the current database.

  • The REDACTION_POLICIES view contains the information about all masking policies in the current database.

    The following table describes the fields in the REDACTION_POLICIES view.

    Field

    Type

    Description

    schema_name

    name

    The name of the schema of the masked table.

    table_name

    name

    The name of the masked table.

    policy_name

    name

    The name of the masking policy.

    enable

    boolean

    The status of the masking policy.

    expression

    text

    The expression based on which the masking policy takes effect.

  • The REDACTION_COLUMNS view contains information about all masked columns in the current database.

    The following table describes the fields in the REDACTION_COLUMNS view.

    Field

    Type

    Description

    table_name

    name

    The name of the masked table.

    column_name

    name

    The name of the masked column.

    column_type

    name

    The type of the masked column.

    function_info

    text

    The information about the masking function.

Data masking scopes

AnalyticDB for PostgreSQL supports the query and query_except_equal data masking scopes. You can use the redaction.scope parameter to select a data masking scope. By default, the query data masking scope is used. AnalyticDB for PostgreSQL masks all masked columns that are involved in a query statement.

For example, in the SELECT name, email, phone FROM employees WHERE email = 'sally.***@alibaba.com'; statement, the email and phone columns are masked columns. Dynamic data masking rewrites the preceding statement to a query statement that is equal to SELECT name, masked(email), masked(phone) FROM employees WHERE masked(email) = 'sally.***@alibaba.com';. This is similar to performing a rewrite on the email column by using the UPDATE employees SET email = masked(email); statement. Masked values are not suitable for calculation operations. To meet the calculation requirements of specific users, you can execute the following statement to set the data masking scope to query_except_equal. In this case, columns that are involved in equivalence conditions are not masked.

SET redaction.scope = 'query_except_equal';
Note

The SET redaction.scope = 'query_except_equal'; statement configures the data masking scope at the session level. To configure the data masking scope at the database level, execute the ALTER DATABASE xxx SET redaction.scope = 'query_except_equal'; statement.

After you set the data masking scope to query_except_equal, the email column in the WHERE clause is not masked. Dynamic data masking rewrites the original query statement to a query statement that is equal to SELECT name, masked(email), masked(phone) FROM employees WHERE email = 'sally.sample@alibaba.com';.

Examples

In the following examples, a table named employees is created for a company. alice is the owner of the table. bob and eve are users of the table. The table contains privacy data such as the employee name, mobile phone number, email, social security number (SSN), and salary.

  1. After you use the administrator account to connect to the database, create three roles named alice, bob, and eve.

    CREATE ROLE alice PASSWORD 'password';
    CREATE ROLE bob PASSWORD 'password';
    CREATE ROLE eve PASSWORD 'password';
  2. Grant the schema permission of the database to alice, bob, and eve.

    GRANT ALL ON SCHEMA PUBLIC TO alice,bob,eve;
  3. Switch to alice. Create a table named employees and insert data into the table.

    SET role alice;
    
    CREATE TABLE employees (id SERIAL PRIMARY KEY, name varchar(40) NOT NULL, ssn varchar(11) NOT NULL,
                            phone varchar(11), birthday date, salary money, email varchar(100));
    
    INSERT INTO employees (name, ssn, phone, birthday, salary, email)
    VALUES
    ( 'Sally Sample', '020-78-9345', '13900001111', '1961-02-02', 51234.34,
    'sally.***@alibaba.com'),
    ( 'Jane Doe', '123-33-9345', '13900002222', '1963-02-14', 62500.00,
    'jane.***@gmail.com'),
    ( 'Bill Foo', '123-89-9345', '13900003333','1963-02-14', 45350.00,
    'william.***@163.com');
  4. Grant the read permission on the employees table to bob and eve.

    GRANT SELECT ON employees to bob,eve;
  5. Create a masking policy named mask_emp to allow alice to query all employee information and prohibit bob and eve from querying the phone, salary, and email columns. Configure masking functions for the following columns:

    • phone: of the character type. Use the built-in masking function mask_partial to mask four digits of the phone number in the middle with asterisks (*).

    • salary: of the numeric type. Use the built-in masking function mask_full_num to mask the salary as 0.

    • email: of the character type. Use the built-in masking function mask_email to mask the content before @.

      CREATE REDACTION POLICY mask_emp ON employees WHEN (current_user IN ('bob', 'eve'))
        ADD COLUMN phone WITH mask_partial(phone, '****', 3, 4),
        ADD COLUMN salary WITH mask_full_num(salary),
        ADD COLUMN email WITH mask_email(email);
  6. Switch to bob and eve and query the employees table.

    SET ROLE bob;
    SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  | salary |          email
    ----+--------------+-------------+-------------+------------+--------+--------------------------
      2 | Jane Doe     | 123-33-9345 | 139****1111 | 1963-02-14 |  $0.00 | xxxxxxxx@gmail.com
      3 | Bill Foo     | 123-89-9345 | 139****2222 | 1963-02-14 |  $0.00 | xxxxxxxxxxx@163.com
      1 | Sally Sample | 020-78-9345 | 139****3333 | 1961-02-02 |  $0.00 | xxxxxxxxxxxx@alibaba.com
    (3 rows)
    
    SET ROLE eve;
    SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  | salary |          email
    ----+--------------+-------------+-------------+------------+--------+--------------------------
      1 | Sally Sample | 020-78-9345 | 139****1111 | 1961-02-02 |  $0.00 | xxxxxxxxxxxx@alibaba.com
      2 | Jane Doe     | 123-33-9345 | 139****2222 | 1963-02-14 |  $0.00 | xxxxxxxx@gmail.com
      3 | Bill Foo     | 123-89-9345 | 139****3333 | 1963-02-14 |  $0.00 | xxxxxxxxxxx@163.com
    (3 rows)
  7. To allow bob to query all employee information, modify the mask_emp masking policy.

    SET ROLE alice;
    ALTER REDACTION POLICY mask_emp ON employees WHEN(current_user = 'bob');
  8. Switch to bob and eve and query the employees table.

    SET ROLE bob;
    SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  |   salary   |          email
    ----+--------------+-------------+-------------+------------+------------+--------------------------
      2 | Jane Doe     | 123-33-9345 | 13900001111 | 1963-02-14 | $62,500.00 | jane.***@gmail.com
      3 | Bill Foo     | 123-89-9345 | 13900002222 | 1963-02-14 | $45,350.00 | william.***@163.com
      1 | Sally Sample | 020-78-9345 | 13900003333 | 1961-02-02 | $51,234.34 | sally.***@alibaba.com
    (3 rows)
    
    SET ROLE eve;
    SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  | salary |          email
    ----+--------------+-------------+-------------+------------+--------+--------------------------
      1 | Sally Sample | 020-78-9345 | 139****1111 | 1961-02-02 |  $0.00 | xxxxxxxxxxxx@alibaba.com
      2 | Jane Doe     | 123-33-9345 | 139****2222 | 1963-02-14 |  $0.00 | xxxxxxxx@gmail.com
      3 | Bill Foo     | 123-89-9345 | 139****3333 | 1963-02-14 |  $0.00 | xxxxxxxxxxx@163.com
    (3 rows)
  9. Use the REDACTION_POLICIES and REDACTION_COLUMNS system views to query information about the mask_emp masking policy.

    SELECT * FROM redaction_policies;
     schema_name | table_name | policy_name | enable |            expression
    -------------+------------+-------------+--------+----------------------------------
     public      | employees  | mask_emp    | t      | ("current_user"() = 'eve'::name)
    (1 row)
    
    SELECT * FROM redaction_columns;
     table_name | column_name | column_type |              function_info
    ------------+-------------+-------------+-----------------------------------------
     employees  | phone       | varchar     | mask_partial(phone, '****'::text, 3, 4)
     employees  | salary      | money       | mask_full_num(salary)
     employees  | email       | varchar     | mask_email(email)
    (4 rows)
  10. To mask the character-type ssn column as ###-##-####, you can use the mask_regexp masking function or configure a custom masking function. In this example, SQL is used to configure a custom masking function named mask_ssn. When you add the masked column, specify the masking function name and arguments.

    SET ROLE alice;
    
    CREATE FUNCTION mask_ssn() RETURNS varchar AS
    $$
      SELECT '###-##-####'::varchar;
    $$
    LANGUAGE SQL;
    
    ALTER REDACTION POLICY mask_emp ON employees ADD COLUMN ssn WITH mask_ssn();
    
    SET ROLE eve;
    SELECT * FROM employees;
     id |     name     |     ssn     |    phone    |  birthday  | salary |          email
    ----+--------------+-------------+-------------+------------+--------+--------------------------
      1 | Sally Sample | ###-##-#### | 139****1111 | 2001-01-01 |  $0.00 | xxxxxxxxxxxx@alibaba.com
      2 | Jane Doe     | ###-##-#### | 139****2222 | 2001-01-01 |  $0.00 | xxxxxxxx@gmail.com
      3 | Bill Foo     | ###-##-#### | 139****3333 | 2001-01-01 |  $0.00 | xxxxxxxxxxx@163.com
    (3 rows)
  11. If data masking is not required for the employees table, delete the mask_emp masking policy.

    SET ROLE alice;
    DROP REDACTION POLICY mask_emp ON employees;

Appendix: Masking functions

Function

Description

Parameter

Return value type

mask_none(column anyelement)

Performs no data masking. This function is used only for internal tests.

column: of any data type.

Same as the data type of the column argument

mask_tonull(column anyelement)

Masks all data to NULL.

column: of any data type.

NULL

mask_full_num(column anyelement)

Masks all numeric values as 0.

column: of the numeric type such as INT, BIGINT, FLOAT, or NUMERIC.

Same as the data type of the column argument

mask_full_str(column anyelement)

Masks all characters to the [redact] string.

column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

Same as the data type of the column argument

mask_full_time(column anyelement)

Masks all time data as 00:00:00.

column: of the TIME or TIME WITH TIME ZONE type.

Same as the data type of the column argument

mask_full_timestamp(column anyelement)

Masks all timestamp and date data as 2001.01.01 00:00.

column: of the DATE, TIMESTAMP, or TIMESTAMP WITH TIME ZONE type.

Same as the data type of the column argument

mask_full_bytea()

Masks all BYTEA data to a fixed BYTEA value.

None

BYTEA

mask_email(column anyelement, letter char default 'x')

Masks email addresses.

  • column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

  • letter: the character that you want to use to replace the masked data. Default value: 'x'.

Same as the data type of the column argument

mask_shuffle(column anyelement)

Returns data in a disordered manner. The return value has the same length as the original data.

column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

Same as the data type of the column argument

mask_random(column anyelement)

Returns a random character string that has the same length as the original data.

column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

Same as the data type of the column argument

mask_regexp(column anyelement, reg text, replace_text text, pos INTEGER default 0, reg_len INTEGER default -1)

Replaces data by using a regular expression.

  • column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

  • reg: the regular expression, which is of the TEXT type.

  • replace_text: the content that you want to use to replace the matched data, which is of the TEXT type.

  • pos: the position where the match starts, which is of the INT type. The default value is 0, which specifies that the match starts from the beginning of the original data.

  • reg_len: the length of the matched content, which is of the INT type. The default value is -1, which specifies that the match ends at the end of the original data.

Same as the data type of the column argument

mask_partial(column anyelement, padding text, prefix INTEGER, suffix INTEGER)

Partially masks data.

  • column: of the fixed-length or variable-length character type such as TEXT, VARCHAR, or CHAR.

  • padding: the content that you want to use to replace the masked data, which is of the TEXT type.

  • prefix: the number of the first characters that are not masked, which is of the INT type.

  • suffix: the number of the last characters that are not masked, which is of the INT type.

Same as the data type of the column argument

  • Example 1: Use the mask_regexp function to mask the numeric value in a string with number signs (#).

    SELECT mask_regexp('$27412.45earned'::text, '[\d+]', '#');
         mask_regexp
    ----------------------
     $#####.##earned
    (1 row)
  • Example 2: Use the mask_partial function to mask the number in the middle of a string with number signs (#).

    SELECT mask_partial('$27412.45earned'::text, '###.##', 5, 1);
        mask_partial
    --------------------
     $###.##earned
    (1 row)