Hologres provides the data masking feature to allow you to mask data by column. After you enable this feature, sensitive information is masked in query results. This helps protect your sensitive and private data. This topic describes how to enable the data masking feature, and query and delete data masking configurations.

Background information

In the era of big data, new technologies such as big data, cloud computing, and artificial intelligence (AI) are widely applied. These technologies provide powerful support for in-depth data mining and analysis and help enterprises explore the potential value of big data. However, sensitive and private data is difficult to protect in the face of these technologies.

Hologres provides the data masking feature to protect and prevent your sensitive and private data from leaking during sharing, mining, and analysis.

Limits

  • For security considerations, Hologres does not allow data to be imported from tables and columns that have masking rules configured to those that do not have masking rules configured. If you attempt to import data from a table or column that has masking rules configured to a table or column that does not have masking rules configured, the following error is returned:
    ERROR: The insert table has not set SECURITY LABEL
  • Data masking may affect query performance. The severity of the effect varies based on the masking method and data volume. The query performance may degrade by 10% to 20%, or even by several times in extreme circumstances.

Use the data masking feature

Hologres allows you to configure data masking for a specific column or a specific user. Before you can use this feature, you must log on to the database and enable data masking. Then, you can configure data masking for a specific column or a specific user.

  1. Enable data masking for a database.
    By default, data masking is disabled. To enable this feature for a database, you must log on to the database as the superuser and execute the following statements:
    • Syntax
      CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. 
      ALTER DATABASE <db_name> SET hg_anon_enable = on; -- Enable data masking for the database. By default, the value of the hg_anon_enable parameter is off. 
    • Example
      CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. 
      ALTER DATABASE test SET hg_anon_enable = on; -- Enable data masking for the test database. 
    • Parameters
      Parameter Description
      hg_anon The built-in extension function provided by Hologres. This function is required to enable data masking.
      <db_name> The name of the database for which you want to enable data masking. Replace <db_name> with the name of your database.
      hg_anon_enable Specifies whether to enable or disable data masking. Valid values:
      • on: enables data masking.
      • off: disables data masking.
      Default value: off.
  2. Configure data masking for a column.
    You can mask data in a specific column as the superuser or table owner.
    • Syntax
      Note To mask data in multiple columns, you must execute the statement multiple times.
      SECURITY LABEL FOR hg_anon ON COLUMN <tablename>.<col_name> IS <label_name>;
    • Example: Mask data in the ID column in the holotest table by using the name rule.
      SECURITY LABEL FOR hg_anon ON COLUMN holotest.id IS 'name';
    • Parameters
      Parameter Description
      hg_anon The built-in extension function provided by Hologres. This function is required to enable data masking.
      <tablename> The name of the table that contains the column whose data you want to mask. Replace <tablename> with the name of the table.
      <col_name> The name of the column whose data you want to mask. Replace <col_name> with the name of the column.
      label_name The masking rule to be used to mask data. You can execute the show hg_anon_labels; statement to query the label_name values available for the database.
      The following table describes the default label_name values as of Hologres V1.1.
      label_name Description Example
      name Masks the name.
      • Before masking: Li Hua. After masking: *Hua.
      • Before masking: Wang Xiaoqiang. After masking: *Xiaoqiang.
      email Masks the email address. Before masking: lihuang@alibaba.com. After masking: lih***@alibaba.com.
      ip Masks the IP address. Before masking: 1.2.3.4. After masking: 1.*.*.*.
      id Masks the ID card number. Before masking: 110345188812011234. After masking: 1****************4.
      phone Masks the mobile number. Before masking: 13900001234. After masking: *********34.
      bank_id Masks the bank or credit card number. Before masking: 2349867902834701928. After masking: ***************1928.
      hash Uses the MD5 algorithm for masking. Before masking: Wenyi West Road, Hangzhou, Zhejiang. After masking: c5c2d3d6b0d040f78d66c2e54c02213d.
      first_mask Displays only the first character. Before masking: 123456789. After masking: 1********.
    • Other related statements
      • Delete the data masking configuration for a column
        SECURITY LABEL FOR hg_anon ON COLUMN test_hg_anon_demo.name is NULL;
      • Query the columns for which data masking is configured
        SELECT c.relname, a.attname, provider, label FROM pg_seclabel s INNER JOIN pg_catalog.pg_class c on s.objoid = c.relfilenode INNER JOIN pg_catalog.pg_attribute a on s.objoid = a.attrelid where a.attnum = objsubid;

Advanced settings

  • Configure data masking for different users
    By default, the data masking feature is enabled for all users. You can disable data masking for specific users.
    • Syntax
       SECURITY LABEL FOR hg_anon ON ROLE user_name IS 'all:unmasked';
      Parameter Description
      user_name The ID of the account. You can obtain the account ID on the Basic Information page.
    • Example

      Execute the following statement to disable data masking for a user whose account ID is 1365xxxxxxxxxxxx:

      SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'all:unmasked';
    • Related statements
      • Query the data masking configuration for a user
        select usename, label from pg_shseclabel s inner join pg_catalog.pg_user u on s.objoid = u.usesysid;
      • Delete the data masking configuration for a user
        SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS NULL;
  • Create custom data masking rules

    If the default label_name values do not meet your requirements, you can modify the hg_anon_labels Global User Configuration (GUC) parameter to create custom data masking rules.

    • Syntax
      ALTER DATABASE <db_name> SET hg_anon_labels = '[
        {"label": <label_name1>, "method", <method1>},
        {"label": <label_name2>, "method", <method2>},
        ... 
      ]'; -- Set label_name to specify a name for the data masking rule and set the method parameter to configure the rule. 
      Note After you execute the ALTER DATABASE statement, the new settings apply to subsequent sessions, but not the current session. You must reconnect to the database to check the new settings. You can execute the following statement to check whether the new settings have taken effect:
      show hg_anon_enable;
    • Example
      ALTER DATABASE test_db SET hg_anon_labels = '[
      {"label":"ip", "method":{"desensType":"mask", "type":"ip"}},
      {"label":"email", "method":{"desensType":"mask", "type":"email"}},
      {"label":"name", "method":{"desensType":"mask", "type":"name"}},
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}},
      {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}}]';
    • Parameters
      Data masking rule Description Sample result
      {"desensType":"mask", "type":"ip"} Masks the IP address. 192.*.*.*
      {"desensType":"mask", "type":"email"} Masks the email address. abc***@gmail.com
      {"desensType":"mask", "type":"name"} Masks the name. *Wu
      {"desensType":"hash", "type":"md5", "salt":""} Uses the MD5 algorithm for masking. e086aa137fa19f67d27b39d0eca186103228f322c9c98a125554a24f875f0f7e
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}{"label":"last_mask", "method":{"desensType":"mask", "type":"user_define", "before":0, "after":1}} Performs custom data masking. N/A

Use examples

The following procedure provides an example on how to use the data masking feature.

  1. Create a database.
    Execute the following statement to create a database named hg_anon_demo:
    CREATE DATABASE hg_anon_demo;
  2. Create a data table.
    Execute the following statements to create a data table named personal_basic_information in the hg_anon_demo database:
    DROP TABLE IF EXISTS personal_basic_information;
    
    CREATE TABLE personal_basic_information
    (
        name TEXT
        ,email TEXT
        ,ip TEXT
        ,id TEXT
        ,phone TEXT
        ,bank_id TEXT
    )
    ;
  3. Insert data to the table.
    Execute the following statement to insert data to the personal_basic_information table:
    INSERT INTO personal_basic_information(name,email,ip,id,phone,bank_id) VALUES
    ('Zhang San','zhangsan@alibaba.com','127.0.0.1','142732199102290022','13900001234','4514610803067088'),
    ('Li Si','lisi@alibaba.com','127.0.0.1','510622198402308000','13900001111','6252470010027800'),
    ('Li Xiaoyao','lixiaoyao@alibaba.com','172.21.4.234','511025188812271696','13900002222','6252470010027800');
  4. View data before masking.
    Execute the following statement to view the data before masking:
    SELECT * FROM personal_basic_information;
  5. Configure data masking rules.
    Execute the following statements to configure data masking rules:
    -- Create the hg_anon extension function. 
    CREATE EXTENSION IF NOT EXISTS hg_anon;
    
    -- Enable data masking for the hg_anon_demo database. 
    ALTER DATABASE hg_anon_demo SET hg_anon_enable = on;
    
    --- Configure data masking rules for each column.
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.name IS 'name';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.id IS 'id';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.phone IS 'phone';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.email IS 'email';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.bank_id IS 'bank_id';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.ip IS 'ip';
  6. View data after masking.
    Disconnect from the database, reconnect to the database, and then execute the following statement to view the data after masking:
    SELECT * FROM personal_basic_information;

FAQ

Data is not masked after the procedure described in the example is performed.
  • Problem description: Data is not masked after the procedure described in the example is performed.
  • Possible causes:
    • Data masking is disabled for specific users.
    • The label_name parameter is not configured.
  • Solutions:
    • Execute the following statement to check whether data masking is disabled for specific users:
      select
        usename
        , label
      from pg_shseclabel s
      inner join pg_catalog.pg_user u on s.objoid = u.usesysid;
      By default, the query result is empty, which indicates that data masking is enabled for all users. If the query result is not empty, data masking is disabled for specific users and you must enable data masking for these users. For more information, see the "Advanced settings" section of this topic.
    • Execute the following statement to check the label_name values:
      show hg_anon_labels;
      If the query result does not contain values such as ip, you must execute the following statement to configure the label_name values:
      ALTER DATABASE compress_test SET hg_anon_labels = '[
      {"label":"ip", "method":{"desensType":"mask", "type":"ip"}},
      {"label":"email", "method":{"desensType":"mask", "type":"email"}},
      {"label":"name", "method":{"desensType":"mask", "type":"name"}},
      {"label":"id", "method":{"desensType":"mask", "type":"id"}},
      {"label":"phone", "method":{"desensType":"mask", "type":"phone"}},
      {"label":"bank_id", "method":{"desensType":"mask", "type":"bank_id"}},
      {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}},
      null
      ]';
      For more information about the rules corresponding to the label_name values, see the default label_name values table of this topic.