Hologres provides a data de-identification feature to allow you to de-identify data in columns. After you enable this feature, sensitive information is de-identified in query results. This helps you protect sensitive and private data. This topic shows you how to enable the data de-identification feature, check the data de-identification configurations, and delete the data de-identification rule for a column.

Background information

In the era of big data, new technologies such as big data, cloud computing, and artificial intelligence (AI) are widely applied. This provides powerful supports for in-depth data mining and analysis and helps enterprises gradually explore the potential value of big data. However, sensitive and private data is difficult to protect in the face of these technologies.

When you use these technologies to implement highly efficient data sharing, mining, and analysis, the data de-identification feature provided by Hologres can prevent leaks of your sensitive and private data.

Use the data de-identification feature

Hologres allows you to configure data de-identification for a specified column or a specified user. Before you can configure data de-identification for a database, you must log on to the database and enable the data de-identification feature. Then, you can configure data de-identification for a specified column or a specified user. Perform the following steps:

  1. Enable the data de-identification feature for the database for which you want to configure data de-identification.
    By default, the data de-identification feature is disabled. To enable this feature for a database, you must log on to the database as a superuser and execute the following statements:
    • Syntax:
      CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. 
      ALTER DATABASE <db_name> SET hg_anon_enable = on; -- Enable the data de-identification feature for the specified database. By default, the value of the hg_anon_enable parameter is off. 
    • Examples:
      CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. 
      ALTER DATABASE test SET hg_anon_enable = on; -- Enable the data de-identification feature for the test database. 
    • The following table describes the parameters that are used to enable the data de-identification feature.
      Parameter Description
      hg_anon The built-in extension function provided by Hologres. To enable the data de-identification feature, you must call this function.
      <db_name> The name of the database for which you want to enable the data de-identification feature. Replace <db_name> with the name of the database.
      hg_anon_enable Specifies whether to enable or disable the data de-identification feature. Valid values:
      • on: enables the data de-identification feature.
      • off: disables the data de-identification feature.
      Default value: off.
  2. Create a data de-identification rule.
    • Syntax:
      ALTER DATABASE <db_name> SET hg_anon_labels = '[
        {"label": <label_name1>, "method", <method1>},
        {"label": <label_name2>, "method", <method2>},
        ... 
      ]'; -- Set the label parameter to the name that you define for a built-in data-identification rule provided by Hologres and the method parameter to the settings of the rule. 
      Note After you execute the ALTER DATABASE statement, the new settings apply to subsequent sessions, but not the current session. You must reconnect to the database to check the new settings. You can run the following command to check whether the new settings take effect:
      show hg_anon_enable;
    • Examples:
      ALTER DATABASE test_db SET hg_anon_labels = '[
      {"label":"ip", "method":{"desensType":"mask", "type":"ip"}},
      {"label":"email", "method":{"desensType":"mask", "type":"email"}},
      {"label":"name", "method":{"desensType":"mask", "type":"name"}},
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}},
      {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}}]';
    • The following table describes the data de-identification rules.
      Data de-identification rule Description Sample result
      {"desensType":"mask", "type":"ip"} Masks an IP address. 192.*.*.*
      {"desensType":"mask", "type":"email"} Masks an email address. abc***@gmail.com
      {"desensType":"mask", "type":"name"} Masks a username. *xxx
      {"desensType":"hash", "type":"md5", "salt":""} Uses a hash function to calculate the hash values of sensitive data and replace the raw data with the hash values. e086aa137fa19f67d27b39d0eca186103228f322c9c98a125554a24f875f0f7e
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}{"label":"last_mask", "method":{"desensType":"mask", "type":"user_define", "before":0, "after":1}} Customizes data de-identification. None
  3. De-identify data in a specified column.
    You can de-identify data in a specified column as a superuser or table owner.
    • Syntax:
      Note To de-identify data in multiple columns, execute the following statement multiple times.
      SECURITY LABEL FOR hg_anon ON COLUMN <tablename>.<col_name> IS label_name;
    • Example: De-identify data in the id column in the holotest table by using the name rule.
      SECURITY LABEL FOR hg_anon ON COLUMN holotest.id IS 'name';
    • The following table describes the parameters that are used to de-identify data in the specified column.
      Parameter Description
      hg_anon The built-in extension function provided by Hologres. To enable the data de-identification feature, you must call this function.
      <tablename> The name of the table that contains the column whose data is to be de-identified. Replace <tablename> with the name of the table.
      <col_name> The name of the column whose data is to be de-identified. Replace <col_name> with the name of the column.
      label_name The built-in de-identification rule to be used to de-identify data. For more information, see the description of data de-identification rules.

      For example, the replace rule replaces a specific string with xxxx.

    • Sample result:Sample result
  4. Optional:Configure data de-identification for a specified user.
    You can log on as a superuser and configure the permissions of a user to determine whether to allow the user to view the data before or after the data de-identification process.
    • Syntax:
      SECURITY LABEL FOR hg_anon ON ROLE user_name IS ANON_RULE;
    • Examples:
      • Disable the data de-identification feature for all columns for the Alibaba Cloud account specified by Account ID.
        SECURITY LABEL FOR hg_anon ON ROLE 'Account ID' IS 'all:unmasked';
      • De-identify all columns for the specified RAM user.
        SECURITY LABEL FOR hg_anon ON ROLE "RAM$Alibaba Cloud account:RAM user" IS 'all:masked';
      • Disable the data de-identification feature for the columns that use the name rule for the specified RAM user.
        Note You can view your username by running the select current_user command.
        SECURITY LABEL FOR hg_anon ON ROLE "RAM$Alibaba Cloud account:RAM user" IS 'name:unmasked';
    • The following table describes the parameters that are used to configure data de-identification for a specified user.
      Note If you do not specify the unmasked keyword for the data de-identification rule, such as replace, that you specify for a user, the masked keyword is used. In this case, data in the columns that use the data de-identification rule is displayed in a specific format for the user.
      Parameter Description
      label_name The name of the custom data de-identification rule.
      all:unmasked Disables the data de-identification feature for the user.
      all:masked De-identifies all columns for the user.

View the data de-identification configurations

  • Execute the following statement to view the data de-identification configurations for all columns in the specified table:
    SELECT * FROM pg_seclabel;
    The following figure shows the query result. View the data de-identification configurationsThe following table describes the parameters in the query result.
    Parameter Description
    objoid The table in which data is de-identified.
    objsubid The sequence number of the column that is de-identified in the table.
    label The data de-identification rule that is used to de-identify the column.
  • Execute the following statement to view the data de-identification configurations for the current user:
    SELECT * FROM pg_shseclabel;

Delete the data de-identification rule for a column

Execute the following statement to delete the data de-identification rule that is configured for the specified column in the specified table:
SECURITY LABEL FOR hg_anon ON COLUMN tbname.col_name IS NULL;