This topic describes how to customize de-identification rules in Data Security Guard so that DataWorks can de-identify sensitive data dynamically, for example in the results of ad hoc queries, or statically, before the data is stored in a database.

Prerequisites

DataWorks Professional Edition or a more advanced edition is activated.

Background information

DataWorks supports dynamic de-identification and static de-identification.
• Dynamic de-identification: DataWorks de-identifies sensitive data in query results. Typical scenes: Global Config, DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
• Static de-identification: DataWorks de-identifies sensitive data before the data is stored in a database. Typical scene: DataWorks Data Integration Config.
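The practical difference between the two types is where masking happens: on the read path (dynamic) or on the write path (static). The following Python sketch models this with a hypothetical in-memory Table class and a hypothetical mask_all_but_last4 rule. It only illustrates the concept and is not how DataWorks stores or masks data internally.

```python
from typing import Callable, List

Mask = Callable[[str], str]


def mask_all_but_last4(value: str) -> str:
    """Hypothetical stand-in for a configured de-identification rule."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


class Table:
    """Hypothetical in-memory table used only to contrast the two types."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def write_static(self, value: str, mask: Mask) -> None:
        # Static de-identification: the value is masked before it is stored,
        # so the raw value never reaches the table.
        self.rows.append(mask(value))

    def write_raw(self, value: str) -> None:
        # Dynamic de-identification stores the raw value unchanged ...
        self.rows.append(value)

    def query_dynamic(self, mask: Mask) -> List[str]:
        # ... and masks it only in the query result; stored rows are untouched.
        return [mask(v) for v in self.rows]
```

In this model, dynamic masking leaves the stored data intact, so an exempted reader could still be shown raw values, whereas statically masked values cannot be recovered from the table.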

Select a de-identification scene

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Click the More icon in the upper-left corner and choose All Products > Data governance > Data Security Guard.
  3. Click Try now to go to the Data Security Guard page.
  4. In the left-side navigation pane, choose Rule Change > Data Masking.
    On the Data Masking page, select a de-identification scene from the Masking Scene drop-down list based on your business requirements. DataWorks provides multiple scenes. You can also create custom scenes.
    • Global Config: The de-identification rules and whitelists that are configured in the Global Config scene will take effect in other scenes, such as DataWorks Studio Config, DataWorks Analysis Config, MaxCompute Config, and DataWorks Data Integration Config.
    • DataWorks Studio Config:
      • After you configure de-identification rules, the sensitive data that you query in DataStudio is de-identified.
      • After you configure de-identification rules, the sensitive data that you preview in DataMap is de-identified.
    • DataWorks Analysis Config: After you configure de-identification rules, the sensitive data that you query on the SQL Query and SQLNotes pages of DataAnalysis is de-identified.
    • Hologres Config: After you configure de-identification rules, the sensitive data that you query from Hologres databases in DataStudio and HoloStudio is de-identified. The de-identification rules that are configured in the Hologres Config scene take effect only in workspaces in the China (Hangzhou) and China (Beijing) regions. By default, the rules are not enabled in the Hologres Config scene. To enable the rules, submit a ticket.
      Note Hologres does not support pseudonymisation. If you configure a de-identification rule that uses the pseudonymisation method in the Global Config scene, the sensitive data that you query from Hologres databases is masked with asterisks (***) instead.
    • MaxCompute Config: After you configure de-identification rules, the sensitive data that you query from MaxCompute projects by using any method is de-identified. The de-identification rules that are configured in the MaxCompute Config scene take effect only in workspaces in the China (Shanghai) region.
    • Custom de-identification scene: To create a custom scene, click Masking Scene at the bottom of the Masking Scene drop-down list. In the New dialog box, set the Scene Name and Scene Code parameters. The scene name can contain only letters, digits, underscores (_), and hyphens (-). The scene code can contain only letters and digits.
  5. Create a de-identification rule.
    After you select a de-identification scene, you can create de-identification rules in this scene.
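The naming constraints for custom scenes in step 4 can be checked before you submit the New dialog box. The following Python sketch assumes that "letters" means ASCII letters and that no length limit applies, because this topic does not state one. check_custom_scene is a hypothetical helper, not a DataWorks API. Built-in scene codes such as _default_scene_code contain underscores, so these constraints are described only for custom scenes.

```python
import re
from typing import List

# Scene Name: letters, digits, underscores (_), and hyphens (-).
SCENE_NAME = re.compile(r"[A-Za-z0-9_-]+")
# Scene Code: letters and digits only.
SCENE_CODE = re.compile(r"[A-Za-z0-9]+")


def check_custom_scene(name: str, code: str) -> List[str]:
    """Return a list of problems; an empty list means both values look valid."""
    problems = []
    if not SCENE_NAME.fullmatch(name):
        problems.append("Scene Name may contain only letters, digits, underscores (_), and hyphens (-)")
    if not SCENE_CODE.fullmatch(code):
        problems.append("Scene Code may contain only letters and digits")
    return problems
```

For example, check_custom_scene("finance-report_1", "fin01") reports no problems, while check_custom_scene("finance report", "fin_01") reports one problem for each field.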

Create a de-identification rule in the Global Config scene

The following example shows how to create a de-identification rule in the Global Config scene. To create a rule in the Hologres Config, DataWorks Studio Config, DataWorks Analysis Config, or MaxCompute Config scene, you can also follow the steps in this example.

  1. On the Data Masking page, set the Masking Scene parameter to Global Config(_default_scene_code).
  2. Optional. Select one or more MaxCompute projects or Hologres databases and authorize Data Security Guard to de-identify their data.
    Note This step is required only in the Hologres Config and MaxCompute Config scenes.
    Click Select Desensitization Project or Select desensitization database. In the dialog box that appears, select one or more projects or databases and confirm your selection.
  3. Create a de-identification rule.
    1. On the Data Masking tab, click Create Rule in the upper-right corner.
    2. In the Create Rule dialog box, set the Masking Rule and Method parameters.

      Select an existing data identification rule from the Masking Rule drop-down list. For more information, see Configure data identification rules.

      You can set the Method parameter to Pseudonymisation, HASH, or Masking Out. The valid values that are displayed for the Method parameter vary based on the identification rule that you select from the Masking Rule drop-down list.
      • Pseudonymisation
        This method replaces the characters of a data record with an artificial pseudonym of the same data type. If you select this method, you must set the Data watermark and Domain parameters.
        • Data watermark: Watermarks let you trace the source of data. If your data leaks, you can use the watermark to identify the likely source of the leak.
        • Domain: You can select a digit from 0 to 9 as the security domain. Each security domain applies a different de-identification rule, so the same data record produces different results in different security domains. For example, the data record a123 is de-identified as b124 in security domain 0 and as c234 in security domain 1. Within a security domain, the same data record always produces the same result.

      • HASH
        This method converts a data record into a fixed-length hash value. If you select this method, you must set the Data watermark and Domain parameters.
        • Data watermark: Watermarks let you trace the source of data. If your data leaks, you can use the watermark to identify the likely source of the leak.
        • Domain: You can select a digit from 0 to 9 as the security domain. Each security domain applies a different de-identification rule, so the same data record produces different results in different security domains. For example, the data record a123 is de-identified as b124 in security domain 0 and as c234 in security domain 1. Within a security domain, the same data record always produces the same result.

      • Masking Out
        This method uses asterisks (*) to mask specified sections of a data record. This is a commonly used method. You can configure it in one of the following ways:
        • Recommended: Select a recommended policy to mask data of common types, such as ID card numbers and bank card numbers.
        • Custom: Specify the number of characters to mask in the first, middle, or last section of a data record.
    3. Click Save.
    4. On the Data Masking tab, set the status of the created de-identification rule to Active or Inactive as needed.
      To test whether the de-identification rule works, find the de-identification rule and click the Test icon in the Actions column.
  4. Configure a whitelist.
    1. Click the Whitelist tab.
    2. On the Whitelist tab, click Add Account in the upper-right corner.
    3. In the Add Account dialog box, set the Rule, Account, and Effective From parameters.
      Note

      You do not need to configure a whitelist in the Hologres Config scene.

      If a user queries data outside the time range that is specified in the whitelist, the query results are de-identified.
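The Domain parameter described for Pseudonymisation and HASH has two properties: the same input always yields the same output within a domain, and different domains yield different outputs. DataWorks does not document the algorithm behind security domains. One common way to get this behavior for a hash is a keyed hash such as HMAC-SHA256 with a separate secret per domain, which the following Python sketch illustrates. DOMAIN_KEYS and hash_value are hypothetical names, and the keys are placeholders.

```python
import hashlib
import hmac

# Hypothetical per-domain secrets for security domains 0 to 9. DataWorks does not
# document how security domains are implemented; these keys are placeholders.
DOMAIN_KEYS = {d: f"example-secret-{d}".encode("utf-8") for d in range(10)}


def hash_value(record: str, domain: int) -> str:
    """Return a fixed-length keyed hash of record for the given security domain."""
    if domain not in DOMAIN_KEYS:
        raise ValueError("domain must be a digit from 0 to 9")
    mac = hmac.new(DOMAIN_KEYS[domain], record.encode("utf-8"), hashlib.sha256)
    # A SHA-256 hex digest is always 64 characters, whatever the input length.
    return mac.hexdigest()
```

A keyed hash is used instead of a plain hash so that someone who knows the input format cannot simply precompute hashes of likely values; in practice, the keys would be kept secret.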
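The whitelist behavior in the note above can be modeled as a check on the rule, the account, and the time of the query. The following Python sketch assumes that the Effective From parameter defines a start time and an end time, both inclusive. WhitelistEntry, is_exempt, and render are hypothetical names, not a DataWorks API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass(frozen=True)
class WhitelistEntry:
    rule: str        # de-identification rule that the entry exempts the account from
    account: str     # account that may see unmasked values
    start: datetime  # start of the effective period (assumed inclusive)
    end: datetime    # end of the effective period (assumed inclusive)


def is_exempt(entries: Iterable[WhitelistEntry], rule: str, account: str, at: datetime) -> bool:
    """Return True if the account is whitelisted for the rule at the given time."""
    return any(
        e.rule == rule and e.account == account and e.start <= at <= e.end
        for e in entries
    )


def render(value: str, rule: str, account: str, at: datetime,
           entries: Iterable[WhitelistEntry], mask: Callable[[str], str]) -> str:
    """Return the raw value inside the whitelist period and the masked value otherwise."""
    return value if is_exempt(entries, rule, account, at) else mask(value)
```

For example, an account whitelisted from January 1 to June 30 sees raw values for a query on March 1 but masked values for a query on July 1.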

Create a de-identification rule in the DataWorks Data Integration Config scene

  1. On the Data Masking page, set the Masking Scene parameter to DataWorks Data Integration Config(dataworks_data_integration_desense_code).
  2. Create a de-identification rule.
    1. On the Data Masking tab, click Create Rule in the upper-right corner.
    2. In the Masking Rule dialog box, set the Sensitive data type, Name of the desensitization rule, Method, Domain, and Replacement character set parameters.
      1. Set basic parameters.
        • Sensitive data type:
          • By default, There are is selected in the drop-down list on the left, and you can select an existing sensitive data type from the drop-down list on the right. Existing sensitive data types include the built-in sensitive data types and the sensitive data types that are created by all users. Select a type based on your business requirements.
          • Alternatively, select The new type from the drop-down list on the left and enter a name for the new sensitive data type in the field on the right. The name must be 1 to 30 characters in length and can contain letters and digits.

            After you enter the name, the system checks whether it is already used by an existing sensitive data type, including the built-in types and the types created by all users. If the name is already in use, the message "The sensitive field type is repeated" is displayed below the field.

          Note The built-in sensitive data types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
        • Name of the desensitization rule: When you click this field, the system automatically fills in the value of the Sensitive data type parameter. You can change the name. The name must be 1 to 30 characters in length and can contain letters and digits. If the name is already used by a rule that you created, the message "Duplicate rule names" is displayed below the field.

      2. Set the Method parameter to Pseudonymisation, HASH, or Masking Out.
        • Pseudonymisation
          This method replaces the characters of a data record with an artificial pseudonym of the same data type. The format of the pseudonym is the same as that of the original data record.
          • If you set the Sensitive data type parameter to a built-in sensitive data type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, and Company, you must set the Domain parameter for your data records.

            Domain: You can select a digit from 0 to 9 as the security domain. Each security domain applies a different de-identification rule, so the same data record produces different results in different security domains. For example, the data record a123 is de-identified as b124 in security domain 0 and as c234 in security domain 1. Within a security domain, the same data record always produces the same result.

          • If you do not set the Sensitive data type parameter to a built-in sensitive type, you must set the Replacement character set parameter for your data records.

            Replacement character set: the set of characters that can be replaced. Separate multiple characters with commas (,). Each character must be a letter or a digit. If a character in a data record is in this set, it is replaced with another character of the same type (letter or digit) from the set. Characters that are not in the set are not replaced. For example, if the set is 0,1,2,3,a,b,c,d and a data record contains only these characters, the de-identification result also contains only these characters.

        • HASH

          This method converts a data record into a fixed-length hash value. If you select this method, you must set the Domain parameter.

          Domain: You can select a digit from 0 to 9 as the security domain. Each security domain applies a different de-identification rule, so the same data record produces different results in different security domains. For example, the data record a123 is de-identified as b124 in security domain 0 and as c234 in security domain 1. Within a security domain, the same data record always produces the same result.

        • Masking Out
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended: Select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended drop-down list. By default, Only show first and last character is selected.
          • Custom: Specify the number of characters to mask in the first, middle, or last sections of a data record. You can add up to 10 sections, and one of the sections must be The remaining digits. For each section, configure the following settings:
            • Select digits or The remaining digits.
            • Enter an integer from 1 to 100.
            • Select desensitization or Don't desensitization.
            The following figure shows how to de-identify the first three digits and leave the remaining digits intact.
            The following figure shows how to de-identify the last three digits and leave the remaining digits intact.
      3. To verify the configurations of the de-identification rule, you can enter sample data in the Sample Data field. The sample data must be 0 to 100 characters in length. Click Test. The data de-identification result is displayed in the Effect of desensitization field.
    3. Click OK.
    4. The rule that you create appears on the Data Masking tab. In the Status column, you can set the status of the rule to Active or Inactive.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, you must set the status to Inactive and check whether the rule is configured for a node. You must also contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive data type and Name of the desensitization rule parameters.
      • After you modify the parameters, set the status of the rule to Active. Then, the data of the node for which the rule is configured can be de-identified based on the rule.
  3. After you create a data de-identification rule, you can add the rule when you create and configure a real-time sync node for data in a single table. For more information, see Configure data de-identification.
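The Replacement character set option for custom sensitive data types replaces each character that is in the set with another character of the same type from the set and keeps all other characters. The following Python sketch shows one way to get that behavior, assuming a repeatable mapping derived from a hypothetical seed. DataWorks does not document its substitution algorithm, and parse_charset, build_mapping, and pseudonymise are hypothetical names.

```python
import random
from typing import Dict, List


def parse_charset(spec: str) -> List[str]:
    """Parse a comma-separated replacement character set such as "0,1,2,a,b"."""
    chars = [c.strip() for c in spec.split(",") if c.strip()]
    for c in chars:
        if len(c) != 1 or not c.isascii() or not c.isalnum():
            raise ValueError(f"invalid character in set: {c!r}")
    return list(dict.fromkeys(chars))  # drop duplicates, keep order


def build_mapping(chars: List[str], seed: int) -> Dict[str, str]:
    """Map each character to a different character of the same type (digit or letter)."""
    mapping: Dict[str, str] = {}
    rng = random.Random(seed)  # hypothetical seed; makes the mapping repeatable
    for group in ([c for c in chars if c.isdigit()], [c for c in chars if c.isalpha()]):
        order = group[:]
        rng.shuffle(order)
        # Shifting by one position in the shuffled order means no character maps
        # to itself when the group has at least two members.
        for i, c in enumerate(order):
            mapping[c] = order[(i + 1) % len(order)]
    return mapping


def pseudonymise(record: str, charset_spec: str, seed: int = 0) -> str:
    mapping = build_mapping(parse_charset(charset_spec), seed)
    # Characters outside the set are kept unchanged.
    return "".join(mapping.get(c, c) for c in record)
```

Because digits map only to digits and letters only to letters within the set, a record that contains only 0 to 3 and a to d still contains only those characters after de-identification, which matches the example in step 2.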
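The Masking Out settings describe a rule as an ordered list of sections, each covering a number of characters (1 to 100) or The remaining digits, and each either masked or kept. The following Python sketch interprets that model under stated assumptions: exactly one section is the remaining section, sections before it are counted from the start of the value, sections after it are counted from the end, and "digits" means any character position. The PRESETS layouts for the Recommended options are also assumptions inferred from the option names. mask_out, REMAINING, and PRESETS are hypothetical names.

```python
from typing import List, Optional, Tuple

# A section is (length, masked). length is a number of characters from 1 to 100,
# or REMAINING for the section that covers whatever the other sections leave over.
REMAINING: Optional[int] = None
Section = Tuple[Optional[int], bool]

# Assumed section layouts for the Recommended options.
PRESETS = {
    "Only show first and last character": [(1, False), (REMAINING, True), (1, False)],
    "Show first three and last two characters": [(3, False), (REMAINING, True), (2, False)],
    "Show first three and last four characters": [(3, False), (REMAINING, True), (4, False)],
}


def mask_out(value: str, sections: List[Section], mask_char: str = "*") -> str:
    if not 1 <= len(sections) <= 10:
        raise ValueError("a rule can have 1 to 10 sections")
    remaining = [i for i, (n, _) in enumerate(sections) if n is REMAINING]
    if len(remaining) != 1:
        raise ValueError("exactly one section must be the remaining characters")
    if any(n is not REMAINING and not 1 <= n <= 100 for n, _ in sections):
        raise ValueError("section lengths must be integers from 1 to 100")

    split = remaining[0]
    masked = [False] * len(value)

    # Sections before REMAINING are counted from the start of the value.
    pos = 0
    for n, hide in sections[:split]:
        stop = min(pos + n, len(value))
        for i in range(pos, stop):
            masked[i] = hide
        pos = stop

    # Sections after REMAINING are counted from the end of the value.
    end = len(value)
    for n, hide in reversed(sections[split + 1:]):
        start = max(end - n, pos)
        for i in range(start, end):
            masked[i] = hide
        end = start

    # The REMAINING section covers the characters in between.
    for i in range(pos, end):
        masked[i] = sections[split][1]

    return "".join(mask_char if m else c for c, m in zip(value, masked))
```

Under this interpretation, masking the first three digits and keeping the rest is [(3, True), (REMAINING, False)], and masking the last three digits is [(REMAINING, False), (3, True)].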

Verify the de-identification result in DataWorks

After you create and configure de-identification rules, DataWorks dynamically de-identifies the results of queries in your workspace based on the rules.
Note You must first turn on Mask Data in Page Query Results for your workspace in the DataWorks console. For more information, see Workspace settings.