This topic describes how to create data masking rules in Data Security Guard so that DataWorks can mask sensitive data dynamically, such as in the results of ad hoc queries, or statically, before the data is stored in a database.

Prerequisites

  • DataWorks Professional Edition or a more advanced edition is activated. For more information, see Differences among DataWorks editions.
  • Mask Data in Page Query Results is turned on for your workspace in the DataWorks console. For more information, see Workspace settings.

Background information

DataWorks supports dynamic data masking and static data masking.
  • Dynamic data masking: DataWorks masks sensitive data in query results. Typical data masking scenes: Global Config, DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
  • Static data masking: DataWorks masks sensitive data before the data is stored in a database. Typical data masking scene: DataWorks Data Integration Config.

Select a data masking scene

  1. Go to the Data Security Guard page. For more information, see Overview.
  2. In the left-side navigation pane, choose Rule Change > Data Masking.
    On the Data Masking page, select a data masking scene from the Masking Scene drop-down list based on your business requirements. DataWorks provides multiple scenes. You can also create custom scenes.
    • Global Config: The data masking rules and whitelists that are configured in the Global Config scene will take effect in other scenes, such as DataWorks Studio Config, DataWorks Analysis Config, MaxCompute Config, and DataWorks Data Integration Config.
    • DataWorks Studio Config:
      • After you configure data masking rules, the sensitive data that you query in DataStudio is masked.
      • After you configure data masking rules, the sensitive data that you preview in Data Map is masked.
    • DataWorks Analysis Config: After you configure data masking rules, the sensitive data that you query on the SQL Query and SQLNotes pages of DataAnalysis is masked.
    • Hologres Config: After you configure data masking rules, the sensitive data that you query from Hologres databases in DataStudio and HoloStudio is masked. The data masking rules that are configured in the Hologres Config scene take effect only in workspaces in the China (Hangzhou) and China (Beijing) regions. By default, the rules are not enabled in the Hologres Config scene. To enable the rules, submit a ticket.
      Note Hologres does not support pseudonymization. If you configure a data masking rule that uses the pseudonymization method in the Global Config scene, the sensitive data that you query from Hologres databases is masked with multiple asterisks (***).
    • MaxCompute Config: After you configure data masking rules, the sensitive data that you query from MaxCompute projects is masked regardless of the query method. The data masking rules that are configured in the MaxCompute Config scene take effect only in workspaces in the China (Shanghai) region.
    • Custom data masking scene: To create a custom data masking scene, click Masking Scene at the bottom of the Masking Scene drop-down list. In the New dialog box, set the Scene Name and Scene Code parameters. The scene name can contain only letters, digits, underscores (_), and hyphens (-). The scene code can contain only letters and digits.
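The naming rules for custom scenes can be checked before you submit the New dialog box. The following Python sketch is illustrative only: it assumes that "letters" means ASCII letters and enforces no length limit because none is documented. The function names are hypothetical and are not part of any DataWorks API.

```python
import re

# Hypothetical pre-checks that mirror the documented character rules for
# custom data masking scenes. ASCII letters are an assumption; no length
# limit is enforced because the documentation does not state one.
SCENE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
SCENE_CODE_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_scene_name(name: str) -> bool:
    """Scene name: letters, digits, underscores (_), and hyphens (-) only."""
    return SCENE_NAME_PATTERN.fullmatch(name) is not None


def is_valid_scene_code(code: str) -> bool:
    """Scene code: letters and digits only."""
    return SCENE_CODE_PATTERN.fullmatch(code) is not None


if __name__ == "__main__":
    print(is_valid_scene_name("finance_scene-01"))  # True
    print(is_valid_scene_code("finance01"))  # True
    print(is_valid_scene_code("finance_01"))  # False: underscores are not allowed
```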
  3. Create a data masking rule.
    After you select a data masking scene, you can create data masking rules in the scene. For more information, see the following sections: Create a data masking rule in the Global Config scene and Create a data masking rule in the DataWorks Data Integration Config scene.

Create a data masking rule in the Global Config scene

The following example shows how to create a data masking rule in the Global Config scene. To create a rule in the Hologres Config, DataWorks Studio Config, DataWorks Analysis Config, or MaxCompute Config scene, you can also follow the steps in this example.

  1. On the Data Masking page, set the Masking Scene parameter to Global Config(_default_scene_code).
  2. Optional. Select one or more MaxCompute projects or Hologres databases and authorize Data Security Guard to mask data for the MaxCompute projects or Hologres databases.
    Note This step is required only in the Hologres Config and MaxCompute Config scenes.
    Click Select Desensitization Project or Select desensitization database. In the dialog box that appears, select one or more projects or databases and confirm your selection.
  3. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Create Masking Rule dialog box, set the Sensitive field type and Desensitization way parameters.
      1. Configure basic information.
        Parameter Description
        Sensitive field type You can select an existing sensitive field type from the Sensitive field type drop-down list as needed. The system automatically filters out the sensitive field types that have been used in the current data masking scene. For more information, see Manage sensitive field types.
        Name of desensitization rule

        The system automatically enters the value of the Sensitive field type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule, the message The name of the rule already exists appears.
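The two documented checks on the rule name (length and character rules, then uniqueness) can be sketched in Python as follows. This is a hypothetical client-side pre-check, not a DataWorks API; restricting names to ASCII letters and digits and comparing names case-sensitively are assumptions.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical pre-check for the Name of desensitization rule parameter.
# Documented constraints: 1 to 30 characters, letters and digits, unique.
# Restricting the name to ASCII letters and digits and comparing names
# case-sensitively are assumptions.
RULE_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,30}")


def check_rule_name(name: str, existing_names: set[str]) -> Optional[str]:
    """Return an error message, or None if the name passes both checks."""
    if RULE_NAME_PATTERN.fullmatch(name) is None:
        return "The name must be 1 to 30 letters or digits."
    if name in existing_names:
        return "The name of the rule already exists"
    return None


if __name__ == "__main__":
    existing = {"PhoneRule"}
    print(check_rule_name("IdCardRule", existing))  # None
    print(check_rule_name("PhoneRule", existing))  # The name of the rule already exists
```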

      2. Set the Desensitization way parameter to Reserved format encryption, To cover up, HASH encryption, Characters to replace, Range transform, integer, or empty.
        • Reserved format encryption
          This method replaces the characters of a data record with an artificial pseudonym of the same data type. The format of the pseudonym is the same as that of the original data record.
          • Data watermark: Watermarks allow you to track the source of the data. If your data leaks, you can use the watermark to trace the potential source of the leak. You can turn Data watermark on or off as needed.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Desensitization characteristic value: By default, 5 is selected. You can select a digit from 0 to 9 as the characteristic value. The characteristic value determines how data is masked, so different characteristic values produce different data masking results. For example, if the data record is a123 and the characteristic value is set to 0, the data masking result is b124. If the characteristic value is set to 1, the data masking result is c234. As long as the characteristic value remains unchanged, the same data record is always masked to the same result.
          • If you do not set the Sensitive field type parameter to a built-in sensitive type, you must set the Substitution character set parameter for your data records.

            Substitution character set: the set of characters that can be replaced. Separate multiple characters with commas (,). Each character can be a letter or a digit. If a character in a data record is included in this character set, the character is replaced with another character of the same type from the set. For example, if the character set is 0,1,2,3,a,b,c,d and a data record contains only these characters, the data masking result also contains only digits from 0 to 3 and letters from a to d. Characters that are not included in the character set are not replaced.
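The following Python sketch illustrates the idea of format-preserving substitution with a characteristic value and a substitution character set. It is not the DataWorks algorithm and does not reproduce the a123 examples above: it derives a fixed permutation of the digits and of the letters in the set from the characteristic value, which is an assumption made for illustration, and it is not cryptographically strong. The function names are hypothetical.

```python
from __future__ import annotations

import random


def build_substitution(charset: str, characteristic_value: int) -> dict[str, str]:
    """Build a one-to-one mapping inside each character type of the set.

    charset uses the documented comma-separated format, e.g. "0,1,2,3,a,b,c,d".
    """
    if not 0 <= characteristic_value <= 9:
        raise ValueError("the characteristic value must be a digit from 0 to 9")
    chars = {c.strip() for c in charset.split(",")}
    chars = {c for c in chars if len(c) == 1}
    mapping: dict[str, str] = {}
    # Digits are only swapped with digits and letters only with letters.
    for group in (sorted(c for c in chars if c.isdigit()),
                  sorted(c for c in chars if c.isalpha())):
        shuffled = list(group)
        # random.Random hashes a str seed deterministically, so the same
        # characteristic value always yields the same permutation.
        random.Random(f"{characteristic_value}:{''.join(group)}").shuffle(shuffled)
        mapping.update(zip(group, shuffled))
    return mapping


def pseudonymize(value: str, charset: str, characteristic_value: int = 5) -> str:
    """Replace characters that are in the set; keep all others unchanged."""
    mapping = build_substitution(charset, characteristic_value)
    return "".join(mapping.get(ch, ch) for ch in value)


if __name__ == "__main__":
    print(pseudonymize("a123-x", "0,1,2,3,a,b,c,d", characteristic_value=0))
```

Because each character type is permuted only within itself, the output keeps the length and layout of the input, and characters outside the set (here "-" and "x") pass through unchanged.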

        • To cover up
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended method: You can select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended method drop-down list.
          • Custom: You can divide a data record into sections from left to right and specify whether to mask each section. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select Desensitization or No desensitization.
            The following figure shows how to mask the first three characters and leave the remaining characters intact.
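The section-based masking described above can be sketched in Python. In this sketch, a section is a (length, mask) pair where length None stands for The remaining digits, and the three presets are one reading of the Recommended method options. The names are hypothetical, and requiring exactly one The remaining digits section is an assumption.

```python
from typing import List, Optional, Tuple

# A section is (length, mask). length None stands for "The remaining digits".
Section = Tuple[Optional[int], bool]

# Presets that correspond to the Recommended method options.
ONLY_FIRST_AND_LAST: List[Section] = [(1, False), (None, True), (1, False)]
FIRST_THREE_LAST_TWO: List[Section] = [(3, False), (None, True), (2, False)]
FIRST_THREE_LAST_FOUR: List[Section] = [(3, False), (None, True), (4, False)]


def cover_up(value: str, sections: List[Section]) -> str:
    """Replace every character in masked sections with an asterisk (*)."""
    if not 1 <= len(sections) <= 10:
        raise ValueError("you can add up to 10 sections")
    if sum(1 for length, _ in sections if length is None) != 1:
        raise ValueError("exactly one section must be The remaining digits")
    if any(length is not None and not 1 <= length <= 100 for length, _ in sections):
        raise ValueError("a section length must be an integer from 1 to 100")
    fixed = sum(length for length, _ in sections if length is not None)
    remaining = max(len(value) - fixed, 0)
    parts, pos = [], 0
    # Walk the sections from left to right.
    for length, mask in sections:
        width = remaining if length is None else length
        chunk = value[pos:pos + width]
        parts.append("*" * len(chunk) if mask else chunk)
        pos += width
    return "".join(parts)


if __name__ == "__main__":
    print(cover_up("13812345678", [(3, True), (None, False)]))  # ***12345678
    print(cover_up("13812345678", FIRST_THREE_LAST_FOUR))  # 138****5678
```

Placing The remaining digits section first, as in [(None, False), (3, True)], masks the last three characters instead.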
        • HASH encryption
          This method converts a data record into a hash value of a fixed length.
          • Data watermark: Watermarks allow you to track the source of the data. If your data leaks, you can use the watermark to trace the potential source of the leak. You can turn Data watermark on or off as needed.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Encryption Algorithm: Select MD5, SHA256, SHA512, or SM3.
          • Add salt value: Set a salt value for the selected encryption algorithm. By default, 5 is selected. You can select a digit from 0 to 9 as the salt value.
            Note Salting is a cryptographic technique in which a specific string, called a salt value, is inserted at a fixed position of the input before the input is hashed. As a result, the hash value differs from the hash value of the unsalted input.
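Salted hashing can be sketched with Python's standard hashlib module. This is an illustration, not the DataWorks implementation: where DataWorks inserts the salt is not documented, so prepending the salt digit is an assumption, and SM3 is omitted because hashlib provides it only when the underlying OpenSSL build supports it. The function name is hypothetical.

```python
import hashlib

# Algorithms from the Encryption Algorithm list that Python's standard
# hashlib always provides. SM3 is omitted because hashlib exposes it only
# when the underlying OpenSSL build supports it.
ALGORITHMS = {
    "MD5": hashlib.md5,
    "SHA256": hashlib.sha256,
    "SHA512": hashlib.sha512,
}


def hash_mask(value: str, algorithm: str = "SHA256", salt: int = 5) -> str:
    """Return the hex digest of the salted value.

    Assumption: the salt digit is prepended to the value. DataWorks does not
    document where it inserts the salt.
    """
    if not 0 <= salt <= 9:
        raise ValueError("the salt value must be a digit from 0 to 9")
    salted = f"{salt}{value}".encode("utf-8")
    return ALGORITHMS[algorithm](salted).hexdigest()


if __name__ == "__main__":
    print(hash_mask("13812345678", "MD5"))
    print(hash_mask("13812345678", "SHA256", salt=0))
```

The digest length is fixed by the algorithm (32 hex characters for MD5, 64 for SHA256, 128 for SHA512), and changing only the salt changes the result.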
        • Characters to replace: This method replaces the characters at the specified positions by using the replacement method that you select.
          • Replacement position: You can select Replace all, Replace the first three digits, or Four digits after replacement from the drop-down list. You can also customize the replacement positions.
            If you select Custom, you can divide a data record into sections and configure the replacement method for each section. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select Random replacement, Sample substitution, or Fixed value substitution.
          • Replace the way: You can select Random replacement, Sample substitution, or Fixed value substitution.
            • Random replacement: This method randomly replaces the characters at the specified positions. The number of characters remains unchanged after the replacement.
            • Sample substitution: You must first specify a sample library. This method replaces the characters at the specified positions with data from the sample library.
            • Fixed value substitution: You must enter a replacement value. The value must be 1 to 100 characters in length and cannot consist only of spaces. This method replaces the characters at the specified positions with the replacement value.
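The three replacement methods can be sketched in Python for a single replacement position given as a slice (for example, start=0 and end=3 for the first three characters, or end=None for the rest of the value). Several details are assumptions rather than documented behavior: random replacement keeps the character type, a sample library is modeled as a list of strings from which one is picked, and a fixed value replaces the whole selected span once. The function names are hypothetical.

```python
import random
import string
from typing import Optional, Sequence


def _random_like(ch: str, rng: random.Random) -> str:
    """Assumption: keep the character type (digit or letter) when replacing."""
    if ch.isdigit():
        return rng.choice(string.digits)
    if ch.isalpha():
        return rng.choice(string.ascii_letters)
    return ch


def replace_chars(
    value: str,
    start: int,
    end: Optional[int],
    method: str,
    fixed_value: Optional[str] = None,
    samples: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace value[start:end]; end=None means up to the end of the value."""
    if rng is None:
        rng = random.Random()
    stop = len(value) if end is None else end
    target = value[start:stop]
    if method == "random":
        # The number of characters stays the same.
        replacement = "".join(_random_like(ch, rng) for ch in target)
    elif method == "sample":
        if not samples:
            raise ValueError("specify a sample library first")
        replacement = rng.choice(list(samples))
    elif method == "fixed":
        if fixed_value is None or not 1 <= len(fixed_value) <= 100 or not fixed_value.strip():
            raise ValueError("the value must be 1 to 100 characters and not only spaces")
        replacement = fixed_value
    else:
        raise ValueError(f"unknown replacement method: {method}")
    return value[:start] + replacement + value[stop:]


if __name__ == "__main__":
    print(replace_chars("13812345678", 0, 3, "random"))
    print(replace_chars("13812345678", 7, None, "fixed", fixed_value="XXXX"))
```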
        • Range transform: This method applies only to numeric data. It replaces the values that fall within a specified range with a fixed value. You can add 1 to 10 ranges.
          • Original value range (m,n): the value range of the original data records. Valid values are numbers that are greater than or equal to 0 and have at most two decimal places.
          • Value after desensitization: the value that replaces the data records within the range. Valid values are numbers that are greater than or equal to 0 and have at most two decimal places.
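Range transform can be sketched in Python with the decimal module so that two-decimal bounds are compared exactly. The boundary handling is not documented, so this sketch assumes that ranges are checked in order, that m is inclusive and n is exclusive, and that values outside every range are returned unchanged. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Sequence, Tuple, Union

Number = Union[int, str, Decimal]


def _to_decimal(x: Number) -> Decimal:
    d = Decimal(str(x))
    # Documented limits: values >= 0 with at most two decimal places.
    if d < 0 or d.as_tuple().exponent < -2:
        raise ValueError(f"{x} must be >= 0 with at most two decimal places")
    return d


def range_transform(value: Number,
                    ranges: Sequence[Tuple[Number, Number, Number]]) -> Decimal:
    """Map a value in [m, n) to the configured replacement value.

    Assumptions: ranges are checked in order, m is inclusive and n is
    exclusive, and a value outside every range is returned unchanged.
    """
    if not 1 <= len(ranges) <= 10:
        raise ValueError("configure 1 to 10 ranges")
    x = Decimal(str(value))
    for low, high, masked in ranges:
        if _to_decimal(low) <= x < _to_decimal(high):
            return _to_decimal(masked)
    return x


if __name__ == "__main__":
    salary_ranges = [("0", "1000", "500"), ("1000", "5000", "3000")]
    print(range_transform("250.75", salary_ranges))  # 500
```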
        • integer
          • Original data type: Only numeric data is supported.
          • Keep decimal places: You can select an integer from 0 to 5. The digits beyond the specified number of decimal places are rounded. For example, if the original value is 3.1415 and two decimal places are kept, the data masking result is 3.14.
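The Keep decimal places behavior can be sketched with Python's decimal module. The documentation says only that the remaining digits are rounded, so using round-half-up for ties is an assumption; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union


def keep_decimal_places(value: Union[int, str, Decimal], places: int) -> Decimal:
    """Round a numeric value to the given number of decimal places (0 to 5).

    Assumption: "rounded" means round half up; the documentation does not
    specify how ties are handled.
    """
    if not 0 <= places <= 5:
        raise ValueError("keep 0 to 5 decimal places")
    quantum = Decimal(1).scaleb(-places)  # 0 -> 1, 2 -> 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(keep_decimal_places("3.1415", 2))  # 3.14
```

Working on the decimal string rather than a binary float avoids surprises such as 2.675 rounding down to 2.67.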
        • empty: This method replaces the original data record with an empty string.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Desensitization verification. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
    3. Click save.
    4. On the Data Masking tab, set the status of the created data masking rule to Active or Inactive as needed.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, set its status to Inactive, check whether the rule is configured for a node, and contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Desensitization way parameter of the rule, but you cannot modify the Sensitive field type and Name of desensitization rule parameters.
      • After you modify the parameters, set the status of the rule back to Active. The rule then masks the data of the nodes for which it is configured.
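The editing constraints in the notes above can be modeled as a small Python class. This is a hypothetical sketch, not a DataWorks API: the class and method names are illustrative, and creating a rule as Inactive by default is an assumption. Checking node usage and confirming with the security administrator stay outside the model because they are manual steps.

```python
class MaskingRule:
    """Hypothetical model of the status-based editing constraints."""

    def __init__(self, field_type: str, name: str, method: str, active: bool = False) -> None:
        self._field_type = field_type  # cannot be modified after creation
        self._name = name  # cannot be modified after creation
        self.method = method
        self.active = active

    @property
    def field_type(self) -> str:
        return self._field_type

    @property
    def name(self) -> str:
        return self._name

    def change_method(self, new_method: str) -> None:
        # Only the masking method can be changed, and only while Inactive.
        if self.active:
            raise PermissionError("Set the rule to Inactive before you edit it.")
        self.method = new_method

    def can_delete(self) -> bool:
        # Active rules cannot be deleted.
        return not self.active


if __name__ == "__main__":
    rule = MaskingRule("Mobile Phone Number", "PhoneRule", "To cover up", active=True)
    rule.active = False  # set the rule to Inactive first
    rule.change_method("HASH encryption")
    rule.active = True  # reactivate so the rule takes effect again
    print(rule.method)  # HASH encryption
```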
  4. Configure a whitelist.
    1. Click the Whitelist tab.
    2. On the Whitelist tab, click Add Account in the upper-right corner.
    3. In the Add Account dialog box, set the Rule, Account, and Effective From parameters.
      Note

      You do not need to configure a whitelist in the Hologres Config scene.

      If a user queries data beyond the time range that is specified in the whitelist, the query results are masked.
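The whitelist decision can be sketched in Python. The sketch assumes that a whitelisted account sees unmasked results for a rule only while the query time falls within the entry's effective period, with both bounds inclusive; the field names loosely follow the Rule, Account, and Effective From parameters but are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class WhitelistEntry:
    rule: str
    account: str
    effective_from: datetime
    effective_to: datetime


def should_mask(rule: str, account: str, query_time: datetime,
                whitelist: Iterable[WhitelistEntry]) -> bool:
    """Return False only if the account is whitelisted for the rule at query_time."""
    for entry in whitelist:
        if (entry.rule == rule and entry.account == account
                and entry.effective_from <= query_time <= entry.effective_to):
            return False
    return True


if __name__ == "__main__":
    wl = [WhitelistEntry("PhoneRule", "alice", datetime(2024, 1, 1), datetime(2024, 12, 31))]
    print(should_mask("PhoneRule", "alice", datetime(2024, 6, 1), wl))  # False
    print(should_mask("PhoneRule", "alice", datetime(2025, 1, 1), wl))  # True
```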

  5. After you create and configure the data masking rules, the sensitive data that you query on specific pages, such as the DataStudio, Data Map, and DataAnalysis pages, is masked based on the rules. For more information, see Select a data masking scene.

Create a data masking rule in the DataWorks Data Integration Config scene

  1. On the Data Masking page, set the Masking Scene parameter to DataWorks Data Integration Config(dataworks_data_integration_desense_code).
  2. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Masking Rule dialog box, set the Sensitive data type, Name of the desensitization rule, Method, Domain, and Replacement character set parameters.
      1. Configure basic information.
        Parameter Description
        Sensitive data type
        • By default, the There are option is selected in the drop-down list on the left. In this case, you can select an existing sensitive data type from the drop-down list on the right based on your business requirements. Existing sensitive data types include the built-in sensitive data types and the sensitive data types that are created by all users.
        • You can also select The new type from the drop-down list on the left. In the field on the right, enter a name for a new sensitive data type. The name must be 1 to 30 characters in length and can contain letters and digits.

          After you enter the name of a new sensitive data type, the system checks whether the name is used by existing sensitive data types, including built-in sensitive data types and sensitive data types created by all users. If the name has been used, the message The sensitive field type is repeated is displayed.

        Note The built-in sensitive data types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
        Name of the desensitization rule

        The system automatically enters the value of the Sensitive data type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule, the message Duplicate rule names appears.

      2. You can set the Method parameter to Pseudonymisation, The hash, or Masking Out.
        • Pseudonymisation
          This method replaces the characters of a data record with an artificial pseudonym of the same data type. The format of the pseudonym is the same as that of the original data record.
          • If you set the Sensitive data type parameter to a built-in sensitive data type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must set the Domain parameter for your data records.

            Domain: You can select a digit from 0 to 9 as the security domain. The security domain determines how data is masked, so the same data record produces different data masking results in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data masking result is b124. If the security domain is set to 1, the data masking result is c234. Within a security domain, the same data record is always masked to the same result.

          • If you do not set the Sensitive data type parameter to a built-in sensitive type, you must set the Replacement character set parameter for your data records.

            Replacement character set: the set of characters that can be replaced. Separate multiple characters with commas (,). Each character can be a letter or a digit. If a character in a data record is included in this character set, the character is replaced with another character of the same type from the set. For example, if the character set is 0,1,2,3,a,b,c,d and a data record contains only these characters, the data masking result also contains only digits from 0 to 3 and letters from a to d. Characters that are not included in the character set are not replaced.

        • The hash

          This method encrypts a data record to generate a hash value of a fixed length. If you select this method, you must set the Domain parameter.

          Domain: You can select a digit from 0 to 9 as the security domain. The same data record produces different hash values in different security domains. Within a security domain, the same data record always produces the same hash value.

        • Masking Out
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended: You can select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended drop-down list.
          • Custom: You can divide a data record into sections from left to right and specify whether to mask each section. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select desensitization or Don't desensitization.
            The following figure shows how to mask the first three characters and leave the remaining characters intact.
            The following figure shows how to mask the last three characters and leave the remaining characters intact.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Test. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Effect of desensitization field.
    3. Click OK.
    4. The rule that you create appears on the Data Masking tab. In the Status column, you can set the status of the rule to Active or Inactive.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, set its status to Inactive, check whether the rule is configured for a node, and contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive data type and Name of the desensitization rule parameters.
      • After you modify the parameters, set the status of the rule back to Active. The rule then masks the data of the nodes for which it is configured.
  3. After you create a data masking rule, you can add the rule when you create and configure a real-time sync node for data in a single table. For more information, see Configure data de-identification.