This topic describes how to create data masking rules in Data Security Guard. With these rules, DataWorks can mask sensitive data dynamically, in the results of ad hoc queries, or statically, before the data is stored.

Prerequisites

  • DataWorks Professional Edition or a more advanced edition is activated. For more information, see Differences among DataWorks editions.
  • Mask Data in Page Query Results is turned on for your workspace in the DataWorks console. For more information, see Workspace settings.

Background information

DataWorks supports dynamic data masking and static data masking.
  • Dynamic data masking: DataWorks masks sensitive data in query results. Typical scenarios of dynamic data masking: Global Config, DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
  • Static data masking: DataWorks masks sensitive data before the data is stored in a database. Typical scenario of static data masking: DataWorks Data Integration Config.
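
The difference between the two types can be sketched in a few lines of Python. This is only an illustration: mask_phone, the sample row, and the in-memory lists are hypothetical and do not represent how DataWorks implements masking.

```python
def mask_phone(value: str) -> str:
    """Hypothetical masking function: keep the first three characters, star the rest."""
    return value[:3] + "*" * max(len(value) - 3, 0)


raw_rows = [{"name": "alice", "phone": "13800001234"}]

# Static masking: the value is masked before it is written,
# so the destination never holds the original.
static_store = [{**row, "phone": mask_phone(row["phone"])} for row in raw_rows]

# Dynamic masking: the original is stored unchanged and masked
# only when query results are returned.
dynamic_store = [dict(row) for row in raw_rows]


def query(store, mask=False):
    """Return copies of the rows, masking the phone column if requested."""
    return [
        {**row, "phone": mask_phone(row["phone"])} if mask else dict(row)
        for row in store
    ]


masked_view = query(dynamic_store, mask=True)
```

With static masking the original value cannot be recovered from the destination. With dynamic masking the stored value is unchanged, and only the returned view differs.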

Select a data masking scenario

  1. Go to the Data Security Guard page. For more information, see Overview.
  2. In the left-side navigation pane, choose Rule Change > Data Masking.
    On the Data Masking page, select a data masking scenario from the Masking Scene drop-down list based on your business requirements. DataWorks provides multiple scenarios. You can also create a custom scenario.
    • Global Config: The data masking rules and whitelists that are configured in the Global Config scenario will take effect in other scenarios, such as DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
    • DataWorks Studio Config:
      • After you configure data masking rules, the sensitive data that you query on the DataStudio page is masked.
      • After you configure data masking rules, the sensitive data that you preview on the DataMap page is masked.
    • DataWorks Analysis Config: After you configure data masking rules, the sensitive data that you query on the SQL Query and SQLNotes pages of DataAnalysis is masked.
    • Hologres Config: After you configure data masking rules, the sensitive data that you query from Hologres databases on the DataStudio and HoloStudio pages is masked. The data masking rules that are configured in the Hologres Config scenario take effect only in workspaces in the China (Hangzhou) and China (Beijing) regions. By default, the rules are not enabled in the Hologres Config scenario. To enable the rules, submit a ticket.
      Note Hologres does not support pseudonymization. If you configure a data masking rule that uses the pseudonymization method in the Global Config scenario, the sensitive data that you query from Hologres databases is masked with multiple asterisks (***).
    • MaxCompute Config: After you configure data masking rules, the sensitive data that you query from MaxCompute projects is masked regardless of the query method. The data masking rules that are configured in the MaxCompute Config scenario take effect only in workspaces in the China (Shanghai) region. For more information about how to enable the dynamic data masking feature in the MaxCompute Config scenario, see Dynamic data masking.
    • Custom data masking scenario: To create a custom data masking scenario, click Masking Scene at the bottom of the Masking Scene drop-down list. In the New dialog box, configure the Scene Name and Scene Code parameters. The scenario name can contain only letters, digits, underscores (_), and hyphens (-). The scenario code can contain only digits and letters.
  3. Create a data masking rule.
    After you select a data masking scenario, you can create a data masking rule in that scenario. For more information, see the following sections of this topic:
    • Create a data masking rule in the Global Config scenario
    • Create a data masking rule in the DataWorks Data Integration Config scenario
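
The naming constraints for a custom scenario can be checked before you open the dialog box. The following is a minimal sketch, assuming that "letters" means ASCII letters. The function name is_valid_scene is hypothetical and not part of any DataWorks API.

```python
import re

# Scene Name: letters, digits, underscores (_), and hyphens (-) only.
SCENE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")
# Scene Code: letters and digits only.
SCENE_CODE_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_scene(name: str, code: str) -> bool:
    """Return True if both values satisfy the documented character rules."""
    return bool(SCENE_NAME_RE.fullmatch(name)) and bool(SCENE_CODE_RE.fullmatch(code))
```

For example, is_valid_scene("finance_audit-1", "finaudit01") passes, whereas a name that contains a space or a code that contains an underscore does not.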

Create a data masking rule in the Global Config scenario

The following example shows how to create a data masking rule in the Global Config scenario. To create a rule in the Hologres Config, DataWorks Studio Config, DataWorks Analysis Config, or MaxCompute Config scenario, you can also follow the steps in this example.

  1. On the Data Masking page, set the Masking Scene parameter to Global Config(_default_scene_code).
  2. Optional. Select one or more MaxCompute projects or Hologres databases and authorize Data Security Guard to mask data for the MaxCompute projects or Hologres databases.
    Note This step is required only in the Hologres Config and MaxCompute Config scenarios.
    Click Select Desensitization Project or Select desensitization database. In the dialog box that appears, select one or more projects or databases, click the rightwards arrow to add them to the section on the right, and select the option button.
  3. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Create Masking Rule dialog box, configure the Sensitive field type and Desensitization way parameters.
      1. Configure basic information.
        • Sensitive field type: You can select an existing sensitive field type from the Sensitive field type drop-down list based on your business requirements. The system automatically filters out the sensitive field types that have been used in the current data masking scenario. For more information, see Identify sensitive data.
        • Name of desensitization rule: The system automatically enters the value of the Sensitive field type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule created by a user of the current tenant, the message The name of the rule already exists appears.

      2. Set the Desensitization way parameter to Reserved format encryption, To cover up, HASH encryption, Characters to replace, Range transform, integer, or empty.
        • Reserved format encryption
          This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.
          • Data watermark: Watermarks allow you to track the source of the data. If your data leaks, you can track the potential source where the data leak occurs based on the watermark. You can turn on or off Data watermark based on your business requirements.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Desensitization characteristic value: By default, 5 is selected. You can select a digit from 0 to 9 as the characteristic value. Data masking rules vary depending on characteristic values. Therefore, different data masking results are generated when different characteristic values are used. For example, if the data record is a123 and the characteristic value is set to 0, the data masking result is b124. If the characteristic value is set to 1, the data masking result is c234. If the characteristic value remains unchanged, the same data masking result is returned for a data record at all times.
          • If you do not set the Sensitive field type parameter to a built-in sensitive field type, you must configure the Substitution character set parameter for your data records.

            Substitution character set: the character set that contains one or more characters to be replaced. You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result also contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.

        • To cover up
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended method: You can select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended method drop-down list.
          • Custom: You can specify, section by section from left to right, whether to mask a given number of characters of a data record. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select Desensitization or No desensitization.
            Example (figure): masking the first three characters and leaving the remaining characters intact.
        • HASH
          • Data watermark: Watermarks allow you to track the source of the data. If your data leaks, you can track the potential source where the data leak occurs based on the watermark. You can turn on or off Data watermark based on your business requirements.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Encryption Algorithm: Select MD5, SHA256, SHA512, or SM3.
          • Add salt value: Set a salt value for each encryption algorithm. By default, 5 is selected. You can select a digit from 0 to 9 as the salt value.
            Note In cryptography, you can insert a specific string into a fixed position of a password to generate a hash value that is different from that of the original password. This process is called salting. A salt value is the specific string that you insert.
        • Characters to replace: This method replaces the characters at the specified positions based on the replacement method you selected.
          • Replacement position: You can select Replace all, Replace the first three digits, or Four digits after replacement from the drop-down list. You can also customize the replacement position.
            If you select Custom, you can customize sections and configure the replacement method for each section. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select Random replacement, Sample substitution, or Fixed value substitution.
          • Replacement Method: You can select Random replacement, Sample substitution, or Fixed value substitution.
            • Random replacement: This method randomly replaces the characters at the specific positions. The number of characters remains unchanged before and after the replacement.
            • Sample substitution: You must specify a sample library first. After you select the sample library, this method replaces the characters at the specific positions with the data in the specified sample library.
            • Fixed value substitution: You must enter a replacement value. The value must be 1 to 100 characters in length, and cannot be a string that contains only spaces. After you set the value, this method replaces the characters at the specific positions with the replacement value.
        • Range transform: This method applies only to numeric data. It replaces any value that falls within a specified value range with a fixed value. You can add 1 to 10 value ranges.
          • Original value range (m,n): the value range of the original data record. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
          • Value after desensitization: the value that is used to replace the data record that you want to mask. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
        • integer
          • Original data type: Only numeric data is supported.
          • Keep decimal places: You can select an integer from 0 to 5 as the number of decimal places to keep. The remaining decimal places are rounded off. For example, if the original value is 3.1415 and Keep decimal places is set to 2, the data masking result is 3.14.
        • empty: This method replaces the original data record with an empty string.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Desensitization verification. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
    3. Click save.
    4. On the Data Masking tab, set the status of the created data masking rule to Active or Inactive based on your business requirements.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, you must set the status to Inactive and check whether the rule is configured for a node. You must also contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive field type or Masking Rule parameter.
      • After you modify the parameters, set the status of the rule to Active. Then, the data of the node for which the rule is configured can be masked based on the rule.
  4. Configure a whitelist.
    1. Click the Whitelist tab.
    2. On the Whitelist tab, click Add Account in the upper-right corner.
    3. In the Add Account dialog box, configure the parameters.
      Note
      • You do not need to configure a whitelist in the Hologres Config scenario.
      • If a user queries data within the time range that is specified by the Effective From parameter in the whitelist, the query results are not masked.
      • You cannot set the values of all parameters for the whitelist to All.
      1. Configure basic information.
        • Whitelist Name: The name of the whitelist. The name must be 1 to 30 characters in length and cannot contain special characters.
        • Sensitivity Level: The sensitivity level of sensitive data. You can select a built-in sensitivity level or a custom sensitivity level from custom sensitivity levels created by all users. For more information about how to configure a data category and a sensitivity level for sensitive data, see Manage data sensitivity levels.
        • Data Category: The data category of sensitive data. You can select a built-in data category or a custom data category from custom data categories created by all users.
        • User Group: The user group. You can select a user group that you added on the User Group Management page. You can select up to 50 user groups. After you add the selected user groups to the whitelist, the Alibaba Cloud accounts and RAM users that belong to the selected user groups can view the original, unmasked data. For more information about how to add and manage a user group, see Create and manage user groups.
        • Effective From: The effective time range of the whitelist. If a user queries data beyond the time range that is specified in the whitelist, the query results are masked.
          Note If you set this parameter to Short, the effective time range is from the current time to the specified time. If a user queries data within this time range, the query results are not masked.
      2. Configure advanced settings.
        • Sensitive field type: The sensitive field type. You can select an existing sensitive field type from the drop-down list on the right. The existing sensitive field types include the built-in sensitive field types and sensitive field types created by all users.
        • Project Scope: The compute engines and the projects that belong to the compute engines. If you do not configure this parameter, all compute engines and the projects that belong to the compute engines are selected.
          Note You can select only projects on which the current account has permissions.
        • Table Range: The range of tables. If you do not configure this parameter, all tables are selected.
          Note The wildcard (.*) that consists of a period (.) and an asterisk (*) can be used. For example, .*name indicates that tables whose names are suffixed with name are selected. private.* indicates that tables whose names are prefixed with private are selected. If you specify multiple tables, separate them with commas (,). The value cannot exceed 100 characters in total length.
        • Field Range: The range of fields. If you do not configure this parameter, all fields are selected.
          Note The wildcard (.*) that consists of a period (.) and an asterisk (*) can be used. For example, .*name indicates that fields whose names are suffixed with name are selected. private.* indicates that fields whose names are prefixed with private are selected. If you specify multiple fields, separate them with commas (,). The value cannot exceed 100 characters in total length.
    4. Click OK.
  5. After you create and configure the data masking rules, the sensitive data that you query on specific pages such as DataStudio, DataMap, and DataAnalysis is masked based on the rules. For more information, see Select a data masking scenario.
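
The To cover up, HASH, Range transform, and integer methods described in this section can be approximated in Python to preview how a rule behaves. The following sketch is not the DataWorks implementation. All function names are hypothetical. The position of the salt (prepended here), the handling of range boundaries (half-open here), and the rounding mode (half up here) are assumptions that this topic does not specify. SM3 is available through hashlib only if the underlying OpenSSL build provides it.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_UP


def cover_up(value, sections):
    """Mask sections from left to right.

    sections is a list of (length, mask) pairs. A length of None stands for
    "The remaining digits" and absorbs whatever the fixed sections leave over.
    """
    fixed = sum(n for n, _ in sections if n is not None)
    remaining = max(len(value) - fixed, 0)
    out, pos = [], 0
    for n, mask in sections:
        width = remaining if n is None else n
        chunk = value[pos:pos + width]
        out.append("*" * len(chunk) if mask else chunk)
        pos += width
    return "".join(out)


def salted_hash(value, salt, algorithm="sha256"):
    """Hash the value with a salt digit. Assumption: the salt is prepended."""
    return hashlib.new(algorithm, f"{salt}{value}".encode("utf-8")).hexdigest()


def range_transform(value, ranges):
    """Replace a number that falls in a range with that range's fixed value.

    ranges is a list of ((low, high), replacement). Assumption: low <= value < high.
    Values outside every range are returned unchanged.
    """
    for (low, high), replacement in ranges:
        if low <= value < high:
            return replacement
    return value


def keep_decimal_places(value, places):
    """Round a numeric string to the given number of decimal places (half up)."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
```

For example, cover_up("13800001234", [(3, True), (None, False)]) masks the first three characters, and keep_decimal_places("3.1415", 2) matches the 3.14 example for the integer method.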

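The Table Range and Field Range patterns of a whitelist can be tested locally before you save them. The following is a minimal sketch, assuming that .* is the only wildcard and that every other character is matched literally against the whole name. The function names compile_range and in_range are hypothetical.

```python
import re


def compile_range(spec):
    """Turn a comma-separated range such as '.*name,private.*' into regexes."""
    patterns = []
    for item in spec.split(","):
        item = item.strip()
        if item:
            # Escape everything except the documented .* wildcard.
            literal_parts = [re.escape(part) for part in item.split(".*")]
            patterns.append(re.compile(".*".join(literal_parts)))
    return patterns


def in_range(name, spec):
    """Return True if the name is selected by the range.

    An empty range selects everything, as described for Table Range and Field Range.
    """
    if not spec.strip():
        return True
    return any(p.fullmatch(name) for p in compile_range(spec))
```

For example, in_range("user_name", ".*name") and in_range("private_orders", "private.*") both return True, whereas in_range("orders", ".*name,private.*") returns False.
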
Create a data masking rule in the DataWorks Data Integration Config scenario

  1. On the Data Masking page, set the Masking Scene parameter to DataWorks Data Integration Config(dataworks_data_integration_desense_code).
  2. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Masking Rule dialog box, set the Sensitive data type, Name of the desensitization rule, Method, Domain, and Replacement character set parameters.
      1. Configure basic information.
        • Sensitive data type:
          • By default, There are is selected from the drop-down list on the left. You can select an existing sensitive field type from the drop-down list on the right. The existing sensitive field types include the built-in sensitive field types and sensitive field types created by all users. You can select an existing sensitive field type based on your business requirements.
          • You can also select The new type from the drop-down list on the left. In the field on the right, enter a name for a new sensitive field type. The name must be 1 to 30 characters in length and can contain letters and digits.

            After you enter the name of a new sensitive field type, the system checks whether the name is used by existing sensitive field types, including built-in sensitive field types and sensitive field types created by all users. If the name has been used, the message The sensitive field type is repeated is displayed.

          Note The built-in sensitive field types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
        • Name of desensitization rule: The system automatically enters the value of the Sensitive data type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule created by a user of the current tenant, the message The name of the rule already exists appears.

      2. You can set the Method parameter to Pseudonymisation, The hash, or Masking Out.
        • Pseudonymisation
          This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.
          • If you set the Sensitive data type parameter to a built-in sensitive field type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must configure the Domain parameter for your data records.

            Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Therefore, different data masking results are generated when a data record resides in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data masking result is b124. If the security domain is set to 1, the data masking result is c234. In a security domain, the same data masking result is returned for a data record at all times.

          • If you do not set the Sensitive data type parameter to a built-in sensitive field type, you must configure the Replacement character set parameter for your data records.

            Replacement character set: You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result also contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.

        • The hash

          This method hashes a data record to generate a hash value of a fixed length. If you select this method, you must configure the Domain parameter.

          Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Therefore, the same data record produces different hash values in different security domains. In a security domain, the same data masking result is returned for a data record at all times.

        • Masking Out
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended: You can select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended drop-down list.
          • Custom: You can specify, section by section from left to right, whether to mask a given number of characters of a data record. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. Description
            1 You can select digits or The remaining digits.
            2 You can enter an integer from 1 to 100.
            3 You can select Desensitization or No desensitization.
            Example 1 (figure): masking the first three characters and leaving the remaining characters intact.
            Example 2 (figure): masking the last three characters and leaving the remaining characters intact.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Desensitization verification. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
    3. Click OK.
    4. The rule that you create appears on the Data Masking tab. In the Status column, you can set the status of the rule to Active or Inactive.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, you must set the status to Inactive and check whether the rule is configured for a node. You must also contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive field type or Masking Rule parameter.
      • After you modify the parameters, set the status of the rule to Active. Then, the data of the node for which the rule is configured can be masked based on the rule.
  3. After you create a data masking rule, you can add the rule when you create and configure a real-time synchronization node for data in a single table. For more information, see Configure data de-identification.
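
The behavior described for Pseudonymisation with a Replacement character set and a Domain can be illustrated with a small Python sketch. This is not the algorithm that DataWorks uses, which this topic does not document. The function pseudonymize is hypothetical, and the keyed HMAC-based substitution is an assumption chosen only to reproduce the documented properties: the format is preserved, characters outside the set are left unchanged, and the result is stable within a security domain. Unlike true format-preserving encryption, this sketch is not reversible.

```python
import hashlib
import hmac


def pseudonymize(value, charset, domain):
    """Replace characters that are in the character set with another character
    of the same type (digit or letter) from that set.

    charset is a comma-separated string such as "0,1,2,3,a,b,c,d".
    domain is the security domain digit, used here as the HMAC key.
    """
    allowed = [c.strip() for c in charset.split(",") if c.strip()]
    digits = [c for c in allowed if c.isdigit()]
    letters = [c for c in allowed if c.isalpha()]
    key = str(domain).encode("utf-8")
    out = []
    for index, ch in enumerate(value):
        pool = digits if ch in digits else letters if ch in letters else None
        candidates = [c for c in pool if c != ch] if pool else []
        if not candidates:
            # Not in the character set (or nothing else to pick): keep as is.
            out.append(ch)
            continue
        digest = hmac.new(key, f"{index}:{ch}".encode("utf-8"), hashlib.sha256).digest()
        out.append(candidates[digest[0] % len(candidates)])
    return "".join(out)


masked = pseudonymize("a1-b2", "0,1,2,3,a,b,c,d", 0)
```

In this sketch, the hyphen in a1-b2 is left unchanged because it is not in the character set, and repeated calls with the same domain return the same result.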