This topic describes how to create a data masking rule in Data Security Guard so that DataWorks can dynamically and statically mask sensitive data in the results of ad hoc queries.

Prerequisites

Background information

DataWorks supports dynamic data masking and static data masking.
Type | Description | Data masking scenario
Dynamic data masking | DataWorks masks sensitive data in query results. | DataWorks provides several data masking scenarios such as Global Config, DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config. These are typical scenarios of dynamic data masking. For more information, see Create a data masking rule in the Global Config scenario.
Static data masking | DataWorks masks sensitive data before sensitive data is stored in a database. | DataWorks provides the DataWorks Data Integration Config scenario. This is a typical scenario of static data masking. For more information, see Create a data masking rule in the DataWorks Data Integration Config scenario.

Select a data masking scenario

  1. Go to the Data Security Guard page. For more information, see Overview.
  2. In the left-side navigation pane, choose Rule Change > Data Masking.
    On the Data Masking page, select a data masking scenario from the Masking Scene drop-down list based on your business requirements. DataWorks provides multiple scenarios. You can also create a custom scenario.
    • Global Config: The data masking rules and whitelists that are configured in the Global Config scenario will take effect in other scenarios, such as DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
    • DataWorks Studio Config:
      • After you configure data masking rules, the sensitive data that you query on the DataStudio page is masked.
      • After you configure data masking rules, the sensitive data that you preview on the DataMap page is masked.
    • DataWorks Analysis Config: After you configure data masking rules, the sensitive data that you query on the SQL Query and SQLNotes pages of DataAnalysis is masked.
    • Hologres Config: After you configure data masking rules, the sensitive data that you query from Hologres databases on the DataStudio and HoloStudio pages is masked. The data masking rules that are configured in the Hologres Config scenario take effect only in workspaces in the China (Hangzhou) and China (Beijing) regions. By default, the rules are not enabled in the Hologres Config scenario. To enable the rules, submit a ticket.
      Note Hologres does not support pseudonymization. If you configure a data masking rule that uses the pseudonymization method in the Global Config scenario, the sensitive data that you query from Hologres databases is masked with multiple asterisks (***).
    • MaxCompute Config: After you configure data masking rules, the sensitive data that you query from MaxCompute projects by using all methods is masked. The data masking rules that are configured in the MaxCompute Config scenario take effect only in workspaces in the China (Shanghai) region. For more information about how to enable the dynamic data masking feature in the MaxCompute Config scenario, see Dynamic data masking.
    • Custom data masking scenario: You can create a custom data masking scenario by performing the following steps: Click Masking Scene at the bottom of the Masking Scene drop-down list. In the New dialog box, configure the Scene Name and Scene Code parameters. The scenario name can contain only letters, digits, underscores (_), and hyphens (-). The scenario code can contain only digits and letters.
  3. Create a data masking rule.
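    After you select a data masking scenario, you can create a data masking rule in this scenario. The following list provides the links to the sections that describe how to create data masking rules in different scenarios:
    • Create a data masking rule in the Global Config scenario
    • Create a data masking rule in the DataWorks Data Integration Config scenario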

Create a data masking rule in the Global Config scenario

The following example shows how to create a data masking rule in the Global Config scenario. To create a rule in the Hologres Config, DataWorks Studio Config, DataWorks Analysis Config, or MaxCompute Config scenario, you can also follow the steps in this example.

  1. On the Data Masking page, set the Masking Scene parameter to Global Config(_default_scene_code).
  2. Optional. Select one or more MaxCompute projects or Hologres databases and authorize Data Security Guard to mask data for the MaxCompute projects or Hologres databases.
    Note This step is required only in the Hologres Config and MaxCompute Config scenarios.
    Click Select Desensitization Project or Select desensitization database. In the dialog box that appears, select one or more projects or databases, click the rightwards arrow to add them to the section on the right, and select the option button.
  3. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Create Masking Rule dialog box, configure the Sensitive field type and Desensitization way parameters.
      1. Configure basic information.
        Parameter | Description
        Sensitive field type | You can select an existing sensitive field type from the Sensitive field type drop-down list based on your business requirements. The system automatically filters out the sensitive field types that have been used in the current data masking scenario. For more information, see Identify sensitive data.
        Name of desensitization rule | The system automatically enters the value of the Sensitive field type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule created by a user of the current tenant, the message The name of the rule already exists appears.

      2. Set the Desensitization way parameter to Reserved format encryption, To cover up, HASH encryption, Characters to replace, Range transform, integer, or empty.
        • Reserved format encryption
          This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.
          • Data watermark: Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Desensitization characteristic value: By default, 5 is selected. You can select a digit from 0 to 9 as the characteristic value. Data masking rules vary depending on characteristic values. Therefore, different data masking results are generated when different characteristic values are used. For example, if the data record is a123 and the characteristic value is set to 0, the data masking result is b124. If the characteristic value is set to 1, the data masking result is c234. If the characteristic value remains unchanged, the same data masking result is returned for a data record at all times.
          • If you do not set the Sensitive field type parameter to a built-in sensitive field type, you must configure the Substitution character set parameter for your data records.

            Substitution character set: the character set that contains one or more characters to be replaced. You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result also contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.

        • To cover up
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended method: You can select Only show first and last character, Show first three and last two characters, and Show first three and last four characters from the Recommended method drop-down list.
          • Custom: You can divide a data record into segments from left to right and specify whether to mask each segment. You can add up to 10 segments, and The remaining digits must be specified for one of the segments.
            No. | Description
            1 | You can select digits or The remaining digits.
            2 | You can enter an integer from 1 to 100.
            3 | You can select Desensitization or No desensitization.
            Figure: Masking the first three characters and leaving the remaining characters intact.
        • HASH encryption
          • Data watermark: Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements.
            Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
          • Encryption Algorithm: Select MD5, SHA256, SHA512, or SM3.
          • Add salt value: Set a salt value for each encryption algorithm. By default, 5 is selected. You can select a digit from 0 to 9 as the salt value.
            Note In cryptography, you can insert a specific string into a fixed position of a password to generate a hash value that is different from that of the original password. This process is called salting. A salt value is the specific string that you insert.
        • Characters to replace: This method replaces the characters at the specified positions based on the replacement method you selected.
          • Replacement position: You can select Replace all, Replace the first three digits, or Four digits after replacement from the drop-down list. You can also customize the replacement position.
            If you select Custom, you can customize segments and configure the replacement method for each segment. You can add up to 10 segments, and The remaining digits must be specified for one of the segments.
            No. | Description
            1 | You can select digits or The remaining digits.
            2 | You can enter an integer from 1 to 100.
            3 | You can select Random replacement, Sample substitution, or Fixed value substitution.
          • Replacement Method: You can select Random replacement, Sample substitution, or Fixed value substitution.
            • Random replacement: This method randomly replaces the characters at the specified positions. The number of characters remains unchanged before and after the replacement.
            • Sample substitution: You must specify a sample library first. After you select the sample library, this method replaces the characters at the specified positions with the data in the specified sample library.
            • Fixed value substitution: You must enter a replacement value. The value must be 1 to 100 characters in length, and cannot be a string that contains only spaces. After you set the value, this method replaces the characters at the specified positions with the replacement value.
        • Range transform: This method applies only to numeric data. It replaces each value that falls within a specified value range with a fixed value. You can add 1 to 10 value ranges.
          • Original value range (m,n): the value range of the original data record. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
          • Value after desensitization: the value that is used to replace the data record that you want to mask. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
        • integer
          • Original data type: Only numeric data is supported.
          • Keep decimal places: You can select an integer from 0 to 5. The value is rounded to the specified number of decimal places. For example, if the original value is 3.1415 and two decimal places are kept, the data masking result is 3.14.
        • empty: This method replaces the original data record with an empty string.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Desensitization verification. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
    3. Click save.
    4. On the Data Masking tab, set the status of the created data masking rule to Active or Inactive based on your business requirements.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, you must set the status to Inactive and check whether the rule is configured for a node. You must also contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive field type and Masking Rule parameters.
      • After you modify the parameters, set the status of the rule to Active. Then, the data of the node for which the rule is configured can be masked based on the rule.
  4. Configure a whitelist.
    1. Click the Whitelist tab.
    2. On the Whitelist tab, click Add Account in the upper-right corner.
    3. In the Add Account dialog box, configure the parameters.
      Note
      • You do not need to configure a whitelist in the Hologres Config scenario.
      • If a user queries data within the time range that is specified by the Effective From parameter in the whitelist, the query results are not masked.
      • You cannot set the values of all parameters for the whitelist to All.
      1. Configure basic information.
        Parameter | Description
        Whitelist Name | The name of the whitelist. The name must be 1 to 30 characters in length and cannot contain special characters.
        Sensitivity Level | The sensitivity level of sensitive data. You can select a built-in sensitivity level or a custom sensitivity level from custom sensitivity levels created by all users. For more information about how to configure a data category and a sensitivity level for sensitive data, see Manage data sensitivity levels.
        Data Category | The data category of sensitive data. You can select a built-in data category or a custom data category from custom data categories created by all users.
        User Group | The user group. You can select a user group that you added on the User Group Management page. You can select up to 50 user groups. After you add the selected user groups to the whitelist, you can use the Alibaba Cloud accounts or RAM users that belong to the selected user groups to view the original data that is not masked. For more information about how to add and manage a user group, see Create and manage user groups.
        Effective From | The effective time range of the whitelist. If a user queries data beyond the time range that is specified in the whitelist, the query results are masked.
        Note If you set this parameter to Short, the effective time range is from the current time to the specified time. If a user queries data within this time range, the query results are not masked.
      2. Configure advanced settings.
        Parameter | Description
        Sensitive field type | The sensitive field type. You can select an existing sensitive field type from the drop-down list on the right. The existing sensitive field types include the built-in sensitive field types and sensitive field types created by all users.
        Project Scope | The compute engines and the projects that belong to the compute engines. If you do not configure this parameter, all compute engines and the projects that belong to the compute engines are selected.
        Note You can select only projects on which the current account has permissions.
        Table Range | The range of tables. If you do not configure this parameter, all tables are selected.
        Note The wildcard (.*) that consists of a period (.) and an asterisk (*) can be used. For example, .*name indicates that tables whose names are suffixed with name are selected. private.* indicates that tables whose names are prefixed with private are selected. If you specify multiple tables, separate them with commas (,). The total length of the tables cannot exceed 100 characters.
        Field Range | The range of fields. If you do not configure this parameter, all fields are selected.
        Note The wildcard (.*) that consists of a period (.) and an asterisk (*) can be used. For example, .*name indicates that fields whose names are suffixed with name are selected. private.* indicates that fields whose names are prefixed with private are selected. If you specify multiple fields, separate them with commas (,). The total length of the fields cannot exceed 100 characters.
    4. Click OK.
  5. After you create and configure the data masking rules, the sensitive data that you query on specific pages such as DataStudio, DataMap, and DataAnalysis is masked based on the rules. For more information, see Select a data masking scenario.
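
The Table Range and Field Range notes in the whitelist settings describe a small matching rule: a comma-separated list of names in which .* acts as a wildcard, limited to 100 characters. The following Python sketch illustrates one possible reading of that rule. It is not the Data Security Guard implementation. The function names, the case-sensitive full-name matching, and the decision to treat every character other than .* literally are assumptions made for illustration.

```python
import re

MAX_RANGE_LENGTH = 100  # length limit stated for Table Range and Field Range


def to_regex(pattern):
    """Treat ".*" as the only wildcard and escape every other character (assumption)."""
    literal_parts = pattern.split(".*")
    return re.compile(".*".join(re.escape(part) for part in literal_parts))


def in_range(name, range_text):
    """Return True if name is selected by a Table Range or Field Range value."""
    if len(range_text) > MAX_RANGE_LENGTH:
        raise ValueError("the range cannot exceed 100 characters")
    patterns = [p.strip() for p in range_text.split(",") if p.strip()]
    if not patterns:
        return True  # an unconfigured range selects all tables or fields
    return any(to_regex(p).fullmatch(name) for p in patterns)


print(in_range("user_name", ".*name"))                  # True: suffixed with name
print(in_range("private_orders", ".*name,private.*"))   # True: prefixed with private
print(in_range("orders", ".*name,private.*"))           # False: matches neither pattern
```

The table names user_name, private_orders, and orders are hypothetical. A sketch like this can help you check a range value before you enter it in the Add Account dialog box.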

Create a data masking rule in the DataWorks Data Integration Config scenario

  1. On the Data Masking page, set the Masking Scene parameter to DataWorks Data Integration Config(dataworks_data_integration_desense_code).
  2. Create a data masking rule.
    1. On the Data Masking tab, click Create Masking Rule in the upper-right corner.
    2. In the Masking Rule dialog box, set the Sensitive data type, Name of the desensitization rule, Method, Domain, and Replacement character set parameters.
      1. Configure basic information.
        Parameter | Description
        Sensitive data type
        • By default, There are is selected from the drop-down list on the left. You can select an existing sensitive field type from the drop-down list on the right. The existing sensitive field types include the built-in sensitive field types and sensitive field types created by all users. You can select an existing sensitive field type based on your business requirements.
        • You can also select The new type from the drop-down list on the left. In the field on the right, enter a name for a new sensitive field type. The name must be 1 to 30 characters in length and can contain letters and digits.

          After you enter the name of a new sensitive field type, the system checks whether the name is used by existing sensitive field types, including built-in sensitive field types and sensitive field types created by all users. If the name has been used, the message The sensitive field type is repeated is displayed.

        Note The built-in sensitive field types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
        Name of desensitization rule

        The system automatically enters the value of the Sensitive data type parameter in the field. You can also change the value. The name of the rule must be 1 to 30 characters in length and can contain letters and digits. If the name of the rule conflicts with that of an existing rule created by a user of the current tenant, the message The name of the rule already exists appears.

      2. You can set the Method parameter to Pseudonymisation, The hash, or Masking Out.
        • Pseudonymisation
          This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.
          • If you set the Sensitive data type parameter to a built-in sensitive field type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must configure the Domain parameter for your data records.

            Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Therefore, different data masking results are generated when a data record resides in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data masking result is b124. If the security domain is set to 1, the data masking result is c234. In a security domain, the same data masking result is returned for a data record at all times.

          • If you do not set the Sensitive data type parameter to a built-in sensitive field type, you must configure the Replacement character set parameter for your data records.

            Replacement character set: You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result also contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.

        • The hash

          This method encrypts a data record to generate a hash value of a fixed length. If you select this method, you must configure the Domain parameter.

          Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Therefore, the same data record produces different hash values in different security domains. In a security domain, the same data masking result is returned for a data record at all times.

        • Masking Out
          This method replaces each of the characters at specific positions of a data record with an asterisk (*).
          • Recommended: You can select Only show first and last character, Show first three and last two characters, and Show first three and last four characters from the Recommended drop-down list.
          • Custom: You can divide a data record into sections from left to right and specify whether to mask each section. You can add up to 10 sections, and one of the sections must be The remaining digits.
            No. | Description
            1 | You can select digits or The remaining digits.
            2 | You can enter an integer from 1 to 100.
            3 | You can select Desensitization or No desensitization.
            Figure: Masking the first three characters and leaving the remaining characters intact.
            Figure: Masking the last three characters and leaving the remaining characters intact.
      3. Verify the configuration of the data masking rule. You can enter sample data in the Sample data field and click Desensitization verification. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
    3. Click OK.
    4. The rule that you create appears on the Data Masking tab. In the Status column, you can set the status of the rule to Active or Inactive.
      In the Actions column, you can click the Delete, Change, or View Details icon to delete the rule, edit the rule, or view the details of the rule.
      Note
      • You cannot delete or edit a rule in the Active state. To delete or edit a rule, you must set the status to Inactive and check whether the rule is configured for a node. You must also contact the security administrator for further confirmation.
      • When the status is set to Inactive, you can modify the Method parameter of the rule, but you cannot modify the Sensitive field type and Masking Rule parameters.
      • After you modify the parameters, set the status of the rule to Active. Then, the data of the node for which the rule is configured can be masked based on the rule.
  3. After you create a data masking rule, you can add the rule when you create and configure a real-time synchronization node for data in a single table. For more information, see Configure data de-identification.
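
The custom Masking Out configuration described in step 2 splits a data record into sections from left to right and masks the selected sections with asterisks. The following Python sketch illustrates that behavior for the two cases mentioned above: masking the first three characters and masking the last three characters. It is only an illustration under stated assumptions. The Segment class, the mask_out function, the sample value, and the requirement that exactly one section is The remaining digits are hypothetical and are not part of Data Security Guard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    length: Optional[int]  # None stands for "The remaining digits"
    mask: bool             # True = Desensitization, False = No desensitization


def mask_out(value: str, segments: List[Segment]) -> str:
    """Apply the segments from left to right and mask the selected ones with "*"."""
    if not 1 <= len(segments) <= 10:
        raise ValueError("specify 1 to 10 segments")
    # Assumption: exactly one segment is "The remaining digits".
    if sum(s.length is None for s in segments) != 1:
        raise ValueError("exactly one segment must be The remaining digits")
    if any(s.length is not None and not 1 <= s.length <= 100 for s in segments):
        raise ValueError("a segment length must be an integer from 1 to 100")

    fixed = sum(s.length for s in segments if s.length is not None)
    remaining = max(len(value) - fixed, 0)

    parts = []
    position = 0
    for s in segments:
        size = remaining if s.length is None else s.length
        chunk = value[position:position + size]
        parts.append("*" * len(chunk) if s.mask else chunk)
        position += size
    return "".join(parts)


sample = "13812345678"  # hypothetical sample data
print(mask_out(sample, [Segment(3, True), Segment(None, False)]))   # ***12345678
print(mask_out(sample, [Segment(None, False), Segment(3, True)]))   # 13812345***
```

The same function can express the recommended option Show first three and last four characters as three segments: an unmasked segment of 3, a masked remainder, and an unmasked segment of 4.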

Appendix: Use the masking_v2 method to perform underlying data masking on MaxCompute projects

The dsg_fin_demo project is the project on which you want to perform underlying data masking.

  1. Submit a request for adding the IP addresses or endpoints of Data Security Guard and Object Storage Service (OSS) to the whitelist of the dsg_fin_demo project.
    Fill out a request form by using an Alibaba Cloud account.

    If external access to the IP address or endpoint of the project is not restricted, Data Security Guard and OSS can access the dsg_fin_demo project after the request is approved. The request processing period does not exceed three business days.

    Request content:
    Project name (the name of the project whose data you want to mask): dsg_fin_demo
    Log address:
    Request reason: Add the IP addresses or endpoints of Data Security Guard and OSS to the whitelist of the dsg_fin_demo project to enable the created function to access the IP address or endpoint of the project when the function is run.
    Region: China (Shanghai)
    IP addresses or endpoints that want to access the project: dsg-cn-shanghai.data.aliyun.com, dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com
    Ports: 80 and 443
    The IP addresses or endpoints of Data Security Guard and OSS vary by region. If your region is not included in the following list, submit a ticket to Data Security Guard. Ports 80 and 443 are used.
    China (Shanghai): dsg-cn-shanghai.data.aliyun.com, dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com
    China (Hangzhou): dsg-cn-hangzhou.data.aliyun.com, dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com
    China (Beijing): dsg-cn-beijing.data.aliyun.com, dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com
    China (Chengdu): dsg-cn-chengdu.data.aliyun.com, dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com
    China (Shenzhen): dsg-cn-shenzhen.data.aliyun.com, dsg-oss-dic-ori-sz.oss-cn-shenzhen.aliyuncs.com
    China North 2 Ali Gov: dsg-cn-north-2-gov-1.data.aliyun.com, dsg-oss-dic-ori-north-2-gov-1.oss-cn-north-2-gov-1-internal.aliyuncs.com
    China East 2 Finance: dsg-cn-shanghai-finance-1.data.aliyun.com, dsg-oss-dic-ori-sh-fin-1.oss-cn-shanghai.aliyuncs.com
    China (Hong Kong): dsg-cn-hongkong.data.aliyun.com, dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com
    Singapore: dsg-ap-southeast-1.data.aliyun.com, dsg-oss-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com
    US (Silicon Valley): dsg-us-west-1.data.aliyun.com, dsg-oss-us-west-1.oss-us-west-1.aliyuncs.com
    Malaysia (Kuala Lumpur): dsg-ap-southeast-3.data.aliyun.com, dsg-oss-ap-malaysia.oss-ap-southeast-3.aliyuncs.com
    Germany (Frankfurt): dsg-eu-central-1.data.aliyun.com, dsg-oss-eu-central-1.oss-eu-central-1-internal.aliyuncs.com
  2. Submit a ticket to Data Security Guard to request underlying data masking for the desired MaxCompute project.
  3. Go to the Data Masking tab in Data Security Guard to select the MaxCompute projects whose data you want to mask.
    Access the Data Masking page of a workspace in which you want to run the masking_v2 function within a tenant, select the MaxCompute Config data masking scenario, and then select the MaxCompute projects whose data you want to mask.
    Note If the MaxCompute project whose data you want to mask is not added to the Masked Projects section, an error occurs when the masking_v2 function is run.
  4. Execute SQL statements to check whether data masking is successful.
    1. Turn off Mask Data in Page Query Results on the Security Settings and Others tab and execute SQL statements.
      Query statement: (The China (Shanghai) region is used in the example.)
      set odps.output.field.formatter={"name":"aegis:masking_v2","param":["alias","index"]};
      set odps.isolation.session.enable=true;
      set odps.internet.access.list=dsg-cn-shanghai.data.aliyun.com:80,dsg-cn-shanghai.data.aliyun.com:443,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:80,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:443;
      select * from table;
    2. View the execution result on the DataStudio page.
    3. View the execution result in odpscmd.
      Content in the configuration file for odpscmd:
      project_name=dsg_demo_bj_new
      access_id=xxxx
      access_key=yyy
      end_point=http://service.odps.aliyun.com/api
      # this endpoint is for office environment
      #end_point=http://service-corp.odps.aliyun-inc.com/api
      # this endpoint is for production environment
      #end_point=http://service.odps.aliyun-inc.com/api
      # this url is for odpscmd update
      update_url=http://odps.alibaba-inc.com/official_downloads
      # download sql results by instance tunnel
      use_instance_tunnel=true
      # the max records when download sql results by instance tunnel
      instance_tunnel_max_record=10000
      
      # use set.<key>=<value> to set flags when console launched
      # e.g. set.odps.sql.select.output.format=csv
      
      Note: Adding the IP addresses or endpoints of Data Security Guard and OSS to the odpscmd configuration is a separate step from the whitelist request in step 1. For the settings to take effect in odpscmd, you must also specify the following set.odps flag:
      
      set.odps.internet.access.list=dsg-cn-shanghai.data.aliyun.com:80,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:80;
               
  5. Disable underlying data masking on MaxCompute projects.
    Execute the following statement to disable underlying data masking that is performed by using the masking_v2 method:
    set odps.output.field.formatter=;
    select * from table;
    Move the projects whose data you do not want to mask to the Unmasked Projects section. This way, underlying data masking will not be performed on the projects.
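
The session settings in step 4 differ between regions only in the endpoints listed in odps.internet.access.list. The following Python sketch assembles the three set statements for a region from the endpoint list in step 1, using ports 80 and 443. Only the China (Shanghai) statements appear in this topic. Reusing the same statements for other regions, the DSG_ENDPOINTS dictionary, and the masking_session_statements function are assumptions made for illustration.

```python
# Endpoints copied from the region list in step 1 (subset shown).
DSG_ENDPOINTS = {
    "China (Shanghai)": (
        "dsg-cn-shanghai.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com",
    ),
    "China (Hangzhou)": (
        "dsg-cn-hangzhou.data.aliyun.com",
        "dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com",
    ),
    "China (Beijing)": (
        "dsg-cn-beijing.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com",
    ),
}
PORTS = (80, 443)  # ports stated in step 1


def masking_session_statements(region):
    """Build the set statements that precede a query masked by masking_v2."""
    hosts = DSG_ENDPOINTS[region]
    access_list = ",".join(f"{host}:{port}" for host in hosts for port in PORTS)
    return [
        'set odps.output.field.formatter={"name":"aegis:masking_v2","param":["alias","index"]};',
        "set odps.isolation.session.enable=true;",
        f"set odps.internet.access.list={access_list};",
    ]


# Print the statements for a region, then append your own query.
for statement in masking_session_statements("China (Hangzhou)"):
    print(statement)
```

Generating the access list from one table of endpoints avoids duplicated or missing host and port pairs when you copy the statements between regions.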