DataWorks: Create a data masking rule

Last Updated: Jan 05, 2024

DataWorks supports a variety of data masking scenarios. You can select a scenario and create a data masking rule based on your business requirements. This topic describes how to create a data masking rule and how DataWorks masks query results in your workspace based on data masking rules.

Background information

Data masking scenarios in DataWorks can be categorized into static data masking scenarios and dynamic data masking scenarios.

  • Dynamic data masking scenarios include masking of displayed data in DataStudio and Data Map, masking of displayed data in DataAnalysis, data masking at the MaxCompute compute engine layer, and data masking at the Hologres compute engine layer.

  • Static data masking scenarios refer to static data masking in Data Integration.

For more information about various data masking scenarios, see Descriptions of data masking scenarios.

Prerequisites

  • This prerequisite is required in dynamic data masking scenarios and optional in other cases. Sensitive data identification rules must be configured based on your business requirements. This allows you to associate sensitive field types that you specified in the sensitive data identification rules with data masking rules when you create the data masking rules. For more information, see Identify sensitive data.

  • This prerequisite is required in dynamic data masking scenarios and optional in other cases. The users that you want to add to a whitelist must be organized into user groups in advance. This is necessary if you want specific users to be able to view unmasked sensitive data covered by data masking rules within a specified period of time. For more information, see Create and manage user groups.

  • This prerequisite is required for data masking at the MaxCompute compute engine layer and optional in other cases. The IP address or endpoint of Data Security Guard must be added to the whitelist of a MaxCompute project on which you want to perform data masking at the compute engine layer. After you add the IP address or endpoint of Data Security Guard to the whitelist of the MaxCompute project, you can call data masking functions to mask sensitive data in query results that you obtain by using methods such as the related DataWorks service, MaxCompute client (odpscmd), and MaxCompute LogView based on data masking rules. For more information, see Sample practice for performing underlying data masking on MaxCompute projects.

Permission management

  • Manage a data masking rule, such as creating, modifying, and deleting a data masking rule:

    • The tenant administrator and tenant security administrator can perform management operations on a data masking rule in all data masking scenarios.

    • The workspace administrator and workspace security administrator can perform management operations on a data masking rule only in data masking scenarios on which they have the required permissions.

  • Manage a whitelist for a data masking rule, such as creating, modifying, and deleting a whitelist:

    • The tenant administrator and tenant security administrator can perform management operations on a whitelist in all data masking scenarios.

    • The workspace administrator and workspace security administrator can perform management operations on a whitelist only in data masking scenarios on which they have the required permissions.

You must be assigned the required role to perform the preceding operations. For more information about authorization, see Manage permissions on workspace-level services and Manage permissions on global-level services.

Entry for configuring a data masking rule

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. Click the icon in the upper-left corner, choose All Products > Data Governance > Data Security Guard, and then click Try now to go to the Data Security Guard page.

    Note
    • If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.

    • If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.

  3. In the left-side navigation pane, choose Rule Change > Data Masking. The Data Masking page appears.

  4. In the Masking Scene section of the Data desensitization management page, select a data masking scenario and click + Desensitization rules in the upper-right corner of the page to create a data masking rule based on the data masking scenario.

Create a dynamic data masking rule in the scenario of masking of displayed data in DataStudio and Data Map

  1. Select a data masking scenario.

    In the Masking Scene section of the Data desensitization management page, click Default scene below Data development / Data map display desensitization and click + Desensitization rules in the upper-right corner of the page.

  2. Create a data masking rule.

    1. In the Create new desensitization rule dialog box, configure the parameters.

      1. Select a sensitive field type and specify the rule name.

        Parameter

        Description

        Sensitive field type

        The sensitive field type based on which the data masking rule masks sensitive data.

        • You can select system built-in sensitive field types or the sensitive field types that you added on the Data Recognition Rules tab of the Sensitive data identification page. For more information about sensitive field types that you can manually add, see Configure sensitive data identification rules.

        • If you have created data masking rules in the same data masking scenario, DataWorks filters out the sensitive field types that you selected for the data masking rules. This prevents different data masking rules from taking effect for the same sensitive field types in the same data masking scenario.

        Desensitization rule name

        The name of the data masking rule. The name of the sensitive field type is used as the name of the data masking rule by default. You can also specify a name based on your business requirements. The rule name must be unique.

      2. Configure data masking scenarios.

        Select data masking scenarios to which the data masking rule applies. The data masking scenario that you select in Step 1 is used as a valid value of the Desensitization scene parameter by default. You can also change the used value or add more valid values.

      3. Configure the data masking method.

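        DataWorks supports the following data masking methods: original format-based encryption (Pseudonym), masking out (Masking out), hash-based encryption (HASH), character replacement (Characters to replace), range change (Range transform), rounding (integer), and leaving the value empty (empty).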

        Pseudonym

        This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record. The following table describes the parameters that you need to configure if you select this data masking method.

        Parameter

        Description

        Data watermark

        Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements.

        Note

        Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.

        Desensitization characteristic value

        Data masking rules vary based on characteristic values. Different data masking results are generated when different characteristic values are used for the same data that you want to mask. If the characteristic value remains unchanged, the same data masking result is returned for a data record at all times.

        For example, a data record is a123:

        • If the Desensitization characteristic value parameter is set to 0, the data masking result is b124.

        • If the Desensitization characteristic value parameter is set to 1, the data masking result is c234.

        By default, the Desensitization characteristic value parameter is set to 5. You can select a digit from 0 to 9 as the characteristic value.

        Optional. Substitution character set

        If you do not set the Sensitive field type parameter to a built-in sensitive field type, you must configure the Substitution character set parameter for your data records. If a character in your data records is included in the character set, the character is replaced with another character of the same type.

        For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result contains only digits from 0 to 3 and letters from a to d.

        Note

        If the character that you want to mask is not included in a character set, it is not replaced.
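The behavior described for the Pseudonym method (deterministic output for a fixed characteristic value, same-type replacement within the character set, characters outside the set left as is) can be illustrated with a minimal Python sketch. The function names and the SHA-256-based offset are assumptions made for illustration. This is not the algorithm that DataWorks uses, and it does not reproduce the a123 example results.

```python
import hashlib


def _same_type_pool(ch: str, charset: str) -> list:
    """Return the characters in `charset` that share the type of `ch` (digit or letter)."""
    if ch.isdigit():
        return sorted({c for c in charset if c.isdigit()})
    return sorted({c for c in charset if c.isalpha()})


def pseudonymize(value: str, characteristic: int, charset: str) -> str:
    """Hypothetical format-preserving substitution, for illustration only."""
    if not 0 <= characteristic <= 9:
        raise ValueError("the characteristic value must be a digit from 0 to 9")
    out = []
    for position, ch in enumerate(value):
        # Characters that are not in the character set are not replaced.
        if ch not in charset or not (ch.isdigit() or ch.isalpha()):
            out.append(ch)
            continue
        pool = _same_type_pool(ch, charset)
        # Assumption: derive a repeatable offset from the characteristic value,
        # the position, and the record, so the same input always maps to the same output.
        seed = f"{characteristic}:{position}:{value}".encode("utf-8")
        offset = hashlib.sha256(seed).digest()[0]
        out.append(pool[(pool.index(ch) + offset) % len(pool)])
    return "".join(out)
```

Calling `pseudonymize("a123", 0, "0123abcd")` twice returns the same string, and the result contains only digits from 0 to 3 and letters from a to d.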

        Masking out

        This method replaces each of the characters at specific positions of a data record with an asterisk (*). If you use this data masking method, you must configure the Recommended method or Custom parameter.

        Parameter

        Description

        Recommended method

        The recommended methods. You can configure the parameter based on the field that you want to mask.

        Valid values: Only show first and last character, Show first three and last two characters, and Show first three and last four characters. You can select a method from the drop-down list based on your business requirements.

        Customize

        You can define segments from left to right and specify, for each segment, the number of characters and whether to mask them. You can add up to 10 segments, and one of the segments must be set to The remaining digits.

        For example, you can mask the first three characters and leave the remaining characters intact.
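Segment-based masking can be sketched in Python as follows. The `mask_out` function and the `(length, masked)` tuple layout are hypothetical, and the sketch assumes that exactly one segment stands for The remaining digits (represented here by a length of `None`).

```python
from typing import List, Optional, Tuple

# (length, masked) pairs read from left to right.
# A length of None stands for "The remaining digits".
Segment = Tuple[Optional[int], bool]


def mask_out(value: str, segments: List[Segment]) -> str:
    """Replace the characters of masked segments with asterisks (illustrative sketch)."""
    if not 1 <= len(segments) <= 10:
        raise ValueError("between 1 and 10 segments are expected")
    if sum(1 for length, _ in segments if length is None) != 1:
        raise ValueError("exactly one segment must cover the remaining digits")
    fixed = sum(length for length, _ in segments if length is not None)
    remaining = max(len(value) - fixed, 0)
    out, pos = [], 0
    for length, masked in segments:
        size = remaining if length is None else length
        chunk = value[pos:pos + size]
        out.append("*" * len(chunk) if masked else chunk)
        pos += size
    return "".join(out)


# Mask the first three characters and leave the remaining characters intact.
print(mask_out("abcdef", [(3, True), (None, False)]))  # ***def
# Equivalent of "Show first three and last four characters".
print(mask_out("13812345678", [(3, False), (None, True), (4, False)]))  # 138****5678
```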

        HASH

        The following table describes the parameters that you need to configure if you select this data masking method.

        Parameter

        Description

        Data watermark

        Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements.

        Note

        Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.

        Encryption Algorithm

        The encryption algorithm. Valid values: MD5, SHA256, SHA512, and SM3.

        Add salt value

        The salt value for each encryption algorithm. By default, 5 is selected. You can select a digit from 0 to 9 as the salt value.

        Note

        A salt value is a specific string that is inserted into the data before hashing. In cryptography, inserting a specific string at a fixed position of a password produces a hash value that differs from that of the original password. This process is called salting.
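The effect of the salt value on the hash result can be seen in a short Python sketch built on the standard `hashlib` module. The `hash_mask` function is hypothetical, and prepending the salt digit to the record is an assumption, because the position at which DataWorks inserts the salt is not stated here. SM3 is left out because its availability in `hashlib` depends on the underlying OpenSSL build.

```python
import hashlib

# Mapping from the algorithm names in the console to hashlib names.
SUPPORTED = {"MD5": "md5", "SHA256": "sha256", "SHA512": "sha512"}


def hash_mask(value: str, algorithm: str = "SHA256", salt: int = 5) -> str:
    """Return the hex digest of the salted record (illustrative sketch)."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not 0 <= salt <= 9:
        raise ValueError("the salt value must be a digit from 0 to 9")
    # Assumption: the salt digit is placed in front of the record.
    salted = f"{salt}{value}".encode("utf-8")
    return hashlib.new(SUPPORTED[algorithm], salted).hexdigest()


# The same record produces different digests under different salt values.
print(hash_mask("a123", salt=0))
print(hash_mask("a123", salt=1))
```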

        Characters to replace

        This method replaces the characters at the specified positions based on the replacement method you select. The following table describes the parameters that you need to configure if you select this data masking method.

        Parameter

        Description

        Replacement position

        Valid values: Replace all, Replace the first three digits, Four digits after replacement, and Custom. Select Custom to configure a custom replacement position.

        If you select Custom, you can define segments and configure the replacement method for each segment. You can add up to 10 segments, and one of the segments must be set to The remaining digits.

        Replace the way

        The replacement method. Valid values: Random replacement, Sample substitution, and Fixed value substitution.

        • Random replacement: This method randomly replaces the characters at the specific positions. The number of characters remains unchanged before and after the replacement.

        • Sample substitution: You must specify a sample library first. After you select the sample library, this method replaces the characters at the specific positions with the data in the specified sample library.

        • Fixed value substitution: You must enter a replacement value. The value must be 1 to 100 characters in length, and cannot be a string that contains only spaces. After you set the value, this method replaces the characters at the specific positions with the replacement value.
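Random replacement and Fixed value substitution can be sketched in Python as follows. Both functions are hypothetical. The sketch assumes that random replacement keeps the character type (digit or letter) and that the fixed value replaces the whole selected span once. Sample substitution is omitted because it depends on a sample library.

```python
import random
import string
from typing import Optional


def replace_random(value: str, start: int, end: int,
                   rng: Optional[random.Random] = None) -> str:
    """Randomly replace value[start:end]; the number of characters is unchanged."""
    rng = rng or random.Random()
    replaced = "".join(
        rng.choice(string.digits) if ch.isdigit()
        else rng.choice(string.ascii_letters) if ch.isalpha()
        else ch
        for ch in value[start:end]
    )
    return value[:start] + replaced + value[end:]


def replace_fixed(value: str, start: int, end: int, replacement: str) -> str:
    """Replace value[start:end] with a fixed replacement value."""
    # The replacement must be 1 to 100 characters and must not consist only of spaces.
    if not 1 <= len(replacement) <= 100 or not replacement.strip(" "):
        raise ValueError("invalid replacement value")
    return value[:start] + replacement + value[end:]


print(replace_random("13812345678", 0, 3))         # first three digits randomized
print(replace_fixed("13812345678", 7, 11, "XXXX"))  # 1381234XXXX
```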

        Range transform

        This method is applicable to only the masking of numeric data. This method masks data within a specified value range to a fixed value. You can add 1 to 10 value ranges.

        Parameter

        Description

        Original value range [m,n)

        The value range of the original data record. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.

        Value after desensitization

        The value that is used to replace the data record that you want to mask. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
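The half-open range [m,n) means that m is included and n is excluded. A minimal Python sketch of the lookup is shown below. The `range_transform` function is hypothetical, and returning values that fall outside every range unchanged is an assumption.

```python
from decimal import Decimal
from typing import List, Tuple

# (m, n, masked value) for the half-open range [m, n)
Range = Tuple[Decimal, Decimal, Decimal]


def range_transform(value: str, ranges: List[Range]) -> Decimal:
    """Map a numeric value to the fixed value of the range that contains it."""
    if not 1 <= len(ranges) <= 10:
        raise ValueError("between 1 and 10 value ranges are expected")
    number = Decimal(value)
    for low, high, masked in ranges:
        if low <= number < high:  # m is included, n is excluded
            return masked
    # Assumption: a value outside every range is returned unchanged.
    return number


ranges = [
    (Decimal("0"), Decimal("100"), Decimal("50")),
    (Decimal("100"), Decimal("1000"), Decimal("500")),
]
print(range_transform("99.99", ranges))  # 50
print(range_transform("100", ranges))    # 500
```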

        integer

        Parameter

        Description

        Original data type

        Only numeric data is supported.

        Keep decimal places

        You can select an integer from 0 to 5 as the valid value. The remaining parts are rounded. For example, if the original value is 3.1415 and the value is rounded down to two decimal places, the data masking result is 3.14.
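The Keep decimal places setting can be illustrated with Python's `decimal` module. The `keep_decimal_places` function is hypothetical, and the rounding mode (`ROUND_HALF_UP`) is an assumption, because the exact rounding mode is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP


def keep_decimal_places(value: str, places: int) -> Decimal:
    """Round a numeric value to the given number of decimal places."""
    if not 0 <= places <= 5:
        raise ValueError("the number of decimal places must be from 0 to 5")
    quantum = Decimal(10) ** -places  # for example, places=2 gives Decimal("0.01")
    # Assumption: half-up rounding is used.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(keep_decimal_places("3.1415", 2))  # 3.14
```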

        empty

        This method replaces the original data record with an empty string.

    2. Verify the data masking result.

      You can enter sample data in the Sample data field and click Verify. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.

    3. Click Save or Save and take effect. The rule configuration is complete.

You can perform the following operations after you configure the data masking rule:

  • If you want to perform dynamic data masking, you can specify a whitelist for the rule. If you add specific users to the whitelist, the users can have access to the sensitive data on which the data masking rule takes effect within a subsequent specified period of time. For information about how to add users to a whitelist, see the Configure a whitelist for the data masking rule (supported only for dynamic data masking scenarios) section in this topic.

  • By default, the data masking rule is not in effect after you create it. After you set its status to Effective, the rule can be used in data masking scenarios. For information about how to set the rule status, see Configure the rule status.

Create a static data masking rule in the scenario of static data masking in Data Integration

  1. In the Masking Scene section of the Data desensitization management page, click Default scene below Static desensitization of data integration and click + Desensitization rules in the upper-right corner of the page.

  2. Create a data masking rule.

    1. In the Masking Rule dialog box, configure the parameters.

      1. Select a sensitive field type and specify the rule name.

        Parameter

        Description

        Sensitive data type

        • There are: Select an existing sensitive field type from the drop-down list on the right based on your business requirements. The existing sensitive field types include the built-in sensitive field types and custom sensitive field types.

        • The new type: Enter a sensitive field type name. The name must be unique.

        Note

        The built-in sensitive field types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.

        Name of the desensitization rule

        The name of the data masking rule. The name of the sensitive field type is used as the name of the data masking rule by default. You can also specify a name based on your business requirements. The rule name must be unique.

      2. Configure the data masking method.

        You can set the Method parameter to Pseudonym, The hash, or Masking Out based on your business requirements.

        Pseudonym

        This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.

        • If you set the Sensitive data type parameter to a built-in sensitive field type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must configure the Domain parameter for your data records.

          Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Different data masking results are generated when the same data record that you want to mask resides in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data masking result is b124. If the security domain is set to 1, the data masking result is c234. In a security domain, the same data masking result is returned for a data record at all times.

        • If you do not set the Sensitive data type parameter to a built-in sensitive field type, you must configure the Replacement character set parameter for your data records.

          Replacement character set: You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.

        The hash

        This method encrypts a data record to generate a hash value of a fixed length. If you select this method, you must configure the Domain parameter.

        Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary based on security domains. Different data masking results are generated when the same data record that you want to mask resides in different security domains. In a security domain, the same data masking result is returned for a data record at all times.

        For example, a data record is a123:

        • If the security domain is set to 0, the data masking result is b124.

        • If the security domain is set to 1, the data masking result is c234.

        Masking Out

        This method replaces each of the characters at specific positions of a data record with an asterisk (*).

        • Recommended: You can select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommended drop-down list.

        • Custom: You can define segments from left to right and specify, for each segment, the number of characters and whether to mask them. You can add up to 10 segments, and one of the segments must be set to The remaining digits.

          • Example 1: Mask the first three characters and leave the remaining characters intact.

          • Example 2: Mask the last three characters and leave the remaining characters intact.

    2. Verify the data masking result.

      You can enter sample data in the Sample data field and click Test. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Effect of desensitization field.

    3. Click OK. The rule configuration is complete.

You can perform the following operations after you configure the data masking rule:

  • By default, the data masking rule is not in effect after you create it. After you set its status to Effective, the rule can be used in data masking scenarios. For information about how to set the rule status, see Configure the rule status.

  • After you create a data masking rule for the DataWorks Data Integration Config scenario, you can use the rule when you create a task to synchronize data from a single table in real time. For more information, see Configure data masking.

Configure a whitelist for the data masking rule (supported only for dynamic data masking scenarios)

For a rule that you configure in dynamic data masking scenarios, you can specify a whitelist for the rule. If you add specific users to the whitelist, the users can have access to the sensitive data on which the data masking rule takes effect within a subsequent specified period of time.

Note

Before you create a whitelist, you must add the users that you want to include in the whitelist to a user group. For information about how to configure a user group, see Create and manage user groups.

You can perform the following steps to create a whitelist for a rule:

  1. On the Data desensitization management page, click the Whitelist configuration tab.

  2. Click + Whitelist in the upper-right corner.

  3. In the New whitelist panel, configure the parameters.

    Note
    • The Whitelist configuration tab is not available in the scenarios of data masking at the Hologres compute engine layer and static data masking in Data Integration.

    • If a user queries data within the time range that is specified by the Effective time parameter in the whitelist, the query results are not masked.

    The following table describes the parameters that you need to configure.

    Parameter

    Description

    Sensitive field type

    You can select only sensitive field types in the selected data masking scenario.

    User group scope

    You can select a user group that you configured. You can select up to 50 user groups. After you add the selected user groups to the whitelist, you can use the Alibaba Cloud accounts or RAM users that belong to the selected user groups to view the original data that is not masked. For information about how to configure a user group, see Create and manage user groups.

    Effective time

    The effective time range of the whitelist. If a user queries data beyond the time range that is specified in the whitelist, the query results are masked.

    Note

    If you set this parameter to Short, the effective time range is from the current time to the specified time. If a user queries data within this time range, the query results are not masked.

  4. Click Save to complete the whitelist configurations.
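The whitelist parameters described above combine into a simple decision: query results are returned unmasked only if the sensitive field type matches, the user belongs to a whitelisted user group, and the query falls within the effective time range. The following Python sketch models that decision. The `Whitelist` class and the `is_masked` function are hypothetical names, and treating both ends of the time range as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Set


@dataclass
class Whitelist:
    sensitive_field_type: str
    user_groups: Set[str]
    effective_from: datetime
    effective_until: datetime


def is_masked(whitelist: Whitelist, field_type: str,
              user_groups: Set[str], query_time: datetime) -> bool:
    """Return True if the query result for this field should be masked."""
    exempt = (
        field_type == whitelist.sensitive_field_type
        and bool(whitelist.user_groups & user_groups)
        # Assumption: both ends of the effective time range are inclusive.
        and whitelist.effective_from <= query_time <= whitelist.effective_until
    )
    return not exempt


wl = Whitelist("Email", {"auditors"}, datetime(2024, 1, 1), datetime(2024, 1, 31))
print(is_masked(wl, "Email", {"auditors"}, datetime(2024, 1, 15)))  # False
print(is_masked(wl, "Email", {"auditors"}, datetime(2024, 2, 1)))   # True
```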

Configure the rule status

On the Data masking rules tab, find the desired rule and toggle the switch in the Status column. You can set the rule status to Effective or Invalid.

After the rule is configured, you can perform operations on the rule, such as modifying, deleting, and querying the details of the rule.

Note
  • You cannot delete or modify a rule in the Effective state. To delete or modify a rule, you must set the rule status to Invalid and then check whether the rule is configured for a task. You must contact the security administrator for further confirmation.

  • After the rule status is set to Invalid, you can modify the data masking method for the rule, but you cannot modify the sensitive field type or name for the rule.

  • After you modify the parameters, set the status of the rule to Effective. Then, the data of the task for which the rule is configured can be masked based on the rule.
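The constraints in the preceding note can be summarized as a small state model. The following Python sketch is hypothetical: the `MaskingRule` class and its methods are names chosen for illustration and do not correspond to a DataWorks API.

```python
from dataclasses import dataclass


@dataclass
class MaskingRule:
    name: str
    sensitive_field_type: str
    method: str
    status: str = "Invalid"  # a new rule is not in effect until its status is changed

    def set_status(self, status: str) -> None:
        if status not in ("Effective", "Invalid"):
            raise ValueError("the status must be Effective or Invalid")
        self.status = status

    def change_method(self, method: str) -> None:
        # A rule in the Effective state cannot be modified.
        if self.status == "Effective":
            raise PermissionError("set the rule status to Invalid before you modify the rule")
        # Only the data masking method can be changed; the sensitive field type
        # and the rule name stay fixed, so no setters are offered for them.
        self.method = method


rule = MaskingRule("Email", "Email", "HASH")
rule.change_method("Masking out")  # allowed while the rule is Invalid
rule.set_status("Effective")       # the modified rule takes effect again
```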

Example of using the data masking rule