DataWorks: Configure data de-identification

Last Updated: Aug 01, 2023

The data de-identification feature de-identifies sensitive data in a single table that is synchronized in real time and stores the de-identified data in a specified database.

Prerequisites

A reader is configured. For more information, see Data source types that support real-time synchronization.

Procedure

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. On the Workspaces page, find the workspace and click Shortcuts > Data Development in the Actions column.

  2. In the Scheduled Workflow pane, move the pointer over the Create a table icon and choose Create Node > Data Integration > Real-time synchronization.

    Alternatively, right-click the required workflow and choose Create Node > Data Integration > Real-time synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.

    Important

    The node name cannot exceed 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
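The naming constraint above can be checked before you create the node. The following is a minimal sketch, not DataWorks code: the function name `is_valid_node_name` is hypothetical, and it assumes that "letters" means ASCII letters and that an empty name is rejected.

```python
import re

# Assumption: "letters" means ASCII letters; the console may accept more.
# Assumption: an empty name is not allowed, so the minimum length is 1.
_NODE_NAME = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if name uses only allowed characters and has 1 to 128 of them."""
    return _NODE_NAME.fullmatch(name) is not None
```

For example, `is_valid_node_name("sync_orders.v2")` passes, while a name that contains a space or has 129 characters does not.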

  4. Click Confirm.

  5. On the configuration tab of the real-time sync node, drag Data Masking under Conversion to the canvas on the right.
  6. Click the Data Masking node. In the configuration panel that appears, set the parameters.
    1. Create a data de-identification rule. Click Create Data Masking Rule. In the Create Data Masking Rule dialog box that appears, set the Sensitive Data Type, Rule Name, Data Masking Method, Security Domain, and Character Set for Replacement parameters.
      1. In the Create Data Masking Rule dialog box, complete the following steps.
        1. Set basic parameters.
          Parameter / Description
          Sensitive Data Type
          • By default, Existing Data Type is selected from the drop-down list on the left. You can select an existing sensitive data type from the drop-down list on the right. The existing sensitive data types include built-in sensitive data types and sensitive data types created by all users.
          • You can also select New Data Type from the drop-down list on the left. In the field on the right, enter the name of the sensitive data type. The name must be 1 to 30 characters in length and can contain letters and digits.

            After you enter the name of a new data type, the system checks whether the name is used by existing sensitive data types, including built-in sensitive data types and sensitive data types created by all users. If the name has been used, the message The specified sensitive field type already exists. is displayed.

          Note: The built-in sensitive data types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
          Rule Name

          By default, the system populates this field with the value of the Sensitive Data Type parameter. You can change the rule name. The name must be 1 to 30 characters in length and can contain letters and digits. If you enter a name that has been used by an existing rule, the message The specified rule name already exists. is displayed.
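The format and uniqueness checks described for new sensitive data type names and rule names can be sketched as follows. This is an illustration under assumptions, not the console's implementation: `validate_new_name` is a hypothetical helper, "letters" is taken to mean ASCII letters, and the duplicate comparison is assumed to be case-sensitive. Only the two duplicate messages are taken from the text above.

```python
import re
from typing import Iterable, Optional

# Assumption: 1 to 30 ASCII letters and digits.
_NAME = re.compile(r"[A-Za-z0-9]{1,30}")


def validate_new_name(name: str, existing: Iterable[str],
                      kind: str = "sensitive field type") -> Optional[str]:
    """Return the duplicate message if name is taken, or None if it is free.

    kind is "sensitive field type" or "rule name", matching the two
    messages quoted in the documentation.
    Raises ValueError if the name breaks the length or character rule.
    """
    if _NAME.fullmatch(name) is None:
        raise ValueError("name must be 1 to 30 letters and digits")
    if name in set(existing):  # assumption: case-sensitive comparison
        return f"The specified {kind} already exists."
    return None
```

Here `existing` stands for the names already in use, which for sensitive data types include both the built-in types and the types created by all users.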

        2. Configure the data de-identification method. DataWorks supports the following data de-identification methods: pseudonymisation, hashing, and redaction.
          • Pseudonymisation
            This method replaces the characters of a data record with an artificial pseudonym of the same data type. The format of the pseudonym is the same as that of the original data record.
            • If you set the Sensitive Data Type parameter to a built-in sensitive data type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must set the Security Domain parameter for your data records.

              Security Domain: Select a digit from 0 to 9 from the Security Domain drop-down list. Each security domain applies a different data de-identification policy, so the same data record yields different results in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data de-identification result is b124. If the security domain is set to 1, the result is c234. Within a security domain, the same data record always yields the same result.

            • If you set the Sensitive Data Type parameter to a custom sensitive data type, you must set the Character Set for Replacement parameter for your data records.

              Character Set for Replacement: Enter the characters to be replaced. Separate multiple characters with commas (,). The characters can be letters or digits. If a data record contains a character that is specified in this field, the character is replaced with another character of the same type. For example, if a data record contains digits from 0 to 3 and letters from a to d, the data de-identification result also contains only characters within these ranges. If a character is not included in this field, the character is not replaced.
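The behaviour of the Character Set for Replacement parameter can be illustrated with a small sketch. The actual DataWorks algorithm is not documented here, so everything below is an assumption made for illustration: the function name `pseudonymise`, the demo key, and the use of HMAC-SHA256 to choose replacements deterministically. The sketch only reproduces the documented properties: listed digits are replaced with other listed digits, listed letters with other listed letters, and unlisted characters are left unchanged.

```python
import hashlib
import hmac


def pseudonymise(record: str, charset: str, key: bytes = b"demo-key") -> str:
    """Replace characters listed in charset with another listed character
    of the same type (digit or letter). Hypothetical sketch only."""
    # Parse the comma-separated set; keep single characters only.
    items = {c.strip() for c in charset.split(",")}
    digits = sorted(c for c in items if len(c) == 1 and c.isdigit())
    letters = sorted(c for c in items if len(c) == 1 and c.isalpha())

    # Assumption: derive a byte stream from the record so that the same
    # input always produces the same output.
    stream = hmac.new(key, record.encode("utf-8"), hashlib.sha256).digest()

    out = []
    for i, ch in enumerate(record):
        pool = digits if ch in digits else letters if ch in letters else None
        if pool is None or len(pool) < 2:
            out.append(ch)  # not in the set, or nothing to swap with
            continue
        candidates = [c for c in pool if c != ch]
        out.append(candidates[stream[i % len(stream)] % len(candidates)])
    return "".join(out)
```

With the character set `0,1,2,3,a,b,c,d`, the record `a1-b2` keeps its hyphen, and every other character is swapped for a different character from the same range.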

          • Hashing

            This method hashes a data record to generate a fixed-length hash value. If you select this method, you must set the Security Domain parameter.

            Security Domain: Select a digit from 0 to 9 from the Security Domain drop-down list. Each security domain applies a different data de-identification policy, so the same data record yields different results in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data de-identification result is b124. If the security domain is set to 1, the result is c234. Within a security domain, the same data record always yields the same result.
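The two properties of hashing described above (a fixed-length result, and a result that is stable within a security domain but differs between domains) can be sketched as follows. This is not the DataWorks implementation: the function name `hash_record`, the choice of HMAC-SHA256, and the `_BASE_SECRET` value are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; a real system would manage keys differently.
_BASE_SECRET = b"demo-secret"


def hash_record(record: str, security_domain: int) -> str:
    """Return a fixed-length hex digest that depends on the security domain."""
    if not 0 <= security_domain <= 9:
        raise ValueError("security domain must be a digit from 0 to 9")
    # Assumption: the domain is mixed into the key, so each domain
    # produces a different digest for the same record.
    key = _BASE_SECRET + str(security_domain).encode("ascii")
    return hmac.new(key, record.encode("utf-8"), hashlib.sha256).hexdigest()
```

In this sketch every result is 64 hexadecimal characters long regardless of the input length, and `hash_record("a123", 0)` differs from `hash_record("a123", 1)`.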

          • Redaction
            This method replaces each of the characters at specific positions of a data record with an asterisk (*).
            • Recommendation Method: If you select Recommendation Method for the Redaction Mode parameter, select Only show first and last character, Show first three and last two characters, or Show first three and last four characters from the Recommendation Method drop-down list. By default, Only show first and last character is selected.
            • Custom: You can specify whether to de-identify a given number of characters at the beginning, middle, or end of a data record. You can add up to 10 segments and must add at least one Remaining Digits segment.
              For each segment:
              • Select Digits or Remaining Digits.
              • Enter an integer from 1 to 100.
              • Select Mask or Do Not Mask.
              Figure (Masking Out 1): The first three digits are de-identified and the remaining digits are left intact.
              Figure (Masking Out 2): The last three digits are de-identified and the remaining digits are left intact.
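The segment-based redaction described above can be sketched as follows. This is a hedged illustration, not DataWorks code. The `(count, mask)` tuple representation and the function name `redact` are hypothetical, with `None` as the count standing for a Remaining Digits segment. The sketch assumes exactly one Remaining Digits segment, that segments before it apply from the start of the record and segments after it apply from the end, and that segments are clamped when the record is shorter than the configured lengths.

```python
from typing import List, Optional, Tuple

# (digit count, mask?) where a count of None means "Remaining Digits".
Segment = Tuple[Optional[int], bool]


def redact(record: str, segments: List[Segment]) -> str:
    """Replace characters in masked segments with asterisks."""
    if not 1 <= len(segments) <= 10:
        raise ValueError("1 to 10 segments are allowed")
    pivots = [i for i, (n, _) in enumerate(segments) if n is None]
    if len(pivots) != 1:
        # Assumption of this sketch; the console requires at least one.
        raise ValueError("exactly one Remaining Digits segment is expected")
    if any(n is not None and not 1 <= n <= 100 for n, _ in segments):
        raise ValueError("digit counts must be from 1 to 100")

    pivot = pivots[0]
    masked = [False] * len(record)

    # Segments before the pivot consume characters from the start.
    pos = 0
    for n, m in segments[:pivot]:
        stop = min(pos + n, len(record))
        masked[pos:stop] = [m] * (stop - pos)
        pos = stop

    # Segments after the pivot consume characters from the end.
    end = len(record)
    for n, m in reversed(segments[pivot + 1:]):
        start = max(end - n, pos)
        masked[start:end] = [m] * (end - start)
        end = start

    # The Remaining Digits segment covers whatever is left in the middle.
    masked[pos:end] = [segments[pivot][1]] * (end - pos)
    return "".join("*" if m else ch for ch, m in zip(record, masked))
```

Under these assumptions, `redact("13812345678", [(3, True), (None, False)])` masks the first three digits, `[(None, False), (3, True)]` masks the last three, and `[(1, False), (None, True), (1, False)]` corresponds to the Only show first and last character option.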
        3. Verify the data de-identification rule. You can enter sample data in the Sample Data field. The sample data can be 0 to 100 characters in length. Click Test. The data de-identification result is displayed in the Data Masking Effect field.
      2. Click OK. In the configuration panel, you can select this newly added rule from the Data Masking Rule drop-down list for a field to be de-identified. The newly created rule is also synchronized to Data Security Guard.
    2. Click Add condition to add a row. In this row, you can configure the data de-identification rule for another field.
      • In the Field column, select an output field of the parent node of the data de-identification node from the drop-down list.
      • In the Data Masking Rule column, select a rule from the drop-down list. The rules that can be selected are those that have taken effect in Data Security Guard.
      • In the Actions column of a field, click the Edit icon.
        • If the data de-identification rule for this field is created by you, you can modify the rule in the Edit Data Masking Rule dialog box that appears. You can enter sample data to verify the rule.
        • If the data de-identification rule is not created by you, you can check the configuration details of the rule in the dialog box. You can also enter sample data to verify the rule.
      • In the Actions column of a field, click the Delete icon to delete the field.
    3. In the Output field section, select the fields to be used as output fields from the fields of the original table.