DataWorks: Identify sensitive data by using sample libraries

Last Updated: Aug 16, 2023

DataWorks can generate sample libraries from the sample files that you provide. You can associate a sample library with a data identification rule to identify data. If the data being scanned contains entries from the sample library, the data hits the data identification rule. Sample libraries are suited to identifying enumerated values, such as employee names and user addresses. This topic describes how to create and manage sample libraries.

Limits

You can upload only UTF-8-encoded TXT files as sample files to DataWorks. A sample file cannot exceed 500 KB in size. Each data entry in a sample file must occupy its own line.

Note

A data identification rule can identify only one type of data, so we recommend that you store only one type of data in each sample library. To identify multiple types of data, you must configure multiple sample libraries. For example, to identify both employee names and home addresses, configure two sample libraries: one for employee names and one for home addresses.
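The limits above can be checked locally before you upload a file. The following is a minimal sketch, not part of DataWorks: the function names `check_sample_file` and `write_sample_file` are hypothetical, and it assumes that "500 KB" means 500 × 1024 bytes.

```python
from pathlib import Path

# Assumption: the 500 KB limit is interpreted as 500 * 1024 bytes.
MAX_SAMPLE_BYTES = 500 * 1024


def check_sample_file(path):
    """Return a list of problems found in a candidate sample file."""
    path = Path(path)
    problems = []
    if path.suffix.lower() != ".txt":
        problems.append("file extension is not .txt")
    raw = path.read_bytes()
    if len(raw) > MAX_SAMPLE_BYTES:
        problems.append(f"file is {len(raw)} bytes, over the 500 KB limit")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    # Each non-blank line is treated as one data entry.
    if not any(line.strip() for line in text.splitlines()):
        problems.append("file contains no entries")
    return problems


def write_sample_file(path, entries):
    """Write one entry per line as UTF-8, dropping blanks and duplicates."""
    unique = dict.fromkeys(e.strip() for e in entries if e.strip())
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
```

To follow the one-type-per-library recommendation, you could call `write_sample_file` once per data type, for example once for employee names and once for home addresses, and upload each file to its own sample library.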

Create a sample library

  1. Go to the Data Security Guard page.

    1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

    2. Click the More icon in the upper-left corner and choose All Products > Data Governance > Data Security Guard.

    3. Click Try now to go to the Data Security Guard page.

  2. In the left-side navigation pane, choose Rule Change > Sensitive data identification.

  3. Click the Sample Management tab.

  4. On the Sample Management tab, click Add Sample. In the dialog box that appears, specify a name for the sample library and upload a sample file.

    Note
    • The sample file must meet the requirements described in the Limits section: a UTF-8-encoded TXT file of no more than 500 KB, with one data entry per line and a single type of data per sample library.

    • You can upload multiple sample files to a sample library in DataWorks.

  5. Click Save. The sample library is created.

After you create a sample library, you can associate the sample library with a data identification rule. If the data to be identified contains the data in the sample library, the data to be identified hits the data identification rule. For more information about how to associate a sample library with a data identification rule, see Identify sensitive data.
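The hit behavior described above can be pictured with a small sketch. This is an illustration only: this topic does not specify how DataWorks compares values, so the exact-match comparison after trimming whitespace is an assumption, and `load_samples` and `hits_rule` are hypothetical names rather than DataWorks APIs.

```python
def load_samples(text):
    """Build a set of entries from sample-file text, one entry per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def hits_rule(column_values, samples):
    """Return True if at least one value appears in the sample library.

    Assumption: a value matches only if it equals a sample entry exactly
    after surrounding whitespace is removed.
    """
    return any(str(value).strip() in samples for value in column_values)


# Example: a column that contains a known employee name hits the rule.
employee_names = load_samples("Alice Zhang\nBob Li\n")
print(hits_rule(["Carol Wu", "Bob Li"], employee_names))
print(hits_rule(["Carol Wu"], employee_names))
```

Storing the samples in a set keeps each lookup cheap, which matters when a library holds thousands of entries and is checked against many values.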

Manage sample libraries

On the Sample Management tab, you can perform the following operations to manage existing sample libraries.

  • View the sample libraries.

    For each existing sample library, you can view the number of samples that it contains and the data identification rules that are associated with it. To view the details of a sample library, find the sample library and click the View icon in the Actions column.

  • Edit a sample library.

    Find the sample library that you want to edit and click the Edit icon in the Actions column. You can upload a new sample file or replace existing sample files.

  • Delete a sample library.

    To delete a sample library, find the sample library and click the Delete icon in the Actions column.

    Note

    You cannot delete a sample library that is associated with a data identification rule. You can view the data identification rules that are associated with a sample library on the Sample Management tab. Before you delete the sample library, disassociate it from each rule on the configuration page of that data identification rule. For more information about how to configure a data identification rule, see Identify sensitive data.