DataWorks: Identify sensitive data using sample libraries

Last Updated: Dec 25, 2025

DataWorks can generate sample libraries from the sample files that you provide. You can then configure a sample library as a sensitive data identification rule. If target data contains entries from the sample library, it is identified as a match. This feature is typically used to identify data that can be enumerated, such as employee names and user addresses. This topic describes how to create and manage sample libraries.

Limits

You can upload only .txt text files that are in UTF-8 format and are no larger than 500 KB. Each data entry in the sample file must be on a separate line.

Note

A sensitive data identification rule can identify only one type of data. Therefore, each sample library must contain only one type of data. To identify multiple types of data, you must configure a separate sample library for each type. For example, to identify employee names and home addresses, you must configure one sample library for names and another for home addresses.
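The limits above can be checked locally before you upload a file. The following Python sketch is illustrative only: the function name `check_sample_file` is hypothetical, and it assumes that 500 KB means 500 × 1024 bytes, which this topic does not specify.

```python
from pathlib import Path

# Assumption: 1 KB = 1024 bytes. The topic states the limit only as "500 KB".
MAX_BYTES = 500 * 1024


def check_sample_file(path):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    file = Path(path)

    if file.suffix.lower() != ".txt":
        problems.append("file extension is not .txt")

    raw = file.read_bytes()
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, larger than {MAX_BYTES}")

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems

    # Each entry must be on its own line, so at least one non-blank line is expected.
    if not any(line.strip() for line in text.splitlines()):
        problems.append("file contains no entries")

    return problems
```

An empty result means only that the file passed these local checks. The console performs its own validation when you upload the file.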

Create a sample library

  1. Go to Data Security Guard.

    1. Go to the DataStudio page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

    2. Click the icon in the upper-left corner. Then, choose All Products > Data Governance > Data Security Guard. On the page that appears, click Try Now to go to the Data Security Guard page.

      Note
      • If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.

      • If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.

  2. In the navigation pane on the left, choose Rule Configuration > Sensitive Data Identification to open the Sensitive Data Identification page.

  3. Create a sample library.

    1. On the Sample Data Management tab, click Create Sample Library.

    2. In the Create Sample Library dialog box, specify a name for the sample library and upload a sample file.

      The sample file must meet the requirements in the Limits section: a UTF-8 encoded .txt file that is no larger than 500 KB, with each data entry on a separate line and only one type of data per sample library.

  4. Click Save to create the sample library.
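If the entries come from another system, a small script can produce a sample file in the expected layout. The following Python sketch is one possible approach, not a DataWorks tool: the function name `write_sample_file` is hypothetical, and removing blank and duplicate entries is an illustrative choice rather than a documented requirement.

```python
def write_sample_file(entries, path):
    """Write entries to a UTF-8 text file, one per line, and return the number written."""
    seen = set()
    unique = []
    for entry in entries:
        value = entry.strip()
        # Skip blank entries and repeats (an illustrative choice, not a documented rule).
        if not value or value in seen:
            continue
        # An embedded line break would spread one entry over two lines.
        if "\n" in value or "\r" in value:
            raise ValueError(f"entry contains a line break: {value!r}")
        seen.add(value)
        unique.append(value)

    # newline="\n" keeps line endings the same on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for value in unique:
            f.write(value + "\n")
    return len(unique)
```

For example, `write_sample_file(["Alice", " Bob ", "", "Alice"], "names.txt")` writes two lines, `Alice` and `Bob`, and returns 2.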

After you create a sample library, you can configure it as a sensitive data identification rule. This rule matches target data that contains data from the sample library. For more information about using a sample library in a sensitive data identification rule, see Configure sensitive data identification rules and run identification tasks.
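To make the matching idea concrete, the following Python sketch flags target values that appear in a sample library. It is a conceptual illustration only: this topic does not describe how DataWorks compares values, so the sketch assumes exact equality after trimming whitespace, and the function name `find_matches` is hypothetical.

```python
def find_matches(values, samples):
    """Return the target values that equal a sample entry after trimming whitespace."""
    # Assumption: exact comparison. The real matching logic is not documented in this topic.
    sample_set = {s.strip() for s in samples if s.strip()}
    return [v for v in values if v.strip() in sample_set]


samples = ["Alice Zhang", "Bob Li"]             # entries from a names sample library
column = ["Alice Zhang", "Carol Wu", "Bob Li"]  # values from a target column
matched = find_matches(column, samples)
```

Here, `matched` contains `Alice Zhang` and `Bob Li`, because both values appear in the sample list, while `Carol Wu` does not.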

Manage sample libraries

On the Sample Data Management tab, you can also perform the following operations on existing sample libraries:

  • You can view the number of samples and the associated sensitive data identification rules for each sample library. To view the details for a sample library, find the sample library and click the View icon in the Actions column.

  • To replace the sample file of a sample library, find the sample library, click the Modify icon in the Actions column, and upload a new file.

  • To delete a sample library, click the Delete sample library icon in the Actions column for that library.

    Note

    You cannot delete a sample library if it is referenced by a sensitive data identification rule. You can view the associated sensitive data identification rule in the sample library list. Then, go to the configuration page for the rule and remove the reference to the sample library. After the reference is removed, you can delete the library. For more information about configuring a sensitive data identification rule, see Configure sensitive data identification rules and run identification tasks.