DataWorks: Identify sensitive data by using sample libraries

Last Updated: Aug 09, 2024

DataWorks can generate a sample library from the sample files that you provide. After you associate a sample library with a sensitive data identification rule, data that contains an entry from the sample library matches the rule. Sample libraries are suited to identifying enumerated values, such as employee names and user addresses. This topic describes how to create and manage sample libraries.

Limits

You can upload only UTF-8-encoded files in the TXT format as sample files to DataWorks. The size of a sample file cannot exceed 500 KB. Each data entry in a sample file occupies one line.
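The limits above can be checked locally before you upload a file. The following is a minimal sketch, not part of DataWorks: the function name check_sample_file is hypothetical, and it assumes that "500 KB" means 500 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "500 KB" is interpreted as 500 * 1024 bytes.
MAX_SAMPLE_FILE_BYTES = 500 * 1024


def check_sample_file(path):
    """Return a list of problems found in a sample file (empty if none were found)."""
    problems = []
    file_path = Path(path)

    # Sample files must be in the TXT format.
    if file_path.suffix.lower() != ".txt":
        problems.append("file extension is not .txt")

    # The size limit is checked on the raw bytes.
    raw = file_path.read_bytes()
    if len(raw) > MAX_SAMPLE_FILE_BYTES:
        problems.append(
            f"file is {len(raw)} bytes, above the {MAX_SAMPLE_FILE_BYTES}-byte limit"
        )

    # Sample files must be UTF-8 encoded.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems

    # Each data entry occupies one line; require at least one non-blank line.
    if not any(line.strip() for line in text.splitlines()):
        problems.append("file contains no data entries")
    return problems
```

An empty list means the file passed these local checks. It does not guarantee that the upload is accepted.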

Note

A sensitive data identification rule can identify only one type of data. We recommend that you store only one type of data in each sample library. To identify multiple types of data, you must configure multiple sample libraries. For example, to identify employee names and home addresses, you must configure two sample libraries: one for employee names and one for home addresses.
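If your source data mixes several types, you can prepare one sample file per type before you create the sample libraries. The following is a minimal sketch under stated assumptions: write_sample_files and the type names are hypothetical, and dropping blank and duplicate entries is a convenience of this sketch, not a documented DataWorks requirement.

```python
from pathlib import Path


def write_sample_files(samples_by_type, output_dir):
    """Write one UTF-8 .txt file per data type, with one entry per line."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for data_type, values in samples_by_type.items():
        # Drop blank entries and duplicates while keeping the original order.
        entries = list(dict.fromkeys(v.strip() for v in values if v.strip()))
        target = out / f"{data_type}.txt"
        # newline="\n" keeps line endings identical on every platform.
        with open(target, "w", encoding="utf-8", newline="\n") as f:
            f.write("\n".join(entries) + "\n")
        written[data_type] = target
    return written
```

For example, write_sample_files({"employee_names": [...], "home_addresses": [...]}, "samples") produces samples/employee_names.txt and samples/home_addresses.txt, one file for each sample library.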

Create a sample library

  1. Go to the Data Security Guard page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose Data Modeling and Development > DataStudio in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

    2. Click the icon in the upper-left corner, choose All Products > Data Governance > Data Security Guard, and then click Try now to go to the Data Security Guard page.

      Note
      • If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.

      • If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.

  2. In the left-side navigation pane, choose Rule Configuration > Sensitive Data Identification. The Sensitive Data Identification page appears.

  3. Create a sample library.

    1. On the Sample Data Management tab, click Create Sample Library.

    2. In the Create Sample Library dialog box, specify the name of the sample library and upload a sample file.

      The sample file must meet the requirements described in the Limits section of this topic.

  4. Click Save. The sample library is created.

After you create a sample library, you can associate it with a sensitive data identification rule. Data that contains an entry from the sample library then matches the rule. For information about how to associate a sample library with a sensitive data identification rule, see Configure sensitive data identification rules.
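To estimate locally which values a sample library might hit, you can compare a column of values against the sample file. The following is an illustrative sketch only. The functions load_sample_library and find_matches are hypothetical, and the sketch assumes that a value matches when it equals a sample entry after surrounding whitespace is trimmed. This topic does not specify how DataWorks compares values, so actual results may differ.

```python
def load_sample_library(path):
    """Read a sample file into a set of entries, one entry per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def find_matches(values, library):
    """Return the values that equal a sample entry after trimming whitespace.

    Assumption: exact equality. DataWorks may compare values differently.
    """
    return [value for value in values if value.strip() in library]
```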

Manage sample libraries

On the Sample Data Management tab, you can perform the following operations to manage existing sample libraries.

  • View sample libraries: You can view the number of samples in each sample library and the sensitive data identification rules that are associated with each sample library. To view the details of a sample library, find the sample library and click the View icon in the Actions column.

  • Modify a sample library: Find the sample library that you want to modify and click the Modify icon in the Actions column to replace the existing sample file.

  • Delete a sample library: Find the sample library that you want to delete and click the Delete Sample Library icon in the Actions column.

    Note

    You cannot delete a sample library that is associated with a sensitive data identification rule. To delete the sample library, first find the associated rule on the Sample Data Management tab, and then disassociate the sample library from the rule on the configuration page of the rule. For information about how to configure a sensitive data identification rule, see Configure sensitive data identification rules.