DataWorks can generate sample libraries from the sample files that you provide. You can then configure a sample library as a sensitive data identification rule. If target data contains entries from the sample library, it is identified as a match. This feature is typically used to identify data that can be enumerated, such as employee names and user addresses. This topic describes how to create and manage sample libraries.
Limits
You can upload only .txt text files that are in UTF-8 format and are no larger than 500 KB. Each data entry in the sample file must be on a separate line.
A sensitive data identification rule can identify only one type of data. Therefore, each sample library must contain only one type of data. To identify multiple types of data, you must configure a separate sample library for each type. For example, to identify employee names and home addresses, you must configure one sample library for names and another for home addresses.
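The limits above can be checked locally before you upload a file. The following is a minimal sketch, not part of DataWorks: the function names `write_sample_file` and `check_sample_file` are hypothetical, and it assumes that "500 KB" means 500 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "500 KB" is read as 500 * 1024 bytes.
MAX_BYTES = 500 * 1024


def write_sample_file(path, entries):
    """Write one entry per line as UTF-8, skipping blanks and duplicates.

    Returns the number of entries written.
    """
    unique = []
    for entry in entries:
        entry = entry.strip()
        if entry and entry not in unique:
            unique.append(entry)
    data = ("\n".join(unique) + "\n").encode("utf-8")
    Path(path).write_bytes(data)
    return len(unique)


def check_sample_file(path):
    """Return a list of problems; an empty list means the file meets the limits."""
    path = Path(path)
    problems = []
    if path.suffix.lower() != ".txt":
        problems.append("file extension is not .txt")
    raw = path.read_bytes()
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, above the {MAX_BYTES}-byte limit")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    # Each non-blank line counts as one data entry.
    if not any(line.strip() for line in text.splitlines()):
        problems.append("file contains no entries")
    return problems
```

For example, you could call `write_sample_file("employee_names.txt", names)` for names and again with a different file for home addresses, so that each file holds only one type of data, and then run `check_sample_file` on each file before uploading it.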
Create a sample library
Go to Data Security Guard.
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Click the icon in the upper-left corner. Then, choose . On the page that appears, click Try Now to go to the Data Security Guard page.
Note: If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard. If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.
In the left-side navigation pane, open the Sensitive Data Identification page.
Create a sample library.
On the Sample Data Management tab, click Create Sample Library.
In the Create Sample Library dialog box, specify a name for the sample library and upload a sample file.
You can upload only .txt text files that are in UTF-8 format and are no larger than 500 KB. Each data entry in the sample file must be on a separate line.
Note: A sensitive data identification rule can identify only one type of data. Therefore, each sample library must contain only one type of data. To identify multiple types of data, you must configure a separate sample library for each type. For example, to identify employee names and home addresses, you must configure one sample library for names and another for home addresses.
Click Save to create the sample library.
After you create a sample library, you can configure it as a sensitive data identification rule. This rule matches target data that contains data from the sample library. For more information about using a sample library in a sensitive data identification rule, see Configure sensitive data identification rules and run identification tasks.
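To make the matching idea concrete, the following is an illustrative sketch of how a sample library could be compared against target values. It is not the DataWorks implementation: the function names are hypothetical, and it assumes that "contains" means a case-sensitive substring match. This topic does not specify the actual matching behavior, such as exact versus partial matching.

```python
def load_sample_library(lines):
    """Build a set of entries from sample-file lines, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def find_matches(values, library):
    """Return the target values that contain at least one library entry.

    Assumption: a value matches if any entry appears in it as a substring.
    """
    hits = []
    for value in values:
        if any(entry in value for entry in library):
            hits.append(value)
    return hits
```

With a library built from the lines `Alice Wang` and `Bob Li`, the value `contact: Alice Wang` would be reported as a match under this assumption, while `Carol` would not.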
Manage sample libraries
On the Sample Data Management page, you can also perform the following operations on existing sample libraries:
You can view the number of samples and the associated sensitive data identification rules for each sample library. To view the details of a sample library, find the sample library and click the corresponding icon in the Actions column.
To modify a sample library file, click the corresponding icon in the Actions column of the target sample library and replace the existing sample file.
To delete a sample library, click the corresponding icon in the Actions column of that library.
Note: You cannot delete a sample library if it is referenced by a sensitive data identification rule. You can view the associated sensitive data identification rule in the sample library list. Then, go to the configuration page for the rule and remove the reference to the sample library. After the reference is removed, you can delete the library. For more information about configuring a sensitive data identification rule, see Configure sensitive data identification rules and run identification tasks.
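The deletion constraint can be pictured as a simple reference check. The sketch below is a local model only, assuming a hypothetical dictionary that maps each rule name to the sample library it references. It does not call any DataWorks API.

```python
def rules_blocking_deletion(library_name, rules):
    """List the rules that still reference the given sample library.

    rules: dict mapping rule name -> referenced sample library name, or None.
    (Hypothetical structure for illustration.)
    """
    return sorted(name for name, lib in rules.items() if lib == library_name)


def can_delete(library_name, rules):
    """A library can be deleted only when no rule references it."""
    return not rules_blocking_deletion(library_name, rules)
```

In this model, removing the reference from every rule returned by `rules_blocking_deletion` is what makes `can_delete` return `True`, which mirrors the order of operations described in the note above.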