Sample libraries let you flag sensitive data based on a list of known values — such as employee names or user addresses — rather than relying on patterns or regular expressions. Upload a text file containing the values you want to detect, and DataWorks generates a sample library from it. Configure the sample library as a sensitive data identification rule: any target data that contains a matching entry is flagged.
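To illustrate the idea of matching against a list of known values, here is a minimal conceptual sketch. It is not how DataWorks implements identification: the function names `load_sample_library` and `find_matches` are hypothetical, and the case-sensitive substring match is an assumption made for illustration only.

```python
def load_sample_library(lines):
    """Build a set of known values from an iterable of lines (one entry per line)."""
    return {line.strip() for line in lines if line.strip()}


def find_matches(value, library):
    """Return the library entries that occur in value, sorted for stable output.

    Assumption: matching is a case-sensitive substring test.
    """
    return sorted(entry for entry in library if entry in value)


library = load_sample_library(["Alice Johnson", "Bob Smith", "Carol Williams"])

print(find_matches("Ticket opened by Bob Smith on Monday", library))  # ['Bob Smith']
print(find_matches("Ticket opened by Dave Brown", library))           # []
```

The first value is flagged because it contains an entry from the list. The second is not, even though it has the same shape, which is the difference from a pattern-based rule.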
Prerequisites
Before you begin, make sure that you have:
- Access to Data Security Guard in your DataWorks workspace
- The required permissions to use Data Security Guard. If your Alibaba Cloud account lacks the required permissions, you are redirected to the authorization page when you open Data Security Guard.
Sample file requirements
Before creating a sample library, prepare a plain text file that meets the following requirements:
| Requirement | Value |
|---|---|
| Format | Plain text (.txt), encoded in UTF-8 |
| Maximum size | 500 KB |
| Structure | One data entry per line |
Example file:
Alice Johnson
Bob Smith
Carol Williams
Each sample library can contain only one type of data, because a sensitive data identification rule identifies only one type of data. To identify multiple types of sensitive data — for example, employee names and home addresses — create a separate sample library for each type.
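The requirements above can be checked locally before you upload a file. The following is a minimal sketch, not a DataWorks tool: the function name `validate_sample_file` and the file name are hypothetical, and it assumes that 500 KB means 500 × 1024 bytes.

```python
import os
import tempfile

MAX_BYTES = 500 * 1024  # assumption: 500 KB interpreted as 500 * 1024 bytes


def validate_sample_file(path):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if not path.lower().endswith(".txt"):
        problems.append("file extension is not .txt")
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, over the {MAX_BYTES}-byte limit")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    # One data entry per line: require at least one non-empty line.
    if not any(line.strip() for line in text.splitlines()):
        problems.append("file contains no entries")
    return problems


# Demo: write the example file to a temporary directory and validate it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "employee_names.txt")  # hypothetical file name
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Alice Johnson\nBob Smith\nCarol Williams\n")
    print(validate_sample_file(sample_path))  # []
```

The function reports every problem it finds instead of stopping at the first one, so a file can be fixed in a single pass.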
Create a sample library
Step 1: Prepare your sample file
Using a text editor, create a plain text file that lists the values you want to identify. Make sure the file meets the sample file requirements above.
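If the values already exist in another system, you can generate the file with a script instead of a text editor. The sketch below is one possible approach: the function name `write_sample_file`, the file names, and the sample values are hypothetical. Removing duplicates and blank entries is a convenience chosen here, not a documented requirement.

```python
import os
import tempfile


def write_sample_file(values, path):
    """Write unique, non-empty values to path as UTF-8, one entry per line.

    Returns the number of entries written.
    """
    seen = set()
    entries = []
    for value in values:
        entry = value.strip()
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(entries) + "\n")
    return len(entries)


# Demo: one file per data type, because each sample library holds a single type.
names = ["Alice Johnson", "Bob Smith", "Bob Smith", "  ", "Carol Williams"]
addresses = ["12 Example Street", "34 Sample Avenue"]

with tempfile.TemporaryDirectory() as tmp:
    print(write_sample_file(names, os.path.join(tmp, "employee_names.txt")))      # 3
    print(write_sample_file(addresses, os.path.join(tmp, "home_addresses.txt")))  # 2
```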
Step 2: Create the library in the console
- Log on to the DataWorks console. In the top navigation bar, select the desired region.
- In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the desired workspace from the drop-down list and click Go to Data Development.
- Click the icon in the upper-left corner. Choose All Products > Data Governance > Data Security Guard, then click Try Now.
- In the left-side navigation pane, choose Rule Configuration > Sensitive Data Identification.
- On the Sample Data Management tab, click Create Sample Library.
- In the Create Sample Library dialog box, enter a name for the library and upload the sample file you prepared.
- Click Save.
After the sample library is created, configure it as a sensitive data identification rule. For details, see Configure sensitive data identification rules and run identification tasks.
Manage sample libraries
On the Sample Data Management tab, you can view, modify, and delete existing sample libraries.
View a sample library
Each entry in the list shows the number of samples and any associated sensitive data identification rules. To view full details for a library, click the icon in the Actions column.
Modify a sample library
To replace the sample file in an existing library, click the icon in the Actions column.
Delete a sample library
Before deleting a sample library, check whether any sensitive data identification rule references it. A library that is still referenced by a rule cannot be deleted, so remove the reference from the rule first.
To delete a sample library:
- On the Sample Data Management tab, locate the library you want to delete and check the Actions column for any associated sensitive data identification rules.
- If the library is referenced by a rule, go to the configuration page for that rule and remove the reference to the sample library. For details, see Configure sensitive data identification rules and run identification tasks.
- Return to the Sample Data Management tab and click the icon in the Actions column to delete the library.
What's next
After creating a sample library, use it as the matching source in a sensitive data identification rule to scan your data assets. For details, see Configure sensitive data identification rules and run identification tasks.