This topic describes how to create a sensitive data detection task and manually correct inaccurate detection results.
Manually corrected results are displayed the next day.
Create a detection task
Go to the sensitive data detection rule page. For more information, see Go to the sensitive data detection rule page.
Click the Detection Tasks tab to go to the detection task page.
Start the sensitive data detection task.
Configure the sensitive data detection task.
In the Enable Sensitive Data Detection Task dialog box, configure the task type, scan method, and scope. You can configure a real-time task, a scheduled task, or a one-time task.
Configure a real-time task.
The following table describes the parameters.
Parameter
Description
Account for Detection
Configure data sampling and scanning using an Alibaba Cloud account or a RAM user. The selected account is used to sample and scan data. The data range that can be sampled varies based on account permissions.
Note: To perform detection using a RAM user, the RAM user must first be granted permissions on the MaxCompute project.
Real-time Detection
Only ODPS supports real-time detection. When ODPS metadata changes, for example when a table or field is added or a field is modified, Data Security Guard automatically starts sensitive data detection for the changed metadata.
Data Security Guard obtains metadata change information in real time. If the metadata change is caused by adding a new table or field, the new table or field may not have content yet. In this case, only metadata is used for sensitive data detection.
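The fallback described above (use metadata only when a new table or field has no content yet) can be illustrated with a minimal sketch. The rule dictionaries, the regular expressions, and the function name detect_field are hypothetical and are not taken from Data Security Guard. The sketch only shows the order of the checks.

```python
import re

# Hypothetical rules for illustration only; real rules are configured
# on the Sensitive Data Detection Rules page.
METADATA_RULES = {"phone": re.compile(r"phone|mobile", re.IGNORECASE)}
CONTENT_RULES = {"phone": re.compile(r"\d{11}")}


def detect_field(name, comment, samples):
    """Return a sensitive field type, or None if nothing matches.

    Metadata (field name and comment) is always checked. Content rules
    are checked only when sample values exist, which mirrors the case
    where a newly added table or field has no content yet.
    """
    for field_type, pattern in METADATA_RULES.items():
        if pattern.search(name) or pattern.search(comment):
            return field_type
    if samples:  # an empty sample list means metadata-only detection
        for field_type, pattern in CONTENT_RULES.items():
            if all(pattern.fullmatch(value) for value in samples):
                return field_type
    return None
```

With an empty sample list, only the field name and comment can produce a match, so a field with a neutral name stays undetected until content is available.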
Configure a scheduled task.
Parameter
Description
Task Execution
Manually enable task execution.
Subsequent Detection Task Scan and Update Policy
Includes two options:
Rescan and update results only for changed rules, data affected by the changed rules, and data with no results.
Rescan all data and overwrite all results.
You can select Do not overwrite manually corrected results.
Account for Detection
Configure data sampling and scanning using an Alibaba Cloud account or a RAM user. The selected account is used to sample and scan data. The data range that can be sampled and scanned varies based on account permissions.
Note: To sample and scan data using a RAM user, the RAM user must first be granted permissions on the MaxCompute project.
Content Detection
Configure whether the Content Detection and Metadata Detection rules in the sensitive data detection rules take effect. The corresponding rules take effect only after you select them.
Note: If you do not select Content Detection, Data Security Guard does not sample or scan data. The content detection rules do not take effect, but the rules for field names and field comments still take effect.
Sample Size
Configure the sample size for content detection. A value greater than 100 is recommended.
This parameter is required when you select Content Detection.
Scan Frequency and Scan Time
Define the scan period for the scheduled task.
This parameter is required only when you set Task Type to Scheduled Task.
For Scan Frequency, you can select Once a week or Once a day. If you select Once a week, you can specify a day from Monday to Friday. The time range is from 00:00 to 23:59.
Scan Scope
Configure the data scope for the sensitive data detection task.
All: Scans all data under the authorized account of the current tenant.
Partial Data: You can choose to scan table data in specified projects.
Note: By default, the project scope includes all projects of all data engines.
You can scan data in specified tables of ODPS, EMR, and HOLO projects.
The total length of a table name can be 0 to 100 characters. All character types are supported. If you leave this blank, all tables are scanned.
The wildcard character .* is supported. For example, .*name matches names ending with name, and private.* matches names starting with private.
Separate multiple table names or field names with commas (,).
If you select Partial Data, you can add multiple project or database scan scopes. The final scan scope is the union of all specified scopes.
Manually select projects in the pane on the left.
After you select a project, the data tables within the project or database are displayed on the right. You can manually select tables or select all tables at once. By default, all data tables in the database are selected.
Keyword search is supported for project or database scopes and data tables. To search for a data table by keyword, first select a project to search within.
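The table-name matching and union rules in the Scan Scope description can be sketched as follows. This is a minimal illustration, and it assumes that each pattern is treated as a regular expression that must match the whole table name. The source does not state which matching engine the product uses. The function names and the (project, tables, patterns) tuple layout are hypothetical.

```python
import re


def parse_patterns(raw):
    """Split a comma-separated pattern string and drop empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def match_tables(tables, raw_patterns):
    """Return the tables selected by a comma-separated pattern string.

    A blank pattern string selects every table, as described above.
    Assumption: each pattern must match the whole table name.
    """
    patterns = parse_patterns(raw_patterns)
    if not patterns:
        return set(tables)
    return {
        table
        for table in tables
        if any(re.fullmatch(pattern, table) for pattern in patterns)
    }


def union_scope(scopes):
    """Combine several (project, tables, raw_patterns) scopes into one set.

    The final scan scope is the union of all specified scopes.
    """
    selected = set()
    for project, tables, raw_patterns in scopes:
        selected |= {(project, t) for t in match_tables(tables, raw_patterns)}
    return selected
```

For example, with the tables user_name, private_orders, and orders, the pattern string ".*name, private.*" selects user_name and private_orders, and a blank string selects all three.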
Configure a one-time task.
Parameter
Description
Detection Task Scan and Update Policy
Includes two options:
Rescan and update results only for changed rules, data affected by the changed rules, and data with no results.
Rescan all data and overwrite all results.
You can select Do not overwrite manually corrected results.
Account for Detection
Configure data sampling and scanning using an Alibaba Cloud account or a RAM user. The selected account is used to sample and scan data. The data range that can be sampled and scanned varies based on account permissions.
Note: To sample and scan data using a RAM user, the RAM user must first be granted permissions on the MaxCompute project.
Content Detection
Configure whether the Content Detection and Metadata Detection rules in the sensitive data detection rules take effect. The corresponding rules take effect only after you select them.
Note: If you do not select Content Detection, Data Security Guard does not sample or scan data. The content detection rules do not take effect, but the rules for field names and field comments still take effect.
Sample Size
Configure the sample size for content detection. A value greater than 100 is recommended.
This parameter is required when you select Content Detection.
Scan Scope
Configure the data scope for the sensitive data detection task.
All: Scans all data under the authorized account of the current tenant.
Partial Data: You can choose to scan table data in specified projects.
Note: By default, the project scope includes all projects of all data engines.
You can scan data in specified tables of ODPS, EMR, and HOLO projects.
The total length of a table name can be 0 to 100 characters. All character types are supported. If you leave this blank, all tables are scanned.
The wildcard character .* is supported. For example, .*name matches names ending with name, and private.* matches names starting with private.
Separate multiple table names or field names with commas (,).
If you select Partial Data, you can add multiple project or database scan scopes. The final scan scope is the union of all specified scopes.
Manually select projects in the pane on the left.
After you select a project, the data tables within the project or database are displayed on the right. You can manually select tables or select all tables at once. By default, all data tables in the database are selected.
Keyword search is supported for project or database scopes and data tables. To search for a data table by keyword, first select a project to search within.
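The two scan and update policies, together with the Do not overwrite manually corrected results option, can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation. The result layout (a dict with "type", "rule", and "manual" keys), the policy names "incremental" and "full", and both function names are hypothetical. "Data affected by the changed rules" is approximated here as fields whose existing result came from a changed rule.

```python
def fields_to_rescan(all_fields, results, changed_rules, policy):
    """Pick the fields a task rescans under the chosen policy.

    "full": rescan all data.
    "incremental": rescan only fields with no result, or fields whose
    existing result was produced by a rule that has changed.
    """
    if policy == "full":
        return set(all_fields)
    if policy != "incremental":
        raise ValueError(f"unknown policy: {policy!r}")
    return {
        field
        for field in all_fields
        if field not in results or results[field]["rule"] in changed_rules
    }


def merge_results(results, new_results, keep_manual=True):
    """Write new results over old ones.

    When keep_manual is True, manually corrected results are preserved.
    """
    merged = dict(results)
    for field, new in new_results.items():
        old = merged.get(field)
        if keep_manual and old is not None and old.get("manual"):
            continue  # do not overwrite a manually corrected result
        merged[field] = {**new, "manual": False}
    return merged
```

Keeping the selection step separate from the merge step means the manual-correction guard applies the same way under either policy.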
Click Enable to start the scan task.
After the task starts, the Task Status changes as follows:
Real-time task: The status changes to Enabling.
Scheduled task: The status changes to Enabling. When the configured scan time is reached, the platform performs sensitive data detection based on the configuration.
One-time task: The status changes to a progress bar. The task is complete when the progress reaches 100%. The progress is calculated using the following formula: (Number of tables scanned in the current task / Total number of tables to be scanned in the current task) × 100%.
Note: After a detection rule is modified, the new rule takes effect during the next scheduled task. To trigger a new task immediately, you can create a one-time detection task.
After the scan task is complete, the Task Status is updated to No Task.
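The progress formula for a one-time task can be written as a short function. The function name and the input validation are assumptions. In particular, the source does not say how a task with zero tables is reported, so this sketch rejects that case.

```python
def scan_progress(scanned_tables, total_tables):
    """Return one-time task progress as a percentage.

    progress = (tables scanned / total tables to be scanned) x 100
    """
    if total_tables <= 0:
        raise ValueError("total_tables must be positive")
    if not 0 <= scanned_tables <= total_tables:
        raise ValueError("scanned_tables must be between 0 and total_tables")
    return scanned_tables / total_tables * 100
```

For example, 25 scanned tables out of 200 gives 12.5, and the task is complete when the value reaches 100.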
Manually correct detection results
Go to the sensitive data detection rule page. For more information, see Go to the sensitive data detection rule page.
Click the Detection Results tab to go to the detection results page.
Manually correct inaccurate detection results.
Operation
Description
Filter by engine type
In the section marked with ① in the preceding figure, you can select a data engine from the drop-down list.
Note: You can correct detection results for sensitive fields in ODPS, EMR, CDH_HIVE, and HOLO engines.
Filter
In the section marked with ② in the preceding figure, you can specify filter conditions to find the detection results that you want to query.
You can filter by conditions such as Project, Table Name, and Field Name. You can also click Expand to view more filter conditions, such as Category, Level, and Sensitive Field Type.
Category: The categorization information in the default categorization and classification template of the current tenant. For more information, see Configure sensitive data categorization and classification.
Level: The classification information in the default categorization and classification template of the current tenant.
Correct a single result
The section marked with ③ in the preceding figure displays the list of detection results. You can click Displayed Fields Settings to select the field information that you want to view and refresh the list details. By default, the list displays Project, Table Name, Field Name, Category, Level, Sensitive Field Type, Manually Corrected, and Last Updated. You can also click Lineage Analysis in the Actions column to go to the Data Lineage (Public Preview) module and view the field-level data lineage.
For fields with incorrect Sensitive Field Type results, click the drop-down list in the Sensitive Field Type column. The list displays all published sensitive field types from the default categorization and classification template of the current tenant. Check whether the existing sensitive field types meet your requirements:
If they meet your requirements: Select another existing sensitive field type. Then, click the icon to go to the Sensitive Data Detection Rules page. Modify the detection rules for both the original and the new sensitive field types to ensure future detection accuracy.
If they do not meet your requirements: Click the icon to go to the Sensitive Data Detection Rules page. Alternatively, scroll to the bottom of the drop-down list and click Manage Sensitive Field Types. You are redirected to the Sensitive Data Detection Rules page, and the Create Sensitive Field Type dialog box appears. Add a new sensitive field type and configure its detection rule. For more information, see Configure sensitive data detection rules and run detection tasks.
Correct multiple results in a batch
Select the fields that you want to correct in a batch and click Batch Correct in the section marked with ④ in the preceding figure. In the Batch Correct Detection Results dialog box, the Sensitive Field Type drop-down list displays all published sensitive field types from the default categorization and classification template of the current tenant. Select the correct sensitive field type and click Save to complete the batch correction.
Export detection results
You can click Export Detection Results to export the results that match the specified filter conditions to your local computer.
Export Detection Results: Click the icon to automatically export the detection results that match the current filter conditions.
Note: You can export a maximum of 100,000 records.