Identify sensitive data in DataWorks using Data Security Guard, which lets you define detection rules based on sensitive field types and automatically scan your tenant's data for matches.
How it works
Detection is built around sensitive field types — named categories such as "phone number" or "employee ID" that belong to a data category and carry a sensitivity level. Each field type has one detection rule that specifies what to look for (field content, field name, or field comment) and how strictly to match.
After you publish a rule, you start a detection task. The task scans the data in scope and marks fields that match as sensitive. The Sensitive data overview module then shows the distribution of all detected sensitive fields across projects.

If detection results look wrong, you can view and manually correct them.
Prerequisites
Before you begin, make sure you have:
A DataWorks workspace with Data Security Guard enabled
An Alibaba Cloud account or RAM user with permissions on Data Security Guard
(If using content detection) DataWorks Professional Edition or higher
Go to the Data detection rules page
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.
Click the icon in the upper-left corner, then choose All Products > Data Governance > Data Security Guard. On the page that appears, click Try Now.
Note: If your account already has Data Security Guard permissions, you are taken directly to the homepage. Otherwise, you are redirected to an authorization page and must complete the authorization before proceeding.
In the left navigation pane, choose Rule Configuration > Sensitive Data Detection to open the Data Identification Rules page.
Step 1: Set up data categories and sensitivity levels
Every sensitive field type must belong to a data category and have a sensitivity level. If the built-in template already covers your needs, skip this step.
Data Security Guard includes a built-in classification and grading template with 4 sensitivity levels and 4 major categories. You can edit the template or create your own — up to 10 sensitivity levels in total. Categories support multiple layers of subcategories.
Configure sensitivity levels on the Rule Configuration > Data Category and Sensitivity Level page:
Click the icon next to the built-in template to edit its name, description, and number of grades.
Configure categories on the Rule Configuration > Sensitive Data Identification page:
New users see the default categories from the built-in template on the left side of the Data Detection Rules page. Click the icon next to a category name to add a same-level category, add a subcategory, rename it, or delete it.
Existing users can create up to 4 data categories on the left side of the page.
Step 2: Create a sensitive field type and configure a detection rule
Create a sensitive field type
On the Data Detection Rules page, click + Sensitive field type in the upper-right corner.
On the Basic Information tab, configure the following parameters:
| Parameter | Description |
|---|---|
| Sensitive field type | A unique name for the field type, such as "name", "ID card number", or "phone number". |
| Category | The data category this field type belongs to. If no suitable category exists, go to Data Category and Sensitivity Level to create one. See Configure sensitive data classification and grading. |
| Sensitivity level | The sensitivity level for this field type. Higher numbers indicate higher sensitivity. If no suitable level exists, go to Data Category and Sensitivity Level to create one. See Configure sensitive data classification and grading. |
Click Next.
Configure the detection rule
On the Rule Configuration tab, define the detection logic for the sensitive field type.

Modifying a rule after publishing it clears all detection results previously matched by that rule. Plan for a rescan before making changes to a published rule.
Hit rules
Select how multiple detection conditions combine:
| Option | Behavior |
|---|---|
| Satisfy any rule | A field is flagged if either the data content detection or field name detection condition matches. |
| Meet all rules | A field is flagged only if both conditions match. |
Detection conditions
Configure one or more of the following conditions:
Data Content Identification — detects sensitive data based on field values.
For example, if a field named name contains the value "Zhang San", this condition inspects "Zhang San". Four rule types are available:
| Rule type | When to use | Edition required |
|---|---|---|
| Regular expression | Match field values against a pattern. For example, to detect employee IDs like F-12345678 or P-87654321, use the regex [A-Z]-\d{8}. Enter test data to verify accuracy before publishing. | Professional and higher |
| Built-in identification rule | Use a pre-built rule maintained by DataWorks. Enter test data to verify accuracy. | Enterprise only |
| Sample library | Match against a configured set of example values. Useful when your sensitive data does not follow a predictable pattern. Enter test data to verify accuracy. See Detection using a sample library. | Professional and higher |
| Custom sensitive data identification model | Use a custom ML model trained for your data. MaxCompute engine only. See Detection using a custom model. | Enterprise only |
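Before entering test data in the console, it can help to check a regular expression locally. The following is a minimal Python sketch using the employee ID pattern from the table above. It assumes the whole field value must match the pattern; how Data Security Guard anchors a pattern internally is not stated here, so treat that as an assumption. The function name is_employee_id and the sample values are illustrative only.

```python
import re

# Pattern taken from the Regular expression row above.
EMPLOYEE_ID = re.compile(r"[A-Z]-\d{8}")


def is_employee_id(value: str) -> bool:
    """Return True when the entire value looks like an employee ID.

    Assumption: whole-value matching (fullmatch). The product may
    anchor patterns differently.
    """
    return EMPLOYEE_ID.fullmatch(value) is not None


# Illustrative test data: two valid IDs and three near misses.
samples = ["F-12345678", "P-87654321", "f-12345678", "F-1234567", "XF-12345678"]
results = {sample: is_employee_id(sample) for sample in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {'match' if matched else 'no match'}")
```

Near misses such as a lowercase prefix or a seven-digit number are useful test data, because they show whether the pattern is too loose.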
Field Name Identification — detects sensitive data based on field names.
For example, if a field is named name, this condition checks whether name matches your configured patterns, regardless of the field's value.
Specify field names in the format appropriate for your data source:
EMR, CDH, and MaxCompute: project.table.column
Hologres: instance_id.project.table.column
Use an asterisk (*) as a wildcard in any segment:
| Pattern | Matches |
|---|---|
| a.b.* | All fields in table b of project a |
| ab*.c*.salary | All salary fields in tables starting with c in projects starting with ab |
| *cd.ef*.sa*ry | All fields starting with sa and ending with ry in tables starting with ef in projects ending with cd |
The logical relationship between multiple entries is OR.
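The wildcard behavior described above can be approximated locally with a short Python sketch. It is not the product's implementation. It assumes that an asterisk never spans a dot separator, that a pattern and a field path must have the same number of segments, and that matching is case-sensitive. The helper names and the sample field paths are hypothetical.

```python
import re


def segment_to_regex(segment: str) -> str:
    """Escape a pattern segment, treating each * as 'any characters'."""
    return ".*".join(re.escape(part) for part in segment.split("*"))


def matches(pattern: str, field_path: str) -> bool:
    """Match a dotted pattern against a dotted field path, segment by segment.

    Assumptions: * does not cross a dot, segment counts must be equal,
    and matching is case-sensitive.
    """
    pattern_parts = pattern.split(".")
    path_parts = field_path.split(".")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        re.fullmatch(segment_to_regex(p), s) is not None
        for p, s in zip(pattern_parts, path_parts)
    )


def matches_any(patterns, field_path: str) -> bool:
    """Multiple entries combine with OR, as stated above."""
    return any(matches(p, field_path) for p in patterns)


print(matches("a.b.*", "a.b.salary"))                # True
print(matches("ab*.c*.salary", "abc.customers.salary"))  # True
print(matches("*cd.ef*.sa*ry", "abcd.efg.salary"))   # True
print(matches("a.b.*", "a.c.salary"))                # False
```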
Field Description Identification — detects sensitive data based on field comments.
For example, configure a phone number field type with the comments "phone number" and "contact method". When the system finds a field whose comment contains "contact method", it flags that field as a phone number.
Enter field comments in the input box (0–100 characters each, all character types supported). Add up to 10 comments.
Field Exclusion — fields matching exclusion rules are never flagged by this detection rule, even if they match other conditions.
Specify fields to exclude using the same format as Field Name Identification. Wildcards are supported. The logical relationship between multiple entries is OR.
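The way the hit rule and Field Exclusion interact can be summarized in a small Python sketch. It models only the two conditions named in the Hit rules table (data content and field name) and treats exclusion as an override, as described above. The function name is_flagged and the "any"/"all" mode strings are hypothetical labels for the two console options, not product identifiers.

```python
def is_flagged(
    content_match: bool,
    name_match: bool,
    hit_rule: str,
    excluded: bool = False,
) -> bool:
    """Decide whether a field is flagged by one detection rule.

    hit_rule is "any" (Satisfy any rule) or "all" (Meet all rules).
    An excluded field is never flagged, whatever else matches.
    """
    if excluded:
        return False
    if hit_rule == "any":
        return content_match or name_match
    if hit_rule == "all":
        return content_match and name_match
    raise ValueError(f"unknown hit rule: {hit_rule!r}")


# Only the field name matches:
print(is_flagged(False, True, "any"))   # True
print(is_flagged(False, True, "all"))   # False
# Both conditions match, but the field is on the exclusion list:
print(is_flagged(True, True, "all", excluded=True))  # False
```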
Hit ratio configuration
The hit ratio defines the minimum percentage of non-empty data in a column that must match the Data Content Identification condition for the rule to flag that column as sensitive. The default is 50%.
Formula: 100% × Number of data records in the column that hit the detection rule / Total number of data records in the column
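As a worked example of the formula, the following Python sketch computes a hit ratio for a small sampled column and compares it with the default 50% threshold. The description above refers to non-empty data while the formula refers to all records, so the skip_empty parameter is an assumption that lets you compute either reading. The function name hit_ratio and the sample column are illustrative only.

```python
import re


def hit_ratio(values, pattern: str, skip_empty: bool = True) -> float:
    """Return the percentage of sampled values that match the pattern.

    Assumption: with skip_empty=True, None and "" are excluded from the
    denominator; with skip_empty=False, every record is counted.
    """
    regex = re.compile(pattern)
    if skip_empty:
        values = [v for v in values if v not in (None, "")]
    if not values:
        return 0.0
    hits = sum(1 for v in values if regex.fullmatch(str(v)))
    return 100.0 * hits / len(values)


# Five sampled records: three valid IDs, one placeholder, one empty value.
column = ["F-12345678", "P-87654321", "n/a", "", "F-00000001"]

ratio = hit_ratio(column, r"[A-Z]-\d{8}")
is_sensitive = ratio >= 50.0  # default hit ratio threshold

print(ratio)         # 75.0 (3 hits out of 4 non-empty records)
print(is_sensitive)  # True
```

Counting the empty record as well (skip_empty=False) gives 60.0 for the same column, which still clears the default threshold.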
Publish the rule
Click Publish to Use to activate the rule immediately.
When a column's data matches detection rules for multiple sensitive field types, rules are applied in this priority order:
If the number of matching conditions differs, detection priority follows: Field Name Identification > Data Content Identification > Field Description Identification.
If the number and types of matching conditions are the same, the rule for the field type with the higher sensitivity level takes precedence.
Step 3: Start a sensitive data detection task
After publishing your rules, authorize and start a detection task. The task scans data in your tenant against the active rules.
Authorize the detection task
The first time you start a detection task, click Enable and Authorize in the upper-left corner of the Sensitive Data Identification page and follow the prompts.
Configure and start the task
Configure the task on the Sensitive Data Identification page. Three task types are available:
Real-time task

| Parameter | Description |
|---|---|
| Account used for identification | The Alibaba Cloud account or RAM user whose permissions determine what data can be sampled and scanned. If using a RAM user, ensure it has permissions on the MaxCompute project. |
| Real-time recognition | Available for ODPS (MaxCompute) only. When ODPS metadata changes — such as a new table or field being added, or a field being modified — Data Security Guard automatically runs a detection task for the changed metadata. If the change introduces a new table or field with no data yet, only metadata-based detection runs. |
After clicking Run, the task status changes to Running.
Scheduled task

| Parameter | Description |
|---|---|
| Task execution | Start manually. |
| Follow-up recognition task scanning and update strategy | Choose how follow-up runs handle existing results: rescan only changed rules and affected data, or rescan all data and overwrite all results. Optionally select Do not overwrite manually corrected results to preserve manual corrections. |
| Account used for identification | The Alibaba Cloud account or RAM user for sampling and scanning. RAM users must have permissions on the MaxCompute project. |
| Content identification | Select to enable content detection and metadata detection. Without this, only field name and field comment rules run. |
| Sampling quantity | Number of rows to sample per column for content detection. Set to more than 100 for reliable results. Required when Content identification is selected. |
| Scanning frequency and Scan time | Define the scan schedule. Set to Once a week (Monday–Friday) or Once a day. Time range is 00:00–23:59. |
| Scanning range | Full: scan all data accessible to the authorized account. Partial data: scan tables in selected projects. You can scan specific tables in ODPS, E-MapReduce (EMR), and Hologres projects using table name patterns (0–100 characters, .* wildcards supported, separate multiple names with commas). |
After clicking Run, the task status changes to Running and the platform scans at the configured time.
One-time task

| Parameter | Description |
|---|---|
| Identify task scan and update strategy | Choose whether to rescan only changed rules and affected data, or rescan all data and overwrite all previous results. Optionally select Do not overwrite manually corrected results. |
| Account used for identification | The Alibaba Cloud account or RAM user for sampling and scanning. RAM users must have permissions on the MaxCompute project. |
| Content identification | Select to enable content detection and metadata detection. Without this, only field name and field comment rules run. |
| Sampling quantity | Number of rows to sample per column for content detection. Set to more than 100. Required when Content identification is selected. |
| Scanning range | Full: scan all data accessible to the authorized account. Partial data: scan tables in selected projects. Supports ODPS, EMR, and Hologres with table name patterns (wildcards and comma-separated values supported). |
After clicking Run, a progress bar appears. Progress is calculated as:
(tables scanned so far / total tables to scan) × 100%
When progress reaches 100%, the task completes and Task Status changes to No Task.
Manage detection rules
Copy a rule
Click the icon to copy a rule. The copy is named with the suffix -copy and has Draft status. Edit it as needed before publishing.
Edit a rule
Click the icon to modify a rule.
Editing a rule clears all detection results previously matched by that version of the rule. The basic information of rules based on built-in sensitive field types cannot be modified.
Delete a rule
Click the icon to delete a rule.
Deleting a rule has permanent consequences:
Detection results for this sensitive field type are removed. See View and correct sensitive data detection results.
Statistics for this field type are removed from the sensitive data distribution in the Sensitive data overview module. See Sensitive data overview.
If a Fraud Detection rule references this field type, the reference is removed. See Fraud Detection management.
Batch publish rules
To publish multiple rules at once:
Click Batch publishing on the Data Detection Rules page and select the rules to publish. Only rules in Draft status are selectable.
Click Publish. The rules' status changes to Published. To cancel, click Cancel — the rules revert to Draft.
Batch deactivate rules
After deactivation, the platform stops using the rule for detection, and records for this field type in Data Discovery and Manual Data Correction are deleted.
Before deactivating, check whether the rule is referenced by a Data Masking Rule or Risk Identification Rules. If so, set the Data Masking Rule to inactive and remove the reference from Risk Identification Rules first. See Create a data masking rule and Fraud Detection management.
To deactivate multiple rules at once:
Click Batch failure on the Data Detection Rules page and select the rules to deactivate. Only rules in Published status are selectable.
Click Batch Deactivate. The rules' status changes to Draft. To cancel, click Cancel — the rules revert to Published.
What's next
After a detection task runs, view the results:
Task execution records: Go to Sensitive Data Identification > Identify Tasks > Task execution record to view completed task records from the past week (running tasks are not included). Records include Start Time, End Time, Duration, Task Type, Owner, and Data Range.
Sensitive data overview: The Sensitive data overview module shows the distribution of detected sensitive fields across projects based on the most recent scan results.
Correct detection results: If results are inaccurate, view and manually correct them.