
DataWorks: Configure sensitive data detection rules and run tasks

Last Updated: Mar 25, 2026

Identify sensitive data in DataWorks using Data Security Guard, which lets you define detection rules based on sensitive field types and automatically scan your tenant's data for matches.

How it works

Detection is built around sensitive field types — named categories such as "phone number" or "employee ID" that belong to a data category and carry a sensitivity level. Each field type has one detection rule that specifies what to look for (field content, field name, or field comment) and how strictly to match.

After you publish a rule, you start a detection task. The task scans the data in scope and marks fields that match as sensitive. The Sensitive data overview module then shows the distribution of all detected sensitive fields across projects.


If detection results look wrong, you can view and manually correct them.

Prerequisites

Before you begin, make sure you have:

  • A DataWorks workspace with Data Security Guard enabled

  • An Alibaba Cloud account or RAM user with permissions on Data Security Guard

  • (If using content detection) DataWorks Professional Edition or higher

Go to the Data detection rules page

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.

  2. Click the icon in the upper-left corner, then choose All Products > Data Governance > Data Security Guard. On the page that appears, click Try Now.

    Note If your account already has Data Security Guard permissions, you are taken directly to the homepage. Otherwise, you are redirected to an authorization page and must complete the authorization before proceeding.

  3. In the left navigation pane, choose Rule Configuration > Sensitive Data Detection to open the Data Identification Rules page.

Step 1: Set up data categories and sensitivity levels

Every sensitive field type must belong to a data category and have a sensitivity level. If the built-in template already covers your needs, skip this step.

Data Security Guard includes a built-in classification and grading template with 4 sensitivity levels and 4 major categories. You can edit the template or create your own — up to 10 sensitivity levels in total. Categories support multiple layers of subcategories.

Configure sensitivity levels on the Rule Configuration > Data Category and Sensitivity Level page:

  • Click the icon next to the built-in template to edit its name, description, and number of sensitivity levels.

Configure categories on the Rule Configuration > Sensitive Data Identification page:

  • New users see the default categories from the built-in template on the left side of the Data Detection Rules page. Click the icon next to a category name to add a same-level category, add a subcategory, rename the category, or delete it.

  • Existing users can create up to 4 data categories on the left side of the page.

Note Category names must be unique, 1–30 characters long, and contain only letters and digits. Before deleting a category, deactivate any published detection rules it contains. For configuration details, see Configure sensitive data classification and grading.
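The naming constraints in the note above can be checked before you type a name into the console. The following is a minimal sketch, not part of DataWorks: the function name is_valid_category_name is hypothetical, and it assumes that "letters and digits" means ASCII letters and digits only, which the source does not state.

```python
import re

# Assumption: "letters and digits" is read as ASCII only; the console may accept more.
_CATEGORY_NAME = re.compile(r"[A-Za-z0-9]{1,30}")


def is_valid_category_name(name: str, existing: set) -> bool:
    """Return True if the name is 1-30 letters/digits and not already in use."""
    return _CATEGORY_NAME.fullmatch(name) is not None and name not in existing


existing_names = {"Finance", "Personal"}
print(is_valid_category_name("Finance2026", existing_names))  # new, well-formed name
print(is_valid_category_name("Finance", existing_names))      # duplicate name
```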

Step 2: Create a sensitive field type and configure a detection rule

Create a sensitive field type

  1. On the Data Detection Rules page, click + Sensitive field type in the upper-right corner.

  2. On the Basic Information tab, configure the following parameters:

    Parameter | Description
    Sensitive field type | A unique name for the field type, such as "name", "ID card number", or "phone number".
    Category | The data category this field type belongs to. If no suitable category exists, go to Data Category and Sensitivity Level to create one. See Configure sensitive data classification and grading.
    Sensitivity level | The sensitivity level for this field type. Higher numbers indicate higher sensitivity. If no suitable level exists, go to Data Category and Sensitivity Level to create one. See Configure sensitive data classification and grading.

  3. Click Next.

Configure the detection rule

On the Rule Configuration tab, define the detection logic for the sensitive field type.

Important

Modifying a rule after publishing it clears all detection results previously matched by that rule. Plan for a rescan before making changes to a published rule.

Hit rules

Select how multiple detection conditions combine:

Option | Behavior
Satisfy any rule | A field is flagged if either the data content detection or field name detection condition matches.
Meet all rules | A field is flagged only if both conditions match.

Note The Hit rules setting applies only to Data Content Identification and Field Name Identification conditions.

Detection conditions

Configure one or more of the following conditions:

Data Content Identification — detects sensitive data based on field values.

For example, if a field named name contains the value "Zhang San", this condition inspects "Zhang San". Four rule types are available:

Rule type | When to use | Edition required
Regular expression | Match field values against a pattern. For example, to detect employee IDs like F-12345678 or P-87654321, use the regex [A-Z]-\d{8}. Enter test data to verify accuracy before publishing. | Professional and higher
Built-in identification rule | Use a pre-built rule maintained by DataWorks. Enter test data to verify accuracy. | Enterprise only
Sample library | Match against a configured set of example values. Useful when your sensitive data does not follow a predictable pattern. Enter test data to verify accuracy. See Detection using a sample library. | Professional and higher
Custom sensitive data identification model | Use a custom ML model trained for your data. MaxCompute engine only. See Detection using a custom model. | Enterprise only

Note Content detection requires DataWorks Professional Edition or higher. Without it, content detection rules do not take effect; field name and field comment detection still work.
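Before entering test data in the console, you can try the employee ID pattern from the table locally. This sketch uses Python's re module and assumes the whole field value must match the pattern (fullmatch); the source does not say whether Data Security Guard matches the full value or any substring, so treat that choice as an assumption.

```python
import re

# The pattern from the example above: one uppercase letter, a hyphen, eight digits.
EMPLOYEE_ID = re.compile(r"[A-Z]-\d{8}")


def matches_employee_id(value: str) -> bool:
    # Assumption: the entire value must match, not just a substring of it.
    return EMPLOYEE_ID.fullmatch(value) is not None


samples = ["F-12345678", "P-87654321", "f-12345678", "F-1234567", "AB-12345678"]
results = {s: matches_employee_id(s) for s in samples}
for value, ok in results.items():
    print(f"{value!r}: {'match' if ok else 'no match'}")
```

Only the first two samples match. The others fail on letter case, digit count, or an extra leading letter.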

Field Name Identification — detects sensitive data based on field names.

For example, if a field is named name, this condition checks whether name matches your configured patterns, regardless of the field's value.

Specify field names in the format appropriate for your data source:

  • EMR, CDH, and MaxCompute: project.table.column

  • Hologres: instance_id.project.table.column

Use an asterisk (*) as a wildcard in any segment:

Pattern | Matches
a.b.* | All fields in table b of project a
ab*.c*.salary | All salary fields in tables starting with c in projects starting with ab
*cd.ef*.sa*ry | All fields starting with sa and ending with ry in tables starting with ef in projects ending with cd

The logical relationship between multiple entries is OR.
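To sanity-check a set of patterns against known field paths, the wildcard behavior in the table can be approximated locally. This is a sketch under two assumptions that the source does not confirm: an asterisk matches any run of characters within a single dot-separated segment, and matching is case-sensitive. The helper names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern":
    # Escape literal characters, then let each * match within one segment only.
    escaped = re.escape(pattern).replace(r"\*", "[^.]*")
    return re.compile(escaped)


def field_matches(field: str, patterns: list) -> bool:
    # Multiple entries combine with OR: one matching pattern is enough.
    return any(pattern_to_regex(p).fullmatch(field) for p in patterns)


print(field_matches("a.b.salary", ["a.b.*"]))               # table b of project a
print(field_matches("abc.cust.salary", ["ab*.c*.salary"]))  # prefix wildcards
print(field_matches("xcd.efg.salary", ["*cd.ef*.sa*ry"]))   # wildcard inside a segment
print(field_matches("a.c.salary", ["a.b.*"]))               # different table
```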

Field Description Identification — detects sensitive data based on field comments.

For example, configure a phone number field type with the comments "phone number" and "contact method". When the system finds a field whose comment contains "contact method", it flags that field as a phone number.

Enter field comments in the input box (0–100 characters each, all character types supported). Add up to 10 comments.

Field Exclusion — fields matching exclusion rules are never flagged by this detection rule, even if they match other conditions.

Specify fields to exclude using the same format as Field Name Identification. Wildcards are supported. The logical relationship between multiple entries is OR.
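The interaction between Hit rules and Field Exclusion can be summarized as a small decision function. This is only an illustration of the behavior described above, with hypothetical names. It assumes both a content condition and a field name condition are configured, and it leaves out Field Description Identification because the Hit rules setting does not apply to it.

```python
def is_flagged(content_match: bool, name_match: bool,
               excluded: bool, hit_rule: str = "any") -> bool:
    """Combine the two conditions governed by Hit rules, after applying exclusion."""
    if excluded:
        # Excluded fields are never flagged, whatever else matches.
        return False
    if hit_rule == "any":   # "Satisfy any rule"
        return content_match or name_match
    if hit_rule == "all":   # "Meet all rules"
        return content_match and name_match
    raise ValueError(f"unknown hit rule: {hit_rule!r}")


print(is_flagged(True, False, excluded=False, hit_rule="any"))  # one condition is enough
print(is_flagged(True, False, excluded=False, hit_rule="all"))  # both are required
print(is_flagged(True, True, excluded=True, hit_rule="all"))    # exclusion wins
```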

Hit ratio configuration

The hit ratio defines the minimum percentage of non-empty data in a column that must match the Data Content Identification condition for the rule to flag that column as sensitive. The default is 50%.

Formula: (number of data records in the column that hit the detection rule / total number of data records in the column) × 100%

Note The hit ratio applies only to Data Content Identification conditions.
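The hit ratio check can be sketched as follows. The function names are hypothetical. The sketch makes two assumptions: empty values are left out of the denominator, following the "non-empty data" wording in the description, and a column is flagged when the ratio is greater than or equal to the threshold. The employee ID pattern is reused from the regular expression example purely as a stand-in content rule.

```python
import re
from typing import Callable, Iterable, Optional

EMPLOYEE_ID = re.compile(r"[A-Z]-\d{8}")


def is_employee_id(value: str) -> bool:
    return EMPLOYEE_ID.fullmatch(value) is not None


def hit_ratio(values: Iterable[Optional[str]], is_hit: Callable[[str], bool]) -> float:
    """Percentage of non-empty values that hit the content rule."""
    non_empty = [v for v in values if v]  # drops None and empty strings (assumption)
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if is_hit(v))
    return 100.0 * hits / len(non_empty)


def column_is_sensitive(values, is_hit, threshold: float = 50.0) -> bool:
    # Assumption: "minimum percentage" means the threshold itself counts as a hit.
    return hit_ratio(values, is_hit) >= threshold


column = ["F-12345678", "P-87654321", "n/a", None, ""]
print(round(hit_ratio(column, is_employee_id), 2))
print(column_is_sensitive(column, is_employee_id))
```

In this sample, two of the three non-empty values match, so the column clears the default 50% threshold.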

Publish the rule

Click Publish to Use to activate the rule immediately.

Note To save the rule without activating it, click Save as Draft instead.

When a column's data matches detection rules for multiple sensitive field types, rules are applied in this priority order:

  1. If the number of matching conditions differs, detection priority follows: Field Name Identification > Data Content Identification > Field Description Identification.

  2. If the number and types of matching conditions are the same, the rule for the field type with the higher sensitivity level takes precedence.

Step 3: Start a sensitive data detection task

After publishing your rules, authorize and start a detection task. The task scans data in your tenant against the active rules.

Authorize the detection task

The first time you start a detection task, click Enable and Authorize in the upper-left corner of the Sensitive Data Identification page and follow the prompts.

Note After authorizing, click Authorization Records in the upper-right corner to review authorization details at any time.

Configure and start the task

Configure the task on the Sensitive Data Identification page. Three task types are available:

Real-time task

Parameter | Description
Account used for identification | The Alibaba Cloud account or RAM user whose permissions determine what data can be sampled and scanned. If using a RAM user, ensure it has permissions on the MaxCompute project.
Real-time recognition | Available for ODPS (MaxCompute) only. When ODPS metadata changes (for example, a table or field is added, or a field is modified), Data Security Guard automatically runs a detection task for the changed metadata. If the change introduces a new table or field with no data yet, only metadata-based detection runs.

After clicking Run, the task status changes to Running.

Scheduled task

Parameter | Description
Task execution | Start manually.
Follow-up recognition task scanning and update strategy | Choose how follow-up runs handle existing results: rescan only changed rules and affected data, or rescan all data and overwrite all results. Optionally select Do not overwrite manually corrected results to preserve manual corrections.
Account used for identification | The Alibaba Cloud account or RAM user for sampling and scanning. RAM users must have permissions on the MaxCompute project.
Content identification | Select to enable content detection and metadata detection. Without this, only field name and field comment rules run.
Sampling quantity | Number of rows to sample per column for content detection. Set to more than 100 for reliable results. Required when Content identification is selected.
Scanning frequency and Scan time | Define the scan schedule. Set to Once a week (Monday–Friday) or Once a day. Time range is 00:00–23:59.
Scanning range | Full: scan all data accessible to the authorized account. Partial data: scan tables in selected projects. You can scan specific tables in ODPS, E-MapReduce (EMR), and Hologres projects using table name patterns (0–100 characters, .* wildcards supported, separate multiple names with commas).

After clicking Run, the task status changes to Running and the platform scans at the configured time.

One-time task

Parameter | Description
Identify task scan and update strategy | Choose whether to rescan only changed rules and affected data, or rescan all data and overwrite all previous results. Optionally select Do not overwrite manually corrected results.
Account used for identification | The Alibaba Cloud account or RAM user for sampling and scanning. RAM users must have permissions on the MaxCompute project.
Content identification | Select to enable content detection and metadata detection. Without this, only field name and field comment rules run.
Sampling quantity | Number of rows to sample per column for content detection. Set to more than 100. Required when Content identification is selected.
Scanning range | Full: scan all data accessible to the authorized account. Partial data: scan tables in selected projects. Supports ODPS, EMR, and Hologres with table name patterns (wildcards and comma-separated values supported).

After clicking Run, a progress bar appears. Progress is calculated as:

(tables scanned so far / total tables to scan) × 100%

When progress reaches 100%, the task completes and Task Status changes to No Task.
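As a worked instance of the progress formula, the calculation can be written as a one-line function. The function name is hypothetical, and returning 0 when there are no tables to scan is an assumption made only to avoid division by zero.

```python
def scan_progress(tables_scanned: int, total_tables: int) -> float:
    """Progress in percent: (tables scanned so far / total tables to scan) x 100."""
    if total_tables <= 0:
        return 0.0  # assumption: nothing to scan is reported as 0%
    return tables_scanned / total_tables * 100


print(scan_progress(25, 200))   # 25 of 200 tables scanned
print(scan_progress(200, 200))  # all tables scanned
```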

Note Rule changes take effect in the next scheduled task, not immediately. To apply a rule change right away, create a one-time task manually.

Manage detection rules

Copy a rule

Click the copy icon to copy a rule. The copy is named with the suffix -copy and has Draft status. Edit it as needed before publishing.

Edit a rule

Click the edit icon to modify a rule.

Important

Editing a rule clears all detection results previously matched by that version of the rule. The basic information of rules based on built-in sensitive field types cannot be modified.

Delete a rule

Click the delete icon to delete a rule.

Important

Deleting a rule has permanent consequences:

Batch publish rules

To publish multiple rules at once:

  1. Click Batch publishing on the Data Detection Rules page and select the rules to publish. Only rules in Draft status are selectable.

  2. Click Publish. The rules' status changes to Published. To cancel, click Cancel — the rules revert to Draft.

Batch deactivate rules

After deactivation, the platform stops using the rule for detection, and records for this field type in Data Discovery and Manual Data Correction are deleted.

Before deactivating, check whether the rule is referenced by a Data Masking Rule or Risk Identification Rules. If so, set the Data Masking Rule to inactive and remove the reference from Risk Identification Rules first. See Create a data masking rule and Fraud Detection management.

To deactivate multiple rules at once:

  1. Click Batch failure on the Data Detection Rules page and select the rules to deactivate. Only rules in Published status are selectable.

  2. Click Batch Deactivate. The rules' status changes to Draft. To cancel, click Cancel — the rules revert to Published.

What's next

After a detection task runs, view the results:

  • Task execution records: Go to Sensitive Data Identification > Identify Tasks > Task execution record to view completed task records from the past week (running tasks are not included). Records include Start Time, End Time, Duration, Task Type, Owner, and Data Range.

  • Sensitive data overview: The Sensitive data overview module shows the distribution of detected sensitive fields across projects based on the most recent scan results.

  • Correct detection results: If results are inaccurate, view and manually correct them.