Data Security Center: Scan sensitive data using identification tasks

Last Updated: Mar 31, 2026

Data Security Center (DSC) scans your connected data assets to detect sensitive information, then classifies and grades each finding by sensitivity level and type. This gives you the visibility needed to apply precise access controls and strengthen your data security posture.

Prerequisites

Before you begin, ensure that you have:

  • Authorized DSC to access the data assets you want to scan. For more information, see Asset authorization.

  • At least one enabled identification template, if you plan to create a custom identification task. For more information, see Use identification templates.

Task types

DSC supports two types of identification tasks. Choose the type that matches your scanning needs:

| Task type | When to use |
| --- | --- |
| Default task | Automatically created when you authorize an asset. Uses the main identification template. Suitable for routine, ongoing scans of all authorized assets. |
| Custom identification task | Create when you need to scan specific assets with a non-default template, apply multiple templates, or scan historical Simple Log Service (SLS) data. |

Default tasks

When you authorize an asset, DSC automatically creates a default task for that asset using the main identification template.

Identification templates: Default tasks always use the main identification template configured for DSC. This cannot be changed per task.

  • Main identification template: The global template configured in DSC. It can be a built-in industry template (such as Internet Industry Classification or Vehicle Internet Classification) or a custom template.

  • Common identification template: When the main template is a built-in industry template, DSC also applies the common identification template, which is based on the Personal Information Security Specification (GB/T 35273-2020) standard.

Scan trigger and schedule:

  • One-click connection (databases, Object Storage Service (OSS), SLS):

    • If you selected Scan assets and identify sensitive data now during connection, the default task runs immediately.

    • If you did not select that option, trigger the scan manually: go to Classification and Grading > Tasks > Identification Tasks, click Default Tasks, and then click Rescan.

  • Account/password connection (databases): The system creates a default task at connection time. Periodic scans then run automatically every day starting the next day, typically during early morning hours.

The minimum interval between two scans is 24 hours.

Scan scope:

  • Databases and OSS: Full scan on first run; subsequent runs scan only new or modified data.

  • SLS: Each scan covers data from 00:00 to 24:00 on the previous day relative to scan execution time. To scan specific historical SLS data, create a custom identification task instead.
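The previous-day window used by default SLS scans can be sketched as follows. This is a minimal illustration, not DSC code: the helper name previous_day_window is hypothetical, and it assumes the run time is a naive timestamp in the time zone that DSC uses for scheduling.

```python
from datetime import datetime, timedelta


def previous_day_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the half-open window [start, end) for the day before run_time.

    Hypothetical helper: 00:00 to 24:00 of the previous day is modeled as
    [previous midnight, current midnight).
    """
    today_midnight = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=1), today_midnight


# A scan that starts at 02:30 on Mar 31 covers all of Mar 30.
start, end = previous_day_window(datetime(2026, 3, 31, 2, 30))
print(start, end)  # 2026-03-30 00:00:00 2026-03-31 00:00:00
```

Data older than this window is never picked up by a default task, which is why historical SLS data requires a custom identification task with an explicit time range.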

Changes to the main identification template do not trigger an immediate scan. New rules apply only at the next scheduled run.

Custom identification tasks

Create a custom identification task to:

  • Scan specific assets with one or more enabled identification templates (instead of the main template).

  • Scan historical SLS data by specifying a custom time range.

Important

The system supports a maximum of 5 active periodic identification tasks. Once you reach this limit, you cannot create additional tasks with a periodic schedule.

If a template is currently disabled, enable it before selecting it for a custom task. For more information, see Enable an identification template.

Scan behavior and limits

Scan logic

| Task type | First scan | Subsequent scans |
| --- | --- | --- |
| Default task | Full scan of all authorized data in the asset | Scans only new or modified data objects; triggered manually or on a configured schedule |
| Custom identification task | Scans data within the specified identification scope | Scans only new or modified data objects within the specified scope |

Data objects that have not changed since the last scan are skipped.
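The incremental behavior described above can be sketched as a simple selection step. How DSC actually detects changes is not documented here, so the comparison of last-modified and last-scanned timestamps, and the function name objects_to_scan, are assumptions made for illustration only.

```python
from datetime import datetime


def objects_to_scan(
    last_modified: dict[str, datetime],
    last_scanned: dict[str, datetime],
) -> list[str]:
    """Return data objects that are new or changed since their last scan.

    Assumption: an object is "modified" when its last-modified time is later
    than the time it was last scanned. Unchanged objects are skipped.
    """
    selected = []
    for name, modified_at in last_modified.items():
        scanned_at = last_scanned.get(name)
        if scanned_at is None or modified_at > scanned_at:
            selected.append(name)
    return selected


tables = {
    "orders": datetime(2026, 3, 30, 9, 0),    # changed after the last scan
    "users": datetime(2026, 3, 28, 9, 0),     # unchanged since the last scan
    "payments": datetime(2026, 3, 31, 1, 0),  # never scanned before
}
scanned = {
    "orders": datetime(2026, 3, 30, 2, 0),
    "users": datetime(2026, 3, 30, 2, 0),
}
print(objects_to_scan(tables, scanned))  # ['orders', 'payments']
```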

Sensitivity levels

DSC classifies sensitive data on a scale from S1 to S10, where a higher number indicates greater sensitivity. If a data object matches multiple identification rules, the highest matching sensitivity level takes precedence. A result of N/A means no sensitive data was detected.

The valid range of sensitivity levels depends on the associated identification template. For more information, see Set the sensitivity level for an identification template.
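The precedence rule for multiple matches can be sketched as follows. The function name resolve_level and the representation of levels as strings such as "S3" are assumptions for illustration; DSC applies this rule internally.

```python
def resolve_level(matched_levels: list[str]) -> str:
    """Return the highest matching sensitivity level, or "N/A" if none matched.

    Levels are compared by their numeric part, because a plain string
    comparison would rank "S10" below "S2".
    """
    if not matched_levels:
        return "N/A"
    return max(matched_levels, key=lambda level: int(level[1:]))


print(resolve_level(["S2", "S10", "S3"]))  # S10
print(resolve_level([]))                   # N/A
```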

Scanned object units

| Asset type | Scanned object unit |
| --- | --- |
| Databases (RDS, PolarDB) | <Instance>/<Database>/<Table>; each table is one data object |
| Big data (Tablestore, MaxCompute) | <Instance>/<Table>; each table is one data object |
| OSS | <Bucket>/<File>; each file is one data object |
| SLS | <Project>/<Logstore>/<Time Segment>; data is split into 5-minute segments, and each segment is one data object |
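To get a feel for how many SLS data objects a time range produces, the 5-minute segmentation can be sketched as follows. The helper split_into_segments is hypothetical, and aligning segments to the start of the range is an assumption; the exact boundaries that DSC uses are not documented here.

```python
from datetime import datetime, timedelta

SEGMENT = timedelta(minutes=5)  # each segment is one data object


def split_into_segments(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive 5-minute segments.

    Assumption: segments are aligned to `start`, and a trailing partial
    segment is kept and shortened to end at `end`.
    """
    segments = []
    cursor = start
    while cursor < end:
        segments.append((cursor, min(cursor + SEGMENT, end)))
        cursor += SEGMENT
    return segments


day = split_into_segments(datetime(2026, 3, 30), datetime(2026, 3, 31))
print(len(day))  # 288
```

Under these assumptions, one Logstore yields 288 data objects per full day (24 hours × 12 segments per hour).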

Sampling limits

DSC samples data to balance detection coverage with system performance.

Structured and big data (RDS, PolarDB, Tablestore, MaxCompute):

  • By default, the first 200 rows of each table are sampled. You can increase this to a maximum of 1,000 rows per table.

  • Within the sampled rows, only the first 10 KB of data per field is scanned.
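The combined effect of the row limit and the per-field limit can be sketched as follows. This is an illustration of the documented limits, not DSC's implementation: the function name sample_table is hypothetical, and treating 10 KB as 10,240 bytes of UTF-8 text is an assumption.

```python
DEFAULT_ROW_LIMIT = 200
MAX_ROW_LIMIT = 1000
FIELD_BYTE_LIMIT = 10 * 1024  # assumption: 10 KB = 10,240 bytes


def sample_table(rows: list[list[object]], row_limit: int = DEFAULT_ROW_LIMIT) -> list[list[bytes]]:
    """Keep the first `row_limit` rows and the first 10 KB of each field."""
    if not 1 <= row_limit <= MAX_ROW_LIMIT:
        raise ValueError(f"row_limit must be between 1 and {MAX_ROW_LIMIT}")
    sampled = []
    for row in rows[:row_limit]:
        # Truncation is byte-based, so a multi-byte character may be cut.
        sampled.append([str(value).encode("utf-8")[:FIELD_BYTE_LIMIT] for value in row])
    return sampled


rows = [["a" * 20000, 42] for _ in range(300)]
sample = sample_table(rows)
print(len(sample), len(sample[0][0]))  # 200 10240
```

Sensitive values that appear only after row 200, or beyond the first 10 KB of a field, fall outside the sample, which is the main reason to raise the row limit for tables with unevenly distributed data.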

Unstructured data (OSS):

  • Files larger than 200 MB are skipped by default. You can raise this limit to a maximum of 1,000 MB per file.

  • For compressed or archived files, only the first 1,000 child files are scanned.

  • A single scan task scans at most 4 objects concurrently per bucket.

  • QPS limit: 100 API requests per second per scan task against an OSS bucket.

  • Bandwidth limit: 200 MB/s internal outbound bandwidth per scan task.

  • DSC supports over 800 OSS file types, including text, office documents, images, design files, code, binaries, archives, applications, audio, video, and chemical structure files. For the full list, see OSS file types that can be identified.

Unstructured data (SLS):

  • Files larger than 200 MB are skipped.

For a comprehensive reference on all limits, see Limits.

Scan speed

The following estimates are for reference only. Actual speed varies with system load and data complexity.

| Data type | Estimated scan speed |
| --- | --- |
| Structured data (RDS, PolarDB) and big data (Tablestore, MaxCompute) | About 1,000 columns per minute for databases with 1,000+ tables (200-row sample) |
| Unstructured data (OSS, SLS) | 1 TB takes 6–48 hours, averaging 24 hours, depending on file type distribution |
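For rough planning, the unstructured-data figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes that scan time scales linearly with data volume, which the source does not state; the function name estimate_hours is hypothetical.

```python
# Reference figures from the table above, in hours per TB of unstructured data.
HOURS_PER_TB = {"best": 6, "average": 24, "worst": 48}


def estimate_hours(size_tb: float) -> dict[str, float]:
    """Estimate scan duration in hours, assuming linear scaling with size."""
    if size_tb < 0:
        raise ValueError("size_tb must not be negative")
    return {case: size_tb * hours for case, hours in HOURS_PER_TB.items()}


print(estimate_hours(2.5))  # {'best': 15.0, 'average': 60.0, 'worst': 120.0}
```

Treat the result as an order-of-magnitude guide only, because actual speed varies with system load and data complexity.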

Best practices

| Recommendation | Details |
| --- | --- |
| Prioritize high-risk assets | If scanning all data at once is not feasible, start with assets that are frequently accessed, frequently modified, or subject to unknown operations. |
| Run a targeted pilot scan first | Limit your initial scan to a specific database or OSS bucket to validate and tune your identification rules before a full rollout. Avoid enabling all identification rules indiscriminately: generic rules (such as Date, Time, and URL) can generate excessive false positives on large datasets. Enable only the rules relevant to your business context. For structured data, make sure the sample size is large enough to capture representative data. |
| Align scan schedules with data update frequency | Configure tasks to run daily, weekly, or monthly based on how often your data changes. Regular scanning ensures timely detection of new sensitive data. Schedule scans during off-peak hours to minimize performance impact. |

Manage default identification tasks

Default tasks provide ongoing visibility into sensitive data across all authorized assets. You can view, configure, pause, terminate, and reactivate them, but you cannot delete them.

View default tasks

  1. Log on to the Data Security Center console.

  2. In the left navigation pane, go to Classification and Grading > Tasks.

  3. On the Tasks page, click the Identification Tasks tab, and then click Default Tasks.

  4. On the Discovery Task Monitoring page, view the list of default tasks.

From this page, you can perform the following operations:

| Operation | Description |
| --- | --- |
| Rescan | Triggers an immediate full scan to update results. Use this after updating the main identification template, upgrading the identification model, or when significant data changes occur. |
| Pause | Temporarily halts a running default task, for example, if you detect database performance issues. |
| Terminate | Stops the current task execution and prevents the default task from running in future cycles. |
| Enable | Reactivates a terminated task. |

Default tasks cannot be deleted.

Configure scan settings

Customize the schedule for a default task. Align the scan cycle with your data update frequency (minimum: daily).

  1. On the Discovery Task Monitoring page, select the check box of the task you want to configure, and then click Scan Settings above the task list.

  2. In the Scan Settings dialog box, configure the scan cycle and automatic scan start time, and then click OK.

Important
  • Set the start time to off-peak hours to minimize database impact.

  • Monitor CPU and memory usage during scans. If abnormalities occur, pause or terminate the task immediately.

Create custom identification tasks

Custom identification tasks let you scan specific assets on a non-default schedule or with a specialized template.

Important

The system supports a maximum of 5 active periodic identification tasks. Once you reach this limit, you cannot create additional periodic tasks.

Create a custom identification task

Before you start, make sure the identification templates you want to use are enabled. For more information, see Use identification templates.

  1. In the left navigation pane, go to Classification and Grading > Tasks.

  2. On the Identification Tasks tab, select the Asset Type for which you want to create a task, and then click Create.

  3. In the Create panel, configure the parameters described below, and then click OK.

Basic information:

| Parameter | Description |
| --- | --- |
| Asset Type | Displays the asset type selected in the previous step. Cannot be modified. |
| Task Name | Enter a name for the task. |
| Task notes | (Optional) Enter notes for the task. |

Task and plan:

Select a task start time:

  • Immediate Scan: Runs the task immediately upon creation.

  • Periodic Scan: Runs the task on a scheduled frequency. Configure Scan Frequency and Scan Time (Structured Data Only). To also trigger an immediate run alongside the schedule, select Scan Once Now.

The Scan Time setting applies only to structured data assets. Unstructured data scans run according to system resource availability.

Identification Template: Select up to two enabled identification templates for this task. For details on enabling templates, see Use identification templates.

Identification scope:

For structured data (RDS, PolarDB):

| Parameter | Description |
| --- | --- |
| Identification Scope of Structured Data | Global Scan: Scans all authorized structured assets. Specify Scan Scope: Select specific instances and databases. Click Add Identification Scope to add multiple instances. |
| Scan Limit | Number of rows sampled per table. Default: 200 rows. Maximum: 1,000 rows. |

For unstructured data (OSS):

| Parameter | Description |
| --- | --- |
| Object | Global Scan: Scans all authorized OSS buckets. Specify Scan Scope: Select specific buckets, with optional filters (Prefix, Directory, Suffix) to include or exclude specific files. |
| Sampling Method | Retrieves data using the ListObjects API. Global Scan: Scans all data. Custom Depth: Scans based on a sampling ratio (Sampling Rate). For example, a rate of 1/10 scans the 1st file, skips 9, and scans the 11th. |
| Scan Depth | Global Scan: Scans the full directory path. Specify Scan Scope: Limits depth to levels 1–10. For example, entering "5" scans only the top 5 directory levels. |
| Scan Limit | Maximum file size to scan. Default: 200 MB. Maximum: 1,000 MB. Data beyond the limit is skipped. |
| Synchronize All Identification Results to SLS | Select to send full scan logs to Simple Log Service. |
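The interaction between Sampling Rate and Scan Limit can be sketched as follows. This is an illustration only: the function name select_files is hypothetical, applying the sampling step before the size check is an assumption, and 1 MB is taken to be 1,048,576 bytes.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def select_files(files: list[tuple[str, int]], one_in: int = 1, limit_mb: int = 200) -> list[str]:
    """Pick every `one_in`-th file in listing order, then drop oversized files.

    `files` holds (key, size_in_bytes) pairs in the order a listing returns
    them. With one_in=10, the 1st, 11th, 21st, ... files are kept.
    """
    if one_in < 1:
        raise ValueError("one_in must be at least 1")
    if not 1 <= limit_mb <= 1000:
        raise ValueError("limit_mb must be between 1 and 1000")
    sampled = files[::one_in]
    return [key for key, size in sampled if size <= limit_mb * MB]


listing = [(f"logs/file-{i:02d}.txt", 10 * MB) for i in range(1, 26)]
print(select_files(listing, one_in=10))
# ['logs/file-01.txt', 'logs/file-11.txt', 'logs/file-21.txt']
```

A lower sampling rate shortens the scan, but any sensitive file that falls in the skipped positions goes undetected, so a full scan is the safer choice for buckets with mixed content.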

For unstructured data (SLS):

| Parameter | Description |
| --- | --- |
| Asset Scope | Global Scan: Scans all authorized SLS Projects. Specify Scan Scope: Select specific Projects and Logstores. |
| Time Range | Last 15 Minutes, Last 1 hour, Yesterday, Last 1 Day, Last 7 Days, or Last 30 Days. Custom: Specify a custom range in minutes (step size: 5 minutes). |

Other settings:

| Parameter | Description |
| --- | --- |
| Tagging Result Overwriting | Skip Manual Tagging Result: Preserves your manual corrections. Overwrite Manual Tagging Result: Replaces manual corrections with the new scan results. |
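Before you fill in the Create panel, it can help to check a planned task against the limits described in this topic. The following sketch does that offline. The CustomTaskPlan class and validate_plan function are hypothetical and do not correspond to any DSC API, and the minimum of one sampled row is an assumption.

```python
from dataclasses import dataclass

MAX_TEMPLATES = 2        # up to two enabled templates per custom task
MAX_ROW_SAMPLE = 1000    # maximum rows sampled per table
MAX_PERIODIC_TASKS = 5   # maximum active periodic identification tasks


@dataclass
class CustomTaskPlan:
    """Hypothetical description of a custom identification task."""
    name: str
    templates: list[str]
    periodic: bool = False
    row_sample: int = 200


def validate_plan(plan: CustomTaskPlan, active_periodic_tasks: int) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not plan.name.strip():
        problems.append("Task Name is required.")
    if not 1 <= len(plan.templates) <= MAX_TEMPLATES:
        problems.append(f"Select 1 to {MAX_TEMPLATES} enabled identification templates.")
    if not 1 <= plan.row_sample <= MAX_ROW_SAMPLE:
        problems.append(f"Scan Limit must be between 1 and {MAX_ROW_SAMPLE} rows.")
    if plan.periodic and active_periodic_tasks >= MAX_PERIODIC_TASKS:
        problems.append(f"The limit of {MAX_PERIODIC_TASKS} active periodic tasks is reached.")
    return problems


plan = CustomTaskPlan(name="pilot-scan", templates=["Template A"], periodic=True)
print(validate_plan(plan, active_periodic_tasks=3))  # []
```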

Modify or delete a custom identification task

  • Edit: Reconfigures the custom identification task. All parameters can be modified.

  • Delete (in the more actions menu): Removes a custom identification task that you no longer need.

Manage task operations

Rescan a task

Trigger a rescan when the identification model is upgraded or significant data changes have occurred. A rescan triggers an immediate full scan of the specified asset.

Rescan is not supported for custom tasks with Scan Type set to Immediate Scan. Before rescanning, make sure the relevant identification templates are enabled. Run this operation during off-peak hours to minimize performance impact.

  1. On the Identification Tasks tab, trigger the rescan:

    • Custom identification task: In the task list, click Rescan in the Actions column.

    • Default task: Click Default Tasks, find the target asset, and click Rescan in the Actions column.

  2. Track progress in the Scan Status column.

Pause or terminate a task

  • Pause: Temporarily halts a running task. Useful during service anomalies. Click Pause in the Actions column.

  • Terminate: Stops the current task and prevents all future runs of that task (applies to both custom identification tasks and default tasks).

Correct identification results

If DSC incorrectly identifies data (false positive) or misses sensitive data (false negative), you can manually correct the results. These corrections help the system improve its accuracy over time.

  1. On the Tasks page, click the Revision Tasks tab.

  2. In the left navigation pane, click the asset type you want to manage.

  3. Click Revision or Resume in the Actions column for the target sensitive data. Follow the on-screen instructions to modify the Revised Model, and then click OK.

After you restore a revision by clicking Resume, the previous identification model is reinstated.

View and export results

The latest scan results from the main identification template are available on the Data Classification > Asset Insight page. For more information, see View sensitive data identification results.

To download results, create an export task specifying the target identification template and data assets.

Important

Export results are available only for assets and templates that have completed a successful identification task.

Create an export task

  1. On the Tasks page, click the Export Tasks tab.

  2. Click Create.

  3. Configure the export task:

    1. In the Basic Information section, enter a task name and select an identification template used by the identification task. Only enabled identification templates can be selected.

    2. In the Export Dimension section, select Asset Type or Asset Instance:

      • Asset Type: Select the asset types to export.

      • Asset Instance: Select the specific asset instances to export.

  4. Click OK.

After you create the export task, its status appears in the export task list. Larger datasets take longer to export.

Download exported results

  1. Wait for the Export Status to change to Finished.

  2. Click Download in the Actions column of the target export task.

Important

Download the exported data within three days of completion. After three days, the task expires and the exported data is no longer available.
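If you track export tasks in your own tooling, the download deadline can be computed as follows. This is a minimal sketch: the helper names are hypothetical, and interpreting "three days" as exactly 72 hours after the task finishes is an assumption.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=3)  # assumption: exactly 72 hours after completion


def download_deadline(finished_at: datetime) -> datetime:
    """Return the last moment at which the exported data is still available."""
    return finished_at + RETENTION


def can_download(finished_at: datetime, now: datetime) -> bool:
    """Return True while the export is finished and has not yet expired."""
    return finished_at <= now <= download_deadline(finished_at)


finished = datetime(2026, 3, 31, 10, 0)
print(download_deadline(finished))                        # 2026-04-03 10:00:00
print(can_download(finished, datetime(2026, 4, 4, 9, 0)))  # False
```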

What's next