FAQ and solutions for sensitive data scanning and detection - Data Security Center

This page answers common questions about how Data Security Center (DSC) scans and identifies sensitive data in your authorized assets.

Scan behavior

Does data scanning affect my database performance?

Not noticeably. Full scans have minimal impact on database performance and do not disrupt your workloads. Incremental scans process only modified data files, so their impact is negligible.

To reduce the impact further, extend the full scan cycle or schedule scans during off-peak hours.

What triggers a full scan vs. an incremental scan?

DSC runs a full scan when you manually start one after completing asset authorization, or when a scheduled full scan runs. After the initial scan, DSC runs incremental scans automatically when data changes:

Databases and MaxCompute: DSC re-scans a table when columns are added or removed. Adding or removing rows does not trigger a re-scan.
OSS: DSC re-scans an object within 24 hours after the object is added or modified. Removing objects from a bucket does not trigger a re-scan.
Identification rules: If you create, delete, enable, or disable an identification rule, DSC re-scans all data in all authorized assets.
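
The trigger rules above can be summarized as a small lookup. This is an illustrative sketch only: the `Change` enum, the `RESCAN_SCOPE` table, and the `rescan_scope` function are hypothetical names made up for this example and are not part of any DSC API.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical change types, named after the rules listed above."""
    COLUMN_ADDED = "column added"
    COLUMN_REMOVED = "column removed"
    ROW_ADDED = "row added"
    ROW_REMOVED = "row removed"
    OBJECT_ADDED = "object added"
    OBJECT_MODIFIED = "object modified"
    OBJECT_REMOVED = "object removed"
    RULE_CHANGED = "identification rule created, deleted, enabled, or disabled"


# Changes that trigger an automatic re-scan, mapped to what gets re-scanned.
# Changes that are absent from this table do not trigger a re-scan.
RESCAN_SCOPE = {
    Change.COLUMN_ADDED: "table",
    Change.COLUMN_REMOVED: "table",
    Change.OBJECT_ADDED: "object",
    Change.OBJECT_MODIFIED: "object",
    Change.RULE_CHANGED: "all authorized assets",
}


def rescan_scope(change):
    """Return the re-scan scope for a change, or None if nothing is re-scanned."""
    return RESCAN_SCOPE.get(change)


for change in Change:
    print(f"{change.value}: {rescan_scope(change) or 'no automatic re-scan'}")
```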

How long does a scan take to start after I authorize a data asset?

DSC starts scanning within 2 hours of authorization. Total scan time depends on data volume. Scans take longer when a database has more than 10,000 tables or an OSS bucket contains more than 1 petabyte of objects.

Track scan progress on the Overview page in the DSC console. For details, see View information on the Overview page.

For more information, see Data Security Center overview.

Does DSC save any data from my assets?

No. DSC logs in to authorized assets and performs data sampling to identify sensitive data, but does not store any data from those assets.

Why does the console show only part of the data scanned after a scheduled full scan?

This is expected behavior. For custom periodic identification tasks, the first run is a full scan. Subsequent runs scan only incremental data within the configured scope.

To trigger a fresh full scan after updating identification rules, reconfigure the custom scan task.

Supported asset types and scan scope

What types of data assets can DSC scan?

DSC scans structured data, unstructured data, and big data assets. For a complete list, see Supported data asset types.

Category	Supported services
Structured data	ApsaraDB RDS, PolarDB, PolarDB for Xscale (PolarDB-X), PolarDB-X 2.0, ApsaraDB for Redis, ApsaraDB for MongoDB, ApsaraDB for OceanBase, self-managed databases
Unstructured data	Object Storage Service (OSS), Simple Log Service (SLS)
Big data	Tablestore, MaxCompute, AnalyticDB for MySQL, AnalyticDB for PostgreSQL

How does DSC scan OSS?

Scan type	Scope	Data object unit
First scan	All objects in authorized buckets	`<OSS bucket>/<Object name>`
Incremental scan	Objects added or modified since the last scan	`<OSS bucket>/<Object name>`

DSC re-scans a modified object within 24 hours of the change. Unmodified objects are not re-scanned. To manually trigger a re-scan, see Identification tasks.

How does DSC scan SLS?

Each scan covers data stored between 00:00 and 24:00 on the previous day. The scan object unit is `<Simple Log Service project>/<Logstore>/<Time interval>`, where each 5-minute period counts as one time interval.
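
To make the interval arithmetic concrete, the following sketch splits the previous day into 5-minute periods, which gives 288 time intervals per Logstore per daily scan. The function names, the project and Logstore names, and the `HH:MM-HH:MM` rendering of the time interval are assumptions for illustration; the exact format DSC displays may differ.

```python
from datetime import date, datetime, timedelta

INTERVAL = timedelta(minutes=5)  # one scan time interval, as described above


def previous_day_intervals(today):
    """Return (start, end) pairs covering 00:00 to 24:00 of the previous day."""
    start = datetime.combine(today - timedelta(days=1), datetime.min.time())
    end = start + timedelta(days=1)
    intervals = []
    cursor = start
    while cursor < end:
        intervals.append((cursor, cursor + INTERVAL))
        cursor += INTERVAL
    return intervals


def scan_object_unit(project, logstore, interval):
    """Build a <project>/<Logstore>/<time interval> label (format is assumed)."""
    begin, finish = interval
    return f"{project}/{logstore}/{begin:%H:%M}-{finish:%H:%M}"


intervals = previous_day_intervals(date(2024, 5, 2))
print(len(intervals))
print(scan_object_unit("my-project", "my-logstore", intervals[0]))
```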

To scan a broader time range, create a custom identification task and specify the scope. See Create a custom identification task.

How does DSC scan structured data such as a MaxCompute project?

DSC scans both field names and field values. If field values alone are not enough to determine sensitivity, DSC also checks the field name. For example, DSC scans the name and values of the age field together.

First scan: Scans all tables after authorization.
Incremental scan: Scans added tables and re-scans tables whose schema has changed.
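
The incremental rule above amounts to comparing two schema snapshots. The sketch below is a simplified model, not DSC's implementation: it assumes a schema is just a set of column names per table, and the function name `tables_to_scan` and the sample table names are hypothetical.

```python
def tables_to_scan(previous, current):
    """Return tables that are new or whose column set changed since the last scan.

    Both arguments map a table name to an iterable of column names.
    """
    selected = []
    for table, columns in current.items():
        if table not in previous:
            selected.append(table)  # table added since the last scan
        elif set(columns) != set(previous[table]):
            selected.append(table)  # columns added or removed
    return sorted(selected)


last_scan = {
    "users": ["id", "name", "age"],
    "orders": ["id", "user_id", "total"],
}
now = {
    "users": ["id", "name", "age", "phone"],  # column added
    "orders": ["id", "user_id", "total"],     # unchanged, rows may differ
    "payments": ["id", "card_number"],        # new table
}
print(tables_to_scan(last_scan, now))
```

Row-level changes never appear in this comparison, which matches the rule that adding or removing rows does not trigger a re-scan.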

Can DSC identify sensitive data in compressed packages and text files in OSS?

Yes. To see all supported file types, go to the File Type tab on the Identification Configuration page.

Limitations

Does ApsaraDB for Redis support sensitive data identification?

No. DSC provides only the security baseline check feature for ApsaraDB for Redis. To check the security configuration of your Redis instance, see Security baseline check.

Can the scan results for ApsaraDB for MongoDB be accurate to specific fields?

No. ApsaraDB for MongoDB uses distributed file storage, and the minimum storage unit is a document. Scan results are scoped to the document level, not the field level.

Do MongoDB collection scans affect production services?

Collection scans have a minor impact on database performance and do not disrupt application operations under normal circumstances. To reduce the impact, increase the scan interval or schedule scans during off-peak hours.

Can DSC identify encrypted data assets?

Yes. DSC can identify sensitive data in a data asset that has transparent data encryption (TDE) enabled.

Billing

What are the billing rules for scanning OSS and SLS?

DSC uses a subscription billing model. Scans consume the purchased storage protection capacity. Deduction rules differ by edition:

Enterprise Edition: Storage protection capacity is deducted based on the sizes of authorized OSS buckets and 50% of the sizes of authorized SLS projects.
Value-added Plan: Data scanning and identification are not supported.
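
As a worked example of the Enterprise Edition rule, the sketch below adds the full size of each authorized OSS bucket to half the size of each authorized SLS project. The function name and the sample sizes are made up for illustration, and the unit (GB here) is an assumption; use whatever unit your purchased capacity is expressed in.

```python
SLS_DEDUCTION_RATIO = 0.5  # SLS projects count at 50% of their size


def enterprise_capacity_deducted(oss_bucket_sizes, sls_project_sizes):
    """Estimate storage protection capacity deducted under Enterprise Edition.

    All sizes must use the same unit; the result is in that unit.
    """
    return sum(oss_bucket_sizes) + SLS_DEDUCTION_RATIO * sum(sls_project_sizes)


# Two authorized buckets (100 GB and 50 GB) and one SLS project (40 GB):
# 100 + 50 + 0.5 * 40 = 170 GB of capacity deducted.
print(enterprise_capacity_deducted([100, 50], [40]))
```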

For full pricing details, see DSC console.

When is scanning billed?

Scenario	What gets scanned	Billing
First authorization	All data in the asset	Charged for a full scan
Columns added or removed in a MaxCompute or database table	The affected table	Charged for the full scan in the data asset
Rows added or removed	No automatic re-scan	Not charged
Objects added or modified in an OSS bucket	The added or modified objects	Charged for the added or modified objects
Objects removed from an OSS bucket	No automatic re-scan	Not charged
Identification rules changed (created, deleted, enabled, or disabled)	All data in all authorized assets	Charged for a full scan across all assets

Why is my identification task stuck in the waiting state?

Free Edition has quotas of 5 GB of stored data and 100 database tables. When these quotas are exhausted, new identification tasks queue but cannot run. To continue using sensitive data identification, purchase DSC. See Purchase DSC.

Identification templates and tasks

Why can't I select the common identification template?

The common identification template does not need to be selected manually. When you create an identification task using a built-in template, DSC uses the common identification template by default. See View and configure identification templates and Use an identification task to scan sensitive data.

Can I export sensitive data identification results?

Yes. See Identify sensitive data by using identification tasks.

Can I query sensitive data through an API?

Yes. The following API operations return instance names, database names, table names, column names, and risk levels.

Operation	Description
DescribeOssObjects	Queries OSS objects. Returns the instance ID (`InstanceId`), bucket name (`BucketName`), object ID (`FileId`), and risk level ID (`RiskLevelId`).
DescribeInstances	Queries data asset instances. Returns the asset ID (`Id`) and name (`Name`).
DescribeTables	Queries tables. Returns table information (`Items`), including the table name (`Name`) and risk level (`RiskLevelId`).
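
Once you have a `DescribeTables` response, a common next step is to group tables by risk level. The sketch below does not call the API; it works on a hand-written sample and assumes that `Items` is a list of objects that each carry the `Name` and `RiskLevelId` fields described above. The table names and risk level values are invented, and `tables_by_risk_level` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hand-written sample in the assumed response shape; not real API output.
sample_response = json.loads("""
{
  "Items": [
    {"Name": "customers", "RiskLevelId": 4},
    {"Name": "orders", "RiskLevelId": 2},
    {"Name": "payments", "RiskLevelId": 4}
  ]
}
""")


def tables_by_risk_level(response):
    """Group table names by their RiskLevelId, preserving response order."""
    grouped = defaultdict(list)
    for item in response.get("Items", []):
        grouped[item["RiskLevelId"]].append(item["Name"])
    return dict(grouped)


for level, names in sorted(tables_by_risk_level(sample_response).items()):
    print(level, names)
```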