What types of data assets can SDDP scan for sensitive data?

Sensitive Data Discovery and Protection (SDDP) can scan data assets that store structured data or unstructured data.

Supported data assets (released in July 2019):

  • ApsaraDB RDS for MySQL database, which stores structured data
  • MaxCompute project, which stores structured data
  • Object Storage Service (OSS) bucket, which stores unstructured data

How long does it take to scan data in my data asset after I authorize SDDP to access the data asset?

SDDP starts scanning data in your data asset within 2 hours after you authorize it to access the data asset. The scan duration depends on the data volume: a data asset that contains a large number of tables (for example, more than 10,000) or an OSS bucket whose total object size is large (for example, more than 1 PB) takes longer to scan. During a scan, the scan results are progressively updated on the Overview page in the SDDP console. For more information, see Use the Overview page.

How does SDDP scan data in an unstructured data asset, for example, an OSS bucket?

SDDP scans objects stored in OSS for sensitive data in the following way:

  • First scan: After you authorize SDDP to scan an OSS bucket, SDDP scans all objects stored in the OSS bucket during the first scan.
  • Scan of incremental data: If you add objects to or modify objects stored in the OSS bucket, SDDP scans the new or modified objects.
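
The two scan modes above can be pictured as a filter over an object listing. The following is a minimal sketch under stated assumptions, not SDDP's actual implementation: the `ObjectInfo` type, the `objects_to_scan` function, and the idea of comparing last-modified and last-scanned timestamps are all hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ObjectInfo:
    """Hypothetical record for one object in a bucket listing."""
    key: str
    last_modified: datetime
    last_scanned: Optional[datetime] = None  # None means never scanned


def objects_to_scan(objects: List[ObjectInfo]) -> List[str]:
    """Return the keys that a first or incremental scan would cover."""
    selected = []
    for obj in objects:
        if obj.last_scanned is None:
            # First scan: every object that has not been scanned is included.
            selected.append(obj.key)
        elif obj.last_modified > obj.last_scanned:
            # Incremental scan: only objects modified since their last scan.
            selected.append(obj.key)
    return selected
```

In this sketch, an unchanged object that was already scanned is never selected again, and a deleted object simply disappears from the listing.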

Can SDDP rescan an OSS object after the object is scanned?

It depends. If the object remains unchanged, SDDP will not rescan it. If you modify the object, SDDP rescans the object within 4 to 8 hours after the modification.

SDDP will provide a manual scan feature that allows you to manually scan objects stored in specified OSS buckets.

How does SDDP scan data in a structured data asset, for example, a MaxCompute project or an RDS database?

SDDP scans the names and values of fields (for example, the age field) in MaxCompute or RDS to determine whether each field is sensitive. If the values of a field are not enough to decide, SDDP also checks the name of the field.

  • First scan: After you authorize SDDP to scan a MaxCompute project or an RDS database, SDDP scans all tables in the project or database during the first scan.
  • Scan of incremental data: If you add tables to the database or project, SDDP scans the new tables. If you modify the schema of an existing table by changing fields, SDDP scans the table again.
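
The "values first, then field name" order described above can be sketched as follows. This is an illustration only: the regular expression, the `NAME_HINTS` set, and the `is_sensitive_field` function are hypothetical and are not SDDP's detection rules.

```python
import re
from typing import Iterable

# Hypothetical patterns for illustration only; not SDDP's detection rules.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}
NAME_HINTS = {"age", "email", "phone"}


def is_sensitive_field(name: str, sampled_values: Iterable[str]) -> bool:
    """Decide from sampled values first, then fall back to the field name."""
    values = [v for v in sampled_values if v]
    # Step 1: try to decide from the sampled values alone.
    for pattern in VALUE_PATTERNS.values():
        if values and all(pattern.match(v) for v in values):
            return True
    # Step 2: the values were inconclusive, so check the field name.
    return name.lower() in NAME_HINTS
```

For example, a field named `age` holds plain numbers that match no value pattern, so in this sketch the field name is what marks it as sensitive.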

Does SDDP log on to a database to obtain data?

Yes, but only with your authorization. If authorized, SDDP connects to a MaxCompute project or logs on to an RDS database and samples data to detect sensitive data. SDDP does not save any data from the MaxCompute project or RDS database.

When will a scan be triggered?

SDDP automatically scans data in an authorized data asset in the following scenarios.

Scenario: You authorize SDDP to scan your data asset for the first time.

  • Scan logic: SDDP scans all data in the data asset.
  • Billing: SDDP charges you for a full scan on data in the data asset.

Scenario: You change data in a data asset after SDDP has scanned data in the data asset with authorization.

  • Scan logic (MaxCompute or RDS): If you add fields to or delete fields from a MaxCompute or RDS table, SDDP automatically rescans the table. If you add rows to or delete rows from a MaxCompute or RDS table, SDDP does not automatically rescan the table.
  • Billing (MaxCompute or RDS): SDDP charges you for a full scan on data in the data asset.
  • Scan logic (OSS): If you add objects to or modify objects stored in an OSS bucket, SDDP automatically scans the new or modified objects. Note: If you only delete objects from an OSS bucket, SDDP does not automatically rescan the bucket.
  • Billing (OSS): SDDP charges you for scanning the new or modified objects.

Scenario: You change sensitive data detection rules, including adding, deleting, enabling, or disabling rules.

  • Scan logic: SDDP automatically scans all data in all authorized data assets.
  • Billing: SDDP charges you for a full scan on data in all authorized data assets.
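
The scan logic in the scenarios above can be summarized as a small lookup. This is a sketch for reasoning about the rules, not an SDDP API: the `Change` enum, its member names, and the `scan_scope` and `triggers_scan` functions are hypothetical, and billing is deliberately left out.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    """Hypothetical names for the events described in the scenarios."""
    FIRST_AUTHORIZATION = "first_authorization"
    TABLE_FIELDS_CHANGED = "table_fields_changed"
    TABLE_ROWS_CHANGED = "table_rows_changed"
    OSS_OBJECTS_ADDED_OR_MODIFIED = "oss_objects_added_or_modified"
    OSS_OBJECTS_DELETED = "oss_objects_deleted"
    RULES_CHANGED = "rules_changed"


# What is scanned for each event; None means no automatic scan.
SCAN_SCOPE = {
    Change.FIRST_AUTHORIZATION: "all data in the data asset",
    Change.TABLE_FIELDS_CHANGED: "the changed table",
    Change.TABLE_ROWS_CHANGED: None,
    Change.OSS_OBJECTS_ADDED_OR_MODIFIED: "the new or modified objects",
    Change.OSS_OBJECTS_DELETED: None,
    Change.RULES_CHANGED: "all data in all authorized data assets",
}


def scan_scope(change: Change) -> Optional[str]:
    """Return the scan scope for an event, or None if no scan is triggered."""
    return SCAN_SCOPE[change]


def triggers_scan(change: Change) -> bool:
    """Return True if the event causes an automatic scan."""
    return scan_scope(change) is not None
```

Note that in this summary only row changes in a table and object deletions in a bucket leave the asset without an automatic rescan.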

Will SDDP skip certain data in a scan?

Yes. SDDP scans only OSS objects that are smaller than 200 MB. An object whose size is 200 MB or larger is skipped.

Note: A package is treated as a single OSS object. If the total size of all files in a package reaches 200 MB, SDDP does not scan the package.
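
The size rule can be checked before you upload or audit objects. The sketch below is illustrative only: the function names are hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes, which the text above does not specify.

```python
from typing import Iterable

# Assumption: 1 MB = 1024 * 1024 bytes; the exact unit is not stated above.
MAX_SCAN_SIZE_BYTES = 200 * 1024 * 1024


def is_scannable(size_bytes: int) -> bool:
    """An object is scanned only if it is smaller than the 200 MB limit."""
    return size_bytes < MAX_SCAN_SIZE_BYTES


def package_is_scannable(member_sizes: Iterable[int]) -> bool:
    """A package counts as one object, so its files' sizes are summed."""
    return is_scannable(sum(member_sizes))
```

For example, a package that holds a 150 MB file and a 50 MB file totals exactly 200 MB, so this sketch treats it as skipped.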