Classification and grading principles

Data Management (DMS) classification and grading-based scanning detects sensitive data in your databases. It automatically applies classification and grading tags to fields that match identification rules, protects fields with a high security level, and clearly displays sensitive fields in the scan results.

Principles

DMS classification and grading-based scanning is a two-layer process consisting of identification model-based scanning and classification and grading-based scanning. The system first scans table fields and their data by using identification models to determine their information type, such as a name or a date. Then, based on the results, the classification and grading scan applies rules from the classification and grading template that is associated with the instance. This categorizes the fields for business purposes and automatically assigns a security level and a data masking algorithm to each field.

Although classification and grading-based scanning uses the output of identification model-based scanning, the two processes are independent and do not interfere with each other.

Identification model scanning

Identification model-based scanning supports the following two identification methods:

Data content identification (regular expression match)

DMS classifies a field by matching its content with an identification model. For example, if an identification model is named 'ID Card' and a field's data matches the ID card validation algorithm, DMS tags the field as the ID card type.

To ensure scanning efficiency, DMS identifies data content from a random sample of data. If the matched data in the sample exceeds a specific threshold, the system classifies the field accordingly.
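The sampling approach described above can be sketched as follows. This is a minimal illustration, not the DMS implementation: the function name, the sample size, the threshold value, and the simplified ID card pattern (which performs no checksum validation) are all assumptions made for the example.

```python
import random
import re

# Hypothetical, simplified pattern: 17 digits followed by a digit or X.
# A real ID card check would also validate the checksum character.
ID_CARD_PATTERN = re.compile(r"\d{17}[\dXx]")


def matches_by_content(values, pattern, sample_size=100, threshold=0.8, seed=None):
    """Return True if the share of sampled values that match `pattern`
    exceeds `threshold`. Sample size and threshold are assumed values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    rng = random.Random(seed)
    # Draw a random sample instead of reading every row.
    sample = rng.sample(non_empty, min(sample_size, len(non_empty)))
    hits = sum(1 for v in sample if pattern.fullmatch(v))
    return hits / len(sample) > threshold
```

Sampling keeps the cost of a scan roughly constant per field regardless of table size, at the price of occasionally missing sensitive values that are rare in the column.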

Metadata identification

DMS classifies a field by matching its field name with an identification model. For example, when the built-in ID card identification model detects a field name such as id_card in a table, it tags the field as the ID card type.
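As a rough illustration of metadata identification, the sketch below matches a field name against name patterns attached to each identification model. The pattern table and the function name are hypothetical; the patterns that the built-in DMS models actually use are not documented here.

```python
import re

# Hypothetical field-name patterns, one per identification model.
FIELD_NAME_PATTERNS = {
    "ID Card": re.compile(r"id_?card|identity_?(no|number)", re.IGNORECASE),
    "Name": re.compile(r"(^|_)(full_?)?name$", re.IGNORECASE),
}


def models_matching_field_name(field_name, patterns=FIELD_NAME_PATTERNS):
    """Return the identification models whose name pattern matches the field name."""
    return [model for model, pattern in patterns.items() if pattern.search(field_name)]
```

With these assumed patterns, a field named id_card is tagged as the ID card type without reading any row data.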

Identification results

DMS saves up to three identification results for a field.

Note

DMS provides built-in identification models, and you can create custom ones. Custom identification models support only data content identification.
Identification models are enabled by default and can be disabled. The system uses only enabled identification models to scan fields.

Classification and grading scanning

During a classification and grading-based scan, DMS matches each field against a set of classification rules. If a field meets a rule's criteria, DMS applies that rule's classification to the field.

Classification rule evaluation

First, the system filters for all enabled classification rules within the classification and grading template. Then, for each field, it evaluates the field in the following three steps:

1. Match identification models. DMS checks for an overlap between the field's matched identification models and those specified in the classification rule.

   For example, if Model A and Model B match a field, and a classification rule specifies Model B and Model C, their intersection is Model B. Because an overlap exists, DMS proceeds to the next step. If no overlap exists, the rule does not apply, and DMS evaluates the next rule.
2. Evaluate the identification scope. DMS checks whether the field's metadata, such as its database name, table name, field name, and field description, matches the identification scope of the rule.

   If the metadata matches, the classification from this rule is temporarily saved as a candidate result for the field, and DMS proceeds to evaluate the next classification rule against the field.
3. Assign the final classification. After all classification rules are evaluated against the field, DMS assigns the classification.

   If only one rule matches, DMS applies its classification to the field. If multiple rules match, DMS ranks them by security level and applies the classification from the rule with the highest security level.
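The three evaluation steps can be sketched in a few lines of Python. This is an illustrative model only: the Rule structure, its attribute names, the integer encoding of security levels (larger means more sensitive), and the tie-breaking behavior are assumptions, not the DMS data model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    category: str
    security_level: int                # assumption: larger number = more sensitive
    models: set                        # identification models named by the rule
    in_scope: Callable[[dict], bool]   # identification scope check on field metadata
    enabled: bool = True


def classify_field(field_meta: dict, matched_models: set, rules: list) -> Optional[Rule]:
    """Return the rule whose classification applies to the field, or None."""
    candidates = []
    for rule in rules:
        if not rule.enabled:
            continue  # only enabled rules are evaluated
        # Step 1: the field's models must overlap with the rule's models.
        if not (matched_models & rule.models):
            continue
        # Step 2: the field's metadata must match the identification scope.
        if not rule.in_scope(field_meta):
            continue
        candidates.append(rule)  # keep as a candidate, continue with the next rule
    if not candidates:
        return None
    # Step 3: the candidate with the highest security level wins.
    # On a tie, max() keeps the first candidate; the source does not specify this case.
    return max(candidates, key=lambda r: r.security_level)
```

For instance, a field matched by Model A and Model B is a candidate for any in-scope rule that names Model B, and among several such rules the one with the highest security level determines the result.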

The following diagram illustrates the classification and grading-based scanning process for a single field.

Classification rule parameters

Before running a classification and grading-based scan, you can customize classification rules. The key parameters are as follows:

Security level: You can customize the security level for fields. A higher level indicates that the data is more sensitive and important. For more information, see Field security level.
Identification model: Specifies the identification models that the rule uses. You can select multiple models, which are evaluated with OR logic.
Identification scope: Filters fields based on their metadata. The relationship between the scope criteria can be AND or OR. If you select AND, the field's metadata must match all scope criteria. If you select OR, the metadata needs to match only one of the criteria.

The configuration form also includes the Rule Name (for example, "Customer Name"), data category (for example, "Personal Information"), and Rule Description fields.
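The AND and OR relationships between identification scope criteria can be illustrated with a short sketch. The criteria format (a mapping from a metadata key to a glob pattern), the function name, and the treatment of an empty scope are assumptions made for this example; they do not describe how DMS stores or evaluates scopes internally.

```python
import fnmatch


def scope_matches(field_meta, criteria, relation="AND"):
    """Check field metadata against scope criteria.

    criteria maps a metadata key (for example "table") to a glob pattern.
    This format is hypothetical and used only for illustration.
    """
    results = [
        fnmatch.fnmatchcase(field_meta.get(key, ""), pattern)
        for key, pattern in criteria.items()
    ]
    if not results:
        return True  # assumption: an empty scope places no restriction
    if relation == "AND":
        return all(results)  # every criterion must match
    if relation == "OR":
        return any(results)  # one matching criterion is enough
    raise ValueError(f"unknown relation: {relation}")
```

With a field in database shop and table customers, the criteria database = shop and table = order* fail under AND because the table does not match, but pass under OR because the database does.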
