Create and manage identification rules

Identification rules are essential for detecting sensitive data within Dataphin. They allow you to configure security measures for business data with high security requirements and promptly identify sensitive information. This topic explains how to create and manage identification rules.

Limits

By default, identification rules do not automatically scan view objects. However, you can enable view scanning in the rule execution configuration or manually add or batch import the identification results of views.

Permission description

Security administrators and custom global roles with Classification Rule-Management permissions can create and manage identification rules.
You can manage only the rules that you own. Management operations include editing, deleting, resetting, testing, changing the owner, manually running a rule, and changing its effective status.

Create identification rules

On the Dataphin home page, in the top menu bar, select Administration > Data Security.
In the left-side navigation pane, select Data Identification > Classification Rule. On the Classification Rule page, click the Create Identification Rule button.

In the Create Identification Rule dialog box, configure the parameters.

Parameter	Description
Basic Configuration
Identification Rule Name	The name of the identification rule. The name can contain Chinese characters, letters, digits, and underscores (_), and cannot exceed 12 characters.
Identification Rule Description	Custom remarks for the identification rule. Cannot exceed 128 characters.
Data Classification and Grading
Data Class	Select data classifications from the data classification directory. You can select all classifications, all classifications under a specified directory, or specified data classifications. All Classifications: all effective data classifications under the current tenant. All Classifications Under Specified Directory: all effective data classifications under the specified directory and its subdirectories. Specified Data Classification: specific effective data classifications, filtered from the chosen parent directory and its subdirectories. To add more data classifications, click Add A Group Of Classifications; you can add multiple directories this way. Note: When the compute engine is StarRocks and the field type is HLL (HyperLogLog), content-based identification is not supported.
Scan Scope
Scan Scope	Conditions can be combined with AND or OR. Identification types include Business Unit, Project, and Data Table. Identification conditions include All, Belongs To, Does Not Belong To, Contains, Does Not Contain, Regex (case-insensitive), and Regular Expression. All: selects the entire scope within the current Dataphin instance. Belongs To/Does Not Belong To: select multiple specific resources. Contains/Does Not Contain: keyword matching. For example, to match a user information table, enter user_info. Regex (case-insensitive): enter a regular expression in the input box; matching ignores case. For example, to match all names containing test, use `.*test.*`. Regular Expression: enter a regular expression in the input box; matching is case-sensitive. For example, to match all names containing test, use `.*test.*`. Note: The scan scope can contain at most 5 rules, nested at most 2 layers deep. At most 100 data sections and projects can be selected.
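
The scan scope conditions can be illustrated with a minimal Python sketch. It is not Dataphin's implementation: the function names `matches` and `in_scope` are hypothetical, Belongs To/Does Not Belong To are omitted, and it assumes that a regular expression must match the whole object name, which is why the pattern `.*test.*` is used for "names containing test". Dataphin's regex engine may also differ from Python's `re` module.

```python
import re


def matches(name: str, condition: str, value: str = "") -> bool:
    """Return True if an object name satisfies one scan scope condition.

    The condition strings mirror the UI labels; the function is hypothetical.
    """
    if condition == "All":
        return True
    if condition == "Contains":
        return value in name
    if condition == "Does Not Contain":
        return value not in name
    if condition == "Regular Expression":
        # Assumption: the whole name must match, case-sensitively.
        return re.fullmatch(value, name) is not None
    if condition == "Regex (case-insensitive)":
        return re.fullmatch(value, name, flags=re.IGNORECASE) is not None
    raise ValueError(f"unsupported condition: {condition}")


def in_scope(name: str, conditions: list, method: str = "AND") -> bool:
    """Combine several (condition, value) pairs with AND or OR."""
    results = [matches(name, cond, value) for cond, value in conditions]
    return all(results) if method == "AND" else any(results)


print(matches("dim_user_info", "Contains", "user_info"))
print(matches("ODS_TEST_ORDERS", "Regular Expression", ".*test.*"))
print(matches("ODS_TEST_ORDERS", "Regex (case-insensitive)", ".*test.*"))
```

Under these assumptions, the uppercase table name is matched only by the case-insensitive condition, which is the practical difference between the two regex options.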

Click OK to complete the creation of the identification rule.
After the identification rule is created, it appears in the identification rule list with its effective status enabled by default. Starting the next day, data is scanned automatically according to the scheduled scan time set in the rule execution configuration.
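
If rule definitions are prepared outside the console (for example, in a spreadsheet before being entered), the name and description limits from the parameter table can be checked up front. The following is a minimal sketch, not a Dataphin API: `validate_rule` is a hypothetical helper, and it assumes that the name may contain only the listed character types, that "Chinese characters" means the basic CJK Unified Ideographs block, and that length is counted in Unicode characters.

```python
import re

# Assumption: only Chinese characters (basic CJK block), letters, digits,
# and underscores are allowed in a rule name.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]+")
MAX_NAME_LENGTH = 12          # limit stated in the parameter table
MAX_DESCRIPTION_LENGTH = 128  # limit stated in the parameter table


def validate_rule(name: str, description: str = "") -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name contains unsupported characters or is empty")
    if len(name) > MAX_NAME_LENGTH:
        errors.append("name exceeds 12 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append("description exceeds 128 characters")
    return errors


print(validate_rule("id_card_rule"))
print(validate_rule("phone-number"))
```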

Identification rule list

The identification rule list displays the name, data classification, owner, update time, and the rule's effective status. You can click the Description button to view the introduction and description of identification rules, data sampling, identification results, and identification management.
You can search by identification rule name keyword, or filter by data classification, owner, or 'only view mine'.

You can perform the following operations on the target identification rule.

Operation	Description
Effective Or Not	Turn the switch in the effective column on or off. When enabled, the identification rule runs and generates execution records according to the scheduled scan time and the real-time scan switch in the rule execution configuration. When disabled, you can manually trigger the rule when business conditions require it. Note: Setting a rule to ineffective does not affect previously generated identification results.
Reset	Click Reset in the operation column or click Reset at the bottom. The system clears the existing tagging results for the data within the selected rule's identification scope and then runs identification again; the new run becomes the latest identification result.
View Details	Click View Details in the operation column to view the configuration details of the identification rule.
Edit	Click Edit in the operation column to modify the identification rule information.
Manual Run	Only effective identification rules support this operation when run individually. Click the More icon in the operation column and select Manual Run, or click Manual Run at the bottom, to manually run the selected identification rules. If automatic inheritance based on lineage is enabled, inherited identification results can also be generated from data lineage. For details, see Inheritance based on lineage description. Exception: when you manually run identification rules in batches, you can run them regardless of whether they are effective. The supported rule execution scopes are All Rules (including Ineffective Rules) and Only Effective Rules.
Copy	Click Copy in the operation column to quickly copy the identification rule, equivalent to cloning.
Change Owner	Click the More icon in the operation column, select Change Owner, or click Change Owner at the bottom to change the owner of the identification rule. After selecting the new owner, click OK. Identification rules can only be transferred to security administrators.
Delete	Click the More icon in the operation column and select Delete, or click Delete at the bottom. Deleting a rule does not affect identification results that have already been generated. However, the classification and grading tags on all data identified by this rule are deleted, and the tag deletion takes effect the next day.
Test	Click Test at the bottom and select the projects or data tables to test, up to 10 projects or 10 tables. The test classifies and grades the extracted sample data and labels it with the matching rules. After the test is completed, click View Test Results to view the result details. Note: The test only displays results for the extracted sample data and does not actually tag the data. A test run still scans and processes data, which consumes computing resources, so set the test scope as precisely as possible. Execution time varies with the number and complexity of the selected rules. A test only shows whether a single identification rule can identify sensitive data. An actual run evaluates all rules that meet the conditions and selects one of them based on priority, so test tagging results may differ from actual tagging results.
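
The difference between a test and an actual run can be sketched as follows. This is an illustration only, not Dataphin's algorithm: the `Rule` class, the regex-based detectors, and the convention that a smaller `priority` number wins are all assumptions, because the source does not say how priority is expressed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    pattern: str         # hypothetical content detector
    classification: str
    priority: int        # assumption: a smaller number means higher priority


def rule_hits(rule: Rule, sample: str) -> bool:
    return re.fullmatch(rule.pattern, sample) is not None


def single_rule_test(rule: Rule, sample: str) -> Optional[str]:
    """A test looks at one rule in isolation."""
    return rule.classification if rule_hits(rule, sample) else None


def actual_run(rules: list, sample: str) -> Optional[str]:
    """An actual run collects every matching rule, then picks one by priority."""
    hits = [r for r in rules if rule_hits(r, sample)]
    if not hits:
        return None
    return min(hits, key=lambda r: r.priority).classification


phone = Rule("phone", r"\d{11}", "Phone Number", priority=2)
account = Rule("account", r"\d{8,12}", "Account ID", priority=1)
sample = "13800000000"

print(single_rule_test(phone, sample))     # Phone Number
print(actual_run([phone, account], sample))  # Account ID
```

Here the phone rule passes its own test, yet the actual run tags the same value with the higher-priority account rule, which is why test results and real tagging results can disagree.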

Manually trigger identification rules

On the Classification Rule page, click Manual Scan to open the Manual Scan dialog box.

In the Manual Scan dialog box, configure the parameters.

Parameter	Description
Scan Scope	Select the scan scope of the identification rules: Full Database Scan, Project Scan, or Table Scan. Full Database Scan: for scenarios where all identification rules within Dataphin must run immediately to scan data. Project Scan: select the project to scan; for scenarios where all identification rules under a specific project must run immediately to scan data. Table Scan: select up to 10 data tables under the project; for scenarios where all identification rules for specific data tables under a project must run immediately to scan data.
Rule Execution Scope	Select the execution scope of the identification rules: Only Effective Rules or All Rules (including Ineffective Rules). Only Effective Rules: all identification rules within Dataphin whose effective status is enabled. All Rules (including Ineffective Rules): all identification rules within Dataphin, regardless of whether they are effective.
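
The two manual scan parameters can be modeled with a short Python sketch. It is a hypothetical illustration, not a Dataphin API: `ManualScan`, `validate`, and `select_rules` are invented names, and the only limits it encodes are those stated above (a project for Project Scan and Table Scan, and at most 10 tables for Table Scan).

```python
from dataclasses import dataclass, field

SCAN_SCOPES = ("Full Database Scan", "Project Scan", "Table Scan")
RULE_SCOPES = ("Only Effective Rules", "All Rules (including Ineffective Rules)")
MAX_TABLES = 10  # limit stated for Table Scan


@dataclass
class ManualScan:
    scan_scope: str
    rule_scope: str = "Only Effective Rules"
    project: str = ""
    tables: list = field(default_factory=list)


def validate(scan: ManualScan) -> list:
    """Return a list of problems; an empty list means the request is consistent."""
    errors = []
    if scan.scan_scope not in SCAN_SCOPES:
        errors.append("unknown scan scope")
    if scan.rule_scope not in RULE_SCOPES:
        errors.append("unknown rule execution scope")
    if scan.scan_scope in ("Project Scan", "Table Scan") and not scan.project:
        errors.append("a project is required")
    if scan.scan_scope == "Table Scan":
        if not scan.tables:
            errors.append("select at least one table")
        elif len(scan.tables) > MAX_TABLES:
            errors.append("Table Scan supports at most 10 tables")
    return errors


def select_rules(rules: dict, rule_scope: str) -> list:
    """rules maps a rule name to its effective flag (True or False)."""
    if rule_scope == "All Rules (including Ineffective Rules)":
        return sorted(rules)
    return sorted(name for name, effective in rules.items() if effective)


rules = {"id_card": True, "phone": False}
print(select_rules(rules, "Only Effective Rules"))
print(select_rules(rules, "All Rules (including Ineffective Rules)"))
```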

Note

For manual scans to inherit identification results, enable automatic inheritance in the configuration settings and select the Rule Execution Trigger scenario. For more information, see Automatic Inheritance Configuration.
When enabled, manual scans of identification rules automatically inherit the sensitivity level of direct upstream fields based on field lineage, which makes the scan more comprehensive and keeps the identification results of related data consistent. When not enabled, no inheritance based on field lineage occurs.
Automatic inheritance expands the scan scope, which may increase computing resource consumption. Configure it based on actual business needs.
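
Inheritance from direct upstream fields can be sketched as follows. This is a simplified model under stated assumptions, not Dataphin's logic: levels are plain integers where a higher number is more sensitive, a field that a rule identified directly keeps its own level, and a field with several identified upstream fields takes the highest of their levels. The source specifies none of these details, and the field names are invented.

```python
def inherit_levels(levels: dict, upstream: dict) -> dict:
    """Add inherited sensitivity levels for fields without a direct result.

    levels:   field -> level assigned by rule identification
    upstream: field -> list of its direct upstream fields
    """
    result = dict(levels)
    for field_name, parents in upstream.items():
        if field_name in levels:
            continue  # assumption: a direct identification result is kept
        # Only direct upstream fields with a rule-assigned level count,
        # so levels do not propagate through several lineage hops here.
        inherited = [levels[p] for p in parents if p in levels]
        if inherited:
            result[field_name] = max(inherited)
    return result


levels = {"ods.user.phone": 3, "ods.user.city": 1}
upstream = {
    "dwd.user.contact": ["ods.user.phone", "ods.user.city"],
    "ads.report.contact": ["dwd.user.contact"],
}
print(inherit_levels(levels, upstream))
```

In this model `dwd.user.contact` inherits level 3, while `ads.report.contact` receives nothing because its only direct upstream field has no rule-assigned level.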

Click OK to initiate the scan for the selected asset objects.
You can check the progress in Execution History. The duration of the data scan process will depend on the size of the data selected. Please be patient.

What to do next

After the identification rule is created, you can adjust its scan method based on business conditions. For more information, see Configure the scheduling cycle of identification rules and Manually trigger identification rules. You can also enable automatic inheritance. For more information, see Automatic Inheritance Configuration.
You can view the sensitive data identified by the identification rule in the execution record list. For more information, see Manage Identification Results.