Overview of asset security

Asset security equips Dataphin with comprehensive capabilities for sensitive data detection and protection across the entire data lifecycle. It employs data classification, grading, and masking to help customers develop a robust data security framework, ensuring secure and compliant data utilization.

Prerequisites

The asset security value-added service has been purchased, and the asset security module has been activated for the current tenant. For activation, see Tenant management.

Scenarios

Dataphin supports a variety of scenarios for data security protection:

Scenario 1: Sensitive Data Protection in Business Data
Asset security's sensitive data detection and protection features can be used to mask plaintext sensitive data, such as changing "Zhang San" to "*San," thereby safeguarding business data.
Scenario 2: Data Warehouse Construction in the Development Environment
When transferring sensitive data from production to development environments, asset security's built-in sensitive data detection and masking rules can automatically obscure sensitive data. This ensures that such data remains within the secure confines of the production environment and is not exposed to less secure development areas.
Scenario 3: Flexible Use of Masking Whitelist
The masking whitelist feature of asset security allows designated users to view unmasked data during specified times. For instance:
- Company executives may need to access plaintext financial data for a limited period. They can be added to the masking whitelist with a set validity period.
- In the realm of e-commerce, certain promotions may necessitate the display of actual sales figures. Users can be placed on the masking whitelist for a specified duration to view the sales data for a particular day.
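The whitelist behavior in Scenario 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Dataphin's implementation: `WhitelistEntry`, `redact_name`, and `read_name` are hypothetical names, and the redaction rule (replace the first word with `*`) is inferred only from the "Zhang San" to "*San" example in Scenario 1.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class WhitelistEntry:
    """Hypothetical whitelist record: a user and the window in which plaintext is visible."""
    user: str
    valid_from: datetime
    valid_to: datetime


def redact_name(name: str) -> str:
    """Replace the first word with '*' (assumed rule, modeled on 'Zhang San' -> '*San')."""
    parts = name.split(" ", 1)
    return "*" + parts[1] if len(parts) == 2 else "*"


def read_name(name: str, user: str, now: datetime,
              whitelist: Sequence[WhitelistEntry]) -> str:
    """Return plaintext only if the user has a whitelist entry that is valid at `now`."""
    for entry in whitelist:
        if entry.user == user and entry.valid_from <= now <= entry.valid_to:
            return name
    return redact_name(name)


# Hypothetical entry: an executive may see plaintext for one day only.
whitelist = [WhitelistEntry("executive", datetime(2024, 11, 11), datetime(2024, 11, 12))]
inside = read_name("Zhang San", "executive", datetime(2024, 11, 11, 9), whitelist)
expired = read_name("Zhang San", "executive", datetime(2024, 11, 13), whitelist)
other = read_name("Zhang San", "analyst", datetime(2024, 11, 11, 9), whitelist)
```

Within the validity window the whitelisted user reads the plaintext value. After the window closes, or for any other user, the same read returns the masked value.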

Advantages

Rich Built-in Resources: Dataphin offers an extensive array of built-in resources, including data classification and grading, sensitive data detection rules, and masking algorithms. These resources facilitate the swift establishment of a fundamental data security system.
Flexible Customization of Detection and Masking: Users can tailor detection rules to specific needs by scan range and priority, and can also adjust them manually. This flexibility enables a comprehensive, multi-level, multi-domain detection rule system. Masking rules support a choice of algorithms whose parameters can be customized to suit different data masking requirements.
Close Integration with Production and Development Scenarios: Asset security is seamlessly integrated with data forwarding scenarios within the development and production processes, ensuring data security throughout the entire Dataphin data development cycle.

Terms

Module	Concept	Explanation
Sensitive Data Detection	Data Grading	Data grading involves assigning sensitivity levels to data, ranging from L1 (public) to L4 (top secret) within Dataphin's built-in system. Custom grading tailored to an enterprise's specific needs is also supported.
	Data Classification	Data classification categorizes data by its usage realm to help differentiate sensitivity levels. For instance, company business data is typically more sensitive than production workshop sensor data. Dataphin's built-in system includes classifications like company data, business data, and personal data, with support for custom classifications as well.
	Detection Rules	Detection rules serve as automated policies for identifying sensitive fields. In practical production environments, with thousands of tables and tens of thousands of fields, manually labeling the sensitivity of each field is impractical. Dataphin offers a feature for automatically detecting sensitive fields based on rules, enabling the automatic identification of sensitive fields by analyzing field names or content. Furthermore, detection rules allow for the configuration of rule priority and scan range, among other detailed settings, to enhance the establishment of a comprehensive detection rule system.
	Identification Record	The identification record module logs the outcomes of all detection rule executions, detailing which rule was triggered by a field, and the corresponding sensitivity level and classification of the sensitive data. For fields necessitating special attention, the module allows manual modification of detection rules to guarantee precise and useful detection results.
Sensitive Data Protection	Data Masking Rules	Data masking rules define how detected sensitive fields are protected. Methods like redaction and hashing are available, with the ability to bind masking rules to detection rules on a one-to-one basis. The rules can be restricted to specific projects if different treatments are needed for the same type of field.
	Data Masking Algorithm	This module lists all supported masking algorithms, including redaction (e.g., "Zhang San" becomes "*San") and hashing (e.g., salted MD5).
	Dynamic De-identification	Does not alter the underlying data storage; data is masked only during consumption. This is typically used in ad hoc queries for data analysis, transferring production data to development in data development, and providing data services in data consumption scenarios.
	Static De-identification	Modifies the underlying data storage directly; data is encrypted or masked at the storage level, for example with the commonly used pn_md5. This is typically applied when encrypting sensitive data during data integration and when masking application layer data during the layered construction of a data warehouse.
	Dynamic Masking Whitelist	This is used in scenarios where temporary real data query access is granted to certain users for specific business objectives. Common applications include troubleshooting in data development and disclosing sales data during particular events (e.g., Double 11).
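The difference between static and dynamic de-identification in the table above can be illustrated with a short Python sketch. The function names (`salted_md5`, `redact_first_word`, `mask_static`, `query_dynamic`), the salt handling, and the in-memory list standing in for a table are all assumptions made for illustration; they do not describe Dataphin's internal algorithms or the behavior of pn_md5.

```python
import hashlib
from typing import Callable, Dict, Iterator, List


def salted_md5(value: str, salt: str) -> str:
    """Hash the salt followed by the value with MD5 and return the hex digest."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()


def redact_first_word(value: str) -> str:
    """Replace the first word with '*' (assumed rule, e.g. 'Zhang San' -> '*San')."""
    parts = value.split(" ", 1)
    return "*" + parts[1] if len(parts) == 2 else "*"


def mask_static(rows: List[Dict], column: str, salt: str) -> List[Dict]:
    """Static: build the rows that get written to storage, with the column already hashed."""
    return [{**row, column: salted_md5(row[column], salt)} for row in rows]


def query_dynamic(rows: List[Dict], column: str,
                  mask: Callable[[str], str]) -> Iterator[Dict]:
    """Dynamic: leave stored rows untouched and mask the column only in the query result."""
    for row in rows:
        yield {**row, column: mask(row[column])}


source = [{"id": 1, "name": "Zhang San"}]
stored = mask_static(source, "name", salt="example-salt")    # what lands in the target table
viewed = list(query_dynamic(source, "name", redact_first_word))  # what a consumer sees
```

With the static approach, the plaintext never reaches the target table. With the dynamic approach, `source` still holds the plaintext and only the returned rows are masked, which is what makes a whitelist exemption possible.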

Asset security usage flow

Step 1: Define Data Classification and Grading
Start by defining data classification and grading. For managing data classifications, see Manage Data Classification. For data grading tasks, see Manage Data Grading.
Step 2: Create Detection Rules
Dataphin allows users to define custom rules for detecting sensitive data. These rules are applied automatically in a daily scan, and can also be triggered manually for an immediate scan. For guidance on how to create, configure, manually trigger, and manage detection rules, see the detection rule documentation.
Step 3: Review Detection Results
Review the results from the detection rules. See Managing Detection Results.
Step 4: Choose a Data Masking Algorithm
Choose a suitable data masking algorithm from those available within Dataphin. For more information on the data masking algorithms, see the referenced documents.
Step 5: Configure Masking Rules
Set up masking rules for sensitive fields to protect them. See Manage Dynamic Masking Rules.
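The flow above (detection rules, detection results, masking rules) can be sketched as follows. This is a simplified model under stated assumptions, not Dataphin's API: the `DetectionRule` fields, the convention that a lower `priority` number wins, matching on field names by regular expression, the sample levels, and the two masking functions are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Sequence


@dataclass(frozen=True)
class DetectionRule:
    name: str
    field_pattern: str   # regular expression matched against the field name
    classification: str
    level: str           # sensitivity level, e.g. "L1" to "L4"
    priority: int        # assumption: a lower number means higher priority


def detect(field_name: str, rules: Sequence[DetectionRule]) -> Optional[DetectionRule]:
    """Return the highest-priority rule whose pattern matches the field name, if any."""
    matches = [r for r in rules if re.search(r.field_pattern, field_name, re.IGNORECASE)]
    return min(matches, key=lambda r: r.priority) if matches else None


def scan(field_names: Iterable[str], rules: Sequence[DetectionRule]) -> Dict[str, DetectionRule]:
    """Build the detection results: each sensitive field mapped to the rule it triggered."""
    results = {}
    for field in field_names:
        rule = detect(field, rules)
        if rule is not None:
            results[field] = rule
    return results


def mask_phone(value: str) -> str:
    """Keep the first three characters and replace the rest with '*'."""
    return value[:3] + "*" * (len(value) - 3)


def mask_name(value: str) -> str:
    """Replace the first word with '*'."""
    parts = value.split(" ", 1)
    return "*" + parts[1] if len(parts) == 2 else "*"


RULES = [
    DetectionRule("phone", r"phone|mobile", "personal data", "L3", priority=1),
    DetectionRule("name", r"name", "personal data", "L2", priority=2),
]

# One masking function per detection rule, mirroring the one-to-one binding.
MASKING_BY_RULE: Dict[str, Callable[[str], str]] = {"phone": mask_phone, "name": mask_name}


def mask_row(row: Dict[str, str], results: Dict[str, DetectionRule]) -> Dict[str, str]:
    """Apply the bound masking function to every field that a rule flagged."""
    return {
        field: MASKING_BY_RULE[results[field].name](value) if field in results else value
        for field, value in row.items()
    }


results = scan(["customer_name", "mobile_phone", "order_id"], RULES)
masked = mask_row(
    {"customer_name": "Zhang San", "mobile_phone": "13800138000", "order_id": "42"},
    results,
)
```

Fields that match no rule, such as `order_id` here, pass through unchanged. When a field name matches several rules, only the highest-priority rule is recorded, so each field ends up with exactly one masking function.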

Notes

Implementing asset security to obscure sensitive data can affect the development, querying, and analysis of such data. The main scenarios impacted include:

Data Query
For instance, local life service providers may experience numerous complaints in a specific area. However, due to data masking, only city-level addresses are visible, which hinders the ability to pinpoint the exact streets where complaints are concentrated, thus affecting operational decision-making.
Production Data Writing to Development Environment, Test Environment Data Preparation
Consider a script task that checks the number of digits in a phone number. If the data table used by the script is masked with the MD5 algorithm, an 11-digit phone number becomes a 32-character hexadecimal string, and the script can no longer perform its intended function.

Therefore, when utilizing asset security, it is crucial to thoroughly assess both security regulations and business requirements to maintain operational continuity while upholding data compliance and security.
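The phone number case in the notes above is easy to reproduce. The sketch below uses Python's standard `hashlib` module; the validation function `looks_like_phone` and the sample number are hypothetical stand-ins for the script task described, and unsalted MD5 is used only to keep the example short.

```python
import hashlib


def looks_like_phone(value: str) -> bool:
    """Hypothetical downstream check: exactly 11 characters, all digits."""
    return len(value) == 11 and value.isdigit()


phone = "13800138000"  # made-up 11-digit number
masked = hashlib.md5(phone.encode("utf-8")).hexdigest()

plain_ok = looks_like_phone(phone)    # check passes on production data
masked_ok = looks_like_phone(masked)  # check fails on the MD5-masked copy
```

The masked value is always 32 hexadecimal characters, so a length-based check fails on the development copy no matter what the original number was. A format-preserving masking method, or a test that does not depend on the original length, avoids this kind of breakage.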