Data masking is a key data security measure. DataWorks supports static data masking, dynamic data masking, and engine-level masking to help organizations protect sensitive data in various scenarios. You can configure specific masking rules and policies to precisely mask sensitive data, which ensures data security during data development and analysis.
Feature overview
The data masking feature protects sensitive data during its use and transfer and prevents direct exposure in unauthorized scenarios. Based on data classification and grading results, data masking applies various masking or transformation algorithms to identified sensitive data. This ensures that data is presented securely in different scenarios.
Static data masking
Function: Permanently replaces sensitive data with masked values when the data is written from a source to a destination. Only the masked data is stored in the destination data source; the raw values are not retained there.
Scenario: This method is primarily used for real-time sync tasks in DataWorks Data Integration. For example, you can mask real user data from a production database and then sync the masked data to a development or test environment for secure use.
Features:
Physical replacement: The masking result is permanent, which provides high security.
Digital watermarking: Supports the embedding of invisible digital watermarks during the masking process. If a data leak occurs, the watermark can be used to trace the source of the leak.
Dynamic data masking
Function: Masks sensitive data in real time based on preset policies when a user queries or accesses the data. The physically stored raw data remains unchanged. Different users can see different results when they access the same data.
Scenario: This method is used to control data visibility for different user roles in a production environment. For example, when a customer service representative queries the user table, a phone number is displayed as "138****1234". Their supervisor, however, can see the complete phone number.
Features:
On-demand masking: Does not change the raw data. This provides high flexibility and balances data security with business availability.
Multilayer protection:
Application layer masking: The policy takes effect only when data is accessed through specific DataWorks modules.
Engine-level masking (MaxCompute/Hologres): The policy is enforced at the database engine layer. It takes effect regardless of the access tool used and has the highest priority.
Core configuration: For both static and dynamic data masking, you must create masking rules. A rule specifies a masking method, such as hashing, masking, or replacement, for a specific data type, such as a phone number. For dynamic data masking, you must also configure masking policies. These policies define which users trigger the rules and under what conditions.
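The relationship between a masking rule and a masking policy in dynamic data masking can be sketched as follows. This is only an illustration, not the DataWorks implementation: the names `mask_phone`, `MASKING_RULES`, `MASKED_ROLES`, and `read_value`, as well as the role names, are hypothetical.

```python
def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 characters; replace the rest with '*'."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


# Hypothetical rule table: data type -> masking method.
MASKING_RULES = {"phone_number": mask_phone}

# Hypothetical policy: the roles that see masked values for each data type.
MASKED_ROLES = {"phone_number": {"customer_service"}}


def read_value(stored_value: str, data_type: str, role: str) -> str:
    """Return what the given role sees. The stored value is never modified."""
    rule = MASKING_RULES.get(data_type)
    if rule is not None and role in MASKED_ROLES.get(data_type, set()):
        return rule(stored_value)
    return stored_value


stored = "13812341234"
print(read_value(stored, "phone_number", "customer_service"))  # 138****1234
print(read_value(stored, "phone_number", "supervisor"))        # 13812341234
```

The masking happens at read time, so the same stored value yields different results for different roles.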
Limitations
Applicable users: This feature is available to users of DataWorks Professional Edition or Enterprise Edition. You must also enable the new data security features for DataWorks in Security Center.
Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), and Japan (Tokyo).
Supported compute engines: MaxCompute and Hologres.
Prerequisites
The Alibaba Cloud account or RAM user that you use must meet one of the following conditions:
The Alibaba Cloud account or RAM user has the AliyunDataWorksFullAccess policy attached.
The Alibaba Cloud account or RAM user is assigned the tenant security administrator role of DataWorks.
The Alibaba Cloud account or RAM user is assigned the tenant administrator role of DataWorks.
You have completed the steps in the New user guide.
Feature entry point
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Security Center.
In the navigation pane on the left, choose .
Dynamic data masking policies
Add a masking rule
DataWorks industry templates include predefined masking rules for common data types. To create custom masking rules, you must first disable the corresponding rules for those data types in the industry templates.
On the Data desensitization page, click the Dynamic desensitization tab.
On the Dynamic desensitization tab, click the Rules tab.
In the upper-left corner, click the New Rule button to configure a dynamic masking method. The key parameters are described below:
Data Type: The data type to be masked.
Desensitization mode: The method used to mask the data when a user accesses this data type.
Note: Enter Raw Data to check whether the Data after desensitization meets your expectations.
Apply to desensitization strategy: The scope where this masking rule applies. The scope includes user, feature scope, and data.
After you finish the configuration, click Confirm to save the masking rule.
Add a data masking policy
Click the Dynamic desensitization tab, and then click the Desensitization strategy tab.
In the upper-left corner, click the New Policy button to configure the policy.
Configure the Effective Conditions.
A masking rule is triggered when its effective conditions are met. The parameters are described below:
Policy Name: The name of the masking policy.
User Scope: The policy can apply to all users or specific users.
DataWorks Function: The policy takes effect when sensitive data is accessed through specified DataWorks features: Data Map, DataAnalysis, or Data Studio.
Covered Items: The masking rule takes effect when a user accesses sensitive data in the specified projects or databases.
Data Type: The masking rule takes effect when a user accesses the specified sensitive data types. The rule includes one or more sensitive data types.
Important: A masking rule for this data type must be configured and enabled first.
Configure the Exception conditions (whitelist).
A masking rule is not triggered if its exception conditions are met. The parameters are described below:
Data Type: The data type of the target data. When users access data of these types, masking is not performed.
Note: A masking rule must be configured and enabled for the data type first.
Whitelisted users: One or more RAM users or user groups. When these users access the specified data types, masking is not performed.
Effective Time Range: The effective period for the exception condition (whitelist), which you can set to a specific Time period or make Permanent.
Adjust the policy priority: In the Operation column, click More and select Move Up or Move Down to change the matching order of the masking policies.
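The interplay between effective conditions and exception conditions for a single policy can be sketched as below. This is a minimal model under stated assumptions, not the product's logic: the classes `Policy` and `ExceptionCondition`, their fields, and the sample user and project names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ExceptionCondition:
    data_types: set
    whitelisted_users: set
    start: Optional[datetime] = None  # None at both ends models "Permanent"
    end: Optional[datetime] = None

    def applies(self, user: str, data_type: str, now: datetime) -> bool:
        if user not in self.whitelisted_users or data_type not in self.data_types:
            return False
        if self.start is not None and now < self.start:
            return False
        if self.end is not None and now > self.end:
            return False
        return True


@dataclass
class Policy:
    name: str
    users: Optional[set]  # None models "all users"
    functions: set        # e.g. {"DataAnalysis", "Data Studio"}
    covered_items: set    # projects or databases
    data_types: set
    exceptions: list = field(default_factory=list)

    def triggers_masking(self, user, function, item, data_type, now) -> bool:
        effective = (
            (self.users is None or user in self.users)
            and function in self.functions
            and item in self.covered_items
            and data_type in self.data_types
        )
        if not effective:
            return False
        # An applicable exception (whitelist) suppresses masking.
        return not any(e.applies(user, data_type, now) for e in self.exceptions)


policy = Policy(
    name="mask-phones",
    users=None,
    functions={"DataAnalysis", "Data Studio"},
    covered_items={"sales_project"},
    data_types={"phone_number"},
    exceptions=[
        ExceptionCondition(
            {"phone_number"}, {"auditor"},
            start=datetime(2024, 1, 1), end=datetime(2024, 12, 31),
        )
    ],
)
now = datetime(2024, 6, 1)
print(policy.triggers_masking("analyst", "DataAnalysis", "sales_project", "phone_number", now))  # True
print(policy.triggers_masking("auditor", "DataAnalysis", "sales_project", "phone_number", now))  # False
```

In this sketch, once the time range of the exception has passed, the whitelisted user is masked again like any other user.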
Enable dynamic data masking
For data types that have masking rules configured, DataWorks evaluates the masking policies in priority order within each enabled workspace and applies the first policy that matches.
You must enable dynamic data masking for the workspace. After this feature is enabled, the dynamic data masking policies for Data Development and DataAnalysis take effect.
On the Dynamic desensitization tab, click the Workspace Management tab.
On the Workspace Management tab, you can enable or disable a single workspace in the Status column. You can also select multiple workspaces and click Batch Enable or Batch Disable in the lower-left corner.
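The ordered, first-match evaluation described in this section, together with the effect of moving a policy up, can be sketched as follows. The function names, the request dictionary, and the policy names are hypothetical and only illustrate the behavior.

```python
from typing import Optional


def first_matching_policy(policies, request, enabled_workspaces) -> Optional[str]:
    """Return the name of the first policy whose predicate matches, or None.

    `policies` is an ordered list of (name, predicate) pairs. Nothing is
    evaluated if the workspace has not been enabled for dynamic masking.
    """
    if request["workspace"] not in enabled_workspaces:
        return None
    for name, matches in policies:
        if matches(request):
            return name
    return None


def move_up(policies, index):
    """Return a new list with policies[index] swapped with its predecessor."""
    reordered = list(policies)
    if 0 < index < len(reordered):
        reordered[index - 1], reordered[index] = reordered[index], reordered[index - 1]
    return reordered


policies = [
    ("all-users-phone", lambda r: r["data_type"] == "phone_number"),
    ("analysts-phone", lambda r: r["user"] == "analyst" and r["data_type"] == "phone_number"),
]
request = {"workspace": "ws_prod", "user": "analyst", "data_type": "phone_number"}

print(first_matching_policy(policies, request, {"ws_prod"}))              # all-users-phone
print(first_matching_policy(move_up(policies, 1), request, {"ws_prod"}))  # analysts-phone
print(first_matching_policy(policies, request, set()))                    # None
```

Because only the first match is applied, a broad policy placed above a narrower one shadows it, which is why the matching order matters.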
Engine-level masking
Engine-level masking is supported for MaxCompute and Hologres. The configuration process is similar to that for dynamic desensitization policies, but the supported masking algorithms are different. For more information, see Dynamic data masking policies.
Static data masking
Static data masking physically replaces sensitive data when it is written to a destination. The destination stores only the masked data, and the replacement is permanent.
Static data masking rules apply only to real-time sync tasks in DataWorks Data Integration where this feature is configured. This feature is enabled by default. You can disable it if necessary.
On the Data desensitization page, click the Static desensitization tab.
In the upper-left corner, click the New Rule button to configure a static masking rule. The key parameters are described below:
Data Type: Select the data type to apply the masking rule to, such as "Bank Card Number". You can select an existing type or add a new one.
Desensitization rule name: Enter a clear, descriptive name for the rule.
Desensitization mode: Select a masking algorithm:
Masking: Precisely define the character ranges to mask or preserve. For example, from left to right, mask characters 1 to 2, and do not mask characters 3 to 4.
Hashing: Set a salt value to increase the security of the hash.
Custom Format-preserving Transformation: Set the masking feature value and the character set for replacement.
Data watermark: If enabled, the system embeds an invisible watermark into the data during the masking operation. This watermark can be used to trace sensitive data. For more information, see Data Traceability.
Enabled: Select Enable Now or Not Enabled. Only enabled rules take effect in relevant sync tasks.
Effect Verification: A verification tool is provided. Enter sample data in the Raw Data box, click Verify now, and check whether Data after desensitization meets your expectations.
After you finish the configuration, click Confirm to save the rule.
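To make the three masking modes more concrete, the sketch below shows one possible interpretation of each. It is not the algorithm that DataWorks uses. The function names are hypothetical, SHA-256 is an assumed hash, and the `seed` parameter is only a stand-in for the masking feature value. The third function is a simple keyed character substitution, not a standardized format-preserving encryption scheme.

```python
import hashlib


def mask_ranges(value: str, masked_ranges, mask_char: str = "*") -> str:
    """Mask 1-based, inclusive character ranges counted from left to right."""
    chars = list(value)
    for start, end in masked_ranges:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def salted_hash(value: str, salt: str) -> str:
    """Hash salt + value with SHA-256 (the hash algorithm is an assumption)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def format_preserving_replace(value: str, seed: str, charset: str = "0123456789") -> str:
    """Replace each character that belongs to `charset` with a character from
    the same set, chosen deterministically from `seed` and `value`.
    Characters outside the set (such as separators) are kept, so the length
    and layout of the value are preserved."""
    digest = hashlib.sha256((seed + value).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(value):
        if ch in charset:
            out.append(charset[digest[i % len(digest)] % len(charset)])
        else:
            out.append(ch)
    return "".join(out)


# Mask characters 1 to 2 and leave the remaining characters unmasked.
print(mask_ranges("6222021234", [(1, 2)]))  # **22021234
print(salted_hash("6222021234", salt="example-salt"))
print(format_preserving_replace("6222-0212", seed="example-seed"))
```

Checking a few sample values in this way serves the same purpose as the Effect Verification tool: it confirms that the output has the expected shape before the rule is enabled.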