Dynamic data masking lets users with query access to an E-MapReduce (EMR) cluster see transformed values instead of raw sensitive data — without altering the underlying data. At query time, Data Security Guard intercepts the result and applies masking rules based on the user identity and the masking scenario. This topic walks through an end-to-end example: creating a test table, configuring sensitive data identification and masking rules, and verifying that masked values appear in query results.
How it works
Data Security Guard syncs EMR metadata once per day and uses sensitive data identification rules to classify fields (for example, phone, email, gender) as sensitive.
When a user queries the table in Data Studio or Data Map, Data Security Guard matches the query context to a masking scenario and applies the corresponding masking rule to each sensitive field.
The user sees the masked value (for example, 138****8888) in the result set. The underlying data is unchanged.
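The flow above can be pictured with a small conceptual model. This is only a sketch under stated assumptions, not how Data Security Guard is implemented: the names SENSITIVE_FIELDS, MASKING_RULES, redact_phone, and mask_row are hypothetical, and only a phone rule is modelled.

```python
# Conceptual model only; every name in this block is hypothetical.

# Result of sensitive data identification: column name -> sensitive field type.
SENSITIVE_FIELDS = {"phone": "phone", "email": "email", "gender": "gender"}


def redact_phone(value: str) -> str:
    """Keep the first three and last four characters and hide the rest."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


# Masking rules: scenario name -> sensitive field type -> masking function.
# Only the phone rule is modelled to keep the sketch short.
MASKING_RULES = {
    "development demonstration": {"phone": redact_phone},
    "SQL analysis": {"phone": redact_phone},
}


def mask_row(row: dict, scenario: str) -> dict:
    """Return a masked copy of one result row; the input row is not modified."""
    rules = MASKING_RULES.get(scenario, {})
    masked = {}
    for column, value in row.items():
        field_type = SENSITIVE_FIELDS.get(column)
        rule = rules.get(field_type)
        masked[column] = rule(value) if rule else value
    return masked


stored = {"username": "alice", "phone": "13812348888"}
shown = mask_row(stored, "SQL analysis")
print(shown)   # the phone column is redacted in the returned copy
print(stored)  # the stored row still holds the raw value
```

The point of the model is that masking produces a transformed copy at query time, which is why the stored data never changes.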
Limitations
EMR clusters support only the sensitive data identification and dynamic data masking features of Data Security Guard. Other Data Security Guard features are not supported.
Sensitive data identification and data masking are supported only by specific types of EMR clusters and tables. For details, see the Which types of Hive tables support data preview in Data Map? section in the "Data governance" topic.
Data Security Guard updates its copy of the EMR metadata with a one-day delay, so the EMR table to be masked must be created at least one day before you configure masking.
Only exclusive resource groups for scheduling are supported. For details, see Exclusive resource groups for scheduling.
Prerequisites
Before you begin, make sure that you have:
An EMR cluster with the required table permissions
Access to DataWorks Data Security Guard (your Alibaba Cloud account must be granted the required permissions)
(Conditional) If Lightweight Directory Access Protocol (LDAP) or Kerberos authentication is enabled for your EMR cluster and Ranger or DLF-Auth manages table permissions, a mapping configured between your Alibaba Cloud account and the EMR cluster account. The mapped account must have access to the target tables. For details, see Data Studio (legacy version): Associate an EMR computing resource.
Note: By default, Data Security Guard uses the EMR cluster account that maps to your Alibaba Cloud account to sample data.
Masking scenarios in this example
Data Security Guard organizes masking rules into a two-level hierarchy. Level-1 scenarios are fixed and define the display context. Level-2 scenarios are custom scenarios you create under a level-1 scenario.
This example uses two level-2 scenarios to cover the most common query surfaces:
| Level-2 scenario name | Based on level-1 scenario | Best used for |
|---|---|---|
| development demonstration | Data development / Data map display desensitization | Previewing data in Data Map and Data Studio |
| SQL analysis | Data analysis and display desensitization | Running ad hoc queries and analytical workloads |
Step 1: Create an EMR table
Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the workspace and click Go to Data Development.
In the DATASTUDIO pane, click the Create icon and choose Create Node > EMR > EMR Hive.
In the node editor, run the following SQL statement to create the onefall_test_dsg table:
    CREATE TABLE IF NOT EXISTS onefall_test_dsg (
        username STRING,
        gender   STRING,
        phone    STRING,
        email    STRING,
        card_no  STRING,
        address  STRING,
        zip_code STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Import test data into the table.
Download data.csv.
Import the data using one of the following methods:
From an EMR cluster node: Upload data.csv to a node in the EMR cluster and run:
    LOAD DATA LOCAL INPATH '/…/data.csv' OVERWRITE INTO TABLE onefall_test_dsg;
From an Object Storage Service (OSS) bucket: Upload data.csv to an OSS bucket and run:
    LOAD DATA INPATH 'oss://bucket-name.Endpoint/…/data.csv' OVERWRITE INTO TABLE onefall_test_dsg;
Wait until the next day before proceeding. Data Security Guard syncs EMR metadata once per day, so the table must exist for at least one day before masking takes effect.
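If you prefer to build your own test file instead of downloading data.csv, the table definition in this step implies a plain comma-delimited file with seven fields per line and no header row. The following is a minimal sketch of generating such a file; the rows are hypothetical sample values (not the contents of the real data.csv), and COLUMNS, ROWS, and to_hive_csv are illustrative names.

```python
# Hypothetical sample data; this is not the content of the real data.csv.
COLUMNS = ["username", "gender", "phone", "email", "card_no", "address", "zip_code"]

ROWS = [
    ["alice", "Female", "13812348888", "alice@example.com",
     "6222000011112222", "1 Example Road", "100000"],
    ["bob", "Male", "13900001111", "bob@example.com",
     "6222000033334444", "2 Sample Street", "200000"],
]


def to_hive_csv(rows):
    """Render rows as comma-delimited text with no header and no quoting.

    A table declared with FIELDS TERMINATED BY ',' splits on every comma,
    so a field that itself contains a comma or a newline is rejected here
    instead of being quoted.
    """
    lines = []
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        for field in row:
            if "," in field or "\n" in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the file that would then be uploaded and loaded into the table.
    with open("data.csv", "w", encoding="utf-8") as handle:
        handle.write(to_hive_csv(ROWS))
```

Writing the lines by hand rather than with a CSV library is deliberate: CSV quoting would wrap some fields in double quotes, and a simple delimited Hive table would store those quotes as part of the value.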
Step 2: Create sensitive data identification rules
DataWorks uses sensitive data identification rules to classify fields in EMR tables as sensitive. You must publish these rules before configuring masking rules.
In this example, you create rules to identify the gender, phone, and email fields in onefall_test_dsg.
Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the workspace and click Go to Data Development.
Click the icon in the upper-left corner, then choose All Products > Data Governance > Data Security Guard. Click Try Now to go to the Data Security Guard page.
Note: If your account is not yet granted the required permissions, you are redirected to the authorization page. Complete authorization before proceeding.
In the left-side navigation pane, choose Rule Configuration > Sensitive Data Identification. The Data Identification Rules tab appears.
In the BuildInClassificationTemplate section on the left, select the data category for the sensitive field types you want to create. For details, see Configure sensitive data detection rules and run tasks.
In the upper-right corner of the tab, click Sensitive Field Type to open the configuration panel. Create a sensitive field type for each of gender, phone, and email. Using the field names from onefall_test_dsg as the type names makes them easier to identify. For details, see Configure sensitive data detection rules and run tasks.
After creating the rules, click Batch Publish in the upper-right corner and select all three rules to publish them.

Step 3: Configure data masking rules
For each of the three sensitive field types (gender, phone, email), configure a masking rule that applies to both level-2 scenarios. The masking mode differs by field:
| Field | Sensitive field type | Data masking rule name | Masking scenarios | Masking mode |
|---|---|---|---|---|
| gender | gender | gender | development demonstration, SQL analysis | Characters to replace → Replace with random value |
| email | email | email | development demonstration, SQL analysis | Hash → Encryption algorithm: MD5, Salt value: 5, Data watermarking: off |
| phone | phone | phone | development demonstration, SQL analysis | Masking out → Redaction mode: Show first three and last four characters |
Note: This example uses three different masking modes to illustrate the options. For a full description of each mode, see the Configure the data masking method section in "Create a data masking rule".
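To make the three masking modes concrete, here is a rough sketch of what each transformation does to a value. It is an illustration under assumptions, not Data Security Guard's actual algorithm: the function names are hypothetical, the hash mode is assumed to be MD5 with the salt prepended to the value, and random replacement is assumed to swap each letter or digit for a random one of the same kind.

```python
import hashlib
import random
import string


def mask_out(value: str, keep_first: int = 3, keep_last: int = 4) -> str:
    """Masking out: show the first and last characters, hide the middle."""
    if len(value) <= keep_first + keep_last:
        return "*" * len(value)  # too short to keep anything visible
    hidden = len(value) - keep_first - keep_last
    return value[:keep_first] + "*" * hidden + value[-keep_last:]


def hash_value(value: str, salt: str = "5") -> str:
    """Hash: MD5 over salt + value (the salt placement is an assumption)."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()


def random_replace(value: str, rng: random.Random) -> str:
    """Random replacement: swap letters and digits for random ones (assumed)."""
    replaced = []
    for char in value:
        if char.isdigit():
            replaced.append(rng.choice(string.digits))
        elif char.isalpha():
            replaced.append(rng.choice(string.ascii_letters))
        else:
            replaced.append(char)
    return "".join(replaced)


if __name__ == "__main__":
    rng = random.Random()
    print(mask_out("13812348888"))         # phone rule
    print(hash_value("user@example.com"))  # email rule
    print(random_replace("Male", rng))     # gender rule
```

The modes differ in what a reader can still do with the result: masking out keeps part of the value readable, hashing maps equal inputs to equal outputs so values can still be compared or joined, and random replacement keeps only the shape of the value.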
To configure the rules:
Log on to the DataWorks console and go to the Data Security Guard page. For details, see Overview.
Click Try Now to open the Data Security Guard homepage.
In the left-side navigation pane, choose Rule Configuration > Data Masking Management.
Create the two level-2 scenarios. For details, see Create a data masking scenario.
For each of the three sensitive field types, click Masking Rule in the upper-right corner and configure the rule using the values in the table above. For general instructions, see Create a data masking rule.
Step 4: Run the sensitive data identification task
After publishing the identification rules, run the task manually to classify fields in onefall_test_dsg without waiting for the next scheduled sync.
In the left-side navigation pane of Data Security Guard, choose Rule Configuration > Sensitive Data Identification.
In the upper-left corner, click Run Task. In the Enable sensitive data identification tasks panel, configure the following parameters:
| Parameter | Value for this example |
|---|---|
| Task type | Manual Tasks |
| Account used for identification | Alibaba Cloud Account |
| Content identification | Content recognition |
| Sampling quantity | 100 (default) |
| Scanning range | Partial data (select the workspace and database containing onefall_test_dsg) |
Click Run in the lower-right corner to start the task.
To view the task progress and results, go to the Task Execution Records tab on the Sensitive Data Identification page.
Verify the masking results
After the identification task completes, the gender, phone, and email fields are masked based on the rules you configured. The following table shows the expected transformation for each field:
| Field | Raw value (example) | Masked value |
|---|---|---|
| gender | Male | A random replacement value |
| phone | 13812348888 | 138****8888 |
| email | user@example.com | A hashed string (MD5 algorithm) |
Preview masked data in Data Map
Log on to the DataWorks console. In the left-side navigation pane, choose Data Governance > Data Map, then click Go to Data Map.
In the left-side navigation pane of the Data Map page, click the icon. In the top navigation bar dropdown, select E-MapReduce. Enter onefall_test_dsg in the search box.
Click the table name to open the table details page, then click the Data Preview tab.

The gender, phone, and email fields are masked based on the rules configured in Step 3.
View masked data in Data Studio
Masking in Data Studio page query results is controlled by the Mask Data in Page Query Results setting. Enable it before testing:
Log on to the DataWorks console. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the workspace and click Go to Data Development.
In the left-side navigation pane of the Data Studio page, click the icon to open the Settings page.
Click Security Settings and Others, then turn on Mask Data in Page Query Results in the Data Security section.
Test with an ad hoc query
Log on to the DataWorks console. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the workspace and click Go to Data Development.
In the left-side navigation pane, click the icon. In the Ad Hoc Query pane, hover over the icon and choose Create > EMR Hive.
Run the following query:
    SELECT * FROM onefall_test_dsg;
The gender, phone, and email fields in the result set are masked.
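If you want to check more than a handful of rows, the comparison can be scripted once you have a raw row (from your test file) and the corresponding row as displayed in the query result. The sketch below is one possible check, assuming both rows are available as dictionaries keyed by column name; check_masked_row and the constants are hypothetical names, and the gender column is not checked because a random replacement can coincide with the original value.

```python
import re

# Three visible characters, one or more asterisks, four visible characters.
PHONE_MASKED = re.compile(r".{3}\*+.{4}")

# Columns that no masking rule in this example targets.
UNMASKED_COLUMNS = ("username", "card_no", "address", "zip_code")


def check_masked_row(raw: dict, shown: dict) -> list:
    """Return a list of problems found in one displayed row (empty if none)."""
    problems = []

    phone = shown["phone"]
    phone_ok = (
        PHONE_MASKED.fullmatch(phone) is not None
        and phone[:3] == raw["phone"][:3]
        and phone[-4:] == raw["phone"][-4:]
    )
    if not phone_ok:
        problems.append("phone is not redacted as first-three/last-four")

    if shown["email"] == raw["email"]:
        problems.append("email is still shown in clear text")

    for column in UNMASKED_COLUMNS:
        if shown[column] != raw[column]:
            problems.append(f"{column} changed although no rule targets it")

    return problems
```

An empty list for every row suggests that the phone and email rules are applied and that the remaining columns pass through untouched.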
What's next
To learn more about sensitive data identification rule configuration, see Configure sensitive data detection rules and run tasks.
To create additional masking rules or scenarios, see Create a data masking rule and Create a data masking scenario.
To apply masking to production workloads, make sure your exclusive resource group for scheduling is configured. See Exclusive resource groups for scheduling.