If a user has permissions to query specific sensitive data in an E-MapReduce (EMR) cluster but you don't want them to view complete sensitive data, you can enable the dynamic data masking feature for EMR to dynamically mask sensitive data in query results. This topic describes how to enable the dynamic data masking feature for EMR and provides examples for reference.
Limits
EMR clusters support only the sensitive data identification and data masking features of Data Security Guard.
The sensitive data identification and data masking features are supported only by specific types of EMR clusters and tables. For more information, see Which types of Hive tables can be previewed in Data Map?
Metadata in Data Security Guard is updated with a one-day delay. Therefore, an EMR table must be created at least one day before you can mask its data.
Only exclusive resource groups for scheduling are supported. For more information, see Exclusive resource groups for scheduling.
Preparations
Prerequisites
By default, Data Security Guard uses the EMR cluster account that maps to your Alibaba Cloud account to sample data. If Lightweight Directory Access Protocol (LDAP) or Kerberos authentication is enabled for your EMR cluster and Ranger or DLF-Auth is used to manage table permissions, you must configure a mapping between the Alibaba Cloud account and the EMR cluster account. You must ensure that the mapped EMR cluster account has the required permissions to access tables in the EMR cluster. For more information, see DataStudio (old version): Associate an EMR computing resource.
Prepare data
Create an EMR table
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
On the Data Development page, click Create and select to create a Hive node.
Modify the node code to create the onefall_test_dsg table:
CREATE TABLE IF NOT EXISTS onefall_test_dsg (
  username STRING,
  gender   STRING,
  phone    STRING,
  email    STRING,
  card_no  STRING,
  address  STRING,
  zip_code STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Import test data to the onefall_test_dsg table.
Download the test data file data.csv.
Import the test data.
Use either of the following methods:
Upload the data.csv file to a node in the EMR cluster and execute the following SQL statement to load the test data:
LOAD DATA LOCAL INPATH '/…/data.csv' OVERWRITE INTO TABLE onefall_test_dsg;
Upload the data.csv file to an Object Storage Service (OSS) bucket and execute the following SQL statement to load the test data:
LOAD DATA INPATH 'oss://bucket-name.Endpoint/…/data.csv' OVERWRITE INTO TABLE onefall_test_dsg;
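The table uses ROW FORMAT DELIMITED FIELDS TERMINATED BY ','. Hive therefore splits each line of the file on every comma and does not treat quotation marks specially. A comma inside a value, such as an address, shifts the remaining columns. The following minimal Python sketch flags lines that do not split into exactly the seven columns of onefall_test_dsg, so you can catch this before you upload data.csv. It assumes data.csv is a plain comma-separated text file. The function name find_bad_rows is hypothetical.

```python
# Column order of onefall_test_dsg, as declared in the CREATE TABLE statement.
COLUMNS = ["username", "gender", "phone", "email", "card_no", "address", "zip_code"]


def find_bad_rows(lines):
    """Return (line_number, field_count) for each line whose comma-split
    field count differs from the number of table columns.

    Splitting on every comma mirrors a ROW FORMAT DELIMITED table with
    FIELDS TERMINATED BY ',': quotation marks get no special treatment,
    so a comma inside a value produces an extra field.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(",")
        if len(fields) != len(COLUMNS):
            bad.append((number, len(fields)))
    return bad


# Usage (file name assumed):
# with open("data.csv", encoding="utf-8") as f:
#     print(find_bad_rows(f))  # an empty list means every line has seven fields
```

An empty line is also reported, because Hive would load it as a row with an empty username and NULL values in the other columns.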
Update metadata at the Data Security Guard side
Metadata in Data Security Guard is updated with a one-day delay. After you create and publish the onefall_test_dsg table, wait until the next day before you configure data masking.
Configure data masking
Step 1: Create a sensitive data identification rule
DataWorks uses sensitive data identification rules to identify sensitive fields in E-MapReduce tables. You must configure sensitive data identification rules before you configure data masking rules. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task.
Go to the Data Identification Rules tab
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Click the icon in the upper-left corner. Then, choose . On the page that appears, click Try Now to go to the Data Security Guard page.
Note: If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.
If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.
In the navigation pane on the left, click . The Data Identification Rules page appears.
Configure sensitive data identification rules
This example shows how to create sensitive data identification rules to identify and mask the gender, phone, and email fields in the onefall_test_dsg table that you created in the Prepare data section.
Specify a data category for the sensitive field types that you want to create.
In the Built-in Classification Template area on the left, select the data category for the new sensitive field type. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task.
Create a sensitive field type and configure a sensitive data identification rule for this type.
In the upper-right corner, click Sensitive Field Type. The sensitive data identification rule configuration page is displayed. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task.
Note: To make this example easier to follow, you can name the sensitive field types after the field names of the onefall_test_dsg table: gender, phone, and email.
After you configure the data identification rules, click Batch Publish in the upper-right corner and select the created rules to publish them.
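In this example, the sensitive field types share the names of the table's fields, which makes a metadata-based (field-name) identification rule easy to picture. The following Python sketch is a conceptual illustration only. It does not describe how Data Security Guard is implemented. It assumes that a rule matches a column when the column name equals one of the rule's field names, ignoring case. The FIELD_NAME_RULES mapping and the identify_by_name function are hypothetical.

```python
# Hypothetical field-name rules: sensitive field type -> column names it matches.
FIELD_NAME_RULES = {
    "gender": {"gender"},
    "phone": {"phone"},
    "email": {"email"},
}


def identify_by_name(columns, rules=FIELD_NAME_RULES):
    """Map each matching column to its sensitive field type.

    Matching is an exact, case-insensitive comparison of column names,
    which is an assumption made for this illustration.
    """
    found = {}
    for column in columns:
        for field_type, names in rules.items():
            if column.lower() in names:
                found[column] = field_type
                break  # the first matching rule wins
    return found


# Columns of onefall_test_dsg from the CREATE TABLE statement.
TABLE_COLUMNS = ["username", "gender", "phone", "email", "card_no", "address", "zip_code"]
# identify_by_name(TABLE_COLUMNS) -> {"gender": "gender", "phone": "phone", "email": "email"}
```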

Step 2: Configure data masking management
DataWorks lets you configure data masking rules to mask sensitive fields in E-MapReduce tables. For more information, see Create a data masking rule.
Go to the Data Masking Management page
Log on to the DataWorks console and go to the Data Security Guard page. For more information, see Data Security Guard.
Click Try Now. The Data Security Guard homepage appears.
In the navigation pane on the left, click . On the Data Masking Management page, you can create data masking scenarios and configure data masking rules.
Create a data masking scenario
DataWorks provides several fixed, level-1 data masking scenarios. These include dynamic data masking scenarios, such as Data Development/Data Map Display Masking, Data Analysis Display Masking, MaxCompute Engine-layer Masking, and Hologres Engine-layer Masking, and the static data masking scenario of Data Integration Static Masking. You cannot add, edit, or delete these built-in scenarios. However, you can create custom level-2 scenarios based on the level-1 scenarios to meet your business requirements. For more information, see Create a data masking scenario.
This example focuses on the Data Development/Data Map Display Masking and Data Analysis Display Masking scenarios.
In this example, the following level-2 scenarios are created:
Under Data Development/Data Map Display Masking: Development Display
Under Data Analysis Display Masking: SQL analysis
Create a data masking rule
After you create a data masking scenario, you can click Masking Rule in the upper-right corner to create a data masking rule. Repeat the steps to create data masking rules for the gender, phone, and email sensitive field types. For more information, see Create a data masking rule.
Select a data masking scenario.
On the Data Masking Management page, set Masking Scenario to the level-2 scenario that you created and click + Masking Rule on the right.
Create a data masking rule.
On the Create Data Masking Rule page, you can configure items such as Sensitive Field Type, Data Masking Rule Name, Data Masking Scenario, and Data Masking Method. For more information, see Data Masking Rule Configuration.
The following table describes the configuration of the data masking rule for each created sensitive field type.
Parameter | gender | email | phone
Sensitive Field Type | gender | email | phone
Data Masking Rule Name | gender | email | phone
Data Masking Scenario | Development Display and SQL analysis | Development Display and SQL analysis | Development Display and SQL analysis
Masking Method | Characters to Replace (Replacement Position: Replace All; replaced with random values) | HASH (Data Watermark: Turned Off; Encryption Algorithm: MD5; Salt Value: 5) | Masking Out (redaction)
Note: Multiple data masking methods are available. This example uses Characters to Replace, HASH, and Masking Out. For more information, see Configure the data masking method.
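The following Python sketch imitates the general effect of the three masking methods used in this example (Characters to Replace, HASH, and Masking Out) on single values. It is an approximation under stated assumptions, not the algorithm that Data Security Guard uses:

- Character replacement draws a random character of the same kind (digit or letter).
- The salt value is appended to the input before MD5 hashing.
- Masking out replaces every character with an asterisk.

The function names are hypothetical.

```python
import hashlib
import random
import string


def replace_all_random(value, rng=random):
    """Characters to Replace (Replace All, random values): swap every
    letter or digit for a random one of the same kind. Keeping the
    character kind is an assumption of this sketch."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators such as '@' or '-'
    return "".join(out)


def hash_with_salt(value, salt="5"):
    """HASH with MD5 and a salt value. Appending the salt to the value
    is an assumption; the actual salting scheme may differ."""
    return hashlib.md5((value + salt).encode("utf-8")).hexdigest()


def mask_out(value, mask_char="*"):
    """Masking Out: hide the whole value. The mask character is assumed."""
    return mask_char * len(value)


# Usage:
# replace_all_random("female", random.Random(0))  # six random letters
# hash_with_salt("user@example.com")              # 32-character hex digest
# mask_out("1234")                                # "****"
```

With a fixed salt, hashing is deterministic: the same email always produces the same digest, so masked values can still be grouped or joined. Random replacement and masking out do not preserve this property.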
Step 3: Enable sensitive data identification
Every day, after Data Security Guard in the production environment obtains EMR metadata, it calls DataWorks API operations to obtain sample data from tables and identifies sensitive fields based on the sensitive data identification rules. In this example, you run a sensitive data identification task manually to identify the sensitive fields.
In the navigation pane on the left, click . The Sensitive Data Identification page appears.
In the upper-left corner of the Sensitive Data Identification page, click Run Task. In the Enable Sensitive Data Identification Task panel, configure the parameters.
Task Type: one-time task.
Account Used For Identification: The current account is used to sample and scan data. The range of data that can be sampled varies based on the account permissions. In this example, Alibaba Cloud Account is selected.
Content Identification: Set it to Content recognition or metadata recognition. In this example, Content recognition is selected.
Sampling Quantity: You can specify a custom number of samples. We recommend that you use the default value of 100.
Scan Scope: Set to Custom Scope to specify the projects or databases to be scanned.

In this example, the table to scan is onefall_test_dsg.
After you select the scanning range, click Run in the lower-right corner of the panel to start the sensitive data identification task.
Note: On the Sensitive Data Identification page, you can click Task Execution Records to view the execution details of the sensitive data identification task.
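With content recognition selected, identification works on sampled values rather than on column names. The following Python sketch illustrates the idea under these assumptions:

- Up to 100 values per column are sampled, matching the recommended Sampling Quantity.
- Each value is tested against a regular expression for the sensitive field type.
- A column is flagged when the share of matching values reaches a threshold.

The patterns, the 0.8 threshold, and the function names are hypothetical. They are not the built-in rules of Data Security Guard.

```python
import random
import re

# Hypothetical content patterns; Data Security Guard ships its own rules.
PATTERNS = {
    "phone": re.compile(r"1[3-9]\d{9}"),  # 11-digit mainland China mobile number
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def sample(values, size=100, seed=0):
    """Return at most `size` values, drawn without replacement."""
    values = list(values)
    if len(values) <= size:
        return values
    return random.Random(seed).sample(values, size)


def identify_by_content(column_values, size=100, threshold=0.8):
    """Map each column to the first field type whose pattern fully matches
    at least `threshold` of the sampled non-null values."""
    found = {}
    for column, values in column_values.items():
        sampled = [v for v in sample(values, size) if v is not None]
        if not sampled:
            continue  # nothing to judge
        for field_type, pattern in PATTERNS.items():
            hits = sum(1 for v in sampled if pattern.fullmatch(v))
            if hits / len(sampled) >= threshold:
                found[column] = field_type
                break
    return found
```

A threshold below 1.0 tolerates a few malformed values in a column. It also means that a column in which only a few values look sensitive is not flagged.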
View the execution results of SQL statements
Preview the data masking result of the EMR table
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Map.
Click the button on the left. On the search page that appears, click the drop-down list at the top of the page and select the E-MapReduce data source. Then, enter the table name onefall_test_dsg in the search box.
Click the name of the onefall_test_dsg table to go to the details page of the table. Then, click the Data Preview tab to preview the table data.

On the Data Preview tab, the fields in the table are masked based on the configured sensitive data identification rules and data masking rules.
View the data masking result on the Data Studio page
Whether query results on the Data Studio page are masked is controlled by the Mask Data in Page Query Results parameter in the Data Security section of the Security Settings and Others tab in Data Studio. To configure the parameter, perform the following steps:
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the navigation pane on the left of the Data Studio page, click the icon. The Settings page appears.
On the Settings page, click Security Settings and Others and turn on the Mask Data in Page Query Results switch.
Test the masking effect of queried data
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the navigation pane on the left, click the icon. In the Ad Hoc Query pane, click the icon and select the option to create an ad hoc query node.
In the node, query the onefall_test_dsg table and view the masking effect on the Data Development page:
SELECT * FROM onefall_test_dsg;
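You can check the result without relying on a specific masked format. Compare a row returned by the query with the same row of the original data:

- The columns covered by masking rules (gender, phone, and email in this example) should differ.
- The remaining columns should be unchanged.

The following minimal Python sketch performs this comparison. It assumes you have both rows as dictionaries keyed by column name, for example copied from data.csv and from the query result. The function name check_masking is hypothetical.

```python
# Fields covered by masking rules in this example.
SENSITIVE = {"gender", "phone", "email"}


def check_masking(original, displayed, sensitive=SENSITIVE):
    """Return a list of problems found when comparing one original row
    with the same row as displayed in query results."""
    problems = []
    for column, raw in original.items():
        if column not in displayed:
            problems.append(f"{column} is missing from the query result")
            continue
        shown = displayed[column]
        if column in sensitive and shown == raw:
            problems.append(f"{column} is not masked")
        elif column not in sensitive and shown != raw:
            problems.append(f"{column} changed unexpectedly")
    return problems
```

Random character replacement can occasionally reproduce the original value when that value is very short. Treat a single unchanged short gender value as inconclusive rather than as a failure.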