If a MaxCompute project contains sensitive data that should be visible only to specific users, you can enable the dynamic data masking feature of MaxCompute. When unauthorized users access or view the data, the sensitive values are hidden or replaced in real time, which helps prevent data leaks. This topic describes how to enable the dynamic data masking feature of MaxCompute and provides usage examples.
Background information
The dynamic data masking feature of MaxCompute depends on Data Security Guard of DataWorks. You must activate Data Security Guard of DataWorks before you can enable the dynamic data masking feature for a MaxCompute project.
After you enable the dynamic data masking feature for a MaxCompute project, you can configure data masking rules for the project based on the data identification rules that are configured in DataWorks. The masking rules define the types of data that you want to mask. When you query sensitive data from the MaxCompute client or Logview, the returned data is masked based on the configured masking rules. The dynamic data masking feature can effectively protect sensitive data, such as mobile phone numbers, ID card numbers, bank card numbers, license plate numbers, and IP addresses. After the dynamic data masking feature is enabled, only sensitive data in the query results is masked and the data that is stored at the underlying layer is not affected.
We recommend that you use the data identification rules that are preset in DataWorks. For more information about how to configure custom data identification rules, see Configure sensitive data identification rules.
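The following Python sketch illustrates the idea that only query results are masked while the stored data stays unchanged. It is a conceptual illustration, not the masking logic of MaxCompute or Data Security Guard: the `mask_middle` rule, the `query` helper, and the sample row are all hypothetical, and the actual masked format depends on the masking rules that you configure.

```python
def mask_middle(value, keep_head=3, keep_tail=4, fill="*"):
    """Hypothetical masking rule: keep the first and last characters, hide the rest."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[-keep_tail:]


def query(rows, masked_columns):
    """Return masked copies of the rows; the stored rows are never modified."""
    return [
        {col: mask_middle(val) if col in masked_columns else val
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical stored data with one sensitive STRING column.
stored_rows = [{"name": "user_a", "phone": "13800001234"}]

result = query(stored_rows, {"phone"})
print(result[0]["phone"])       # masked value returned to the user
print(stored_rows[0]["phone"])  # stored value is unchanged
```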
Limits
You can use the dynamic data masking feature only if DataWorks Professional Edition or a more advanced edition is used. If you use DataWorks Basic Edition, upgrade DataWorks to an appropriate edition based on your business requirements. For more information about differences among DataWorks editions, see Differences among DataWorks editions.
The underlying data masking service of MaxCompute can be used for MaxCompute projects that reside in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Chengdu), China (Shenzhen), China North 2 Ali Gov 1, China East 2 Finance, China (Hong Kong), Singapore, Germany (Frankfurt), Malaysia (Kuala Lumpur), and US (Silicon Valley).
The configuration of the underlying data masking service of MaxCompute takes effect at the session level. In each session in which you query data, you must run the SET commands that call the masking service for the masking configuration to take effect.
The underlying data masking service of MaxCompute cannot be used to mask primary keys in MaxCompute tables.
The underlying data masking service of MaxCompute can be used only to mask fields of the STRING type.
The data masking feature can be used only if data already exists in the MaxCompute project and at least 24 hours have passed since the data was created.
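The limits above can be summarized as a small pre-check. The following Python sketch is only an illustration: the function name `masking_blockers` and its arguments are hypothetical, the region names are copied from the list above, and the 24-hour condition is read as "at least 24 hours since the data was created", which is an assumption. The DataWorks edition requirement is not covered.

```python
from datetime import datetime, timedelta

# Regions listed in this topic as supported by the underlying masking service.
SUPPORTED_REGIONS = {
    "China (Beijing)", "China (Shanghai)", "China (Hangzhou)", "China (Chengdu)",
    "China (Shenzhen)", "China North 2 Ali Gov 1", "China East 2 Finance",
    "China (Hong Kong)", "Singapore", "Germany (Frankfurt)",
    "Malaysia (Kuala Lumpur)", "US (Silicon Valley)",
}


def masking_blockers(region, data_type, is_primary_key, data_created_at, now):
    """Return the reasons why a column cannot be masked (empty list if none)."""
    blockers = []
    if region not in SUPPORTED_REGIONS:
        blockers.append("region is not supported")
    if data_type.upper() != "STRING":
        blockers.append("only STRING fields can be masked")
    if is_primary_key:
        blockers.append("primary keys cannot be masked")
    # Assumption: the data must be at least 24 hours old.
    if now - data_created_at < timedelta(hours=24):
        blockers.append("data was created less than 24 hours ago")
    return blockers


now = datetime(2024, 1, 3, 12, 0)
print(masking_blockers("China (Hangzhou)", "string", False, datetime(2024, 1, 1), now))
```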
Preparations
Prepare a MaxCompute project and data for masking. For more information, see Create a MaxCompute project and Import data to tables.
Go to the Data Security Guard page and activate Data Security Guard. For more information, see the "Go to the Data Security Guard page" section in Overview.
On the Terms of Service page, read the terms, select I have read and agree to all the preceding terms, and then click Activate.
Apply for a whitelist.
Use your Alibaba Cloud account to submit a ticket to apply for access to external networks from your MaxCompute project. You can call the data masking service only after the application is approved.
If no access control is imposed on the destination IP address or endpoint, you can use your MaxCompute project to access the destination IP address or endpoint after the application is approved. The application processing period does not exceed three business days.
The following sample code shows the format of the application content.
Project name (name of the project for which you want to enable data masking): data_shield_hz
Log address:
Description: Enable an endpoint whitelist for the project to ensure that a specific user-defined function (UDF) can access the endpoints in the whitelist.
Region: China (Hangzhou)
Destination endpoints: dsg-cn-hangzhou.data.aliyun.com and dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com
Port numbers: 80 and 443
The endpoints vary with regions. The following content lists the endpoints that correspond to different regions.
China (Shanghai): dsg-cn-shanghai.data.aliyun.com, dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com
China (Hangzhou): dsg-cn-hangzhou.data.aliyun.com, dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com
China (Beijing): dsg-cn-beijing.data.aliyun.com, dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com
China (Chengdu): dsg-cn-chengdu.data.aliyun.com, dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com
China (Shenzhen): dsg-cn-shenzhen.data.aliyun.com, dsg-oss-dic-ori-sz.oss-cn-shenzhen.aliyuncs.com
China North 2 Ali Gov 1: dsg-cn-north-2-gov-1.data.aliyun.com, dsg-oss-dic-ori-north-2-gov-1.oss-cn-north-2-gov-1-internal.aliyuncs.com
China East 2 Finance: dsg-cn-shanghai-finance-1.data.aliyun.com, dsg-oss-dic-ori-sh-fin-1.oss-cn-shanghai.aliyuncs.com
China (Hong Kong): dsg-cn-hongkong.data.aliyun.com, dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com
Singapore: dsg-ap-southeast-1.data.aliyun.com, dsg-oss-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com
US (Silicon Valley): dsg-us-west-1.data.aliyun.com, dsg-oss-us-west-1.oss-us-west-1.aliyuncs.com
Malaysia (Kuala Lumpur): dsg-ap-southeast-3.data.aliyun.com, dsg-oss-ap-malaysia.oss-ap-southeast-3.aliyuncs.com
Germany (Frankfurt): dsg-eu-central-1.data.aliyun.com, dsg-oss-eu-central-1.oss-eu-central-1-internal.aliyuncs.com
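Each region has two endpoints, and the sample application above requests ports 80 and 443. The following Python sketch shows one way to derive a comma-separated host:port list from that information. The `ENDPOINTS` mapping contains only three of the regions listed above, the helper name `access_list` is hypothetical, and applying ports 80 and 443 to every region is an assumption based on the China (Hangzhou) sample.

```python
# Endpoints per region, copied from the list in this topic (subset only).
ENDPOINTS = {
    "China (Shanghai)": (
        "dsg-cn-shanghai.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com",
    ),
    "China (Hangzhou)": (
        "dsg-cn-hangzhou.data.aliyun.com",
        "dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com",
    ),
    "China (Beijing)": (
        "dsg-cn-beijing.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com",
    ),
}

# Assumption: every region uses the same two ports as the Hangzhou sample.
PORTS = (80, 443)


def access_list(region):
    """Build a comma-separated host:port list for the endpoints of a region."""
    hosts = ENDPOINTS[region]  # raises KeyError for regions not in the mapping
    return ",".join(f"{host}:{port}" for host in hosts for port in PORTS)


print(access_list("China (Hangzhou)"))
```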
Enable the data masking feature
Select a data masking scenario.
Log on to the DataWorks console and go to the Data Security Guard page. For more information, see the "Go to the Data Security Guard page" section in Overview.
In the left-side navigation pane, go to the Data Masking page.
On the Data Masking page, click Layer masking of the MaxCompute engine in the Masking Scene section.
Note: To show the data masking effect in the DataWorks console, you must enable masking of displayed data in DataStudio and Data Map in the DataWorks console.
For more information about how to create a data masking scenario, see Create a data masking scenario.
Optional: If the data that is specified by the masking rule does not need to be masked for specific users, configure a masking rule whitelist.
On the Data Masking page, click the Whitelist tab.
In the upper-right corner of the Whitelist tab, click Add Account.
In the Add Account dialog box, configure the Rule, Account, and Effective From parameters.
Note: If a user account in the whitelist queries data outside the time range that is specified in the whitelist, sensitive data in the query results is still masked.
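The whitelist behavior described in the note can be sketched as follows. This Python example is a conceptual model only, not the Data Security Guard implementation: the `WhitelistEntry` type, its field names, and the `is_masked` helper are hypothetical, and modeling the Effective From setting as a start and end time is an assumption.

```python
from datetime import datetime
from typing import NamedTuple


class WhitelistEntry(NamedTuple):
    rule: str                 # masking rule that the entry applies to
    account: str              # account that is exempt from the rule
    effective_from: datetime  # start of the exemption (assumed field)
    effective_to: datetime    # end of the exemption (assumed field)


def is_masked(rule, account, query_time, whitelist):
    """Data is masked unless a matching entry covers the query time."""
    for entry in whitelist:
        if (entry.rule == rule and entry.account == account
                and entry.effective_from <= query_time <= entry.effective_to):
            return False
    return True


whitelist = [
    WhitelistEntry("phone_rule", "user_a", datetime(2024, 1, 1), datetime(2024, 1, 31)),
]

print(is_masked("phone_rule", "user_a", datetime(2024, 1, 15), whitelist))  # inside the range
print(is_masked("phone_rule", "user_a", datetime(2024, 2, 15), whitelist))  # outside the range
```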
View the execution results of SQL statements
Use the DataStudio page in the DataWorks console
Turn off the data masking switch. For more information, see the "Go to the Security Settings and Others tab" section in Configure settings on the Security Settings and Others tab.
Execute an SQL statement for data queries.
Before you execute an SQL statement, run SET commands to call the underlying masking service in the current session. The following code shows the SET commands that are used to call the underlying masking service in different regions.
Note: The underlying data masking service of MaxCompute can be used only at the session level.
China (Shanghai)
set odps.output.field.formatter={"name":"aegis:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-shanghai.data.aliyun.com:80,dsg-cn-shanghai.data.aliyun.com:443,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:80,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:443;

China (Hangzhou)
set odps.output.field.formatter={"name":"aegis_hz:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-hangzhou.data.aliyun.com:80,dsg-cn-hangzhou.data.aliyun.com:443,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:80,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:443;

China (Beijing)
set odps.output.field.formatter={"name":"aegis_bj:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-beijing.data.aliyun.com:80,dsg-cn-beijing.data.aliyun.com:443,dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com:80,dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com:443;

China (Chengdu)
set odps.output.field.formatter={"name":"aegis_cd:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-chengdu.data.aliyun.com:80,dsg-cn-chengdu.data.aliyun.com:443,dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com:80,dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com:443;

China (Hong Kong)
set odps.output.field.formatter={"name":"aegis_hk:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-hongkong.data.aliyun.com:80,dsg-cn-hongkong.data.aliyun.com:443,dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com:80,dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com:443;
The following table describes the parameters in the preceding commands.
| Parameter | Description |
| --- | --- |
| odps.output.field.formatter | The MaxCompute masking function that you want to call. The field that you want to mask must be of the STRING type. aegis_hz:<SchemaName>:masking_v2 is the function name; the prefix (aegis_hz in this example) varies by region, as shown in the preceding commands. SchemaName is the schema name, which you must specify if the three-layer schema model is configured for the MaxCompute project. For more information about schemas, see Schema-related operations. ["alias","index"] are the parameters of the function. These are default parameters. |
| odps.isolation.session.enable | Specifies whether to enable calls at the session level. After the session ends, the data masking feature no longer takes effect. |
| odps.internet.access.list | The list of endpoints that are accessed when you execute the specified function. The endpoints are used to query the masking information preconfigured in Data Security Guard. |
The following code shows a sample script for querying data from a MaxCompute project whose SchemaName is default in the China (Hangzhou) region after the underlying data masking service is enabled for the project.
set odps.output.field.formatter={"name":"aegis_hz:default:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
set odps.internet.access.list=dsg-cn-hangzhou.data.aliyun.com:80,dsg-cn-hangzhou.data.aliyun.com:443,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:80,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:443;
select * from table;
View the masking result on the DataStudio page.
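Because the three SET commands differ between regions only in the function name prefix and the endpoint list, they can be generated instead of copied by hand. The following Python sketch shows one way to do this. The helper name `masking_set_commands` is hypothetical, and the `FORMATTER_PREFIX` mapping covers only the regions for which this topic lists SET commands.

```python
import json

# Function name prefixes, copied from the SET commands listed in this topic.
FORMATTER_PREFIX = {
    "China (Shanghai)": "aegis",
    "China (Hangzhou)": "aegis_hz",
    "China (Beijing)": "aegis_bj",
    "China (Chengdu)": "aegis_cd",
    "China (Hong Kong)": "aegis_hk",
}


def masking_set_commands(region, schema_name, access_list):
    """Return the session-level SET commands that call the masking service."""
    formatter = {
        "name": f"{FORMATTER_PREFIX[region]}:{schema_name}:masking_v2",
        "param": ["alias", "index"],  # default parameters
    }
    # Compact separators reproduce the spacing used in this topic.
    formatter_json = json.dumps(formatter, separators=(",", ":"))
    return [
        f"set odps.output.field.formatter={formatter_json};",
        "set odps.isolation.session.enable=true;",
        f"set odps.internet.access.list={access_list};",
    ]


hangzhou_access_list = (
    "dsg-cn-hangzhou.data.aliyun.com:80,dsg-cn-hangzhou.data.aliyun.com:443,"
    "dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:80,"
    "dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:443"
)
for command in masking_set_commands("China (Hangzhou)", "default", hangzhou_access_list):
    print(command)
```

The generated commands must still be run in the same session as the query, before the SELECT statement.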
Use the MaxCompute client (odpscmd)
Configure the endpoints.
Before you execute an SQL statement, configure the endpoints that you want to access in the Config file of the MaxCompute client.
The following code shows the endpoints that correspond to different regions.
China (Shanghai)
set odps.internet.access.list=dsg-cn-shanghai.data.aliyun.com:80,dsg-cn-shanghai.data.aliyun.com:443,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:80,dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com:443;

China (Hangzhou)
set odps.internet.access.list=dsg-cn-hangzhou.data.aliyun.com:80,dsg-cn-hangzhou.data.aliyun.com:443,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:80,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:443;

China (Beijing)
set odps.internet.access.list=dsg-cn-beijing.data.aliyun.com:80,dsg-cn-beijing.data.aliyun.com:443,dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com:80,dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com:443;

China (Chengdu)
set odps.internet.access.list=dsg-cn-chengdu.data.aliyun.com:80,dsg-cn-chengdu.data.aliyun.com:443,dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com:80,dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com:443;

China (Hong Kong)
set odps.internet.access.list=dsg-cn-hongkong.data.aliyun.com:80,dsg-cn-hongkong.data.aliyun.com:443,dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com:80,dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com:443;
The following table describes the parameters in the preceding commands.
| Parameter | Description |
| --- | --- |
| odps.internet.access.list | The list of endpoints that are accessed when you execute the specified function. The endpoints are used to query the masking information preconfigured in Data Security Guard. |
The following code is the sample code in the Config file for a MaxCompute project whose SchemaName is default in the China (Hangzhou) region.
project_name=data_shield_hz
# app access id and key are optional for individual users
# app_access_id=<app_accessid>
# app_access_key=<app_accesskey>
access_id=AccessKey ID
access_key=AccessKey secret
# this endpoint is for office environment
end_point=http://service.odps.aliyun.com/api
# this url is for odpscmd update
update_url=http://odps.alibaba-inc.com/official_downloads
# download sql results by instance tunnel
use_instance_tunnel=true
# the max records when download sql results by instance tunnel
instance_tunnel_max_record=10000
set odps.internet.access.list=dsg-cn-hangzhou.data.aliyun.com:80,dsg-cn-hangzhou.data.aliyun.com:443,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:80,dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com:443;
Execute an SQL statement for data queries.
Before you execute an SQL statement, run SET commands to call the underlying masking service in the current session. The following code shows the SET commands that are used to call the underlying masking service in different regions.
Note: The underlying data masking service of MaxCompute can be used only at the session level.
China (Shanghai)
set odps.output.field.formatter={"name":"aegis:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;

China (Hangzhou)
set odps.output.field.formatter={"name":"aegis_hz:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;

China (Beijing)
set odps.output.field.formatter={"name":"aegis_bj:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;

China (Chengdu)
set odps.output.field.formatter={"name":"aegis_cd:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;

China (Hong Kong)
set odps.output.field.formatter={"name":"aegis_hk:<SchemaName>:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
The following table describes the parameters in the preceding commands.
| Parameter | Description |
| --- | --- |
| odps.output.field.formatter | The MaxCompute masking function that you want to call. The field that you want to mask must be of the STRING type. aegis_hz:<SchemaName>:masking_v2 is the function name; the prefix (aegis_hz in this example) varies by region, as shown in the preceding commands. SchemaName is the schema name, which you must specify if the three-layer schema model is configured for the MaxCompute project. For more information about schemas, see Schema-related operations. ["alias","index"] are the parameters of the function. These are default parameters. |
| odps.isolation.session.enable | Specifies whether to enable calls at the session level. After the session ends, the data masking feature no longer takes effect. |
The following code shows a sample script for querying data from a MaxCompute project in the China (Hangzhou) region after the underlying data masking service is enabled for the project.
set odps.output.field.formatter={"name":"aegis_hz:default:masking_v2","param":["alias","index"]};
set odps.isolation.session.enable=true;
select * from table;
View the masking result.
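If you submit queries from a script instead of typing them in odpscmd, it can help to split a SET script into key-value pairs. Some clients accept per-query settings as a dictionary, for example the `hints` argument of `execute_sql` in PyODPS. The following Python sketch shows only the parsing step, and the helper name `parse_set_commands` is hypothetical. This topic does not state whether the masking flags behave the same way when they are passed as hints, so treat that usage as an assumption to verify.

```python
def parse_set_commands(script):
    """Split a script into a dict of SET flags; non-SET statements are skipped."""
    settings = {}
    for statement in script.split(";"):
        statement = statement.strip()
        if not statement.lower().startswith("set "):
            continue  # for example, the SELECT statement itself
        # Split on the first "=" only, because values may contain "=".
        key, _, value = statement[4:].partition("=")
        settings[key.strip()] = value.strip()
    return settings


script = (
    'set odps.output.field.formatter={"name":"aegis_hz:default:masking_v2","param":["alias","index"]}; '
    "set odps.isolation.session.enable=true; "
    "select * from table;"
)
for key, value in parse_set_commands(script).items():
    print(key, "=", value)
```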
Disable the underlying data masking service
Execute the following SQL statements to disable the underlying data masking service:
set odps.output.field.formatter=;
select * from table;
If you configure a data masking scenario in DataWorks, do not select the destination MaxCompute project. For more information, see "Configure a data masking scenario" in Create a data masking scenario.