This topic describes how to create a data masking rule in Data Security Guard so that DataWorks can dynamically and statically mask sensitive data in the results of ad hoc queries.
Prerequisites
- DataWorks Professional Edition or a more advanced edition is activated. For more information, see Differences among DataWorks editions.
- Mask Data in Page Query Results is turned on for your workspace in the DataWorks console. For more information, see Enable data masking for workspaces on the Security Settings and Others tab.
- The resource package of the masking_v2 method is obtained. The method can be used to perform underlying data masking on MaxCompute projects. For more information, see Appendix: Use the masking_v2 method to perform underlying data masking on MaxCompute projects.
  Note: Only the following regions support underlying data masking on MaxCompute projects: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt). You must submit a ticket before you can perform underlying data masking on MaxCompute projects.
Background information
DataWorks supports dynamic data masking and static data masking.

Type | Description | Data masking scenario |
---|---|---|
Dynamic data masking | DataWorks masks sensitive data in query results. | DataWorks provides several data masking scenarios such as Global Config, DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config. These are typical scenarios of dynamic data masking. For more information, see Create a data masking rule in the Global Config scenario. |
Static data masking | DataWorks masks sensitive data before the data is stored in a database. | DataWorks provides the DataWorks Data Integration Config scenario. This is a typical scenario of static data masking. For more information, see Create a data masking rule in the DataWorks Data Integration Config scenario. |
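
The difference between the two types can be illustrated with a minimal Python sketch. This is not how DataWorks implements masking. The function names (`mask_keep_edges`, `query_dynamic`, `write_static`), the sample row, and the masking pattern are all hypothetical and serve only to show where masking happens in each case.

```python
def mask_keep_edges(value: str, keep: int = 3) -> str:
    """Hypothetical masking function: keep the first and last `keep`
    characters and replace everything in between with asterisks."""
    if len(value) <= 2 * keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]


def mask_row(row: dict, sensitive_fields: set) -> dict:
    """Return a copy of the row with the sensitive fields masked."""
    return {
        key: mask_keep_edges(val) if key in sensitive_fields else val
        for key, val in row.items()
    }


def query_dynamic(stored_rows: list, sensitive_fields: set) -> list:
    """Dynamic masking: the stored data is untouched; only the
    query result that is returned to the user is masked."""
    return [mask_row(row, sensitive_fields) for row in stored_rows]


def write_static(source_rows: list, sensitive_fields: set, sink: list) -> None:
    """Static masking: rows are masked before they are written,
    so the destination never holds the original values."""
    for row in source_rows:
        sink.append(mask_row(row, sensitive_fields))


if __name__ == "__main__":
    rows = [{"name": "alice", "phone": "13812345678"}]  # made-up sample data
    print(query_dynamic(rows, {"phone"}))  # masked view of the data
    print(rows)                            # stored data is unchanged
    destination = []
    write_static(rows, {"phone"}, destination)
    print(destination)                     # destination holds masked data only
```

In the dynamic case the original values can still be read by a user who is on a whitelist, whereas in the static case the destination only ever contains masked values.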
Select a data masking scenario
- Go to the Data Security Guard page. For more information, see Overview.
- In the left-side navigation pane, go to the Data Masking page. On the Data Masking page, select a data masking scenario from the Masking Scene drop-down list based on your business requirements. DataWorks provides multiple scenarios. You can also create a custom scenario.
- Global Config: The data masking rules and whitelists that are configured in the Global Config scenario will take effect in other scenarios, such as DataWorks Studio Config, Hologres Config, DataWorks Analysis Config, and MaxCompute Config.
- DataWorks Studio Config:
- After you configure data masking rules, the sensitive data that you query on the DataStudio page is masked.
- After you configure data masking rules, the sensitive data that you preview on the DataMap page is masked.
- DataWorks Analysis Config: After you configure data masking rules, the sensitive data that you query on the SQL Query and SQLNotes pages of DataAnalysis is masked.
- Hologres Config: After you configure data masking rules, the sensitive data that you query from Hologres databases on the DataStudio and HoloStudio pages is masked. The data masking rules that are configured in the Hologres Config scenario take effect only in workspaces in the China (Hangzhou) and China (Beijing) regions. By default, the rules are not enabled in the Hologres Config scenario. To enable the rules, submit a ticket.
  Note: Hologres does not support pseudonymization. If you configure a data masking rule that uses the pseudonymization method in the Global Config scenario, the sensitive data that you query from Hologres databases is masked with multiple asterisks (***).
- MaxCompute Config: After you configure data masking rules, the sensitive data that you query from MaxCompute projects is masked regardless of the query method. The data masking rules that are configured in the MaxCompute Config scenario take effect only in workspaces in the China (Shanghai) region. For more information about how to enable the dynamic data masking feature in the MaxCompute Config scenario, see Dynamic data masking.
- Custom data masking scenario: You can create a custom data masking scenario by performing the following steps: Click Masking Scene at the bottom of the Masking Scene drop-down list. In the New dialog box, configure the Scene Name and Scene Code parameters. The scenario name can contain only letters, digits, underscores (_), and hyphens (-). The scenario code can contain only digits and letters.
- Create a data masking rule. After you select a data masking scenario, you can create a data masking rule in this scenario. The following list provides the links to the sections that describe how to create data masking rules in different scenarios:
- For more information about how to create a dynamic data masking rule in scenarios such as Global Config, DataWorks Studio Config, or Hologres Config, see Create a data masking rule in the Global Config scenario.
- For more information about how to create a static data masking rule in the DataWorks Data Integration Config scenario, see Create a data masking rule in the DataWorks Data Integration Config scenario.
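
The naming rules for a custom data masking scenario described above can be checked before you fill in the New dialog box. The following Python sketch is only an illustration. The function names are hypothetical, and it assumes that "letters" means ASCII letters and that empty values are not allowed, neither of which is stated in this topic.

```python
import re

# Scenario name: letters, digits, underscores (_), and hyphens (-).
# Assumption: ASCII letters only, at least one character.
SCENE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

# Scenario code: digits and letters only.
# Assumption: ASCII letters only, at least one character.
SCENE_CODE_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_scene_name(name: str) -> bool:
    """Return True if the whole string matches the scenario name rule."""
    return SCENE_NAME_RE.fullmatch(name) is not None


def is_valid_scene_code(code: str) -> bool:
    """Return True if the whole string matches the scenario code rule."""
    return SCENE_CODE_RE.fullmatch(code) is not None


if __name__ == "__main__":
    for name in ("finance_scene-01", "finance scene"):
        print(name, is_valid_scene_name(name))
    for code in ("fin01", "fin_01"):
        print(code, is_valid_scene_code(code))
```

Note that an underscore or hyphen is acceptable in a scenario name but not in a scenario code.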
Create a data masking rule in the Global Config scenario
The following example shows how to create a data masking rule in the Global Config scenario. To create a rule in the Hologres Config, DataWorks Studio Config, DataWorks Analysis Config, or MaxCompute Config scenario, you can also follow the steps in this example.
- On the Data Masking page, set the Masking Scene parameter to Global Config(_default_scene_code).
- Optional. Select one or more MaxCompute projects or Hologres databases and authorize Data Security Guard to mask data for the MaxCompute projects or Hologres databases. Click Select Desensitization Project or Select desensitization database. In the dialog box that appears, select one or more projects or databases, click the rightwards arrow to add them to the section on the right, and select the option button.
  Note: This step is required only in the Hologres Config and MaxCompute Config scenarios.
- Create a data masking rule.
- Configure a whitelist.
- After you create and configure the data masking rules, the sensitive data that you query on specific pages such as DataStudio, DataMap, and DataAnalysis is masked based on the rules. For more information, see Select a data masking scenario.
Create a data masking rule in the DataWorks Data Integration Config scenario
- On the Data Masking page, set the Masking Scene parameter to DataWorks Data Integration Config(dataworks_data_integration_desense_code).
- Create a data masking rule.
- After you create a data masking rule, you can add the rule when you create and configure a real-time synchronization node for data in a single table. For more information, see Configure data de-identification.
Appendix: Use the masking_v2 method to perform underlying data masking on MaxCompute projects
The dsg_fin_demo project is the project on which you want to perform underlying data masking.
- Submit a request for adding the IP addresses or endpoints of Data Security Guard and Object Storage Service (OSS) to the whitelist of the dsg_fin_demo project. Fill out a request form by using an Alibaba Cloud account.
If external access to the IP address or endpoint of the project is not restricted, Data Security Guard and OSS can access the dsg_fin_demo project after the request is approved. The request processing period does not exceed three business days.
Request content:
  - Project name (the name of the project whose data you want to mask): dsg_fin_demo
  - Log address:
  - Request reason: Add the IP addresses or endpoints of Data Security Guard and OSS to the whitelist of the dsg_fin_demo project to enable the created function to access the IP address or endpoint of the project when the function is run.
  - Region: China (Shanghai)
  - IP addresses or endpoints that want to access the project: dsg-cn-shanghai.data.aliyun.com, dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com
  - Ports: 80 and 443

The IP address or endpoint of the project varies based on regions. Ports 80 and 443 are used. If the IP address or endpoint of the project that Data Security Guard and OSS want to access is not included in the following regions, you can submit a ticket to Data Security Guard.
  - China (Shanghai): dsg-cn-shanghai.data.aliyun.com, dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com
  - China (Hangzhou): dsg-cn-hangzhou.data.aliyun.com, dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com
  - China (Beijing): dsg-cn-beijing.data.aliyun.com, dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com
  - China (Chengdu): dsg-cn-chengdu.data.aliyun.com, dsg-oss-dic-ori-cd.oss-cn-chengdu.aliyuncs.com
  - China (Shenzhen): dsg-cn-shenzhen.data.aliyun.com, dsg-oss-dic-ori-sz.oss-cn-shenzhen.aliyuncs.com
  - China North 2 Ali Gov: dsg-cn-north-2-gov-1.data.aliyun.com, dsg-oss-dic-ori-north-2-gov-1.oss-cn-north-2-gov-1-internal.aliyuncs.com
  - China East 2 Finance: dsg-cn-shanghai-finance-1.data.aliyun.com, dsg-oss-dic-ori-sh-fin-1.oss-cn-shanghai.aliyuncs.com
  - China (Hong Kong): dsg-cn-hongkong.data.aliyun.com, dsg-oss-hongkong.oss-cn-hongkong.aliyuncs.com
  - Singapore: dsg-ap-southeast-1.data.aliyun.com, dsg-oss-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com
  - US (Silicon Valley): dsg-us-west-1.data.aliyun.com, dsg-oss-us-west-1.oss-us-west-1.aliyuncs.com
  - Malaysia (Kuala Lumpur): dsg-ap-southeast-3.data.aliyun.com, dsg-oss-ap-malaysia.oss-ap-southeast-3.aliyuncs.com
  - Germany (Frankfurt): dsg-eu-central-1.data.aliyun.com, dsg-oss-eu-central-1.oss-eu-central-1-internal.aliyuncs.com
- Submit a ticket to Data Security Guard to request underlying data masking for the desired MaxCompute project.
- Go to the Data Masking tab in Data Security Guard to select the MaxCompute projects whose data you want to mask. Access the Data Masking page of a workspace in which you want to run the masking_v2 function within a tenant, select the MaxCompute Config data masking scenario, and then select the MaxCompute projects whose data you want to mask.
  Note: If the MaxCompute project whose data you want to mask is not added to the Masked Projects section, an error occurs when the masking_v2 function is run.
- Execute SQL statements to check whether data masking is successful.
- Disable underlying data masking on MaxCompute projects.
  Move the projects whose data you do not want to mask to the Unmasked Projects section. This way, underlying data masking will not be performed on the projects.
  Execute the following statements to disable underlying data masking that is performed by using the masking_v2 method:
    set odps.output.field.formatter=;
    select * from table;
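
For the whitelist request in the first step of this appendix, the region-specific values can be assembled with a small helper. The following Python sketch is an illustration only. The function name `build_whitelist_request` and the output layout are hypothetical, and the dictionary copies only a few entries from the region list in this appendix.

```python
# Subset of the region list in this appendix:
# region -> (Data Security Guard endpoint, OSS endpoint).
DSG_ENDPOINTS = {
    "China (Shanghai)": (
        "dsg-cn-shanghai.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-shanghai.aliyuncs.com",
    ),
    "China (Hangzhou)": (
        "dsg-cn-hangzhou.data.aliyun.com",
        "dsg-oss-dic-ori-hz.oss-cn-hangzhou.aliyuncs.com",
    ),
    "China (Beijing)": (
        "dsg-cn-beijing.data.aliyun.com",
        "dsg-oss-dic-ori.oss-cn-beijing.aliyuncs.com",
    ),
    "Singapore": (
        "dsg-ap-southeast-1.data.aliyun.com",
        "dsg-oss-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com",
    ),
}

# Ports named in the request content.
REQUIRED_PORTS = (80, 443)


def build_whitelist_request(project_name: str, region: str) -> str:
    """Build the text of the request fields that depend on the region.

    Raises ValueError for a region that is not in DSG_ENDPOINTS; in that
    case this appendix says to submit a ticket to Data Security Guard.
    """
    if region not in DSG_ENDPOINTS:
        raise ValueError(
            f"Region {region!r} is not listed; submit a ticket to Data Security Guard."
        )
    dsg_endpoint, oss_endpoint = DSG_ENDPOINTS[region]
    lines = [
        f"Project name: {project_name}",
        f"Region: {region}",
        "IP addresses or endpoints that want to access the project: "
        f"{dsg_endpoint}, {oss_endpoint}",
        "Ports: " + " and ".join(str(port) for port in REQUIRED_PORTS),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_whitelist_request("dsg_fin_demo", "China (Shanghai)"))
```

The remaining fields of the request, such as the log address and the request reason, still need to be filled in by hand.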