If a user has the permissions to query specific sensitive data in a MaxCompute project but you do not want the user to view complete sensitive data, you can enable the dynamic data masking feature of MaxCompute. This way, MaxCompute can dynamically mask sensitive data in the query results. This topic describes how to enable the dynamic data masking feature of MaxCompute and provides an example.

Background information

The dynamic data masking feature of MaxCompute depends on Data Security Guard of DataWorks. You must activate Data Security Guard of DataWorks before you can enable the dynamic data masking feature for a MaxCompute project.

After you enable the dynamic data masking feature for a MaxCompute project, you can configure data masking rules for the project based on the data identification rules that are preconfigured in DataWorks. After you configure data masking rules, sensitive data in the query results is masked based on the rules when you query data in MaxCompute. The dynamic data masking feature can effectively protect sensitive information, such as mobile phone numbers, ID card numbers, bank card numbers, license plate numbers, and IP addresses. After the dynamic data masking feature is enabled, only sensitive data in the query results is masked and the data that is stored in the underlying storage is not affected.

We recommend that you use the data identification rules that are preconfigured in DataWorks. For more information about how to customize data identification rules, see Manage sensitive field types.

Limits

The dynamic data masking feature can be configured only for a MaxCompute project that resides in the China (Shanghai) region.

Procedure

To use the dynamic data masking feature in a MaxCompute project, perform the following steps:

  1. Step 1: Activate Data Security Guard of DataWorks
  2. Step 2: Enable the dynamic data masking feature for the MaxCompute project
  3. Step 3: Use the dynamic data masking feature

Step 1: Activate Data Security Guard of DataWorks

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click Data Development in the Actions column.
  4. Click the icon in the upper-left corner and choose All Products > Data governance > Data Security Guard.
  5. Click Try now to go to the Data Security Guard page.
    Note
    • If you have activated Data Security Guard by using your Alibaba Cloud account, the Data Security Guard homepage appears.
    • If you have not activated Data Security Guard by using your Alibaba Cloud account, the page for activating Data Security Guard appears.
  6. On the Terms of Service page, read the terms, select I have read and agree to all the preceding terms, and then click Activate.
    Notice You must use your Alibaba Cloud account to activate Data Security Guard.

Step 2: Enable the dynamic data masking feature for the MaxCompute project

  1. Add the MaxCompute project for which you want to mask sensitive data.
    1. Log on to the DataWorks console. Go to the Data Security Guard page.
    2. In the left-side navigation pane, choose Rule Change > Data Masking.
    3. On the Data Masking page, select MaxCompute Config(maxcompute_desense_code) from the Masking Scene drop-down list and click Select Desensitization Project next to Masking Scene.
    4. In the Authorize The Desensitization Of Account dialog box, select the project for which you want to mask sensitive data in the Not Desensitized Project section and click the right arrow to move the project to the Desensitized project section. Then, select I agree to authorize data protection umbrella to desensitize the maxcompute underlying layer of the above projects and click OK.
  2. Create a masking rule.
    Note When you create a masking rule, we recommend that you select a sensitive field type whose Identification rule definition method is Predefined.
  3. Optional: If specific users need to view the unmasked data that a masking rule covers, configure a whitelist for the masking rule.
    1. On the Data Masking page, click the Whitelist tab.
    2. In the upper-right corner of the Whitelist tab, click Add Account.
    3. In the Add Account dialog box, configure the Rule, Account, and Effective From parameters.
      Note If a user in the whitelist queries data outside the time range that is specified in the whitelist, sensitive data in the query results is masked.
  4. Log on to the MaxCompute client as the project owner or the project administrator. Then, run the following command to install the masking package:
    install package aegis.aegis_package;
    You can run the show packages; command to view the packages that are installed in the project and confirm that the masking package is listed. Example:
    +-------------+------------+
    | PackageName | CreateTime |
    +-------------+------------+
    +---------------+--------------------+--------------------------+--------+
    | PackageName   | SourceProject      | InstallTime              | Status |
    +---------------+--------------------+--------------------------+--------+
    | aegis_package | aegis              | 2022-02-24T11:19:34+0800 | OK     |
    +---------------+--------------------+--------------------------+--------+
    | systables     | information_schema | 2021-04-12T16:29:14+0800 | OK     |
    +---------------+--------------------+--------------------------+--------+
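If you capture the output of show packages; as text, for example from a script that drives the client, a check like the following can confirm that the masking package is installed. This is a minimal Python sketch: the function name installed_packages is hypothetical, and it assumes that the output uses the four-column table layout shown in the preceding example.

```python
def installed_packages(output: str) -> dict:
    """Map package name -> status for rows of the installed-packages table.

    Assumes the four-column layout from the example output:
    PackageName | SourceProject | InstallTime | Status
    """
    packages = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Keep only four-column data rows; skip the header row.
        if len(cells) == 4 and cells[0] != "PackageName":
            packages[cells[0]] = cells[3]
    return packages


sample = """
+-------------+------------+
| PackageName | CreateTime |
+-------------+------------+
+---------------+--------------------+--------------------------+--------+
| PackageName   | SourceProject      | InstallTime              | Status |
+---------------+--------------------+--------------------------+--------+
| aegis_package | aegis              | 2022-02-24T11:19:34+0800 | OK     |
+---------------+--------------------+--------------------------+--------+
| systables     | information_schema | 2021-04-12T16:29:14+0800 | OK     |
+---------------+--------------------+--------------------------+--------+
"""

pkgs = installed_packages(sample)
print(pkgs.get("aegis_package") == "OK")
```

If aegis_package is missing or its status is not OK, run the install package command again as the project owner or the project administrator before you continue.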

Step 3: Use the dynamic data masking feature

You can enable the dynamic data masking feature at the session or project level.
  • If you want to enable the dynamic data masking feature at the session level, you can add the following masking settings before the SQL statement and commit them together with the SQL statement. Sensitive data in the result of the SQL statement is masked.
    set odps.output.field.formatter=aegis:masking_v1;
    set odps.isolation.session.enable=true;
  • If you want to enable the dynamic data masking feature at the project level, you can execute the following statement to configure the masking parameter for your project. After the configuration is complete, sensitive data in the results of all SQL statements that are run in the project is masked.
    setproject odps.output.field.formatter=aegis:masking_v1;
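For session-level masking, the settings must be submitted together with the query. If you generate SQL scripts from code, a small helper can make sure that the two set statements are always placed before the statement. The following Python sketch only builds the script text. The helper name with_session_masking is hypothetical, and how you submit the resulting script to MaxCompute is outside the scope of this sketch.

```python
# The two session-level settings from this topic, in the order shown above.
MASKING_SETTINGS = (
    "set odps.output.field.formatter=aegis:masking_v1;",
    "set odps.isolation.session.enable=true;",
)


def with_session_masking(sql: str) -> str:
    """Prepend the session-level masking settings to one SQL statement."""
    statement = sql.strip()
    if not statement.endswith(";"):
        statement += ";"  # make sure the statement is terminated
    return "\n".join(MASKING_SETTINGS + (statement,))


print(with_session_masking("select * from iptest"))
```

Because the settings apply only to the session in which they are submitted, a query that is sent without them returns unmasked data unless masking is also enabled at the project level.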

Example

After the dynamic data masking feature is enabled for a MaxCompute project, sensitive data in the query results can be masked when you query data in MaxCompute. For example, an IP address masking rule is configured for Project A. The iptest table of Project A contains the following data:

+--------+-------------+
| name   | ip          |
+--------+-------------+
| a      | 192.0.2.10  |
| b      | 198.51.2.0  |
+--------+-------------+
You can execute the following statements to query the table with session-level masking enabled:
set odps.output.field.formatter=aegis:masking_v1;
set odps.isolation.session.enable=true;
select * from iptest;
The following result is returned:
+--------+----------------+
| name   | ip             |
+--------+----------------+
| a      | 192.0.***.*    |
| b      | 198.51.***.*   |
+--------+----------------+
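
To make the visible effect of this example concrete, the following Python sketch reproduces the masked output shown above by hiding the last two octets of each address. It is an illustration only, not the implementation that MaxCompute or Data Security Guard uses. The function name mask_ip is hypothetical, and the actual output format depends on the masking rule that you configure.

```python
def mask_ip(ip: str) -> str:
    """Hide the last two octets of a dotted IPv4 string, as in the example output."""
    parts = ip.split(".")
    if len(parts) != 4:
        return ip  # not a dotted quad; left unchanged in this sketch
    return ".".join(parts[:2] + ["***", "*"])


for ip in ("192.0.2.10", "198.51.2.0"):
    print(ip, "->", mask_ip(ip))
```

As with the feature itself, the transformation applies only to the values that are returned. The values that are stored in the iptest table are not changed.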