Use DSC masking algorithms to mask, encrypt, or replace sensitive data and save the masked data

Overview

The following table shows an example of masked data.

Name (raw)	Mobile phone number (raw)	ID card number (raw)	Name (masked)	Mobile phone number (masked)	ID card number (masked)
Zhang Sansan	13900001234	111222190002309000	Zhang**	139****1234	111###########9000
Li Sisi	13900001111	150802202207214000	Li**	139****1111	150###########4000
Wang Wuwu	13900002222	120105195001066000	Wang**	139****2222	120###########6000

To achieve this result, follow these four steps:

Create OSS buckets and upload a file: Create a source bucket and a destination bucket, and then upload a tabular file containing sensitive data to the source bucket.
Authorize DSC to access the OSS buckets: Grant DSC permissions to read data from and write data to the OSS buckets.
Create a data masking task: Create a task to configure the masking algorithm and rules for sensitive fields in the source file and specify the storage location for the masked file.
Start the data masking task: Run the task to mask sensitive data in the tabular file stored in the source bucket and save the masked file to the destination bucket.

Prerequisites

You have purchased a Data Security Center instance and granted it the required permissions to access other Alibaba Cloud services.

The data masking feature is available only in DSC Enterprise Edition. Because this tutorial masks data only in OSS files, you need to enable only the OSS Data Management service and can select the minimum specification for OSS Protection Capacity. You can disable the Database Management and Value-added Module services.
You have activated Object Storage Service (OSS).

Step 1: Create OSS buckets and upload a file

1.1 Create the source and destination OSS buckets

In the Object Storage Service (OSS) console, go to the Buckets page and click Create Bucket.
In the Create Bucket panel, configure the following parameters, leave the others at their defaults, and then click Create. This bucket will be used as the source bucket.

For Region, select China (Hangzhou). For Storage Class, select Standard. For Redundancy Type, select locally redundant storage (LRS) (recommended). Keep Block Public Access set to Enabled. For Access Control List (ACL), select Private. For Resource Group, select the default resource group.
Repeat the preceding steps to create another bucket to use as the destination bucket.

1.2 Upload a tabular file to the source OSS bucket

In the OSS console, go to the Buckets page and click the name of the source bucket.
On the Files page, click Upload Object.
Click Select Files, choose a local file (this tutorial uses the sample file userdata.csv, which contains sensitive information such as names, mobile phone numbers, and ID card numbers), and then click Upload Object. Wait for the upload to complete.
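If you do not have the sample file, the following minimal Python sketch writes a comparable userdata.csv from the raw values in the overview table. The header names and the exact contents of the real sample file are assumptions; what matters for the later steps is that the file is comma-separated and starts with a header row.

```python
import csv

# Raw sample rows copied from the overview table (fictional data).
# Assumption: the header names match the field names used in Step 3.2.
HEADER = ["Name", "Mobile phone number", "ID card number"]
ROWS = [
    ["Zhang Sansan", "13900001234", "111222190002309000"],
    ["Li Sisi", "13900001111", "150802202207214000"],
    ["Wang Wuwu", "13900002222", "120105195001066000"],
]


def write_sample_csv(path: str = "userdata.csv") -> None:
    """Write a comma-separated file whose first row is a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=",")
        writer.writerow(HEADER)
        writer.writerows(ROWS)


if __name__ == "__main__":
    write_sample_csv()
```

Keeping the phone and ID numbers as text avoids losing digits if the file is later opened in a spreadsheet application that reformats long numbers.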

Step 2: Authorize DSC to access the OSS buckets

Log on to the Data Security Center (DSC) console.
In the left-side navigation pane, choose Asset Center.
On the Asset Center page, click OSS in the Unstructured Data section on the left, and then click Asset Authorization Management.
On the Asset Authorization Management page, click Asset synchronization.
After the asset synchronization is complete, find the source and destination OSS buckets that you created and click Authorization in the Actions column for each of them.

Step 3: Create a data masking task

In the DSC console, go to the Data Masking page and click Add Desensitization Task. Follow the on-screen instructions to configure the task.

3.1 Configure the source file for masking

Enter a task name, and then set the masking source to the sensitive file userdata.csv in the source OSS bucket. For CSV files, you must set the column separator to a comma. The sample file in this tutorial contains a header row.
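To see why the separator and header settings matter, the sketch below parses comma-separated text with Python's standard csv module. It is a general illustration, not DSC's parser: the hypothetical read_fields helper returns the header fields that would be offered for masking, and a wrong separator or a missing header setting changes what counts as a field.

```python
import csv
import io

# A stand-in for the first lines of userdata.csv (assumed contents).
SAMPLE = (
    "Name,Mobile phone number,ID card number\n"
    "Zhang Sansan,13900001234,111222190002309000\n"
)


def read_fields(text: str, delimiter: str = ",", has_header: bool = True):
    """Return (field_names, data_rows) for delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    # Without a header row, columns can only be referred to by position.
    names = [f"column{i + 1}" for i in range(len(rows[0]))]
    return names, rows
```

With a semicolon as the separator, each line collapses into a single field; with has_header=False, the header row is treated as a data row and the columns get positional names.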

3.2 Configure masking rules for sensitive fields

On the Data Masking Algorithm tab, the header row fields from userdata.csv are automatically displayed. This tutorial demonstrates how to apply Masking to the Name, Mobile phone number, and ID card number fields.

Enable the data masking switch for each field and select Masking.

For the Name field, select redaction > retain the first n and last m characters. For the Mobile phone number field, select redaction > redact characters x to y. For the ID card number field, select redaction > retain the first n and last m characters.
Click View and Modify Parameters for Masking, configure the algorithm rules, and then click Save. This tutorial uses the following masking rules:
- Name: Retain the first character and replace the remaining characters with asterisks (*).
- Mobile phone number: Replace the 4th through 7th characters with asterisks (*).
- ID card number: Retain the first 3 and last 4 characters and replace the characters in between with number signs (#).
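The two redaction options can be sketched in Python as follows. The functions retain_first_last and redact_range are hypothetical reimplementations of the behavior described above, not DSC code; they assume positions are 1-based and that each masked character is replaced by one mask character, which matches the lengths in the overview table. The names in the table are romanized; this sketch assumes the originals are three-character Chinese names such as 张三三, for which "retain the first character" yields 张** (shown as "Zhang**").

```python
def retain_first_last(value: str, n: int, m: int, mask: str = "*") -> str:
    """Keep the first n and last m characters; mask everything in between."""
    hidden = len(value) - n - m
    if hidden <= 0:
        return value  # nothing left to mask
    # value[len(value) - m:] is "" when m == 0 (unlike value[-0:]).
    return value[:n] + mask * hidden + value[len(value) - m:]


def redact_range(value: str, x: int, y: int, mask: str = "*") -> str:
    """Replace characters x through y (1-based, inclusive) with mask."""
    start = max(x - 1, 0)
    end = min(y, len(value))
    if start >= end:
        return value
    return value[:start] + mask * (end - start) + value[end:]


# The three rules configured in this tutorial (hypothetical mapping).
RULES = {
    "Name": lambda v: retain_first_last(v, 1, 0),
    "Mobile phone number": lambda v: redact_range(v, 4, 7),
    "ID card number": lambda v: retain_first_last(v, 3, 4, mask="#"),
}
```

For example, RULES["Mobile phone number"]("13900001234") returns "139****1234", matching the first row of the overview table.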

3.3 Configure the masked file location

Watermarks are not supported for OSS data sources, so you only need to specify the destination bucket that stores the masked file. In this tutorial, the file is saved as a Result set. You can customize the filename, but the file type must be csv, xls, or txt.

3.4 Configure the task trigger

For OSS file masking tasks, only the How the task is triggered (Required) setting applies.

Set How the task is triggered (Required) to Manual Only.
Click Submit.

Step 4: Start the data masking task

4.1 Run the task

On the Task Configurations tab of the Static Desensitization page, find the newly created data masking task and click Start in the Actions column.
On the Static Desensitization page, click the Status sub-tab. Wait until the task progress reaches 100% and the status changes to Successful.

4.2 Verify the results

Go to the Buckets page in the OSS console, click the name of the destination bucket, and find the masked file in the file list. The filename is in the format <DestinationFilename>_<TaskExecutionTime>.<FileType>. For example, in usernews_20240808150643.csv, 20240808150643 indicates that the task was executed at 15:06:43 on August 8, 2024. You can click Download to obtain the file.
After the download is complete, open the file and verify that the names, mobile phone numbers, and ID card numbers are masked.
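The filename convention can be expressed as a small helper. This sketch assumes the execution time is always written as YYYYMMDDHHMMSS, as in the usernews_20240808150643.csv example, and that the destination filename itself may contain underscores; the time zone of the timestamp is not stated and is not handled here. masked_filename and parse_masked_filename are hypothetical names.

```python
from datetime import datetime

ALLOWED_TYPES = {"csv", "xls", "txt"}  # file types listed in Step 3.3


def masked_filename(base: str, executed_at: datetime, file_type: str) -> str:
    """Build <DestinationFilename>_<TaskExecutionTime>.<FileType>."""
    if file_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    return f"{base}_{executed_at:%Y%m%d%H%M%S}.{file_type}"


def parse_masked_filename(name: str):
    """Split a masked filename into (base, execution time, file type)."""
    stem, file_type = name.rsplit(".", 1)
    # rsplit on the last "_" so underscores inside the base name survive.
    base, stamp = stem.rsplit("_", 1)
    return base, datetime.strptime(stamp, "%Y%m%d%H%M%S"), file_type
```

Parsing usernews_20240808150643.csv yields the base name usernews, the time 2024-08-08 15:06:43, and the type csv.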

The data is masked as follows: names retain only the first character (e.g., "Zhang**"), the middle four digits of mobile phone numbers are hidden (e.g., "139****1234"), and the middle digits of ID card numbers are replaced with number signs (e.g., "111###########9000").
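For larger files, checking the output by eye is error-prone. The sketch below flags values in a downloaded masked file that do not match the shapes produced by this tutorial's rules. The regular expressions and the find_unmasked helper are assumptions derived from the example output (for instance, 18-character ID card numbers and 11-digit mobile phone numbers); adjust them if your data or rules differ.

```python
import csv
import io
import re

# Expected shapes after masking, derived from the rules in Step 3.2.
PATTERNS = {
    "Name": re.compile(r"^[^*]+\*+$"),                    # some chars kept, rest *
    "Mobile phone number": re.compile(r"^\d{3}\*{4}\d{4}$"),
    "ID card number": re.compile(r"^\d{3}#{11}[0-9Xx]{4}$"),
}


def find_unmasked(text: str):
    """Return (row number, field) pairs whose value does not look masked."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for field, pattern in PATTERNS.items():
            if not pattern.match(row.get(field) or ""):
                problems.append((row_no, field))
    return problems
```

An empty result means every checked value matches its expected masked shape; any returned pair points to a row and field worth inspecting.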

Summary

You can mask raw data stored in an OSS bucket and then save it to a destination bucket for secure sharing. After data masking, sensitive content is not directly exposed, reducing the risk of data abuse and privacy violations even if the shared data is leaked. The masked data can be used for scenarios such as data analysis, model training, and business report sharing without compromising personal privacy.

Flexible masking algorithms

Data masking relies on algorithms and their rules. DSC supports various masking algorithms, including hashing, redaction, substitution, transformation, encryption, decryption, and shuffling. Each algorithm offers multiple configuration options, so you can choose the most suitable one for your business scenario.
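The tutorial does not describe how DSC's shuffling algorithm works internally. As a generic illustration only, the following sketch shuffles the values of one column: the set of values is preserved, so column-level statistics stay the same, but each value is detached from its original row. shuffle_column is a hypothetical helper, and the fixed seed is used only to make a run reproducible.

```python
import random


def shuffle_column(values, seed=None):
    """Return a shuffled copy of one column's values.

    The multiset of values is unchanged, but row-to-value links are broken.
    The input sequence itself is not modified.
    """
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Unlike redaction, shuffling keeps every original value intact, so it is suited to cases where realistic values matter more than hiding them entirely.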

On the Data Masking page, select the Masking Configuration > Masking Algorithms tab to view descriptions and configure each algorithm. For example, for hashing, DSC supports four rules: MD5, SHA1, SHA256, and HMAC. You can enter a salt value and click Test to verify the masking effect, then click Submit to save the configuration.
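The effect of a salt can be reproduced locally with Python's hashlib and hmac modules. How DSC combines the salt with the input is not documented here, so this sketch assumes the salt is appended to the value for MD5, SHA1, and SHA256, and is used as the key of an HMAC-SHA256 for the HMAC rule. The key property holds regardless: the same value and salt always produce the same digest, so masked columns can still be joined or counted, while changing the salt changes every output.

```python
import hashlib
import hmac


def salted_digest(value: str, salt: str, rule: str = "SHA256") -> str:
    """Return a hex digest of value under the given rule and salt."""
    if rule == "HMAC":
        # Assumption: HMAC uses the salt as the secret key, with SHA-256.
        return hmac.new(salt.encode("utf-8"), value.encode("utf-8"),
                        hashlib.sha256).hexdigest()
    # Assumption: for plain hashes, the salt is appended to the value.
    data = (value + salt).encode("utf-8")
    algorithms = {"MD5": hashlib.md5, "SHA1": hashlib.sha1,
                  "SHA256": hashlib.sha256}
    if rule not in algorithms:
        raise ValueError(f"unknown rule: {rule}")
    return algorithms[rule](data).hexdigest()
```

MD5 and SHA1 are shown because they appear in the rule list, but SHA256 or HMAC is the safer choice when the digests might be exposed.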

Efficient rule configuration

DSC also supports masking templates. To improve efficiency, you can group frequently used masking algorithms into a template and apply it when configuring data masking rules.

For more information, see Configure masking templates and algorithms.
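Conceptually, a template is a reusable mapping from fields to masking rules. The sketch below models it as a Python dictionary; CONTACT_TEMPLATE, keep_ends, and apply_template are hypothetical and do not reflect DSC's template format. For 11-digit mobile phone numbers, keeping the first 3 and last 4 characters gives the same result as redacting characters 4 to 7.

```python
from typing import Callable, Dict

Masker = Callable[[str], str]


def keep_ends(n: int, m: int, mask: str = "*") -> Masker:
    """Factory for a 'retain first n and last m characters' masker."""
    def apply(value: str) -> str:
        hidden = len(value) - n - m
        if hidden <= 0:
            return value
        return value[:n] + mask * hidden + value[len(value) - m:]
    return apply


# Hypothetical template: field name -> masker, defined once and reused.
CONTACT_TEMPLATE: Dict[str, Masker] = {
    "Mobile phone number": keep_ends(3, 4),
    "ID card number": keep_ends(3, 4, "#"),
}


def apply_template(row: Dict[str, str],
                   template: Dict[str, Masker]) -> Dict[str, str]:
    """Mask the fields named in the template; leave other fields as-is."""
    return {k: (template[k](v) if k in template else v)
            for k, v in row.items()}
```

Defining the rules once and applying them by field name is what saves effort when many tasks share the same sensitive fields.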

Schedule data masking tasks

While the task in this tutorial is run manually, data masking tasks can also be scheduled to run at specific times, such as hourly, daily, weekly, or monthly. This ensures that updated data is promptly masked and ready for use.

In the How the task is triggered section, you can select Manual Only, Scheduled Only, or Manual + Scheduled. If you choose a scheduled option, set the trigger frequency in the Task Schedule Configuration section: for Hourly, set the minute; for Daily, set the time; for Weekly, set the day of the week and the time; for Monthly, set the date and hour.
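To reason about when a scheduled task will next run, the following sketch computes the next trigger time from the settings described above. It is not DSC's scheduler: the function next_run is hypothetical, it ignores time zones, it uses Monday=0 for weekdays, and it assumes that a monthly task skips months that do not contain the configured date.

```python
import calendar
from datetime import datetime, timedelta


def next_run(now: datetime, frequency: str, *, minute: int = 0,
             hour: int = 0, weekday: int = 0, day: int = 1) -> datetime:
    """Return the next trigger time strictly after now."""
    if frequency == "Hourly":
        cand = now.replace(minute=minute, second=0, microsecond=0)
        return cand if cand > now else cand + timedelta(hours=1)
    if frequency == "Daily":
        cand = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        return cand if cand > now else cand + timedelta(days=1)
    if frequency == "Weekly":
        cand = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        cand += timedelta(days=(weekday - now.weekday()) % 7)
        return cand if cand > now else cand + timedelta(days=7)
    if frequency == "Monthly":
        if not 1 <= day <= 31:
            raise ValueError(f"invalid day of month: {day}")
        year, month = now.year, now.month
        while True:
            # Assumption: months without the configured date are skipped.
            if day <= calendar.monthrange(year, month)[1]:
                cand = datetime(year, month, day, hour, 0)
                if cand > now:
                    return cand
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    raise ValueError(f"unknown frequency: {frequency}")
```

For example, a Weekly schedule for Monday at 09:00, evaluated at 15:06:43 on Thursday, August 8, 2024, next fires on Monday, August 12, 2024 at 09:00.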