Mask sensitive data in OSS tabular files using Data Security Center - Object Storage Service

You can use the static data masking feature of Data Security Center (DSC) to mask sensitive data in structured TXT, CSV, XLSX, and XLS files stored in a source OSS Bucket and save the masked files to a destination OSS Bucket. This lets you share data without exposing sensitive information.

Solution overview

Example of masked data:

Name (before masking)	Phone number (before masking)	ID card number (before masking)	Name (after masking)	Phone number (after masking)	ID card number (after masking)
Zhang San San	13900001234	111222190002309000	Zhang**	139****1234	111###########9000
Li Si Si	13900001111	150802202207214000	Li**	139****1111	150###########4000
Wang Wu Wu	13900002222	120105195001066000	Wang**	139****2222	120###########6000

This data masking process consists of four steps:

Create OSS Buckets and upload a file: Create a source OSS Bucket and a destination OSS Bucket. Upload a tabular file that contains sensitive data to the source OSS Bucket.
Connect the OSS Bucket to DSC: Authorize DSC to access the OSS Buckets so that it can read the source file and write the masked file.
Add a data masking task: Create a data masking task. Configure the masking algorithms and rules for sensitive fields in the source file. Also, specify the storage location for the masked file.
Start the data masking task: Start the task to mask the sensitive data in the source file. The masked file is then saved to the destination OSS Bucket.

Prerequisites

You have purchased a Data Security Center instance and authorized Data Security Center to access other Alibaba Cloud resources.
The data masking feature is available only in the Enterprise Edition of Data Security Center, so you must purchase the Enterprise Edition. Because this example masks only OSS files, you only need to enable the OSS Data Management service and select the minimum OSS Data Volume. You do not need to enable the Database Management or Value-added Modules services.
You have activated Object Storage Service.

Step 1: Create OSS Buckets and upload a file

1.1. Create a source OSS Bucket and a destination OSS Bucket

On the Bucket List page of the OSS console, click Create Bucket.
In the Create Bucket panel, configure the required parameters, keep the default settings for the other parameters, and then click Create. This bucket serves as the source OSS Bucket.
Repeat these steps to create another OSS Bucket to use as the destination OSS Bucket.

1.2. Upload the tabular file to the source OSS Bucket

On the Bucket List page of the OSS console, click the name of the source OSS Bucket.
On the Files page, click Upload File.
Click Scan Files and select a local file. This example uses the userdata.csv file, which contains sensitive information such as names, phone numbers, and ID card numbers. Then, click Upload File and wait for the upload to complete.
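If you do not have a suitable file, a minimal sketch like the following can generate a comparable test file locally with Python's standard csv module. The column names and rows mirror the example table in this topic and are fictitious; the first phone number, 13900001234, is inferred from its masked value 139****1234. The function name write_sample_csv is hypothetical.

```python
import csv

# Fictitious rows modeled on the example table in this topic.
ROWS = [
    ("Zhang San San", "13900001234", "111222190002309000"),
    ("Li Si Si", "13900001111", "150802202207214000"),
    ("Wang Wu Wu", "13900002222", "120105195001066000"),
]


def write_sample_csv(path: str = "userdata.csv") -> None:
    """Write a comma-delimited file with a header row, similar to userdata.csv."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # default delimiter is a comma
        writer.writerow(["Name", "Phone number", "ID card number"])
        writer.writerows(ROWS)
```

Call write_sample_csv() and upload the resulting userdata.csv to the source OSS Bucket.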

Step 2: Connect the OSS Bucket to DSC

Log on to the Data Security Center console.
In the navigation pane on the left, choose Asset Center.
On the Asset Center page, in the left-side Unstructured Data area, click OSS and then click Asset Authorization Management.
On the Asset Authorization Management page, click Sync Assets.
After the assets are synchronized, find the newly created OSS Buckets and click Authorize in the Actions column for each of them.

Step 3: Add a data masking task

On the Data Masking page in the Data Security Center console, click Add Data Masking Task and follow the on-screen instructions to configure the task.

3.1. Configure the source file for data masking

Enter a name for the task. Set the data masking source to the sensitive file userdata.csv in the source OSS Bucket. The example file in this topic is a CSV file that has a header row and separates columns with commas, so set the column delimiter to a comma.
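The header row and the column delimiter together determine which fields the task sees. DSC's own parser is not documented here, so the following is only a local approximation with Python's csv module, useful for checking a file before you configure the task. The function read_fields and the fallback column_N names are hypothetical.

```python
import csv
import io


def read_fields(text: str, delimiter: str = ",", has_header: bool = True):
    """Split tabular text into (field_names, data_rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        # The first row supplies the field names.
        return rows[0], rows[1:]
    # Without a header row, fall back to positional names (hypothetical naming).
    names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return names, rows
```

With the wrong delimiter, the whole header collapses into a single field, which is an easy way to spot a misconfigured delimiter.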

3.2. Configure masking rules for sensitive fields

On the Masking Algorithm page, the headers from userdata.csv are automatically populated. In this example, Redaction is applied to the name, phone number, and ID card number fields.

Enable data masking for each field and select Redaction.
Next to Redaction, click View and Modify Parameters. Configure the algorithm rules and click Save. This example uses the following masking rules:
- Name: Mask with *. Keep the first character.
- Phone number: Mask with *. Mask characters from the 4th to the 7th position.
- ID card number: Mask with #. Keep the first 3 and last 4 characters.
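To predict or spot-check the output of these three Redaction rules, a minimal sketch in plain Python might look like the following. The function names are hypothetical and are not DSC APIs; the sketch assumes positions are counted from 1 and that each character is replaced by exactly one mask character. The rules work on characters, so "keep the first character" applied to a three-character name such as 张三三 yields 张**.

```python
def keep_first(value: str, n: int = 1, mask: str = "*") -> str:
    """Name rule: keep the first n characters and mask the rest."""
    return value[:n] + mask * max(len(value) - n, 0)


def mask_range(value: str, start: int, end: int, mask: str = "*") -> str:
    """Phone rule: mask 1-based positions start..end (inclusive)."""
    s = start - 1
    e = min(end, len(value))
    return value[:s] + mask * max(e - s, 0) + value[e:]


def keep_ends(value: str, head: int, tail: int, mask: str = "#") -> str:
    """ID card rule: keep the first `head` and last `tail` characters."""
    if len(value) <= head + tail:
        return value  # nothing left to mask
    middle = len(value) - head - tail
    return value[:head] + mask * middle + value[len(value) - tail:]
```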

3.3. Configure the storage location for the masked file

OSS data sources do not support watermarks. Configure the task to store the masked file directly in the destination Bucket. In this example, the file is saved as a result set. You can customize the file name. The file extension must be csv, xls, or txt.

3.4. Configure the trigger method for the data masking task

For OSS file data masking tasks, only the Task Trigger Method setting applies. All other parameter settings are ignored.

Set Task Trigger Method to Manual Only.
Click Submit.

Step 4: Start the data masking task

4.1. Run the task

On the Static Data Masking page, click the Task Configuration tab. Find the new data masking task and click Start in the Actions column.
On the Static Data Masking tab, click the Task Status subtab. Wait until the task progress is 100% and the status is Succeeded.

4.2. Verify the masking result

Go to the Bucket List page in the OSS console. Click the name of the destination Bucket. In the file list, find the masked file. The file name is in the format <object_file_name>_<task_running_time>.<file_type>. For example, in the file name usernews_20240808150643.csv, 20240808150643 indicates that the task ran at 15:06:43 on August 8, 2024. Click Download to retrieve the file.
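If the destination Bucket holds results from several runs, you can recover each run time from the file name. The sketch below assumes the running time is always the 14-digit YYYYMMDDHHMMSS stamp shown in the example and that the extension is one of csv, xls, or txt; the function name parse_masked_name is hypothetical.

```python
import re
from datetime import datetime

# <object_file_name>_<task_running_time>.<file_type>
_PATTERN = re.compile(r"(?P<name>.+)_(?P<stamp>\d{14})\.(?P<ext>csv|xls|txt)")


def parse_masked_name(filename: str):
    """Return (object_file_name, run_time, file_type) for a masked file name."""
    m = _PATTERN.fullmatch(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    run_time = datetime.strptime(m.group("stamp"), "%Y%m%d%H%M%S")
    return m.group("name"), run_time, m.group("ext")
```

For example, max(names, key=lambda n: parse_masked_name(n)[1]) picks the most recent result from a list of file names.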
After the download is complete, open the file to verify that the name, phone number, and ID card number fields contain masked data.
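For larger files, a script can flag values that do not look masked. This is a sketch under stated assumptions: the column names Name, Phone number, and ID card number are taken from the example table, phone numbers have 11 digits, and ID card numbers have 18 characters. Adjust the patterns to your own file and rules. The function name find_unmasked is hypothetical.

```python
import csv
import io
import re

# Expected shapes after masking, derived from the rules in step 3.2.
EXPECTED = {
    "Phone number": re.compile(r"\d{3}\*{4}\d{4}"),
    "ID card number": re.compile(r"\w{3}#{11}\w{4}"),
}


def find_unmasked(text: str):
    """Return (row_number, field, value) for every value that does not look masked."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        name = row.get("Name") or ""
        if not name.endswith("*"):
            problems.append((row_no, "Name", name))
        for field, pattern in EXPECTED.items():
            value = row.get(field) or ""
            if not pattern.fullmatch(value):
                problems.append((row_no, field, value))
    return problems
```

Read the downloaded file into a string and pass it to find_unmasked; an empty list means every checked value matches its masked shape.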

Summary

You can mask raw data stored in an OSS Bucket and save the masked data to a destination OSS Bucket for sharing. Even if the shared data is leaked, the masked fields do not expose sensitive content. This reduces the risk of data misuse and privacy breaches. The masked data can be used for scenarios such as data analytics, model training, and business report sharing without exposing private information.

Flexible selection of masking algorithms

Data masking is performed based on masking algorithms and their corresponding rules. DSC supports various algorithms, such as hashing, redaction, substitution, rounding, encryption, data decryption, and shuffling. Each algorithm provides multiple methods for configuring rules. You can select different algorithms to meet the needs of different business scenarios.
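To make the differences between algorithm families concrete, the following sketch shows generic versions of hashing, rounding, and shuffling in plain Python. These are illustrations only; DSC's own implementations and rule parameters are configured in the console and may differ. All function names and defaults here are hypothetical.

```python
import hashlib
import random


def hash_value(value: str, salt: str = "") -> str:
    """Hashing: replace a value with a hex SHA-256 digest (not reversible)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def round_value(value: float, step: int = 1000) -> int:
    """Rounding: coarsen a number to a multiple of step.

    Python's round() rounds halves to the nearest even integer.
    """
    return int(round(value / step) * step)


def shuffle_column(values, seed=None):
    """Shuffling: keep a column's values but break their link to each row."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Hashing hides the value but keeps equal inputs equal, rounding preserves approximate magnitude, and shuffling preserves the column's overall distribution, which is why different scenarios call for different algorithms.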

Improve the efficiency of masking rule configuration

DSC also provides a data masking template feature. You can group frequently used masking algorithms for a specific scenario into a template. When you configure static data masking rules, you can apply existing templates to improve configuration efficiency.
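Conceptually, a template is a reusable mapping from fields to masking rules. In DSC, templates are created in the console; the sketch below only models the idea locally as a Python dictionary. The template name, field names, and functions are hypothetical.

```python
from typing import Callable, Dict

Masker = Callable[[str], str]


def redact_keep_first(value: str) -> str:
    """Keep the first character and mask the rest with *."""
    return value[:1] + "*" * max(len(value) - 1, 0)


def redact_phone(value: str) -> str:
    """Mask the 4th through 7th characters with *."""
    return value[:3] + "*" * len(value[3:7]) + value[7:]


# A hypothetical "customer contact" template: field name -> masking function.
CONTACT_TEMPLATE: Dict[str, Masker] = {
    "Name": redact_keep_first,
    "Phone number": redact_phone,
}


def apply_template(row: Dict[str, str], template: Dict[str, Masker]) -> Dict[str, str]:
    """Mask the fields the template covers and leave other fields unchanged."""
    return {k: template[k](v) if k in template else v for k, v in row.items()}
```

Defining the rules once and applying them to many rows or tasks is what saves configuration effort.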

For more information, see Configure data masking templates and algorithms.

Scheduled data masking tasks

Data masking tasks can be scheduled to run at specific intervals, such as hourly, daily, weekly, or monthly. This ensures that updated data is promptly masked and ready for use.