Data classification and grading

Data classification and grading is a key prerequisite for data security. This feature helps you identify and categorize sensitive data types and their corresponding security levels. It detects sensitive information within your organization's data assets and assigns security levels based on data sensitivity. This process helps you understand the sensitive content in your data assets and provides a foundation for data management and protection. Knowing the sensitive data you possess lets you manage access permissions, apply data masking, and audit data access to improve your overall data security.

Function introduction

Data classification and grading is the foundation and starting point for all data protection features in the DataWorks Security Center. Its core goal is to help you automatically discover and tag sensitive data scattered across various data sources. This process answers two key questions: "What sensitive data do I have?" and "Where is it?"

Step 1: Configure data classification and grading rules
First, you must define a set of identification standards for sensitive data. These standards include the following:
- Data Grading: Labels data sensitivity, such as S1 (Public) and S2 (Internal).
- Data Classification: Groups data by business category, such as Personal Information and Financial Data.
- Data Type: Defines specific types of sensitive data, such as Phone Number and ID Card Number. When you create a data type, you must assign it to a data classification and specify a data grade.
- Identification Rules: The core of automated discovery. You can set identification rules for each data type. The following identification methods are supported:
  - Identify by content: Matches data content using regular expressions or built-in algorithms, such as ID card validation.
  - Identify by field name/comment: Matches field names or comments using regular expressions.
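The two identification methods above can be sketched in Python as follows. This is a minimal illustration, not the DataWorks implementation: the regular expressions, the helper names `matches_content` and `matches_field_name`, and the `min_hit_ratio` threshold are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration only; real rules are configured in Security Center.
PHONE_CONTENT = re.compile(r"1[3-9]\d{9}")  # 11-digit mobile number pattern
PHONE_FIELD_NAME = re.compile(r"(phone|mobile|tel)", re.IGNORECASE)


def matches_content(samples, pattern, min_hit_ratio=0.8):
    """Identify by content: enough sampled values must fully match the pattern.

    min_hit_ratio is an assumed threshold, not a documented DataWorks setting.
    """
    values = [s for s in samples if s]  # ignore empty or NULL samples
    if not values:
        return False
    hits = sum(1 for v in values if pattern.fullmatch(v))
    return hits / len(values) >= min_hit_ratio


def matches_field_name(field_name, pattern):
    """Identify by field name or comment: the text must contain the pattern."""
    return pattern.search(field_name) is not None
```

For example, `matches_field_name("user_mobile", PHONE_FIELD_NAME)` is true because the name contains "mobile", while a column whose sampled values are mostly free text would fail the content check.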
Step 2: Create an identification task
Create an identification task to apply your defined rules and scan specified data sources, such as MaxCompute and Hologres. You can run tasks immediately as a one-time scan or schedule them as a periodic (daily, weekly, or monthly) scan for continuous monitoring.
Step 3: Generate identification results
After the task runs, the system generates a detailed list of identification results. For periodic tasks, the results take effect on T+1 (the day after the task runs). This list is your sensitive data asset catalog: it shows which field in which table was identified as which sensitive data type. If a result is inaccurate, you can revise it manually to keep the catalog accurate.

Finally, this identified and confirmed catalog of sensitive data assets serves as the precise input for all downstream advanced security policies, such as data masking, threat monitoring, and access auditing.

Limitations

Applicable users: This feature is available to users who have activated DataWorks Standard, Professional, or Enterprise Edition and have selected the new data security features in Security Center.
Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Chengdu).
Supported compute engine: MaxCompute.

Prerequisites

The Alibaba Cloud account or RAM user that you use must meet one of the following conditions:
- The AliyunDataWorksFullAccess policy is attached to the Alibaba Cloud account or RAM user.
- The Alibaba Cloud account or RAM user is assigned the tenant security administrator role of DataWorks.
- The Alibaba Cloud account or RAM user is assigned the tenant administrator role of DataWorks.
You have completed the steps in New user guide.

Feature entry point

Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Security Center. On the page that appears, click Go to Security Center.
In the navigation pane on the left, choose Sensitive Data Protection > Data classification grading.

Configure data classification

Go to the data classification page
1. On the data classification grading page, click the Data classification tab.
2. The Data Classification tree is on the left, and the Data Types that belong to the selected Data Classification are listed on the right. Click a branch in the classification tree to view the Data Types for that category. You can then perform View, Edit, and Delete operations on data types in the Actions column.

Add a data type

Important

The system includes built-in templates for Data Classification and Data Type. You can edit these templates as needed.

On the Data classification page, click New Data Type in the upper-left corner.

Configure the following parameters:

Parameter	Description
Data Type	Enter a name for the data type. The name must be globally unique. DataWorks marks data (columns) that meets the identification rules with this data type.
Data Classification	Specify the data classification to which the data type belongs.
Data Level	Specify the security level for this data type. DataWorks marks data (columns) that meets the identification rules with this data level.
Identification Rules	When an identification rule is met, DataWorks marks the data (column) with the identification result. Three types of identification rules are supported: Data Content Identification, Field Name Identification, and Field Annotation Identification. Each rule must be set and verified independently. You can combine the rules in one of two ways. Satisfy any rule: The identification rule is met if any one of the rules is hit. Also meet the following rules: The identification rule is met only if all of the rules are hit.
Data Type Description	Enter a custom description for the data type based on your business scenario.
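The two combination modes for identification rules, Satisfy any rule and Also meet the following rules, amount to a logical OR and a logical AND over the individual rule results. The sketch below illustrates this under stated assumptions: the function name `rule_is_met`, the mode strings `"any"` and `"all"`, and the behavior for an empty rule set are hypothetical and are not part of any DataWorks API.

```python
def rule_is_met(rule_hits, mode):
    """Combine per-rule results into one decision.

    rule_hits: dict mapping a rule name (for example "content") to whether it was hit.
    mode: "any" models Satisfy any rule; "all" models Also meet the following rules.
    """
    results = list(rule_hits.values())
    if not results:
        return False  # assumption: a data type with no rules never matches
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError(f"unknown mode: {mode!r}")
```

With a content rule that hits and a field name rule that misses, the `"any"` mode marks the column and the `"all"` mode does not.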

After you configure the parameters, you can apply the rule immediately or save it.
1. Effective Immediately: Saves the configuration and immediately applies the identification rule. When a data identification task runs, it marks data columns that match the rule with this data type.
2. Save Only: Saves the configuration, but the identification rule does not take effect. When a data identification task runs, it does not mark data with this data type.

Delete a data type: You can delete only custom data types. You cannot delete built-in data types.
Important
Deleting a data type has the following effects:
- Historical identification results for the data type are deleted, and new identification tasks no longer identify the data type.
- Rules for the data type are deleted from data masking policies.
- Data access records for the data type are deleted.
- Rules related to the data type are deleted from the security risk identification rules.

Configure Data Grading

DataWorks supports up to 10 security levels. You can modify the description of each level as needed. A higher number indicates a higher security level.

Go to the data grading page: On the Data Classification Grading page, click the Data Grading tab.
Edit data grading: Click the Edit button in the upper-left corner of the page to modify the Detailed description for each level.
Save data grading: After you modify the detailed descriptions, click the Save button in the upper-left corner to save the data grading settings.

Identify Tasks

Go to the Identify Tasks page: On the Data Classification Grading page, click the Identify Tasks tab.

Create an identification task

On the Identify Tasks tab, click New Task in the upper-left corner.

Configure the following parameters:

Parameter	Description
Task Name	A custom name for the data classification and grading identification task.
Data Source Type	Select a data source type. MaxCompute and Hologres are supported.
Task Type	Single Task: Runs only once. Periodic Tasks: Runs repeatedly at a fixed time. Important: Periodic tasks identify only new data (columns). To re-evaluate historical identification results, use a one-time task. DataWorks supports only one periodic task.
Identification Range	Specify the scope of data for the identification task to cover. The minimum scope is a data table. If you set Data Source Type to MaxCompute, you can select a project or a data table. If you set Data Source Type to Hologres, you can select a database or a data table. You must select a Data Source from an instance that is attached to a specific Workspace. Then, select a Resource Group to authenticate network connectivity.
Sampling Quantity	The amount of data to sample from each column when the task runs. A larger sample size improves identification accuracy but increases task duration. The maximum value is 200.
Data Sampling Using	When the identification task runs, DataWorks uses only the specified account to access data. If the specified account does not have the required permissions, sampling and identification fail. Important: Ensure that the specified account has permission to access the table names, column names, column descriptions, and column data within the specified identification scope.
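The constraints in the table above (supported data source types, a sampling quantity of at most 200, and a single periodic task) can be checked before a task is submitted. The following is a minimal sketch under stated assumptions: the `IdentificationTask` class, its field names, the `validate_task` helper, and the lower bound of 1 for the sampling quantity are hypothetical and do not correspond to a DataWorks API.

```python
from dataclasses import dataclass

SUPPORTED_SOURCES = {"MaxCompute", "Hologres"}
MAX_SAMPLING_QUANTITY = 200  # documented maximum per column


@dataclass
class IdentificationTask:
    name: str
    data_source_type: str
    task_type: str  # "single" or "periodic" (labels assumed for this sketch)
    sampling_quantity: int


def validate_task(task, existing_tasks):
    """Return a list of problems; an empty list means the task looks valid."""
    problems = []
    if task.data_source_type not in SUPPORTED_SOURCES:
        problems.append("unsupported data source type")
    # The lower bound of 1 is an assumption; the source states only the maximum.
    if not 1 <= task.sampling_quantity <= MAX_SAMPLING_QUANTITY:
        problems.append("sampling quantity must be between 1 and 200")
    has_periodic = any(t.task_type == "periodic" for t in existing_tasks)
    if task.task_type == "periodic" and has_periodic:
        problems.append("only one periodic task is supported")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every configuration issue at once.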

After you configure the parameters, click Confirm to save the task.

Edit a data identification task
To reconfigure a periodic identification task, on the Identify Tasks tab, click Edit in the Operation column for the task.
Important
You cannot edit one-time tasks. To change a one-time task, you must delete it and create a new one.
View a data identification task
1. On the Identify Tasks tab, find the desired task and click View in the Operation column to open the task details page.
2. On the task details page, click the number next to Running Records to view the Start execution time and End execution time for each run.
Delete data identification tasks
On the Identify Tasks tab, you can delete a single task or a batch of tasks.
- Delete a single task:
  Find the task that you want to delete and click Delete in the Operation column.
- Delete tasks in batches:
  Select the tasks that you want to delete and click Batch Delete in the lower-left corner.
Important
- Deleting a data identification task does not stop a task that is currently running.
- After a periodic task is deleted, it will no longer run.
- After a data identification task is deleted, the identification results from its historical runs are retained.

View data classification and grading results

Important

Data identification retrieves the latest table schema information early every morning. Therefore, new fields, tables, or databases are classified and graded the following morning.

On the Data classification grading page, click the Identification results tab. On this tab, you can view the results for table fields after an identification task has run.

View data classification and grading results

On the Identification Results page, you can view the data classification and grading results for your data assets. The following information is displayed:

Identification Information	Description
Data Source Type	The data engine to which the data asset belongs.
Instance/Project/Database	The name of the instance, project, or database to which the data asset belongs.
Table	The name of the data table to which the data asset belongs.
Field	The name of the column in the data asset.
Data classification	The classification directory of the data type, or the directory revised by a user. The path is displayed in the format of `Level-1 directory/Level-2 directory/...`.
Data Type	The data type identified by the task, or the type revised by a user.
Data Grading	The security level corresponding to the data type, or the level revised by a user.
Judgment mode	System identification: The result is determined by a data identification task. Revision: The result is revised by a user.
Update time	The time when the result was last identified by the system or revised by a user.

Revise data classification and grading results
On the Identification Results page, you can delete or revise the classification and grading results in the Actions column. You can revise the identification results for a data asset in one of the following two ways:
- Overwrite with a new scan: Create a new one-time identification task to re-evaluate the results for assets within a specific scope.
- Manual revision: Manually revise the identification results for a data asset. To do this, perform the following steps:
  1. On the Identification Results tab, use the search bar to filter for the data asset whose results you want to modify. Then, click Revision in the Operation column.
  2. In the Revise dialog box, manually select a new data type.
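The effect of a manual revision on a result record can be modeled as follows: the data type changes, the judgment mode switches from System identification to Revision, and the update time is refreshed. This is a sketch only. The `IdentificationResult` class, its field names, and the `revise` function are hypothetical, and the sketch assumes the caller supplies the data grading that belongs to the new data type.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class IdentificationResult:
    table: str
    field: str
    data_type: str
    data_grading: int
    judgment_mode: str  # "System identification" or "Revision"
    update_time: datetime


def revise(result, new_data_type, new_grading, now=None):
    """Return a revised copy of the result; the original record is left unchanged."""
    return replace(
        result,
        data_type=new_data_type,
        data_grading=new_grading,
        judgment_mode="Revision",
        update_time=now or datetime.now(timezone.utc),
    )
```

Keeping the record immutable and returning a copy makes it easy to compare the system-identified result with the revised one.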