DataWorks: Data classification

Last Updated: Mar 26, 2026

Data classification and grading automatically scans your data assets to discover sensitive fields, assigns them business categories and security levels, and produces a confirmed catalog that drives downstream policies such as data masking, threat monitoring, and access auditing.

How it works

The feature follows a three-step workflow:

  1. Define identification standards. Set up the taxonomy: security levels (Data Grading), business categories (Data Classification), specific sensitive types (Data Type), and the detection rules that identify them.

  2. Run an identification task. Apply your rules to selected data sources (MaxCompute or Hologres) via a one-time or periodic scan.

  3. Review and confirm results. Inspect the field-level catalog produced by the task. Correct any misidentifications manually, then use the confirmed catalog as input for data masking, threat monitoring, and access auditing policies.

Note

Data Grading and Data Classification serve distinct purposes. Data Grading answers "how sensitive is this field?" (for example, S1 Public vs. S2 Internal). Data Classification answers "what kind of data is this?" (for example, Personal Information vs. Financial Data). Each Data Type belongs to exactly one classification and one grading level.
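
The relationship described in this note can be sketched as a small data model. This is only an illustration: the class names, field names, and the two sample data types are hypothetical and are not part of any DataWorks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingLevel:
    rank: int    # a higher rank means a higher security level
    label: str   # for example "S1 (Public)"


@dataclass(frozen=True)
class DataType:
    name: str               # the specific sensitive type
    classification: str     # answers "what kind of data is this?"
    grading: GradingLevel   # answers "how sensitive is this field?"


S1 = GradingLevel(1, "S1 (Public)")
S2 = GradingLevel(2, "S2 (Internal)")

# Hypothetical data types: each one has exactly one classification
# and exactly one grading level.
product_name = DataType("Product name", "Financial Data", S1)
employee_id = DataType("Employee ID", "Personal Information", S2)


def more_sensitive(a: DataType, b: DataType) -> DataType:
    """Return whichever data type carries the higher grading level."""
    return a if a.grading.rank >= b.grading.rank else b
```

Keeping classification and grading as separate attributes means two data types can share a business category while carrying different security levels.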

Limitations

  • Supported editions: Standard Edition, Professional Edition, or Enterprise Edition, with the new data security features enabled in Security Center.

  • Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, and Indonesia (Jakarta).

  • Supported compute engines: MaxCompute and Hologres.

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud account or RAM user that meets one of the following conditions:

    • Has the AliyunDataWorksFullAccess policy attached

    • Is assigned the tenant security administrator role in DataWorks

    • Is assigned the tenant administrator role in DataWorks

  • Completed all steps in the New user guide

Access the feature

  1. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left navigation pane, choose Data Governance > Security Center, then click Go to Security Center.

  2. In the left navigation pane of Security Center, choose Sensitive Data Protection > Data classification grading.

Configure data classification

Data classification lets you define the taxonomy of sensitive data in your organization. The system includes built-in templates for both Data Classification and Data Type that you can edit as needed.

Add a data type

A data type is the core unit of the taxonomy. Each data type belongs to a classification, carries a grading level, and has one or more identification rules.

  1. On the Data classification grading page, click the Data classification tab.

  2. Click New Data Type in the upper-left corner.

  3. Configure the following parameters.

    Parameter | Description
    Data Type | A globally unique name for this data type. DataWorks tags matched columns with this name.
    Data Classification | The business category this data type belongs to.
    Data Level | The security level assigned to columns that match this data type.
    Identification Rules | One or more rules that detect this data type. Three rule methods are supported: Data Content Identification (matches column values using regular expressions or built-in algorithms such as ID card validation), Field Name Identification (matches column names using regular expressions), and Field Annotation Identification (matches column comments using regular expressions). Set the rule logic to Satisfy any rule (a match on any single rule triggers identification) or Also meet the following rules (all rules must match).
    Data Type Description | A custom description for your business context.
  4. Save the data type in one of two ways. Click Effective Immediately to save the identification rules and activate them right away: when an identification task runs, it marks matched columns with this data type. Click Save Only to save the rules without activating them: when an identification task runs, it does not mark columns with this data type.
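
The three rule methods and the two rule-logic options can be sketched as follows. This is a minimal illustration using Python's `re` module, not the matching engine DataWorks actually runs. The `Column` class, the function names, and the `min_ratio` threshold for content matching are all assumptions introduced for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Column:
    name: str
    comment: str
    sample_values: List[str]


Rule = Callable[[Column], bool]


def field_name_rule(pattern: str) -> Rule:
    """Field Name Identification: regex against the column name."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda col: rx.search(col.name) is not None


def field_annotation_rule(pattern: str) -> Rule:
    """Field Annotation Identification: regex against the column comment."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda col: rx.search(col.comment) is not None


def data_content_rule(pattern: str, min_ratio: float = 0.5) -> Rule:
    """Data Content Identification: regex against sampled values.

    min_ratio (share of samples that must match) is an assumed threshold.
    """
    rx = re.compile(pattern)

    def check(col: Column) -> bool:
        if not col.sample_values:
            return False
        hits = sum(1 for v in col.sample_values if rx.fullmatch(v))
        return hits / len(col.sample_values) >= min_ratio

    return check


def matches(col: Column, rules: List[Rule], require_all: bool) -> bool:
    """require_all=False: any rule suffices. require_all=True: all must match."""
    if not rules:
        return False  # a data type needs at least one rule
    results = [rule(col) for rule in rules]
    return all(results) if require_all else any(results)


col = Column("user_email", "contact address",
             ["a@example.com", "b@example.org", "n/a"])
rules = [
    field_name_rule(r"e?mail"),
    field_annotation_rule(r"email"),
    data_content_rule(r"[^@\s]+@[^@\s]+\.[a-z]+"),
]
```

In this example the name rule and the content rule match but the comment rule does not, so `matches(col, rules, require_all=False)` is true while `matches(col, rules, require_all=True)` is false.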

Delete a data type

Only custom data types can be deleted. Built-in data types cannot be deleted.

Important

Deleting a data type has the following effects:

  • Historical identification results for this data type are deleted.

  • Future identification tasks no longer detect this data type.

  • Data masking (desensitization) policy rules referencing this data type are deleted.

  • Data access records for this data type are deleted.

  • Security risk identification rules referencing this data type are deleted.

Configure data grading

DataWorks supports up to 10 security levels. A higher number indicates a higher security level. The default levels use labels such as S1 (Public) and S2 (Internal). You can modify the description of each level as needed.

  1. On the Data classification grading page, click the Data Grading tab.

  2. Click Edit in the upper-left corner.

  3. Update the Detailed description for each level as needed.

  4. Click Save.
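
As a rough sketch of the grading model, the snippet below builds a level table capped at 10 entries and shows that editing a description does not change a level's rank. The helper names are hypothetical, and numbering every level as `S<n>` is an assumption extrapolated from the S1 and S2 defaults.

```python
MAX_LEVELS = 10  # DataWorks supports up to 10 security levels


def build_levels(descriptions):
    """Map labels S1, S2, ... to descriptions, lowest level first."""
    if not 1 <= len(descriptions) <= MAX_LEVELS:
        raise ValueError(f"expected 1 to {MAX_LEVELS} levels")
    return {f"S{i}": text for i, text in enumerate(descriptions, start=1)}


def level_rank(label):
    """A higher number indicates a higher security level."""
    return int(label[1:])


levels = build_levels(["Public", "Internal"])
levels["S2"] = "Internal use only"  # only the description changes
```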

Create and manage identification tasks

Identification tasks apply your configured rules to scan specified data sources. Two task types serve different needs:

  • Single Task: Runs once, on demand. Use this to perform a full historical scan or to re-evaluate results within a specific scope.

  • Periodic Tasks: Runs on a fixed daily, weekly, or monthly schedule. Use this for ongoing monitoring of new data. Only one periodic task can exist at a time.

Important

Periodic tasks identify only new data (columns). To re-evaluate historical results, create a Single Task covering the same scope.
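
The difference in scan scope between the two task types can be sketched as a set operation. This assumes that "new" means columns that no earlier run has scanned. The function name and the column identifiers are hypothetical.

```python
from typing import Iterable, Set


def columns_to_scan(task_type: str,
                    in_scope: Iterable[str],
                    already_scanned: Iterable[str]) -> Set[str]:
    """Pick the columns a run would evaluate, under the assumption above."""
    scope = set(in_scope)
    if task_type == "Single Task":
        return scope  # full scan, including historical columns
    if task_type == "Periodic Tasks":
        return scope - set(already_scanned)  # only new columns
    raise ValueError(f"unknown task type: {task_type}")


in_scope = ["orders.id", "orders.email", "orders.phone"]
already_scanned = ["orders.id", "orders.email"]
```

With these inputs a periodic run would evaluate only `orders.phone`, while a Single Task over the same scope would evaluate all three columns again.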

Create an identification task

  1. On the Data classification grading page, click the Identify Tasks tab.

  2. Click New Task in the upper-left corner.

  3. Configure the following parameters.

    Parameter | Description
    Task Name | A custom name for this task.
    Data Source Type | Select MaxCompute or Hologres.
    Task Type | Single Task (runs once) or Periodic Tasks (runs on a daily, weekly, or monthly schedule).
    Identification Range | The scope of data to scan. The minimum scope is a data table. For MaxCompute, select a project or table. For Hologres, select a database or table, then select a Data Source (from an instance attached to a workspace) and a Resource Group used to verify network connectivity.
    Sampling Quantity | The amount of data to sample from each column when the task runs. A larger sample improves accuracy but increases task duration. Maximum value: 200.
    Data Sampling Using | The account used to access data during sampling. This account must have permission to read table names, column names, column comments, and column data within the identification scope.
  4. Click Confirm.
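
The constraints in the parameter table can be expressed as a simple pre-submission check. This is a sketch only: the `IdentificationTask` class, its field names, and the sample values are hypothetical, and the lower bound of 1 for the sampling quantity is an assumption because the text states only the maximum.

```python
from dataclasses import dataclass
from typing import List

SUPPORTED_SOURCES = {"MaxCompute", "Hologres"}
TASK_TYPES = {"Single Task", "Periodic Tasks"}
MAX_SAMPLING_QUANTITY = 200  # maximum stated in the parameter table


@dataclass
class IdentificationTask:
    name: str
    data_source_type: str
    task_type: str
    identification_range: List[str]  # hypothetical scope strings
    sampling_quantity: int
    sampling_account: str


def validate(task: IdentificationTask,
             existing: List[IdentificationTask]) -> List[str]:
    """Return a list of problems; an empty list means the task looks valid."""
    problems = []
    if task.data_source_type not in SUPPORTED_SOURCES:
        problems.append("unsupported data source type")
    if task.task_type not in TASK_TYPES:
        problems.append("unknown task type")
    if not task.identification_range:
        problems.append("identification range is empty")
    # Lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= task.sampling_quantity <= MAX_SAMPLING_QUANTITY:
        problems.append("sampling quantity out of range")
    # Only one periodic task can exist at a time.
    if task.task_type == "Periodic Tasks" and any(
        t.task_type == "Periodic Tasks" for t in existing
    ):
        problems.append("a periodic task already exists")
    return problems


daily = IdentificationTask("daily_scan", "MaxCompute", "Periodic Tasks",
                           ["sales_project"], 100, "scan_account")
second = IdentificationTask("weekly_scan", "Hologres", "Periodic Tasks",
                            ["orders_db.customers"], 500, "scan_account")
```

Here `validate(daily, [])` returns an empty list, while `validate(second, [daily])` reports both the oversized sample and the existing periodic task.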

Edit an identification task

Only periodic tasks can be edited. On the Identify Tasks tab, click Edit in the Operation column for the task.

Important

Single Tasks (one-time tasks) cannot be edited. To change the configuration of a Single Task, delete it and create a new one.

View task details and run history

On the Identify Tasks tab, click View in the Operation column. On the task details page, click the number next to Running Records to see the start and end times for each run.

Delete identification tasks

On the Identify Tasks tab, delete tasks individually or in batches:

  • Single delete: Find the task and click Delete in the Operation column.

  • Batch delete: Select multiple tasks and click Batch Delete in the lower-left corner.

Important

  • Deleting a task does not stop a run already in progress.

  • After a periodic task is deleted, it stops running on its schedule.

  • Historical identification results from deleted tasks are retained.

View and revise identification results

Note

Data identification retrieves the latest schema information early every morning, so new fields, tables, or databases are classified and graded starting the following morning. For periodic tasks, results take effect on T+1, that is, the day after the task runs.
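
The T+1 timing amounts to simple date arithmetic. The sketch below assumes that T is the calendar date of the periodic run. The function name is hypothetical.

```python
from datetime import date, timedelta


def results_effective_on(run_date: date) -> date:
    """Return T+1 for a periodic run on date T (assumed interpretation)."""
    return run_date + timedelta(days=1)


effective = results_effective_on(date(2026, 3, 31))
```

For a run on March 31, 2026, `effective` is April 1, 2026.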

View results

  1. On the Data classification grading page, click the Identification results tab.

  2. Review the results table. Each row represents an identified field with the following information.

    Column | Description
    Data Source Type | The compute engine (MaxCompute or Hologres).
    Instance/Project/Database | The instance, project, or database containing the field.
    Table | The table containing the field.
    Field | The column name.
    Data classification | The classification directory path, in the format Level-1 directory/Level-2 directory/....
    Data Type | The data type identified by a task or set by manual revision.
    Data Grading | The security level assigned to the field.
    Judgment mode | System identification (set by a task) or Revision (set manually).
    Update time | When the field was last identified or revised.

Revise results

If identification results are inaccurate, correct them using one of two methods:

  • Overwrite with a new scan: Create a Single Task (one-time identification task) to re-evaluate results within a specific scope.

  • Manual revision: On the Identification results tab, use the search bar to find the field, click Revision in the Operation column, and select a new data type in the Revise dialog box.
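
A manual revision can be pictured as producing an updated result record whose judgment mode changes from System identification to Revision. The record type and function below are hypothetical and only mirror a few columns of the results table. They do not describe how DataWorks stores results.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FieldResult:
    table: str
    field: str
    data_type: str
    judgment_mode: str      # "System identification" or "Revision"
    update_time: datetime


def revise(result: FieldResult, new_data_type: str,
           now: Optional[datetime] = None) -> FieldResult:
    """Return a copy of the result with a manually chosen data type."""
    return replace(
        result,
        data_type=new_data_type,
        judgment_mode="Revision",
        update_time=now or datetime.now(timezone.utc),
    )


scanned = FieldResult("customers", "contact", "Phone number",
                      "System identification",
                      datetime(2026, 3, 1, tzinfo=timezone.utc))
revised = revise(scanned, "Email address",
                 now=datetime(2026, 3, 2, tzinfo=timezone.utc))
```

The original `scanned` record is left unchanged, which makes it easy to compare the system result with the revised one.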

Next steps

After you confirm the identification results, use the sensitive data catalog as input for:

  • Data masking: Apply masking policies to protect sensitive columns.

  • Threat monitoring: Detect anomalous access patterns on sensitive data.

  • Access auditing: Audit who accessed which sensitive fields and when.