DataWorks: Data tracing

Last Updated: Dec 04, 2025

Data tracing extracts watermark information from leaked data files. It helps your organization identify key details about a potential data breach, such as the person responsible and the time of the leak. With this information, you can respond quickly after a data leak occurs, which limits further exposure and supports your organization's data security and compliance requirements. This topic describes the data tracing feature and how to use it.

Data tracing workflow

The data tracing feature lets you analyze leaked files to pinpoint the source of a data breach and identify the specific operator and operation. For this feature to work, the following three interconnected conditions must be met:

Condition 1: Data is identified as sensitive

This is the foundation for all protection and tracing features. You must first identify the relevant data as sensitive in Sensitive Data Protection > Data classification grading.

  • To do this: Ensure that the target data field, such as user_phone, is scanned by a detection task and successfully marked with a specific data type, such as 'Phone Number'.

Condition 2: Data is masked during transmission

Tracing does not work for all data. It depends on data masking being applied at specific stages of data transmission.

  • To do this: In the Sensitive Data Protection > Data Desensitization module, configure a masking rule for the 'Phone Number' data type and configure a masking policy to define its scope.

Condition 3: Digital watermarking is enabled in the masking rule

This is the core technical requirement for tracing. Digital watermarking is not enabled by default.

  • To do this: When you configure or edit a data masking rule, explicitly enable the Data Watermark option. The system then embeds an invisible watermark into the masked data during the masking process. The watermark contains traceability information, such as the operator, time, and SQL query.

A data file is traceable only if its data meets all three preceding conditions. For example, after you query and export a masked and watermarked CSV file from a module such as Data Analysis or Data Development, the Data Tracing feature can parse the file and trace the leak back to its source.
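
The three conditions can be read as a simple checklist. The following is a minimal sketch in Python, not a DataWorks API: the `FieldConfig` class and its attributes are hypothetical names that summarize what you would confirm manually in Security Center for one field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldConfig:
    """Hypothetical summary of one field's settings (not a DataWorks object)."""
    name: str
    sensitive_type: Optional[str]  # for example "Phone Number"; None if unclassified
    masking_rule_in_scope: bool    # a masking rule and policy cover this field
    watermark_enabled: bool        # Data Watermark option is enabled on the rule


def unmet_conditions(field: FieldConfig) -> List[str]:
    """Return the conditions that still prevent tracing for this field."""
    unmet = []
    if field.sensitive_type is None:
        unmet.append("Condition 1: data is not identified as sensitive")
    if not field.masking_rule_in_scope:
        unmet.append("Condition 2: data is not masked during transmission")
    if not field.watermark_enabled:
        unmet.append("Condition 3: digital watermarking is not enabled")
    return unmet


if __name__ == "__main__":
    phone = FieldConfig("user_phone", "Phone Number", True, False)
    for problem in unmet_conditions(phone):
        print(problem)
```

An empty list means that files exported from this field should be traceable. Any entry in the list identifies the configuration step to revisit.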

Limitations

  • Applicable users: This feature is available to users of DataWorks Professional Edition or Enterprise Edition. You must also enable the new data security features for DataWorks in Security Center.

  • Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), and Japan (Tokyo).

  • Supported compute engines: MaxCompute and Hologres.
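
If you manage several workspaces, the limitations above can be encoded as a lookup. The following is an illustrative Python sketch. The function name `is_supported` and the use of display names as keys are assumptions made for this example. The sketch does not check whether the new data security features are enabled in Security Center.

```python
# Values copied from the limitations listed above.
SUPPORTED_EDITIONS = {"Professional Edition", "Enterprise Edition"}
SUPPORTED_REGIONS = {
    "China (Hangzhou)", "China (Shanghai)", "China (Beijing)",
    "China (Zhangjiakou)", "China (Ulanqab)", "China (Shenzhen)",
    "China (Chengdu)", "China (Hong Kong)", "Japan (Tokyo)",
}
SUPPORTED_ENGINES = {"MaxCompute", "Hologres"}


def is_supported(edition: str, region: str, engine: str) -> bool:
    """Hypothetical helper: True if all three limitations are satisfied."""
    return (
        edition in SUPPORTED_EDITIONS
        and region in SUPPORTED_REGIONS
        and engine in SUPPORTED_ENGINES
    )


if __name__ == "__main__":
    print(is_supported("Enterprise Edition", "Japan (Tokyo)", "Hologres"))
```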

Prerequisites

  • The Alibaba Cloud account or RAM user that you use must meet one of the following conditions:

    • The AliyunDataWorksFullAccess policy is attached to the Alibaba Cloud account or RAM user.

    • The Alibaba Cloud account or RAM user is assigned the tenant security administrator role of DataWorks.

    • The Alibaba Cloud account or RAM user is assigned the tenant administrator role of DataWorks.

  • You have completed the steps in New user guide.

Create a data tracing task

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Security Center. On the page that appears, click Go to Security Center.

  2. In the left-side navigation pane, choose Sensitive Data Protection > Data Traceability.

  3. On the Data Tracing page, click New Task in the upper-left corner to create a data tracing task.

    Note
    • Data tracing tasks support the upload of .csv files.

    • The file size cannot exceed 200 MB.

    • The file must contain more than 500 data entries.
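
Before you upload a file, you can check it against these three limits locally. The following Python sketch is one way to do that. It assumes that 1 MB equals 1024 × 1024 bytes, that the file is UTF-8 encoded, that the first row is a header, and that each non-empty row counts as one data entry. The source does not confirm any of these details, and the function name `precheck` is hypothetical.

```python
import csv
import os
from typing import List

MAX_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_ENTRIES = 500              # the file must contain MORE than this many entries


def precheck(path: str, has_header: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 200 MB")
    # Count non-empty rows; csv.reader yields [] for blank lines.
    with open(path, newline="", encoding="utf-8") as f:
        rows = sum(1 for row in csv.reader(f) if row)
    entries = rows - 1 if has_header and rows > 0 else rows
    if entries <= MIN_ENTRIES:
        problems.append(
            f"only {entries} data entries; more than {MIN_ENTRIES} are required"
        )
    return problems


if __name__ == "__main__":
    import sys
    for problem in precheck(sys.argv[1]):
        print(problem)
```

Pass `has_header=False` if the exported file has no header row.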

View Data Tracing results

Important

This feature can trace data only from operations for which digital watermarking was enabled in the data masking configuration.

When the Task Status of a data tracing task is Completed, click View in the Operation column to view the tracing results.

If a potential leak source is detected, the following information is provided:

| Field | Description |
| --- | --- |
| Watermark similarity | The higher the watermark similarity, the higher the probability that this operation caused the data breach. |
| Operator | The account used for the operation. This can be the logon account, or the RAM user or Alibaba Cloud account specified as the default access identity for the data source. |
| Operation Time | The time when the operation occurred. |
| Project | The name of the project or database that was accessed. |
| Behavior | The type of operation. If the operation is an SQL statement, you can copy the full statement. |
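
When a task returns several candidate operations, a natural first step is to review them from highest to lowest watermark similarity. The following Python sketch illustrates that ordering. The `TraceResult` class is hypothetical. It mirrors the fields described above and assumes that similarity is a number where a larger value means a closer match. It does not represent an export format or API of DataWorks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TraceResult:
    """Hypothetical record that mirrors the fields shown in the results."""
    watermark_similarity: float  # assumption: larger means a closer match
    operator: str
    operation_time: str
    project: str
    behavior: str


def rank_by_similarity(results: List[TraceResult]) -> List[TraceResult]:
    """Return the results ordered from most to least likely leak source."""
    return sorted(results, key=lambda r: r.watermark_similarity, reverse=True)


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    sample = [
        TraceResult(0.42, "user_a", "2025-01-01 10:00:00", "proj_x", "SELECT"),
        TraceResult(0.91, "user_b", "2025-01-02 11:30:00", "proj_x", "SELECT"),
    ]
    for r in rank_by_similarity(sample):
        print(r.watermark_similarity, r.operator, r.operation_time)
```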

Delete a data tracing task

You can delete a single data tracing task from the Actions column or select multiple tasks to delete them in a batch. After a task is deleted, you can no longer download its tracing file or view its results.