The data lineage feature in DataWorks extracts watermark information from a leaked CSV file to identify who is likely responsible for a data breach.
Prerequisites
Before you begin, make sure you have:
-
A data detection rule that defines which data assets are subject to watermarking — without it, the data watermark feature cannot be enabled. See Configure a data detection rule and run a detection task.
-
The data watermark feature enabled on the target data detection rule — DataWorks embeds a watermark in every query and download operation on data that hits the rule. See Create a data masking rule.
How it works
The data watermark and lineage workflow has three stages:
-
Enable — Turn on the data watermark feature for a data detection rule in the Data Masking module of Data Security Guard. Watermarking applies only to operations from this point on.
-
Embed — Each time a user queries or downloads data that hits the rule, DataWorks automatically embeds a watermark that records the user's access behavior and uniquely identifies each access.
-
Trace — After a data breach occurs, upload the leaked CSV file and start a lineage task. DataWorks extracts the watermark to surface the most likely responsible party.
This sequence explains why the prerequisites above and the time-based limitation below matter: watermarks are embedded only from the moment the feature is enabled.
Limitations
-
DataWorks supports data lineage only for CSV files smaller than 200 MB.
-
Only users with the security administrator role can use the data lineage feature.
-
Data lineage covers only access operations performed after the data watermark feature is enabled. Operations that occurred before the feature was turned on are not traceable, and starting a lineage task later does not change this.
For example, if you enable the data watermark feature on February 1, any queries performed on January 31 cannot be traced — even if you run a lineage task afterward.
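The time-based rule can be expressed as a one-line comparison. The sketch below is a hypothetical helper (`is_traceable` is not part of DataWorks) that encodes the rule as documented. It assumes that an access at the exact instant the feature was enabled counts as covered, which the documentation does not specify. The time the lineage task runs is deliberately not an input, because it has no effect on traceability.

```python
from datetime import datetime


def is_traceable(access_time: datetime, watermark_enabled_at: datetime) -> bool:
    """Return True if an access operation could carry a data watermark.

    Hypothetical helper, not a DataWorks API. Only operations performed after
    the data watermark feature was enabled are traceable. Assumption: an access
    at exactly the enablement instant is treated as covered.
    """
    return access_time >= watermark_enabled_at


# The example from the text; the year 2024 is an arbitrary choice.
enabled = datetime(2024, 2, 1)
print(is_traceable(datetime(2024, 1, 31, 15, 0), enabled))  # query on January 31
print(is_traceable(datetime(2024, 2, 3, 9, 30), enabled))   # query on February 3
```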
Create and run a lineage task
-
In the left navigation pane, click Data Traceability.
-
Click Create Data Lineage Task.
-
In the Lineage Task dialog box, click Upload File and select the leaked CSV file from your computer. After the upload completes, you can Replace or Download the uploaded file.

-
Click Start Lineage to begin the task.
The lineage task may take some time to complete.
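Before uploading, you can check a file locally against the documented limits (CSV format, smaller than 200 MB). The following is a minimal sketch. `check_lineage_file` is a hypothetical name, and the script does not call DataWorks. It assumes that "200 MB" means 200 × 1024 × 1024 bytes and that a `.csv` file extension is a sufficient indicator of CSV format. The documentation confirms neither.

```python
import os

# Assumption: 200 MB is read as 200 * 1024 * 1024 bytes; the documentation
# does not say whether MB means 10^6 or 2^20 bytes.
MAX_LINEAGE_FILE_BYTES = 200 * 1024 * 1024


def check_lineage_file(path: str) -> list:
    """Return a list of problems that would block a lineage task (empty if none).

    Hypothetical local pre-check, not part of any DataWorks API.
    """
    problems = []
    # Assumption: the file extension is used as a proxy for CSV format.
    if not path.lower().endswith(".csv"):
        problems.append("file is not a .csv file")
    size = os.path.getsize(path)
    if size >= MAX_LINEAGE_FILE_BYTES:
        problems.append(
            f"file is {size} bytes; it must be smaller than {MAX_LINEAGE_FILE_BYTES} bytes"
        )
    return problems
```

Call `check_lineage_file("leaked.csv")` (the path is illustrative). An empty list means that neither documented limit was hit, not that the task will return a result.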
View possible leak sources
The Data Lineage page lists all completed lineage tasks with their Lineage Date and Lineage File. Tasks are sorted from newest to oldest. To find a specific task, enter a keyword in the search box — the search uses fuzzy matching against file names.
To investigate a specific task, click the icon in the Actions column. The lineage details page helps you answer two questions: who accessed the data, and when and how they accessed it. It shows the following fields:
-
Likelihood — DataWorks ranks each user who may be the leak source by the probability that their access produced the leaked file. Start with the highest-likelihood entry.
-
Operation Time — The timestamp of the access operation.
-
Operation Command — The specific operation performed (for example, a query or download).
FAQ
The lineage task completed, but Possible Leak Source shows No Result. What should I check?
There are three common reasons:
-
Insufficient data volume. The watermark extraction algorithm requires enough unique data to reconstruct the watermark reliably. Use a file that contains more than 500 unique data entries and run the task again.
-
The data does not belong to your tenant. Confirm that the file you uploaded was originally produced from data in your own tenant, not from an external organization.
-
The file does not contain watermark information. This happens when the data was accessed before the data watermark feature was enabled, or when the breach originated in an external system rather than through DataWorks. Check whether the data watermark feature is enabled for the relevant data detection rule — see Create a data masking rule. If the feature was not enabled at the time of access, those operations are not traceable.
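To rule out the first cause before you rerun a task, you can count the distinct entries in the file locally. The following is a minimal sketch with hypothetical helper names. It assumes that a "unique data entry" means a distinct data row, excluding the header and blank lines, and that the file is UTF-8 encoded. The documentation does not define the term precisely, and a leaked file may use another encoding.

```python
import csv

# From the FAQ: the file should contain more than 500 unique data entries.
MIN_UNIQUE_ENTRIES = 500


def count_unique_rows(path: str, has_header: bool = True) -> int:
    """Count distinct non-empty data rows in a CSV file.

    Assumption: one "unique data entry" is one distinct row. Hypothetical
    helper, not a DataWorks API.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        # csv.reader yields [] for blank lines; exclude them from the count.
        return len({tuple(row) for row in reader if row})


def has_enough_entries(path: str) -> bool:
    """Return True if the file exceeds the documented 500-entry threshold."""
    return count_unique_rows(path) > MIN_UNIQUE_ENTRIES
```

Passing this check only rules out the data-volume cause. It cannot detect whether the file contains a watermark or came from your tenant.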