The data lineage feature in DataWorks lets you extract watermark information from a leaked data file. This helps you identify the owner who may be responsible for the data breach. This topic describes how to create a data lineage task and use it to find the responsible owner.
Prerequisites
A data detection rule has been created. For more information, see Configure a data detection rule and run a detection task.
The data watermark feature must be enabled on the target data detection rule. For more information, see Create a data masking rule.
Background information
In DataWorks, you can use the Data Masking module in Data Security Guard to enable the data watermark feature for a data detection rule. After this feature is enabled, DataWorks automatically generates a watermark for every operation, such as a query or a download, on data that matches the rule. The watermark records the user's access behavior and uniquely identifies each access. If a data breach occurs, you can use the data lineage feature to extract the watermark from the leaked data. This helps you identify the owner who is potentially responsible for the breach.
Limits
DataWorks supports data lineage only for CSV files smaller than 200 MB.
Only users with the security administrator role can use the data lineage feature.
DataWorks provides data lineage only for data access operations that occur after the data watermark feature is enabled.
Note: For example, if you query Table A before the data watermark feature is enabled, the data lineage feature cannot trace that query operation. The operation remains untraceable even if you later enable the data watermark feature and start a lineage task for the data file.
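The file limit above can be checked locally before you upload a file. The following is a minimal sketch, not part of DataWorks: the function name `check_upload_candidate` is hypothetical, and it assumes that "200 MB" means 200 × 1024 × 1024 bytes and that a `.csv` extension is enough to identify a CSV file.

```python
import os

# Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 200 * 1024 * 1024


def check_upload_candidate(path, max_size_bytes=MAX_SIZE_BYTES):
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    # Hypothetical check: only the file extension is inspected, not the content.
    if not path.lower().endswith(".csv"):
        problems.append("file does not have a .csv extension")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    size = os.path.getsize(path)
    # The documented limit is "smaller than 200 MB", so the limit itself fails.
    if size >= max_size_bytes:
        problems.append(f"file is {size} bytes, limit is {max_size_bytes} bytes")
    return problems


if __name__ == "__main__":
    for problem in check_upload_candidate("leaked_data.csv"):
        print("Problem:", problem)
```

The script only checks the file itself. It cannot verify the other two limits, which depend on your role and on when the data watermark feature was enabled.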
Create and run a data lineage task
In the left navigation pane, click Data Traceability to open the Data Traceability page.
Create a data lineage task.
Click the Create Data Lineage Task button.
In the Lineage Task dialog box, click Upload File to upload an object file for lineage tracing.
Note: DataWorks supports data lineage only for CSV files smaller than 200 MB.
You can export or download a data file from DataWorks to your computer and then upload it for the data lineage task. You can also save data from an external system as a CSV file and upload the file.
After the file is uploaded, you can click Replace to upload a different file or Download to download the uploaded file.

Click Start Lineage to begin the lineage task.
Note: The data lineage task may take some time to complete.
View possible leak sources
On the Data Lineage page, you can view the Lineage Date and Lineage File for all completed lineage tasks. You can also examine the lineage details for a task to identify potential data leak sources.

All lineage tasks are sorted by Lineage Date from newest to oldest, which makes it easier to find a specific task.
You can search for a data lineage task by its file name. The search supports fuzzy matching. After you enter a keyword, all data lineage tasks that contain the keyword are displayed.
Click the icon in the Actions column of the target data lineage task to view its lineage details. You can identify the owner most likely responsible for the data leak based on the Likelihood, Operation Time, and Operation Command values from the DataWorks analysis.
FAQ
If a lineage task completes but No Result is displayed for Possible Leak Source, the possible reasons and solutions are as follows:
Reason 1: The file does not contain enough data, so the watermark information cannot be recovered.
Solution: The data watermark feature requires sufficient data to generate a reliable watermark. Only then can the lineage task accurately recover the watermark and identify the owner responsible for the breach. Use a file that contains more than 500 unique data entries for lineage tracing.
Reason 2: The breached data does not belong to your tenant.
Solution: Confirm the source of the data. Make sure that the data you are tracing belongs to your tenant.
Reason 3: The file for data lineage tracing does not contain watermark information.
Solution:
Check whether the data watermark feature was enabled for the data in the file when the data was accessed. DataWorks supports data lineage only for data access operations performed after the data watermark feature is enabled. To view and enable the data watermark feature, see Create a data masking rule.
If the data watermark feature was enabled, the file that you are tracing may not have been involved in the data breach. The breach may have been caused by operations that were performed in other external systems.
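For Reason 1, you can count the unique entries in a file before you start a lineage task. The following is a minimal sketch under stated assumptions: it treats one CSV row as one "data entry", assumes the first row is a header, and assumes UTF-8 encoding. The function names `count_unique_rows` and `has_enough_rows` are hypothetical and not part of DataWorks.

```python
import csv

# From the FAQ: the file should contain more than 500 unique data entries.
MIN_UNIQUE_ROWS = 500


def count_unique_rows(path, has_header=True, encoding="utf-8"):
    """Count distinct non-empty rows in a CSV file.

    Assumption: one CSV row corresponds to one "data entry".
    """
    unique = set()
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if present
        for row in reader:
            if row:  # ignore blank lines
                unique.add(tuple(row))
    return len(unique)


def has_enough_rows(path, **kwargs):
    """Return True if the file has more than MIN_UNIQUE_ROWS unique rows."""
    return count_unique_rows(path, **kwargs) > MIN_UNIQUE_ROWS
```

A file that passes this check can still return No Result for the other reasons listed above, so treat the count only as a quick way to rule out Reason 1.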