DataWorks provides the Data traceability feature, which extracts watermark information from a leaked data file so that you can trace the users who may have caused the leak. This topic describes how to create a tracing task and use it to identify possible leak sources.
Prerequisites
- A data identification rule is created. For more information, see Identify sensitive data.
- The data watermark feature is enabled for the data identification rule. For more information, see Create a data masking rule.
Background information
On the Data Masking page of Data Security Guard, you can enable the data watermark feature for the data identification rule. After the data watermark feature is enabled, the data that hits the data identification rule is automatically watermarked when operations are performed on the data. For example, if the data is queried or downloaded, the data is automatically watermarked. Watermarks uniquely mark user activities. If the data is leaked, you can use the Data traceability feature to extract the watermark of the leaked data. Then, you can trace the users who may cause data leaks based on the watermark information.
Limits
- The file that is used to trace leak sources must be a CSV file less than 200 MB in size.
- You can trace only the operations that are performed after the data watermark feature is enabled. Note: For example, if you query Table A before the data watermark feature is enabled, you cannot trace this data query by using the Data traceability feature.
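Before you upload a file, you may want to check it locally against the format and size limit. The following Python snippet is a minimal sketch, not part of DataWorks: the function name `check_tracing_file` is hypothetical, the format check looks only at the file name extension, and "200 MB" is assumed to mean 200 × 1000 × 1000 bytes, which is the more conservative reading.

```python
import os

# Assumption: "200 MB" is read as 200 * 1000 * 1000 bytes (the stricter interpretation).
MAX_TRACING_FILE_BYTES = 200 * 1000 * 1000


def check_tracing_file(path, max_bytes=MAX_TRACING_FILE_BYTES):
    """Return a list of problems found; an empty list means the file passes.

    Hypothetical local helper. It checks only the extension and the size,
    not whether the content is valid CSV.
    """
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file name does not end with .csv")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif os.path.getsize(path) >= max_bytes:
        # The limit is "less than 200 MB", so a file of exactly max_bytes fails.
        problems.append("file is not smaller than the size limit")
    return problems
```

For example, `check_tracing_file("leak.csv")` returns an empty list if the file exists, has a .csv extension, and is under the assumed limit.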
Create and run a data tracing task
- Go to the Data Security Guard page.
- In the left-side navigation pane, click Data traceability. The Data traceability page appears.
- Create a data tracing task.
- Click Start tracing to start the tracing task. Note: Wait until the tracing task is completed.
View possible leak sources
- You can sort all tracing tasks in chronological or reverse chronological order based on the Traceability date column.
- You can search for a tracing task by the name of a tracing file. Fuzzy match is supported. After you enter a keyword in the search field and press Enter, DataWorks displays the tracing tasks whose names contain the keyword.
FAQ
- Cause 1: The data amount of the tracing file is insufficient. As a result, the watermark information cannot be restored.
Solution: Make sure that the tracing file contains enough data for the watermark generated by the data watermark feature to be reliably restored when the tracing task runs. The users who may have caused the data leak can then be traced. We recommend that you upload a tracing file that contains more than 500 unduplicated data entries.
- Cause 2: The leaked data does not belong to the current tenant.
Solution: Check the data source and make sure that the leaked data belongs to the current tenant.
- Cause 3: The tracing file does not contain watermark information.
Solution:
- Check whether the data watermark feature is enabled for the tracing file. You can trace only the operations that are performed after the data watermark feature is enabled. For more information about how to view and enable the data watermark feature, see Create a data masking rule.
- Check whether the tracing file actually contains the leaked data. If it does not, the data may have been leaked through operations in external environments.
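To rule out Cause 1 before you upload a file, you can count its unduplicated entries locally and compare the count with the recommended minimum of 500. The following Python snippet is a minimal sketch under stated assumptions: a "data entry" is taken to be one CSV row, the first row is treated as a header unless you pass `has_header=False`, and the function names are hypothetical rather than part of DataWorks.

```python
import csv

# From the recommendation in Cause 1: more than 500 unduplicated data entries.
RECOMMENDED_MIN_UNIQUE_ROWS = 500


def count_unique_rows(path, has_header=True, encoding="utf-8"):
    """Count distinct data rows in a CSV file, ignoring blank lines.

    Assumption: one CSV row corresponds to one data entry.
    """
    unique = set()
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row, if the file has any rows
        for row in reader:
            if row:  # csv.reader yields an empty list for a blank line
                unique.add(tuple(row))
    return len(unique)


def has_enough_rows(path, **kwargs):
    """Return True if the file has more than the recommended number of unique rows."""
    return count_unique_rows(path, **kwargs) > RECOMMENDED_MIN_UNIQUE_ROWS
```

A file that passes this check can still fail to trace for the other causes listed above. The check covers only the amount of data.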