DataWorks provides the Data traceability feature, which extracts the watermark information from a leaked data file so that you can trace the users who may have caused the data leak. This topic describes how to create and run a tracing task and how to use the results to identify possible leak sources.

Prerequisites

  1. A data identification rule is created. For more information, see Identify sensitive data.
  2. The data watermark feature is enabled for the data identification rule. For more information, see Create a data masking rule.

Background information

On the Data Masking page of Data Security Guard, you can enable the data watermark feature for a data identification rule. After the feature is enabled, data that matches the data identification rule is automatically watermarked whenever an operation, such as a query or a download, is performed on it. Each watermark uniquely marks a user activity. If the data is leaked, you can use the Data traceability feature to extract the watermark from the leaked data and then trace the users who may have caused the leak based on the watermark information.

Limits

  • The file that is used to trace leak sources must be a CSV file less than 200 MB in size.

  • You can trace only the operations that are performed after the data watermark feature is enabled.
    Note For example, if you query Table A before the data watermark feature is enabled, you cannot trace this data query by using the Data traceability feature.
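
The file limit above can be checked locally before you upload a file. The following Python snippet is a minimal sketch and is not part of DataWorks: the function name check_tracing_file is hypothetical, the check looks only at the file extension rather than the file content, and it assumes that 200 MB means 200 × 1024 × 1024 bytes.

```python
import os

# Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
MAX_BYTES = 200 * 1024 * 1024


def check_tracing_file(path):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Check the extension only; the content is not parsed here.
    if not path.lower().endswith(".csv"):
        problems.append("file does not have a .csv extension")
    # The limit is "less than 200 MB", so a file of exactly 200 MB fails.
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is not smaller than 200 MB")
    return problems
```

An empty list means the file passes both local checks. It does not guarantee that the file contains watermark information.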

Create and run a data tracing task

  1. Go to the Data Security Guard page.
  2. In the left-side navigation pane, click Data traceability. The Data traceability page appears.
  3. Create a data tracing task.
    1. Click New data traceability task.
    2. In the Traceability tasks dialog box, click Upload to upload the file for which you want to trace leak sources.
      Note
      • The file that is used to trace leak sources must be a CSV file less than 200 MB in size.

      • You can export or download a data file from DataWorks and then upload the file when you create a tracing task. Alternatively, you can save data obtained from an external environment to a CSV file and then upload the CSV file when you create a tracing task.
      After the file is uploaded, you can replace or download the file.
  4. Click Start tracing to start the tracing task.
    Note Wait until the tracing task is completed.

View possible leak sources

On the Data traceability page, you can view the information about all tracing tasks, including the time when the tasks were run and the tracing files. You can also click the View Details icon in the Actions column of a tracing task to view possible leak sources.
Note
  • You can sort all tracing tasks in chronological or reverse chronological order based on the Traceability date column.
  • You can search for a tracing task by the name of a tracing file. Fuzzy match is supported. After you enter a keyword in the search field and press the Enter key, DataWorks displays the tracing tasks whose tracing file names contain the keyword.
To view the details of a tracing task, find the tracing task and click the View Details icon in the Actions column. In the dialog box that appears, you can identify the user who is most likely to have caused the data leak based on the values in the Possible probability, Operating time, and Operation commands columns.

FAQ

Why is no possible leak source found after the tracing task is run? What can I do?
  • Cause 1: The tracing file does not contain enough data. As a result, the watermark information cannot be restored.

    Solution: Make sure that the tracing file contains enough data for the watermark generated by the data watermark feature to be reliably restored when the tracing task is run. The users who may have caused the data leak can then be traced. We recommend that you upload a tracing file that contains more than 500 distinct data entries.

  • Cause 2: The leaked data does not belong to the current tenant.

    Solution: Check the data source and make sure that the leaked data belongs to the current tenant.

  • Cause 3: The tracing file does not contain watermark information.
    Solution:
    • Check whether the data watermark feature was enabled for the data in the tracing file. You can trace only the operations that are performed after the data watermark feature is enabled. For more information about how to view and enable the data watermark feature, see Create a data masking rule.
    • If the data watermark feature was enabled, the tracing file may contain no data that was leaked from DataWorks. The data may have been leaked through operations in external environments.
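
For Cause 1, you can count the distinct rows in the tracing file before you upload it. The following Python snippet is a rough sketch under stated assumptions: a "data entry" is taken to mean one CSV row, the file is UTF-8 encoded, and the first row is a header unless you say otherwise. The function names are hypothetical and are not part of DataWorks.

```python
import csv

# From the recommendation above: more than 500 distinct data entries.
RECOMMENDED_MIN_UNIQUE = 500


def count_unique_rows(path, has_header=True):
    """Count distinct non-empty rows in a CSV file (assumes UTF-8 encoding)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if one is present
        # Rows are lists, so convert each to a tuple to make it hashable.
        return len({tuple(row) for row in reader if row})


def has_enough_rows(path, has_header=True):
    """Return True if the file has more than the recommended number of distinct rows."""
    return count_unique_rows(path, has_header) > RECOMMENDED_MIN_UNIQUE
```

If has_enough_rows returns False, consider collecting more of the leaked data into the file before you run the tracing task.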