You can use the data lineage feature of Data Security Guard to visualize the lineage of sensitive data, analyze abnormal associations between fields, and identify fields whose identification results are abnormal. The data lineage feature shows how sensitive data spreads and what it affects, which helps you identify sensitive data efficiently. This topic describes how to view data lineage.

Overview

The data lineage feature provides the following functionalities:
  • Visualizes the lineage of sensitive data.

    Data Security Guard provides a lineage graph for sensitive data based on the lineage between sensitive fields. The lineage graph helps you understand the source and destination of sensitive data.

  • Improves the efficiency of identifying sensitive data.

    Based on the lineage between fields, an automatic identification task can identify the associated fields that have the same sensitive field type as the queried field. This improves identification efficiency.

  • Analyzes abnormal lineage between fields.
    • Analyzes abnormal associations between fields.

      The system analyzes abnormal associations between sensitive fields, such as SELECT_CONCAT or SELECT_SUBSTRING, based on their lineage. This helps prevent users from bypassing sensitive data identification and sensitive data usage auditing by concatenating or splitting characters.

    • Identifies associated fields whose sensitive field types differ from that of the queried field.

      The data lineage feature helps you find fields that are associated with the queried field but have a different sensitive field type. For example, the queried field is A, and its sensitive field type is name. Fields B (name) and C (province) are associated with field A. Field C has a different sensitive field type from field A, so the feature identifies field C.
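To illustrate why concatenation and splitting matter, the following Python sketch contrasts a content-based check with a lineage-based check. It is a hypothetical model, not how Data Security Guard is implemented: the phone-number pattern, the table and field names, and the `Edge` class are assumptions made for illustration. It only shows that fragments of a sensitive value may no longer match a content rule, while the lineage edges that produced them still point back to the sensitive source field.

```python
import re
from dataclasses import dataclass

# Hypothetical content-based rule: an 11-digit mobile number that starts with 1.
# This is an illustrative assumption, not Data Security Guard's actual rule.
PHONE_PATTERN = re.compile(r"1\d{10}")

# Relationship names that this topic describes as abnormal associations.
ABNORMAL_RELATIONS = {"SELECT_CONCAT", "SELECT_SUBSTRING"}


def matches_phone_rule(value: str) -> bool:
    """Content check: does the whole value look like a phone number?"""
    return PHONE_PATTERN.fullmatch(value) is not None


@dataclass(frozen=True)
class Edge:
    """Hypothetical lineage edge: target is derived from source by an SQL function."""
    source: str
    target: str
    relation: str


def flag_abnormal_descendants(edges, sensitive_fields):
    """Lineage check: fields derived from a sensitive field via an abnormal relation."""
    return {
        e.target
        for e in edges
        if e.source in sensitive_fields and e.relation in ABNORMAL_RELATIONS
    }


# A phone number split into two fragments, as a SUBSTRING-style query would do.
phone = "13800001234"
phone_head, phone_tail = phone[:3], phone[3:]

edges = [
    Edge("users.phone", "tmp.phone_head", "SELECT_SUBSTRING"),
    Edge("users.phone", "tmp.phone_tail", "SELECT_SUBSTRING"),
    Edge("users.phone", "report.phone_copy", "SELECT"),
]

# The content rule matches only the full value, not the fragments.
print(matches_phone_rule(phone), matches_phone_rule(phone_head), matches_phone_rule(phone_tail))
# prints: True False False

# The lineage check still flags both fragments.
print(sorted(flag_abnormal_descendants(edges, {"users.phone"})))
# prints: ['tmp.phone_head', 'tmp.phone_tail']
```

The plain SELECT copy is not flagged by the lineage check in this sketch because SELECT is not one of the abnormal relationships; a content rule would still match that copy because its value is unchanged.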

Limits

Only users of DataWorks Enterprise Edition or a more advanced edition can use the data lineage feature. For more information about how to upgrade the edition of DataWorks, see Billing of DataWorks advanced editions.

Go to the Data Lineage page

  1. Go to the Data Security Guard page.
    1. Log on to the DataWorks console and go to the Data Security Guard page. For more information, see Overview.
    2. Click Try now. The Data Security Guard homepage appears.
  2. Go to the Data Lineage page.
    You can use one of the following methods to go to the Data Lineage page:
    • Go to the Manually correct sensitive data identification results page, find the field whose lineage you want to view, and then click Analyze Lineage in the Actions column to go to the Data Lineage page.
    • In the left-side navigation pane of the Data Security Guard page, click Data Lineage. The Data Lineage page appears.

View data lineage

The Data Lineage page displays the lineage of sensitive data.

Figure: Data lineage

The following items describe the areas of the Data Lineage page.

Analysis scenario: The default analysis scenario is the lineage of a single field. More analysis scenarios will be available in the future.
Filter: To query a field, specify the Project, Table, and Field name conditions in the area marked with 2 in the preceding figure and click Query. The Data Lineage page displays one level of lineage for the field and highlights the queried field.
Filter conditions: You can specify the following filter conditions in the area marked with 3 in the preceding figure.
  • Associated fields with inconsistent identification results
    If you select this check box, the lineage graph is automatically refreshed to display the fields that are associated with the queried field but have a different sensitive field type, together with the edge relationships between these fields and the queried field.
    Note: An edge relationship is the name of the SQL function that is used to create a field, such as SELECT or SELECT_LTRIM.
  • Association relation exception field
    If you select this check box, the lineage graph is automatically refreshed to display the fields that have abnormal associations, such as SELECT_CONCAT or SELECT_SUBSTRING, with other fields, together with the edge relationships between these fields and the other fields.
  • If you select both check boxes, the lineage graph is automatically refreshed to display only the fields that are abnormally associated with the queried field and also have a different sensitive field type, together with the edge relationships between these fields and the queried field.
Lineage graph: One level of lineage for the queried field is displayed in the area marked with 4 in the preceding figure. You can click the queried field to view its information, or click the button in the middle of the edge line between the queried field and another field to view the edge information.
  • View information about the queried field.
    Click the queried field. On the field details page, you can view the data location, the sensitive field type, the ancestor and descendant associated fields, and the associations between the field and other fields, such as SELECT, SELECT_CONCAT, or SELECT_REPEAT. If the identification results are incorrect, you can edit them: change the sensitive field type of the queried field, or modify the sensitive field type, data category, and sensitivity level of its ancestor and descendant associated fields.
    Note
    • If the queried field does not have ancestor or descendant associated fields, no data is displayed on the field details page.
    • If no values are displayed for the Sensitive Field Type, Categorization, and Sensitivity Level parameters, the queried field is not a sensitive field or its sensitive field type has not been identified.
    • When you modify the Sensitive Field Type parameter, the data on the Identify sensitive data and Manually correct sensitive data identification results pages is updated at the same time.
    • Only one level of ancestor fields and one level of descendant fields can be displayed for each queried field.
  • View edge information.
    Click the button in the middle of the edge line between two fields. A panel that displays the edge information appears on the right. The panel shows the edge relationship, edge relationship type, SQL details, ancestor node, and descendant node. If the identification results are incorrect, you can edit them: select a different field association from the Edge Relationship Type drop-down list in the Edge Relationship section, or select a different sensitive field type from the Sensitive Field Type drop-down list in the Ancestor Nodes and Descendant Nodes sections.
    Note
    • If the edge relationship is abnormal, the Abnormal Association label is displayed in the Edge Relationship section of the panel. If the edge relationship is normal, this label does not appear.
    • Abnormal associations include SELECT_CONCAT and SELECT_SUBSTRING. These associations can be formed when users try to bypass sensitive data identification by concatenating or splitting characters.
    • An edge relationship is the name of the SQL function that is used to create a field, such as SELECT or SELECT_LTRIM.
    • If no value is displayed for the Sensitive Field Type parameter, the field is not a sensitive field or its sensitive field type has not been identified.
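The check-box behavior described in the Filter conditions item can be summarized as a filter over one level of lineage. The following Python sketch is a hypothetical model, not a Data Security Guard API: the `Edge` class, the `filter_lineage` function, and fields D and the relationships assigned to B, C, and D are assumptions. It builds on the Overview example, where the queried field A has the sensitive field type name, and it considers only the edges between the queried field and its directly associated fields.

```python
from dataclasses import dataclass

# Relationship names that this topic describes as abnormal associations.
ABNORMAL_RELATIONS = {"SELECT_CONCAT", "SELECT_SUBSTRING"}


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge between the queried field and one associated field."""
    associated_field: str
    relation: str  # edge relationship, such as SELECT or SELECT_CONCAT


def filter_lineage(queried_type, edges, field_types, inconsistent=False, abnormal=False):
    """Return the associated fields that stay visible for the selected check boxes.

    inconsistent: keep fields whose sensitive field type differs from the queried field.
    abnormal: keep fields linked to the queried field by an abnormal association.
    Selecting both keeps only fields that satisfy both conditions.
    Assumption: a field with no identified type counts as inconsistent.
    """
    visible = set()
    for edge in edges:
        if inconsistent and field_types.get(edge.associated_field) == queried_type:
            continue  # same sensitive field type: hidden by the inconsistency filter
        if abnormal and edge.relation not in ABNORMAL_RELATIONS:
            continue  # normal association: hidden by the abnormal-association filter
        visible.add(edge.associated_field)
    return visible


# Queried field A has the sensitive field type "name".
field_types = {"B": "name", "C": "province", "D": "address"}
edges = [Edge("B", "SELECT_CONCAT"), Edge("C", "SELECT"), Edge("D", "SELECT_SUBSTRING")]

print(sorted(filter_lineage("name", edges, field_types)))                     # ['B', 'C', 'D']
print(sorted(filter_lineage("name", edges, field_types, inconsistent=True)))  # ['C', 'D']
print(sorted(filter_lineage("name", edges, field_types, abnormal=True)))      # ['B', 'D']
print(sorted(filter_lineage("name", edges, field_types, True, True)))         # ['D']
```

Modeling the two check boxes as independent conditions makes the "both check boxes" case fall out as their intersection, which matches the behavior described in the Filter conditions item.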

Correct the identification results for multiple fields at a time

You can use one of the following methods to correct the identification results for multiple fields:

  • In the lineage graph, view the details of the ancestor and descendant associated fields of the queried field, select the fields whose identification results you want to correct, and modify their sensitive field type, data category, or sensitivity level.
  • Go to the Manual Check page and select the fields whose identification results you want to correct. For more information, see Manually correct sensitive data identification results.