Dataphin: Custom data lineage configuration

Last Updated: Mar 05, 2025

Dataphin automatically parses data lineage for SQL computing tasks and logical table tasks. For non-SQL computing tasks, you can configure data lineage manually to complete the lineage chain. This topic describes how to configure custom data lineage for non-SQL tasks.

Limits

  • Custom data lineage configuration is supported only for non-SQL computing tasks.

  • A maximum of 20 data lineage groups can be configured, each with up to 5 inputs and outputs.
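The limits above can be checked before the configuration is saved. The sketch below is illustrative only: `LineageGroup` and `check_limits` are hypothetical names, not a Dataphin API, and it assumes that "up to 5 inputs and outputs" means up to 5 input tables and up to 5 output tables per group.

```python
from dataclasses import dataclass, field

# Limits stated in this topic. Treating "up to 5 inputs and outputs" as
# 5 input tables AND 5 output tables per group is an assumption.
MAX_GROUPS = 20
MAX_INPUTS_PER_GROUP = 5
MAX_OUTPUTS_PER_GROUP = 5


@dataclass
class LineageGroup:
    # Hypothetical model of one data lineage group; entries are table names.
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def check_limits(groups: list[LineageGroup]) -> list[str]:
    """Return human-readable limit violations (an empty list if none)."""
    errors = []
    if len(groups) > MAX_GROUPS:
        errors.append(f"{len(groups)} groups configured; at most {MAX_GROUPS} allowed")
    for i, g in enumerate(groups, start=1):
        if len(g.inputs) > MAX_INPUTS_PER_GROUP:
            errors.append(f"group {i}: {len(g.inputs)} inputs; at most {MAX_INPUTS_PER_GROUP} allowed")
        if len(g.outputs) > MAX_OUTPUTS_PER_GROUP:
            errors.append(f"group {i}: {len(g.outputs)} outputs; at most {MAX_OUTPUTS_PER_GROUP} allowed")
    return errors
```

For example, a single group with inputs `["A", "B"]` and output `["C"]` passes with no violations, while 21 groups produce one group-count violation.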

Data lineage configuration description

  • Data Lineage Group & Data Lineage: Each task can have multiple data lineage groups, and the groups are independent of each other. Within a group, every input is paired with every output to form lineage relationships, and each input and output configuration generates both table-level and field-level data lineage.

    For example, if input table 1 selects field a from table A, input table 2 selects field b from table B, and the output table selects fields c and d from table C, the following data lineage relationships will be generated:

    • Table-level data lineage: table A → table C; table B → table C.

    • Field-level data lineage: table A.a → table C.c; table B.b → table C.c; table A.a → table C.d; table B.b → table C.d.

  • You can configure the environment of each input and output table. If the environment is set to Automatic, the environment parameters are replaced automatically during submission and publishing to generate the development and production data lineage, respectively.

  • When the task is submitted or published, the system parses data lineage based on the data source tables as they exist at that moment. If the task is resubmitted or republished, the data lineage is updated to the latest version.
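The pairing rule for a lineage group can be sketched as a Cartesian product of inputs and outputs. This is a minimal illustration that reproduces the table A / table B / table C example, assuming every selected input field is linked to every selected output field as that example shows; `group_lineage` and `task_lineage` are hypothetical helper names, not part of Dataphin.

```python
from itertools import product


def group_lineage(inputs, outputs):
    """Lineage for ONE group.

    inputs / outputs: lists of (table, [fields]) pairs.
    Every input is paired with every output; returns
    (table_edges, field_edges) as lists of (source, target) tuples.
    """
    table_edges, field_edges = [], []
    for (in_table, in_fields), (out_table, out_fields) in product(inputs, outputs):
        if (in_table, out_table) not in table_edges:
            table_edges.append((in_table, out_table))
        # Field level: each input field feeds each output field.
        for f_in, f_out in product(in_fields, out_fields):
            field_edges.append((f"{in_table}.{f_in}", f"{out_table}.{f_out}"))
    return table_edges, field_edges


def task_lineage(groups):
    """Groups are independent, so a task's lineage is the union of its groups."""
    tables, fields = set(), set()
    for inputs, outputs in groups:
        t, f = group_lineage(inputs, outputs)
        tables.update(t)
        fields.update(f)
    return tables, fields


tables, fields = group_lineage([("A", ["a"]), ("B", ["b"])], [("C", ["c", "d"])])
```

Here `tables` is `[("A", "C"), ("B", "C")]` and `fields` holds the four field-level edges from the example. Because `task_lineage` evaluates groups separately, an input in one group never links to an output in another.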

Configure data lineage

  1. On the Dataphin home page, select Development > Data Development from the top menu bar.

  2. On the Development page, select a project from the top menu bar (in Dev-Prod mode, you must also select an environment).

  3. In the left-side navigation pane, select Data Processing > Script Task.

  4. Click the target computing task in the computing task list to open its tab.

  5. Click Lineage in the right sidebar to open the Lineage Configuration panel.

  6. In the Lineage Configuration panel, click + Add Lineage Group to configure the input and output tables.

    • Input Table Configuration

      1. In the data lineage group area, click Configure Input Table to open the Input Table dialog box, and configure the following parameters.

        • Environment: The environment to which the input or output table belongs. Valid values: Automatic, Development, and Production.

          • Automatic: Equivalent to referencing the table in code through the space variable ${project name/section name}. Only tables in the development environment can be selected. After the task is successfully submitted, the variable is replaced with the corresponding development table and the data lineage is parsed. After the task is successfully published, the variable is replaced with the corresponding production table and the data lineage is parsed. If the corresponding production table does not exist, submission and publishing are not blocked, but the production data lineage cannot be parsed.

          • Development or Production: No variable replacement is performed during submission or publishing; the configured environment is used as-is.

        • Input Table: Supported table types are physical tables, physical views, logical dimension tables, logical fact tables, logical aggregate tables, logical tag tables, and logical views.

        • Selected Range:

          • Entire Table: The data lineage is generated from all fields of the table as they exist at the time of submission and publishing.

            Note

            The data lineage is updated only when the task is successfully submitted or published. Later changes to the structure of the input and output tables do not automatically trigger a data lineage update.

          • Specified Fields: Select the fields of the table that your business requires.

      2. Click Confirm to complete the input table configuration.

    • Output Table Configuration

      1. In the data lineage group area, click Configure Output Table to open the Output Table dialog box, and configure the parameters.

        The parameters for the output table are the same as those for the input table.

      2. Click Confirm to complete the output table configuration.

  7. After configuring the input and output tables for each data lineage group, click Confirm in the Lineage Configuration panel to complete the data lineage configuration.
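The Environment and Selected Range parameters decide which table and which fields take part in lineage at each stage. The following is a rough model of those rules, not Dataphin's implementation: `Env`, `resolve_env`, `resolve_fields`, and the stage labels `"submit"` / `"publish"` are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Optional


class Env(Enum):
    AUTOMATIC = "Automatic"
    DEVELOPMENT = "Development"
    PRODUCTION = "Production"


def resolve_env(configured: Env, stage: str) -> Env:
    """Environment whose table is used for lineage at this stage.

    stage is "submit" or "publish" (hypothetical labels).
    """
    if configured is not Env.AUTOMATIC:
        # Development/Production: no variable replacement; the setting prevails.
        return configured
    if stage == "submit":
        return Env.DEVELOPMENT
    if stage == "publish":
        return Env.PRODUCTION
    raise ValueError(f"unknown stage: {stage!r}")


def resolve_fields(selected: Optional[list[str]], current_schema: list[str]) -> list[str]:
    """Fields of one table that take part in lineage.

    selected is None for Entire Table, or the Specified Fields list.
    current_schema is the table's field list at submit/publish time.
    """
    if selected is None:
        # Entire Table: snapshot of all fields right now; later schema
        # changes are ignored until the next successful submit/publish.
        return list(current_schema)
    return list(selected)
```

With Automatic, the same configuration resolves to the development table on submission and the production table on publishing, which is why one task ends up with both development and production lineage.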

Submit and publish data lineage

  1. After completing the data lineage configuration, click Submit above the code editing area.

  2. In the Submitting Log dialog box, click Confirm And Submit.

  3. During the Object Existence Checking step in the Submitting Log, the system checks whether the input and output tables and fields referenced in the data lineage configuration exist.

    Note

    • The data lineage configuration is checked for object existence only at submission. No checks are performed at publishing.

    • The system parses table-level and field-level data lineage in the development environment when the task is submitted, and in the production environment when the task is published. A single submission or publishing operation can parse up to 100,000 data lineage relationships. If this limit is exceeded, the relationships are not recorded and cannot be displayed in the asset directory.
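The submission-time object check and the parse limit can be pictured as two small functions. This is only a sketch: `missing_objects`, `within_limit`, and the `catalog` lookup are hypothetical stand-ins for Dataphin's internal metadata, and the code models only whether a count exceeds the 100,000 limit, not how the product handles the excess.

```python
# Per this topic: up to 100,000 lineage relationships per submission or publishing.
MAX_RELATIONSHIPS = 100_000


def missing_objects(referenced: dict[str, list[str]],
                    catalog: dict[str, set[str]]) -> list[str]:
    """Object existence check, run at submission only (not at publishing).

    referenced: table -> fields used in the lineage configuration.
    catalog: table -> fields that currently exist (hypothetical metadata).
    Returns missing tables and "table.field" names in configuration order.
    """
    missing = []
    for table, fields in referenced.items():
        if table not in catalog:
            missing.append(table)
            continue
        missing.extend(f"{table}.{f}" for f in fields if f not in catalog[table])
    return missing


def within_limit(relationship_count: int) -> bool:
    """True if one submission/publishing stays within the parse limit."""
    return relationship_count <= MAX_RELATIONSHIPS
```

For example, with a catalog of `{"A": {"a"}, "C": {"c", "d"}}`, a configuration that references `C.e` and a table `B` reports `["C.e", "B"]`.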

View data lineage

After the task is submitted and published, the custom-configured table-level and field-level data lineage relationships can be viewed on the product page of the data table. For more information, see the referenced document.