DataWorks: Query data lineage using the DataWorks OpenAPI

Last Updated: Nov 26, 2025

This document shows you how to use the DataWorks OpenAPI (2024-05-18) to programmatically query the lineage of data tables and fields. It provides specific API call examples and SDK code to help you get started quickly and perform automated and batch lineage analysis.

What is data lineage?

Imagine that you are looking at an important business report that shows a large increase in sales for this quarter. As a careful data analyst or manager, you would likely ask several questions:

  • How is this "sales" metric calculated?

  • What is the source business data? Is it from an order table or a payment transaction table?

  • What processing steps did the data go through from the source to the final report, such as cleaning, transformation, and aggregation?

  • If there is an error in this metric's data, what downstream reports or applications will it affect?

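Data lineage answers these questions by recording how data flows from its sources, through each processing step, to its final consumers. Clear data lineage provides the following core benefits: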

  1. Data traceability and troubleshooting
    When you find data anomalies or errors, you can trace the lineage upstream to quickly locate the processing step or source data that caused the problem. This greatly reduces troubleshooting time.

  2. Impact analysis
    When you need to change a table schema, field, or calculation logic, you can analyze the lineage downstream to see exactly which downstream data and business reports will be affected. This helps you avoid a single change causing widespread, unforeseen issues.

  3. Data governance and credibility
    Clear lineage is the foundation for data asset management, data standard implementation, and Data Quality monitoring. It makes the entire data lifecycle transparent and increases business users' trust in the data.

  4. Cost optimization and asset inventory
    By analyzing lineage, you can identify data tables or computing tasks that have no downstream consumers. This provides a basis for data warehouse cost optimization and for unpublishing old assets.
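
Both upstream tracing and downstream impact analysis come down to walking a graph of source-to-destination edges. The following is a minimal sketch of that idea and is not part of the DataWorks SDK: the class name LineageGraphSketch, the in-memory edge map, and the table names are all hypothetical. It uses a breadth-first walk to collect everything downstream of a changed table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not a DataWorks class.
class LineageGraphSketch {

    /** Returns every entity reachable downstream of start, in breadth-first order. */
    static List<String> downstreamOf(String start, Map<String, List<String>> edges) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : edges.getOrDefault(current, Collections.<String>emptyList())) {
                // Skip the start node and anything already seen, so cycles terminate.
                if (!next.equals(start) && visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // Made-up lineage: orders -> sales_daily -> sales_report, finance_dashboard
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("orders", Arrays.asList("sales_daily"));
        edges.put("sales_daily", Arrays.asList("sales_report", "finance_dashboard"));
        System.out.println(downstreamOf("orders", edges));
    }
}
```

A table whose downstreamOf result is empty has no consumers in this model, which is the signal the cost-optimization use case looks for. In practice the edge map would be filled from lineage query results rather than hard-coded.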

DataWorks automatically parses and records the data lineage generated by various computing tasks, such as MaxCompute SQL and EMR Spark. The DataWorks OpenAPI lets you programmatically access this lineage information. You can then integrate lineage analysis into your own data management platform or automated O&M processes.

Preparations: Get the entity ID

Before you can query any lineage, you must first obtain the unique identifier for the target data (table or field). This identifier is the entity ID, and it is the key input parameter for the metadata and lineage APIs.

You can obtain an entity ID in one of two ways:

1. Get the ID from the DataWorks interface

For a small number of known tables or fields, copying the ID from the interface is the fastest method.

Get a table's entity ID

  1. Go to the Data Map module in DataWorks.

  2. Search for and open the details page of the table you want to query.

  3. In the Table Basic Information panel on the left, find the Entity ID and copy it.

Get a field's entity ID

  1. On the table's details page, switch to the Lineage Information tab and select Field Lineage.

  2. In the field lineage graph, click the field node you are interested in.

  3. The field's details panel appears on the right. In the panel, find the Entity ID and copy it.

2. Get IDs in batches using the API

When you need to obtain many entity IDs, manual operations are inefficient. In this case, you can use the OpenAPI to query the IDs in batches.

Use the ListLineages API to query lineage

After you obtain the entity ID, you can use the core ListLineages API to query its upstream and downstream lineage.

1. Core API parameters

The following table describes the key request parameters for the ListLineages API. You can test the API online in the OpenAPI Portal.

| Parameter | Type | Description |
| --- | --- | --- |
| SrcEntityId | String | Used to query downstream lineage. Pass the source (upstream) entity ID. The API returns all downstream lineage for that entity. |
| DstEntityId | String | Used to query upstream lineage. Pass the destination (downstream) entity ID. The API returns all upstream lineage for that entity. |
| SrcEntityName | String | Used with DstEntityId to perform a fuzzy search and filter upstream entities. |
| DstEntityName | String | Used with SrcEntityId to perform a fuzzy search and filter downstream entities. |
| NeedAttachRelationship | Boolean | Specifies whether to include detailed lineage relationship information in the response. Set this to true to get the full context. |

Important
  • If you provide both SrcEntityId and DstEntityId, the API returns the lineage relationship between the specified upstream and downstream entities.

  • If SrcEntityId and DstEntityId are the same, the API returns a self-referencing lineage relationship for that entity.
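
The combinations of SrcEntityId and DstEntityId described above can be summarized as a small decision function. This is only a sketch: the class LineageQueryModeSketch and its Mode enum are hypothetical names, not SDK types. Rejecting a call with neither ID is an assumption made here, because the text above does not say how the API behaves in that case.

```java
// Hypothetical helper for illustration only; not a DataWorks class.
class LineageQueryModeSketch {

    enum Mode { DOWNSTREAM, UPSTREAM, BETWEEN, SELF_REFERENCE }

    /** Maps the two optional entity IDs to the kind of lineage the query returns. */
    static Mode modeOf(String srcEntityId, String dstEntityId) {
        boolean hasSrc = srcEntityId != null && !srcEntityId.isEmpty();
        boolean hasDst = dstEntityId != null && !dstEntityId.isEmpty();
        if (hasSrc && hasDst) {
            // Same ID on both sides: self-referencing lineage for that entity.
            return srcEntityId.equals(dstEntityId) ? Mode.SELF_REFERENCE : Mode.BETWEEN;
        }
        if (hasSrc) {
            return Mode.DOWNSTREAM; // only the source is fixed
        }
        if (hasDst) {
            return Mode.UPSTREAM; // only the destination is fixed
        }
        // Assumption: the behavior with neither ID is not documented here,
        // so this sketch rejects the call locally instead of guessing.
        throw new IllegalArgumentException("Provide SrcEntityId, DstEntityId, or both");
    }

    public static void main(String[] args) {
        String table = "maxcompute-table:::test_project::test_table";
        System.out.println(modeOf(table, null));
        System.out.println(modeOf(null, table));
        System.out.println(modeOf(table, table));
    }
}
```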

2. Call examples

Assume that you have a MaxCompute table with the entity ID maxcompute-table:::test_project::test_table.

Example 1: Query the table's downstream lineage

To query all downstream tables of this table, you need to set it as the source:

  • SrcEntityId: maxcompute-table:::test_project::test_table

  • NeedAttachRelationship: true

To find only downstream tables with names that contain "report", you can add the DstEntityName parameter:

  • DstEntityName: report

Example 2: Query the table's upstream lineage

To find out which tables or tasks generated this table, you need to set it as the destination:

  • DstEntityId: maxcompute-table:::test_project::test_table

  • NeedAttachRelationship: true

You can also use the SrcEntityName parameter to filter upstream sources.
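
The two examples above differ only in which ID parameter is set and which name filter goes with it. The sketch below collects the parameters from the table into plain maps. It does not call the SDK, and how these values are attached to an actual request object is not shown. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it builds parameter maps and sends nothing.
class ListLineagesParamsSketch {

    /** Example 1: the table is the source; optionally filter downstream names. */
    static Map<String, Object> downstream(String entityId, String dstNameFilter) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("SrcEntityId", entityId);
        if (dstNameFilter != null && !dstNameFilter.isEmpty()) {
            params.put("DstEntityName", dstNameFilter); // fuzzy match on downstream names
        }
        params.put("NeedAttachRelationship", true);
        return params;
    }

    /** Example 2: the table is the destination; optionally filter upstream names. */
    static Map<String, Object> upstream(String entityId, String srcNameFilter) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("DstEntityId", entityId);
        if (srcNameFilter != null && !srcNameFilter.isEmpty()) {
            params.put("SrcEntityName", srcNameFilter); // fuzzy match on upstream names
        }
        params.put("NeedAttachRelationship", true);
        return params;
    }

    public static void main(String[] args) {
        String table = "maxcompute-table:::test_project::test_table";
        System.out.println(downstream(table, "report"));
        System.out.println(upstream(table, null));
    }
}
```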

3. Understand the API response

After a successful call to ListLineages, you will receive a list of lineage relationships. Each relationship includes the source entity, the destination entity, and their association information.

Example of a single lineage relationship response (JSON):

{
  "SrcEntity": {
    "Id": "maxcompute-table:::test_project::table_from",
    "Name": "table_from",
    "Attributes": {
      "rawEntityId": "maxcompute-table:::test_project::table_from"
    }
  },
  "DstEntity": {
    "Id": "maxcompute-table:::test_project::table_to",
    "Name": "table_to",
    "Attributes": {
      "project": "test_project",
      "region": "cn-shanghai",
      "table": "table_to"
    }
  },
  "Relationships": [
    {
      "Id": "123456789:maxcompute-table.test_project.table_from:maxcompute-table.test_project.table_to:maxcompute.SQL.76543xxx",
      "CreateTime": 1761089163548,
      "Task": {
        "Id": "76543xxx",
        "Type": "dataworks-sql",
        "Attributes": {
          "engine": "maxcompute",
          "channel": "1st",
          "taskInstanceId": "12345xxx",
          "projectId": "123456",
          "taskId": "76543xxx"
        }
      }
    }
  ]
}

How to read the response:

  • SrcEntity and DstEntity: These represent the upstream and downstream entities of the lineage. You can obtain their Id and then call the GetTable or GetColumn API to retrieve more detailed metadata.

  • Relationships: This describes how SrcEntity and DstEntity are related.

    • Task: This describes the task that generated this lineage relationship. If the task is a DataWorks scheduling task, Task.Attributes will contain the taskId and taskInstanceId. You can use these IDs to call the GetTask API to obtain the task's detailed definition and running status.
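
To make the reading guide concrete, the sketch below walks one lineage item and collects the taskId of each relationship. It assumes the JSON has already been deserialized into nested Map and List objects with the same field names as the sample response. The SDK has its own typed models, and their accessors are deliberately not guessed here. The class LineageResponseSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it works on plain maps, not SDK models.
class LineageResponseSketch {

    /** Collects Task.Attributes.taskId from every relationship of one lineage item. */
    @SuppressWarnings("unchecked")
    static List<String> taskIds(Map<String, Object> lineage) {
        List<String> ids = new ArrayList<>();
        Object relationships = lineage.get("Relationships");
        if (!(relationships instanceof List)) {
            return ids; // no relationship details present in this item
        }
        for (Object item : (List<Object>) relationships) {
            Map<String, Object> relationship = (Map<String, Object>) item;
            Map<String, Object> task = (Map<String, Object>) relationship.get("Task");
            if (task == null) {
                continue;
            }
            Map<String, Object> attributes = (Map<String, Object>) task.get("Attributes");
            // taskId is only expected when a DataWorks scheduling task produced the lineage.
            if (attributes != null && attributes.get("taskId") != null) {
                ids.add(String.valueOf(attributes.get("taskId")));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Rebuild the relevant part of the sample response by hand.
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("taskId", "76543xxx");
        attributes.put("taskInstanceId", "12345xxx");
        Map<String, Object> task = new HashMap<>();
        task.put("Id", "76543xxx");
        task.put("Attributes", attributes);
        Map<String, Object> relationship = new HashMap<>();
        relationship.put("Task", task);
        Map<String, Object> lineage = new HashMap<>();
        lineage.put("Relationships", Collections.singletonList(relationship));
        System.out.println(taskIds(lineage));
    }
}
```

The collected IDs are the values you would then pass to the GetTask API, as described above.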

Java SDK hands-on tutorial

The following example uses the Java SDK to show the complete process of a lineage query.

1. Prepare the environment

  • JDK version: Make sure you have JDK 8 or a later version installed.

  • Maven dependency: Add the following dependency to your project's pom.xml file. Replace ${latest.version} with the latest SDK version number.

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>dataworks_public20240518</artifactId>
    <version>${latest.version}</version>
</dependency>

2. Complete code example

The following code shows how to initialize the client, query the upstream and downstream lineage of a specified table, and print key information.

import java.util.List;
import java.util.Map;

import com.aliyun.dataworks_public20240518.Client;
import com.aliyun.dataworks_public20240518.models.GetTableRequest;
import com.aliyun.dataworks_public20240518.models.GetTableResponse;
import com.aliyun.dataworks_public20240518.models.LineageEntity;
import com.aliyun.dataworks_public20240518.models.LineageRelationship;
import com.aliyun.dataworks_public20240518.models.LineageTask;
import com.aliyun.dataworks_public20240518.models.ListLineagesRequest;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponse;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfo;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfoLineages;
import com.aliyun.dataworks_public20240518.models.Table;
import com.aliyun.tea.TeaException;

public class LineageQuerySample {
  /**
     * description