Use the DataWorks Open API (2024-05-18) to programmatically query upstream and downstream lineage for data tables and fields. This guide covers how to get entity IDs, call ListLineages, parse the response, and run a complete example with the Java SDK.
What is data lineage?
Imagine you're looking at a business report showing a large jump in quarterly sales. As a data analyst, several questions come up immediately:
- How is the "sales" metric calculated?
- Does it come from an order table or a payment transaction table?
- What transformations (cleaning, aggregation, joins) did the data go through?
- If this metric contains an error, which downstream reports or applications are affected?
Data lineage answers these questions. It records how data flows from source tables through transformations to its final consumers. DataWorks automatically captures lineage from computing tasks such as MaxCompute SQL and EMR Spark, and exposes it through the DataWorks Open API.
Use cases
| Business question | Lineage direction | API action |
|---|---|---|
| Which tables or jobs produced this table? | Upstream | Query with DstEntityId |
| Which downstream reports or tables depend on this table? | Downstream | Query with SrcEntityId |
| If I change this table's schema, what breaks? | Downstream | Query with SrcEntityId, review Relationships |
| Which tables in my project have no downstream consumers? | Downstream | Batch-query all tables, filter those with empty results |
| Where did this data anomaly originate? | Upstream | Trace iteratively from affected table |
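The "no downstream consumers" row in the table above can be sketched as a filter over a batch of table entity IDs. This is only an illustration: OrphanTableFinder is a hypothetical helper, and downstreamLookup is an assumed stand-in for one ListLineages call with SrcEntityId set, returning the IDs of the downstream entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class OrphanTableFinder {
    private OrphanTableFinder() {}

    /**
     * Returns the table IDs whose downstream lookup comes back empty.
     * downstreamLookup is a hypothetical stand-in for a ListLineages call
     * with SrcEntityId set to the given ID.
     */
    static List<String> findTablesWithoutConsumers(
            List<String> tableIds,
            Function<String, List<String>> downstreamLookup) {
        List<String> orphans = new ArrayList<>();
        for (String id : tableIds) {
            List<String> consumers = downstreamLookup.apply(id);
            if (consumers == null || consumers.isEmpty()) {
                orphans.add(id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // In-memory sample data instead of real API responses.
        Map<String, List<String>> downstream = new HashMap<>();
        downstream.put("maxcompute-table:::test_project::orders_raw",
                Arrays.asList("maxcompute-table:::test_project::test_table"));

        List<String> all = Arrays.asList(
                "maxcompute-table:::test_project::orders_raw",
                "maxcompute-table:::test_project::scratch_tmp");

        System.out.println(findTablesWithoutConsumers(
                all, id -> downstream.getOrDefault(id, Collections.<String>emptyList())));
    }
}
```

In practice the lookup would wrap the SDK call shown later in this guide; keeping it behind a function makes the filter easy to test without network access.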
Prerequisites
Before you begin, ensure that you have:
- A DataWorks workspace with Data Map enabled
- Access to the DataWorks Open API (2024-05-18)
- (For SDK usage) JDK 8 or later and Maven
Get the entity ID
Every lineage query requires an entity ID — the unique identifier for a data table or field. The entity ID is the core input for all metadata and lineage APIs.
Entity IDs follow this format: maxcompute-table:::test_project::test_table
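As a rough illustration, the sample ID above can be assembled from a project name and a table name. The sketch assumes the layout is exactly what that one sample shows (the maxcompute-table prefix, three colons, the project, two colons, the table). EntityIds and its methods are hypothetical helpers, not part of the DataWorks SDK, and other entity types may use a different layout.

```java
final class EntityIds {
    private EntityIds() {}

    /**
     * Builds a table entity ID in the shape of the sample in this guide.
     * Assumption: the segments left empty in the sample stay empty.
     */
    static String maxComputeTableId(String project, String table) {
        if (project == null || project.isEmpty() || table == null || table.isEmpty()) {
            throw new IllegalArgumentException("project and table are required");
        }
        return "maxcompute-table:::" + project + "::" + table;
    }

    /** Returns the last colon-separated segment, which is the table name in the sample ID. */
    static String lastSegmentOf(String entityId) {
        // limit -1 keeps empty segments so positions are not shifted
        String[] parts = entityId.split(":", -1);
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String id = maxComputeTableId("test_project", "test_table");
        System.out.println(id);
        System.out.println(lastSegmentOf(id));
    }
}
```

Copying the ID from Data Map or from ListTables remains the safer route; a helper like this is mainly useful for tests and for quick scripts against known tables.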
Get a table entity ID from the console
For a small number of known tables, copy the ID directly from the interface.
- Go to the Data Map module in DataWorks.
- Search for and open the details page of the target table.
- In the Table Basic Information panel on the left, find the Entity ID and copy it.

Get a field entity ID from the console
- On the table's details page, switch to the Lineage tab and select Field Lineage.
- In the field lineage graph, click the field node.
- In the field's details panel on the right, find the Entity ID and copy it.

Get entity IDs in batches using the API
For large-scale analysis, call these APIs instead of using the console:
- Tables: ListTables returns a list of tables in Data Map along with their entity IDs.
- Fields: ListColumns returns the fields of a specific table along with their entity IDs.
Query lineage with ListLineages
Parameters
Test the API interactively in the OpenAPI Explorer.
| Parameter | Type | Description |
|---|---|---|
| SrcEntityId | String | Pass the upstream entity ID to query downstream lineage. Returns all entities that depend on this entity. |
| DstEntityId | String | Pass the downstream entity ID to query upstream lineage. Returns all entities that produce this entity. |
| SrcEntityName | String | Used with DstEntityId. Applies a fuzzy name filter to upstream results. |
| DstEntityName | String | Used with SrcEntityId. Applies a fuzzy name filter to downstream results. |
| NeedAttachRelationship | Boolean | Set to true to include full relationship details in the response, including the task that created each lineage edge. |
If you provide both SrcEntityId and DstEntityId, the API returns the lineage relationship between those two specific entities. If both values are the same entity ID, the API returns a self-referencing relationship.
Example 1: Query downstream lineage
Find all tables that consume test_table:
SrcEntityId: maxcompute-table:::test_project::test_table
NeedAttachRelationship: true
To filter results to downstream tables whose names contain "report", add:
DstEntityName: report
Example 2: Query upstream lineage
Find all tables and tasks that produced test_table:
DstEntityId: maxcompute-table:::test_project::test_table
NeedAttachRelationship: true
To filter upstream results by name, add SrcEntityName.
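The two examples differ only in which ID parameter carries the known table and which name filter applies. A minimal sketch of that pairing, using plain maps rather than SDK request objects, is shown here. LineageQueryParams is a hypothetical helper; only the parameter names come from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LineageQueryParams {
    private LineageQueryParams() {}

    /** Downstream query: the known entity is the source; the optional filter applies to destination names. */
    static Map<String, Object> downstreamOf(String entityId, String dstNameFilter) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("SrcEntityId", requireId(entityId));
        params.put("NeedAttachRelationship", Boolean.TRUE);
        if (dstNameFilter != null && !dstNameFilter.isEmpty()) {
            params.put("DstEntityName", dstNameFilter);
        }
        return params;
    }

    /** Upstream query: the known entity is the destination; the optional filter applies to source names. */
    static Map<String, Object> upstreamOf(String entityId, String srcNameFilter) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("DstEntityId", requireId(entityId));
        params.put("NeedAttachRelationship", Boolean.TRUE);
        if (srcNameFilter != null && !srcNameFilter.isEmpty()) {
            params.put("SrcEntityName", srcNameFilter);
        }
        return params;
    }

    private static String requireId(String entityId) {
        if (entityId == null || entityId.isEmpty()) {
            throw new IllegalArgumentException("an entity ID is required");
        }
        return entityId;
    }

    public static void main(String[] args) {
        String id = "maxcompute-table:::test_project::test_table";
        System.out.println(downstreamOf(id, "report")); // Example 1 with the name filter
        System.out.println(upstreamOf(id, null));       // Example 2 without a filter
    }
}
```

Keeping the direction in the method name avoids the most common mistake with this API, which is passing the table as DstEntityId when downstream consumers are wanted.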
Response structure
A successful response contains a list of lineage relationship objects. Each object has this structure:
{
  "SrcEntity": {
    "Id": "maxcompute-table:::test_project::table_from",
    "Name": "table_from",
    "Attributes": {
      "rawEntityId": "maxcompute-table:::test_project::table_from"
    }
  },
  "DstEntity": {
    "Id": "maxcompute-table:::test_project::table_to",
    "Name": "table_to",
    "Attributes": {
      "project": "test_project",
      "region": "cn-shanghai",
      "table": "table_to"
    }
  },
  "Relationships": [
    {
      "Id": "123456789:maxcompute-table.test_project.table_from:maxcompute-table.test_project.table_to:maxcompute.SQL.76543xxx",
      "CreateTime": 1761089163548,
      "Task": {
        "Id": "76543xxx",
        "Type": "dataworks-sql",
        "Attributes": {
          "engine": "maxcompute",
          "channel": "1st",
          "taskInstanceId": "12345xxx",
          "projectId": "123456",
          "taskId": "76543xxx"
        }
      }
    }
  ]
}
Key fields:
- SrcEntity / DstEntity: The upstream and downstream data entities. Use the Id field to call GetTable or GetColumn for full metadata.
- Relationships: The edges connecting the two entities. Each edge describes the task that wrote data from source to destination.
- Task: The computing or scheduling task that created this lineage edge. For DataWorks scheduling tasks, Task.Attributes includes taskId and taskInstanceId. Pass these to GetTask to retrieve the task definition and run status.
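To make the key fields concrete, the following sketch turns the values of one relationship into a one-line summary. It works on plain values rather than SDK model classes, LineageEdgeSummary is a hypothetical helper, and it assumes that CreateTime is a Unix timestamp in milliseconds, which the 13-digit sample value suggests but this guide does not state.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LineageEdgeSummary {
    private LineageEdgeSummary() {}

    /**
     * Formats one lineage edge.
     * Assumption: createTimeMillis is epoch milliseconds (inferred from the sample value).
     */
    static String describe(String srcName, String dstName,
                           long createTimeMillis, Map<String, String> taskAttributes) {
        Map<String, String> attrs = taskAttributes != null
                ? taskAttributes
                : Collections.<String, String>emptyMap();
        // Non-scheduling tasks may not carry these keys, so fall back to a placeholder.
        String taskId = attrs.getOrDefault("taskId", "N/A");
        String instanceId = attrs.getOrDefault("taskInstanceId", "N/A");
        return String.format("%s --> %s (task %s, instance %s, created %s)",
                srcName, dstName, taskId, instanceId, Instant.ofEpochMilli(createTimeMillis));
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("taskId", "76543xxx");
        attrs.put("taskInstanceId", "12345xxx");
        // Values taken from the sample response in this section.
        System.out.println(describe("table_from", "table_to", 1761089163548L, attrs));
    }
}
```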
ListLineages returns one hop at a time. To trace multi-hop lineage, for example to find the root source of a table that is itself derived from other tables, call the API iteratively. When tracing upstream, pass each returned SrcEntity.Id as the DstEntityId of the next call; when tracing downstream, pass each returned DstEntity.Id as the next SrcEntityId.
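A minimal sketch of the iterative upstream trace follows. It is a breadth-first walk with a visited set, so shared ancestors and self-referencing or cyclic lineage are queried only once. UpstreamTracer is a hypothetical helper, oneHopUpstream is an assumed stand-in for a ListLineages call with DstEntityId set that returns the SrcEntity.Id values, and the maxDepth cap is an added safety limit rather than an API feature.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class UpstreamTracer {
    private UpstreamTracer() {}

    /**
     * Collects every upstream entity reachable from startId within maxDepth hops,
     * in the order discovered. oneHopUpstream is a hypothetical stand-in for one
     * ListLineages call with DstEntityId set to the given ID.
     */
    static Set<String> traceUpstream(String startId,
                                     Function<String, List<String>> oneHopUpstream,
                                     int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(startId);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(startId);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String id : frontier) {
                List<String> parents = oneHopUpstream.apply(id);
                if (parents == null) {
                    continue;
                }
                for (String parent : parents) {
                    // add() returns false for entities already seen, which breaks cycles
                    if (visited.add(parent)) {
                        next.add(parent);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(startId);
        return visited;
    }

    public static void main(String[] args) {
        // In-memory sample graph: report <- agg <- {orders, payments}
        Map<String, List<String>> upstream = new HashMap<>();
        upstream.put("report", Arrays.asList("agg"));
        upstream.put("agg", Arrays.asList("orders", "payments"));

        System.out.println(traceUpstream("report",
                id -> upstream.getOrDefault(id, Collections.<String>emptyList()), 10));
    }
}
```

A downstream trace has the same shape with the lookup swapped for a SrcEntityId query. Because every visited entity costs one API call, a depth cap keeps the number of requests bounded on wide graphs.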
Java SDK tutorial
Set up the project
- Make sure JDK 8 or later is installed.
- Add the following dependency to your pom.xml. Replace ${latest.version} with the version from the SDK page.
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>dataworks_public20240518</artifactId>
    <version>${latest.version}</version>
</dependency>
Query lineage
The following example initializes the client, queries both upstream and downstream lineage for a specified table, and prints a human-readable summary.
All requests use the same pattern: set either SrcEntityId (for downstream) or DstEntityId (for upstream), enable NeedAttachRelationship, and iterate over the response to extract entity names and task IDs.
import java.util.Map;

import com.aliyun.dataworks_public20240518.Client;
import com.aliyun.dataworks_public20240518.models.LineageEntity;
import com.aliyun.dataworks_public20240518.models.LineageRelationship;
import com.aliyun.dataworks_public20240518.models.LineageTask;
import com.aliyun.dataworks_public20240518.models.ListLineagesRequest;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponse;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfo;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfoLineages;
import com.aliyun.tea.TeaException;

public class LineageQuerySample {

    // Replace with your actual entity ID from Data Map
    private static final String TARGET_ENTITY_ID = "maxcompute-table:::test_project::test_table";

    public static void main(String[] args) throws Exception {
        // Initialize the client using credentials from environment variables
        com.aliyun.teaopenapi.models.Config config = new com.aliyun.teaopenapi.models.Config()
                .setAccessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
                .setAccessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"))
                .setEndpoint("dataworks.cn-shanghai.aliyuncs.com"); // Replace with your region endpoint
        Client client = new Client(config);

        System.out.println("=== Downstream lineage (tables that consume " + TARGET_ENTITY_ID + ") ===");
        queryLineage(client, TARGET_ENTITY_ID, null);

        System.out.println("\n=== Upstream lineage (tables that produce " + TARGET_ENTITY_ID + ") ===");
        queryLineage(client, null, TARGET_ENTITY_ID);
    }

    /**
     * Queries one hop of lineage for a given entity.
     * @param srcEntityId Set to query downstream lineage; leave null to query upstream.
     * @param dstEntityId Set to query upstream lineage; leave null to query downstream.
     */
    static void queryLineage(Client client, String srcEntityId, String dstEntityId) throws Exception {
        ListLineagesRequest request = new ListLineagesRequest()
                .setSrcEntityId(srcEntityId)
                .setDstEntityId(dstEntityId)
                .setNeedAttachRelationship(true);
        try {
            ListLineagesResponse response = client.listLineages(request);
            ListLineagesResponseBodyPagingInfo pagingInfo = response.getBody().getPagingInfo();
            if (pagingInfo == null || pagingInfo.getLineages() == null || pagingInfo.getLineages().isEmpty()) {
                System.out.println(" No lineage relationships found.");
                return;
            }
            for (ListLineagesResponseBodyPagingInfoLineages lineage : pagingInfo.getLineages()) {
                LineageEntity src = lineage.getSrcEntity();
                LineageEntity dst = lineage.getDstEntity();
                System.out.printf(" %s --> %s%n",
                        src != null ? src.getName() : "unknown",
                        dst != null ? dst.getName() : "unknown");
                // Print the task that created each lineage edge
                if (lineage.getRelationships() != null) {
                    for (LineageRelationship rel : lineage.getRelationships()) {
                        LineageTask task = rel.getTask();
                        if (task != null && task.getAttributes() != null) {
                            Map<String, String> attrs = task.getAttributes();
                            System.out.printf(" created by task %s (instance: %s)%n",
                                    attrs.getOrDefault("taskId", "N/A"),
                                    attrs.getOrDefault("taskInstanceId", "N/A"));
                        }
                    }
                }
            }
        } catch (TeaException e) {
            // Log the API error code and message, then rethrow so the caller can decide how to handle it
            System.err.printf("API error: %s - %s%n", e.getCode(), e.getMessage());
            throw e;
        }
    }
}
Expected output:
=== Downstream lineage (tables that consume maxcompute-table:::test_project::test_table) ===
test_table --> sales_report
created by task 76543xxx (instance: 12345xxx)
=== Upstream lineage (tables that produce maxcompute-table:::test_project::test_table) ===
orders_raw --> test_table
created by task 65432yyy (instance: 11111yyy)
Once you have the entity IDs from the response, call GetTable for full table metadata or GetTask for the task definition.
What's next
- Entity ID reference: learn the entity ID format and how to construct one
- ListTables API: batch-retrieve table entity IDs from Data Map
- ListColumns API: batch-retrieve field entity IDs from a table
- GetTable API: get full metadata for a table entity
- GetColumn API: get full metadata for a field entity
- GetTask API: get task definition and run status