DataWorks: A practical guide to the data lineage OpenAPI

Last Updated: Mar 26, 2026

Use the DataWorks Open API (2024-05-18) to programmatically query upstream and downstream lineage for data tables and fields. This guide covers how to get entity IDs, call ListLineages, parse the response, and run a complete example with the Java SDK.

What is data lineage?

Imagine you're looking at a business report showing a large jump in quarterly sales. As a data analyst, several questions come up immediately:

  • How is the "sales" metric calculated?

  • Does it come from an order table or a payment transaction table?

  • What transformations — cleaning, aggregation, joins — did the data go through?

  • If this metric contains an error, which downstream reports or applications are affected?

Data lineage answers these questions. It records how data flows from source tables through transformations to its final consumers. DataWorks automatically captures lineage from computing tasks such as MaxCompute SQL and EMR Spark, and exposes it through the DataWorks Open API.

Use cases

Business question | Lineage direction | API action
Which tables or jobs produced this table? | Upstream | Query with DstEntityId
Which downstream reports or tables depend on this table? | Downstream | Query with SrcEntityId
If I change this table's schema, what breaks? | Downstream | Query with SrcEntityId, review Relationships
Which tables in my project have no downstream consumers? | Downstream | Batch-query all tables, filter those with empty results
Where did this data anomaly originate? | Upstream | Trace iteratively from the affected table
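
The "no downstream consumers" row amounts to a filter over per-table query results. The following is a minimal sketch rather than an SDK sample: downstreamOf is a hypothetical function standing in for one ListLineages call with SrcEntityId set, and the table names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class OrphanTablesSketch {
    /**
     * Returns the IDs whose downstream query came back empty.
     * downstreamOf is a hypothetical stand-in for a ListLineages call with SrcEntityId set.
     */
    static List<String> withoutDownstream(List<String> tableIds,
                                          Function<String, List<String>> downstreamOf) {
        List<String> orphans = new ArrayList<>();
        for (String id : tableIds) {
            List<String> consumers = downstreamOf.apply(id);
            if (consumers == null || consumers.isEmpty()) {
                orphans.add(id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // In-memory results used in place of real API responses.
        Map<String, List<String>> downstream = new HashMap<>();
        downstream.put("orders_raw", Arrays.asList("orders_clean"));
        downstream.put("orders_clean", Arrays.asList("sales_report"));

        List<String> tables = Arrays.asList("orders_raw", "orders_clean", "tmp_backup");
        System.out.println(withoutDownstream(
            tables, id -> downstream.getOrDefault(id, Collections.<String>emptyList())));
    }
}
```

An empty result only means that no lineage was captured for the table, so treat the output as a list of candidates to review, not as proof that a table is unused.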

Prerequisites

Before you begin, ensure that you have:

  • A DataWorks workspace with Data Map enabled

  • Access to the DataWorks Open API (2024-05-18)

  • (For SDK usage) JDK 8 or later and Maven

Get the entity ID

Every lineage query requires an entity ID — the unique identifier for a data table or field. The entity ID is the core input for all metadata and lineage APIs.

For example, the entity ID of a MaxCompute table looks like this: maxcompute-table:::test_project::test_table
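
If you need to pull the entity type or the table name out of an ID in your own code, one option is to split it on the colon. The sketch below assumes only what the example ID shows: the ID is colon-delimited, the first segment is the entity type, and the last non-empty segment is the object name. The meaning of the empty middle segments is not defined in this guide, so the code keeps them untouched. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EntityIdSketch {
    /** Splits an entity ID on ':' and keeps empty segments so positions stay stable. */
    static List<String> segments(String entityId) {
        // A limit of -1 preserves empty strings, including trailing ones.
        return Collections.unmodifiableList(Arrays.asList(entityId.split(":", -1)));
    }

    /** First segment, for example "maxcompute-table". */
    static String entityType(String entityId) {
        return segments(entityId).get(0);
    }

    /** Last non-empty segment, assumed here to be the table (or field) name. */
    static String leafName(String entityId) {
        List<String> parts = segments(entityId);
        for (int i = parts.size() - 1; i >= 0; i--) {
            if (!parts.get(i).isEmpty()) {
                return parts.get(i);
            }
        }
        throw new IllegalArgumentException("Entity ID has no non-empty segment: " + entityId);
    }

    public static void main(String[] args) {
        String id = "maxcompute-table:::test_project::test_table";
        System.out.println(segments(id));
        System.out.println(entityType(id) + " / " + leafName(id));
    }
}
```

Treat the ID as opaque whenever you pass it back to the API. Parsing is only useful for display or grouping.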

Get a table entity ID from the console

For a small number of known tables, copy the ID directly from the Data Map console.

  1. Go to the Data Map module in DataWorks.

  2. Search for and open the details page of the target table.

  3. In the Table Basic Information panel on the left, find the Entity ID and copy it.

Get a field entity ID from the console

  1. On the table's details page, switch to the Lineage tab and select Field Lineage.

  2. In the field lineage graph, click the field node.

  3. In the field's details panel on the right, find the Entity ID and copy it.

Get entity IDs in batches using the API

For large-scale analysis, call these APIs instead of using the console:

  • Tables: ListTables — returns a list of tables in Data Map along with their entity IDs

  • Fields: ListColumns — returns fields for a specific table along with their entity IDs
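
Listing APIs such as these typically return results one page at a time. The sketch below shows one way to drain every page into a single list. It assumes 1-based page numbers and a fixed page size, which you should confirm against the ListTables and ListColumns references. The fetchPage function is a hypothetical stand-in for the real SDK call, and the entity IDs are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

final class PagedCollectSketch {
    /**
     * Collects items from consecutive pages until a page is empty or shorter than pageSize.
     * fetchPage takes (pageNumber, pageSize) and is a hypothetical stand-in for one paged
     * call such as ListTables. maxPages is a safety cap against runaway loops.
     */
    static <T> List<T> collectAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                                  int pageSize, int maxPages) {
        List<T> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            List<T> items = fetchPage.apply(page, pageSize);
            if (items == null || items.isEmpty()) {
                break;
            }
            all.addAll(items);
            if (items.size() < pageSize) {
                break; // a short page is taken to be the last one
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // In-memory data used in place of real API responses.
        List<String> ids = Arrays.asList(
            "maxcompute-table:::test_project::t1",
            "maxcompute-table:::test_project::t2",
            "maxcompute-table:::test_project::t3");
        List<String> collected = PagedCollectSketch.<String>collectAll((page, size) -> {
            int from = Math.min((page - 1) * size, ids.size());
            int to = Math.min(from + size, ids.size());
            return ids.subList(from, to);
        }, 2, 100);
        System.out.println(collected);
    }
}
```

SDK methods declare checked exceptions, so a real fetchPage implementation would need to catch them and rethrow an unchecked exception.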

Query lineage with ListLineages

Parameters

Test the API interactively in the OpenAPI Explorer.

Parameter | Type | Description
SrcEntityId | String | Pass the upstream entity ID to query downstream lineage. Returns all entities that depend on this entity.
DstEntityId | String | Pass the downstream entity ID to query upstream lineage. Returns all entities that produce this entity.
SrcEntityName | String | Used with DstEntityId. Applies a fuzzy name filter to upstream results.
DstEntityName | String | Used with SrcEntityId. Applies a fuzzy name filter to downstream results.
NeedAttachRelationship | Boolean | Set to true to include full relationship details, including the task that created each lineage edge, in the response.

Important

If you provide both SrcEntityId and DstEntityId, the API returns the lineage relationship between those two specific entities. If both values are the same entity ID, the API returns a self-referencing relationship.

Example 1: Query downstream lineage

Find all tables that consume test_table:

SrcEntityId:            maxcompute-table:::test_project::test_table
NeedAttachRelationship: true

To filter results to downstream tables whose names contain "report", add:

DstEntityName: report

Example 2: Query upstream lineage

Find all tables and tasks that produced test_table:

DstEntityId:            maxcompute-table:::test_project::test_table
NeedAttachRelationship: true

To filter upstream results by name, add SrcEntityName.
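
The pairing of ID and name parameters is easy to mix up, so it can help to put the rule in one place. The following sketch builds the parameter set for either direction as a plain map. It does not call the API, and the class, enum, and method names are illustrative rather than part of the SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LineageParamsSketch {
    enum Direction { UPSTREAM, DOWNSTREAM }

    /**
     * Builds ListLineages parameters for a one-hop query.
     * nameFilter is optional; pass null to skip the fuzzy name filter.
     */
    static Map<String, String> build(String entityId, Direction direction, String nameFilter) {
        if (entityId == null || entityId.isEmpty()) {
            throw new IllegalArgumentException("entityId is required");
        }
        Map<String, String> params = new LinkedHashMap<>();
        if (direction == Direction.DOWNSTREAM) {
            // The known entity is the source; the filter applies to destination names.
            params.put("SrcEntityId", entityId);
            if (nameFilter != null) {
                params.put("DstEntityName", nameFilter);
            }
        } else {
            // The known entity is the destination; the filter applies to source names.
            params.put("DstEntityId", entityId);
            if (nameFilter != null) {
                params.put("SrcEntityName", nameFilter);
            }
        }
        params.put("NeedAttachRelationship", "true");
        return params;
    }

    public static void main(String[] args) {
        String id = "maxcompute-table:::test_project::test_table";
        System.out.println(build(id, Direction.DOWNSTREAM, "report"));
        System.out.println(build(id, Direction.UPSTREAM, null));
    }
}
```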

Response structure

A successful response contains a list of lineage relationship objects. Each object has this structure:

{
  "SrcEntity": {
    "Id": "maxcompute-table:::test_project::table_from",
    "Name": "table_from",
    "Attributes": {
      "rawEntityId": "maxcompute-table:::test_project::table_from"
    }
  },
  "DstEntity": {
    "Id": "maxcompute-table:::test_project::table_to",
    "Name": "table_to",
    "Attributes": {
      "project": "test_project",
      "region": "cn-shanghai",
      "table": "table_to"
    }
  },
  "Relationships": [
    {
      "Id": "123456789:maxcompute-table.test_project.table_from:maxcompute-table.test_project.table_to:maxcompute.SQL.76543xxx",
      "CreateTime": 1761089163548,
      "Task": {
        "Id": "76543xxx",
        "Type": "dataworks-sql",
        "Attributes": {
          "engine": "maxcompute",
          "channel": "1st",
          "taskInstanceId": "12345xxx",
          "projectId": "123456",
          "taskId": "76543xxx"
        }
      }
    }
  ]
}

Key fields:

  • SrcEntity / DstEntity: The upstream and downstream data entities. Use the Id field to call GetTable or GetColumn for full metadata.

  • Relationships: The edges connecting the two entities. Each edge describes the task that wrote data from source to destination.

  • Task: The computing or scheduling task that created this lineage edge. For DataWorks scheduling tasks, Task.Attributes includes taskId and taskInstanceId. Pass these to GetTask to retrieve the task definition and run status.
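
When you log or display a relationship, two small conversions are often useful: turning CreateTime into a readable timestamp and summarizing the task attributes. The sketch below assumes that CreateTime is a Unix timestamp in milliseconds, which the sample value suggests but this guide does not state. The helper names are illustrative.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class RelationshipFieldsSketch {
    /** Converts CreateTime to an Instant, assuming epoch milliseconds. */
    static Instant createTime(long createTimeMillis) {
        return Instant.ofEpochMilli(createTimeMillis);
    }

    /** Summarizes Task.Attributes, tolerating missing keys or a missing map. */
    static String describeTask(Map<String, String> attributes) {
        Map<String, String> attrs = attributes != null ? attributes : new HashMap<String, String>();
        return String.format("task %s (instance %s)",
            attrs.getOrDefault("taskId", "N/A"),
            attrs.getOrDefault("taskInstanceId", "N/A"));
    }

    public static void main(String[] args) {
        // Values copied from the sample response above.
        Map<String, String> attrs = new HashMap<>();
        attrs.put("taskId", "76543xxx");
        attrs.put("taskInstanceId", "12345xxx");
        System.out.println(createTime(1761089163548L));
        System.out.println(describeTask(attrs));
    }
}
```

Under the epoch-milliseconds assumption, the sample CreateTime corresponds to 2025-10-21T23:26:03.548Z (UTC).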

ListLineages returns one hop at a time. To trace multi-hop lineage, such as finding the root source of a table that is itself derived from other tables, call the API iteratively. When tracing upstream, pass each SrcEntity.Id from the response as the DstEntityId of the next call. When tracing downstream, pass each DstEntity.Id as the next SrcEntityId.
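
One way to structure the iteration is a breadth-first walk that remembers which entities it has already visited, so that cycles and self-referencing relationships do not cause endless calls. The following is a minimal sketch under stated assumptions: oneHop is a hypothetical function standing in for a single ListLineages call that returns neighbor entity IDs in one direction, and the table names are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class MultiHopLineageSketch {
    /**
     * Walks lineage breadth-first from startId, up to maxDepth hops.
     * Returns every entity reached, in discovery order, excluding startId.
     * oneHop is a hypothetical stand-in for one ListLineages call.
     */
    static Set<String> trace(String startId, Function<String, List<String>> oneHop, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(startId);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(startId);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String id : frontier) {
                for (String neighbor : oneHop.apply(id)) {
                    // add() returns false for entities seen before, which breaks cycles.
                    if (visited.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(startId);
        return visited;
    }

    public static void main(String[] args) {
        // In-memory upstream edges used in place of real API responses.
        Map<String, List<String>> upstream = new HashMap<>();
        upstream.put("sales_report", Arrays.asList("orders_clean", "payments"));
        upstream.put("orders_clean", Arrays.asList("orders_raw"));
        upstream.put("orders_raw", Arrays.asList("orders_clean")); // deliberate cycle

        System.out.println(trace("sales_report",
            id -> upstream.getOrDefault(id, Collections.<String>emptyList()), 10));
    }
}
```

The maxDepth cap bounds the number of API calls on very deep graphs. Choose a value that fits your quota and the depth you actually need.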

Java SDK tutorial

Set up the project

  1. Make sure JDK 8 or later is installed.

  2. Add the following dependency to your pom.xml. Replace ${latest.version} with the version from the SDK page.

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>dataworks_public20240518</artifactId>
    <version>${latest.version}</version>
</dependency>

Query lineage

The following example initializes the client, queries both upstream and downstream lineage for a specified table, and prints a human-readable summary.

All requests use the same pattern: set either SrcEntityId (for downstream) or DstEntityId (for upstream), enable NeedAttachRelationship, and iterate over the response to extract entity names and task IDs.

import java.util.Map;

import com.aliyun.dataworks_public20240518.Client;
import com.aliyun.dataworks_public20240518.models.LineageEntity;
import com.aliyun.dataworks_public20240518.models.LineageRelationship;
import com.aliyun.dataworks_public20240518.models.LineageTask;
import com.aliyun.dataworks_public20240518.models.ListLineagesRequest;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponse;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfo;
import com.aliyun.dataworks_public20240518.models.ListLineagesResponseBody.ListLineagesResponseBodyPagingInfoLineages;
import com.aliyun.tea.TeaException;

public class LineageQuerySample {
    // Replace with your actual entity ID from Data Map
    private static final String TARGET_ENTITY_ID = "maxcompute-table:::test_project::test_table";

    public static void main(String[] args) throws Exception {
        // Initialize the client using credentials from environment variables
        com.aliyun.teaopenapi.models.Config config = new com.aliyun.teaopenapi.models.Config()
            .setAccessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
            .setAccessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"))
            .setEndpoint("dataworks.cn-shanghai.aliyuncs.com");  // Replace with your region endpoint
        Client client = new Client(config);

        System.out.println("=== Downstream lineage (tables that consume " + TARGET_ENTITY_ID + ") ===");
        queryLineage(client, TARGET_ENTITY_ID, null);

        System.out.println("\n=== Upstream lineage (tables that produce " + TARGET_ENTITY_ID + ") ===");
        queryLineage(client, null, TARGET_ENTITY_ID);
    }

    /**
     * Queries one hop of lineage for a given entity.
     * @param srcEntityId  Set to query downstream lineage; leave null to query upstream.
     * @param dstEntityId  Set to query upstream lineage; leave null to query downstream.
     */
    static void queryLineage(Client client, String srcEntityId, String dstEntityId) throws Exception {
        ListLineagesRequest request = new ListLineagesRequest()
            .setSrcEntityId(srcEntityId)
            .setDstEntityId(dstEntityId)
            .setNeedAttachRelationship(true);

        try {
            ListLineagesResponse response = client.listLineages(request);
            ListLineagesResponseBodyPagingInfo pagingInfo = response.getBody().getPagingInfo();

            if (pagingInfo == null || pagingInfo.getLineages() == null || pagingInfo.getLineages().isEmpty()) {
                System.out.println("  No lineage relationships found.");
                return;
            }

            for (ListLineagesResponseBodyPagingInfoLineages lineage : pagingInfo.getLineages()) {
                LineageEntity src = lineage.getSrcEntity();
                LineageEntity dst = lineage.getDstEntity();
                System.out.printf("  %s --> %s%n",
                    src != null ? src.getName() : "unknown",
                    dst != null ? dst.getName() : "unknown");

                // Print the task that created each lineage edge
                if (lineage.getRelationships() != null) {
                    for (LineageRelationship rel : lineage.getRelationships()) {
                        LineageTask task = rel.getTask();
                        if (task != null && task.getAttributes() != null) {
                            Map<String, String> attrs = task.getAttributes();
                            System.out.printf("    created by task %s (instance: %s)%n",
                                attrs.getOrDefault("taskId", "N/A"),
                                attrs.getOrDefault("taskInstanceId", "N/A"));
                        }
                    }
                }
            }
        } catch (TeaException e) {
            System.err.printf("API error: %s - %s%n", e.getCode(), e.getMessage());
            throw e;
        }
    }
}

Expected output:

=== Downstream lineage (tables that consume maxcompute-table:::test_project::test_table) ===
  test_table --> sales_report
    created by task 76543xxx (instance: 12345xxx)

=== Upstream lineage (tables that produce maxcompute-table:::test_project::test_table) ===
  orders_raw --> test_table
    created by task 65432yyy (instance: 11111yyy)

Once you have the entity IDs from the response, call GetTable for full table metadata or GetTask for the task definition.
