
DataWorks:View data lineages

Last Updated: Mar 26, 2026

Data Map tracks the full lifecycle of your data — where it originates, how it moves through pipelines, and what depends on it. Use data lineages to debug broken pipelines, assess the impact of schema changes before applying them, and satisfy compliance audits that require traceable data flows.

From a table's or DataService Studio API's details page, the Lineage tab gives you both table-level and field-level lineage graphs, along with tools to act on what you find.

Note

Data Map builds lineages from scheduling jobs and data forwarding metadata. Lineages from temporary queries and other manual operations are not included. Offline data lineage is updated on a T+1 basis.

Table lineages

Access the lineage view

  1. Find a table in Data Map and open its details page.

  2. Click the Lineage tab.

From the Lineage tab, you can:

  • View table-level and field-level lineage graphs

  • Run impact analysis to see which downstream tables are affected by a change

  • Retrieve and download the list of descendant tables as a local file

  • Send change notifications by email


Limitations by data source

Not all data sources support lineage collection out of the box. Some require additional configuration; others have SQL-level restrictions.

E-MapReduce

  • To collect metadata for a DataLake or custom cluster, configure EMR-HOOK in the cluster first. Without EMR-HOOK, Data Map cannot display lineages for that cluster. See Configure EMR-HOOK for Hive.

  • Lineage is not available for Spark clusters created on the EMR on ACK page.

  • Lineage is available for EMR Serverless Spark clusters.

  • Lineage is not available for tasks developed using an EMR Presto node.

AnalyticDB for MySQL

Note

Submit a ticket to enable data lineage for your AnalyticDB for MySQL instance before using the feature.

AnalyticDB for MySQL supports real-time lineage when the metadata source is AnalyticDB for Spark; in that case, lineage data is collected automatically. To enable real-time lineage, set the following Spark parameter:

spark.sql.queryExecutionListeners = com.aliyun.dataworks.meta.lineage.LineageListener
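The same listener can also be supplied directly in a job's Spark configuration; a minimal sketch of a spark-defaults.conf entry, where the property name and value come from the line above and the file-based setup is an assumption about how your Spark jobs are configured:

```properties
# Assumed setup: registers the DataWorks lineage listener via spark-defaults.conf
spark.sql.queryExecutionListeners  com.aliyun.dataworks.meta.lineage.LineageListener
```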

SQL statements that do not generate lineage

Data Map cannot parse lineage from SQL that uses JOIN, UNION, the wildcard *, or subqueries. For example:

-- Lineage not supported: uses the wildcard (*) and an implicit join
INSERT INTO test SELECT * FROM test1, test2 WHERE test1.id = test2.id;

-- Lineage not supported: contains a subquery
SELECT column1, column2 FROM table1 WHERE column3 IN (SELECT column4 FROM table2 WHERE column5 = 'value');
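A UNION between tables is likewise not parsed; an illustrative example using the same sample tables:

```sql
-- Lineage not supported: contains a UNION
INSERT INTO test SELECT id, name FROM test1 UNION SELECT id, name FROM test2;
```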

SQL statements that generate lineage

Lineage is captured for INSERT INTO, INSERT OVERWRITE, and CREATE TABLE AS SELECT when specific columns (not *) are listed:

-- Create a table from specific columns
CREATE TABLE test AS SELECT id, name FROM test1;

-- Insert specific columns with a filter
INSERT INTO test SELECT id, name FROM test1 WHERE name = 'test';

-- Overwrite using specific columns
INSERT OVERWRITE TABLE db_name.test SELECT id, name FROM test1;
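A query that is skipped because of a wildcard can usually be rewritten with an explicit column list so that lineage, including field-level lineage, is captured; an illustrative example using the same sample tables:

```sql
-- Not captured: the wildcard hides the column mapping
-- INSERT INTO test SELECT * FROM test1;

-- Captured: explicit columns let Data Map map test.id <- test1.id
-- and test.name <- test1.name
INSERT INTO test SELECT id, name FROM test1;
```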

CDH

To display table lineage for CDH Spark SQL and CDH Spark nodes, add a Spark parameter to the relevant data transformation module in Management Center.

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose More > Management Center. Select your workspace from the drop-down list and click Go to Management Center.

  2. In the left navigation pane, click Cluster Management and locate the target CDH cluster.

  3. Click Edit Spark-related Parameter.


  4. Add the following Spark parameter for the data transformation module. For example, to capture lineage for CDH Spark SQL and CDH Spark nodes under Operation Center > Auto Triggered Instances, add:

    Spark Property Name: spark.sql.queryExecutionListeners
    Spark Property Value: com.aliyun.dataworks.meta.lineage.LineageListener
  5. Click OK.

Lindorm

Note

Lineage collection is only supported in instance mode. Connection string mode is not supported.

To display table lineage for Lindorm Spark and Lindorm Spark SQL nodes, add a Spark parameter in Management Center.

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose More > Management Center. Select your workspace and click Go to Management Center.

  2. In the left navigation pane, click Computing Resource and locate your Lindorm computing resource.

  3. Click Edit Spark-related Parameter.

  4. Add the following Spark parameter for the data transformation module. For example, to capture lineage for Lindorm Spark and Lindorm Spark SQL nodes under Operation Center > Auto Triggered Instances, add:

    Spark Property Name: spark.sql.queryExecutionListeners
    Spark Property Value: com.aliyun.dataworks.meta.lineage.LineageListener
  5. Click OK.

Lineage support by data source

The table below shows which lineage types each data source supports, broken down by Data Integration (sync jobs) and Data Studio (SQL-based transformation jobs).

Legend: Y = supported, N = not supported

| Data source | Data Integration: table-level | Data Integration: field-level | Data Studio: table-level | Data Studio: field-level |
| --- | --- | --- | --- | --- |
| AnalyticDB for MySQL | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: N | Same as table-level |
| AnalyticDB for PostgreSQL | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: N | Same as table-level |
| ClickHouse | N | N | N | N |
| CDH/CDP | N | N | Hive, Impala, Spark, and Spark SQL: INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: N | Same as table-level |
| E-MapReduce | Batch sync: Y (OSS, Hive) / Real-time sync: N | Batch sync: Y (OSS, Hive) / Real-time sync: N | Hive, Spark (spark-submit), Spark SQL (Hudi format supported), and Shell (Hive SQL via beeline): INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: N | Same as table-level |
| Hologres | Batch sync: Y / Real-time sync: Y (from MySQL, Kafka, or Log Service) | Batch sync: Y / Real-time sync: N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: Y | Same as table-level |
| Kafka | Batch sync: Y / Real-time sync: Y (to MaxCompute or Hologres) | N | N | N |
| Lindorm | N | N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE TABLE: Y; CREATE TABLE LIKE: Y | Same as table-level |
| MaxCompute | Batch sync: Y / Real-time sync: Y (from MySQL, Kafka, PolarDB for MySQL, or Log Service) | Batch sync: Y / Real-time sync: N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: Y | Same as table-level |
| MySQL | Batch sync: Y / Real-time sync: Y (to MaxCompute or Hologres) | Batch sync: Y / Real-time sync: N | N | N |
| Oracle | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |
| OceanBase | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |
| OSS | Batch sync: Y / Real-time sync: N | N | N | N |
| PolarDB for MySQL | Batch sync: Y / Real-time sync: Y (to MaxCompute) | Batch sync: Y / Real-time sync: N | N | N |
| PolarDB for PostgreSQL | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |
| PostgreSQL | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |
| StarRocks | N | N | INSERT INTO/OVERWRITE: Y; CREATE TABLE AS SELECT: Y; CREATE EXTERNAL TABLE: N | Same as table-level |
| SQL Server | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |
| Tablestore (OTS) | Batch sync: Y / Real-time sync: N | Batch sync: Y / Real-time sync: N | N | N |

DataService Studio API lineages

Find a DataService Studio API in Data Map and open its details page. Click the Lineage tab to view the API's lineage.
