
DataWorks:Details of open data table structures

Last Updated: Feb 27, 2026

DataWorks open data provides tables and views across various dimensions to help you collect metadata. This topic describes the tables and views available in DataWorks open data and their detailed structures.

Metadata

This set of metadata tables and example metric statistics tables is generated by DataWorks from the metadata of the current tenant's tables, tasks, instances, workspaces, members, and projects. The actual table structures may be adjusted as the business evolves; the content displayed in the console prevails.

Data asset metadata

Asset table issue details (asset_table_issues)

Partition field: dt

Description: Details of data governance issues for tables.

Field

Type

Description

tenant_id

string

DataWorks tenant ID.

meta_entity_id

string

ID of the corresponding metadata entity.

uuid

string

Unique key of the table.

meta_entity_type

string

Type of the corresponding metadata, such as maxcompute-table.

entity_type

string

Entity type, such as table, view, or materialized_view.

account_id

string

Alibaba Cloud account to which the asset belongs.

datasource_type

string

Data source type, such as EMR or MaxCompute.

datasource_id

string

Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName).

catalog_name

string

The DLF data catalog when the metadata is from DLF.

database_name

string

Database name (EMR dbName).

schema_name

string

Schema name.

rule_id

string

Governance item identifier.

rule_name_zh

string

Chinese name of the governance item.

rule_name_en

string

English name of the governance item.

category

string

Dimension to which the item belongs.

deduct_score_tenant

string

Global score deduction, accurate to four decimal places.

deduct_score_owner

string

Individual score deduction, accurate to four decimal places.

cost

string

Wasted resources.

project_id

string

DataWorks project ID.

dt

string

Date partition (logical partition field) in the YYYYMMDD format.
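
Every table above is partitioned by dt in the YYYYMMDD format, so queries typically filter on a concrete partition value. A minimal sketch (plain Python standard library; the helper name is illustrative) for computing such a value:

```python
from datetime import date, timedelta

def dt_partition(days_ago: int = 1) -> str:
    """Return a dt partition value in YYYYMMDD format for the given
    number of days ago (default: yesterday, the most recent partition)."""
    return (date.today() - timedelta(days=days_ago)).strftime("%Y%m%d")
```

For example, `dt_partition()` produces yesterday's partition string, which can be interpolated into a `WHERE dt = '...'` clause.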

Asset table metric details (asset_table_profiles)

Partition field: dt

Description: Details of table metrics.

Field

Type

Description

tenant_id

bigint

Source tenant ID.

meta_entity_id

string

ID of the corresponding metadata entity.

meta_entity_type

string

Type of the corresponding metadata, such as maxcompute-table.

entity_type

string

Entity type, such as table, view, or materialized_view.

account_id

string

Alibaba Cloud account to which the asset belongs.

datasource_type

string

Data source type, such as EMR or MaxCompute.

datasource_id

string

Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName).

catalog_name

string

The DLF data catalog when the metadata is from DLF.

database_name

string

Database name (EMR dbName).

schema_name

string

Schema name.

uuid

string

Unique key of the table.

name

string

Table name.

owner

string

Asset owner.

last_access_timestamp

bigint

Time when the table was last accessed.

meta_modified_timestamp

bigint

13-digit timestamp indicating when the table metadata was modified.

data_modified_timestamp

bigint

13-digit timestamp indicating when the table data was modified.

create_timestamp

bigint

Time when the table was created.

comment

string

Table comment.

partition_keys

string

Partition key.

tags

string

Asset tags.

governance_rule_finding_count

bigint

Number of governance item issues.

governance_rule_finding_history_count

string

Historical number of asset governance items.

governance_health_score

string

Asset governance score.

governance_health_level

string

Asset governance score level.

is_partitioned

bigint

Indicates whether the table is a partitioned table.

content_size

bigint

Logical size.

record_num

bigint

Number of records.

life_cycle

string

Lifecycle.

partition_count

bigint

Number of partitions.

view_count_monthly

bigint

Number of views in the last month.

access_count

bigint

Number of accesses.

upstream_table_count

bigint

Number of upstream tables.

upstream_table_detail

string

Details of upstream tables.

downstream_table_count

bigint

Number of downstream tables.

downstream_table_detail

string

Details of downstream tables.

producing_project_ids

string

List of workspaces involved in table output.

producing_tasks_count

bigint

Number of nodes involved in table output.

producing_tasks_detail

string

Details of nodes involved in table output.

using_tasks_count

bigint

Number of nodes that use the table.

using_tasks_detail

string

Details of nodes that use the table.

quality_rule_count

bigint

Number of quality rules.

quality_monitor_count

bigint

Number of quality monitoring metrics.

quality_rule_7_days_failed_count

bigint

Number of failed quality rules in the last 7 days.

quality_monitor_7_days_failed_count

bigint

Number of failed quality monitoring metrics in the last 7 days.

dt

string

Date partition (logical partition field) in the YYYYMMDD format.

Asset task issue details (asset_task_issues)

Partition field: dt

Description: Details of data governance issues for tasks.

Field

Type

Description

tenant_id

string

DataWorks tenant ID.

node_id

string

Scheduling node ID.

node_name

string

Node name.

node_type

string

Task type: SQL, SQLCost, LOT, or CUPID.

node_owner

string

Base ID of the owner.

priority

string

Priority.

rule_id

string

Governance item identifier.

rule_name_zh

string

Chinese name of the governance item.

rule_name_en

string

English name of the governance item.

category

string

The governance domain to which it belongs.

deduct_score_tenant

string

Global score deduction, accurate to four decimal places.

deduct_score_owner

string

Individual score deduction, accurate to four decimal places.

cost

string

Governance benefits.

project_id

string

DataWorks project ID.

dt

string

Date partition (logical partition field) in the YYYYMMDD format.

Asset task metric details (asset_task_profiles)

Partition field: dt

Description: Details of task metrics.

Field

Type

Description

tenant_id

bigint

Source tenant ID.

data_asset_id

string

Asset ID within the module, corresponding to task.id.

name

string

Asset name, corresponding to task.name.

project_id

bigint

Workspace where the asset is located.

project_env

string

Environment. PROD: production. DEV: development.

owner

string

Asset owner.

create_user

string

Creator.

create_time

bigint

Creation time.

modify_user

string

Modifier.

modify_time

bigint

Modification time.

trigger_type

string

Trigger method type. Scheduler: triggered by a scheduling cycle. Manual: triggered manually.

trigger_recurrence_type

string

Normal: runs normally. Manual: manual task. Pause: paused. Skip: dry-run.

trigger_cron

string

Cron expression.

type

bigint

Execution code type. For more information, see the node type codes in Node development.

script_parameters

string

Parameter information.

priority

bigint

Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1.

trigger_start_time

bigint

Start date for scheduling.

trigger_end_time

bigint

End date for scheduling.

runtime_resource_group_id

bigint

ID of the resource group to which the node belongs.

runtime_cu

string

Computing CUs.

baseline_id

bigint

ID of the baseline to which the node belongs.

rerun_times

bigint

Number of times the task can be rerun.

rerun_interval

bigint

Rerun interval in milliseconds.

rerun_mode_type

string

AllAllowed: Rerun is allowed on both failure and success. FailureAllowed: Rerun is allowed only on failure. AllDenied: Rerun is not allowed on failure or success.

tags

string

Asset tags.

tags_count

bigint

Number of asset tags.

input_table_count

bigint

Number of input tables.

output_table_count

bigint

Number of output tables.

input_table_detail

string

Details of input tables.

output_table_detail

string

Details of output tables.

upstream_node_count

bigint

Number of upstream nodes.

downstream_node_count

bigint

Number of downstream nodes.

governance_rule_finding_count

bigint

Number of governance item issues.

governance_rule_finding_history_count

string

Historical number of asset governance items.

governance_health_score

string

Asset score.

governance_health_level

string

Asset score level.

engine_datasource_id

string

Compute engine ID.

engine_instance_count

bigint

Number of compute engine jobs.

engine_instance_run_time

bigint

Runtime of compute engine jobs.

engine_instance_comput_volume_cost

string

Computation volume.

engine_instance_cu_cost

string

Computing CUs.

engine_instance_cpu_cost

string

CPU consumption.

engine_instance_mem_cost

string

Memory consumption.

engine_instance_exist_data_skew

bigint

Data skew.

engine_instance_suggestions

string

Suggestions for data skew.

engine_instance_data_skew_ids

string

Job IDs with data skew.

engine_instance_ids

string

Job IDs.

task_instance_wait_time_cost_sum

bigint

Total wait time.

task_instance_wait_time_cost_max

bigint

Maximum instance wait time.

task_instance_run_time_cost_sum

bigint

Total runtime.

task_instance_run_time_cost_max

bigint

Maximum runtime.

task_instance_7_days_wait_time_cost_max

bigint

Maximum instance wait time in the last 7 days.

task_instance_7_days_run_time_cost_max

bigint

Maximum instance runtime in the last 7 days.

task_instance_count

bigint

Number of instances.

task_instance_7_days_failed_count

bigint

Number of failed instances in the last 7 days.

task_instance_7_days_failed_day_count

bigint

Number of days with failures in the last 7 days.

task_instance_7_days_frezeed_day_count

bigint

Number of days frozen in the last 7 days.

task_instance_7_days_dry_run_day_count

bigint

Number of days with dry-runs in the last 7 days.

quality_monitor_count

bigint

Number of Data Quality monitoring metrics.

quality_monitor_7_days_failed_count

bigint

Number of failed Data Quality monitoring metrics in the last 7 days.

di_task_resource_group_id

string

ID of the data integration resource group to which the node belongs.

di_task_is_public_network

bigint

Indicates whether the data integration task uses Internet traffic.

di_task_concurrency

bigint

Concurrency.

di_task_total_records

bigint

Number of synchronized records.

di_task_total_bytes

bigint

Volume of synchronized data.

di_task_source_type

string

Source type.

di_task_target_type

string

Target type.

di_task_run_time_cost

bigint

Execution duration of the data integration task.

di_task_wait_time_cost

bigint

Wait time of the data integration task.

dt

string

Date partition (logical partition field) in the YYYYMMDD format.

Data Quality

Data quality rule instances (quality_rule_results)

Partition field: dt

Description: Data quality rule instances.

Field

Type

Description

id

bigint

Primary key ID.

scan_run_id

bigint

Quality monitoring instance ID.

rule_id

bigint

Rule ID.

rule_name

string

Rule name.

status

string

Rule check result: Pass, Error, Warn, Fail, or Running.

severity

string

Rule severity: High or Normal.

create_time

bigint

Creation time.

modify_time

bigint

Time of the last modification.

spec

string

Rule instance spec.

tags

array<string>

Rule instance tags.

tenant_id

bigint

DataWorks tenant ID.

project_id

bigint

DataWorks project ID.

meta_entity_id

string

Unique identifier of the Data Map table entity.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality rule metric details (quality_rules)

Partition field: dt

Description: Details of data quality rule metrics.

Field

Type

Description

id

bigint

Primary key ID.

scan_id

bigint

Quality monitoring ID.

rule_name

string

Rule name.

enabled

boolean

Indicates whether the rule is enabled.

severity

string

Business severity level of the rule. Enumeration values: High, Normal.

create_time

bigint

Creation time.

modify_time

bigint

Time of the last modification.

spec

string

Rule spec.

tags

array<string>

Rule tags.

tenant_id

bigint

DataWorks tenant ID.

project_id

bigint

DataWorks project ID.

meta_entity_id

string

Unique identifier of the Data Map entity.

pass_count

int

Number of times the rule check passed.

warn_count

int

Number of times the rule check triggered the warning threshold.

error_count

int

Number of times the rule check triggered the error threshold.

fail_count

int

Number of times the rule check failed.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality monitoring task instances (quality_scan_runs)

Partition field: dt

Description: Data quality monitoring task instances.

Field

Type

Description

id

bigint

Primary key ID.

scan_id

bigint

Quality monitoring ID.

name

string

Monitoring name.

status

string

Monitoring instance status: Pass, Warn, Error, Fail, or Running.

post_action_type

string

Action taken after the monitoring check. Enumeration values: Alert or BlockTaskInstance.

data_filter

string

The data scope actually used for sampling.

trigger_time

bigint

The scheduled time used by the task.

trigger_type

string

Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode.

create_time

bigint

Creation time.

modify_time

bigint

Time of the last update.

datasource_id

bigint

ID of the data source to which the table belongs.

datasource_type

string

Data source type.

computing_resource_id

bigint

Compute engine ID.

compute_resource_option

string

Computing resource used for running Data Quality monitoring.

spec

string

Quality monitoring spec.

tenant_id

bigint

DataWorks tenant ID.

project_id

bigint

DataWorks project ID.

owner

string

Owner of the quality monitoring task.

task_id

bigint

Scheduling task ID.

task_instance_id

bigint

Scheduling task instance ID.

meta_entity_id

string

Unique identifier of the Data Map entity.

table_name

string

Table name.

catalog_name

string

Name of the data catalog to which the table belongs.

schema_name

string

Name of the schema to which the table belongs.

database_name

string

Name of the database to which the table belongs.

cluster_id

string

ID of the cluster to which the table belongs.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality monitoring task metric details (quality_scans)

Partition field: dt

Description: Details of data quality monitoring task metrics.

Field

Type

Description

id

bigint

Primary key ID.

name

string

Monitoring name.

data_filter_type

string

Data scope type: ByPartition or ByWhere.

data_filter

string

Data scope expression.

trigger_type

string

Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode.

create_time

bigint

Creation time.

modify_time

bigint

Time of the last update.

computing_resource_id

bigint

Compute engine ID.

compute_resource_option

string

Computing resource used for running Data Quality monitoring.

spec

string

Data Quality monitoring spec.

related_tasks

array<bigint>

Scheduling tasks associated with the monitoring task.

tenant_id

bigint

DataWorks tenant ID.

project_id

bigint

DataWorks project ID.

owner

string

Owner of the quality monitoring task.

datasource_id

string

ID of the data source to which the table belongs.

datasource_type

string

Data source type.

meta_entity_id

string

Unique identifier of the Data Map entity.

table_name

string

Table name.

catalog_name

string

Name of the data catalog to which the table belongs.

schema_name

string

Name of the schema to which the table belongs.

database_name

string

Name of the database to which the table belongs.

cluster_id

string

ID of the cluster to which the table belongs.

related_scheduler_task_count

int

Number of associated scheduling tasks.

rule_count

int

Number of associated rules.

high_severity_rule_count

int

Number of associated strong rules.

normal_severity_rule_count

int

Number of associated weak rules.

enabled_rule_count

int

Number of enabled rules.

enabled_high_severity_rule_count

int

Number of enabled strong rules.

enabled_normal_severity_rule_count

int

Number of enabled weak rules.

rule_instance_count

int

Number of rule instances for today.

high_severity_rule_instance_count

int

Number of strong rule instances for today.

normal_severity_rule_instance_count

int

Number of weak rule instances for today.

high_severity_rule_instance_pass_count

int

Number of passed strong rule instances for today.

high_severity_rule_instance_warn_count

int

Number of strong rule instances with warnings for today.

high_severity_rule_instance_error_count

int

Number of strong rule instances with errors for today.

high_severity_rule_instance_fail_count

int

Number of failed strong rule instances for today.

normal_severity_rule_instance_pass_count

int

Number of passed weak rule instances for today.

normal_severity_rule_instance_warn_count

int

Number of weak rule instances with warnings for today.

normal_severity_rule_instance_error_count

int

Number of weak rule instances with errors for today.

normal_severity_rule_instance_fail_count

int

Number of failed weak rule instances for today.

block_task_instance_count

int

Number of blocked scheduling tasks for today.

alert_rule_count

int

Number of configured alert subscriptions.

sms_alert_rule_count

int

Number of configured text message alert subscriptions.

mail_alert_rule_count

int

Number of configured email alert subscriptions.

phone_alert_rule_count

int

Number of configured phone call alert subscriptions.

ding_alert_rule_count

int

Number of configured DingTalk alert subscriptions.

feishu_alert_rule_count

int

Number of configured Lark alert subscriptions.

weixin_alert_rule_count

int

Number of configured WeChat alert subscriptions.

webhook_alert_rule_count

int

Number of configured custom webhook alert subscriptions.

alert_times

int

Number of alerts triggered today.

sms_alert_times

int

Number of text message alerts triggered today.

mail_alert_times

int

Number of email alerts triggered today.

phone_alert_times

int

Number of phone call alerts triggered today.

ding_alert_times

int

Number of DingTalk alerts triggered today.

feishu_alert_times

int

Number of Lark alerts triggered today.

weixin_alert_times

int

Number of WeChat alerts triggered today.

webhook_alert_times

int

Number of custom webhook alerts triggered today.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality table metric details (table_quality_summary)

Partition field: dt

Description: Details of data quality table metrics.

Field

Type

Description

meta_entity_id

string

Unique identifier of the Data Map table entity.

project_id

bigint

DataWorks project ID.

table_name

string

Table name.

schema_name

string

Name of the schema to which the table belongs.

database_name

string

Name of the database to which the table belongs.

catalog_name

string

Name of the data catalog to which the table belongs.

datasource_id

bigint

ID of the data source to which the table belongs. This is NULL if Data Quality is not configured.

tenant_id

bigint

DataWorks tenant ID.

owner

string

Table owner.

scan_count

int

Number of configured quality monitoring tasks.

scheduler_related_scan_count

int

Number of quality monitoring tasks associated with scheduling.

scan_run_count

int

Number of quality monitoring task instances for today.

alert_scan_run_count

int

Number of quality monitoring task instances that triggered alerts today.

block_task_instance_scan_run_count

int

Number of quality monitoring task instances that blocked scheduling tasks today.

rule_count

int

Number of configured rules.

enabled_rule_count

int

Number of enabled rules.

high_severity_rule_count

int

Number of configured strong rules.

normal_severity_rule_count

int

Number of configured weak rules.

rule_instance_count

int

Number of rule instances for today.

high_severity_rule_instance_count

int

Number of strong rule instances for today.

normal_severity_rule_instance_count

int

Number of weak rule instances for today.

high_severity_rule_instance_pass_count

int

Number of times strong rule checks passed today.

high_severity_rule_instance_warn_count

int

Number of times strong rule checks resulted in warnings today.

high_severity_rule_instance_error_count

int

Number of times strong rule checks resulted in errors today.

high_severity_rule_instance_fail_count

int

Number of times strong rule checks failed today.

normal_severity_rule_instance_pass_count

int

Number of times weak rule checks passed today.

normal_severity_rule_instance_warn_count

int

Number of times weak rule checks resulted in warnings today.

normal_severity_rule_instance_error_count

int

Number of times weak rule checks resulted in errors today.

normal_severity_rule_instance_fail_count

int

Number of times weak rule checks failed today.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Data catalogs (catalogs)

Field

Type

Description

datasource_type

string

Data source type, such as dlf or starrocks.

datasource_id

string

Data source identifier, such as a StarRocks cluster ID or the ID of the Alibaba Cloud account to which DLF belongs.

name

string

Data catalog name.

type

string

Data catalog type, such as Hive or Jdbc.

comment

string

Data catalog comment.

location

string

Directory path.

properties

string

Properties and parameters (JSON string).

owner

string

Owner of the data catalog. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.

create_timestamp

bigint

13-digit creation timestamp.

update_timestamp

bigint

13-digit modification timestamp.

meta_entity_id

string

Unique identifier of the data catalog (API-friendly and compliant with metadata entity ID specifications).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Databases (databases)

Field

Type

Description

datasource_type

string

Data source type, such as dlf, starrocks, maxcompute, holodb, or mysql.

datasource_id

string

Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.

catalog_name

string

Data catalog name. This field has a value if the data source type supports data catalogs.

name

string

Database name.

type

string

Database type.

comment

string

Database comment.

location

string

Database path.

properties

string

Properties and parameters (JSON string).

owner

string

Database owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.

is_external

boolean

Indicates whether the database is an external database.

create_timestamp

bigint

13-digit creation timestamp.

update_timestamp

bigint

13-digit modification timestamp.

meta_entity_id

string

Unique identifier of the database (API-friendly and compliant with metadata entity ID specifications).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Schemas (schemas)

Field

Type

Description

datasource_type

string

Data source type, such as holodb, maxcompute, or postgresql.

datasource_id

string

Data source identifier, such as an RDS instance ID or the ID of the Alibaba Cloud account to which MaxCompute belongs.

catalog_name

string

Data catalog name. This field has a value if the data source type supports data catalogs.

database_name

string

Database name.

name

string

Schema name.

type

string

Schema type.

comment

string

Comment.

properties

string

Properties and parameters (JSON string).

owner

string

Schema owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.

create_timestamp

bigint

13-digit creation timestamp.

update_timestamp

bigint

13-digit modification timestamp.

meta_entity_id

string

Unique identifier of the schema (API-friendly and compliant with metadata entity ID specifications).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Tables (tables)

Field

Type

Description

datasource_type

string

Data source type, such as dlf, starrocks, maxcompute, holodb, or mysql.

datasource_id

string

Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.

catalog_name

string

Data catalog name. This field has a value if the data source type supports data catalogs.

database_name

string

Database name.

schema_name

string

Schema name. This field has a value if the data source type supports schemas.

name

string

Table name.

type

string

Table type.

comment

string

Comment.

partition_keys

string

Partition key. For multi-level partitions, fields are separated by commas (,).

location

string

Table storage path.

properties

string

Properties and parameters (JSON string). For views, this is the view definition DDL.

owner

string

Table owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.

content_size

bigint

Storage size in bytes.

data_retention

map<string,string>

Data retention period or lifecycle. The value varies based on the table type. For MaxCompute tables, the key is `lifecycle` and the value is the table's lifecycle, such as 365. For DLF tables, the key is `retention` and the value is the table's lifecycle, such as 91. This is not yet supported for other types. This document will be updated if support is added.
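
Because the data_retention map uses a different key per table type, consumers need to check both documented keys. A small sketch of that lookup (the helper name is illustrative; it covers only the MaxCompute and DLF keys described above):

```python
def table_lifecycle_days(data_retention):
    """Read the lifecycle in days from a data_retention map.
    MaxCompute tables use the 'lifecycle' key; DLF tables use 'retention'.
    Returns None if neither key is present (other table types)."""
    for key in ("lifecycle", "retention"):
        if key in data_retention:
            return int(data_retention[key])
    return None
```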

is_compressed

boolean

Indicates whether the data is compressed.

is_temporary

boolean

Indicates whether the table is a temporary table.

entity_type

string

Entity type, such as table, view, or materialized_view.

input_format

string

Input format.

output_format

string

Output format.

serde_parameters

string

SerDe parameters.

serialization_lib

string

Serialization library.

create_timestamp

bigint

13-digit table creation timestamp.

meta_modified_timestamp

bigint

13-digit timestamp indicating when the table metadata was modified.

data_modified_timestamp

bigint

13-digit timestamp indicating when the table data was modified.

last_access_timestamp

bigint

13-digit timestamp indicating when the table was last accessed.
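
The timestamp fields above are 13-digit values, that is, milliseconds since the Unix epoch. A minimal conversion sketch using the Python standard library:

```python
from datetime import datetime, timezone

def from_epoch_ms(ts: int) -> datetime:
    """Convert a 13-digit millisecond timestamp (as stored in
    create_timestamp, data_modified_timestamp, etc.) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
```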

business_description

string

Business description or Chinese name.

meta_entity_id

string

Unique identifier of the table (API-friendly and compliant with metadata entity ID specifications).

Examples:

  • maxcompute-table: Alibaba Cloud account ID::project_name:schema_name:table_name.

  • holo-table: Hologres instance ID::sample_database:public_schema:table_name.

  • starrocks-table: Cluster instance ID:default_catalog:sample_database::sample_table.
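
The separator layout differs per source type, so an identifier must be parsed according to its type. A sketch for the maxcompute-table shape only, with the field layout inferred from the example above (treat it as illustrative, not a guaranteed format):

```python
def parse_maxcompute_table_id(meta_entity_id: str) -> dict:
    """Split a maxcompute-table meta_entity_id of the assumed form
    '<account_id>::<project>:<schema>:<table>' into its parts."""
    account_id, rest = meta_entity_id.split("::", 1)
    project, schema, table = rest.split(":")
    return {"account_id": account_id, "project": project,
            "schema": schema, "table": table}
```

Note that the starrocks-table example above places the empty segment differently, so it would need its own parser.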

uuid

string

Table UUID, used to link to the DataWorks Data Map table details page.

business_tags

array<string>

Business tags. Tags set on the Data Map page are recorded in this field.

wikis

array<struct<`version`:bigint,`operator`:string,`update_timestamp`:bigint,`content`:string>>

Instructions for using the table (version: version number; operator: committer; update_timestamp: 13-digit update timestamp; content: content).

producing_tasks

array<bigint>

List of scheduling task IDs that produce data for the table. For more information, see the tasks table.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Fields (columns)

Field

Type

Description

datasource_type

string

Data source type, such as dlf or starrocks.

datasource_id

string

Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.

catalog_name

string

Data catalog name. This field has a value if the data source type supports data catalogs.

database_name

string

Database name.

schema_name

string

Schema name. This field has a value if the data source type supports schemas.

table_name

string

Table name.

name

string

Field name.

type

string

Field type.

comment

string

Comment.

ordinal_position

bigint

Ordinal position of the field, starting from 1.

is_primary_key

boolean

Indicates whether the field is a primary key.

is_nullable

boolean

Indicates whether the field can be NULL.

is_partition_key

boolean

Indicates whether the field is a partition key.

properties

string

Properties and parameters (JSON string).

business_description

string

Business description.

meta_entity_id

string

Unique identifier of the field (API-friendly and compliant with metadata entity ID specifications).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Partitions (partitions)

Field

Type

Description

datasource_type

string

Data source type, such as maxcompute, dlf, or starrocks.

datasource_id

string

Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.

catalog_name

string

Data catalog name. This field has a value if the data source type supports data catalogs.

database_name

string

Database name.

schema_name

string

Schema name. This field has a value if the data source type supports schemas.

table_name

string

Table name.

name

string

Partition name (partition specification).

create_timestamp

bigint

13-digit creation timestamp.

update_timestamp

bigint

13-digit modification timestamp.

content_size

bigint

Partition size in bytes.

properties

string

Properties and parameters (JSON string).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Table-level and column-level lineage (lineages)

Field

Type

Description

source_meta_entity_id

string

Unique identifier of the source (API-friendly and compliant with metadata entity ID specifications).

source_raw_entity_type

string

Source entity type. If the identified metadata is not managed, source_meta_entity_type is empty and source_raw_entity_type is used as the identifier.

source_uuid

string

Unique identifier of the source (page-access-friendly).

target_meta_entity_id

string

Unique identifier of the target (API-friendly and compliant with metadata entity ID specifications).

target_raw_entity_type

string

Target entity type. If the identified metadata is not managed, target_meta_entity_type is empty and target_raw_entity_type is used as the identifier.

target_uuid

string

Unique identifier of the target (page-access-friendly).

compute_engine

string

Compute engine, such as maxcompute, datax, or hologres.

transform_type

string

Transform task type in the engine, such as SQL, DATAX, DATAX_STREAM, EXTERNAL_TABLE_MAPPING, STORAGE_MAPPING, or API_MAPPING.

task_id

bigint

DataWorks scheduling task ID. For more information, see the tasks table. This field is empty for lineage data not triggered by DataWorks scheduling.

task_instance_id

bigint

DataWorks scheduling task instance ID. For more information, see the tasks_instances table. This field is empty for lineage data not triggered by DataWorks scheduling.

lineage_time

bigint

Time when the lineage was generated, in milliseconds.

granularity

string

Lineage level, such as TABLE or COLUMN.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
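
Each lineage row is a source→target edge at TABLE or COLUMN granularity, with task_id empty when the lineage was not produced by DataWorks scheduling. A sketch of indexing such rows (the sample rows are hypothetical; field names follow the schema above):

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the lineages table.
rows = [
    {"source_uuid": "ods_log", "target_uuid": "dwd_log",
     "granularity": "TABLE", "task_id": 1001},
    {"source_uuid": "dwd_log", "target_uuid": "ads_report",
     "granularity": "TABLE", "task_id": None},
    {"source_uuid": "ods_log.col_a", "target_uuid": "dwd_log.col_a",
     "granularity": "COLUMN", "task_id": 1001},
]

# Keep only table-level edges and index downstream tables by source.
downstream = defaultdict(set)
for r in rows:
    if r["granularity"] == "TABLE":
        downstream[r["source_uuid"]].add(r["target_uuid"])

# task_id is empty for lineage not triggered by DataWorks scheduling.
scheduled = [r for r in rows if r["granularity"] == "TABLE" and r["task_id"] is not None]
```

Separating TABLE from COLUMN granularity first keeps table-level impact analysis from double-counting column edges.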

Task and workflow definitions (tasks)

Field

Type

Description

id

bigint

Task ID.

name

string

Task name.

description

string

Task description.

type

bigint

Task type. For more information, see the node type codes in Node development.

workflow_id

bigint

Workflow ID.

instance_mode

string

Instance generation mode.

  • T+1 (generated the next day)

  • Immediately (generated immediately)

baseline_id

bigint

Baseline ID.

priority

bigint

Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1.

timeout

bigint

Task execution timeout period in hours.

rerun_mode

bigint

Rerun configuration. 0: Rerun is allowed only on failure. 1: Rerun is allowed on both failure and success. 2: Rerun is not allowed on failure or success.

rerun_times

bigint

Number of retries. This takes effect when the task is configured to be rerunnable.

rerun_interval

bigint

Retry interval in seconds.

script_parameters

string

List of runtime script parameters.

trigger_type

string

Trigger method type (Scheduler: triggered by a scheduling cycle; Manual: triggered manually).

trigger_recurrence

bigint

Running mode when triggered. 0: Normal. 1: Manual task. 2: Paused. 3: Dry-run. 4: Referenced task.

trigger_cron

string

CRON expression. This takes effect when trigger_type is set to Scheduler.

trigger_start_time

string

Effective time for periodic triggering. This takes effect when trigger_type is set to Scheduler.

trigger_end_time

string

Expiration time for periodic triggering. This takes effect when trigger_type is set to Scheduler.

runtime_resource_group_id

bigint

ID of the resource group for running the task.

runtime_image

string

ID of the runtime image configured for the task.

runtime_cu

string

CU consumption configured for the task.

datasource_name

string

Data source name.

inputs_variables

array<struct<`name`:string,`type`:string,`value`:string>>

List of input variables.

outputs

array<struct<`output`:string,`type`:string>>

List of task output identifiers.

outputs_variables

array<struct<`name`:string,`type`:string,`value`:string>>

List of output variables.

dependencies

array<struct<`type`:string,`upstream_output`:string,`upstream_node_id`:bigint>>

List of dependency information.

related_workflow_id

bigint

Associated workflow ID.

tags

array<struct<`key`:string,`value`:string>>

List of task tags.

project_id

bigint

Project ID. For more information, see the workspace_id field in the workspaces table.

project_env

string

Environment type (PROD: production; DEV: development).

owner

string

Account ID of the task owner. For more information, see the users table.

create_time

string

Creation time.

modify_time

string

Modification time.

create_user

string

Account ID of the creator. For more information, see the users table.

modify_user

string

Account ID of the modifier. For more information, see the users table.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
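
Several fields in the tasks table are coded integers. A small helper that renders a task row in human-readable form, using the rerun_mode codes and the 1–8 priority range from the descriptions above (the task_row itself is illustrative):

```python
# Code mappings taken from the rerun_mode field description above.
RERUN_MODE = {
    0: "rerun allowed only on failure",
    1: "rerun allowed on failure and success",
    2: "rerun not allowed",
}

def describe_task(task_row: dict) -> str:
    """Render a tasks-table row as a one-line human-readable summary."""
    rerun = RERUN_MODE.get(task_row["rerun_mode"], "unknown rerun mode")
    return (f"{task_row['name']} (priority {task_row['priority']}/8, "
            f"{rerun}, {task_row['rerun_times']} retries every "
            f"{task_row['rerun_interval']}s)")

desc = describe_task({"name": "dwd_log_di", "priority": 1,
                      "rerun_mode": 1, "rerun_times": 3, "rerun_interval": 120})
```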

Task and workflow instances (task_instances)

Field

Type

Description

id

bigint

Task instance ID.

node_id

bigint

Task ID. For more information, see the tasks table.

node_type

bigint

Task type. For more information, see the node type codes in Node development.

node_name

string

Task name.

description

string

Task description.

workflow_id

bigint

Workflow ID. For more information, see the tasks table.

workflow_name

string

Workflow name.

workflow_instance_id

bigint

Workflow instance ID.

workflow_instance_type

bigint

Workflow instance type. 0: Daily scheduling. 1: Manual task. 2: Smoke testing. 3: Data backfill. 4: One-time workflow. 5: Manual workflow.

trigger_type

string

Trigger method type (Scheduler/Manual).

trigger_recurrence

string

Running mode. 0: Normal. 1: Manual. 2: Paused. 3: Dry-run. 4: Referenced.

timeout

bigint

Task execution timeout period in hours.

rerun_mode

string

Rerun configuration. 0: Rerun on failure only. 1: Rerun on success or failure. 2: Rerun is not allowed.

run_number

bigint

Number of runs.

period_number

bigint

Sequence number of the scheduling cycle within the instance's data timestamp.

baseline_id

bigint

Baseline ID.

priority

bigint

Task priority (1-8).

script_parameters

string

List of runtime script parameters.

runtime_resource_group_id

bigint

ID of the resource group for running the task.

runtime_resource_group_identifier

string

Identifier name of the resource group for running the task.

runtime_image

string

Runtime image ID.

runtime_cu

string

Runtime CU consumption.

runtime_process_id

string

Runtime process ID.

runtime_gateway

string

Runtime gateway.

datasource_name

string

Data source name.

inputs_variables

array<struct<`name`:string,`type`:string,`value`:string>>

List of input variables.

outputs

array<struct<`output`:string,`type`:string>>

List of output identifiers.

outputs_variables

array<struct<`name`:string,`type`:string,`value`:string>>

List of output variables.

tags

array<struct<`key`:string,`value`:string>>

List of task tags.

status

bigint

Task status. 1: Not run. 2: Waiting for time. 3: Waiting for resource. 4: Running. 5: Failed. 6: Succeeded. 7: Verifying. 8: Condition check. 9: Waiting for trigger.

trigger_time

string

Trigger time.

bizdate

string

Data timestamp.

started_time

string

Start time.

finished_time

string

End time.

project_id

bigint

Project ID. For more information, see the workspace_id field in the workspaces table.

project_env

string

Environment type (PROD/DEV).

owner

string

Account ID of the owner. For more information, see the users table.

create_time

string

Creation time.

modify_time

string

Modification time.

create_user

string

Account ID of the creator. For more information, see the users table.

modify_user

string

Account ID of the modifier. For more information, see the users table.

waiting_resource_time

string

Time waiting for resources.

waiting_trigger_time

string

Time waiting for trigger.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
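
The status field uses the coded values listed above, and started_time/finished_time can be combined into a wall-clock duration. A minimal sketch, assuming the time strings use a yyyy-MM-dd HH:mm:ss format (adjust fmt if your export differs):

```python
from datetime import datetime

# Status codes from the status field description above.
STATUS = {1: "Not run", 2: "Waiting for time", 3: "Waiting for resource",
          4: "Running", 5: "Failed", 6: "Succeeded", 7: "Verifying",
          8: "Condition check", 9: "Waiting for trigger"}

def run_seconds(started_time: str, finished_time: str,
                fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Wall-clock duration of an instance; the time format is an assumption."""
    start = datetime.strptime(started_time, fmt)
    end = datetime.strptime(finished_time, fmt)
    return (end - start).total_seconds()

label = STATUS[6]
secs = run_seconds("2024-05-01 01:00:00", "2024-05-01 01:12:30")
```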

Users (users)

Field

Type

Description

user_id

string

User identifier.

user_nick

string

Account alias (display name).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Workspaces (workspaces)

Field

Type

Description

workspace_id

bigint

Workspace ID.

workspace_name

string

Workspace name.

workspace_identifier

string

Workspace identifier.

workspace_description

string

Workspace description.

workspace_owner

string

Identifier of the workspace owner. For more information, see the users table.

workspace_status

bigint

Workspace status. 0: Normal. 1: Deleted. 2: Initialization. 3: Initialization failed. 4: Manually disabled. 5: Deleting. 6: Deletion failed. 7: Frozen due to overdue payment.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Workspace members (workspace_members)

Field

Type

Description

workspace_id

bigint

Workspace ID. For more information, see the workspaces table.

user_id

string

User identifier. For more information, see the users table.

user_status

bigint

User status. 0: Normal. 1: Disabled. 2: Deleted.

gmt_create_ts

bigint

Creation time (13-digit timestamp).

gmt_modified_ts

bigint

Modification time (13-digit timestamp).

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Resource groups (resource_groups)

Field

Type

Description

resource_group_id

bigint

Resource group ID.

resource_group_identifier

string

Resource group identifier.

resource_group_type

bigint

Resource group type. 1: Schedule resource group. 2: MaxCompute resource group. 4: Data integration resource group.

resource_group_mode

bigint

Resource group mode. 1: Subscription. 2: Pay-as-you-go. 3: Developer edition (MaxCompute only).

resource_group_status

bigint

Resource group status. 0: Normal. 1: Frozen. 2: Deleted. 3: Creating. 4: Creation failed. 5: Updating. 6: Update failed. 7: Deleting. 8: Deletion failed.

is_exclusive_resource_group

boolean

Indicates whether the resource group is exclusive.

dt

string

Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Example metadata

Table metric details (table_metrics_detail)

Field

Type

Description

datasource_type

string

Data source type.

datasource_id

string

Data source identifier.

catalog_name

string

Data catalog name.

database_name

string

Database name.

schema_name

string

Schema name.

table_name

string

Table name.

table_uuid

string

Table identifier, used to link to the details page.

meta_entity_id

string

Human-readable table identifier.

content_size

bigint

Collected storage volume. The value is NULL if storage volume collection is not supported.

daily_rate_cs

decimal(16,6)

Day-to-day change rate of storage volume.

avg_content_size_7d

bigint

7-day average of storage volume.

daily_rate_acs_7d

decimal(16,6)

Day-to-day change rate of the 7-day average storage volume.

latest_data_update_time_31d

bigint

Latest data update time within a 31-day data range: the maximum value of data_modified_timestamp, taken from instances in which the table is a lineage target (downstream). The value is NULL if there are no updates within the range.

latest_data_update_task_id

bigint

ID of the scheduling task that most recently updated the table within 31 days.

latest_data_update_instance_id

bigint

ID of the scheduling task instance that most recently updated the table within 31 days.

latest_data_update_time_by_task

bigint

End time of the scheduling task instance that most recently updated the table within 31 days.

writing_task_ids

array<bigint>

IDs of scheduling tasks that write to the table for the current data timestamp (no duplicate IDs).

writing_task_ids_31d

array<bigint>

IDs of scheduling tasks that write to the table within a 31-day data range (no duplicate IDs).

latest_data_access_time_31d

bigint

Latest data access time within a 31-day data range: the maximum value of last_access_timestamp, taken from instances in which the table is a lineage source (upstream). The value is NULL if there are no accesses within the range.

latest_data_access_task_id

bigint

ID of the scheduling task that most recently read the table within 31 days.

latest_data_access_instance_id

bigint

ID of the scheduling task instance that most recently read the table within 31 days.

latest_data_access_time_by_task

bigint

End time of the scheduling task instance that most recently read the table within 31 days.

reading_task_ids

array<string>

IDs of scheduling tasks that read the table.

reading_task_ids_31d

array<string>

IDs of scheduling tasks that read the table within a 31-day data range (no duplicate IDs).

direct_downstream_tables

array<string>

IDs of child tables (uuid).

direct_upstream_tables

array<string>

IDs of parent tables (uuid).

dt

string

Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Table metric summary (table_metrics_summary)

Field

Type

Description

table_count

bigint

Number of tables.

daily_rate_tc

decimal(16,6)

Day-to-day change rate of the number of tables.

avg_table_count_7d

bigint

7-day average number of tables.

daily_rate_atc_7d

decimal(16,6)

Day-to-day change rate of the 7-day average number of tables.

content_size

bigint

Collected storage volume. The value is NULL if storage volume collection is not supported.

daily_rate_cs

decimal(16,6)

Day-to-day change rate of storage volume.

avg_content_size_7d

bigint

7-day average of storage volume.

daily_rate_acs_7d

decimal(16,6)

Day-to-day change rate of the 7-day average storage volume.

updated_table_count

bigint

Number of tables updated within 31 days.

daily_rate_utc

decimal(16,6)

Day-to-day change rate of the number of tables updated within 31 days.

avg_updated_table_count_7d

bigint

7-day average number of tables updated within 31 days.

daily_rate_autc_7d

decimal(16,6)

Day-to-day change rate of the 7-day average number of tables updated within 31 days.

accessed_table_count

bigint

Number of tables read within 31 days.

daily_rate_atc

decimal(16,6)

Day-to-day change rate of the number of tables read within 31 days.

avg_accessed_table_count_7d

bigint

7-day average number of tables read within 31 days.

daily_rate_aatc_7d

decimal(16,6)

Day-to-day change rate of the 7-day average number of tables read within 31 days.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Task metric details (task_metrics_detail)

Field

Type

Description

task_id

bigint

Task identifier.

workflow_id

bigint

Workflow identifier.

node_type

bigint

Task type.

project_id

bigint

Workspace identifier.

week_number

bigint

The week of the year for the data timestamp.

task_owner

string

Owner ID.

compute_resource_type

string

Computing resource type.

compute_resource_id

string

Computing resource identifier (MaxCompute: project name; EMR: cluster ID; Hologres: instance ID).

datasource_name

string

Data source name.

inst_success_count

bigint

Number of successful instances.

inst_failed_count

bigint

Number of failed instances.

inst_running_count

bigint

Number of running instances.

inst_abnormal_count

bigint

Number of abnormal instances.

inst_not_started_count

bigint

Number of instances not started.

inst_runtime_cu

double

Instance runtime CU consumption.

task_avg_cu_31d

double

Average daily CU consumption of the task (within 31 days).

dt

string

Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
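
The inst_*_count columns partition a task's instances by outcome, so a success rate over finished instances can be derived directly. A small sketch (the row values are illustrative; field names follow the schema above):

```python
# Hypothetical task_metrics_detail row.
row = {"inst_success_count": 28, "inst_failed_count": 2,
       "inst_running_count": 1, "inst_abnormal_count": 0,
       "inst_not_started_count": 0}

# Success rate over finished instances only; running and not-started
# instances have no outcome yet and are excluded.
finished = row["inst_success_count"] + row["inst_failed_count"]
success_rate = row["inst_success_count"] / finished if finished else None
```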

Task metric summary (task_metrics_summary)

Field

Type

Description

node_type

bigint

Node type.

inst_status

string

Instance status.

inst_count

bigint

Number of instances.

avg_inst_count_7d

double

7-day average number of instances.

granularity

string

Statistics granularity, such as DAILY or WEEKLY.

dt

string

Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.