Details of open data table structures

DataWorks open data provides tables and views across various dimensions to help you collect metadata. This topic describes the tables and views available in DataWorks open data and their detailed structures.

Metadata

DataWorks generates this set of metadata tables and example metric statistics tables from the metadata of the current tenant's tables, tasks, instances, workspaces, members, and projects. The table structures may change as the product evolves. If a structure described here differs from what the system interface displays, the interface prevails.

Data asset metadata

Asset table issue details (asset_table_issues)

Partition field: dt

Description: Details of data governance issues for tables.

Field	Type	Description
tenant_id	string	DataWorks tenant.
meta_entity_id	string	ID of the corresponding metadata entity.
uuid	string	Unique key of the table.
meta_entity_type	string	Type of the corresponding metadata, such as maxcompute-table.
entity_type	string	Entity type, such as table, view, or materialized_view.
account_id	string	Alibaba Cloud account to which the asset belongs.
datasource_type	string	Data source type, such as EMR or MaxCompute.
datasource_id	string	Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName).
catalog_name	string	The DLF data catalog when the metadata is from DLF.
database_name	string	Database name (EMR dbName).
schema_name	string	Schema name.
rule_id	string	Governance item identifier.
rule_name_zh	string	Chinese name of the governance item.
rule_name_en	string	English name of the governance item.
category	string	Dimension to which the item belongs.
deduct_score_tenant	string	Global score deduction, accurate to four decimal places.
deduct_score_owner	string	Individual score deduction, accurate to four decimal places.
cost	string	Wasted resources.
project_id	string	DataWorks project.
dt	string	Date partition (logical partition field) in the YYYYMMDD format.
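Because every read of this table is scoped by the dt partition, it can help to validate the partition value before building a query. The following is a minimal sketch, not an official client: the helper names validate_dt and table_issues_query are hypothetical, and the SQL text assumes that your engine accepts a plain equality filter on dt.

```python
from datetime import datetime


def validate_dt(dt: str) -> str:
    """Return dt unchanged if it is a real calendar date in YYYYMMDD form."""
    if len(dt) != 8 or not dt.isdigit():
        raise ValueError(f"dt must be 8 digits (YYYYMMDD), got {dt!r}")
    # strptime rejects impossible dates such as 20240230.
    datetime.strptime(dt, "%Y%m%d")
    return dt


def table_issues_query(dt: str) -> str:
    """Build a query that reads one dt partition of asset_table_issues.

    Hypothetical helper: how the query is executed depends on your engine.
    """
    return (
        "SELECT rule_id, rule_name_en, category, deduct_score_tenant "
        "FROM asset_table_issues "
        f"WHERE dt = '{validate_dt(dt)}'"
    )
```

Validating first keeps a malformed date from silently matching no partition.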

Asset table metric details (asset_table_profiles)

Partition field: dt

Description: Details of table metrics.

Field	Type	Description
tenant_id	bigint	Source tenant ID.
meta_entity_id	string	ID of the corresponding metadata entity.
meta_entity_type	string	Type of the corresponding metadata, such as maxcompute-table.
entity_type	string	Entity type, such as table, view, or materialized_view.
account_id	string	Alibaba Cloud account to which the asset belongs.
datasource_type	string	Data source type, such as EMR or MaxCompute.
datasource_id	string	Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName).
catalog_name	string	The DLF data catalog when the metadata is from DLF.
database_name	string	Database name (EMR dbName).
schema_name	string	Schema name.
uuid	string	Unique key of the table.
name	string	Table name.
owner	string	Asset owner.
last_access_timestamp	bigint	Time when the table was last accessed.
meta_modified_timestamp	bigint	13-digit timestamp indicating when the table metadata was modified.
data_modified_timestamp	bigint	13-digit timestamp indicating when the table data was modified.
create_timestamp	bigint	Time when the table was created.
comment	string	Table comment.
partition_keys	string	Partition key.
tags	string	Asset tags.
governance_rule_finding_count	bigint	Number of governance item issues.
governance_rule_finding_history_count	string	Historical number of asset governance items.
governance_health_score	string	Asset governance score.
governance_health_level	string	Asset governance score level.
is_partitioned	bigint	Indicates whether the table is a partitioned table.
content_size	bigint	Logical size.
record_num	bigint	Number of records.
life_cycle	string	Lifecycle.
partition_count	bigint	Number of partitions.
view_count_monthly	bigint	Number of views in the last month.
access_count	bigint	Number of accesses.
upstream_table_count	bigint	Number of upstream tables.
upstream_table_detail	string	Details of upstream tables.
downstream_table_count	bigint	Number of downstream tables.
downstream_table_detail	string	Details of downstream tables.
producing_project_ids	string	List of workspaces involved in table output.
producing_tasks_count	bigint	Number of nodes involved in table output.
producing_tasks_detail	string	Details of nodes involved in table output.
using_tasks_count	bigint	Number of nodes that use the table.
using_tasks_detail	string	Details of nodes that use the table.
quality_rule_count	bigint	Number of quality rules.
quality_monitor_count	bigint	Number of quality monitoring metrics.
quality_rule_7_days_failed_count	bigint	Number of quality rules that failed in the last 7 days.
quality_monitor_7_days_failed_count	bigint	Number of quality monitoring metrics that failed in the last 7 days.
dt	string	Date partition (logical partition field) in the YYYYMMDD format.
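The 13-digit timestamp fields in this table can be converted to dates for reporting. The sketch below assumes that a 13-digit value is the number of milliseconds since the Unix epoch in UTC, which this topic does not state explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def from_millis(ts_ms: int) -> datetime:
    """Convert a 13-digit value, assumed to be epoch milliseconds, to UTC."""
    seconds, millis = divmod(ts_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=millis
    )


def days_since_data_modified(row: dict, now: datetime) -> int:
    """Whole days between the last data modification and `now`."""
    return (now - from_millis(row["data_modified_timestamp"])).days
```

Integer division is used instead of dividing by 1000.0 so that the millisecond part is preserved exactly.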

Asset task issue details (asset_task_issues)

Partition field: dt

Description: Details of data governance issues for tasks.

Field	Type	Description
tenant_id	string	DataWorks tenant ID.
node_id	string	Scheduling node ID.
node_name	string	Node name.
node_type	string	Task type: SQL, SQLCost, LOT, or CUPID.
node_owner	string	Base ID of the owner.
priority	string	Priority.
rule_id	string	Governance item identifier.
rule_name_zh	string	Chinese name of the governance item.
rule_name_en	string	English name of the governance item.
category	string	The governance domain to which it belongs.
deduct_score_tenant	string	Global score deduction, accurate to four decimal places.
deduct_score_owner	string	Individual score deduction, accurate to four decimal places.
cost	string	Governance benefits.
project_id	string	DataWorks project ID.
dt	string	Date partition (logical partition field) in the YYYYMMDD format.
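The score deduction fields are stored as strings with four decimal places, so summing them as binary floats can introduce rounding noise. A minimal sketch of one way to aggregate them with exact decimal arithmetic follows. The rows are assumed to be dictionaries keyed by field name, and the category values in the example are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def total_deduction_by_category(rows):
    """Sum deduct_score_tenant per category using exact decimal arithmetic."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for row in rows:
        totals[row["category"]] += Decimal(row["deduct_score_tenant"])
    return dict(totals)


# Hypothetical rows; real category values come from your own data.
sample_rows = [
    {"category": "compute", "deduct_score_tenant": "0.1250"},
    {"category": "compute", "deduct_score_tenant": "0.0005"},
    {"category": "storage", "deduct_score_tenant": "1.0000"},
]
totals = total_deduction_by_category(sample_rows)
```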

Asset task metric details (asset_task_profiles)

Partition field: dt

Description: Details of task metrics.

Field	Type	Description
tenant_id	bigint	Source tenant ID.
data_asset_id	string	Asset ID within the module, corresponding to task.id.
name	string	Asset name, corresponding to task.name.
project_id	bigint	Workspace where the asset is located.
project_env	string	Environment. PROD: production. DEV: development.
owner	string	Asset owner.
create_user	string	Creator.
create_time	bigint	Creation time.
modify_user	string	Modifier.
modify_time	bigint	Modification time.
trigger_type	string	Trigger method type. Scheduler: triggered by a scheduling cycle. Manual: triggered manually.
trigger_recurrence_type	string	Running mode when triggered. Normal: runs normally. Manual: manual task. Pause: paused. Skip: dry-run.
trigger_cron	string	cron expression.
type	bigint	Execution code type. For more information, see the node type codes in Node development.
script_parameters	string	Parameter information.
priority	bigint	Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1.
trigger_start_time	bigint	Start date for scheduling.
trigger_end_time	bigint	End date for scheduling.
runtime_resource_group_id	bigint	ID of the resource group to which the node belongs.
runtime_cu	string	Computing CUs.
baseline_id	bigint	ID of the baseline to which the node belongs.
rerun_times	bigint	Number of times the task can be rerun.
rerun_interval	bigint	Rerun interval in milliseconds.
rerun_mode_type	string	AllAllowed: Rerun is allowed on both failure and success. FailureAllowed: Rerun is allowed only on failure. AllDenied: Rerun is not allowed on failure or success.
tags	string	Asset tags.
tags_count	bigint	Number of asset tags.
input_table_count	bigint	Number of input tables.
output_table_count	bigint	Number of output tables.
input_table_detail	string	Details of input tables.
output_table_detail	string	Details of output tables.
upstream_node_count	bigint	Number of upstream nodes.
downstream_node_count	bigint	Number of downstream nodes.
governance_rule_finding_count	bigint	Number of governance item issues.
governance_rule_finding_history_count	string	Historical number of asset governance items.
governance_health_score	string	Asset score.
governance_health_level	string	Asset score level.
engine_datasource_id	string	Compute engine ID.
engine_instance_count	bigint	Number of compute engine jobs.
engine_instance_run_time	bigint	Runtime of compute engine jobs.
engine_instance_comput_volume_cost	string	Computation volume.
engine_instance_cu_cost	string	Computing CUs.
engine_instance_cpu_cost	string	CPU consumption.
engine_instance_mem_cost	string	Memory consumption.
engine_instance_exist_data_skew	bigint	Data skew.
engine_instance_suggestions	string	Suggestions for data skew.
engine_instance_data_skew_ids	string	Job IDs with data skew.
engine_instance_ids	string	Job IDs.
task_instance_wait_time_cost_sum	bigint	Total wait time.
task_instance_wait_time_cost_max	bigint	Maximum instance wait time.
task_instance_run_time_cost_sum	bigint	Total runtime.
task_instance_run_time_cost_max	bigint	Maximum runtime.
task_instance_7_days_wait_time_cost_max	bigint	Maximum instance wait time in the last 7 days.
task_instance_7_days_run_time_cost_max	bigint	Maximum instance runtime in the last 7 days.
task_instance_count	bigint	Number of instances.
task_instance_7_days_failed_count	bigint	Number of failed instances in the last 7 days.
task_instance_7_days_failed_day_count	bigint	Number of days with failures in the last 7 days.
task_instance_7_days_frezeed_day_count	bigint	Number of days frozen in the last 7 days.
task_instance_7_days_dry_run_day_count	bigint	Number of days with dry-runs in the last 7 days.
quality_monitor_count	bigint	Number of Data Quality monitoring metrics.
quality_monitor_7_days_failed_count	bigint	Number of Data Quality monitoring metrics that failed in the last 7 days.
di_task_resource_group_id	string	ID of the data integration resource group to which the node belongs.
di_task_is_public_network	bigint	Indicates whether the data integration task uses Internet traffic.
di_task_concurrency	bigint	Concurrency.
di_task_total_records	bigint	Number of synchronized records.
di_task_total_bytes	bigint	Volume of synchronized data.
di_task_source_type	string	Source type.
di_task_target_type	string	Target type.
di_task_run_time_cost	bigint	Execution duration of the data integration task.
di_task_wait_time_cost	bigint	Wait time of the data integration task.
dt	string	Date partition (logical partition field) in the YYYYMMDD format.

Data Quality

Data quality rule instances (quality_rule_results)

Partition field: dt

Description: Data quality rule instances.

Field	Type	Description
id	bigint	Primary key ID.
scan_run_id	bigint	Quality monitoring instance ID.
rule_id	bigint	Rule ID.
rule_name	string	Rule name.
status	string	Rule check result: Pass, Error, Warn, Fail, or Running.
severity	string	Rule severity: High or Normal.
create_time	bigint	Creation time.
modify_time	bigint	Time of the last modification.
spec	string	Rule instance spec.
tags	array<string>	Rule instance tags.
tenant_id	bigint	DataWorks tenant ID.
project_id	bigint	DataWorks project ID.
meta_entity_id	string	Unique identifier of the Data Map table entity.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality rule metric details (quality_rules)

Partition field: dt

Description: Details of data quality rule metrics.

Field	Type	Description
id	bigint	Primary key ID.
scan_id	bigint	Quality monitoring ID.
rule_name	string	Rule name.
enabled	boolean	Indicates whether the rule is enabled.
severity	string	Business severity level of the rule. Enumeration values: High, Normal.
create_time	bigint	Creation time.
modify_time	bigint	Time of the last modification.
spec	string	Rule spec.
tags	array<string>	Rule tags.
tenant_id	bigint	DataWorks tenant ID.
project_id	bigint	DataWorks project ID.
meta_entity_id	string	Unique identifier of the Data Map entity.
pass_count	int	Number of times the rule check passed.
warn_count	int	Number of times the rule check triggered the warning threshold.
error_count	int	Number of times the rule check triggered the error threshold.
fail_count	int	Number of times the rule check failed.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.
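The four count fields can be combined into a pass rate per rule. The sketch below assumes that pass_count, warn_count, error_count, and fail_count are mutually exclusive outcomes of the same set of checks. This topic does not confirm that assumption, so treat the result as indicative only. The function name is hypothetical.

```python
def pass_rate(row: dict):
    """Share of checks that passed, or None when the rule never ran."""
    total = (
        row["pass_count"]
        + row["warn_count"]
        + row["error_count"]
        + row["fail_count"]
    )
    if total == 0:
        return None  # avoid dividing by zero for rules with no checks
    return row["pass_count"] / total
```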

Data quality monitoring task instances (quality_scan_runs)

Partition field: dt

Description: Data quality monitoring task instances.

Field	Type	Description
id	bigint	Primary key ID.
scan_id	bigint	Quality monitoring ID.
name	string	Monitoring name.
status	string	Monitoring instance status: Pass, Warn, Error, Fail, or Running.
post_action_type	string	Action taken after the monitoring check. Enumeration values: Alert or BlockTaskInstance.
data_filter	string	The data scope actually used for sampling.
trigger_time	bigint	The scheduled time used by the task.
trigger_type	string	Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode.
create_time	bigint	Creation time.
modify_time	bigint	Time of the last update.
datasource_id	bigint	ID of the data source to which the table belongs.
datasource_type	string	Data source type.
computing_resource_id	bigint	Compute engine ID.
compute_resource_option	string	Computing resource used for running Data Quality monitoring.
spec	string	Quality monitoring spec.
tenant_id	bigint	DataWorks tenant ID.
project_id	bigint	DataWorks project ID.
owner	string	Owner of the quality monitoring task.
task_id	bigint	Scheduling task ID.
task_instance_id	bigint	Scheduling task instance ID.
meta_entity_id	string	Unique identifier of the Data Map entity.
table_name	string	Table name.
catalog_name	string	Name of the data catalog to which the table belongs.
schema_name	string	Name of the schema to which the table belongs.
database_name	string	Name of the database to which the table belongs.
cluster_id	string	ID of the cluster to which the table belongs.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality monitoring task metric details (quality_scans)

Partition field: dt

Description: Details of data quality monitoring task metrics.

Field	Type	Description
id	bigint	Primary key ID.
name	string	Monitoring name.
data_filter_type	string	Data scope type: ByPartition or ByWhere.
data_filter	string	Data scope expression.
trigger_type	string	Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode.
create_time	bigint	Creation time.
modify_time	bigint	Time of the last update.
computing_resource_id	bigint	Compute engine ID.
compute_resource_option	string	Computing resource used for running Data Quality monitoring.
spec	string	Data Quality monitoring spec.
related_tasks	array<bigint>	Scheduling tasks associated with the monitoring task.
tenant_id	bigint	DataWorks tenant ID.
project_id	bigint	DataWorks project ID.
owner	string	Owner of the quality monitoring task.
datasource_id	string	ID of the data source to which the table belongs.
datasource_type	string	Data source type.
meta_entity_id	string	Unique identifier of the Data Map entity.
table_name	string	Table name.
catalog_name	string	Name of the data catalog to which the table belongs.
schema_name	string	Name of the schema to which the table belongs.
database_name	string	Name of the database to which the table belongs.
cluster_id	string	ID of the cluster to which the table belongs.
related_scheduler_task_count	int	Number of associated scheduling tasks.
rule_count	int	Number of associated rules.
high_severity_rule_count	int	Number of associated strong rules.
normal_severity_rule_count	int	Number of associated weak rules.
enabled_rule_count	int	Number of enabled rules.
enabled_high_severity_rule_count	int	Number of enabled strong rules.
enabled_normal_severity_rule_count	int	Number of enabled weak rules.
rule_instance_count	int	Number of rule instances for today.
high_severity_rule_instance_count	int	Number of strong rule instances for today.
normal_severity_rule_instance_count	int	Number of weak rule instances for today.
high_severity_rule_instance_pass_count	int	Number of passed strong rule instances for today.
high_severity_rule_instance_warn_count	int	Number of strong rule instances with warnings for today.
high_severity_rule_instance_error_count	int	Number of strong rule instances with errors for today.
high_severity_rule_instance_fail_count	int	Number of failed strong rule instances for today.
normal_severity_rule_instance_pass_count	int	Number of passed weak rule instances for today.
normal_severity_rule_instance_warn_count	int	Number of weak rule instances with warnings for today.
normal_severity_rule_instance_error_count	int	Number of weak rule instances with errors for today.
normal_severity_rule_instance_fail_count	int	Number of failed weak rule instances for today.
block_task_instance_count	int	Number of blocked scheduling tasks for today.
alert_rule_count	int	Number of configured alert subscriptions.
sms_alert_rule_count	int	Number of configured text message alert subscriptions.
mail_alert_rule_count	int	Number of configured email alert subscriptions.
phone_alert_rule_count	int	Number of configured phone call alert subscriptions.
ding_alert_rule_count	int	Number of configured DingTalk alert subscriptions.
feishu_alert_rule_count	int	Number of configured Lark alert subscriptions.
weixin_alert_rule_count	int	Number of configured WeChat alert subscriptions.
webhook_alert_rule_count	int	Number of configured custom webhook alert subscriptions.
alert_times	int	Number of alerts triggered today.
sms_alert_times	int	Number of text message alerts triggered today.
mail_alert_times	int	Number of email alerts triggered today.
phone_alert_times	int	Number of phone call alerts triggered today.
ding_alert_times	int	Number of DingTalk alerts triggered today.
feishu_alert_times	int	Number of Lark alerts triggered today.
weixin_alert_times	int	Number of WeChat alerts triggered today.
webhook_alert_times	int	Number of custom webhook alerts triggered today.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday.

Data quality table metric details (table_quality_summary)

Partition field: dt

Description: Details of data quality table metrics.

Field	Type	Description
meta_entity_id	string	Unique identifier of the Data Map table entity.
project_id	bigint	DataWorks project ID.
table_name	string	Table name.
schema_name	string	Name of the schema to which the table belongs.
database_name	string	Name of the database to which the table belongs.
catalog_name	string	Name of the data catalog to which the table belongs.
datasource_id	bigint	ID of the data source to which the table belongs. This is NULL if Data Quality is not configured.
tenant_id	bigint	DataWorks tenant ID.
owner	string	Table owner.
scan_count	int	Number of configured quality monitoring tasks.
scheduler_related_scan_count	int	Number of quality monitoring tasks associated with scheduling.
scan_run_count	int	Number of quality monitoring task instances for today.
alert_scan_run_count	int	Number of quality monitoring task instances that triggered alerts today.
block_task_instance_scan_run_count	int	Number of quality monitoring task instances that blocked scheduling tasks today.
rule_count	int	Number of configured rules.
enabled_rule_count	int	Number of enabled rules.
high_severity_rule_count	int	Number of configured strong rules.
normal_severity_rule_count	int	Number of configured weak rules.
rule_instance_count	int	Number of rule instances for today.
high_severity_rule_instance_count	int	Number of strong rule instances for today.
normal_severity_rule_instance_count	int	Number of weak rule instances for today.
high_severity_rule_instance_pass_count	int	Number of times strong rule checks passed today.
high_severity_rule_instance_warn_count	int	Number of times strong rule checks resulted in warnings today.
high_severity_rule_instance_error_count	int	Number of times strong rule checks resulted in errors today.
high_severity_rule_instance_fail_count	int	Number of times strong rule checks failed today.
normal_severity_rule_instance_pass_count	int	Number of times weak rule checks passed today.
normal_severity_rule_instance_warn_count	int	Number of times weak rule checks resulted in warnings today.
normal_severity_rule_instance_error_count	int	Number of times weak rule checks resulted in errors today.
normal_severity_rule_instance_fail_count	int	Number of times weak rule checks failed today.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
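The dt description above implies a rolling window of partitions that runs from 31 days ago to yesterday. A minimal sketch of how to enumerate that window follows. The function name and the days_back parameter are hypothetical, and `today` is whatever date you treat as the current day.

```python
from datetime import date, timedelta


def available_partitions(today: date, days_back: int = 31) -> list:
    """List dt values from `days_back` days ago through yesterday, oldest first."""
    return [
        (today - timedelta(days=n)).strftime("%Y%m%d")
        for n in range(days_back, 0, -1)
    ]
```

For example, with `today` set to March 1, 2024, the window starts at 20240130 and ends at 20240229.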

Data catalogs (catalogs)

Field	Type	Description
datasource_type	string	Data source type, such as dlf or starrocks.
datasource_id	string	Data source identifier, such as a StarRocks cluster ID or the ID of the Alibaba Cloud account to which DLF belongs.
name	string	Data catalog name.
type	string	Data catalog type, such as Hive or Jdbc.
comment	string	Data catalog comment.
location	string	Directory path.
properties	string	Properties and parameters (JSON string).
owner	string	Owner of the data catalog. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.
create_timestamp	bigint	13-digit creation timestamp.
update_timestamp	bigint	13-digit modification timestamp.
meta_entity_id	string	Unique identifier of the data catalog (API-friendly and compliant with metadata entity ID specifications).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Databases (databases)

Field	Type	Description
datasource_type	string	Data source type, such as `dlf`, `starrocks`, `maxcompute`, `holodb`, or `mysql`.
datasource_id	string	Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.
catalog_name	string	Data catalog name. This field has a value if the data source type supports data catalogs.
name	string	Database name.
type	string	Database type.
comment	string	Database comment.
location	string	Database path.
properties	string	Properties and parameters (JSON string).
owner	string	Database owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.
is_external	boolean	Indicates whether the database is an external database.
create_timestamp	bigint	13-digit creation timestamp.
update_timestamp	bigint	13-digit modification timestamp.
meta_entity_id	string	Unique identifier of the database (API-friendly and compliant with metadata entity ID specifications).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Schemas (schemas)

Field	Type	Description
datasource_type	string	Data source type, such as holodb, maxcompute, or postgresql.
datasource_id	string	Data source identifier, such as an RDS instance ID or the ID of the Alibaba Cloud account to which MaxCompute belongs.
catalog_name	string	Data catalog name. This field has a value if the data source type supports data catalogs.
database_name	string	Database name.
name	string	Schema name.
type	string	Schema type.
comment	string	Comment.
properties	string	Properties and parameters (JSON string).
owner	string	Schema owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.
create_timestamp	bigint	13-digit creation timestamp.
update_timestamp	bigint	13-digit modification timestamp.
meta_entity_id	string	Unique identifier of the schema (API-friendly and compliant with metadata entity ID specifications).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Tables (tables)

Field	Type	Description
datasource_type	string	Data source type, such as dlf, starrocks, maxcompute, holodb, or mysql.
datasource_id	string	Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.
catalog_name	string	Data catalog name. This field has a value if the data source type supports data catalogs.
database_name	string	Database name.
schema_name	string	Schema name. This field has a value if the data source type supports schemas.
name	string	Table name.
type	string	Table type.
comment	string	Comment.
partition_keys	string	Partition key. For multi-level partitions, fields are separated by commas (,).
location	string	Table storage path.
properties	string	Properties and parameters (JSON string). For views, this is the view definition DDL.
owner	string	Table owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type.
content_size	bigint	Storage size in bytes.
data_retention	map<string,string>	Data retention period or lifecycle. The value varies based on the table type. For MaxCompute tables, the key is `lifecycle` and the value is the table's lifecycle, such as 365. For DLF tables, the key is `retention` and the value is the table's lifecycle, such as 91. This is not yet supported for other types. This document will be updated if support is added.
is_compressed	boolean	Indicates whether the data is compressed.
is_temporary	boolean	Indicates whether the table is a temporary table.
entity_type	string	Entity type, such as table, view, or materialized_view.
input_format	string	Input format.
output_format	string	Output format.
serde_parameters	string	SerDe parameters.
serialization_lib	string	Serialization library.
create_timestamp	bigint	13-digit table creation timestamp.
meta_modified_timestamp	bigint	13-digit timestamp indicating when the table metadata was modified.
data_modified_timestamp	bigint	13-digit timestamp indicating when the table data was modified.
last_access_timestamp	bigint	13-digit timestamp indicating when the table was last accessed.
business_description	string	Business description or Chinese name.
meta_entity_id	string	Unique identifier of the table (API-friendly and compliant with metadata entity ID specifications). Examples: maxcompute-table: Alibaba Cloud account ID::project_name:schema_name:table_name. holo-table: Hologres instance ID::sample_database:public_schema:table_name. starrocks-table: Cluster instance ID:default_catalog:sample_database::sample_table.
uuid	string	Table UUID, used to link to the DataWorks Data Map table details page.
business_tags	array<string>	Business tags. Tags set on the Data Map page are recorded in this field.
wikis	array<struct<`version`:bigint,`operator`:string,`update_timestamp`:bigint,`content`:string>>	Instructions for using the table (version: version number; operator: committer; update_timestamp: 13-digit update timestamp; content: content).
producing_tasks	array<bigint>	List of scheduling task IDs that produce data for the table. For more information, see the tasks table.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
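Two fields in this table need light parsing before use: partition_keys is a comma-separated string, and data_retention is a map whose key depends on the data source. The sketch below follows the descriptions above. It assumes that datasource_type holds the lowercase values maxcompute and dlf, and that the retention value is an integer stored as a string. The function names are hypothetical.

```python
# Key inside data_retention for each supported data source type,
# as described for the data_retention field.
RETENTION_KEY = {"maxcompute": "lifecycle", "dlf": "retention"}


def split_partition_keys(partition_keys):
    """Split a comma-separated partition_keys value into a list of names."""
    if not partition_keys:
        return []
    return [key.strip() for key in partition_keys.split(",") if key.strip()]


def retention_value(datasource_type, data_retention):
    """Return the retention or lifecycle value, or None if unsupported."""
    key = RETENTION_KEY.get(datasource_type)
    if key is None or not data_retention:
        return None
    value = data_retention.get(key)
    return int(value) if value is not None else None
```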

Fields (columns)

Field	Type	Description
datasource_type	string	Data source type, such as dlf or starrocks.
datasource_id	string	Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.
catalog_name	string	Data catalog name. This field has a value if the data source type supports data catalogs.
database_name	string	Database name.
schema_name	string	Schema name. This field has a value if the data source type supports schemas.
table_name	string	Table name.
name	string	Field name.
type	string	Field type.
comment	string	Comment.
ordinal_position	bigint	Ordinal position of the field, starting from 1.
is_primary_key	boolean	Indicates whether the field is a primary key.
is_nullable	boolean	Indicates whether the field can be NULL.
is_partition_key	boolean	Indicates whether the field is a partition key.
properties	string	Properties and parameters (JSON string).
business_description	string	Business description.
meta_entity_id	string	Unique identifier of the field (API-friendly and compliant with metadata entity ID specifications).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Partitions (partitions)

Field	Type	Description
datasource_type	string	Data source type, such as maxcompute, dlf, or starrocks.
datasource_id	string	Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID.
catalog_name	string	Data catalog name. This field has a value if the data source type supports data catalogs.
database_name	string	Database name.
schema_name	string	Schema name. This field has a value if the data source type supports schemas.
table_name	string	Table name.
name	string	Partition name (partition specification).
create_timestamp	bigint	13-digit creation timestamp.
update_timestamp	bigint	13-digit modification timestamp.
content_size	bigint	Partition size in bytes.
properties	string	Properties and parameters (JSON string).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Table-level and column-level lineage (lineages)

Field	Type	Description
source_meta_entity_id	string	Unique identifier of the source (API-friendly and compliant with metadata entity ID specifications).
source_raw_entity_type	string	Source entity type. If the identified metadata is not managed, source_meta_entity_type is empty and source_raw_entity_type is used as the identifier.
source_uuid	string	Unique identifier of the source (page-access-friendly).
target_meta_entity_id	string	Unique identifier of the target (API-friendly and compliant with metadata entity ID specifications).
target_raw_entity_type	string	Target entity type. If the identified metadata is not managed, target_meta_entity_type is empty and target_raw_entity_type is used as the identifier.
target_uuid	string	Unique identifier of the target (page-access-friendly).
compute_engine	string	Compute engine, such as maxcompute, datax, or hologres.
transform_type	string	Transform task type in the engine, such as SQL, DATAX, DATAX_STREAM, EXTERNAL_TABLE_MAPPING, STORAGE_MAPPING, or API_MAPPING.
task_id	bigint	DataWorks scheduling task ID. For more information, see the tasks table. This field is empty for lineage data not triggered by DataWorks scheduling.
task_instance_id	bigint	DataWorks scheduling task instance ID. For more information, see the task_instances table. This field is empty for lineage data not triggered by DataWorks scheduling.
lineage_time	bigint	Time when the lineage was generated, in milliseconds.
granularity	string	Lineage level, such as TABLE or COLUMN.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
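Each lineage row describes one source-to-target edge, so table-level dependencies can be rebuilt by grouping rows on the target. The following is a minimal sketch that assumes rows are dictionaries keyed by field name. The function name and the entity IDs used to exercise it are hypothetical.

```python
from collections import defaultdict


def table_upstreams(rows):
    """Map each target entity ID to its sorted, de-duplicated source entity IDs.

    Only rows whose granularity is TABLE are considered.
    """
    upstreams = defaultdict(set)
    for row in rows:
        if row["granularity"] != "TABLE":
            continue
        upstreams[row["target_meta_entity_id"]].add(row["source_meta_entity_id"])
    return {target: sorted(sources) for target, sources in upstreams.items()}
```

Using a set per target collapses repeated edges, which can occur when the same task instance runs on several days within the queried partitions.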

Task and workflow definitions (tasks)

Field	Type	Description
id	bigint	Task ID.
name	string	Task name.
description	string	Task description.
type	bigint	Task type. For more information, see the node type codes in Node development.
workflow_id	bigint	Workflow ID.
instance_mode	string	Instance generation mode. T+1: generated the next day. Immediately: generated immediately.
baseline_id	bigint	Baseline ID.
priority	bigint	Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1.
timeout	bigint	Task execution timeout period in hours.
rerun_mode	bigint	Rerun configuration. 0: Rerun is allowed only on failure. 1: Rerun is allowed on both failure and success. 2: Rerun is not allowed on failure or success.
rerun_times	bigint	Number of retries. This takes effect when the task is configured to be rerunnable.
rerun_interval	bigint	Retry interval in seconds.
script_parameters	string	List of runtime script parameters.
trigger_type	string	Trigger method type (Scheduler: triggered by a scheduling cycle; Manual: triggered manually).
trigger_recurrence	bigint	Running mode when triggered. 0: Normal. 1: Manual task. 2: Paused. 3: Dry-run. 4: Referenced task.
trigger_cron	string	Cron expression. This takes effect when trigger_type is set to Scheduler.
trigger_start_time	string	Effective time for periodic triggering. This takes effect when trigger_type is set to Scheduler.
trigger_end_time	string	Expiration time for periodic triggering. This takes effect when trigger_type is set to Scheduler.
runtime_resource_group_id	bigint	ID of the resource group for running the task.
runtime_image	string	ID of the runtime image configured for the task.
runtime_cu	string	CU consumption configured for the task.
datasource_name	string	Data source name.
inputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	List of input variables.
outputs	array<struct<`output`:string,`type`:string>>	List of task output identifiers.
outputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	List of output variables.
dependencies	array<struct<`type`:string,`upstream_output`:string,`upstream_node_id`:bigint>>	List of dependency information.
related_workflow_id	bigint	Associated workflow ID.
tags	array<struct<`key`:string,`value`:string>>	List of task tags.
project_id	bigint	Project ID. For more information, see the workspace_id field in the workspaces table.
project_env	string	Environment type (PROD: production; DEV: development).
owner	string	Account ID of the task owner. For more information, see the users table.
create_time	string	Creation time.
modify_time	string	Modification time.
create_user	string	Account ID of the creator. For more information, see the users table.
modify_user	string	Account ID of the modifier. For more information, see the users table.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
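
The dependencies field holds one struct per upstream dependency, so rows of the tasks table can be turned into a simple upstream map. The following is a sketch under the assumption that each row has already been loaded as a Python dict whose keys match the field names above; the function name upstream_map and the sample rows are hypothetical.

```python
def upstream_map(tasks: list[dict]) -> dict[int, list[int]]:
    """Map each task id to the sorted, de-duplicated upstream task ids it depends on."""
    graph: dict[int, set[int]] = {}
    for task in tasks:
        upstream = graph.setdefault(task["id"], set())
        # `dependencies` may be NULL (None) for tasks without upstream nodes.
        for dep in task.get("dependencies") or []:
            upstream_id = dep.get("upstream_node_id")
            if upstream_id is not None:
                upstream.add(upstream_id)
    return {task_id: sorted(ids) for task_id, ids in graph.items()}


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    rows = [
        {"id": 1, "dependencies": None},
        {"id": 2, "dependencies": [
            {"type": "Normal", "upstream_output": "out_1", "upstream_node_id": 1},
        ]},
    ]
    print(upstream_map(rows))
```

Tasks without dependencies are kept in the result with an empty list so that every task in the partition appears exactly once.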

Task and workflow instances (task_instances)

Field	Type	Description
id	bigint	Task instance ID.
node_id	bigint	Task ID. For more information, see the tasks table.
node_type	bigint	Task type. For more information, see the node type codes in Node development.
node_name	string	Task name.
description	string	Task description.
workflow_id	bigint	Workflow ID. For more information, see the tasks table.
workflow_name	string	Workflow name.
workflow_instance_id	bigint	Workflow instance ID.
workflow_instance_type	bigint	Workflow instance type. 0: Daily scheduling. 1: Manual task. 2: Smoke testing. 3: Data backfill. 4: One-time workflow. 5: Manual workflow.
trigger_type	string	Trigger method type (Scheduler/Manual).
trigger_recurrence	string	Running mode. 0: Normal. 1: Manual. 2: Paused. 3: Dry-run. 4: Referenced.
timeout	bigint	Task execution timeout period in hours.
rerun_mode	string	Rerun configuration. 0: Rerun on failure only. 1: Rerun on success or failure. 2: Rerun is not allowed.
run_number	bigint	Number of runs.
period_number	bigint	Sequence number of the scheduling cycle.
baseline_id	bigint	Baseline ID.
priority	bigint	Task priority (1-8).
script_parameters	string	List of runtime script parameters.
runtime_resource_group_id	bigint	ID of the resource group for running the task.
runtime_resource_group_identifier	string	Identifier name of the resource group for running the task.
runtime_image	string	Runtime image ID.
runtime_cu	string	Runtime CU consumption.
runtime_process_id	string	Runtime process ID.
runtime_gateway	string	Runtime gateway.
datasource_name	string	Data source name.
inputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	List of input variables.
outputs	array<struct<`output`:string,`type`:string>>	List of output identifiers.
outputs_variables	array<struct<`name`:string,`type`:string,`value`:string>>	List of output variables.
tags	array<struct<`key`:string,`value`:string>>	List of task tags.
status	bigint	Task status. 1: Not run. 2: Waiting for time. 3: Waiting for resource. 4: Running. 5: Failed. 6: Succeeded. 7: Verifying. 8: Condition check. 9: Waiting for trigger.
trigger_time	string	Trigger time.
bizdate	string	Data timestamp.
started_time	string	Start time.
finished_time	string	End time.
project_id	bigint	Project ID. For more information, see the workspace_id field in the workspaces table.
project_env	string	Environment type (PROD/DEV).
owner	string	Account ID of the owner. For more information, see the users table.
create_time	string	Creation time.
modify_time	string	Modification time.
create_user	string	Account ID of the creator. For more information, see the users table.
modify_user	string	Account ID of the modifier. For more information, see the users table.
waiting_resource_time	string	Time waiting for resources.
waiting_trigger_time	string	Time waiting for trigger.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
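
The numeric status codes listed above are easier to read in reports once they are mapped to labels. The following is a minimal sketch that assumes task_instances rows are available as Python dicts; the helper names status_label and count_by_status are hypothetical, and the labels are copied from the status field description.

```python
from collections import Counter

# Labels taken from the `status` field description of task_instances.
INSTANCE_STATUS = {
    1: "Not run",
    2: "Waiting for time",
    3: "Waiting for resource",
    4: "Running",
    5: "Failed",
    6: "Succeeded",
    7: "Verifying",
    8: "Condition check",
    9: "Waiting for trigger",
}


def status_label(code: int) -> str:
    """Return the documented label, or a marker for codes not listed here."""
    return INSTANCE_STATUS.get(code, f"Unknown ({code})")


def count_by_status(instances: list[dict]) -> Counter:
    """Count instances per status label."""
    return Counter(status_label(row["status"]) for row in instances)


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    sample = [{"id": 10, "status": 6}, {"id": 11, "status": 5}, {"id": 12, "status": 6}]
    print(count_by_status(sample))
```

Because the table structures are subject to change, unknown codes are reported explicitly rather than dropped.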

Users (users)

Field	Type	Description
user_id	string	User identifier.
user_nick	string	Account alias (display name).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Workspaces (workspaces)

Field	Type	Description
workspace_id	bigint	Workspace ID.
workspace_name	string	Workspace name.
workspace_identifier	string	Workspace identifier.
workspace_description	string	Workspace description.
workspace_owner	string	Identifier of the workspace owner. For more information, see the users table.
workspace_status	bigint	Workspace status. 0: Normal. 1: Deleted. 2: Initialization. 3: Initialization failed. 4: Manually disabled. 5: Deleting. 6: Deletion failed. 7: Frozen due to overdue payment.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Workspace members (workspace_members)

Field	Type	Description
workspace_id	bigint	Workspace ID. For more information, see the workspaces table.
user_id	string	User identifier. For more information, see the users table.
user_status	bigint	User status. 0: Normal. 1: Disabled. 2: Deleted.
gmt_create_ts	bigint	Creation time (13-digit timestamp).
gmt_modified_ts	bigint	Modification time (13-digit timestamp).
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
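
The 13-digit timestamps in gmt_create_ts and gmt_modified_ts can be converted to readable datetimes. The following is a sketch that assumes the values are Unix epoch timestamps in milliseconds; the helper names ms_to_utc and describe_member are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Labels taken from the `user_status` field description.
USER_STATUS = {0: "Normal", 1: "Disabled", 2: "Deleted"}


def ms_to_utc(ts_ms: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ts_ms, 1000)
    # Integer arithmetic avoids floating-point rounding of the millisecond part.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def describe_member(row: dict) -> dict:
    """Return a copy of a workspace_members row with decoded status and times."""
    return {
        "workspace_id": row["workspace_id"],
        "user_id": row["user_id"],
        "user_status": USER_STATUS.get(row["user_status"], f"Unknown ({row['user_status']})"),
        "created_at": ms_to_utc(row["gmt_create_ts"]).isoformat(),
        "modified_at": ms_to_utc(row["gmt_modified_ts"]).isoformat(),
    }


if __name__ == "__main__":
    print(ms_to_utc(1700000000000).isoformat())
```

The same conversion would apply to other millisecond fields, such as lineage_time, if they follow the same epoch convention.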

Resource groups (resource_groups)

Field	Type	Description
resource_group_id	bigint	Resource group ID.
resource_group_identifier	string	Resource group identifier.
resource_group_type	bigint	Resource group type. 1: Scheduling resource group. 2: MaxCompute resource group. 4: Data Integration resource group.
resource_group_mode	bigint	Resource group mode. 1: Subscription. 2: Pay-as-you-go. 3: Developer edition (MaxCompute only).
resource_group_status	bigint	Resource group status. 0: Normal. 1: Frozen. 2: Deleted. 3: Creating. 4: Creation failed. 5: Updating. 6: Update failed. 7: Deleting. 8: Deletion failed.
is_exclusive_resource_group	boolean	Indicates whether the resource group is exclusive.
dt	string	Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Example metadata

Table metric details (table_metrics_detail)

Field	Type	Description
datasource_type	string	Data source type.
datasource_id	string	Data source identifier.
catalog_name	string	Data catalog name.
database_name	string	Database name.
schema_name	string	Schema name.
table_name	string	Table name.
table_uuid	string	Table identifier, used to link to the details page.
meta_entity_id	string	Human-readable table identifier.
content_size	bigint	Collected storage volume. The value is NULL if storage volume collection is not supported.
daily_rate_cs	decimal(16,6)	Day-to-day change rate of storage volume.
avg_content_size_7d	bigint	7-day average of storage volume.
daily_rate_acs_7d	decimal(16,6)	Day-to-day change rate of the 7-day average storage volume.
latest_data_update_time_31d	bigint	Latest update time of the table data within the 31-day data range, taken as the end time of the instance for which the table is a downstream lineage node (the maximum value of data_modified_timestamp). The value is NULL if there are no updates within the 31-day range.
latest_data_update_task_id	bigint	ID of the scheduling task that most recently updated the table within 31 days.
latest_data_update_instance_id	bigint	ID of the scheduling task instance that most recently updated the table within 31 days.
latest_data_update_time_by_task	bigint	End time of the scheduling task instance that most recently updated the table within 31 days.
writing_task_ids	array<bigint>	IDs of scheduling tasks that write to the table for the current data timestamp (no duplicate IDs).
writing_task_ids_31d	array<bigint>	IDs of scheduling tasks that write to the table within a 31-day data range (no duplicate IDs).
latest_data_access_time_31d	bigint	Latest access time of the table data within the 31-day data range, taken as the end time of the instance for which the table is an upstream lineage node (the maximum value of last_access_timestamp). The value is NULL if there are no accesses within the 31-day range.
latest_data_access_task_id	bigint	ID of the scheduling task that most recently read the table within 31 days.
latest_data_access_instance_id	bigint	ID of the scheduling task instance that most recently read the table within 31 days.
latest_data_access_time_by_task	bigint	End time of the scheduling task instance that most recently read the table within 31 days.
reading_task_ids	array<string>	IDs of scheduling tasks that read the table.
reading_task_ids_31d	array<string>	IDs of scheduling tasks that read the table within a 31-day data range (no duplicate IDs).
direct_downstream_tables	array<string>	UUIDs of the direct downstream (child) tables.
direct_upstream_tables	array<string>	UUIDs of the direct upstream (parent) tables.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.
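
The day-to-day change rate fields, such as daily_rate_cs, are typed as decimal(16,6). This page does not define the formula, so the following sketch assumes the common definition (current value minus previous value, divided by the previous value) and rounds to six decimal places to match the column type. The function name daily_change_rate is hypothetical, and the values stored by DataWorks may be computed differently.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def daily_change_rate(current: Optional[int], previous: Optional[int]) -> Optional[Decimal]:
    """Assumed formula: (current - previous) / previous, rounded to 6 decimal places.

    Returns None when either value is missing (for example, content_size is NULL)
    or when the previous value is zero, because the rate is undefined then.
    """
    if current is None or previous is None or previous == 0:
        return None
    rate = (Decimal(current) - Decimal(previous)) / Decimal(previous)
    return rate.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Storage volume grew from 100 to 110 units between two dt partitions.
    print(daily_change_rate(110, 100))
```

Comparing a value recomputed this way with the stored column for a few tables is a quick way to check whether the assumed formula matches.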

Table metric summary (table_metrics_summary)

Field	Type	Description
table_count	bigint	Number of tables.
daily_rate_tc	decimal(16,6)	Day-to-day change rate of the number of tables.
avg_table_count_7d	bigint	7-day average number of tables.
daily_rate_atc_7d	decimal(16,6)	Day-to-day change rate of the 7-day average number of tables.
content_size	bigint	Collected storage volume. The value is NULL if storage volume collection is not supported.
daily_rate_cs	decimal(16,6)	Day-to-day change rate of storage volume.
avg_content_size_7d	bigint	7-day average of storage volume.
daily_rate_acs_7d	decimal(16,6)	Day-to-day change rate of the 7-day average storage volume.
updated_table_count	bigint	Number of tables updated within 31 days.
daily_rate_utc	decimal(16,6)	Day-to-day change rate of the number of tables updated within 31 days.
avg_updated_table_count_7d	bigint	7-day average number of tables updated within 31 days.
daily_rate_autc_7d	decimal(16,6)	Day-to-day change rate of the 7-day average number of tables updated within 31 days.
accessed_table_count	bigint	Number of tables read within 31 days.
daily_rate_atc	decimal(16,6)	Day-to-day change rate of the number of tables read within 31 days.
avg_accessed_table_count_7d	bigint	7-day average number of tables read within 31 days.
daily_rate_aatc_7d	decimal(16,6)	Day-to-day change rate of the 7-day average number of tables read within 31 days.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Task metric details (task_metrics_detail)

Field	Type	Description
task_id	bigint	Task identifier.
workflow_id	bigint	Workflow identifier.
node_type	bigint	Task type.
project_id	bigint	Workspace identifier.
week_number	bigint	The week of the year for the data timestamp.
task_owner	string	Owner ID.
compute_resource_type	string	Computing resource type.
compute_resource_id	string	Computing resource identifier: MaxCompute project name, EMR cluster ID, Hologres instance ID, etc.
datasource_name	string	Data source name.
inst_success_count	bigint	Number of successful instances.
inst_failed_count	bigint	Number of failed instances.
inst_running_count	bigint	Number of running instances.
inst_abnormal_count	bigint	Number of abnormal instances.
inst_not_started_count	bigint	Number of instances not started.
inst_runtime_cu	double	Instance runtime CU consumption.
task_avg_cu_31d	double	Average daily CU consumption of the task (within 31 days).
dt	string	Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.

Task metric summary (task_metrics_summary)

Field	Type	Description
node_type	bigint	Node type.
inst_status	string	Instance status.
inst_count	bigint	Number of instances.
avg_inst_count_7d	double	7-day average number of instances.
granularity	string	Statistical granularity: DAILY or WEEKLY.
dt	string	Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday.