DataWorks open data provides tables and views across various dimensions to help you collect metadata. This topic describes the tables and views available in DataWorks open data and their detailed structures.
Metadata
This set of metadata tables and example metric statistics tables is generated by DataWorks from the metadata of the current tenant's tables, tasks, instances, workspaces, members, and projects. Table structures may be adjusted as the business evolves. If this topic differs from what the system interface displays, the system interface prevails.
Data asset metadata
Asset table issue details (asset_table_issues)
Partition field: dt
Description: Details of data governance issues for tables.
Field | Type | Description |
tenant_id | string | DataWorks tenant ID. |
meta_entity_id | string | ID of the corresponding metadata entity. |
uuid | string | Unique key of the table. |
meta_entity_type | string | Type of the corresponding metadata, such as maxcompute-table. |
entity_type | string | Entity type, such as table, view, or materialized_view. |
account_id | string | Alibaba Cloud account to which the asset belongs. |
datasource_type | string | Data source type, such as EMR or MaxCompute. |
datasource_id | string | Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName). |
catalog_name | string | The DLF data catalog when the metadata is from DLF. |
database_name | string | Database name (EMR dbName). |
schema_name | string | Schema name. |
rule_id | string | Governance item identifier. |
rule_name_zh | string | Chinese name of the governance item. |
rule_name_en | string | English name of the governance item. |
category | string | Dimension to which the item belongs. |
deduct_score_tenant | string | Global score deduction, accurate to four decimal places. |
deduct_score_owner | string | Individual score deduction, accurate to four decimal places. |
cost | string | Wasted resources. |
project_id | string | DataWorks project. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. |
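Because deduct_score_tenant and deduct_score_owner are stored as strings accurate to four decimal places, parsing them with decimal.Decimal instead of float avoids rounding drift when you total deductions. The following minimal Python sketch sums the global deduction per governance item (rule_id). It assumes the rows of asset_table_issues have already been fetched into dictionaries keyed by field name; the rule IDs and scores in the sample are hypothetical, and treating an empty score as zero is an assumption.

```python
from collections import defaultdict
from decimal import Decimal


def total_deduction_by_rule(rows):
    """Sum deduct_score_tenant per rule_id.

    Assumes each row is a dict keyed by asset_table_issues field names and
    that deduct_score_tenant is a decimal string such as "0.1250".
    Empty or missing scores are treated as zero (an assumption).
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        score = row.get("deduct_score_tenant") or "0"
        totals[row["rule_id"]] += Decimal(score)
    return dict(totals)


# Hypothetical sample rows; real rule IDs and scores come from the table.
sample = [
    {"rule_id": "rule_a", "deduct_score_tenant": "0.1250"},
    {"rule_id": "rule_a", "deduct_score_tenant": "0.0375"},
    {"rule_id": "rule_b", "deduct_score_tenant": "0.5000"},
]
print(total_deduction_by_rule(sample))
```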
Asset table metric details (asset_table_profiles)
Partition field: dt
Description: Details of table metrics.
Field | Type | Description |
tenant_id | bigint | Source tenant ID. |
meta_entity_id | string | ID of the corresponding metadata entity. |
meta_entity_type | string | Type of the corresponding metadata, such as maxcompute-table. |
entity_type | string | Entity type, such as table, view, or materialized_view. |
account_id | string | Alibaba Cloud account to which the asset belongs. |
datasource_type | string | Data source type, such as EMR or MaxCompute. |
datasource_id | string | Compute engine name (MaxCompute: projectName, EMR: clusterId, Hologres: databaseName). |
catalog_name | string | The DLF data catalog when the metadata is from DLF. |
database_name | string | Database name (EMR dbName). |
schema_name | string | Schema name. |
uuid | string | Unique key of the table. |
name | string | Table name. |
owner | string | Asset owner. |
last_access_timestamp | bigint | Time when the table was last accessed. |
meta_modified_timestamp | bigint | 13-digit timestamp indicating when the table metadata was modified. |
data_modified_timestamp | bigint | 13-digit timestamp indicating when the table data was modified. |
create_timestamp | bigint | Time when the table was created. |
comment | string | Table comment. |
partition_keys | string | Partition key. |
tags | string | Asset tags. |
governance_rule_finding_count | bigint | Number of governance item issues. |
governance_rule_finding_history_count | string | Historical number of asset governance items. |
governance_health_score | string | Asset governance score. |
governance_health_level | string | Asset governance score level. |
is_partitioned | bigint | Indicates whether the table is a partitioned table. |
content_size | bigint | Logical size. |
record_num | bigint | Number of records. |
life_cycle | string | Lifecycle. |
partition_count | bigint | Number of partitions. |
view_count_monthly | bigint | Number of views in the last month. |
access_count | bigint | Number of accesses. |
upstream_table_count | bigint | Number of upstream tables. |
upstream_table_detail | string | Details of upstream tables. |
downstream_table_count | bigint | Number of downstream tables. |
downstream_table_detail | string | Details of downstream tables. |
producing_project_ids | string | List of workspaces involved in table output. |
producing_tasks_count | bigint | Number of nodes involved in table output. |
producing_tasks_detail | string | Details of nodes involved in table output. |
using_tasks_count | bigint | Number of nodes that use the table. |
using_tasks_detail | string | Details of nodes that use the table. |
quality_rule_count | bigint | Number of quality rules. |
quality_monitor_count | bigint | Number of quality monitoring metrics. |
quality_rule_7_days_failed_count | bigint | Number of failed quality rules in the last 7 days. |
quality_monitor_7_days_failed_count | bigint | Number of failed quality monitoring metrics in the last 7 days. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. |
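A common use of asset_table_profiles is finding tables that have not been accessed for a long time. The following minimal Python sketch converts 13-digit (millisecond) epoch timestamps to UTC datetimes and lists stale tables. It assumes that last_access_timestamp uses the same millisecond unit as meta_modified_timestamp and data_modified_timestamp, which this table does not state explicitly; the 90-day threshold, the sample table names, and treating a missing value as "never accessed" are all illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def ms_to_datetime(ms):
    """Convert a 13-digit (millisecond) epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def stale_tables(rows, now, idle_days=90):
    """Return names of tables not accessed within idle_days before `now`.

    Assumes last_access_timestamp is in milliseconds and treats a missing
    value as never accessed.
    """
    cutoff = now - timedelta(days=idle_days)
    result = []
    for row in rows:
        ts = row.get("last_access_timestamp")
        if ts is None or ms_to_datetime(ts) < cutoff:
            result.append(row["name"])
    return result


# Hypothetical rows: 2024-01-01 and 2024-05-27 (UTC) access times, plus one never accessed.
sample = [
    {"name": "ods_orders", "last_access_timestamp": 1704067200000},
    {"name": "dws_daily", "last_access_timestamp": 1716768000000},
    {"name": "tmp_x", "last_access_timestamp": None},
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(stale_tables(sample, now))
```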
Asset task issue details (asset_task_issues)
Partition field: dt
Description: Details of data governance issues for tasks.
Field | Type | Description |
tenant_id | string | DataWorks tenant ID. |
node_id | string | Scheduling node ID. |
node_name | string | Node name. |
node_type | string | Task type: SQL, SQLCost, LOT, or CUPID. |
node_owner | string | Base ID of the owner. |
priority | string | Priority. |
rule_id | string | Governance item identifier. |
rule_name_zh | string | Chinese name of the governance item. |
rule_name_en | string | English name of the governance item. |
category | string | Governance domain to which the item belongs. |
deduct_score_tenant | string | Global score deduction, accurate to four decimal places. |
deduct_score_owner | string | Individual score deduction, accurate to four decimal places. |
cost | string | Governance benefits. |
project_id | string | DataWorks project ID. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. |
Asset task metric details (asset_task_profiles)
Partition field: dt
Description: Details of task metrics.
Field | Type | Description |
tenant_id | bigint | Source tenant ID. |
data_asset_id | string | Asset ID within the module, corresponding to task.id. |
name | string | Asset name, corresponding to task.name. |
project_id | bigint | Workspace where the asset is located. |
project_env | string | Environment. PROD: production. DEV: development. |
owner | string | Asset owner. |
create_user | string | Creator. |
create_time | bigint | Creation time. |
modify_user | string | Modifier. |
modify_time | bigint | Modification time. |
trigger_type | string | Trigger method type. Scheduler: triggered by a scheduling cycle. Manual: triggered manually. |
trigger_recurrence_type | string | Recurrence type. Normal: runs normally. Manual: manual task. Pause: paused. Skip: dry run. |
trigger_cron | string | Cron expression. |
type | bigint | Execution code type. For more information, see the node type codes in Node development. |
script_parameters | string | Parameter information. |
priority | bigint | Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1. |
trigger_start_time | bigint | Start date for scheduling. |
trigger_end_time | bigint | End date for scheduling. |
runtime_resource_group_id | bigint | ID of the resource group to which the node belongs. |
runtime_cu | string | Computing CUs. |
baseline_id | bigint | ID of the baseline to which the node belongs. |
rerun_times | bigint | Number of times the task can be rerun. |
rerun_interval | bigint | Rerun interval in milliseconds. |
rerun_mode_type | string | AllAllowed: Rerun is allowed on both failure and success. FailureAllowed: Rerun is allowed only on failure. AllDenied: Rerun is not allowed on failure or success. |
tags | string | Asset tags. |
tags_count | bigint | Number of asset tags. |
input_table_count | bigint | Number of input tables. |
output_table_count | bigint | Number of output tables. |
input_table_detail | string | Details of input tables. |
output_table_detail | string | Details of output tables. |
upstream_node_count | bigint | Number of upstream nodes. |
downstream_node_count | bigint | Number of downstream nodes. |
governance_rule_finding_count | bigint | Number of governance item issues. |
governance_rule_finding_history_count | string | Historical number of asset governance items. |
governance_health_score | string | Asset score. |
governance_health_level | string | Asset score level. |
engine_datasource_id | string | Compute engine ID. |
engine_instance_count | bigint | Number of compute engine jobs. |
engine_instance_run_time | bigint | Runtime of compute engine jobs. |
engine_instance_comput_volume_cost | string | Computation volume. |
engine_instance_cu_cost | string | Computing CUs. |
engine_instance_cpu_cost | string | CPU consumption. |
engine_instance_mem_cost | string | Memory consumption. |
engine_instance_exist_data_skew | bigint | Indicates whether data skew exists. |
engine_instance_suggestions | string | Suggestions for data skew. |
engine_instance_data_skew_ids | string | Job IDs with data skew. |
engine_instance_ids | string | Job IDs. |
task_instance_wait_time_cost_sum | bigint | Total wait time. |
task_instance_wait_time_cost_max | bigint | Maximum instance wait time. |
task_instance_run_time_cost_sum | bigint | Total runtime. |
task_instance_run_time_cost_max | bigint | Maximum runtime. |
task_instance_7_days_wait_time_cost_max | bigint | Maximum instance wait time in the last 7 days. |
task_instance_7_days_run_time_cost_max | bigint | Maximum instance runtime in the last 7 days. |
task_instance_count | bigint | Number of instances. |
task_instance_7_days_failed_count | bigint | Number of failed instances in the last 7 days. |
task_instance_7_days_failed_day_count | bigint | Number of days with failures in the last 7 days. |
task_instance_7_days_frezeed_day_count | bigint | Number of days on which the task was frozen in the last 7 days. |
task_instance_7_days_dry_run_day_count | bigint | Number of days with dry runs in the last 7 days. |
quality_monitor_count | bigint | Number of Data Quality monitoring metrics. |
quality_monitor_7_days_failed_count | bigint | Number of failed Data Quality monitoring metrics. |
di_task_resource_group_id | string | ID of the data integration resource group to which the node belongs. |
di_task_is_public_network | bigint | Indicates whether the data integration task uses Internet traffic. |
di_task_concurrency | bigint | Concurrency. |
di_task_total_records | bigint | Number of synchronized records. |
di_task_total_bytes | bigint | Volume of synchronized data. |
di_task_source_type | string | Source type. |
di_task_target_type | string | Target type. |
di_task_run_time_cost | bigint | Execution duration of the data integration task. |
di_task_wait_time_cost | bigint | Wait time of the data integration task. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. |
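The rerun_mode_type field in this table uses string values, while the rerun_mode field of the tasks table uses integer codes (0: rerun only on failure, 1: rerun on both failure and success, 2: no rerun). The following minimal Python sketch normalizes both representations and answers whether a task may be rerun after a given outcome. The mapping between the integer codes and the string names is inferred from the descriptions in this topic and is an assumption, not a documented conversion.

```python
# Rerun policies as described for asset_task_profiles.rerun_mode_type.
RERUN_RULES = {
    "AllAllowed": {"success": True, "failure": True},
    "FailureAllowed": {"success": False, "failure": True},
    "AllDenied": {"success": False, "failure": False},
}

# Assumed correspondence between tasks.rerun_mode codes and the names above,
# inferred from the field descriptions in this topic.
RERUN_MODE_CODES = {0: "FailureAllowed", 1: "AllAllowed", 2: "AllDenied"}


def can_rerun(mode, succeeded):
    """Return True if a task with this rerun mode may be rerun.

    `mode` may be a rerun_mode_type string or a tasks.rerun_mode integer.
    """
    if isinstance(mode, int):
        mode = RERUN_MODE_CODES[mode]
    outcome = "success" if succeeded else "failure"
    return RERUN_RULES[mode][outcome]


print(can_rerun("FailureAllowed", succeeded=False))
print(can_rerun(2, succeeded=False))
```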
Data Quality
Data quality rule instances (quality_rule_results)
Partition field: dt
Description: Data quality rule instances.
Field | Type | Description |
id | bigint | Primary key ID. |
scan_run_id | bigint | Quality monitoring instance ID. |
rule_id | bigint | Rule ID. |
rule_name | string | Rule name. |
status | string | Rule check result: Pass, Error, Warn, Fail, or Running. |
severity | string | Rule severity: High or Normal. |
create_time | bigint | Creation time. |
modify_time | bigint | Time of the last modification. |
spec | string | Rule instance spec. |
tags | array<string> | Rule instance tags. |
tenant_id | bigint | DataWorks tenant ID. |
project_id | bigint | DataWorks project ID. |
meta_entity_id | string | Unique identifier of the Data Map table entity. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday. |
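Each row of quality_rule_results is one rule check within a monitoring instance (scan_run_id). The following minimal Python sketch tallies the check results of one instance and lists the High-severity rules whose check ended in Error or Fail. Treating both Error and Fail as failing outcomes is an assumption; the rule names and IDs in the sample are hypothetical.

```python
from collections import Counter

FAILING_STATUSES = {"Error", "Fail"}  # assumption: both count as failing


def summarize_run(rows, scan_run_id):
    """Tally rule check results for one quality monitoring instance and
    list the High-severity rules whose check did not pass."""
    run_rows = [r for r in rows if r["scan_run_id"] == scan_run_id]
    counts = Counter(r["status"] for r in run_rows)
    high_failures = sorted(
        r["rule_name"]
        for r in run_rows
        if r["severity"] == "High" and r["status"] in FAILING_STATUSES
    )
    return counts, high_failures


# Hypothetical rows from quality_rule_results.
sample = [
    {"scan_run_id": 101, "rule_name": "row_count_not_zero", "severity": "High", "status": "Pass"},
    {"scan_run_id": 101, "rule_name": "id_unique", "severity": "High", "status": "Fail"},
    {"scan_run_id": 101, "rule_name": "amount_range", "severity": "Normal", "status": "Warn"},
    {"scan_run_id": 102, "rule_name": "id_unique", "severity": "High", "status": "Error"},
]
counts, high_failures = summarize_run(sample, 101)
print(counts, high_failures)
```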
Data quality rule metric details (quality_rules)
Partition field: dt
Description: Details of data quality rule metrics.
Field | Type | Description |
id | bigint | Primary key ID. |
scan_id | bigint | Quality monitoring ID. |
rule_name | string | Rule name. |
enabled | boolean | Indicates whether the rule is enabled. |
severity | string | Business severity level of the rule. Enumeration values: High, Normal. |
create_time | bigint | Creation time. |
modify_time | bigint | Time of the last modification. |
spec | string | Rule spec. |
tags | array<string> | Rule tags. |
tenant_id | bigint | DataWorks tenant ID. |
project_id | bigint | DataWorks project ID. |
meta_entity_id | string | Unique identifier of the Data Map entity. |
pass_count | int | Number of times the rule check passed. |
warn_count | int | Number of times the rule check triggered the warning threshold. |
error_count | int | Number of times the rule check triggered the error threshold. |
fail_count | int | Number of times the rule check failed. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday. |
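The four counters in quality_rules can be combined into a pass rate per rule. The following minimal Python sketch assumes that pass_count, warn_count, error_count, and fail_count are mutually exclusive outcomes of the same set of checks, which this table does not state explicitly; it returns None when a rule has no recorded checks. The sample counts are hypothetical.

```python
def pass_rate(row):
    """Share of rule checks that passed, from the quality_rules counters.

    Assumes the four counters are mutually exclusive outcomes; returns
    None when the rule has no recorded checks.
    """
    total = sum(row[k] for k in ("pass_count", "warn_count", "error_count", "fail_count"))
    if total == 0:
        return None
    return row["pass_count"] / total


# Hypothetical quality_rules row.
sample = {"pass_count": 27, "warn_count": 2, "error_count": 0, "fail_count": 1}
print(pass_rate(sample))
```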
Data quality monitoring task instances (quality_scan_runs)
Partition field: dt
Description: Data quality monitoring task instances.
Field | Type | Description |
id | bigint | Primary key ID. |
scan_id | bigint | Quality monitoring ID. |
name | string | Monitoring name. |
status | string | Monitoring instance status: Pass, Warn, Error, Fail, or Running. |
post_action_type | string | Action taken after the monitoring check. Enumeration values: Alert or BlockTaskInstance. |
data_filter | string | The data scope actually used for sampling. |
trigger_time | bigint | The scheduled time used by the task. |
trigger_type | string | Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode. |
create_time | bigint | Creation time. |
modify_time | bigint | Time of the last update. |
datasource_id | bigint | ID of the data source to which the table belongs. |
datasource_type | string | Data source type. |
computing_resource_id | bigint | Compute engine ID. |
compute_resource_option | string | Computing resource used for running Data Quality monitoring. |
spec | string | Quality monitoring spec. |
tenant_id | bigint | DataWorks tenant ID. |
project_id | bigint | DataWorks project ID. |
owner | string | Owner of the quality monitoring task. |
task_id | bigint | Scheduling task ID. |
task_instance_id | bigint | Scheduling task instance ID. |
meta_entity_id | string | Unique identifier of the Data Map entity. |
table_name | string | Table name. |
catalog_name | string | Name of the data catalog to which the table belongs. |
schema_name | string | Name of the schema to which the table belongs. |
database_name | string | Name of the database to which the table belongs. |
cluster_id | string | ID of the cluster to which the table belongs. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday. |
Data quality monitoring task metric details (quality_scans)
Partition field: dt
Description: Details of data quality monitoring task metrics.
Field | Type | Description |
id | bigint | Primary key ID. |
name | string | Monitoring name. |
data_filter_type | string | Data scope type: ByPartition or ByWhere. |
data_filter | string | Data scope expression. |
trigger_type | string | Trigger method for Data Quality monitoring: ByManual, BySchedule, or ByQualityNode. |
create_time | bigint | Creation time. |
modify_time | bigint | Time of the last update. |
computing_resource_id | bigint | Compute engine ID. |
compute_resource_option | string | Computing resource used for running Data Quality monitoring. |
spec | string | Data Quality monitoring spec. |
related_tasks | array<bigint> | Scheduling tasks associated with the monitoring task. |
tenant_id | bigint | DataWorks tenant ID. |
project_id | bigint | DataWorks project ID. |
owner | string | Owner of the quality monitoring task. |
datasource_id | string | ID of the data source to which the table belongs. |
datasource_type | string | Data source type. |
meta_entity_id | string | Unique identifier of the Data Map entity. |
table_name | string | Table name. |
catalog_name | string | Name of the data catalog to which the table belongs. |
schema_name | string | Name of the schema to which the table belongs. |
database_name | string | Name of the database to which the table belongs. |
cluster_id | string | ID of the cluster to which the table belongs. |
related_scheduler_task_count | int | Number of associated scheduling tasks. |
rule_count | int | Number of associated rules. |
high_severity_rule_count | int | Number of associated strong rules. |
normal_severity_rule_count | int | Number of associated weak rules. |
enabled_rule_count | int | Number of enabled rules. |
enabled_high_severity_rule_count | int | Number of enabled strong rules. |
enabled_normal_severity_rule_count | int | Number of enabled weak rules. |
rule_instance_count | int | Number of rule instances for today. |
high_severity_rule_instance_count | int | Number of strong rule instances for today. |
normal_severity_rule_instance_count | int | Number of weak rule instances for today. |
high_severity_rule_instance_pass_count | int | Number of passed strong rule instances for today. |
high_severity_rule_instance_warn_count | int | Number of strong rule instances with warnings for today. |
high_severity_rule_instance_error_count | int | Number of strong rule instances with errors for today. |
high_severity_rule_instance_fail_count | int | Number of failed strong rule instances for today. |
normal_severity_rule_instance_pass_count | int | Number of passed weak rule instances for today. |
normal_severity_rule_instance_warn_count | int | Number of weak rule instances with warnings for today. |
normal_severity_rule_instance_error_count | int | Number of weak rule instances with errors for today. |
normal_severity_rule_instance_fail_count | int | Number of failed weak rule instances for today. |
block_task_instance_count | int | Number of blocked scheduling tasks for today. |
alert_rule_count | int | Number of configured alert subscriptions. |
sms_alert_rule_count | int | Number of configured text message alert subscriptions. |
mail_alert_rule_count | int | Number of configured email alert subscriptions. |
phone_alert_rule_count | int | Number of configured phone call alert subscriptions. |
ding_alert_rule_count | int | Number of configured DingTalk alert subscriptions. |
feishu_alert_rule_count | int | Number of configured Lark alert subscriptions. |
weixin_alert_rule_count | int | Number of configured WeChat alert subscriptions. |
webhook_alert_rule_count | int | Number of configured custom webhook alert subscriptions. |
alert_times | int | Number of alerts triggered today. |
sms_alert_times | int | Number of text message alerts triggered today. |
mail_alert_times | int | Number of email alerts triggered today. |
phone_alert_times | int | Number of phone call alerts triggered today. |
ding_alert_times | int | Number of DingTalk alerts triggered today. |
feishu_alert_times | int | Number of Lark alerts triggered today. |
weixin_alert_times | int | Number of WeChat alerts triggered today. |
webhook_alert_times | int | Number of custom webhook alerts triggered today. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from D days ago to yesterday. |
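The per-channel alert fields in quality_scans follow the pattern <channel>_alert_times. The following minimal Python sketch collects the nonzero per-channel counts from one row. It deliberately does not assume that the per-channel values add up to alert_times, because this topic does not say whether one alert can be sent through several channels. The sample row is hypothetical.

```python
CHANNELS = ("sms", "mail", "phone", "ding", "feishu", "weixin", "webhook")


def alerts_by_channel(row):
    """Map each alert channel to its <channel>_alert_times value, skipping zero or missing values."""
    return {c: row[f"{c}_alert_times"] for c in CHANNELS if row.get(f"{c}_alert_times")}


# Hypothetical quality_scans row (fields not shown are treated as missing).
sample = {"alert_times": 5, "sms_alert_times": 0, "mail_alert_times": 2, "ding_alert_times": 3}
print(alerts_by_channel(sample))
```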
Data quality table metric details (table_quality_summary)
Partition field: dt
Description: Details of data quality table metrics.
Field | Type | Description |
meta_entity_id | string | Unique identifier of the Data Map table entity. |
project_id | bigint | DataWorks project ID. |
table_name | string | Table name. |
schema_name | string | Name of the schema to which the table belongs. |
database_name | string | Name of the database to which the table belongs. |
catalog_name | string | Name of the data catalog to which the table belongs. |
datasource_id | bigint | ID of the data source to which the table belongs. This is NULL if Data Quality is not configured. |
tenant_id | bigint | DataWorks tenant ID. |
owner | string | Table owner. |
scan_count | int | Number of configured quality monitoring tasks. |
scheduler_related_scan_count | int | Number of quality monitoring tasks associated with scheduling. |
scan_run_count | int | Number of quality monitoring task instances for today. |
alert_scan_run_count | int | Number of quality monitoring task instances that triggered alerts today. |
block_task_instance_scan_run_count | int | Number of quality monitoring task instances that blocked scheduling tasks today. |
rule_count | int | Number of configured rules. |
enabled_rule_count | int | Number of enabled rules. |
high_severity_rule_count | int | Number of configured strong rules. |
normal_severity_rule_count | int | Number of configured weak rules. |
rule_instance_count | int | Number of rule instances for today. |
high_severity_rule_instance_count | int | Number of strong rule instances for today. |
normal_severity_rule_instance_count | int | Number of weak rule instances for today. |
high_severity_rule_instance_pass_count | int | Number of times strong rule checks passed today. |
high_severity_rule_instance_warn_count | int | Number of times strong rule checks resulted in warnings today. |
high_severity_rule_instance_error_count | int | Number of times strong rule checks resulted in errors today. |
high_severity_rule_instance_fail_count | int | Number of times strong rule checks failed today. |
normal_severity_rule_instance_pass_count | int | Number of times weak rule checks passed today. |
normal_severity_rule_instance_warn_count | int | Number of times weak rule checks resulted in warnings today. |
normal_severity_rule_instance_error_count | int | Number of times weak rule checks resulted in errors today. |
normal_severity_rule_instance_fail_count | int | Number of times weak rule checks failed today. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
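Several tables in this topic, such as table_quality_summary, keep dt partitions from 31 days ago to yesterday, so the most recent complete partition is yesterday's. The following minimal Python sketch lists the YYYYMMDD values in that window. It assumes the window is inclusive at both ends and is counted from the calendar date on which you run the query; the actual retention may change along with the table structures.

```python
from datetime import date, timedelta


def valid_dt_values(today, days_back=31):
    """Return YYYYMMDD partition values from `days_back` days ago to yesterday, oldest first."""
    return [
        (today - timedelta(days=n)).strftime("%Y%m%d")
        for n in range(days_back, 0, -1)
    ]


values = valid_dt_values(date(2024, 3, 1))
print(values[0], values[-1], len(values))
```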
Data catalogs (catalogs)
Field | Type | Description |
datasource_type | string | Data source type, such as dlf or starrocks. |
datasource_id | string | Data source identifier, such as a StarRocks cluster ID or the ID of the Alibaba Cloud account to which DLF belongs. |
name | string | Data catalog name. |
type | string | Data catalog type, such as Hive or Jdbc. |
comment | string | Data catalog comment. |
location | string | Directory path. |
properties | string | Properties and parameters (JSON string). |
owner | string | Owner of the data catalog. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type. |
create_timestamp | bigint | 13-digit creation timestamp. |
update_timestamp | bigint | 13-digit modification timestamp. |
meta_entity_id | string | Unique identifier of the data catalog (API-friendly and compliant with metadata entity ID specifications). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
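The properties field of catalogs, and of databases, schemas, tables, columns, and partitions, is a JSON string. The following minimal Python sketch decodes it defensively. Returning an empty dict for NULL, empty, non-object, or malformed values is an assumption made for robustness; for example, the tables row notes that properties holds the view definition DDL for views, which is not JSON. The sample keys are hypothetical.

```python
import json


def parse_properties(raw):
    """Decode a properties JSON string into a dict.

    Returns an empty dict for NULL/empty values, malformed JSON, or JSON
    that is not an object (a defensive assumption, not documented behavior).
    """
    if not raw:
        return {}
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


print(parse_properties('{"hypothetical.key": "value"}'))
```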
Databases (databases)
Field | Type | Description |
datasource_type | string | Data source type. |
datasource_id | string | Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID. |
catalog_name | string | Data catalog name. This field has a value if the data source type supports data catalogs. |
name | string | Database name. |
type | string | Database type. |
comment | string | Database comment. |
location | string | Database path. |
properties | string | Properties and parameters (JSON string). |
owner | string | Database owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type. |
is_external | boolean | Indicates whether the database is an external database. |
create_timestamp | bigint | 13-digit creation timestamp. |
update_timestamp | bigint | 13-digit modification timestamp. |
meta_entity_id | string | Unique identifier of the database (API-friendly and compliant with metadata entity ID specifications). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
Schemas (schemas)
Field | Type | Description |
datasource_type | string | Data source type, such as holodb, maxcompute, or postgresql. |
datasource_id | string | Data source identifier, such as an RDS instance ID or the ID of the Alibaba Cloud account to which MaxCompute belongs. |
catalog_name | string | Data catalog name. This field has a value if the data source type supports data catalogs. |
database_name | string | Database name. |
name | string | Schema name. |
type | string | Schema type. |
comment | string | Comment. |
properties | string | Properties and parameters (JSON string). |
owner | string | Schema owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type. |
create_timestamp | bigint | 13-digit creation timestamp. |
update_timestamp | bigint | 13-digit modification timestamp. |
meta_entity_id | string | Unique identifier of the schema (API-friendly and compliant with metadata entity ID specifications). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
Tables (tables)
Field | Type | Description |
datasource_type | string | Data source type, such as dlf, starrocks, maxcompute, holodb, or mysql. |
datasource_id | string | Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID. |
catalog_name | string | Data catalog name. This field has a value if the data source type supports data catalogs. |
database_name | string | Database name. |
schema_name | string | Schema name. This field has a value if the data source type supports schemas. |
name | string | Table name. |
type | string | Table type. |
comment | string | Comment. |
partition_keys | string | Partition key. For multi-level partitions, fields are separated by commas (,). |
location | string | Table storage path. |
properties | string | Properties and parameters (JSON string). For views, this is the view definition DDL. |
owner | string | Table owner. The value can be an Alibaba Cloud account UID or a database system account, depending on the data source type. |
content_size | bigint | Storage size in bytes. |
data_retention | map<string,string> | Data retention period or lifecycle. The value varies based on the table type. For MaxCompute tables, the key is `lifecycle` and the value is the table's lifecycle, such as 365. For DLF tables, the key is `retention` and the value is the table's lifecycle, such as 91. This is not yet supported for other types. This document will be updated if support is added. |
is_compressed | boolean | Indicates whether the data is compressed. |
is_temporary | boolean | Indicates whether the table is a temporary table. |
entity_type | string | Entity type, such as table, view, or materialized_view. |
input_format | string | Input format. |
output_format | string | Output format. |
serde_parameters | string | SerDe parameters. |
serialization_lib | string | Serialization library. |
create_timestamp | bigint | 13-digit table creation timestamp. |
meta_modified_timestamp | bigint | 13-digit timestamp indicating when the table metadata was modified. |
data_modified_timestamp | bigint | 13-digit timestamp indicating when the table data was modified. |
last_access_timestamp | bigint | 13-digit timestamp indicating when the table was last accessed. |
business_description | string | Business description or Chinese name. |
meta_entity_id | string | Unique identifier of the table (API-friendly and compliant with metadata entity ID specifications). Examples: |
uuid | string | Table UUID, used to link to the DataWorks Data Map table details page. |
business_tags | array<string> | Business tags. Tags set on the Data Map page are recorded in this field. |
wikis | array<struct<`version`:bigint,`operator`:string,`update_timestamp`:bigint,`content`:string>> | Instructions for using the table (version: version number; operator: committer; update_timestamp: 13-digit update timestamp; content: content). |
producing_tasks | array<bigint> | List of scheduling task IDs that produce data for the table. For more information, see the tasks table. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
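Two fields of the tables table need light parsing: data_retention is a map whose key depends on the table type, and partition_keys lists multi-level partition fields separated by commas. The following minimal Python sketch handles both. It assumes the map arrives as a Python dict with string values and that the datasource_type values are the lowercase "maxcompute" and "dlf" shown in this topic; other types return None because retention is not yet supported for them.

```python
def retention_days(datasource_type, data_retention):
    """Read the lifecycle/retention value from the data_retention map.

    Per this topic, MaxCompute tables use the key "lifecycle" and DLF
    tables use "retention"; other types return None.
    """
    key = {"maxcompute": "lifecycle", "dlf": "retention"}.get(datasource_type)
    if key is None or not data_retention:
        return None
    value = data_retention.get(key)
    return int(value) if value is not None else None


def split_partition_keys(partition_keys):
    """Split a comma-separated partition_keys value into a list of key names."""
    if not partition_keys:
        return []
    return [k.strip() for k in partition_keys.split(",") if k.strip()]


print(retention_days("maxcompute", {"lifecycle": "365"}))
print(split_partition_keys("ds,hr"))
```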
Fields (columns)
Field | Type | Description |
datasource_type | string | Data source type, such as dlf or starrocks. |
datasource_id | string | Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID. |
catalog_name | string | Data catalog name. This field has a value if the data source type supports data catalogs. |
database_name | string | Database name. |
schema_name | string | Schema name. This field has a value if the data source type supports schemas. |
table_name | string | Table name. |
name | string | Field name. |
type | string | Field type. |
comment | string | Comment. |
ordinal_position | bigint | Ordinal position of the field, starting from 1. |
is_primary_key | boolean | Indicates whether the field is a primary key. |
is_nullable | boolean | Indicates whether the field can be NULL. |
is_partition_key | boolean | Indicates whether the field is a partition key. |
properties | string | Properties and parameters (JSON string). |
business_description | string | Business description. |
meta_entity_id | string | Unique identifier of the field (API-friendly and compliant with metadata entity ID specifications). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
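Because the columns table has one row per field, rebuilding a table's column list means grouping rows by table and sorting by ordinal_position, which starts from 1. The following minimal Python sketch groups on the datasource, catalog, database, schema, and table names; using that tuple as the table key is an assumption, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumed composite key that identifies the table a column belongs to.
KEY_FIELDS = ("datasource_id", "catalog_name", "database_name", "schema_name", "table_name")


def columns_by_table(rows):
    """Group column rows by table and return field names ordered by ordinal_position."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[tuple(row.get(f) for f in KEY_FIELDS)].append(row)
    return {
        key: [r["name"] for r in sorted(cols, key=lambda r: r["ordinal_position"])]
        for key, cols in grouped.items()
    }


# Hypothetical rows for one MaxCompute table, deliberately out of order.
base = {"datasource_id": "my_project", "catalog_name": None, "database_name": "my_project",
        "schema_name": "default", "table_name": "orders"}
sample = [
    {**base, "name": "ds", "ordinal_position": 3},
    {**base, "name": "id", "ordinal_position": 1},
    {**base, "name": "amount", "ordinal_position": 2},
]
print(columns_by_table(sample))
```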
Partitions (partitions)
Field | Type | Description |
datasource_type | string | Data source type, such as maxcompute, dlf, or starrocks. |
datasource_id | string | Data source identifier, such as a StarRocks cluster ID, the ID of the Alibaba Cloud account to which DLF or MaxCompute belongs, or an RDS instance ID. |
catalog_name | string | Data catalog name. This field has a value if the data source type supports data catalogs. |
database_name | string | Database name. |
schema_name | string | Schema name. This field has a value if the data source type supports schemas. |
table_name | string | Table name. |
name | string | Partition name (partition specification). |
create_timestamp | bigint | 13-digit creation timestamp. |
update_timestamp | bigint | 13-digit modification timestamp. |
content_size | bigint | Partition size in bytes. |
properties | string | Properties and parameters (JSON string). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
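Since content_size in the partitions table is reported in bytes, summing it per table gives an approximate storage footprint. The following minimal Python sketch totals partition sizes and converts them to GiB (1 GiB = 1024^3 bytes). Keying on database_name and table_name alone is a simplification that assumes names are unique within the rows you pass in; treating a missing size as zero is also an assumption. The sample rows are hypothetical.

```python
from collections import defaultdict


def table_sizes_gib(rows):
    """Sum partition content_size (bytes) per (database_name, table_name) and convert to GiB."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["database_name"], row["table_name"])] += row.get("content_size") or 0
    return {key: size / 1024**3 for key, size in totals.items()}


# Hypothetical partition rows.
sample = [
    {"database_name": "db1", "table_name": "orders", "content_size": 1073741824},
    {"database_name": "db1", "table_name": "orders", "content_size": 1073741824},
    {"database_name": "db1", "table_name": "users", "content_size": 536870912},
]
print(table_sizes_gib(sample))
```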
Table-level and column-level lineage (lineages)
Field | Type | Description |
source_meta_entity_id | string | Unique identifier of the source (API-friendly and compliant with metadata entity ID specifications). |
source_raw_entity_type | string | Source entity type. If the identified metadata is not managed, source_meta_entity_id is empty and source_raw_entity_type is used as the identifier. |
source_uuid | string | Unique identifier of the source (page-access-friendly). |
target_meta_entity_id | string | Unique identifier of the target (API-friendly and compliant with metadata entity ID specifications). |
target_raw_entity_type | string | Target entity type. If the identified metadata is not managed, target_meta_entity_id is empty and target_raw_entity_type is used as the identifier. |
target_uuid | string | Unique identifier of the target (page-access-friendly). |
compute_engine | string | Compute engine, such as maxcompute, datax, or hologres. |
transform_type | string | Transform task type in the engine, such as SQL, DATAX, DATAX_STREAM, EXTERNAL_TABLE_MAPPING, STORAGE_MAPPING, or API_MAPPING. |
task_id | bigint | DataWorks scheduling task ID. For more information, see the tasks table. This field is empty for lineage data not triggered by DataWorks scheduling. |
task_instance_id | bigint | DataWorks scheduling task instance ID. For more information, see the task_instances table. This field is empty for lineage data not triggered by DataWorks scheduling. |
lineage_time | bigint | Time when the lineage was generated, in milliseconds. |
granularity | string | Lineage level, such as TABLE or COLUMN. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
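Each lineages row is one edge from a source entity to a target entity, at TABLE or COLUMN granularity. A minimal sketch, assuming query results are loaded as Python dicts keyed by column name, that indexes table-level edges and walks them breadth-first to collect every transitive upstream of an entity. The function names and sample IDs are hypothetical, and rows whose source or target ID is empty are skipped.

```python
from collections import defaultdict, deque

def build_upstream_index(rows, granularity="TABLE"):
    """Map each target entity ID to the set of its direct source entity IDs."""
    upstream = defaultdict(set)
    for row in rows:
        if row["granularity"] != granularity:
            continue  # keep only edges of the requested lineage level
        src, tgt = row["source_meta_entity_id"], row["target_meta_entity_id"]
        if src and tgt:
            upstream[tgt].add(src)
    return upstream

def all_upstream(entity_id, upstream):
    """Breadth-first walk returning every transitive upstream of entity_id."""
    seen = set()
    queue = deque([entity_id])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # the seen set also protects against lineage cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```

Swapping the roles of source and target in the index gives the equivalent downstream walk.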
Task and workflow definitions (tasks)
Field | Type | Description |
id | bigint | Task ID. |
name | string | Task name. |
description | string | Task description. |
type | bigint | Task type. For more information, see the node type codes in Node development. |
workflow_id | bigint | Workflow ID. |
instance_mode | string | Instance generation mode. |
baseline_id | bigint | Baseline ID. |
priority | bigint | Task priority. The value ranges from 1 to 8. A larger value indicates a higher priority. The default priority is 1. |
timeout | bigint | Task execution timeout period in hours. |
rerun_mode | bigint | Rerun configuration. 0: Rerun is allowed only on failure. 1: Rerun is allowed on both failure and success. 2: Rerun is not allowed on failure or success. |
rerun_times | bigint | Number of retries. This takes effect when the task is configured to be rerunnable. |
rerun_interval | bigint | Retry interval in seconds. |
script_parameters | string | List of runtime script parameters. |
trigger_type | string | Trigger method type (Scheduler: triggered by a scheduling cycle; Manual: triggered manually). |
trigger_recurrence | bigint | Running mode when triggered. 0: Normal. 1: Manual task. 2: Paused. 3: Dry-run. 4: Referenced task. |
trigger_cron | string | Cron expression. This takes effect when trigger_type is set to Scheduler. |
trigger_start_time | string | Effective time for periodic triggering. This takes effect when trigger_type is set to Scheduler. |
trigger_end_time | string | Expiration time for periodic triggering. This takes effect when trigger_type is set to Scheduler. |
runtime_resource_group_id | bigint | ID of the resource group for running the task. |
runtime_image | string | ID of the runtime image configured for the task. |
runtime_cu | string | CU consumption configured for the task. |
datasource_name | string | Data source name. |
inputs_variables | array<struct<`name`:string,`type`:string,`value`:string>> | List of input variables. |
outputs | array<struct<`output`:string,`type`:string>> | List of task output identifiers. |
outputs_variables | array<struct<`name`:string,`type`:string,`value`:string>> | List of output variables. |
dependencies | array<struct<`type`:string,`upstream_output`:string,`upstream_node_id`:bigint>> | List of dependency information. |
related_workflow_id | bigint | Associated workflow ID. |
tags | array<struct<`key`:string,`value`:string>> | List of task tags. |
project_id | bigint | Project ID. For more information, see the workspace_id field in the workspaces table. |
project_env | string | Environment type (PROD: production; DEV: development). |
owner | string | Account ID of the task owner. For more information, see the users table. |
create_time | string | Creation time. |
modify_time | string | Modification time. |
create_user | string | Account ID of the creator. For more information, see the users table. |
modify_user | string | Account ID of the modifier. For more information, see the users table. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
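Several tasks fields only take effect under other settings: rerun_times applies only when the task is rerunnable, and the trigger_cron setting applies only when trigger_type is Scheduler. The sketch below decodes one tasks row under those rules. It assumes rows are Python dicts keyed by column name; the code maps are copied from the descriptions above, and the function name and sample values are hypothetical.

```python
# rerun_mode -> (rerun allowed on failure, rerun allowed on success)
RERUN_MODES = {0: (True, False), 1: (True, True), 2: (False, False)}

TRIGGER_RECURRENCE = {
    0: "Normal", 1: "Manual task", 2: "Paused", 3: "Dry-run", 4: "Referenced task",
}

def describe_task(row: dict) -> dict:
    """Summarize the effective scheduling settings of one tasks row."""
    on_failure, on_success = RERUN_MODES[row["rerun_mode"]]
    rerunnable = on_failure or on_success
    scheduled = row["trigger_type"] == "Scheduler"
    return {
        "recurrence": TRIGGER_RECURRENCE.get(row["trigger_recurrence"], "Unknown"),
        "rerun_on_failure": on_failure,
        "rerun_on_success": on_success,
        # rerun_times only takes effect when the task is configured to be rerunnable.
        "effective_rerun_times": row["rerun_times"] if rerunnable else 0,
        # trigger_cron only takes effect for Scheduler-triggered tasks.
        "cron": row["trigger_cron"] if scheduled else None,
    }
```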
Task and workflow instances (task_instances)
Field | Type | Description |
id | bigint | Task instance ID. |
node_id | bigint | Task ID. For more information, see the tasks table. |
node_type | bigint | Task type. For more information, see the node type codes in Node development. |
node_name | string | Task name. |
description | string | Task description. |
workflow_id | bigint | Workflow ID. For more information, see the tasks table. |
workflow_name | string | Workflow name. |
workflow_instance_id | bigint | Workflow instance ID. |
workflow_instance_type | bigint | Workflow instance type. 0: Daily scheduling. 1: Manual task. 2: Smoke testing. 3: Data backfill. 4: One-time workflow. 5: Manual workflow. |
trigger_type | string | Trigger method type (Scheduler/Manual). |
trigger_recurrence | string | Running mode. 0: Normal. 1: Manual. 2: Paused. 3: Dry-run. 4: Referenced. |
timeout | bigint | Task execution timeout period in hours. |
rerun_mode | string | Rerun configuration. 0: Rerun on failure only. 1: Rerun on success or failure. 2: Rerun is not allowed. |
run_number | bigint | Number of runs. |
period_number | bigint | Ordinal number of the scheduling cycle. |
baseline_id | bigint | Baseline ID. |
priority | bigint | Task priority (1-8). |
script_parameters | string | List of runtime script parameters. |
runtime_resource_group_id | bigint | ID of the resource group for running the task. |
runtime_resource_group_identifier | string | Identifier of the resource group for running the task. |
runtime_image | string | Runtime image ID. |
runtime_cu | string | Runtime CU consumption. |
runtime_process_id | string | Runtime process ID. |
runtime_gateway | string | Runtime gateway. |
datasource_name | string | Data source name. |
inputs_variables | array<struct<`name`:string,`type`:string,`value`:string>> | List of input variables. |
outputs | array<struct<`output`:string,`type`:string>> | List of output identifiers. |
outputs_variables | array<struct<`name`:string,`type`:string,`value`:string>> | List of output variables. |
tags | array<struct<`key`:string,`value`:string>> | List of task tags. |
status | bigint | Task status. 1: Not run. 2: Waiting for time. 3: Waiting for resource. 4: Running. 5: Failed. 6: Succeeded. 7: Verifying. 8: Condition check. 9: Waiting for trigger. |
trigger_time | string | Trigger time. |
bizdate | string | Data timestamp. |
started_time | string | Start time. |
finished_time | string | End time. |
project_id | bigint | Project ID. For more information, see the workspace_id field in the workspaces table. |
project_env | string | Environment type (PROD/DEV). |
owner | string | Account ID of the owner. For more information, see the users table. |
create_time | string | Creation time. |
modify_time | string | Modification time. |
create_user | string | Account ID of the creator. For more information, see the users table. |
modify_user | string | Account ID of the modifier. For more information, see the users table. |
waiting_resource_time | string | Time waiting for resources. |
waiting_trigger_time | string | Time waiting for trigger. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
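The status and workflow_instance_type codes make it possible to separate, for example, failures in daily scheduling from failures in data backfill or smoke-testing runs. A minimal sketch, assuming task_instances rows are loaded as Python dicts keyed by column name; the code maps come from the descriptions above, and the function names are hypothetical.

```python
from collections import Counter

INSTANCE_STATUS = {
    1: "Not run", 2: "Waiting for time", 3: "Waiting for resource", 4: "Running",
    5: "Failed", 6: "Succeeded", 7: "Verifying", 8: "Condition check",
    9: "Waiting for trigger",
}
FAILED = 5            # status code for Failed
DAILY_SCHEDULING = 0  # workflow_instance_type code for daily scheduling

def status_summary(rows) -> Counter:
    """Count instances by decoded status label."""
    return Counter(INSTANCE_STATUS.get(r["status"], "Unknown") for r in rows)

def failed_daily_instances(rows) -> list:
    """Sorted IDs of failed instances that came from daily scheduling only."""
    return sorted(
        r["id"] for r in rows
        if r["status"] == FAILED and r["workflow_instance_type"] == DAILY_SCHEDULING
    )
```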
Users (users)
Field | Type | Description |
user_id | string | User identifier. |
user_nick | string | Account alias (display name). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
Workspaces (workspaces)
Field | Type | Description |
workspace_id | bigint | Workspace ID. |
workspace_name | string | Workspace name. |
workspace_identifier | string | Workspace identifier. |
workspace_description | string | Workspace description. |
workspace_owner | string | Identifier of the workspace owner. For more information, see the users table. |
workspace_status | bigint | Workspace status. 0: Normal. 1: Deleted. 2: Initialization. 3: Initialization failed. 4: Manually disabled. 5: Deleting. 6: Deletion failed. 7: Frozen due to overdue payment. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
Workspace members (workspace_members)
Field | Type | Description |
workspace_id | bigint | Workspace ID. For more information, see the workspaces table. |
user_id | string | User identifier. For more information, see the users table. |
user_status | bigint | User status. 0: Normal. 1: Disabled. 2: Deleted. |
gmt_create_ts | bigint | Creation time (13-digit timestamp). |
gmt_modified_ts | bigint | Modification time (13-digit timestamp). |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
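The users, workspaces, and workspace_members tables link through workspace_id and user_id. The sketch below joins them to list the members with a Normal status in workspaces with a Normal status, falling back to the user ID when no alias is found. It assumes all three inputs are Python dicts from the same dt partition; the function name and sample data are hypothetical.

```python
def active_members(workspaces, members, users):
    """Return sorted (workspace_name, user_nick) pairs for active members."""
    # workspace_status 0 and user_status 0 both mean Normal.
    names = {w["workspace_id"]: w["workspace_name"]
             for w in workspaces if w["workspace_status"] == 0}
    nicks = {u["user_id"]: u["user_nick"] for u in users}
    result = []
    for m in members:
        if m["user_status"] == 0 and m["workspace_id"] in names:
            # Fall back to the raw user ID if the users table has no alias for it.
            result.append((names[m["workspace_id"]], nicks.get(m["user_id"], m["user_id"])))
    return sorted(result)
```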
Resource groups (resource_groups)
Field | Type | Description |
resource_group_id | bigint | Resource group ID. |
resource_group_identifier | string | Resource group identifier. |
resource_group_type | bigint | Resource group type. 1: Scheduling resource group. 2: MaxCompute resource group. 4: Data Integration resource group. |
resource_group_mode | bigint | Resource group mode. 1: Subscription. 2: Pay-as-you-go. 3: Developer edition (MaxCompute only). |
resource_group_status | bigint | Resource group status. 0: Normal. 1: Frozen. 2: Deleted. 3: Creating. 4: Creation failed. 5: Updating. 6: Update failed. 7: Deleting. 8: Deletion failed. |
is_exclusive_resource_group | boolean | Indicates whether the resource group is exclusive. |
dt | string | Date partition (logical partition field) in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
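A short sketch of decoding resource_groups rows and selecting the groups of one type that are in the Normal state. It assumes rows are Python dicts keyed by column name; the labels follow the code descriptions above, and the function names and sample identifiers are hypothetical.

```python
RESOURCE_GROUP_TYPE = {1: "Scheduling", 2: "MaxCompute", 4: "Data Integration"}
RESOURCE_GROUP_MODE = {1: "Subscription", 2: "Pay-as-you-go", 3: "Developer edition"}
NORMAL = 0  # resource_group_status code for Normal

def usable_groups(rows, group_type: int) -> list:
    """Sorted identifiers of Normal-status resource groups of the given type."""
    return sorted(
        r["resource_group_identifier"] for r in rows
        if r["resource_group_type"] == group_type and r["resource_group_status"] == NORMAL
    )

def label(row: dict) -> str:
    """Human-readable 'type / mode' label, marking exclusive groups."""
    kind = RESOURCE_GROUP_TYPE.get(row["resource_group_type"], "Unknown")
    mode = RESOURCE_GROUP_MODE.get(row["resource_group_mode"], "Unknown")
    suffix = " (exclusive)" if row["is_exclusive_resource_group"] else ""
    return f"{kind} / {mode}{suffix}"
```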
Example metadata
Table metric details (table_metrics_detail)
Field | Type | Description |
datasource_type | string | Data source type. |
datasource_id | string | Data source identifier. |
catalog_name | string | Data catalog name. |
database_name | string | Database name. |
schema_name | string | Schema name. |
table_name | string | Table name. |
table_uuid | string | Table identifier, used to link to the details page. |
meta_entity_id | string | Human-readable table identifier. |
content_size | bigint | Collected storage volume. The value is NULL if storage volume collection is not supported. |
daily_rate_cs | decimal(16,6) | Day-to-day change rate of storage volume. |
avg_content_size_7d | bigint | 7-day average of storage volume. |
daily_rate_acs_7d | decimal(16,6) | Day-to-day change rate of the 7-day average storage volume. |
latest_data_update_time_31d | bigint | Latest data update time within the 31-day data range, that is, the maximum value of data_modified_timestamp: the end time of the latest instance in which the table is the downstream of a lineage. The value is NULL if the table has no updates within the 31-day range. |
latest_data_update_task_id | bigint | ID of the scheduling task that most recently updated the table within 31 days. |
latest_data_update_instance_id | bigint | ID of the scheduling task instance that most recently updated the table within 31 days. |
latest_data_update_time_by_task | bigint | End time of the scheduling task instance that most recently updated the table within 31 days. |
writing_task_ids | array<bigint> | IDs of scheduling tasks that write to the table for the current data timestamp (no duplicate IDs). |
writing_task_ids_31d | array<bigint> | IDs of scheduling tasks that write to the table within a 31-day data range (no duplicate IDs). |
latest_data_access_time_31d | bigint | Latest data access time within the 31-day data range, that is, the maximum value of last_access_timestamp: the end time of the latest instance in which the table is the upstream of a lineage. The value is NULL if the table has no accesses within the 31-day range. |
latest_data_access_task_id | bigint | ID of the scheduling task that most recently read the table within 31 days. |
latest_data_access_instance_id | bigint | ID of the scheduling task instance that most recently read the table within 31 days. |
latest_data_access_time_by_task | bigint | End time of the scheduling task instance that most recently read the table (as the upstream of a lineage) within 31 days. |
reading_task_ids | array<string> | IDs of scheduling tasks that read the table. |
reading_task_ids_31d | array<string> | IDs of scheduling tasks that read the table within a 31-day data range (no duplicate IDs). |
direct_downstream_tables | array<string> | UUIDs of direct downstream (child) tables. |
direct_upstream_tables | array<string> | UUIDs of direct upstream (parent) tables. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
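Because latest_data_update_time_31d and latest_data_access_time_31d are NULL when a table has no updates or accesses within 31 days, the two fields together identify tables that may be worth reviewing for cleanup. A minimal sketch, assuming rows are Python dicts with NULL mapped to None; the function name and sample data are hypothetical, and the result is a review list, not a confirmation that a table is safe to delete.

```python
def idle_tables(rows) -> list:
    """meta_entity_id of tables with no update and no access in 31 days, largest first."""
    idle = [
        r for r in rows
        if r["latest_data_update_time_31d"] is None
        and r["latest_data_access_time_31d"] is None
    ]
    # content_size is NULL when storage collection is unsupported; sort those last.
    idle.sort(key=lambda r: r["content_size"] or 0, reverse=True)
    return [r["meta_entity_id"] for r in idle]
```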
Table metric summary (table_metrics_summary)
Field | Type | Description |
table_count | bigint | Number of tables. |
daily_rate_tc | decimal(16,6) | Day-to-day change rate of the number of tables. |
avg_table_count_7d | bigint | 7-day average number of tables. |
daily_rate_atc_7d | decimal(16,6) | Day-to-day change rate of the 7-day average number of tables. |
content_size | bigint | Collected storage volume. The value is NULL if storage volume collection is not supported. |
daily_rate_cs | decimal(16,6) | Day-to-day change rate of storage volume. |
avg_content_size_7d | bigint | 7-day average of storage volume. |
daily_rate_acs_7d | decimal(16,6) | Day-to-day change rate of the 7-day average storage volume. |
updated_table_count | bigint | Number of tables updated within 31 days. |
daily_rate_utc | decimal(16,6) | Day-to-day change rate of the number of tables updated within 31 days. |
avg_updated_table_count_7d | bigint | 7-day average number of tables updated within 31 days. |
daily_rate_autc_7d | decimal(16,6) | Day-to-day change rate of the 7-day average number of tables updated within 31 days. |
accessed_table_count | bigint | Number of tables read within 31 days. |
daily_rate_atc | decimal(16,6) | Day-to-day change rate of the number of tables read within 31 days. |
avg_accessed_table_count_7d | bigint | 7-day average number of tables read within 31 days. |
daily_rate_aatc_7d | decimal(16,6) | Day-to-day change rate of the 7-day average number of tables read within 31 days. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
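This topic does not give the exact formulas behind the daily_rate_* and avg_*_7d columns. The sketch below assumes the common definitions, (today - yesterday) / yesterday for the day-to-day change rate and the mean of the last seven daily values for the 7-day average, which can help when cross-checking the summary against your own daily counts. Both definitions and the half-up rounding are assumptions, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional

SIX_PLACES = Decimal("0.000001")  # decimal(16,6) keeps six fractional digits

def daily_rate(today: int, yesterday: int) -> Optional[Decimal]:
    """ASSUMED definition: (today - yesterday) / yesterday, rounded half-up to 6 places."""
    if yesterday == 0:
        return None  # the rate is undefined when the previous value is zero
    rate = Decimal(today - yesterday) / Decimal(yesterday)
    return rate.quantize(SIX_PLACES, rounding=ROUND_HALF_UP)

def avg_7d(series: List[int]) -> float:
    """ASSUMED definition: mean of the last 7 daily values (series ordered oldest first)."""
    window = series[-7:]
    if len(window) < 7:
        raise ValueError("need at least 7 daily values")
    return sum(window) / 7
```

The table stores the 7-day averages as bigint; how DataWorks rounds them is not documented, so this sketch returns the unrounded mean.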
Task metric details (task_metrics_detail)
Field | Type | Description |
task_id | bigint | Task identifier. |
workflow_id | bigint | Workflow identifier. |
node_type | bigint | Task type. |
project_id | bigint | Workspace identifier. |
week_number | bigint | Week of the year in which the data timestamp falls. |
task_owner | string | Account ID of the task owner. |
compute_resource_type | string | Computing resource type. |
compute_resource_id | string | Computing resource identifier, such as a MaxCompute project name, an EMR cluster ID, or a Hologres instance ID. |
datasource_name | string | Data source name. |
inst_success_count | bigint | Number of successful instances. |
inst_failed_count | bigint | Number of failed instances. |
inst_running_count | bigint | Number of running instances. |
inst_abnormal_count | bigint | Number of abnormal instances. |
inst_not_started_count | bigint | Number of instances not started. |
inst_runtime_cu | double | Instance runtime CU consumption. |
task_avg_cu_31d | double | Average daily CU consumption of the task (within 31 days). |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
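The instance counters and CU columns support simple per-task health checks. A minimal sketch, assuming rows are Python dicts keyed by column name: it computes a success rate over succeeded and failed instances only (an assumption; running, abnormal, and not-started instances are ignored) and flags a day whose inst_runtime_cu exceeds a hypothetical multiple of task_avg_cu_31d. The threshold and function name are illustrative, not DataWorks settings.

```python
from typing import Optional

def task_health(row: dict, cu_ratio_threshold: float = 2.0) -> dict:
    """Success rate and a CU-spike flag for one task_metrics_detail row."""
    finished = row["inst_success_count"] + row["inst_failed_count"]
    # Assumption: the success rate counts only succeeded and failed instances.
    success_rate: Optional[float] = (
        row["inst_success_count"] / finished if finished else None
    )
    avg_cu = row["task_avg_cu_31d"]
    # Flag the day if its CU consumption exceeds the 31-day average by the threshold factor.
    cu_spike = bool(avg_cu) and row["inst_runtime_cu"] > cu_ratio_threshold * avg_cu
    return {"task_id": row["task_id"], "success_rate": success_rate, "cu_spike": cu_spike}
```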
Task metric summary (task_metrics_summary)
Field | Type | Description |
node_type | bigint | Node type. |
inst_status | string | Instance status. |
inst_count | bigint | Number of instances. |
avg_inst_count_7d | double | 7-day average number of instances. |
granularity | string | Statistical granularity. Valid values: DAILY and WEEKLY. |
dt | string | Date partition in the YYYYMMDD format. The value can be any date from 31 days ago to yesterday. |
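Each task_metrics_summary row holds one (node_type, inst_status, granularity) combination, so it is often easier to read after pivoting into one mapping per node type. A minimal sketch, assuming the rows come from a single dt partition and are loaded as Python dicts; the node type and status values in the test are hypothetical placeholders.

```python
from collections import defaultdict

def pivot_summary(rows, granularity: str = "DAILY") -> dict:
    """Return {node_type: {inst_status: inst_count}} for one granularity."""
    table = defaultdict(dict)
    for r in rows:
        if r["granularity"] == granularity:
            table[r["node_type"]][r["inst_status"]] = r["inst_count"]
    return dict(table)
```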