The data opening feature of DataWorks provides metadata tables across multiple dimensions that you can query to collect metadata. This topic lists these tables and describes their structure.
Unless otherwise specified, the metadata provided by the data opening feature includes only the metadata of the MaxCompute compute engine.
Metadata
RPT metrics
Raw metadata details
Scheduling-related metadata
Tenant metadata
Core table metrics: rpt_v_meta_ind_table_core
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
table_uuid | string | The table ID. |
owner_yun_acct | string | The Alibaba Cloud account of the table owner. |
dim_life_cycle | bigint | The time to live (TTL). Unit: days. Valid values: |
is_partition_table | boolean | Specifies whether the table is a partitioned table. Valid values: true and false. |
entity_type | bigint | The type of the entity. Valid values: |
categories | string | The category information. |
last_access_time | bigint | The time when the table was last accessed. The value is a 10-digit UNIX timestamp. |
size | bigint | The size of the table, which indicates the logical storage space occupied by data in the table. Unit: bytes. This field is set to NULL for a view. |
column_count | bigint | The number of fields in the table. Partition key columns are included. |
partition_count | bigint | The number of partitions in the table. This field is set to NULL for a non-partitioned table. |
detail_view_count | bigint | The number of times that the details page of the table is viewed. |
favorite_count | bigint | The number of times that the table is added to favorites. |
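The last_access_time field is a 10-digit UNIX timestamp in seconds, whereas most other tables in this topic use 13-digit millisecond timestamps. The following minimal Python sketch finds tables that have not been accessed for a given number of days. It assumes that you have already exported rows of rpt_v_meta_ind_table_core as dictionaries keyed by field name (for example, from a query result). The function name `find_stale_tables`, the 90-day default, and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(rows, now, max_idle_days=90):
    """Return (database_name, table_name) for tables idle longer than max_idle_days.

    Assumes each row is a dict keyed by rpt_v_meta_ind_table_core field names and
    that last_access_time is a 10-digit UNIX timestamp in seconds (None if unknown).
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for row in rows:
        ts = row.get("last_access_time")
        if ts is None:
            continue  # No access record: handle these separately instead of guessing.
        last_access = datetime.fromtimestamp(ts, tz=timezone.utc)
        if last_access < cutoff:
            stale.append((row["database_name"], row["table_name"]))
    return stale


# Hypothetical sample rows for illustration only.
rows = [
    {"database_name": "proj_a", "table_name": "ods_log", "last_access_time": 1_700_000_000},
    {"database_name": "proj_a", "table_name": "dim_user", "last_access_time": 1_717_000_000},
]
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(find_stale_tables(rows, now))  # [('proj_a', 'ods_log')]
```

Tables with a NULL last_access_time are skipped rather than treated as stale, because the absence of a record does not prove that the table is unused.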
Additional table metrics: rpt_v_meta_ind_table_extra
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
table_uuid | string | The table ID. |
read_count | bigint | The number of times that data is read from the table by using SQL statements, including reads by non-scheduled tasks. |
read_count_30d | bigint | The number of times that data is read from the table by using SQL statements in the last 30 days, including reads by non-scheduled tasks. |
write_count | bigint | The number of times that data is written to the table by using SQL statements, including writes by non-scheduled tasks. |
join_count | bigint | The number of times that the table is joined. |
direct_upstream_count | bigint | The number of parent tables in the lineage. |
direct_downstream_count | bigint | The number of child tables in the lineage. |
output_task_count | bigint | The number of tasks that generate the data in the table. |
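rpt_v_meta_ind_table_extra shares only tenant_id and table_uuid with rpt_v_meta_ind_table_core, so those two fields form the join key. In MaxCompute you would typically express this as a LEFT JOIN in SQL; the following Python sketch only illustrates the key and the left-join behavior on rows already exported as dictionaries. The function name and sample rows are hypothetical.

```python
def merge_table_metrics(core_rows, extra_rows):
    """Attach rpt_v_meta_ind_table_extra metrics to rpt_v_meta_ind_table_core rows.

    Rows are matched on (tenant_id, table_uuid). Core rows without a matching
    extra row are kept unchanged, which mirrors a LEFT JOIN.
    """
    extra_by_key = {(r["tenant_id"], r["table_uuid"]): r for r in extra_rows}
    merged = []
    for core in core_rows:
        combined = dict(core)  # Copy so that the input rows are not modified.
        extra = extra_by_key.get((core["tenant_id"], core["table_uuid"]))
        if extra is not None:
            combined.update(extra)
        merged.append(combined)
    return merged


# Hypothetical sample rows for illustration only.
core = [
    {"tenant_id": 1, "table_uuid": "t1", "table_name": "ods_log"},
    {"tenant_id": 1, "table_uuid": "t2", "table_name": "dim_user"},
]
extra = [{"tenant_id": 1, "table_uuid": "t1", "read_count_30d": 12}]
merged = merge_table_metrics(core, extra)
print(merged[0]["read_count_30d"])  # 12
```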
Database (MaxCompute project) metadata details: raw_v_meta_database
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
env_type | bigint | The environment type. Valid values: |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
database_comment | string | The description of the database or MaxCompute project. |
owner_name | string | The name of the owner. |
created_time_ts | bigint | The creation time. The value is a 13-digit timestamp. |
last_modified_time_ts | bigint | The last modification time. The value is a 13-digit timestamp. |
location | string | The storage path of the database. |
extras | string | The additional information about the database, which is a JSON string. If the table preview and table visibility range attributes are configured for the MaxCompute project, you can obtain their values from the `allowDataPreview` and `projectVisibility` keys. |
biz_date | string | The data timestamp. |
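Because the extras field is a JSON string, you must parse it before you can read the `allowDataPreview` and `projectVisibility` keys. The following minimal Python sketch uses the standard `json` module. The function name is hypothetical, and because this topic does not document the value types of these keys, the sketch returns whatever the JSON contains and returns None for a key that is not configured.

```python
import json


def project_attributes(extras):
    """Read allowDataPreview and projectVisibility from raw_v_meta_database.extras.

    Returns None for a key that is absent. Value types are passed through
    unchanged because they are not documented here.
    """
    if not extras:
        return {"allowDataPreview": None, "projectVisibility": None}
    data = json.loads(extras)
    return {
        "allowDataPreview": data.get("allowDataPreview"),
        "projectVisibility": data.get("projectVisibility"),
    }


# Hypothetical extras value for illustration only.
print(project_attributes('{"allowDataPreview": "true"}'))
# {'allowDataPreview': 'true', 'projectVisibility': None}
```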
Table metadata details: raw_v_meta_table
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | string | The DataWorks workspace ID. |
table_uuid | string | The table ID. |
table_name | string | The name of the table. |
table_type | string | The type of the table. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
partition_keys | string | The partition keys in the table. Multi-level partition keys are separated by commas (,). This field is set to an empty string for a non-partitioned table. |
table_comment | string | The description of the table. |
table_biz_comment | string | The business description of the table. |
visibility_scope | bigint | The visibility range of the table. |
owner_name | string | The name of the owner. |
created_time_ts | bigint | The creation time. The value is a 13-digit timestamp. |
last_modified_time_ts | bigint | The time when data was last modified. The value is a 13-digit timestamp. |
last_meta_modified_time_ts | bigint | The time when table metadata was last modified. The value is a 13-digit timestamp. |
location | string | The storage path of the table. |
life_cycle | bigint | The TTL of the table. Unit: days. |
data_size | bigint | The logical storage volume of the table. Unit: bytes. This field is set to NULL for a partitioned table. To obtain the storage volume of a partitioned table, sum the sizes of its partitions, which are listed in raw_v_meta_partition. |
biz_date | string | The data timestamp. |
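For a partitioned table, data_size is NULL, so the table size must be computed from the partition list. The following minimal Python sketch joins raw_v_meta_table with raw_v_meta_partition on (catalog_name, database_name, table_name) and sums the size of each partition. It assumes that both inputs are lists of dictionaries keyed by field name and are already filtered to the same biz_date. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def table_sizes(table_rows, partition_rows):
    """Return the logical size in bytes per (catalog_name, database_name, table_name).

    Non-partitioned tables use data_size from raw_v_meta_table. Partitioned
    tables (data_size is None) use the sum of size over raw_v_meta_partition.
    """
    partition_totals = defaultdict(int)
    for p in partition_rows:
        key = (p["catalog_name"], p["database_name"], p["table_name"])
        partition_totals[key] += p["size"] or 0  # Treat a NULL size as 0.
    sizes = {}
    for t in table_rows:
        key = (t["catalog_name"], t["database_name"], t["table_name"])
        if t["data_size"] is not None:
            sizes[key] = t["data_size"]
        else:
            sizes[key] = partition_totals.get(key, 0)
    return sizes


# Hypothetical sample rows for illustration only.
tables = [
    {"catalog_name": "odps", "database_name": "proj_a", "table_name": "dim_user", "data_size": 2048},
    {"catalog_name": "odps", "database_name": "proj_a", "table_name": "ods_log", "data_size": None},
]
partitions = [
    {"catalog_name": "odps", "database_name": "proj_a", "table_name": "ods_log", "partition_name": "ds=20240101", "size": 1000},
    {"catalog_name": "odps", "database_name": "proj_a", "table_name": "ods_log", "partition_name": "ds=20240102", "size": 1500},
]
print(table_sizes(tables, partitions))
# {('odps', 'proj_a', 'dim_user'): 2048, ('odps', 'proj_a', 'ods_log'): 2500}
```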
View metadata details: raw_v_meta_view
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | string | The DataWorks workspace ID. |
table_uuid | string | The table ID. |
table_name | string | The name of the table. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_comment | string | The description of the table. |
table_biz_comment | string | The business description of the table. |
visibility_scope | bigint | The visibility range of the table. |
owner_name | string | The name of the owner. |
created_time_ts | bigint | The creation time. The value is a 13-digit timestamp. |
last_ddl_time_ts | bigint | The time when the DDL statement of the view was last modified. The value is a 13-digit timestamp. |
view_text | string | The SQL statement that is used to create a view. |
biz_date | string | The data timestamp. |
Column metadata details: raw_v_meta_column
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
column_name | string | The name of the field. |
column_comment | string | The description of the field. |
column_biz_comment | string | The business description of the field. |
column_type | string | The data type of the field. |
column_sequence | bigint | The sequence number of the field, which starts from 1. |
is_partition_key | boolean | Specifies whether the field is a partition key. |
is_primary_key | boolean | Specifies whether the field is a primary key. |
biz_date | string | The data timestamp. |
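Because column_sequence starts from 1 and is_partition_key flags partition key columns, you can rebuild the schema of a table from raw_v_meta_column. The following minimal Python sketch sorts the columns of one table and separates regular columns from partition keys. It assumes that the rows are dictionaries keyed by field name; the function name and sample rows are hypothetical.

```python
def table_schema(column_rows, database_name, table_name):
    """Split a table's columns into regular columns and partition keys, in order.

    column_rows are dicts keyed by raw_v_meta_column field names. Columns are
    ordered by column_sequence, which starts from 1.
    """
    cols = [
        r for r in column_rows
        if r["database_name"] == database_name and r["table_name"] == table_name
    ]
    cols.sort(key=lambda r: r["column_sequence"])
    regular = [(r["column_name"], r["column_type"]) for r in cols if not r["is_partition_key"]]
    partition_keys = [(r["column_name"], r["column_type"]) for r in cols if r["is_partition_key"]]
    return regular, partition_keys


# Hypothetical sample rows for illustration only (deliberately unordered).
columns = [
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "ds",
     "column_type": "string", "column_sequence": 3, "is_partition_key": True},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "user_id",
     "column_type": "bigint", "column_sequence": 1, "is_partition_key": False},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "event",
     "column_type": "string", "column_sequence": 2, "is_partition_key": False},
]
regular, partition_keys = table_schema(columns, "proj_a", "ods_log")
print(regular)         # [('user_id', 'bigint'), ('event', 'string')]
print(partition_keys)  # [('ds', 'string')]
```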
Partition metadata details: raw_v_meta_partition
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
partition_name | string | The name of the partition. |
size | bigint | The logical size of the partition. Unit: bytes. |
record_number | bigint | The number of records in the partition. |
created_time_ts | bigint | The creation time. The value is a 13-digit timestamp. |
last_modified_time_ts | bigint | The last modification time. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |
Table lineage metadata details: raw_v_meta_table_lineage
The lineage feature cannot guarantee 100% data integrity and accuracy because of the complexity of SQL statements and code. We recommend that you do not rely on lineage data in business scenarios that require complete and accurate data.
The table lineage data includes the lineage relationships generated by the MaxCompute compute engine and the lineage relationships generated by batch synchronization tasks.
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
src_type | string | The type of the source. |
src_data_source_id | string | The source ID. |
src_database | string | The source database. |
src_table | string | The source table. |
dest_type | string | The type of the destination. |
dest_data_source_id | string | The destination ID. |
dest_database | string | The destination database. |
dest_table | string | The destination table. |
schedule_task_id | string | The scheduled task ID. |
schedule_instance_id | string | The instance ID of the scheduled task. |
schedule_task_owner | string | The owner of the scheduled task. |
job_start_time_ts | bigint | The start time of the task, which is a 13-digit timestamp. |
job_end_time_ts | bigint | The end time of the task, which is a 13-digit timestamp. |
execute_time | bigint | The duration of the task run. Unit: seconds. |
input_record_number | bigint | The number of records that are read from the source table. |
biz_date | string | The data timestamp. |
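Each row of raw_v_meta_table_lineage is one edge from a source table to a destination table, so you can find all upstream tables of a table by walking the edges backward. The following minimal Python sketch does a breadth-first search over rows exported as dictionaries. As a simplification, it identifies a table only by (database, table) and ignores src_type and the data source IDs, which may not be enough when lineage from batch synchronization tasks spans several data sources. The function name and sample rows are hypothetical, and the result inherits the accuracy limits described above.

```python
from collections import deque


def all_upstream(lineage_rows, database, table):
    """Return every (src_database, src_table) reachable upstream of a table.

    Follows dest -> src edges from raw_v_meta_table_lineage breadth-first.
    The visited set prevents infinite loops if the lineage contains a cycle.
    """
    parents = {}
    for r in lineage_rows:
        dest = (r["dest_database"], r["dest_table"])
        parents.setdefault(dest, set()).add((r["src_database"], r["src_table"]))
    seen = set()
    queue = deque([(database, table)])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage edges for illustration only.
lineage = [
    {"src_database": "proj_a", "src_table": "ods_log", "dest_database": "proj_a", "dest_table": "dwd_log"},
    {"src_database": "proj_a", "src_table": "dwd_log", "dest_database": "proj_a", "dest_table": "ads_report"},
    {"src_database": "proj_a", "src_table": "dim_user", "dest_database": "proj_a", "dest_table": "ads_report"},
]
print(sorted(all_upstream(lineage, "proj_a", "ads_report")))
# [('proj_a', 'dim_user'), ('proj_a', 'dwd_log'), ('proj_a', 'ods_log')]
```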
Table output task metadata details: raw_v_meta_table_output
Data Map displays only the tables whose data is generated by ODPS nodes. However, this metadata table includes the tables whose data is generated by both ODPS nodes and Data Integration nodes.
The output information is derived from lineage data.
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The ID of the DataWorks workspace in which scheduled tasks are run. |
type | string | The type of the data source. |
data_source_id | string | The data source ID. |
database | string | The database. |
table | string | The name of the table. |
schedule_task_id | string | The scheduled task ID. |
schedule_instance_id | string | The instance ID of the scheduled task. |
schedule_task_owner | string | The owner of the scheduled task. |
job_start_time_ts | bigint | The start time of the task, which is a 13-digit timestamp. |
job_end_time_ts | bigint | The end time of the task, which is a 13-digit timestamp. |
execute_time | bigint | The duration of the task run. Unit: seconds. |
biz_date | string | The data timestamp. |
Table usage metadata details: raw_v_meta_table_usage
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The ID of the DataWorks workspace in which scheduled tasks are run. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
schedule_task_id | string | The scheduled task ID. |
schedule_task_owner | string | The owner of the scheduled task. If the current task is not scheduled in DataWorks, this field is set to NULL. |
job_id | string | The task ID, which may not be the instance ID of the task that is scheduled in DataWorks. You can use this field to count the number of times that data is read from the table and the number of times that data is written to the table. |
op_type | string | The operation type, which can be READ, WRITE, or UNKNOWN. |
extras | string | The additional information, which is a JSON string. If a MaxCompute task performs operations on the table, you can use the `task_name` key to obtain the name of the MaxCompute task. If the ID of the DataWorks scheduled task is not empty, you can use the `schedule_task_name` key to obtain the name of the scheduled task. |
biz_date | string | The data timestamp. |
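As the job_id description notes, you can use job_id to count reads and writes. The following minimal Python sketch counts distinct job_id values per table and op_type, so a job that touches a table several times is counted once per operation type; counting distinct IDs is an assumption, not a documented rule. It also shows how to read a task name from extras, preferring `schedule_task_name` over `task_name`. The function names and sample rows are hypothetical.

```python
import json
from collections import defaultdict


def read_write_counts(usage_rows):
    """Count distinct job_id values per table and op_type in raw_v_meta_table_usage.

    Returns {(database_name, table_name): {"READ": n, "WRITE": m, ...}}.
    """
    jobs = defaultdict(set)
    for r in usage_rows:
        jobs[(r["database_name"], r["table_name"], r["op_type"])].add(r["job_id"])
    counts = defaultdict(dict)
    for (db, tbl, op), job_ids in jobs.items():
        counts[(db, tbl)][op] = len(job_ids)
    return dict(counts)


def task_name(extras):
    """Return schedule_task_name if present, else task_name, from the extras JSON."""
    data = json.loads(extras) if extras else {}
    return data.get("schedule_task_name") or data.get("task_name")


# Hypothetical sample rows for illustration only.
usage = [
    {"database_name": "proj_a", "table_name": "ods_log", "op_type": "READ", "job_id": "j1"},
    {"database_name": "proj_a", "table_name": "ods_log", "op_type": "READ", "job_id": "j1"},
    {"database_name": "proj_a", "table_name": "ods_log", "op_type": "READ", "job_id": "j2"},
    {"database_name": "proj_a", "table_name": "ods_log", "op_type": "WRITE", "job_id": "j3"},
]
print(read_write_counts(usage))  # {('proj_a', 'ods_log'): {'READ': 2, 'WRITE': 1}}
```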
Column usage metadata details: raw_v_meta_column_usage
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The ID of the DataWorks workspace in which scheduled tasks are run. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
column_name | string | The name of the field. |
schedule_task_id | string | The scheduled task ID. |
schedule_task_owner | string | The owner of the scheduled task. If the current task is not scheduled in DataWorks, this field is set to NULL. |
inst_id | string | The task ID, which may not be the instance ID of the task that is scheduled in DataWorks. |
op_type | string | The operation type, which can be SELECT, JOIN, GROUP BY, or WHERE. |
extras | string | The additional information, which is a JSON string. If a MaxCompute task performs operations on the table, you can use the `task_name` key to obtain the name of the MaxCompute task. If the ID of the DataWorks scheduled task is not empty, you can use the `schedule_task_name` key to obtain the name of the scheduled task. |
biz_date | string | The data timestamp. |
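Because op_type records how each field is used (SELECT, JOIN, GROUP BY, or WHERE), raw_v_meta_column_usage can reveal, for example, which fields are most often used as filters. The following minimal Python sketch ranks fields for one op_type by the number of distinct inst_id values, so repeated references within one job count once; that deduplication is an assumption. The function name, the default `n`, and the sample rows are hypothetical.

```python
from collections import Counter


def top_columns(usage_rows, op_type, n=5):
    """Return the n fields most often used with op_type, for example "WHERE".

    Counts distinct inst_id values per (database_name, table_name, column_name).
    """
    pairs = {
        (r["database_name"], r["table_name"], r["column_name"], r["inst_id"])
        for r in usage_rows
        if r["op_type"] == op_type
    }
    counter = Counter((db, tbl, col) for db, tbl, col, _ in pairs)
    return counter.most_common(n)


# Hypothetical sample rows for illustration only.
col_usage = [
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "ds", "inst_id": "i1", "op_type": "WHERE"},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "ds", "inst_id": "i1", "op_type": "WHERE"},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "ds", "inst_id": "i2", "op_type": "WHERE"},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "user_id", "inst_id": "i1", "op_type": "WHERE"},
    {"database_name": "proj_a", "table_name": "ods_log", "column_name": "ds", "inst_id": "i3", "op_type": "SELECT"},
]
print(top_columns(col_usage, "WHERE"))
# [(('proj_a', 'ods_log', 'ds'), 2), (('proj_a', 'ods_log', 'user_id'), 1)]
```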
Table Wiki metadata details: raw_v_meta_biz_table_wiki
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The ID of the DataWorks workspace in which scheduled tasks are run. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
version | string | The version number of the wiki. |
operator | string | The user who last modified the wiki, which may be an owner of the table. |
content | string | The content of the wiki, which is written in Markdown syntax. |
update_time_ts | bigint | The modification time. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |
Table join metadata details: raw_v_meta_table_join_map
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
column_name | string | The name of the field. |
join_database_name | string | The name of the associated database or MaxCompute project. |
join_table_name | string | The name of the associated table. |
join_column_name | string | The name of the associated field. |
join_type | string | The type of the JOIN operation, which can be left, right, or inner. |
schedule_task_id | string | The scheduled task ID. |
schedule_task_owner | string | The owner of the scheduled task. |
job_id | string | The ID of the task at the engine layer. |
extras | string | The additional information, which is a JSON string. If a MaxCompute task is run to perform operations on a table, you can use the task_name key to obtain the name of the MaxCompute task. |
biz_date | string | The data timestamp. |
Table viewing records: raw_v_meta_table_detail_log
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
database_name | string | The name of the database or MaxCompute project. |
table_name | string | The name of the table. |
operator | string | The user who views table details. |
view_time_ts | bigint | The time when table details are viewed. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |
Category metadata details: raw_v_meta_category
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
category_id | bigint | The category ID. |
category_name | string | The name of the category. |
category_pid | bigint | The parent category ID. This field is set to 0 or NULL for a level-1 category. |
depth | bigint | The level of the category. This field is set to 1 for a level-1 category. |
sort_field | double | The field based on which the categories are sorted. |
creator_account | string | The account that is used to create the category. |
created_time_ts | bigint | The creation time. The value is a 13-digit timestamp. |
last_modified_time_ts | bigint | The last modification time. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |
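Categories form a tree through category_pid, and a level-1 category has category_pid set to 0 or NULL. The following minimal Python sketch walks up that tree to build the full path of a category. It assumes that the rows are dictionaries keyed by field name and already filtered to a single tenant_id. The function name and sample rows are hypothetical; the visited set guards against malformed data that loops.

```python
def category_path(categories, category_id):
    """Return category names from the level-1 ancestor down to category_id.

    A category_pid of 0 or None marks a level-1 category and ends the walk.
    Returns [] if category_id is unknown.
    """
    by_id = {c["category_id"]: c for c in categories}
    path = []
    seen = set()
    current = by_id.get(category_id)
    while current is not None and current["category_id"] not in seen:
        seen.add(current["category_id"])
        path.append(current["category_name"])
        pid = current["category_pid"]
        if pid in (0, None):
            break
        current = by_id.get(pid)
    return list(reversed(path))


# Hypothetical sample rows for illustration only.
categories = [
    {"category_id": 1, "category_name": "Finance", "category_pid": 0},
    {"category_id": 2, "category_name": "Revenue", "category_pid": 1},
    {"category_id": 3, "category_name": "Daily", "category_pid": 2},
    {"category_id": 10, "category_name": "Other", "category_pid": None},
]
print(category_path(categories, 3))  # ['Finance', 'Revenue', 'Daily']
```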
Scheduling node details: raw_v_schedule_node
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
project_id | bigint | The workspace ID. |
node_id | bigint | The node ID. |
node_name | string | The name of the node. |
node_type | bigint | The scheduling type of the node. Valid values: |
prg_type | bigint | The type of the node. For more information, see Supported node types. |
flow_id | bigint | The workflow ID. |
project_env | string | The environment identifier. |
create_time | bigint | The creation time. The value is a 13-digit timestamp. |
create_user | string | The creator. |
modify_time | bigint | The last modification time. The value is a 13-digit timestamp. |
modify_user | string | The modifier. |
prg_name | string | The name of the node type. |
para_value | string | The execution parameter. |
file_id | bigint | The ID of the file that corresponds to the node. |
file_version | bigint | The version of the file that corresponds to the node. |
owner | string | The owner of the node. |
resgroup_id | bigint | The resource group ID. |
baseline_id | bigint | The baseline ID. |
cycle_type | bigint | The scheduling cycle. |
repeatable | bigint | The rerun ID. |
connection | string | The connection string of the data source. |
dqc_type | bigint | The DQC type. |
dqc_description | string | The DQC rule expression. |
task_rerun_time | bigint | The number of times that the task can be rerun. |
task_rerun_interval | bigint | The rerun interval. Unit: milliseconds. |
cron_express | string | The CRON expression that specifies the scheduling frequency of the node. |
priority | bigint | The priority of the task. Valid values: 1, 3, 5, 7, and 8. A larger value indicates a higher priority. |
start_effect_date | bigint | The time when the node takes effect. The value is a 13-digit timestamp. |
end_effect_date | bigint | The time when the node loses effect. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |
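start_effect_date and end_effect_date are 13-digit millisecond timestamps, so divide them by 1,000 before converting them to datetimes. The following minimal Python sketch checks whether a node from raw_v_schedule_node is in effect at a given time. Treating a NULL start or end as unbounded is an assumption, because this topic does not document NULL handling for these fields. The function names and sample values are hypothetical.

```python
from datetime import datetime, timezone


def to_datetime_ms(ts_ms):
    """Convert a 13-digit millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


def is_node_effective(node, at):
    """Check whether a raw_v_schedule_node row is in effect at the datetime `at`.

    A missing start_effect_date or end_effect_date is treated as unbounded.
    """
    start = node.get("start_effect_date")
    end = node.get("end_effect_date")
    if start is not None and at < to_datetime_ms(start):
        return False
    if end is not None and at > to_datetime_ms(end):
        return False
    return True


# Hypothetical node: effective from 2024-01-01 00:00:00 to 2024-12-31 23:59:59 UTC.
node = {"start_effect_date": 1_704_067_200_000, "end_effect_date": 1_735_689_599_000}
print(is_node_effective(node, datetime(2024, 6, 30, tzinfo=timezone.utc)))  # True
print(is_node_effective(node, datetime(2025, 2, 1, tzinfo=timezone.utc)))   # False
```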
Scheduling task details: raw_v_schedule_task
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
project_id | bigint | The workspace ID. |
node_id | bigint | The node ID. |
node_name | string | The name of the node. |
task_id | bigint | The task ID. |
dag_id | bigint | The directed acyclic graph (DAG) ID of the workflow. |
task_type | bigint | The scheduling type of the task. Valid values: |
dag_type | bigint | The type of the DAG. Valid values: |
prg_type | bigint | The type of the node. For more information, see Supported node types. |
flow_id | bigint | The workflow ID. |
create_time | bigint | The creation time. The value is a 13-digit timestamp. |
modify_time | bigint | The last modification time. The value is a 13-digit timestamp. |
cycle_time | bigint | The scheduling time, which is a 13-digit timestamp. |
in_group_id | bigint | The serial number of the task. |
prg_name | string | The name of the node type. |
para_value | string | The execution parameter. |
file_id | bigint | The ID of the file that corresponds to the task. |
file_version | bigint | The version of the file that corresponds to the task. |
owner | string | The owner of the node. |
resgroup_id | bigint | The resource group ID. |
baseline_id | bigint | The baseline ID. |
cycle_type | bigint | The scheduling cycle. Valid values: |
repeatable | bigint | The rerun ID. |
connection | string | The connection string of the data source. |
dqc_type | bigint | The DQC type. |
dqc_description | string | The DQC rule expression. |
task_rerun_time | bigint | The number of times that the task can be rerun. |
task_rerun_interval | bigint | The rerun interval. Unit: milliseconds. |
begin_waittime_time | bigint | The time when the task starts to wait for the scheduling time to arrive. The value is a 13-digit timestamp. |
finish_time | bigint | The time when the task finishes running. The value is a 13-digit timestamp. |
begin_waitres_time | bigint | The time when the task starts to wait for resources. The value is a 13-digit timestamp. |
begin_run_time | bigint | The time when the task starts to run. The value is a 13-digit timestamp. |
rerun_times | bigint | The number of times that the task is rerun. |
priority | bigint | The priority of the task. Valid values: 1, 3, 5, 7, and 8. A larger value indicates a higher priority. |
task_key | string | The unique identifier of the task. |
error_msg | string | The reason why the task fails to run. |
status | bigint | The status of the task. Valid values: |
biz_date | string | The data timestamp. |
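The four timestamps in raw_v_schedule_task let you see where a task instance spent its time. The following minimal Python sketch converts them into phase durations in seconds. It assumes the order begin_waittime_time, begin_waitres_time, begin_run_time, finish_time, which this topic implies but does not state, and it returns None for a phase when either of its timestamps is missing. The function name, phase names, and sample values are hypothetical.

```python
def task_phase_seconds(task):
    """Break one raw_v_schedule_task row into phase durations in seconds.

    All inputs are 13-digit millisecond timestamps.
    """
    def span(start_field, end_field):
        start, end = task.get(start_field), task.get(end_field)
        if start is None or end is None:
            return None
        return (end - start) / 1000  # Milliseconds to seconds.

    return {
        "wait_for_schedule_time": span("begin_waittime_time", "begin_waitres_time"),
        "wait_for_resources": span("begin_waitres_time", "begin_run_time"),
        "running": span("begin_run_time", "finish_time"),
    }


# Hypothetical timestamps for illustration only.
t0 = 1_719_705_600_000
task = {
    "begin_waittime_time": t0,
    "begin_waitres_time": t0 + 60_000,
    "begin_run_time": t0 + 90_000,
    "finish_time": t0 + 390_000,
}
print(task_phase_seconds(task))
# {'wait_for_schedule_time': 60.0, 'wait_for_resources': 30.0, 'running': 300.0}
```

A long wait_for_resources phase typically points to a congested resource group, which you can look up through resgroup_id.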
Scheduling node relationships: raw_v_schedule_node_relation
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
child_node_id | bigint | The descendant node ID. |
parent_node_id | bigint | The ancestor node ID. |
step_type | bigint | The dependency type. Valid values: |
child_flow_id | bigint | The workflow ID. |
project_env | string | The environment identifier. |
create_time | bigint | The creation time. The value is a 13-digit timestamp. |
create_user | string | The creator. |
modify_time | bigint | The last modification time. The value is a 13-digit timestamp. |
modify_user | string | The modifier. |
biz_date | string | The data timestamp. |
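Each row of raw_v_schedule_node_relation links a parent node to a child node, so you can find every node that is affected if a node fails. The following minimal Python sketch does a breadth-first search from parent to child. It assumes that the rows are dictionaries keyed by field name and already filtered to a single project_env value so that dependencies from different environments are not mixed. The function name and sample rows are hypothetical.

```python
from collections import defaultdict, deque


def downstream_nodes(relations, node_id):
    """Return the IDs of all nodes that depend, directly or transitively, on node_id."""
    children = defaultdict(set)
    for r in relations:
        children[r["parent_node_id"]].add(r["child_node_id"])
    seen = set()
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:  # Also prevents loops on cyclic data.
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical dependencies for illustration only.
relations = [
    {"parent_node_id": 1, "child_node_id": 2},
    {"parent_node_id": 2, "child_node_id": 3},
    {"parent_node_id": 1, "child_node_id": 4},
    {"parent_node_id": 5, "child_node_id": 6},
]
print(sorted(downstream_nodes(relations, 1)))  # [2, 3, 4]
```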
Scheduling task instance relationships: raw_v_schedule_task_relation
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
child_task_id | bigint | The instance ID of the downstream task. |
parent_task_id | bigint | The instance ID of the upstream task. |
child_project_id | bigint | The ID of the workspace to which the instance of the downstream task belongs. |
parent_project_id | bigint | The ID of the workspace to which the instance of the upstream task belongs. |
step_type | bigint | The dependency type. |
daily_dag_id | bigint | The ID of the global DAG. |
child_dag_inst_id | bigint | The ID of the local DAG. |
biz_date | string | The data timestamp. |
Resource group for Data Integration details: raw_v_schedule_di_resgroup
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
project_id | bigint | The workspace ID. |
node_id | bigint | The node ID. |
project_env | string | The environment of the workspace. |
res_group_identifier | string | The ID of the resource group for Data Integration. |
src_type | string | The type of the source. |
dst_type | string | The type of the destination. |
src_datasource | string | The source. |
dst_datasource | string | The destination. |
config_concurrent | bigint | The number of parallel threads. |
biz_date | string | The data timestamp. |
Information about resource groups in a tenant (including resource groups for scheduling, resource groups for Data Integration, and MaxCompute quota groups): raw_v_tenant_res_group
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
res_group_id | bigint | The resource group ID. |
res_group_identifier | string | The identifier of the resource group. |
res_group_type | bigint | The type of the resource group. Valid values: |
res_group_mode | bigint | The mode of the resource group. |
status | bigint | The status of the resource group. Valid values: |
biz_ext_key | string | The extension field of the resource group. A value of single indicates an exclusive resource group. |
biz_date | string | The data timestamp. |
Information about users in a tenant: raw_v_tenant_user
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
yun_account | string | The Alibaba Cloud account. |
account_name | string | The name of the account. |
nick | string | The display name of the account. |
full_yun_account | string | The Alibaba Cloud account that contains the account provider information. |
biz_date | string | The data timestamp. |
Information about workspaces in a tenant: raw_v_tenant_workspace
Field name | Data type | Description |
tenant_id | bigint | The tenant ID. |
project_id | bigint | The workspace ID. |
project_name | string | The name of the workspace. |
project_identifier | string | The identifier of the workspace. |
project_desc | string | The description of the workspace. |
project_owner | string | The owner of the workspace. |
status | bigint | The status of the workspace. Valid values: |
biz_date | string | The data timestamp. |
Information about users in a workspace in a tenant: raw_v_tenant_workspace_user
Field name | Data type | Description |
tenant_id | bigint | The DataWorks tenant ID. |
project_id | bigint | The DataWorks workspace ID. |
base_id | string | The base ID of the user. |
status | bigint | The status of the user. Valid values: |
gmt_create_ts | bigint | The creation time. The value is a 13-digit timestamp. |
gmt_modified_ts | bigint | The modification time. The value is a 13-digit timestamp. |
biz_date | string | The data timestamp. |