DataWorks Open Data provides tables and views across multiple dimensions for collecting metadata. This topic lists the available tables and views in Open Data and describes their structure.
DataWorks has introduced a new Open Data feature that lets you browse and manage metadata through a visual interface. This document is being deprecated, and we recommend that you use the official version instead. For details on the table structure, see Table structure details of open data.
By default, only metadata from the MaxCompute engine is included.
- Meta metadata
- RPT metrics
- Raw details
- Scheduling metadata
- Tenant metadata
Core table metrics: rpt_v_meta_ind_table_core
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, the value is |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| table_uuid | string | The table's unique identifier. |
| owner_yun_acct | string | The table owner's Alibaba Cloud account. |
| dim_life_cycle | bigint | The table's time to live (TTL), in days. |
| is_partition_table | boolean | Whether the table is a partitioned table. |
| entity_type | bigint | The entity type. Valid values: |
| categories | string | The categories assigned to the table. |
| last_access_time | bigint | The last time the table was accessed, specified as a 10-digit UNIX timestamp. |
| size | bigint | The table's logical storage size in bytes. This field is |
| column_count | bigint | The number of fields in the table, including partition key columns. |
| partition_count | bigint | The number of partitions in the table. This value is |
| detail_view_count | bigint | The number of times the table's details page has been viewed. |
| favorite_count | bigint | The number of times the table has been favorited. |
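The views in this topic mix two timestamp widths: last_access_time is a 10-digit UNIX timestamp, while most other time fields are 13-digit millisecond timestamps. The following is a minimal sketch of a helper that normalizes both. The function name `to_datetime` is hypothetical, and the sketch assumes that a 10-digit value counts seconds and that UTC is the desired output time zone.

```python
from datetime import datetime, timezone


def to_datetime(ts: int) -> datetime:
    """Convert a 10-digit (seconds) or 13-digit (milliseconds) Unix timestamp to UTC.

    Assumption: the digit count alone tells the two units apart.
    """
    digits = len(str(abs(ts)))
    if digits == 13:
        seconds = ts / 1000  # milliseconds to seconds
    elif digits == 10:
        seconds = ts
    else:
        raise ValueError(f"expected a 10- or 13-digit timestamp, got {ts}")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(to_datetime(1700000000))     # e.g. a last_access_time value
    print(to_datetime(1700000000000))  # e.g. a created_time_ts value
```

Rejecting any other digit count keeps a mislabeled column from silently producing a date decades off.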
Table additional metrics: rpt_v_meta_ind_table_extra
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| table_uuid | string | The unique identifier for the table. |
| read_count | bigint | The number of SQL read operations, including those from non-scheduled tasks. |
| read_count_30d | bigint | The number of SQL read operations in the last 30 days, including those from non-scheduled tasks. |
| write_count | bigint | The number of SQL write operations, including those from non-scheduled tasks. |
| join_count | bigint | The number of join operations that include this table. |
| direct_upstream_count | bigint | The number of direct upstream tables in the lineage. |
| direct_downstream_count | bigint | The number of direct downstream tables in the lineage. |
| output_task_count | bigint | The number of tasks for which this table is an output. |
Database (MaxCompute project) metadata: raw_v_meta_database
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| env_type | bigint | The environment type. Valid values: |
| catalog_name | string | The name of the catalog. For a MaxCompute project, the value is |
| database_name | string | The name of the database or MaxCompute project. |
| database_comment | string | The description of the database or MaxCompute project. |
| owner_name | string | The name of the database or MaxCompute project owner. |
| created_time_ts | bigint | The creation time, as a 13-digit Unix timestamp in milliseconds. |
| last_modified_time_ts | bigint | The last modification time, as a 13-digit Unix timestamp in milliseconds. |
| location | string | The storage path of the database. |
| extras | string | Additional database properties, formatted as a JSON string. When table preview and visibility settings are configured for a MaxCompute project, this field contains the |
| biz_date | string | The business date. |
Table metadata: raw_v_meta_table
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The ID of the DataWorks tenant. |
| project_id | string | The ID of the DataWorks workspace. |
| table_uuid | string | The unique identifier of the table. |
| table_name | string | The name of the table. |
| table_type | string | The type of the table. |
| catalog_name | string | The catalog that contains the table. For MaxCompute projects, the value is |
| database_name | string | The name of the database or MaxCompute project. |
| partition_keys | string | The partition keys of the table. For multi-level partitions, keys are separated by a comma ( |
| table_comment | string | The description of the table. |
| table_biz_comment | string | The business description of the table. |
| visibility_scope | bigint | Specifies the table's visibility. Valid values: |
| owner_name | string | The name of the owner. |
| created_time_ts | bigint | The creation time of the table, as a 13-digit millisecond timestamp. |
| last_modified_time_ts | bigint | The last modification time of the table data, as a 13-digit millisecond timestamp. |
| last_meta_modified_time_ts | bigint | The last modification time of the table metadata, as a 13-digit millisecond timestamp. |
| location | string | The storage path of the table. |
| life_cycle | bigint | The lifecycle of the table, in days. |
| data_size | bigint | The logical storage size of the table, in bytes. For a partitioned table, this value is NULL. The total size is the sum of its partition sizes. |
| biz_date | string | The business date of the data. |
View metadata: raw_v_meta_view
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | string | The DataWorks workspace ID. |
| table_uuid | string | The unique identifier for the view. |
| table_name | string | The name of the view. |
| catalog_name | string | The name of the catalog. For MaxCompute workspaces, the value is ODPS. |
| database_name | string | The name of the database or ODPS workspace. |
| table_comment | string | The description of the view. |
| table_biz_comment | string | The business description of the view. |
| visibility_scope | bigint | The visibility scope of the view. Valid values: |
| owner_name | string | The name of the owner. |
| created_time_ts | bigint | The creation time of the view, expressed as a 13-digit Unix timestamp in milliseconds. |
| last_ddl_time_ts | bigint | The time of the view's last Data Definition Language (DDL) modification, expressed as a 13-digit Unix timestamp in milliseconds. |
| view_text | string | The SQL statement used to create the view. |
| biz_date | string | The business date. |
Column metadata: raw_v_meta_column
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| catalog_name | string | The name of the catalog. This field is set to odps for MaxCompute projects. |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| column_name | string | The name of the column. |
| column_comment | string | The description of the column. |
| column_biz_comment | string | The business description of the column. |
| column_type | string | The data type of the column. |
| column_sequence | bigint | The 1-based sequence number of the column. |
| is_partition_key | boolean | Indicates whether the column is a partition key. |
| is_primary_key | boolean | Indicates whether the column is a primary key. |
| biz_date | string | The business date. |
Partition metadata: raw_v_meta_partition
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, the value is ODPS. |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| partition_name | string | The name of the partition. |
| size | bigint | The logical size of the partition, in bytes. |
| record_number | bigint | The number of records in the partition. |
| created_time_ts | bigint | The creation time, as a 13-digit timestamp. |
| last_modified_time_ts | bigint | The last modified time, as a 13-digit timestamp. |
| biz_date | string | The business date. |
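Because data_size in raw_v_meta_table is NULL for partitioned tables, the total size of such a table has to be derived from its partitions. The sketch below sums the size field of raw_v_meta_partition rows per table. It assumes the rows have already been fetched into Python dictionaries keyed by the column names above. The function name `total_size_by_table` and the sample rows are hypothetical.

```python
from collections import defaultdict


def total_size_by_table(partition_rows):
    """Sum partition sizes (bytes) per (database_name, table_name).

    Assumption: each row is a dict using the raw_v_meta_partition column names,
    and a NULL size arrives as None and counts as 0 bytes.
    """
    totals = defaultdict(int)
    for row in partition_rows:
        key = (row["database_name"], row["table_name"])
        totals[key] += row["size"] or 0
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample rows for a single business date.
    rows = [
        {"database_name": "demo_prj", "table_name": "orders", "partition_name": "ds=20240101", "size": 1024},
        {"database_name": "demo_prj", "table_name": "orders", "partition_name": "ds=20240102", "size": 2048},
        {"database_name": "demo_prj", "table_name": "users", "partition_name": "ds=20240101", "size": None},
    ]
    print(total_size_by_table(rows))
```

Restrict the input to a single biz_date before summing. Otherwise the same partition may be counted once per daily snapshot, if the view keeps one snapshot per business date.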
Table lineage: raw_v_meta_table_lineage
- Due to the inherent complexity of SQL and user code, the table lineage feature cannot guarantee 100% completeness or accuracy. Do not use this feature for workloads that require complete and correct lineage.
- Table lineage data includes lineage relationships generated by the MaxCompute engine and Data Integration batch synchronization tasks.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| src_type | string | The type of the source data source. |
| src_data_source_id | string | The ID of the source data source. |
| src_database | string | The source database. |
| src_table | string | The source table. |
| dest_type | string | The type of the destination data source. |
| dest_data_source_id | string | The ID of the destination data source. |
| dest_database | string | The destination database. |
| dest_table | string | The destination table. |
| schedule_task_id | string | The ID of the scheduled task. |
| schedule_instance_id | string | The ID of the scheduled task instance. |
| schedule_task_owner | string | The owner of the scheduled task. |
| job_start_time_ts | bigint | The start time of the job (13-digit Unix timestamp). |
| job_end_time_ts | bigint | The end time of the job (13-digit Unix timestamp). |
| execute_time | bigint | The job execution duration, in seconds. |
| input_record_number | bigint | The number of records read from the source table. |
| biz_date | string | The business date. |
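Each row of raw_v_meta_table_lineage describes one source-to-destination edge, so direct upstream and downstream tables can be derived by grouping the edges. The following is a minimal sketch under the assumption that rows are available as dictionaries keyed by the column names above. The function name `build_lineage` and the sample data are hypothetical, and the sketch does not claim to reproduce how DataWorks computes its own lineage metrics.

```python
from collections import defaultdict


def build_lineage(rows):
    """Return (upstream, downstream) maps keyed by (database, table).

    Sets are used so repeated edges collapse into one, for example when
    the same task writes the same table on several runs.
    """
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for row in rows:
        src = (row["src_database"], row["src_table"])
        dest = (row["dest_database"], row["dest_table"])
        upstream[dest].add(src)
        downstream[src].add(dest)
    return dict(upstream), dict(downstream)


if __name__ == "__main__":
    # Hypothetical edges: ods_orders -> dwd_orders -> ads_report
    rows = [
        {"src_database": "prj", "src_table": "ods_orders", "dest_database": "prj", "dest_table": "dwd_orders"},
        {"src_database": "prj", "src_table": "ods_orders", "dest_database": "prj", "dest_table": "dwd_orders"},
        {"src_database": "prj", "src_table": "dwd_orders", "dest_database": "prj", "dest_table": "ads_report"},
    ]
    upstream, downstream = build_lineage(rows)
    print(upstream[("prj", "dwd_orders")])
    print(downstream[("prj", "dwd_orders")])
```

Given the completeness caveat above, treat the result as a lower bound on the real dependencies rather than an exhaustive list.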
Table output task metadata: raw_v_meta_table_output
While the Data Map page shows output tasks for ODPS tables only, this view includes all table types supported by data lineage.
Output information is derived from lineage.
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The ID of the DataWorks workspace where the scheduled task runs. |
| type | string | The type of the data source. |
| data_source_id | string | The unique ID of the data source. |
| database | string | The name of the database. |
| table | string | The name of the table. |
| schedule_task_id | string | The unique ID of the scheduled task. |
| schedule_instance_id | string | The instance ID of the scheduled task. |
| schedule_task_owner | string | The owner of the scheduled task. |
| job_start_time_ts | bigint | The task start time, as a 13-digit Unix timestamp in milliseconds. |
| job_end_time_ts | bigint | The task end time, as a 13-digit Unix timestamp in milliseconds. |
| execute_time | bigint | The execution time in seconds. |
| biz_date | string | The business date. |
Table usage metadata: raw_v_meta_table_usage
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The ID of the DataWorks workspace where the scheduled task runs. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, this field is set to odps. |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| schedule_task_id | string | The scheduled task ID. |
| schedule_task_owner | string | The owner of the scheduled task. This field is NULL if the task is not scheduled by DataWorks. |
| job_id | string | The task identifier. This is not always the ID of a DataWorks scheduled task instance. You can use this field to count table read and write operations. |
| op_type | string | The operation type. Valid values are |
| extras | string | Additional information formatted as a JSON string. If a MaxCompute task performs the table operation, you can get the task name from the task_name key. If schedule_task_id is not null, you can get the scheduled task name from the schedule_task_name key. For example: |
| biz_date | string | The business date. |
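The extras and job_id fields of raw_v_meta_table_usage can be combined to answer "which tasks touched this table, and how often". The sketch below parses the task_name and schedule_task_name keys from extras and counts distinct job_id values per table and operation type. Counting distinct job_id values is one possible interpretation of "count table read and write operations", not a documented formula. The function names are hypothetical, and the op_type strings "read" and "write" in the sample are placeholders, because the valid values are not listed in this topic.

```python
import json
from collections import defaultdict


def task_names(extras):
    """Return (task_name, schedule_task_name) from the extras JSON string.

    Either value is None when the key is absent or extras is empty.
    """
    if not extras:
        return None, None
    data = json.loads(extras)
    return data.get("task_name"), data.get("schedule_task_name")


def count_operations(rows):
    """Count distinct job_id values per (database_name, table_name, op_type)."""
    jobs = defaultdict(set)
    for row in rows:
        key = (row["database_name"], row["table_name"], row["op_type"])
        jobs[key].add(row["job_id"])
    return {key: len(ids) for key, ids in jobs.items()}


if __name__ == "__main__":
    # Hypothetical rows; "read" and "write" are placeholder op_type values.
    rows = [
        {"database_name": "prj", "table_name": "orders", "op_type": "read", "job_id": "j1",
         "extras": '{"task_name": "console_query"}'},
        {"database_name": "prj", "table_name": "orders", "op_type": "read", "job_id": "j2",
         "extras": '{"task_name": "t2", "schedule_task_name": "daily_report"}'},
        {"database_name": "prj", "table_name": "orders", "op_type": "write", "job_id": "j3",
         "extras": None},
    ]
    print(count_operations(rows))
    print([task_names(r["extras"]) for r in rows])
```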
Column usage metadata: raw_v_meta_column_usage
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The ID of the DataWorks tenant. |
| project_id | bigint | The ID of the DataWorks workspace in which the scheduled task runs. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, the value is |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| column_name | string | The field name. |
| schedule_task_id | string | The ID of the scheduled task. |
| schedule_task_owner | string | The owner of the scheduled task. This is NULL if the task is not scheduled by DataWorks. |
| inst_id | string | The task identifier. This is not necessarily the instance ID of a DataWorks scheduled task. |
| op_type | string | The type of operation, such as |
| extras | string | Additional information, stored as a JSON string. If a MaxCompute task performs the table operation, you can get the task name from the task_name key. If schedule_task_id is not null, you can get the scheduled task name from the schedule_task_name key. For example: |
| biz_date | string | The data timestamp. |
Table wiki metadata: raw_v_meta_biz_table_wiki
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The ID of the DataWorks workspace where scheduled tasks run. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, the value is odps. |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| version | string | The version number of the wiki. |
| operator | string | The user who last modified the wiki, possibly a previous table owner. |
| content | string | The wiki content in Markdown format. |
| update_time_ts | bigint | The last modified time of the wiki, as a 13-digit Unix timestamp in milliseconds. |
| biz_date | string | The business date. |
Table join metadata: raw_v_meta_table_join_map
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| catalog_name | string | The name of the catalog. For MaxCompute projects, the value is |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| column_name | string | The name of the column. |
| join_database_name | string | The name of the joined database or MaxCompute project. |
| join_table_name | string | The name of the joined table. |
| join_column_name | string | The name of the joined column. |
| join_type | string | The type of the JOIN operation. Valid values include |
| schedule_task_id | string | The scheduled task ID. |
| schedule_task_owner | string | The owner of the scheduled task. |
| job_id | string | The identifier of the task at the engine layer. |
| extras | string | Additional information, stored as a JSON string. For example, when a MaxCompute task operates on the table, you can use the |
| biz_date | string | The business date. |
Table detail view: raw_v_meta_table_detail_log
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| catalog_name | string | The name of the catalog. This field is set to ODPS for MaxCompute projects. |
| database_name | string | The name of the database or MaxCompute project. |
| table_name | string | The name of the table. |
| operator | string | The user who viewed the table details. |
| view_time_ts | bigint | The time the table details were viewed, a 13-digit Unix timestamp in milliseconds. |
| biz_date | string | The business date. |
Metadata category: raw_v_meta_category
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| category_id | bigint | The category ID. |
| category_name | string | The category name. |
| category_pid | bigint | The parent category ID. A value of 0 or NULL indicates a level-1 category. |
| depth | bigint | The category level (depth) in the hierarchy. A value of 1 indicates a level-1 category. |
| sort_field | double | The sort field. |
| creator_account | string | The creator account. |
| created_time_ts | bigint | The creation timestamp, a 13-digit Unix timestamp in milliseconds. |
| last_modified_time_ts | bigint | The last modification timestamp, a 13-digit Unix timestamp in milliseconds. |
| biz_date | string | The business date. |
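The category_pid column makes raw_v_meta_category a self-referencing hierarchy, where 0 or NULL marks a level-1 category. The sketch below resolves each category to its full path by walking parent links. It assumes rows are dictionaries keyed by the column names above. The function name `category_paths`, the "/" separator, and the sample categories are hypothetical.

```python
def category_paths(rows):
    """Map each category_id to a 'Level1/Level2/...' path of category names."""
    by_id = {row["category_id"]: row for row in rows}

    def path(category_id):
        names = []
        seen = set()  # guards against accidental cycles in the data
        while category_id in by_id and category_id not in seen:
            seen.add(category_id)
            row = by_id[category_id]
            names.append(row["category_name"])
            parent = row["category_pid"]
            if not parent:  # 0 or None (NULL) marks a level-1 category
                break
            category_id = parent
        return "/".join(reversed(names))

    return {category_id: path(category_id) for category_id in by_id}


if __name__ == "__main__":
    # Hypothetical categories.
    rows = [
        {"category_id": 1, "category_name": "Sales", "category_pid": 0},
        {"category_id": 2, "category_name": "Orders", "category_pid": 1},
        {"category_id": 3, "category_name": "Refunds", "category_pid": 2},
        {"category_id": 4, "category_name": "Finance", "category_pid": None},
    ]
    print(category_paths(rows))
```

The number of path segments should match the depth column, which gives a cheap consistency check on the result.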
Scheduling node: raw_v_schedule_node
| Parameter | Type | Description |
| --- | --- | --- |
|  |  | The tenant ID. |
|  |  | The workspace ID. |
|  |  | The node ID. |
|  |  | The node name. |
|  |  | The node scheduling type. Valid values: |
|  |  | The node type. For more information, see Supported node types. |
|  |  | The workflow ID. |
|  |  | The environment type. Valid values: |
|  |  | The time the node was created, specified as a 13-digit timestamp. |
|  |  | The creator. |
|  |  | The time the node was last modified, specified as a 13-digit timestamp. |
|  |  | The user who last modified the node. |
|  |  | The node type name. |
|  |  | The execution parameters. |
|  |  | The corresponding file ID. |
|  |  | The corresponding file version. |
|  |  | The node owner. |
|  |  | The resource group ID. |
|  |  | The baseline ID. |
|  |  | Specifies the scheduling cycle. Valid values: |
|  |  | The rerun policy. Valid values: |
|  |  | The data source connection string. |
|  |  | Specifies if the node is associated with a Data Quality rule. Valid values: |
|  |  | The filter expression for the associated Data Quality rule. |
|  |  | The maximum number of rerun attempts. |
|  |  | The rerun interval, in milliseconds. |
| cron_express |  | The CRON expression that defines the node's schedule. |
|  |  | The task priority. Valid values: 1, 3, 5, 7, and 8. A larger value indicates a higher priority. |
|  |  | The date when the node's schedule becomes active, specified as a 13-digit timestamp. |
|  |  | The date when the node's schedule becomes inactive, specified as a 13-digit timestamp. |
|  |  | The business date to which the data pertains. |
Scheduling task: raw_v_schedule_task
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| project_id | bigint | The workspace ID. |
| node_id | bigint | The node ID. |
| node_name | string | The node name. |
| task_id | bigint | The task ID. |
| dag_id | bigint | The DAG ID. |
| task_type | bigint | The task scheduling type. Valid values: |
| dag_type | bigint | The DAG type. Valid values: |
| prg_type | bigint | The node type. For more information, see Supported node types. |
| flow_id | bigint | The workflow ID. |
| create_time | bigint | The creation time, as a 13-digit Unix timestamp in milliseconds. |
| modify_time | bigint | The last modification time, as a 13-digit Unix timestamp in milliseconds. |
| cycle_time | bigint | The scheduled run time, as a 13-digit Unix timestamp in milliseconds. |
| in_group_id | bigint | The serial number of the task within its group. |
| prg_name | string | The node type name. |
| para_value | string | The task's execution parameters. |
| file_id | bigint | The ID of the file associated with the node. |
| file_version | bigint | The version of the file associated with the node. |
| owner | string | The node owner. |
| resgroup_id | bigint | The resource group ID. |
| baseline_id | bigint | The baseline ID. |
| cycle_type | bigint | The scheduling cycle. Valid values: |
| repeatable | bigint | The rerun mode. Valid values: |
| connection | string | The data source connection string. |
| dqc_type | bigint | Indicates whether the task is associated with a Data Quality rule. Valid values: |
| dqc_description | string | The Data Quality rule string. |
| task_rerun_time | bigint | The maximum number of times the task can be rerun. |
| task_rerun_interval | bigint | The interval between reruns, in milliseconds. |
| begin_waittime_time | bigint | The time the task began waiting for its schedule, as a 13-digit Unix timestamp in milliseconds. |
| finish_time | bigint | The time the task completed, as a 13-digit Unix timestamp in milliseconds. |
| begin_waitres_time | bigint | The time the task began waiting for resources, as a 13-digit Unix timestamp in milliseconds. |
| begin_run_time | bigint | The time the task began running, as a 13-digit Unix timestamp in milliseconds. |
| rerun_times | bigint | The number of times the task was rerun. |
| priority | bigint | The task priority. Valid values: 1, 3, 5, 7, and 8. A larger value indicates a higher priority. |
| task_key | string | The task's unique identifier. |
| error_msg | string | The error message if the task failed. |
| status | bigint | The task status. Valid values: |
| biz_date | string | The business date. |
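The four phase timestamps in raw_v_schedule_task (begin_waittime_time, begin_waitres_time, begin_run_time, and finish_time) can be turned into per-phase durations. The sketch below assumes that a task moves through the phases in that order, which this topic does not state explicitly, and that a phase the task never reached has a NULL (None) timestamp. The function name `task_durations` and the result keys are hypothetical.

```python
def task_durations(task):
    """Return per-phase durations in seconds for one raw_v_schedule_task row.

    Assumption: phases run in the order wait-for-schedule, wait-for-resources,
    run, finish. A phase with a missing boundary timestamp yields None.
    """
    def seconds_between(start_key, end_key):
        start, end = task.get(start_key), task.get(end_key)
        if start is None or end is None:
            return None
        return (end - start) / 1000  # timestamps are in milliseconds

    return {
        "wait_schedule_s": seconds_between("begin_waittime_time", "begin_waitres_time"),
        "wait_resource_s": seconds_between("begin_waitres_time", "begin_run_time"),
        "run_s": seconds_between("begin_run_time", "finish_time"),
    }


if __name__ == "__main__":
    # Hypothetical task row containing only the timing columns.
    task = {
        "begin_waittime_time": 1700000000000,
        "begin_waitres_time": 1700000060000,
        "begin_run_time": 1700000075000,
        "finish_time": 1700000375000,
    }
    print(task_durations(task))
```

A long wait_resource_s relative to run_s may point to resource group contention rather than slow task logic.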
Scheduling node relationships: raw_v_schedule_node_relation
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| child_node_id | bigint | The downstream node ID. |
| parent_node_id | bigint | The upstream node ID. |
| step_type | bigint | The dependency type. Valid values: |
| child_flow_id | bigint | The workflow ID. |
| project_env | string | The environment type. Valid values: |
| create_time | bigint | The creation timestamp, a 13-digit Unix timestamp in milliseconds. |
| create_user | string | The creator. |
| modify_time | bigint | The last modification time, a 13-digit Unix timestamp in milliseconds. |
| modify_user | string | The modifier. |
| biz_date | string | The business date. |
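The parent_node_id and child_node_id pairs in raw_v_schedule_node_relation form a dependency graph, so the full set of nodes affected by a change to one node can be found with a breadth-first traversal. The following is a minimal sketch that assumes rows are dictionaries keyed by the column names above and already filtered to a single environment and business date. The function name `all_downstream` and the sample node IDs are hypothetical.

```python
from collections import defaultdict, deque


def all_downstream(rows, start_node_id):
    """Return every node reachable downstream of start_node_id (direct or indirect)."""
    children = defaultdict(set)
    for row in rows:
        children[row["parent_node_id"]].add(row["child_node_id"])

    seen = set()
    queue = deque([start_node_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            # Skip nodes already visited so cyclic data cannot loop forever.
            if child not in seen and child != start_node_id:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Hypothetical dependencies: 100 -> 200 -> 300, and 100 -> 400
    rows = [
        {"parent_node_id": 100, "child_node_id": 200},
        {"parent_node_id": 200, "child_node_id": 300},
        {"parent_node_id": 100, "child_node_id": 400},
    ]
    print(sorted(all_downstream(rows, 100)))
```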
Scheduling task instance relationships: raw_v_schedule_task_relation
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | Tenant ID. |
| child_task_id | bigint | Instance ID of the downstream task. |
| parent_task_id | bigint | Instance ID of the upstream task. |
| child_project_id | bigint | Workspace ID for the downstream task instance. |
| parent_project_id | bigint | Workspace ID for the upstream task instance. |
| step_type | bigint | Dependency type. Valid values: |
| daily_dag_id | bigint | ID of the global DAG. |
| child_dag_inst_id | bigint | ID of the local DAG. |
| biz_date | string | Business date. |
Schedule data integration resource group: raw_v_schedule_di_resgroup
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| project_id | bigint | The workspace ID. |
| node_id | bigint | The node ID. |
| project_env | string | The workspace environment. |
| res_group_identifier | string | The Data Integration resource group identifier. |
| src_type | string | The source data source type. |
| dst_type | string | The destination data source type. |
| src_datasource | string | The source data source. |
| dst_datasource | string | The destination data source. |
| config_concurrent | bigint | The concurrency level. |
| biz_date | string | The business date. |
Tenant resource groups: raw_v_tenant_res_group
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| res_group_id | bigint | The resource group ID. |
| res_group_identifier | string | The resource group identifier. |
| res_group_type | bigint | The type of the resource group. Valid values: |
| res_group_mode | bigint | The billing method of the resource group. Valid values: |
| status | bigint | The status of the resource group. Valid values: |
| biz_ext_key | string | An extension field for the resource group. A value of 'single' specifies an exclusive resource group. |
| biz_date | string | The data timestamp. |
Tenant user information: raw_v_tenant_user
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| yun_account | string | Alibaba Cloud account |
| account_name | string | The account name. |
| nick | string | The display name. |
| full_yun_account | string | The complete Alibaba Cloud account name, including provider details. |
| biz_date | string | The business date. |
Tenant workspace: raw_v_tenant_workspace
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The tenant ID. |
| project_id | bigint | The ID of the workspace. |
| project_name | string | The workspace name. |
| project_identifier | string | The workspace identifier. |
| project_desc | string | The workspace description. |
| project_owner | string | The workspace owner. |
| status | bigint | The workspace status. Valid values: |
| biz_date | string | The business date. |
Tenant workspace users: raw_v_tenant_workspace_user
| Parameter | Type | Description |
| --- | --- | --- |
| tenant_id | bigint | The DataWorks tenant ID. |
| project_id | bigint | The DataWorks workspace ID. |
| base_id | string | The user's base ID. |
| status | bigint | The user status. Valid values: |
| gmt_create_ts | bigint | The creation time, a 13-digit Unix timestamp in milliseconds. |
| gmt_modified_ts | bigint | The modification time, a 13-digit Unix timestamp in milliseconds. |
| biz_date | string | The business date. |