After the data opening package of DataWorks is installed, you can use the data opening feature to collect metadata in DataWorks and use the metadata for subsequent data governance and O&M. This topic describes the scenarios of the data opening feature and commands required to use the feature.
Prerequisites
The data opening package is installed. For more information, see Install and remove the data opening package.
Instructions
The following sections describe the commands that are used to obtain various types of metadata from a MaxCompute node of DataWorks. Before you use these commands, go to the code editing page of the MaxCompute node.
View the created databases in a MaxCompute project
SELECT * FROM u_meta_hangzhou.raw_v_meta_database_Version('Business date');
In the preceding command:- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
-- Query the databases created on January 9, 2021.
SELECT * FROM u_meta_hangzhou.raw_v_meta_database_v1_1('20210109');
The following figure shows the query result.For more information about the fields in the query result, see Metrics in the raw_v_meta_database table.
View the tables in a MaxCompute project
SELECT *
FROM u_meta_hangzhou.raw_v_meta_table_Version('Business date')
WHERE catalog_name = 'your_catalog_name'
AND database_name = 'your_database_name'
AND table_name = 'your_table_name'
;
In the preceding command:- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_catalog_name: specifies the computing engine whose metadata you want to view. Set the value to odps.
- your_database_name: specifies the name of the project in which you want to view metadata information. You can specify this parameter based on your business requirements.
- your_table_name: specifies the name of the table whose metadata information you want to view. You can specify this parameter based on your business requirements.
-- Query the data of the ods_user_info_d table under the MaxCompute project isv2 on January 9, 2021.
SELECT *
FROM u_meta_hangzhou.raw_v_meta_table_v1_1('20210109')
WHERE catalog_name = 'odps'
AND database_name = 'isv2'
AND table_name = 'ods_user_info_d'
;
The following figure shows the query result.For more information about the fields in the query result, see Metrics in the raw_v_meta_table table.
View the statistical information of a table
SELECT c.tenant_id, c.table_uuid, c.dim_life_cycle, c.is_partition_table, c.entity_type, c.categories, c.last_access_time, c.partition_count, c.favorite_count, e.output_task_count
FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_Version('Business date') c
LEFT OUTER JOIN u_meta_hangzhou.rpt_v_meta_ind_table_extra_Version('Business date') e
ON c.table_uuid = e.table_uuid AND c.tenant_id = e.tenant_id
WHERE c.catalog_name = 'your_catalog_name'
AND c.database_name = 'your_database_name'
AND c.table_name = 'your_table_name'
;
In the preceding command:- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_catalog_name: specifies the computing engine whose metadata you want to view. Set the value to odps.
- your_database_name: specifies the name of the project in which you want to view metadata information. You can specify this parameter based on your business requirements.
- your_table_name: specifies the name of the table whose metadata information you want to view. You can specify this parameter based on your business requirements.
-- Query the statistical information of the ods_user_info_d table under the MaxCompute project isv2 on January 9, 2021.
SELECT c.tenant_id, c.table_uuid, c.dim_life_cycle, c.is_partition_table, c.entity_type, c.categories, c.last_access_time, c.partition_count, c.favorite_count, e.output_task_count
FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_1('20210109') c
LEFT OUTER JOIN u_meta_hangzhou.rpt_v_meta_ind_table_extra_v1_1('20210109') e
ON c.table_uuid = e.table_uuid AND c.tenant_id = e.tenant_id
WHERE c.catalog_name = 'odps'
AND c.database_name = 'isv2'
AND c.table_name = 'ods_user_info_d'
;
The following figure shows the query result.For more information about the fields in the query result, see Core metrics in the rpt_v_meta_ind_table_core table and Additional metrics in the rpt_v_meta_ind_table_extra table.
View the node details of an output table
SELECT s.*, o.schedule_instance_id, execute_time
FROM u_meta_hangzhou.raw_v_meta_table_output_Version('Business date') o
LEFT OUTER JOIN u_meta_hangzhou.raw_v_schedule_node_Version('Business date') s
ON o.schedule_task_id = s.node_id
WHERE o.type = 'your_table_type'
AND o.database = 'your_database_name'
AND o.table = 'your_table_name'
AND s.project_env = 'your_project_environment'
;
In the preceding command:- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_table_type: specifies the type of the output table whose metadata you want to view. You can specify only the MaxCompute type. Set the value to odps.
- your_database_name: specifies the name of the project in which you want to view metadata information. You can specify this parameter based on your business requirements.
- your_table_name: specifies the name of the table whose metadata information you want to view. You can specify this parameter based on your business requirements.
- your_project_environment: specifies the environment of the MaxCompute project whose metadata you want to view. If you want to view the metadata of a MaxCompute project in a development environment, set the value to DEV. If you want to view the metadata of a MaxCompute project in a production environment, set the value to PROD.
-- Query the details of the MaxCompute table ods_user_info_d under the xc_simple_e1 project on January 9, 2021.
SELECT s.*, o.schedule_instance_id, execute_time
FROM u_meta_hangzhou.raw_v_meta_table_output_v1_1('20210109') o
LEFT OUTER JOIN u_meta_hangzhou.raw_v_schedule_node_v1_1('20210109') s
ON o.schedule_task_id = s.node_id
WHERE o.type = 'odps'
AND o.database = 'xc_simple_e1'
AND o.table = 'ods_user_info_d'
AND s.project_env = 'PROD'
;
The following figure shows the query result.For more information about the fields in the query result, see Metrics in the raw_v_meta_table_output table and Metrics in the raw_v_schedule_node table.
View the ancestor and descendant nodes of a node
-- Query the ancestor nodes of a node.
SELECT *
FROM u_meta_hangzhou.raw_v_schedule_node_Version('Business date') t
WHERE t.project_env = 'your_project_environment'
AND t.node_id IN (
SELECT parent_node_id
FROM u_meta_hangzhou.raw_v_schedule_node_relation_Version('Business date') r
WHERE r.child_node_id = your_child_node_id
AND r.project_env = 'your_project_environment'
)
;
-- Query the descendant nodes of a node.
SELECT *
FROM u_meta_hangzhou.raw_v_schedule_node_Version('Business date') t
WHERE t.project_env = 'your_project_environment'
AND t.node_id IN (
SELECT child_node_id
FROM u_meta_hangzhou.raw_v_schedule_node_relation_Version('Business date') r
WHERE r.child_node_id = your_child_node_id
AND r.project_env = 'your_project_environment'
)
;
In the preceding command:- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_project_environment: specifies the environment of the MaxCompute project whose metadata you want to view. If you want to view the metadata of a MaxCompute project in a development environment, set the value to DEV. If you want to view the metadata of a MaxCompute project in a production environment, set the value to PROD.
- your_child_node_id: specifies the ID of the node whose metadata you want to view.
Query the owner details of a table or node
- Query the owner details of a table.
In the preceding command:SELECT c.catalog_name, c.database_name, c.table_name, c.owner_name, u.account_name, u.nick FROM u_meta_hangzhou.raw_v_meta_table_Version('Business date') c LEFT OUTER JOIN u_meta_hangzhou.raw_v_tenant_user_Version('Business date') u ON c.owner_name = TOLOWER(u.yun_account) WHERE c.catalog_name = 'your_catalog_name' AND c.database_name = 'your_database_name' AND c.table_name = 'your_table_name' ;
- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_catalog_name: specifies the computing engine whose metadata you want to view. Set the value to odps.
- your_database_name: specifies the name of the project in which you want to view metadata information. You can specify this parameter based on your business requirements.
- your_table_name: specifies the name of the table whose metadata information you want to view. You can specify this parameter based on your business requirements.
- Query the owner details of a node.
In the preceding command:SELECT t.project_id, t.node_id, t.node_name, t.create_user, u.account_name AS create_user_name, u.nick as create_user_nick, t.modify_user, m.account_name AS modify_user_name, m.nick as modify_user_nick FROM u_meta_hangzhou.raw_v_schedule_node_Version('Business date') t LEFT OUTER JOIN u_meta_hangzhou.raw_v_tenant_user_Version('Business date') u ON t.create_user = u.yun_account LEFT OUTER JOIN u_meta_hangzhou.raw_v_tenant_user_Version('Business date') m ON t.modify_user = m.yun_account WHERE t.node_id = your_node_id AND t.project_env = 'your_project_environment' ;
- Version: specifies the version of the data opening package that you install and use. The format of the version is similar to v1_1. You can run the DESCRIBE PACKAGE command to query the actual version of the data opening package that you use. For more information, see View the tables or views provided by the data opening package.
- Business date: represents the specific business date on which you want to view the metadata information. The date is presented in the yyyymmdd format. If you do not specify a specific business date, the metadata information on all business dates is queried.
- your_project_environment: specifies the environment of the MaxCompute project whose metadata you want to view. If you want to view the metadata of a MaxCompute project in a development environment, set the value to DEV. If you want to view the metadata of a MaxCompute project in a production environment, set the value to PROD.
- your_node_id: specifies the ID of the node whose metadata information you want to view.
The following code provides command examples:
What to do next
The views provided by the data opening feature support date parameters in the yyyyMMdd format. You can create partitions based on dates and query historical data over the last 30 days. If you have requirements such as metric trend analysis, you can query data by business date. Then, you can save the data to a project and use the data as the historical data required to perform metric analysis.