DataWorks provides the Open Data feature. Before you can use this feature to collect metadata, you must install the Open Data package in your project. This topic describes how to install the Open Data package and view the installation result.
Recommendation
The DataWorks Open Data feature has moved from an invitational preview to an official release. The official version provides a visual interface that helps you easily browse and manage metadata. This document will be deprecated soon. We recommend that you use the official version. For more information, see Manage open data.
Limits
The Open Data feature is available only in DataWorks Enterprise Edition. This version only supports installing the Open Data package using the command line.
ImportantDataWorks has released a new Open Data feature. This feature lets you efficiently view and manage metadata using a visual interface. For more information, see Manage open data.
The Open Data feature can be used only with the MaxCompute compute engine. You cannot use this feature to collect metadata from other compute engines.
Install the Open Data package
Obtain permissions.
Before you install the Open Data package, you must obtain the required authorization. Submit a ticket and provide your project information to DataWorks technical support. After you are granted authorization, you can install the package in your project and use the Open Data feature to collect metadata.
NoteWhen you submit a ticket, provide your Alibaba Cloud UID, workspace name, workspace ID, and the region where the workspace resides.
The project must meet the following requirements:
The project must have a robust permission management mechanism to prevent data breaches that can be caused by sharing the collected metadata without restrictions.
The project must be shareable within your company or team. This allows the collected metadata to be shared among different data development teams.
Install the Open Data package as the project owner.
This example shows how to install the package in DataStudio. Log on to the DataStudio page of the authorized project and run the installation commands on an ODPS node.
NoteIf the authorized project is in standard mode, you must install the Open Data package in both the development and production environments. In standard mode, the development and production environments are isolated.
In the installation commands, replace the u_meta project name and the Open Data package name based on the region where you install the package. The following example shows how to install the Open Data package in the China (Hangzhou) region.
-- Install the Open Data package for the China (Hangzhou) region in the development environment (work_test_2_dev). INSTALL PACKAGE u_meta_hangzhou.systables; -- Install the Open Data package for the China (Hangzhou) region in the production environment (work_test_2). USE work_test_2; INSTALL PACKAGE u_meta_hangzhou.systables;Parameters:
u_meta_hangzhou: The name of the u_meta project for the Open Data package in the China (Hangzhou) region.
systables: The name of the Open Data package. After installation, you can use tables or views to collect metadata of different dimensions, such as partition metadata and table lineage metadata.
Alibaba Cloud provides Open Data packages for multiple regions. The u_meta project name varies by region. For more information about the u_meta project names for each region, see Appendix 2: List of open projects. When you use the Open Data feature in a region, you must install the corresponding Open Data package and replace the u_meta project name with the actual project name.
View the installation result.
In the list of installed packages, find the package whose PackageName and SourceProject match those from the previous step. If the Status of this package is OK, this means the Open Data package is installed.
-- Check whether the u_meta_hangzhou.systables package is installed in the project. SHOW PACKAGES; -- Sample output +-------------+-----------------+--------------------------+--------+ | PackageName | SourceProject | InstallTime | Status | +-------------+-----------------+--------------------------+--------+ | systables | u_meta_hangzhou | 2020-11-26T15:25:22+0800 | OK | +-------------+-----------------+--------------------------+--------+
View the tables or views provided by the Open Data feature
When you use the Open Data feature, you must enter the full name of the table or view because the names may differ across package versions.
Run the following command to view the list of tables and views provided by the current Open Data feature and obtain their names and permissions.
DESCRIBE PACKAGE u_meta_hangzhou.systables;The following example shows how to run a query command in DataStudio and shows the results.
The names of tables and views in the Open Data package include version information, which changes as features are updated. The version that you use is determined by the version in the systables package. For example, in the raw_v_tenant_user_v1_1 view, v1_1 is the version number.
-- View the tables or views included in the u_meta_hangzhou.systables package.
DESCRIBE PACKAGE u_meta_hangzhou.systables;
-- Sample output
CreateTime: 2020-11-18T20:17:24+0800
PackageName: systables
SourceProject: u_meta_hangzhou
Object List
+------------+-----------------------------------+------------------+
| ObjectType | ObjectName | ObjectPrivileges |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_biz_table_wiki_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_column_usage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_column_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_database_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_partition_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_detail_log_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_join_map_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_lineage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_output_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_usage_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_table_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_meta_view_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_di_resgroup_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_node_relation_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_node_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_schedule_task_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_res_group_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_user_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_workspace_user_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | raw_v_tenant_workspace_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | rpt_v_meta_ind_table_core_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+
| TABLE | rpt_v_meta_ind_table_extra_v1_1 | Describe,Select |
+------------+-----------------------------------+------------------+View the structure and field descriptions of a table or view
Run the following command to view the structure and field descriptions of a specific table or view.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;In the command, rpt_v_meta_ind_table_core_v1_0 is the name of the table or view to query. You can obtain the actual name of the table or view as described in View the tables or views provided by the Open Data feature.
DataWorks provides data of multiple dimensions, such as detailed data, metric data, and dimensional data. You can selectively retrieve and save data to your DataWorks project as needed. This data can be used as historical data for data governance or O&M, which helps you avoid using unnecessary storage space.
The following example shows the command and query result for viewing the rpt_v_meta_ind_table_core view in DataStudio.
-- View the field descriptions of the rpt_v_meta_ind_table_core view.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;
-- Sample output
+------------------------------------------------------------------------------------+
| Owner: ALIYUN$dataworks-datagovernance | Project: u_meta_hangzhou |
| TableComment: Core metrics of the table in the metadata module |
+------------------------------------------------------------------------------------+
| CreateTime: 2020-12-07 20:02:53 |
| LastDDLTime: 2020-12-07 20:02:53 |
| LastModifiedTime: 2020-12-07 20:02:53 |
+------------------------------------------------------------------------------------+
| VirtualView : YES | ViewText: CREATE OR REPLACE VIEW rpt_v_meta_ind_table_core_v1_1 (@param_biz_date STRING)
RETURNS @ret_result TABLE (
tenant_id BIGINT COMMENT 'Dimension: DataWorks tenant ID',
project_id BIGINT COMMENT 'Dimension: DataWorks project (workspace) ID',
catalog_name STRING COMMENT 'Dimension: The catalog to which the table belongs. The value for a MaxCompute project is odps.',
database_name STRING COMMENT 'Dimension: Database or MaxCompute project name',
table_name STRING COMMENT 'Dimension: Table name',
table_uuid STRING COMMENT 'Dimension: Unique identifier of the table',
owner_yun_acct STRING COMMENT 'Dimension: Alibaba Cloud account of the table owner',
dim_life_cycle BIGINT COMMENT 'Dimension: Lifecycle. 0 indicates that no lifecycle is set. Other values indicate the specific lifecycle value.',
is_partition_table BOOLEAN COMMENT 'Dimension: Indicates whether the table is a partitioned table. true indicates that it is a partitioned table.',
entity_type BIGINT COMMENT 'Dimension: Entity type. 0 indicates a table, and 1 indicates a view.',
categories STRING COMMENT 'Dimension: Category information',
last_access_time BIGINT COMMENT 'Dimension: The last time the table was accessed (10-digit UNIX timestamp)',
`size` BIGINT COMMENT 'Table size. This refers to the logical storage value occupied by the data. The storage amount for a view is NULL.',
column_count BIGINT COMMENT 'Number of fields (including partition key columns)',
partition_count BIGINT COMMENT 'Number of partitions. This value is NULL for a non-partitioned table.',
detail_view_count BIGINT COMMENT 'The number of times the table details were viewed on the page',
favorite_count BIGINT COMMENT 'The number of times the table was added to favorites',
biz_date STRING COMMENT 'Business data date'
) COMMENT 'Core metrics of the table in the metadata module' AS
SELECT * FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_proxy(@param_biz_date) |
+------------------------------------------------------------------------------------+
| Native Columns: |
+------------------------------------------------------------------------------------+
| Field | Type | Label | Comment |
+------------------------------------------------------------------------------------+
| tenant_id | bigint | | Dimension: DataWorks tenant ID |
| project_id | bigint | | Dimension: DataWorks project (workspace) ID |
| catalog_name | string | | Dimension: The catalog to which the table belongs. The value for a MaxCompute project is odps. |
| database_name | string | | Dimension: Database or MaxCompute project name |
| table_name | string | | Dimension: Table name |
| table_uuid | string | | Dimension: Unique identifier of the table |
| owner_yun_acct | string | | Dimension: Alibaba Cloud account of the table owner |
| dim_life_cycle | bigint | | Dimension: Lifecycle. 0 indicates that no lifecycle is set. Other values indicate the specific lifecycle value. |
| is_partition_table | boolean | | Dimension: Indicates whether the table is a partitioned table. true indicates that it is a partitioned table. |
| entity_type | bigint | | Dimension: Entity type. 0 indicates a table, and 1 indicates a view. |
| categories | string | | Dimension: Category information |
| last_access_time | bigint | | Dimension: The last time the table was accessed (10-digit UNIX timestamp) |
| size | bigint | | Table size. This refers to the logical storage value occupied by the data. The storage amount for a view is NULL. |
| column_count | bigint | | Number of fields (including partition key columns) |
| partition_count | bigint | | Number of partitions. This value is NULL for a non-partitioned table. |
| detail_view_count | bigint | | The number of times the table details were viewed on the page |
| favorite_count | bigint | | The number of times the table was added to favorites |
| biz_date | string | | Business data date |
+------------------------------------------------------------------------------------+For more information about the structures of tables or views, see Appendix 1: List and structure details of views.
Uninstall the Open Data package
This example shows how to uninstall the Open Data package in DataStudio. The uninstallation command is as follows.
If the authorized project is in standard mode, you must uninstall the Open Data package from both the development and production environments. In standard mode, the development and production environments are isolated.
-- Uninstall from the development project (work_test_2_dev).
UNINSTALL PACKAGE u_meta_hangzhou.systables;
-- Uninstall from the production project (work_test_2).
USE work_test_2;
UNINSTALL PACKAGE u_meta_hangzhou.systables;What to do next
After you install the Open Data package, you can use the Open Data feature to collect metadata for data governance or O&M. For more information about how to use the Open Data feature, see Use the Open Data feature.