DataWorks provides the data opening feature. Before you use the feature to collect metadata, you must install the data opening package in your workspace. This topic describes how to install the data opening package and how to view the installation result after the package is installed.

Limits

  • The data opening feature is in public preview and is available only for DataWorks Enterprise Edition and other advanced editions.
  • The data opening feature can be used only if you select MaxCompute as the computing engine.

Install the data opening package

  1. Obtain the required permissions.
    Before you install the data opening package in a workspace, you must submit a ticket to provide the workspace information to the DataWorks technical support personnel. Then, the technical support personnel can grant the required permissions to the workspace. This way, you can install the data opening package in the workspace and use the data opening feature to collect metadata. Make sure that the workspace meets the following requirements:
    • The workspace has a well-established permission management mechanism. This can prevent data leaks caused by unlimited sharing of the metadata that is collected by using the data opening feature.
    • The workspace can be shared within an enterprise or a team. This way, the metadata that is collected by using the data opening feature can be shared among different data development teams.
  2. Use the account of the workspace owner to install the data opening package.
    In this topic, the data opening package is installed in a workspace on the DataStudio page of the workspace. Go to the DataStudio page of the workspace that is authorized and run the following commands to install the data opening package on the MaxCompute node:
    Note
    • If the authorized workspace is a standard-mode workspace, the data opening package must be installed in both the development environment and production environment. This is because that the production environments of a standard-mode workspace are isolated from the development environments.
    • You can change the names of both the workspace and the data opening package in the commands used to install the data opening package based on your business requirements. The following commands are used to install the data opening package in the workspace that resides in the China (Hangzhou) region:
    -- Install the data opening package in the workspace that resides in the China (Hangzhou) region in a development environment (work_test_2_dev).
    INSTALL PACKAGE u_meta_hangzhou.systables;
    
    -- Install the data opening package in a workspace that resides in the China (Hangzhou) region in a production environment (work_test_2).
    USE work_test_2;
    INSTALL PACKAGE u_meta_hangzhou.systables;
    In the preceding commands:
    • u_meta_hangzhou: specifies the name of the workspace in which you want to install the data opening package. This workspace resides in the China (Hangzhou) region. Alibaba Cloud provides different data opening packages for different regions. The name of the workspace varies based on the region. For more information about the names of the workspaces in different regions, see Appendix 2: Available data opening packages. You can change the name of the workspace based on your business requirements.
    • systables: specifies the name of the data opening package that you want to install in a workspace in the China (Hangzhou) region. Alibaba Cloud provides different data opening packages for different regions. The name of the data opening package varies based on the region. For information about the names of the data opening packages in different regions, see Appendix 2: Available data opening packages. You can change the name of the data opening package based on your business requirements.
  3. View the installation result of the data opening package.
    In the list of the installed data opening packages, find the data opening package that you installed. If the Status of the data opening package is OK, the data opening package is installed.
    -- Check whether the u_meta_hangzhou.systables data opening package is included in the data opening packages installed in the workspace.
    SHOW PACKAGES;
    
    -- Output example
    +-------------+-----------------+--------------------------+--------+
    | PackageName | SourceProject   | InstallTime              | Status |
    +-------------+-----------------+--------------------------+--------+
    | systables   | u_meta_hangzhou | 2020-11-26T15:25:22+0800 | OK     |
    +-------------+-----------------+--------------------------+--------+

View the tables or views provided by the data opening package

In most cases, when you use the data opening feature to query a table or view, you need to specify the full name of the table or view. The table or view name that you need to specify varies based on the data opening packages of different versions. You can run the following command to query the tables and views provided by the installed data opening package and view the names of and permissions on the tables or views:
DESCRIBE PACKAGE u_meta_hangzhou.systables;
In the following example, a command is run in DataStudio of DataWorks to query the tables provided by the installed data opening package, and the query results are also provided.
Note The names of the tables and views provided by the data opening package also contain the version information of the data opening package. The version number of the data opening package changes with the iteration and releases of new features. When you use the data opening feature, the actual version number of the data opening package in the package specified by the systables parameter prevails. For example, in the raw_v_tenant_user_v1_1 view, v1_1 is the version number.
-- View the tables or views contained in the u_meta_hangzhou.systables package.
DESCRIBE PACKAGE u_meta_hangzhou.systables;

-- Output example
CreateTime:         2020-11-18T20:17:24+0800
PackageName:        systables
SourceProject:      u_meta_hangzhou
Object List
+------------+-----------------------------------+------------------+
| ObjectType | ObjectName                        | ObjectPrivileges |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_biz_table_wiki_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_column_usage_v1_1      | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_column_v1_1            | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_database_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_partition_v1_1         | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_detail_log_v1_1  | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_join_map_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_lineage_v1_1     | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_output_v1_1      | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_usage_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_v1_1             | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_view_v1_1              | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_di_resgroup_v1_1   | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_node_relation_v1_1 | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_node_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_task_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_res_group_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_user_v1_1            | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_workspace_user_v1_1  | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_workspace_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | rpt_v_meta_ind_table_core_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | rpt_v_meta_ind_table_extra_v1_1   | Describe,Select  |
+------------+-----------------------------------+------------------+

View the schema of a table or view and its field descriptions

Run the following command to query the schema of a table or view and its field descriptions:
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;
In the preceding command, rpt_v_meta_ind_table_core_v1_0 is the name of the table or view whose schema and field descriptions you want to query. You can obtain the actual table or view name based on the instructions provided in view tables or views provided by the data opening package.
Note DataWorks provides data from multiple dimensions, such as detail data, metric data, and dimensional data. You can obtain and store data to a DataWorks workspace based on your business requirements to save storage space. The data can be used as the historical data for subsequent data governance or O&M.
In the following example, a command is run to query the rpt_v_meta_ind_table_core table in DataStudio of DataWorks, and the query results are also provided.
-- View the descriptions of the fields in the rpt_v_meta_ind_table_core table.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;

-- Output example
+------------------------------------------------------------------------------------+
| Owner: ALIYUN$dataworks-datagovernance | Project: u_meta_hangzhou                             |
| TableComment: core metrics of the table in the metadata module                                                       |
+------------------------------------------------------------------------------------+
| CreateTime:               2020-12-07 20:02:53                                      |
| LastDDLTime:              2020-12-07 20:02:53                                      |
| LastModifiedTime:         2020-12-07 20:02:53                                      |
+------------------------------------------------------------------------------------+
| VirtualView  : YES  | ViewText: CREATE OR REPLACE VIEW rpt_v_meta_ind_table_core_v1_1 (@param_biz_date STRING)
RETURNS @ret_result TABLE (
    tenant_id            BIGINT           COMMENT     'Specifies the ID of the DataWorks tenant.',
    project_id           BIGINT           COMMENT     'Specifies the ID of the DataWorks workspace.',
    catalog_name         STRING           COMMENT     'Specifies the catalog to which the table belongs. The catalog name for a project in MaxCompute is odps.',
    database_name        STRING           COMMENT     'Specifies the name of the database or MaxCompute project.',
    table_name           STRING           COMMENT     'Specifies the name of the table.',
    table_uuid           STRING           COMMENT     'Specifies the ID of the table.',
    owner_yun_acct       STRING           COMMENT     'Specifies the Alibaba Cloud account used by the table owner.',
    dim_life_cycle       BIGINT           COMMENT     'Specifies whether to configure the lifecycle for the table. The value 0 indicates that the lifecycle is not configured for the table, and other values indicate the specific values of the lifecycle.'
    is_partition_table   BOOLEAN          COMMENT     'Specifies whether the table is a partitioned table. The value true indicates that the table is a partitioned table.',
    entity_type          BIGINT           COMMENT     'Specifies the entity type. The value 0 indicates a table, and the value 1 indicates a view.',
    categories           STRING           COMMENT     'Specifies the categories in the table.',
    last_access_time     BIGINT           COMMENT     'Specifies the time at which the table was last accessed. This time is a 10-digit UNIX timestamp.',
    `size`               BIGINT           COMMENT     'Specifies the size of the table, which indicates the logical storage space occupied by data in the table. The volume of data stored in a view is NULL.',
    column_count         BIGINT           COMMENT     'Specifies the number of fields in the table, including the partition key column.',
    partition_count      BIGINT           COMMENT     'Specifies the number of partitions in the table. If the table is a non-partitioned table, the value of this parameter is NULL.',
    detail_view_count    BIGINT           COMMENT     'Specifies the number of times that the details of the table are viewed on the page.',
    favorite_count       BIGINT           COMMENT     'Specifies the number of times that the table is added to favorite.',
    biz_date             STRING           COMMENT     'Specifies the date on which business data was generated.'
) COMMENT 'Core metrics of the table in the metadata module' AS
SELECT * FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_proxy(@param_biz_date) |
+------------------------------------------------------------------------------------+
| Native Columns:                                                                    |
+------------------------------------------------------------------------------------+
| Field           | Type       | Label | Comment                                     |
+------------------------------------------------------------------------------------+
| tenant_id       | bigint     |       | Indicates the ID of the DataWorks tenant.                            |
| project_id      | bigint     |       | Indicates the ID of the DataWorks workspace.                      |
| catalog_name    | string     |       | Indicates the catalog to which the table belongs. The catalog name for a project in MaxCompute is odps.;          |
| database_name   | string     |       | Indicates the name of the database or MaxCompute project.                             |
| table_name      | string     |       | Indicates the name of the table.                                      |
| table_uuid      | string     |       l Indicates the ID of the table.                                    |
| owner_yun_acct  | string     |       | Indicates the Alibaba Cloud account used by the table owner.                                |
| dim_life_cycle  | bigint     |       | Indicates whether the lifecycle is configured for the table. The value 0 indicates that the lifecycle is not configured for the table, and other values indicate the specific values of the lifecycle.              |
| is_partition_table | boolean    |       | Indicates whether the table is a partitioned table. The value true indicates that the table is a partitioned table.                        |
| entity_type     | bigint     |       | Indicates the entity type. The value 0 indicates a table, and the value 1 indicates a view.                    |
| categories      | string     |       | Indicates the categories in the table.                                    |
| last_access_time | bigint     |       | Indicates the time at which the table was last accessed. This time is a 10-digit UNIX timestamp.                |
| size            | bigint     |       | Indicates the size of the table, which indicates the logical storage space occupied by data in the table. The volume of data stored in a view is NULL.              |
| column_count    | bigint     |       | Indicates the number of fields in the table, including the partition key column.                                  |
| partition_count | bigint     |       | Indicates the number of partitions in the table. If the table is a non-partitioned table, the value of this parameter is NULL.                          |
| detail_view_count | bigint     |       | Indicates the number of times that the details of the table are viewed on the page.                                |
| favorite_count  | bigint     |       | Indicates the number of times that the table is added to favorite.                                  |
| biz_date        | string     |       | Indicates the date on which business data was generated.                                      |
+------------------------------------------------------------------------------------+

Remove the data opening package

In this section, the data opening package is removed from a workspace in DataStudio of DataWorks by running one of the following commands:
Note If the authorized workspace is a standard-mode workspace, the data opening package must be removed from both the development environment and production environment. This is because that the production environments of a standard-mode workspace are isolated from the production environments.
-- Remove the data opening package from a workspace in a development environment (work_test_2_dev).
UNINSTALL PACKAGE u_meta_hangzhou.systables;

-- Remove the data opening package from a workspace in a production environment (work_test_2).
USE work_test_2;
UNINSTALL PACKAGE u_meta_hangzhou.systables;

What to do next

After the data opening package is installed, you can use the data opening feature to collect metadata for subsequent data governance or O&M. For more information about how to use the data opening feature, see Use the data opening feature.