All Products
Search
Document Center

:Install and remove the data opening package

Last Updated:Mar 13, 2024

DataWorks provides the data opening feature. Before you use the feature to collect metadata, you must install the data opening package in your workspace. This topic describes how to install the data opening package and how to view the installation result after the package is installed.

Limits

  • The data opening feature is in public preview and is available only for DataWorks Enterprise Edition and more advanced editions.

  • The data opening feature can be used only for a MaxCompute compute engine. For other types of compute engines, you cannot use the data opening feature to collect metadata.

Install the data opening package

  1. Obtain the required permissions.

    Before you install the data opening package in a workspace, you must submit a ticket to provide the workspace information to the DataWorks technical support engineers. Then, the technical support engineers can grant the required permissions to the workspace. This way, you can install the data opening package in the workspace and use the data opening feature to collect metadata.

    Note

    When you submit a ticket, you must provide the Alibaba Cloud account ID, workspace name, workspace ID, and the region where the workspace resides.

    Make sure that the workspace meets the following requirements:

    • The workspace has a well-established permission management mechanism. This can prevent data leaks caused by unlimited sharing of the metadata that is collected by using the data opening feature.

    • The workspace can be shared within an enterprise or a team. This way, the metadata that is collected by using the data opening feature can be shared among different data development teams.

  2. Use the account of the workspace owner to install the data opening package.

    In this topic, the data opening package is installed in a workspace on the DataStudio page of the workspace. Go to the DataStudio page of the authorized workspace and run the following commands to install the data opening package on an ODPS node:

    Note
    • If the authorized workspace is a workspace in standard mode, the data opening package must be installed in both the development and production environments. This is because the production environment is isolated from the development environment in a workspace in standard mode.

    • You need to change the names of both the u_meta project and the data opening package in the commands used to install the data opening package based on your business requirements. The following commands are used to install the data opening package in the workspace that resides in the China (Hangzhou) region:

    -- Install the data opening package in the workspace in the development environment (work_test_2_dev) that resides in the China (Hangzhou) region. 
    INSTALL PACKAGE u_meta_hangzhou.systables;
    
    -- Install the data opening package in the workspace in the production environment (work_test_2) that resides in the China (Hangzhou) region. 
    USE work_test_2;
    INSTALL PACKAGE u_meta_hangzhou.systables;

    Take note of the following parameters in the preceding commands:

    • u_meta_hangzhou: the name of the workspace in which you want to install the data opening package. This workspace resides in the China (Hangzhou) region.

    • systables: the name of the data opening package that you want to install. After installation, metadata in various dimensions, such as partition metadata and table lineage metadata, can be collected by using tables or views.

    Alibaba Cloud provides different data opening packages for different regions. The name of the workspace varies based on the region. For more information about the names of the workspaces in different regions, see Appendix 2: Available data opening packages. Make sure that you install the related data opening package that is supported by the region in which you want to use the data opening feature. You can change the name of the workspace based on your business requirements.

  3. View the installation result of the data opening package.

    In the list of the installed packages, find the data opening package that you installed. If the value of the Status parameter is OK for the data opening package, the data opening package is installed.

    -- Check whether the u_meta_hangzhou.systables data opening package is included in the data opening packages installed in the workspace. 
    SHOW PACKAGES;
    
    -- Output example
    +-------------+-----------------+--------------------------+--------+
    | PackageName | SourceProject   | InstallTime              | Status |
    +-------------+-----------------+--------------------------+--------+
    | systables   | u_meta_hangzhou | 2020-11-26T15:25:22+0800 | OK     |
    +-------------+-----------------+--------------------------+--------+

View the tables or views provided by the data opening package

When you use the data opening feature to query a table or view, you need to specify the full name of the table or view. The table or view name that you need to specify varies based on the version of your data opening package.

You can run the following command to query the tables or views provided by the installed data opening package and view the names of and permissions on the tables or views:

DESCRIBE PACKAGE u_meta_hangzhou.systables;

In the following example, a command is run in DataStudio of DataWorks to query the tables provided by the installed data opening package, and the query results are also provided.

Note

The names of the tables and views provided by the data opening package also contain the version information of the data opening package. The version number of the data opening package changes with the iteration and releases of new features. When you use the data opening feature, the actual version number of the data opening package depends on the systables parameter. For example, in the raw_v_tenant_user_v1_1 view, v1_1 is the version number.

-- View the tables or views contained in the u_meta_hangzhou.systables package. 
DESCRIBE PACKAGE u_meta_hangzhou.systables;

-- Output example
CreateTime:         2020-11-18T20:17:24+0800
PackageName:        systables
SourceProject:      u_meta_hangzhou
Object List
+------------+-----------------------------------+------------------+
| ObjectType | ObjectName                        | ObjectPrivileges |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_biz_table_wiki_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_column_usage_v1_1      | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_column_v1_1            | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_database_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_partition_v1_1         | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_detail_log_v1_1  | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_join_map_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_lineage_v1_1     | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_output_v1_1      | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_usage_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_table_v1_1             | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_meta_view_v1_1              | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_di_resgroup_v1_1   | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_node_relation_v1_1 | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_node_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_schedule_task_v1_1          | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_res_group_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_user_v1_1            | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_workspace_user_v1_1  | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | raw_v_tenant_workspace_v1_1       | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | rpt_v_meta_ind_table_core_v1_1    | Describe,Select  |
+------------+-----------------------------------+------------------+
| TABLE      | rpt_v_meta_ind_table_extra_v1_1   | Describe,Select  |
+------------+-----------------------------------+------------------+

View the schema of a table or view and its field descriptions

Run the following command to query the schema of a table or view and its field descriptions:

DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;

In the preceding command, rpt_v_meta_ind_table_core_v1_0 is the name of the table or view whose schema and field descriptions you want to query. You can obtain the actual table or view name based on the instructions provided in the View the tables or views provided by the data opening package section of this topic.

Note

DataWorks provides data from multiple dimensions, such as detail data, metric data, and dimensional data. You can obtain and store data to a DataWorks workspace based on your business requirements to save storage space. The data can be used as the historical data for subsequent data governance or O&M.

In the following example, a command is run to query the rpt_v_meta_ind_table_core view in DataStudio of DataWorks, and the query results are also provided.

-- View the descriptions of the fields in the rpt_v_meta_ind_table_core view.
DESCRIBE u_meta_hangzhou.rpt_v_meta_ind_table_core_v1_0;

-- Output example
+------------------------------------------------------------------------------------+
| Owner: ALIYUN$dataworks-datagovernance | Project: u_meta_hangzhou                             |
| TableComment: core metrics of the table in the metadata module                                                       |
+------------------------------------------------------------------------------------+
| CreateTime:               2020-12-07 20:02:53                                      |
| LastDDLTime:              2020-12-07 20:02:53                                      |
| LastModifiedTime:         2020-12-07 20:02:53                                      |
+------------------------------------------------------------------------------------+
| VirtualView  : YES  | ViewText: CREATE OR REPLACE VIEW rpt_v_meta_ind_table_core_v1_1 (@param_biz_date STRING)
RETURNS @ret_result TABLE (
    tenant_id            BIGINT           COMMENT     'Specifies the ID of the DataWorks tenant.',
    project_id           BIGINT           COMMENT     'Specifies the ID of the DataWorks workspace.',
    catalog_name         STRING           COMMENT     'Specifies the catalog to which the table belongs. The catalog name for a MaxCompute project is odps.',
    database_name        STRING           COMMENT     'Specifies the name of the database or MaxCompute project.',
    table_name           STRING           COMMENT     'Specifies the name of the table.',
    table_uuid           STRING           COMMENT     'Specifies the ID of the table.',
    owner_yun_acct       STRING           COMMENT     'Specifies the Alibaba Cloud account used by the table owner.',
    dim_life_cycle       BIGINT           COMMENT     'Specifies the lifecycle of the table. The value 0 indicates that the lifecycle is not configured for the table, and other values indicate the specific values of the lifecycle.'
    is_partition_table   BOOLEAN          COMMENT     'Specifies whether the table is a partitioned table. The value true indicates that the table is a partitioned table.',
    entity_type          BIGINT           COMMENT     'Specifies the entity type. The value 0 indicates a table, and the value 1 indicates a view.',
    categories           STRING           COMMENT     'Specifies the categories in the table.',
    last_access_time     BIGINT           COMMENT     'Specifies the time at which the table was last accessed. This time is a 10-digit UNIX timestamp.',
    `size`               BIGINT           COMMENT     'Specifies the size of the table, which indicates the logical storage space occupied by data in the table. The volume of data stored in a view is NULL.',
    column_count         BIGINT           COMMENT     'Specifies the number of fields in the table, including the partition key column.',
    partition_count      BIGINT           COMMENT     'Specifies the number of partitions in the table. If the table is a non-partitioned table, the value of this parameter is NULL.',
    detail_view_count    BIGINT           COMMENT     'Specifies the number of times that the details of the table are viewed on the page.',
    favorite_count       BIGINT           COMMENT     'Specifies the number of times that the table is added to favorites.',
    biz_date             STRING           COMMENT     'Specifies the date on which business data was generated.'
) COMMENT 'core metrics of the table in the metadata module' AS
SELECT * FROM u_meta_hangzhou.rpt_v_meta_ind_table_core_proxy(@param_biz_date) |
+------------------------------------------------------------------------------------+
| Native Columns:                                                                    |
+------------------------------------------------------------------------------------+
| Field           | Type       | Label | Comment                                     |
+------------------------------------------------------------------------------------+
| tenant_id       | bigint     |       | Indicates the ID of the DataWorks tenant.                            |
| project_id      | bigint     |       | Indicates the ID of the DataWorks workspace.                      |
| catalog_name    | string     |       | Indicates the catalog to which the table belongs. The catalog name for a MaxCompute project is odps.;          |
| database_name   | string     |       | Indicates the name of the database or MaxCompute project.                             |
| table_name      | string     |       | Indicates the name of the table.                                      |
| table_uuid      | string     |       l Indicates the ID of the table.                                    |
| owner_yun_acct  | string     |       | Indicates the Alibaba Cloud account used by the table owner.                                |
| dim_life_cycle  | bigint     |       | Indicates the lifecycle of the table. The value 0 indicates that the lifecycle is not configured for the table, and other values indicate the specific values of the lifecycle.              |
| is_partition_table | boolean    |       | Indicates whether the table is a partitioned table. The value true indicates that the table is a partitioned table.                        |
| entity_type     | bigint     |       | Indicates the entity type. The value 0 indicates a table, and the value 1 indicates a view.                    |
| categories      | string     |       | Indicates the categories in the table.                                    |
| last_access_time | bigint     |       | Indicates the time at which the table was last accessed. This time is a 10-digit UNIX timestamp.                |
| size            | bigint     |       | Indicates the size of the table, which indicates the logical storage space occupied by data in the table. The volume of data stored in a view is NULL.              |
| column_count    | bigint     |       | Indicates the number of fields in the table, including the partition key column.                                  |
| partition_count | bigint     |       | Indicates the number of partitions in the table. If the table is a non-partitioned table, the value of this parameter is NULL.                          |
| detail_view_count | bigint     |       | Indicates the number of times that the details of the table are viewed on the page.                                |
| favorite_count  | bigint     |       | Indicates the number of times that the table is added to favorites.                                  |
| biz_date        | string     |       | Indicates the date on which business data was generated.                                      |
+------------------------------------------------------------------------------------+

For more information about the schema of a table or view, see Appendix 1: List and structure details of tables and views.

Remove the data opening package

In this section, the data opening package is removed from a workspace on the DataStudio page of the workspace by running one of the following commands:

Note

If the authorized workspace is a workspace in standard mode, the data opening package must be removed from both the development and production environments. This is because the production environment is isolated from the development environment in a workspace in standard mode.

-- Remove the data opening package from a workspace in the development environment (work_test_2_dev).
UNINSTALL PACKAGE u_meta_hangzhou.systables;

-- Remove the data opening package from a workspace in the production environment (work_test_2).
USE work_test_2;
UNINSTALL PACKAGE u_meta_hangzhou.systables;

What to do next

After the data opening package is installed, you can use the data opening feature to collect metadata for subsequent data governance or O&M. For more information about how to use the data opening feature, see Use the data opening feature.