DataWorks: Overview

Last Updated: Sep 06, 2023

You can use Data Map to manage the metadata and data assets of your business. For example, you can use Data Map to globally search for data, view the details about the metadata, preview data, view data lineage, and manage data categories. Data Map can help you search for, understand, and use data.

Collect metadata

  • Metadata of MaxCompute tables

    If you associate the MaxCompute compute engine with your workspace, you can use Data Map to manage the metadata of MaxCompute tables.

  • Other types of metadata

    In addition to MaxCompute metadata, you can use the metadata collection feature to collect metadata from other data sources to Data Map for unified management. On the Data Discovery page of Data Map, click Create Crawler to collect metadata from a specific data source. After the metadata is collected, you can search for and view it in Data Map. Data Map can collect metadata from the following types of data sources: E-MapReduce, Hologres, CDH Hive, CDH Kudu, CDH HBase, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, OSS, OTS, PostgreSQL, MySQL, SQL Server, and Oracle. More data source types will be supported in the future. For configuration details, see Collect metadata.

    Note

    If you need to create tables in visual mode in the Workspace Tables pane on the DataStudio page, you must use Data Map to collect metadata first. To create tables in visual mode based on a specific type of data source, you must associate the specific type of compute engine instance corresponding to the data source with your workspace. For more information, see Manage tables.

Connect to data sources

To collect metadata from a data source to Data Map for unified metadata management, make sure that the metadata crawler in Data Map can access the data source.

To do so, add the CIDR blocks of the region in which your DataWorks workspace resides to the IP address whitelist of the data source. For more information, see Configure IP address whitelists for metadata collection.
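As a rough illustration of what an IP address whitelist check does, the following Python sketch tests whether a crawler address falls inside any whitelisted CIDR block. The CIDR blocks and the function name are hypothetical placeholders (the ranges are reserved documentation ranges, not the actual DataWorks CIDR blocks for any region); look up the real values in the whitelist documentation referenced above.

```python
import ipaddress

# Hypothetical CIDR blocks for illustration only (RFC 5737 documentation ranges).
# Replace them with the CIDR blocks listed for your DataWorks region.
WHITELIST_CIDRS = ["192.0.2.0/24", "198.51.100.0/24"]


def is_whitelisted(ip, cidrs=WHITELIST_CIDRS):
    """Return True if the IP address belongs to at least one CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        print(candidate, is_whitelisted(candidate))
```

A data source that enforces a whitelist rejects connections from addresses for which a check of this kind fails, which is why the crawler cannot collect metadata until the CIDR blocks are added.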

View overall data

  • On the Overview page of Data Map, you can view the following information about MaxCompute resources in the current region: the total number of projects, total number of tables, total storage usage, total number of APIs, storage trend chart, top projects that occupy the most storage space, top tables that occupy the most storage space, and most frequently referenced tables.

  • After metadata is collected, you can also view the following information on this page for other data sources such as AnalyticDB for MySQL 3.0, MySQL, EMR, Hologres, AnalyticDB for PostgreSQL, and Tablestore: the total number of databases, total number of tables, and total number of APIs.

For more information, see View resource information.

Search for tables and APIs

Data Map allows you to search for tables and APIs by using the following methods:

    • Go to the homepage of Data Map, click the Table tab above the search box, and then search for a table in the corresponding table list in the Recently Viewed Tables, Recently Read Tables, Most Viewed Tables, or Most Read Tables section. Alternatively, you can enter a keyword in the search box on the Table tab to search for a table. For more information, see Homepage.

    • Go to the homepage of Data Map, click the API tab above the search box, and then search for an API in the corresponding API list in the Recently Viewed APIs, Frequently Viewed APIs, or Frequently Called APIs section. Alternatively, you can enter a keyword in the search box on the API tab to search for an API. For more information, see Homepage.

    • Go to the Search page of Data Map, click the Table tab above the search box, and then enter a keyword in the search box to search for a table in a specific data source after the metadata from the data source is collected. The keyword can be a table name, table description, or field name. Then, you can specify a category, project, or database in the left-side pane to filter the search results. If a MaxCompute data source is used, you can specify an environment type and a table owner to filter the search results. If an EMR data source is used, you can specify an EMR cluster to filter the search results. For more information, see Search for tables.

    • Go to the Search page of Data Map, click the API tab above the search box, and then enter a keyword to search for an API in all DataWorks workspaces within the current tenant. The keyword can be an API name or description. Then, you can specify an API type, a workspace, or an API owner in the left-side pane to filter the search results and find the API that meets your requirements. For more information, see Query an API.

View the details of tables and APIs

  • Click the name of the table that you want to view to go to the table details page. On this page, you can view the details of the table, such as the basic information, output information, and lineage information of the table. For more information, see View the details of a table.

    Note

    The Lineage tab displays the lineage information about a table. To see which lineage information is available for a specific table, go to the Lineage tab of that table. Data Map also allows you to view the full-link lineage of a MaxCompute table for batch synchronization, including the ancestor and descendant tables of the table. On the Lineage tab of the table, you can click the Table Lineage tab to view the data sources and destinations of the table.

    In the upper-right corner of the table details page, you can enter a keyword in the search box to search for a table. The keyword can be a table name, table description, field name, field description, or project name. For more information, see View the details of a table.

  • Click the name of the API that you want to view to go to the API details page. On this page, you can view the details of the API, such as the basic information and technical information of the API. For more information, see View the details of an API.

    In the upper-right corner of the API details page, you can enter a keyword in the search box to search for an API. The keyword can be an API ID, API path, API name, or API description.

Organize and manage tables

The category management feature of Data Map helps you efficiently organize and manage tables by category. After you add tables to categories, you can specify a category to filter the search results during table searches. For more information, see Configuration management. You can also manage tables.

Note

To modify the category tree, you must use your Alibaba Cloud account or access DataWorks as a RAM user that has the AliyunDataWorksFullAccess permission.

  • Category management

    You can use the following methods to add multiple tables to a category at a time:

    1. Go to the Manage Categories tab of the Configuration Management page.

      After you configure the category tree, you can click a last-level category. Then, click Add Tables in the upper-right corner of the Manage Categories tab to add multiple tables in a project to the category at a time. For more information, see Configuration management.

    2. Go to the My Data page.

      After you configure the category tree, you can go to the Owned by Me or Managed by Me tab of the My Data page to add multiple tables to a specific category at a time. For more information, see My Data.

  • Table management

    Data Map allows you to modify the display names of multiple MaxCompute tables at a time, modify the time-to-live (TTL) of a MaxCompute table, delete multiple MaxCompute tables in the development or production environment at a time, or change the owners of multiple MaxCompute tables at a time. For more information, see My Data.

  • Favorites

    Data Map allows you to add tables to favorites for easy management. This helps you find and view specific tables. You can click My Data in the top navigation bar of Data Map. Then, click the My Favorites tab in the left-side navigation pane to view the tables that are added to favorites. For more information, see My Data.

    • Add a table to favorites

      If you are viewing the details of a table, you can click Add to Favorites in the upper part of the table details page to add the table to favorites. After a table is added to favorites, you can view the table on the My Favorites tab of the My Data page. For more information, see View the details of a table.

    • Remove a table from favorites

      You can use the following two methods to remove a table from favorites. A removed table is not displayed on the My Favorites tab of the My Data page.

      • Go to the My Favorites tab of the My Data page, find the table that you want to remove, and then click Remove from Favorites in the Actions column.

      • Go to the details page of a table that has been added to favorites and click Remove from Favorites in the upper part of the page.

Manage table permissions

  • Apply for the operation permissions on a table

    In a DataWorks workspace in standard mode, RAM users cannot directly execute SQL statements to perform operations on a table in the production environment. If you are a RAM user, you must apply for required permissions to perform operations on a table in the production environment or query a table that belongs to another Alibaba Cloud account in the production environment. To apply for required permissions, go to the details page of the table on which you want to apply for permissions and click Apply for Permission in the upper part. After you click Apply for Permission, the Permission application tab in Security Center appears. On this tab, you can apply for the required permissions. For more information, see Request permissions on tables.

    Note

    By default, Data Map does not allow a RAM user without the query permissions on a table to preview the table data on the details page of the table.

  • Manage the preview permissions on MaxCompute tables

    You can go to the Manage Workspaces tab of the Configuration Management page to manage the preview permissions on MaxCompute tables in a workspace in the development or production environment. After you enable data preview for a project in an environment, all members in the workspace can preview all the tables in the project in the specified environment without the need to apply for the access permissions. For more information, see Manage table visibility.

    Note
    • After you enable data preview, sensitive information may be leaked. Proceed with caution.

    • The project owner or administrator of the workspace can manage the preview permissions.

    • The preview permissions apply only to the data preview feature on the table details page in Data Map.

  • Hide tables

    After a table is hidden, the table is not displayed in the search results. You can specify that the table is hidden from all users or is visible only to the members of the workspace to which the table belongs. For more information, see My Data.

    Valid values of the Hide or Show parameter for a table:

    • Hide: The table cannot be searched by any user.

    • Within the Project Only: The table is visible only to the members of the workspace to which the table belongs. In this case, the table can be searched only by the members of the workspace.

    • Show: The table can be searched by all users.

    Note

    By default, the preceding rules do not apply to table owners or workspace administrators.
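The three values of the Hide or Show parameter can be read as a small decision rule. The following Python sketch models that rule; it is not Data Map code. The enum, the function name, and the flags are hypothetical, and the sketch assumes that "the preceding rules do not apply to table owners or workspace administrators" means those users can always find the table.

```python
from enum import Enum


class Visibility(Enum):
    """Values of the Hide or Show parameter described above."""
    HIDE = "Hide"
    PROJECT_ONLY = "Within the Project Only"
    SHOW = "Show"


def is_searchable(visibility, *, is_workspace_member, is_owner_or_admin=False):
    """Return True if a user can find the table in search results.

    Assumption: table owners and workspace administrators are exempt
    from the visibility rules and can always find the table.
    """
    if is_owner_or_admin:
        return True
    if visibility is Visibility.SHOW:
        return True
    if visibility is Visibility.PROJECT_ONLY:
        return is_workspace_member
    return False  # Visibility.HIDE: hidden from every other user


if __name__ == "__main__":
    print(is_searchable(Visibility.PROJECT_ONLY, is_workspace_member=False))
    print(is_searchable(Visibility.HIDE, is_workspace_member=False,
                        is_owner_or_admin=True))
```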

More features

  • Workspace management

    In the top navigation bar of Data Map, select Workspaces to view all the workspaces within the current Alibaba Cloud account. On the Workspace Management page, you can click the name of a workspace to view its details. For more information, see View workspaces.

  • Manual synchronization

    If an existing table cannot be found during searches or the information about an updated table is not updated in Data Map, you can manually synchronize the table.

    • On the search result page, click Manually Synchronize.

    • Go to the Manually Sync Table tab of the My Data page in Data Map, specify a value in the odps.Project name.Table name format for the Table GUID parameter, and then click Manually Sync Table.

      Note

      The manual synchronization feature can be used to synchronize only MaxCompute tables.

    Then, you can go to the All Data page of Data Map and enter a keyword in the search box on the All Data page to search for the corresponding table again.
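To avoid typing mistakes when you fill in the Table GUID parameter, you could assemble the value programmatically. The following Python sketch builds and parses a string in the odps.Project name.Table name format. The helper names and the sample project and table names are hypothetical, and the sketch assumes that project and table names contain no periods.

```python
GUID_PREFIX = "odps"  # prefix used in the Table GUID format for MaxCompute tables


def build_table_guid(project, table):
    """Build a Table GUID in the odps.<project>.<table> format."""
    for name in (project, table):
        # Assumption: names are non-empty and contain no periods.
        if not name or "." in name:
            raise ValueError(f"invalid name: {name!r}")
    return f"{GUID_PREFIX}.{project}.{table}"


def parse_table_guid(guid):
    """Split a Table GUID into a (project, table) tuple."""
    parts = guid.split(".")
    if len(parts) != 3 or parts[0] != GUID_PREFIX or not all(parts[1:]):
        raise ValueError(f"not a valid MaxCompute Table GUID: {guid!r}")
    return parts[1], parts[2]


if __name__ == "__main__":
    guid = build_table_guid("my_project", "orders")
    print(guid)
    print(parse_table_guid(guid))
```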