DataWorks: Overview

Last Updated: Apr 29, 2024

Data Map is a DataWorks service that manages an enterprise's data catalog based on metadata. It provides features such as global data search, metadata details, data preview, data lineage, and data category management. Data Map helps you search for, understand, and use data.

Collect metadata

  • Metadata of MaxCompute tables

    If you associate a MaxCompute compute engine with your workspace, you can use Data Map to manage the metadata of MaxCompute tables.

  • Other types of metadata

    In addition to MaxCompute metadata, you can use the metadata collection feature to collect metadata from various types of data sources into Data Map for unified management. On the Data Discovery page of Data Map, click Create Crawler to collect metadata from a specific data source. After the metadata is collected, you can search for and view it in Data Map. Besides MaxCompute, Data Map can collect metadata from the following types of data sources: E-MapReduce, Hologres, CDH Hive, CDH Kudu, CDH HBase, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, AnalyticDB for PostgreSQL, OTS, PostgreSQL, MySQL, SQL Server, Oracle, ClickHouse, and StarRocks. More data source types will be supported in the future. For details about how to configure metadata collection, see Metadata collection.

    Note

    To create tables in visual mode in the Workspace Tables pane of DataStudio, you must first collect metadata by using Data Map. To create tables in visual mode for a specific type of data source, you must also associate the corresponding compute engine instance with your workspace. For more information, see Manage tables.

Connect to data sources

To collect metadata from a data source into Data Map, make sure that the metadata crawler of Data Map can access the data source. To do so, add the CIDR blocks of the region in which your DataWorks workspace resides to the IP address whitelist of the data source. For more information, see Configure IP address whitelists for metadata collection.
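The following Python sketch illustrates the whitelist requirement. It checks which of a region's CIDR blocks are not yet covered by a data source's whitelist. This is a hypothetical helper and is not part of DataWorks. The function name `missing_cidr_blocks` and the CIDR values are placeholders taken from documentation-only address ranges. The real CIDR blocks for your region are listed in Configure IP address whitelists for metadata collection.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real DataWorks values.
REGION_CIDR_BLOCKS = ["192.0.2.0/24", "198.51.100.0/26"]


def missing_cidr_blocks(data_source_whitelist, region_blocks):
    """Return the region CIDR blocks that no whitelist entry fully covers."""
    # strict=False tolerates entries with host bits set, e.g. "192.0.2.10/24".
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in data_source_whitelist]
    missing = []
    for block in region_blocks:
        net = ipaddress.ip_network(block)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        covered = any(
            net.version == entry.version and net.subnet_of(entry) for entry in allowed
        )
        if not covered:
            missing.append(block)
    return missing


print(missing_cidr_blocks(["192.0.2.0/24"], REGION_CIDR_BLOCKS))  # ['198.51.100.0/26']
```

An empty result means that every placeholder region block falls inside some whitelist entry.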

View overall data

  • On the Overview page of Data Map, you can view the following information about MaxCompute resources in the current region: the total number of projects, total number of tables, total storage usage, total number of APIs, storage trend chart, top projects that occupy the most storage space, top tables that occupy the most storage space, and most frequently referenced tables.

  • After metadata is collected from other data sources, such as AnalyticDB for MySQL V3.0, MySQL, E-MapReduce (EMR), Hologres, AnalyticDB for PostgreSQL, and OTS, this page also shows the total number of databases, tables, and APIs for those data sources.

For more information, see View resource information.

Search for tables, code, and APIs

Search for information on the homepage

  • On the homepage, select API from the drop-down list to the left of the search box. You can then quickly find an API among the recently viewed, most frequently viewed, and most frequently called APIs. You can also enter a keyword in the search box to search for it. For more information, see Query and manage common data.

  • On the homepage, select Table from the drop-down list to the left of the search box. You can then quickly find a table among the recently viewed, recently read, most frequently viewed, and most frequently read tables. You can also enter a keyword in the search box to search for it. For more information, see Query and manage common data.

  • On the homepage, select Code from the drop-down list to the left of the search box. You can then find code among your recently searched code or enter a keyword in the search box to search for it. For more information, see Query and manage common data.

Search for information on the Search page

  • On the Search page, click API in the Type section and enter a keyword, such as an API name or API description, to search for APIs in all workspaces within the current tenant. You can then narrow the results by specifying filter conditions, such as API type, workspace, and owner, in the left-side pane. For more information, see APIs in DataService Studio.

  • On the Search page, after you click Table in the Type section, you can search for tables of each type of data source for which metadata is collected based on the table name, table description, field name, and field description. Different types of data sources support different filter conditions. For example, you can search for MaxCompute tables by specifying filter conditions such as project, owner, environment (production or development), and category. For more information, see Search for tables.

  • On the Search page, after you click Code in the Type section, you can search for the desired code by selecting a DataWorks service, specifying filter conditions, and entering a keyword in the left-side pane. For more information, see Search for code.
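The following minimal Python sketch makes the table search behavior described above concrete. It matches a keyword against table names, table descriptions, field names, and field descriptions, and then applies exact-match filters such as project, owner, environment, and category. The `TableMeta` record and the `search_tables` function are hypothetical. The case-insensitive substring matching is an assumption and does not describe how Data Map actually matches or ranks results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableMeta:
    """Hypothetical, simplified table metadata record (not the Data Map model)."""
    name: str
    description: str = ""
    field_names: List[str] = field(default_factory=list)
    field_descriptions: List[str] = field(default_factory=list)
    project: str = ""
    owner: str = ""
    environment: str = "prod"  # "prod" or "dev"
    category: Optional[str] = None


def search_tables(tables, keyword, **filters):
    """Match `keyword` against name, description, field names, and field
    descriptions (case-insensitive substring), then apply exact-match filters."""
    kw = keyword.lower()
    results = []
    for table in tables:
        texts = [table.name, table.description, *table.field_names, *table.field_descriptions]
        if not any(kw in text.lower() for text in texts):
            continue
        # Every filter, e.g. project="sales", must equal the table's attribute.
        if all(getattr(table, key) == value for key, value in filters.items()):
            results.append(table)
    return results


tables = [
    TableMeta("ods_orders", "raw orders", ["order_id"], ["order key"], project="sales", owner="alice"),
    TableMeta("dim_user", "user dimension", ["user_id"], ["user key"], project="crm", owner="bob", environment="dev"),
]
print([t.name for t in search_tables(tables, "key", project="crm")])  # ['dim_user']
```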

View the details of tables and APIs

  • Click the name of the table that you want to view to go to the table details page. On this page, you can view the details of the table, such as the basic information, output information, and lineage information of the table. For more information, see View the details of a table.

    Note

    The Lineage tab displays the lineage information of a table. Go to this tab to see which lineage details are available for a specific table. For MaxCompute tables, Data Map also shows the full-link lineage for batch synchronization, including the ancestor and descendant tables of the table. On the Lineage tab, click the Table Lineage tab to view the data sources and destinations of the table.

    In the upper-right corner of the table details page, you can enter a keyword in the search box to search for a table. The keyword can be a table name, table description, field name, field description, or project name. For more information, see View the details of a table.

  • You can click the desired code file to view the code details.

  • Click the name of the API that you want to view to go to the API details page. On this page, you can view the details of the API, such as the basic information and technical information of the API. For more information, see View the details of an API.

    In the upper-right corner of the API details page, you can enter a keyword in the search box to search for an API. The keyword can be an API ID, API path, API name, or API description.
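You can think of the ancestor and descendant tables in a lineage view as the tables reachable in a directed graph of source-to-destination edges. The following Python sketch collects them with a breadth-first search. It is a hypothetical helper that does not call any Data Map API, and the table names in the example are made up.

```python
from collections import deque


def lineage_closure(edges, table, direction):
    """Return every table reachable from `table` through lineage edges.

    edges: (source_table, destination_table) pairs.
    direction: "descendants" follows source -> destination,
               "ancestors" follows destination -> source.
    """
    if direction not in ("ancestors", "descendants"):
        raise ValueError("direction must be 'ancestors' or 'descendants'")
    graph = {}
    for src, dst in edges:
        start, end = (src, dst) if direction == "descendants" else (dst, src)
        graph.setdefault(start, []).append(end)
    found = set()
    queue = deque([table])
    while queue:  # breadth-first search; `found` guards against cycles
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor != table and neighbor not in found:
                found.add(neighbor)
                queue.append(neighbor)
    return found


edges = [
    ("mysql.orders", "odps.ods_orders"),
    ("odps.ods_orders", "odps.dwd_orders"),
    ("odps.dwd_orders", "odps.ads_sales"),
]
print(sorted(lineage_closure(edges, "odps.ods_orders", "descendants")))  # ['odps.ads_sales', 'odps.dwd_orders']
print(sorted(lineage_closure(edges, "odps.ods_orders", "ancestors")))  # ['mysql.orders']
```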

Organize and manage tables

The category management feature of Data Map helps you organize and manage tables by category. After you add tables to categories, you can filter table search results by category. For more information, see Category management: Configuration management. Data Map also provides table management and favorites features, which are described below.

Note

To modify the category tree, you must use your Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached.

  • Category management

    You can use the following methods to add multiple tables to a category at a time:

    1. Go to the Manage Categories tab of the Configuration Management page.

      After you configure the category tree, you can click a last-level category. Then, click Add Tables in the upper-right corner of the Manage Categories tab to add multiple tables in a project to the category at a time. For more information, see Category management: Configuration management.

    2. Go to the My Data page.

      After you configure the category tree, you can go to the Owned by Me or Managed by Me tab of the My Data page to add multiple tables to a specific category at a time. For more information, see My Data.

  • Table management

    Data Map allows you to modify the display names of multiple MaxCompute tables at a time, modify the time-to-live (TTL) of a MaxCompute table, delete multiple MaxCompute tables in the development or production environment at a time, or change the owners of multiple MaxCompute tables at a time. For more information, see My Data.

  • Favorites

    You can add tables to favorites in Data Map so that you can quickly find and view them later. To view your favorite tables, click My Data in the top navigation bar of Data Map, and then click the My Favorites tab in the left-side navigation pane. For more information, see My Data.

    • Add a table to favorites

      On the details page of a table, click Add to Favorites in the upper part of the page. For more information, see View the details of a table. The table then appears on the My Favorites tab of the My Data page. For more information, see Add a table to favorites.

    • Remove a table from favorites

      You can use the following two methods to remove a table from favorites. A removed table is not displayed on the My Favorites tab of the My Data page.

      • Go to the My Favorites tab of the My Data page, find the table that you want to remove, and then click Remove from Favorites in the Actions column.

      • Go to the details page of a table that has been added to favorites and click Remove from Favorites in the upper part of the page.
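The category tree rule described above (you add tables to a last-level category) can be modeled with a small tree structure. In the following hypothetical Python sketch, `Category` is not a DataWorks type. Three behaviors are assumptions made for illustration: tables are rejected on non-leaf categories, child categories cannot be added under a category that already holds tables, and filtering by a parent category includes its descendants.

```python
class Category:
    """Hypothetical category-tree node; tables attach only to last-level nodes."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.tables = set()

    def add_child(self, name):
        # Assumption: a category that already holds tables cannot gain children.
        if self.tables:
            raise ValueError(f"{self.name!r} already holds tables")
        return self.children.setdefault(name, Category(name))

    def add_tables(self, tables):
        # Mirrors "click a last-level category, then click Add Tables".
        if self.children:
            raise ValueError(f"{self.name!r} is not a last-level category")
        self.tables.update(tables)

    def all_tables(self):
        """Tables in this category and its descendants, for search filtering."""
        found = set(self.tables)
        for child in self.children.values():
            found |= child.all_tables()
        return found


root = Category("root")
finance = root.add_child("Finance")
billing = finance.add_child("Billing")
billing.add_tables({"ods_invoice", "dwd_invoice"})
print(sorted(root.all_tables()))  # ['dwd_invoice', 'ods_invoice']
```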

Manage table permissions

  • Apply for the operation permissions on a table

    In a DataWorks workspace in standard mode, RAM users cannot directly execute SQL statements on tables in the production environment. If you are a RAM user, you must apply for permissions before you can perform operations on a production table or query a production table that belongs to another Alibaba Cloud account. To apply, go to the details page of the table and click Apply for Permission in the upper part of the page. The Permission application tab of Security Center appears. On this tab, you can apply for the required permissions. For more information, see Request permissions on tables.

    Note

    By default, Data Map does not allow a RAM user without the query permissions on a table to preview the table data on the details page of the table.

  • Manage the preview permissions on MaxCompute tables

    You can go to the Manage Workspaces tab of the Manage Configurations page to manage the preview permissions on MaxCompute tables in a workspace in the development or production environment. After you enable data preview for a project in an environment, all members in the workspace can preview all the tables in the project in the specified environment without the need to apply for the access permissions. For more information, see Manage table visibility.

    Note

    • After you enable data preview, sensitive information may be leaked. Proceed with caution.

    • The project owner or administrator of the workspace can manage the preview permissions.

    • The preview permissions apply only to the data preview feature on the table details page in Data Map.

  • Hide tables

    After a table is hidden, the table is not displayed in the search results. You can specify that the table is hidden from all users or is visible only to the members of the workspace to which the table belongs. For more information, see My Data.

    • Hide a single table

      Valid values of the Hide or Show parameter for a table:

      • Hide: The table cannot be searched by any user.

      • Within the Project Only: The table is visible to, and can be searched by, only the members of the workspace to which the table belongs.

      • Show: The table can be searched by all users.

      Note

      By default, the preceding rules do not apply to table owners or workspace administrators.

    • Hide multiple tables in a workspace at the same time

      Go to the Manage Workspaces tab of the Manage Configurations page, select a workspace in the Workspaces Owned/Managed by Me pane, and then specify whether to hide tables in the selected workspace in the Allow Only Project Members to View Project Tables column.

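The visibility and preview rules in this section can be summarized as two predicates. The following Python sketch is a simplified, hypothetical model, and its function and parameter names are illustrative. It assumes that a user with query permissions on a table can preview the table. It also ignores other factors, such as workspace mode or account type, that may affect access in DataWorks.

```python
from enum import Enum


class Visibility(Enum):
    HIDE = "Hide"
    PROJECT_ONLY = "Within the Project Only"
    SHOW = "Show"


def can_search(visibility, is_owner_or_admin, is_workspace_member):
    """Whether a user can find a table in Data Map search results."""
    if is_owner_or_admin:
        # By default, the Hide/Show rules do not apply to table owners or workspace administrators.
        return True
    if visibility is Visibility.SHOW:
        return True
    if visibility is Visibility.PROJECT_ONLY:
        return is_workspace_member
    return False  # Visibility.HIDE


def can_preview(environment, preview_enabled_envs, is_workspace_member, has_query_permission):
    """Whether a user can preview table data on the table details page."""
    if has_query_permission:
        # Assumption: query permission on the table is sufficient for preview.
        return True
    # With data preview enabled for the project in this environment,
    # workspace members can preview without applying for permissions.
    return is_workspace_member and environment in preview_enabled_envs


print(can_search(Visibility.PROJECT_ONLY, is_owner_or_admin=False, is_workspace_member=False))  # False
print(can_preview("dev", {"dev"}, is_workspace_member=True, has_query_permission=False))  # True
```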

More features

  • Workspace management

    On the Search page of Data Map, select Workspace in the Type section to view all workspaces within the current Alibaba Cloud account. Click the name of a workspace to view its details. For more information, see View workspaces.

  • Table metadata refresh

    If a table exists but cannot be found in Data Map, or if a table was updated but Data Map does not show the update, refresh the table metadata manually by using one of the following methods:

    • In the upper-right corner of the search result page, click Refresh Table Metadata.

    • Go to the Refresh Table Metadata page of the My Data page in Data Map.

      Note

      Manual refresh of table metadata is supported only for MaxCompute and E-MapReduce (EMR) tables.

      • If the data type is MaxCompute, enter the table name in the odps.Project name.Table name format in the Guid field, and then click Refresh.

      • If the data type is E-MapReduce, configure the Cluster ID, Database, and Table Name parameters, and then click Refresh.

    After the refresh, go to the All Data page of Data Map and enter a keyword in the search box to search for the table again.
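The Guid value for a MaxCompute table follows the odps.Project name.Table name format. The following Python sketch builds that string. The helper name is hypothetical. The check that names are non-empty and contain no dots is an assumption added to keep the three-part format unambiguous.

```python
def maxcompute_table_guid(project, table):
    """Build the Guid value in the odps.Project name.Table name format."""
    for label, name in (("project", project), ("table", table)):
        # Assumption: names are non-empty and contain no dots.
        if not name or "." in name:
            raise ValueError(f"invalid {label} name: {name!r}")
    return f"odps.{project}.{table}"


print(maxcompute_table_guid("my_project", "ods_orders"))  # odps.my_project.ods_orders
```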