Data catalogs in Data Studio support the lakehouse architecture of OpenLake. With this architecture, you can manage metadata centrally and create tables in various ways, including intelligent table creation. This improves the efficiency of data development and meets the metadata creation and management requirements of different types of users.
Prerequisites
A workspace is created, and Participate in Public Preview of DataStudio of New Version is turned on. For more information about how to create a workspace, see Create a workspace.
Supported data catalog types
The following table describes the data source types that are supported by data catalogs and the methods that can be used to add the corresponding types of data sources as DataWorks data catalogs.
Data catalog type | Added based on data sources that are associated with a workspace | Added based on existing data sources within an account |
MaxCompute (internal project and external project) | ||
Hologres (internal database and external database) | ||
DLF Catalog (DLF 1.0 and DLF 2.0) | ||
Hive (EMR Hive) | ||
AnalyticDB MySQL | ||
AnalyticDB PostgreSQL | ||
StarRocks | ||
AI Catalog (AI dataset and AI model) | The system automatically reads data from the AI workspace that has the same name as the current DataWorks workspace. |
Identity authentication and authorization
Whether a data catalog can properly read data from a data source depends on the way the data source is added as a data catalog. Take note of the following items:
If a data source associated with the workspace is added as a data catalog, DataWorks uses the identity information configured for the data source to read data from the data source and displays the data in the added data catalog.
If an object created within the current logon account is directly added as a data catalog, DataWorks uses the personal identity information of that account to read data from the data source and displays the data in the added data catalog.
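The two rules above can be summarized as a small decision function. The following is a minimal sketch in Python, not DataWorks code: the names `CatalogOrigin` and `identity_used_for_reads` are hypothetical and exist only to illustrate which identity applies in each case.

```python
from enum import Enum


class CatalogOrigin(Enum):
    """How a data catalog was added (hypothetical names, for illustration only)."""
    WORKSPACE_DATA_SOURCE = "workspace_data_source"  # data source associated with the workspace
    ACCOUNT_OBJECT = "account_object"                # object created within the logon account


def identity_used_for_reads(origin: CatalogOrigin) -> str:
    """Describe the identity that is used to read data for a catalog."""
    if origin is CatalogOrigin.WORKSPACE_DATA_SOURCE:
        # Rule 1: the identity configured for the data source is used.
        return "identity configured for the data source"
    if origin is CatalogOrigin.ACCOUNT_OBJECT:
        # Rule 2: the personal identity of the current logon account is used.
        return "personal identity of the current logon account"
    raise ValueError(f"unknown origin: {origin!r}")
```

When a read fails, this mapping indicates which identity to check first: the one configured for the data source, or your own account.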
Go to the DATA CATALOG pane
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and, in the Actions column, choose the entry that opens Data Studio.
In the left-side navigation pane, click the icon that opens the DATA CATALOG pane.
Add a data catalog to a workspace
You can create a personalized data catalog tree based on your business requirements to enhance user experience. To add a data catalog to a workspace, perform the following steps:
In the DATA CATALOG pane, find a data source type and click the icon next to the data source type name to go to the page on which you can add data catalogs.
On the data catalog addition page, find the desired instance or data source and click Add in the Actions column to add the instance or data source as a data catalog.
Data catalogs that are added based on data sources in a workspace are visible to the members of the workspace.
Data catalogs that are added based on the objects created within the current logon account are visible to only the account.
If data catalogs are added based on the objects created within the current logon account, the DATA CATALOG pane displays only the data catalogs added based on the objects that reside in the same region as the workspace and that you have permissions to access.
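The visibility rules above can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the `Catalog` class, its fields, and the `is_visible` function are hypothetical and are not part of any DataWorks API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Catalog:
    """Hypothetical model of an added data catalog."""
    name: str
    region: str
    added_from_workspace: bool            # True: added based on a workspace data source
    owner_account: Optional[str] = None   # set when added based on an account's own objects


def is_visible(catalog: Catalog, viewer_account: str, viewer_is_member: bool,
               workspace_region: str, viewer_has_permission: bool) -> bool:
    """Return True if the catalog appears in the viewer's DATA CATALOG pane."""
    if catalog.added_from_workspace:
        # Workspace-based catalogs are visible to the members of the workspace.
        return viewer_is_member
    # Account-based catalogs are visible only to the account that added them,
    # and only if the object is in the workspace region and is accessible.
    return (catalog.owner_account == viewer_account
            and catalog.region == workspace_region
            and viewer_has_permission)
```

For example, an account-based catalog whose object resides in a different region than the workspace is not displayed, even to the account that created the object.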
Manage data catalogs
Hide a data catalog
Perform the following steps to hide the data catalogs that you do not use:
In the upper-right corner of the DATA CATALOG pane, click the icon.
In the popover that appears, click the icon before the corresponding data source type to hide all data catalogs of the data source type.
Note: In the popover, click the blank area before the corresponding data source type to show all data catalogs of the data source type.
Remove a data catalog from the workspace
If you no longer use an added data catalog, perform the following operation to remove the data catalog from the workspace:
In the DATA CATALOG pane, find the data catalog that you added and click Remove or Disassociate Data Catalog in the Actions column to remove the data catalog from the workspace.
Create a data object
Find and expand the added data catalog in the DATA CATALOG pane. Move the pointer over the data catalog, and then click the icon that appears to create a data object. The creation method varies based on the data object type and the data source type.
Manage data objects
View a data object
Perform the following steps to view a data object that you created in the DATA CATALOG pane:
If a permission verification error occurs when you view tables in a data catalog, check how the data catalog was added and determine which identity or account failed the permission verification. Then, go to the details page of the data source and grant the required permissions to that identity or account. For more information, see Identity authentication and authorization.
In the DATA CATALOG pane, click the icon before the object at each layer of the data catalog in sequence to expand the layers. Then, find the database, table, view, or other object that you want to view under the data catalog.
This example uses a MaxCompute table. In the MaxCompute section of the DATA CATALOG pane, find the data catalog and click Table in the data catalog. On the Tables page, find the desired table and click the name of the table.
On the table details page, you can view the table fields and partition fields on the Details tab; the project to which the table belongs, the table owner, and the table lifecycle on the Basic Information tab; and the DDL statement that is used to create the table on the DDL tab.
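To make the relationship between these tabs concrete, the sketch below assembles the kind of statement that the DDL tab might show for a partitioned MaxCompute table with a lifecycle. The table name, the columns, and the helper `build_create_table_ddl` are hypothetical. The statement shape follows the standard MaxCompute `CREATE TABLE ... PARTITIONED BY ... LIFECYCLE` syntax, and the text displayed in the console may be formatted differently.

```python
from typing import Optional, Sequence, Tuple

Column = Tuple[str, str]  # (column name, data type)


def build_create_table_ddl(table: str,
                           columns: Sequence[Column],
                           partitions: Sequence[Column] = (),
                           lifecycle_days: Optional[int] = None) -> str:
    """Assemble a MaxCompute-style CREATE TABLE statement (illustrative only)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
    if partitions:
        # Partition fields correspond to what the Details tab lists separately.
        parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
        ddl += f" PARTITIONED BY ({parts})"
    if lifecycle_days is not None:
        # The lifecycle corresponds to the value on the Basic Information tab.
        ddl += f" LIFECYCLE {lifecycle_days}"
    return ddl + ";"


# Hypothetical table used only for this example.
example_ddl = build_create_table_ddl(
    "sale_detail",
    [("shop_name", "STRING"), ("total_price", "DOUBLE")],
    partitions=[("sale_date", "STRING")],
    lifecycle_days=30,
)
```

In this sketch, the regular columns and partition columns map to the Details tab, the lifecycle maps to the Basic Information tab, and the assembled string maps to the DDL tab.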
Modify a data object
Perform the following steps to modify a data object that you created in the DATA CATALOG pane:
In the DATA CATALOG pane, click the icon before the object at each layer of the data catalog in sequence to expand the layers. Then, find the database, table, view, or other object that you want to modify under the data catalog.
This example uses a MaxCompute table. In the MaxCompute section of the DATA CATALOG pane, find the data catalog and click Table in the data catalog.
On the Tables page, find the desired table and click the name of the table.
Click Edit in the upper-right corner of the table details page to modify the table configurations, such as the table name, lifecycle, and field information.
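For readers who think in SQL, the edits listed above correspond roughly to MaxCompute `ALTER TABLE` statements. The sketch below is an assumption for illustration only: the source does not state which statements the Edit page issues, and the helper `build_alter_statements` and the table names are hypothetical. The rename is placed last so that the earlier statements can still refer to the original table name.

```python
from typing import List, Optional, Sequence, Tuple

Column = Tuple[str, str]  # (column name, data type)


def build_alter_statements(table: str,
                           new_name: Optional[str] = None,
                           lifecycle_days: Optional[int] = None,
                           new_columns: Sequence[Column] = ()) -> List[str]:
    """Build MaxCompute-style ALTER TABLE statements for common table edits."""
    statements: List[str] = []
    if lifecycle_days is not None:
        statements.append(f"ALTER TABLE {table} SET LIFECYCLE {lifecycle_days};")
    if new_columns:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in new_columns)
        statements.append(f"ALTER TABLE {table} ADD COLUMNS ({cols});")
    if new_name is not None:
        # Rename last: the statements above use the original table name.
        statements.append(f"ALTER TABLE {table} RENAME TO {new_name};")
    return statements
```

Calling `build_alter_statements("sale_detail", new_name="sale_detail_v2", lifecycle_days=60)` yields a lifecycle change followed by a rename.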
Delete a data object
Perform the following step to delete a data object that you created from a data catalog in the DATA CATALOG pane. In this example, a table is deleted.
In the DATA CATALOG pane, right-click the table that you want to delete and select Delete.