The data catalog provides a unified interface to manage Hive metadata. This topic describes how to add a Hive data source to the catalog, create tables with fields and partitions, and manage or remove catalog entries.
The Hive data catalog uses a three-level hierarchy: data source (Hive instance) > database > table. Add a Hive data source first, then manage databases and tables within it. Only Internal Table is supported as the table type.
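The three-level hierarchy can be pictured as nested containers. The following is a minimal sketch for illustration only; the class names, the `add_table` helper, and the sample names `emr_hive_prod`, `hive_work`, and `user_info` are hypothetical and are not part of any DataWorks API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Database:
    """Second level: a database inside a Hive instance."""
    name: str
    tables: List[str] = field(default_factory=list)


@dataclass
class HiveDataSource:
    """Top level: one Hive instance added to the catalog."""
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)

    def add_table(self, database: str, table: str) -> str:
        # Create the database entry on first use, then register the table.
        db = self.databases.setdefault(database, Database(database))
        db.tables.append(table)
        # Return the full path: data source > database > table.
        return f"{self.name}.{database}.{table}"


source = HiveDataSource("emr_hive_prod")  # hypothetical data source name
path = source.add_table("hive_work", "user_info")
print(path)
```

The point of the sketch is the ordering: a table can only be addressed once its data source and database exist, which is why the data source is added first.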
Go to the Hive data catalog page
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a region. Find the target workspace and choose Shortcuts > Data Studio in the Actions column.
- In the left navigation pane, click the icon. In the Data Catalog tree, click Hive to open the Hive data catalog management page.
Create a Hive data catalog
On the Hive data catalog management page, add existing Hive data sources as datasets.
- To the right of the Hive data catalog, click the icon to open the Add Instance page.
- On the DataWorks Data Source tab, add a Hive data source:
  - To add a single source, find the EMR cluster data source that corresponds to the EMR computing resources attached to the new Data Studio in the current workspace, and click Add in the Actions column.
  - To add multiple sources at once, select multiple Hive data sources and click Batch Add below the list.
Manage a Hive data catalog
After adding a Hive data source, create and manage Hive tables in the data catalog.
Create a table
- Click the icon next to the Hive data catalog to expand the database, then click Tables.
- To the right of Tables, click the icon to open the Create Table page.
- Generate basic table and field information using one of the following methods:
  - Create a table using Copilot:
    - In the top toolbar, click Create Table With Copilot to open the Copilot chat interface.
    - Enter an instruction in natural language, for example: Create a user table.
    - Click Generate And Replace. The system generates a default table name and field information.
    - Click Accept to apply the result. To make further changes, edit the generated information manually after accepting.
  - Create a table manually:

    | Parameter | Description |
    |---|---|
    | Basic information | Specify a Table Name, Table Description, and other details. |
    | Field information | Add fields and annotations. Click Insert to add rows manually and fill in Field Name, Field Type, and other details. Alternatively, click Generate Fields or Generate Field Descriptions to let Copilot generate fields based on the table name and description. |

- (Optional) Configure partition information. In the Partition Fields section, enter the number of partition fields to add in Rows and click Insert. Multiple partition fields are supported. Configure Field Name, Field Type, and other parameters for each partition field.
- (Optional) Configure advanced settings.

  | Parameter | Description |
  |---|---|
  | Table type | Only Internal Table is supported. |
  | Storage location | Specify a custom storage path. Example: /user/hive/warehouse/hive_work. |
  | Storage format | Select a format based on your data structure and query pattern. See Choose a storage format below. |

- Click Publish in the top toolbar to create the table.
Choose a storage format
The system automatically sets the data input format, output format, and serialization/deserialization methods based on the format you select.
| Format | Best for |
|---|---|
| CSV | Simple data structures; comma-separated text |
| PARQUET | Big data analytics; high compression ratio; columnar storage |
| ORC | Complex data types; high-performance columnar storage |
| AVRO | Dynamic data structures; supports schema evolution |
| JSON | Semi-structured data; supports nested structures |
| SELF_DEFINE | Custom serialization/deserialization logic |
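
In plain HiveQL, each of these formats corresponds to a storage clause that sets the input format, output format, and SerDe together. The following sketch shows one plausible mapping using standard Hive clauses. It is an assumption about equivalent HiveQL, not a description of what DataWorks emits, and the `storage_clause` helper is hypothetical. The JSON entry assumes the HCatalog JSON SerDe is available on the cluster.

```python
from typing import Optional

# Standard Hive storage clauses for the built-in formats (assumed mapping).
HIVE_STORAGE_CLAUSES = {
    "CSV": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE",
    "PARQUET": "STORED AS PARQUET",
    "ORC": "STORED AS ORC",
    "AVRO": "STORED AS AVRO",
    "JSON": (
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        "STORED AS TEXTFILE"
    ),
}


def storage_clause(
    fmt: str,
    serde: Optional[str] = None,
    input_format: Optional[str] = None,
    output_format: Optional[str] = None,
) -> str:
    """Return the HiveQL storage clause for a format name from the table above."""
    key = fmt.upper()
    if key == "SELF_DEFINE":
        # A custom format needs all three class names spelled out.
        if not (serde and input_format and output_format):
            raise ValueError("SELF_DEFINE requires serde, input_format, and output_format")
        return (
            f"ROW FORMAT SERDE '{serde}'\n"
            f"STORED AS INPUTFORMAT '{input_format}'\n"
            f"OUTPUTFORMAT '{output_format}'"
        )
    if key not in HIVE_STORAGE_CLAUSES:
        raise ValueError(f"Unsupported storage format: {fmt}")
    return HIVE_STORAGE_CLAUSES[key]


print(storage_clause("orc"))

# Spelling out the ORC classes by hand, to show what the shorthand stands for.
print(
    storage_clause(
        "SELF_DEFINE",
        serde="org.apache.hadoop.hive.ql.io.orc.OrcSerde",
        input_format="org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    )
)
```

The second call illustrates why the built-in formats are convenient: a single keyword such as `ORC` stands in for a SerDe, an input format, and an output format that would otherwise be listed individually.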
Manage tables
Click the icon to the left of the Hive data catalog, then click Tables to open the Tables page.
- View tables: Browse basic information for all tables. Click a table name to view its Details, Basic Information, and DDL.
- Delete a table: Find the table and click Delete in the Actions column.
  Important: This operation cannot be undone. Proceed with caution.
View and remove a Hive data catalog
View a data catalog
Click the icon to the left of the Hive data catalog to see the added Hive data sources. Click a data source to view all Databases in that Hive instance.
Remove a data catalog
Right-click the catalog and select Detach Data Catalog from the context menu.