The data catalog provides a unified interface for managing Hive metadata. This topic describes how to add Hive data sources to the data catalog, how to create and manage Hive tables in it, and how to detach a Hive data catalog.
Access the Hive data catalog
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose in the Actions column.
In the left navigation pane, click the icon. In the Data Directory tree, click Hive to open the Hive data catalog management page.
Create a Hive data catalog
On the Hive data catalog management page, you can add existing Hive data sources to the data catalog.
To the right of the Hive data catalog, click the icon to go to the Add Instance page.
On the DataWorks Data Source tab, find the EMR cluster data source that serves as your workspace's computing resource and click Add in its Operation column to add it to the data catalog list.
You can also select multiple Hive data sources on the DataWorks Data Source tab and click the Batch Add button at the bottom of the list to add them in bulk.
Manage a Hive data catalog
You can add and manage Hive tables in the Hive data catalog.
Create a table
Click the icon to the left of the Hive data catalog to expand the tree, and find Table under the target database. Click the icon to the right of Table to go to the Create Table page. You can specify the basic table information and field information in either of the following ways:
Create a table by using Copilot:
In the toolbar at the top of the page, click Copilot table creation to open the Copilot Chat interface.
Enter an instruction in natural language, for example, "Create a user table." Then click Generate and Replace. The system generates a default table name and field information based on your instruction.
If the table name and fields meet your requirements, click Accept.
Note: If you need to modify some of the table information, you can manually edit the system-generated information after you click Accept.
Create a table manually:
Configure the following parameters:
Basic information: Specify the Table Name, Table Description, and other basic information.
Field information: Edit the fields and their comments in either of the following ways:
Edit manually: Click the Insert button above the field list, specify the number of rows to insert, and then edit the Field Name and Data Type of each row.
Edit with Copilot: Click Generate Field or Generate field descriptions above the field list. The system generates relevant fields and field descriptions based on the table name and description that you entered.
(Optional) Configure partition information.
To create a partitioned table, go to the Partition Field section, specify the number of rows to insert for partition fields, and click Insert. Then, configure the Field Name and Data Type of each partition field.
(Optional) Configure advanced settings.
Table type: Only Internal Table is supported.
Storage location: You can customize the storage directory for the table, for example: /user/hive/warehouse/hive_work
Storage format: You can set the storage format to CSV, PARQUET, ORC, AVRO, JSON, or SELF_DEFINE. The selected storage format determines the input and output formats of the data and the serialization and deserialization methods used to read and write it.
CSV: Comma-separated text files, suitable for simple data structures.
PARQUET: A columnar storage format with a high compression ratio, suitable for big data analysis.
ORC: An optimized columnar storage format with high read performance that supports complex data types.
AVRO: A binary format that supports schema evolution, suitable for dynamic data structures.
JSON: Supports nested structures, suitable for semi-structured data.
SELF_DEFINE: Allows you to define custom serialization and deserialization logic.
When you are finished, click Publish in the top toolbar to create the table.
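The settings configured on the Create Table page map roughly onto the clauses of a Hive CREATE TABLE statement. The following Python sketch assembles such a statement under that assumption. `build_create_table_ddl`, `storage_clause`, and the other helpers are hypothetical, the per-format clauses are common Hive conventions rather than the DDL that DataWorks is confirmed to generate on Publish, and the hive_work.user_info table is illustrative only. Because only internal tables are supported, the sketch emits CREATE TABLE without the EXTERNAL keyword.

```python
"""Hypothetical sketch: assemble Hive DDL from Create Table page settings.
All names are illustrative; DataWorks may generate different DDL on Publish.
"""
from typing import List, Optional, Tuple

# (field name, data type, optional comment)
Field = Tuple[str, str, Optional[str]]

# Common HiveQL clauses for the built-in Storage format options (assumed mapping).
STORAGE_CLAUSES = {
    # Plain comma-delimited text; quoted values that contain commas are not handled.
    "CSV": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE",
    "PARQUET": "STORED AS PARQUET",
    "ORC": "STORED AS ORC",
    "AVRO": "STORED AS AVRO",
    # One JSON object per line; this SerDe ships in the hive-hcatalog-core JAR.
    "JSON": "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\nSTORED AS TEXTFILE",
}


def quote(text: str) -> str:
    """Return text as a single-quoted HiveQL string literal."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_fields(fields: List[Field]) -> str:
    """Render one `name` TYPE [COMMENT '...'] line per field."""
    lines = []
    for name, data_type, comment in fields:
        line = f"  `{name}` {data_type}"
        if comment:
            line += f" COMMENT {quote(comment)}"
        lines.append(line)
    return ",\n".join(lines)


def storage_clause(storage_format: str,
                   self_defined: Optional[Tuple[str, str, str]] = None) -> str:
    """Map a Storage format option to HiveQL.

    For SELF_DEFINE, self_defined is (serde_class, input_format_class,
    output_format_class), all supplied by you.
    """
    fmt = storage_format.upper()
    if fmt == "SELF_DEFINE":
        if self_defined is None:
            raise ValueError("SELF_DEFINE requires SerDe and input/output format classes")
        serde, input_format, output_format = self_defined
        return (f"ROW FORMAT SERDE {quote(serde)}\n"
                f"STORED AS INPUTFORMAT {quote(input_format)}\n"
                f"OUTPUTFORMAT {quote(output_format)}")
    if fmt not in STORAGE_CLAUSES:
        raise ValueError(f"unsupported storage format: {storage_format}")
    return STORAGE_CLAUSES[fmt]


def build_create_table_ddl(database: str, table: str, fields: List[Field],
                           storage_format: str,
                           description: Optional[str] = None,
                           partition_fields: Optional[List[Field]] = None,
                           location: Optional[str] = None,
                           self_defined: Optional[Tuple[str, str, str]] = None) -> str:
    if not fields:
        raise ValueError("at least one regular (non-partition) field is required")
    regular = {name.lower() for name, _, _ in fields}
    for name, _, _ in partition_fields or []:
        # Hive rejects a partition column that repeats a regular column name.
        if name.lower() in regular:
            raise ValueError(f"partition field {name!r} duplicates a regular field")
    # Internal (managed) table, so there is no EXTERNAL keyword.
    parts = [f"CREATE TABLE `{database}`.`{table}` (\n{render_fields(fields)}\n)"]
    if description:
        parts.append(f"COMMENT {quote(description)}")
    if partition_fields:
        parts.append(f"PARTITIONED BY (\n{render_fields(partition_fields)}\n)")
    parts.append(storage_clause(storage_format, self_defined))
    if location:
        parts.append(f"LOCATION {quote(location)}")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(build_create_table_ddl(
        "hive_work", "user_info",
        [("user_id", "BIGINT", "User ID"), ("user_name", "STRING", None)],
        "PARQUET",
        description="User table",
        partition_fields=[("dt", "STRING", "Partition date")],
        location="/user/hive/warehouse/hive_work",
    ))
```

Running the script prints a statement for a PARQUET table partitioned by dt and stored under /user/hive/warehouse/hive_work. Comparing this output with the DDL information of a published table is one way to check how closely these assumptions match what DataWorks produces.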
Manage tables
After you create a table in the Hive data catalog, click the icon to the left of the Hive data catalog, and then click Table to go to the Table page.
View tables.
On the Table page, you can view the basic information of all tables. You can also click a specific table name to view its Details, Basic information, and DDL information.
Delete a table.
On the Table page, find the table that you want to delete and click Delete in the Operation column.
Important: This action cannot be undone. Proceed with caution.
View and detach a Hive data catalog
You can view the Hive data sources that have been added to a Hive data catalog, and you can detach a Hive data catalog that you no longer need.
View a data catalog.
After you add a Hive data source to the Hive data catalog, click the icon to the left of the Hive data catalog to view the added Hive data sources. Click a Hive data source to view all databases in that Hive instance.
Detach a data catalog.
To detach a Hive data catalog that you no longer need, right-click it and select Disassociate Data Catalog.