A Hudi catalog is an external catalog that lets you query Apache Hudi data directly in StarRocks without importing it. Use INSERT INTO with a Hudi catalog to transform and load Hudi data into StarRocks internal tables. StarRocks supports Hudi catalogs from version 2.4.
Use cases
| Scenario | Description |
|---|---|
| Query acceleration | Run StarRocks queries directly against Hudi tables in your data lake without moving data. |
| Data integration | Read Hudi data and write it into StarRocks internal tables using INSERT INTO. |
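For the data integration scenario, the SELECT part of INSERT INTO can filter or reshape rows before they land in the internal table. A minimal sketch, assuming a Hudi catalog named hudi_catalog; the database hudi_db, the table hudi_orders, and its columns are hypothetical, and olap_tbl is assumed to already exist with matching columns:

```sql
-- hudi_db, hudi_orders, and the column names are hypothetical.
-- The target table must already exist in the StarRocks internal catalog.
INSERT INTO default_catalog.olap_db.olap_tbl
SELECT
    order_id,
    CAST(amount AS DECIMAL(18, 2)) AS amount,  -- normalize the numeric type on the way in
    order_date
FROM hudi_catalog.hudi_db.hudi_orders
WHERE order_date >= '2023-01-01';              -- load only the rows that are needed
```

Fully qualifying both table names keeps the statement independent of the catalog and database the session currently uses.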
Supported capabilities
| Category | Details |
|---|---|
| Storage systems | Hadoop Distributed File System (HDFS), Object Storage Service (OSS) |
| Metadata services | Data Lake Formation (DLF), Hive Metastore (HMS) |
| File format | Parquet |
| Compression formats | SNAPPY, LZ4, ZSTD, GZIP, NO_COMPRESSION |
| Table types | Copy On Write (COW), Merge On Read (MOR) |
Create a Hudi catalog
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "hudi",
MetastoreParams,
StorageCredentialParams,
MetadataUpdateParams
)
Parameters
| Parameter | Required | Description |
|---|---|---|
| catalog_name | Yes | Name of the Hudi catalog. Must start with a letter and contain only letters, digits, and underscores (_). Length: 1–64 characters. |
| comment | No | Description of the Hudi catalog. |
| type | Yes | Type of the data source. Set to hudi. |
| MetastoreParams | Yes | Parameters for connecting to the metadata service. See MetastoreParams. |
MetastoreParams
Configure one of the following, depending on your metadata service.
Use DLF
| Property | Required | Description |
|---|---|---|
| hive.metastore.type | Yes | Type of metadata service. Set to dlf. |
| dlf.catalog.id | No | ID of an existing data catalog in DLF. If not specified, StarRocks uses the default DLF catalog. |
Use HMS
| Property | Required | Description |
|---|---|---|
| hive.metastore.type | Yes | Type of metadata service. Set to hive. |
| hive.metastore.uris | Yes | URI of the Hive Metastore service. Format: thrift://<metastore-host>:<port>. The default port is 9083. |
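The optional COMMENT clause in the syntax can record what a catalog is for. A minimal sketch, assuming an HMS-backed catalog; the catalog name hudi_catalog_commented and the comment text are illustrative only, and the comment is assumed to be written as a quoted string literal:

```sql
-- Hypothetical catalog name and comment text.
-- Replace <metastore-host> with the address of your Hive Metastore.
CREATE EXTERNAL CATALOG hudi_catalog_commented
COMMENT "Hudi tables registered in Hive Metastore"
PROPERTIES
(
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://<metastore-host>:9083"
);
```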
Examples
Example 1: OSS with DLF
CREATE EXTERNAL CATALOG hudi_catalog_dlf
PROPERTIES
(
"type" = "hudi",
"hive.metastore.type" = "dlf",
"dlf.catalog.id" = "<your-dlf-catalog-id>"
);
Example 2: OSS with HMS
CREATE EXTERNAL CATALOG hudi_catalog_hms
PROPERTIES
(
"type" = "hudi",
"hive.metastore.type" = "hive",
"hive.metastore.uris" = "thrift://<metastore-host>:9083"
);
Example 3: HDFS with HMS
CREATE EXTERNAL CATALOG hudi_catalog
PROPERTIES
(
"type" = "hudi",
"hive.metastore.type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);
View Hudi catalogs
List all catalogs in your StarRocks cluster:
SHOW CATALOGS;
View the creation statement of a specific Hudi catalog:
SHOW CREATE CATALOG hudi_catalog;
Switch to a Hudi catalog and database
Use either of the following methods.
Option 1: Set catalog, then switch database
-- Switch to the Hudi catalog for the current session:
SET CATALOG <catalog_name>;
-- Switch to the target database:
USE <db_name>;
Option 2: Switch catalog and database in one statement
USE <catalog_name>.<db_name>;
Query a Hudi table
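After the session has switched to a Hudi catalog and database, tables in that database can be referenced by name alone. A minimal sketch, assuming the hudi_catalog catalog from the examples above plus a hypothetical database hudi_db and table hudi_orders:

```sql
-- hudi_db and hudi_orders are hypothetical names used for illustration.
SET CATALOG hudi_catalog;
USE hudi_db;
-- The unqualified table name now resolves against hudi_catalog.hudi_db.
SELECT * FROM hudi_orders LIMIT 10;
```

The statements in the rest of this section use fully qualified names, which work regardless of the current catalog and database.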
View the schema of a Hudi table:
DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
View the schema and file storage location:
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;
Query data in a Hudi table:
SELECT * FROM <catalog_name>.<database_name>.<table_name>;
Import Hudi data
Use INSERT INTO to transform and load Hudi data into a StarRocks internal table. The following example loads data from a Hudi table into an OLAP table named olap_tbl:
INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM hudi_table;
Refresh metadata cache
StarRocks caches Hudi metadata and updates it asynchronously by default to improve query performance. After schema changes or data updates to a Hudi table, manually refresh the metadata cache to make sure StarRocks generates accurate query plans immediately:
REFRESH EXTERNAL TABLE <table_name> [PARTITION ('partition_name', ...)];
Delete a Hudi catalog
DROP CATALOG hudi_catalog;
What's next
For an overview of Apache Hudi, see Overview.