A Hudi catalog is an external catalog. You can use a Hudi catalog to query data in Hudi. This topic describes how to create a Hudi catalog in an E-MapReduce (EMR) StarRocks cluster and use the Hudi catalog to query data in Hudi.
Prerequisites
A cluster that contains the Hudi service, such as a DataLake cluster or a custom cluster, is created. For more information, see Create a cluster.
A cluster that contains the StarRocks service, such as an online analytical processing (OLAP) cluster or a custom cluster, is created, and you have logged on to the cluster. For more information, see Create a cluster and Getting started.
Limits
The Hudi cluster and the StarRocks cluster must be deployed in the same virtual private cloud (VPC) and zone.
Create a Hudi catalog
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
(
"key"="value",
...
);
Parameter description
catalog_name: the name of the Hudi catalog. This parameter is required. The name must meet the following requirements:
The name can contain letters, digits, and underscores (_), and must start with a letter.
The name must be 1 to 64 characters in length.
PROPERTIES: the properties of the Hudi catalog. This parameter is required. The properties that you configure vary based on the metadata service that the Hudi data source uses. The following table describes the properties that you can configure.

| Property | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the data source. Set the value to hudi. |
| hive.metastore.uris | Yes | The URI of the Hive Metastore service, in the format thrift://<IP address of the Hive Metastore service>:<Port number>. The default port number is 9083. You can leave this parameter empty if you use Data Lake Formation (DLF) to store metadata. |
| hive.metastore.type | No | The type of the metastore. By default, this parameter is left empty, which indicates that the Hive Metastore service is used to store metadata. To use DLF to store metadata, set the value to dlf. |
| dlf.catalog.id | No | The ID of the DLF catalog from which you want to read data. This parameter takes effect only if hive.metastore.type is set to dlf. If you do not configure this parameter, the ID of the default DLF catalog is used. |

Hive MetaStore
| Property | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the data source. Set the value to hudi. |
| hive.metastore.uris | Yes | The URI of the Hive Metastore service, in the format thrift://<IP address of the Hive Metastore service>:<Port number>. The default port number is 9083. |

DLF
For more information, see Access external tables whose metadata is stored in DLF.
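The following statement is a minimal sketch of a Hudi catalog that reads metadata from DLF instead of the Hive Metastore service. It is assembled only from the properties in the preceding property table: hive.metastore.type is set to dlf, and hive.metastore.uris is omitted because it can be left empty when DLF is used. The catalog name hudi_dlf_catalog and the placeholder <dlf_catalog_id> are hypothetical. Your environment may require additional settings, so treat the linked DLF topic as authoritative.

```sql
-- Hypothetical catalog that reads Hudi metadata from DLF.
CREATE EXTERNAL CATALOG hudi_dlf_catalog
PROPERTIES
(
    "type" = "hudi",
    -- Use DLF instead of the Hive Metastore service to store metadata.
    "hive.metastore.type" = "dlf",
    -- Placeholder: replace with your DLF catalog ID, or remove this line
    -- to read from the default DLF catalog.
    "dlf.catalog.id" = "<dlf_catalog_id>"
);
```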
Example
Run the following command to create a Hudi catalog named hudi_catalog:
CREATE EXTERNAL CATALOG hudi_catalog
PROPERTIES
(
"type" = "hudi",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);
Query data in a Hudi table
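Before you query a table, you may want to confirm that the catalog exists and find the database and table names to use. The following statements are a sketch that assumes the hudi_catalog catalog from the preceding example. The database name hudi_db is hypothetical. Whether each statement is available depends on the StarRocks version that runs in your cluster.

```sql
-- Confirm that the Hudi catalog was created.
SHOW CATALOGS;

-- List the databases that the Hudi catalog exposes.
SHOW DATABASES FROM hudi_catalog;

-- List the tables in a database. hudi_db is a hypothetical database name.
SHOW TABLES FROM hudi_catalog.hudi_db;
```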
You can execute the following statement to query data in a specific table of a database:
SELECT * FROM <catalog_name>.<database_name>.<table_name>;
References
For information about Hudi, see Overview.