E-MapReduce: Iceberg data source

Last Updated: Mar 13, 2024

An Iceberg catalog is an external catalog. You can use an Iceberg catalog to query data in Iceberg. This topic describes how to create an Iceberg catalog in an E-MapReduce (EMR) StarRocks cluster and use the Iceberg catalog to query data in Iceberg.

Prerequisites

  • A cluster that contains the Iceberg service, such as a DataLake cluster or a custom cluster, is created. For more information, see Create a cluster.

  • A cluster that contains the StarRocks service, such as an online analytical processing (OLAP) cluster or a custom cluster, is created, and you have logged on to the cluster. For more information, see Create a cluster and Getting started.

Limits

  • The cluster that contains the Iceberg service and the cluster that contains the StarRocks service must be deployed in the same virtual private cloud (VPC) and zone.

  • A StarRocks cluster can query only Iceberg Version 1 tables (analytic data tables). Data in Version 2 tables (row-level deletes) cannot be queried. For more information, see Iceberg Table Spec.

Create an Iceberg catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
( 
  "key"="value", 
  ...
);

Parameter description

  • catalog_name: the name of the Iceberg catalog. This parameter is required. The name must meet the following requirements:

    • The name can contain letters, digits, and underscores (_). It must start with a letter.

    • The name must be 1 to 64 characters in length.

  • PROPERTIES: the properties of the Iceberg catalog. This parameter is required. The configurations of this parameter vary based on the metadata service that is used by the Iceberg data source. An Iceberg catalog stores the mappings between Iceberg tables and their storage paths. The following information describes the properties that you can configure for different metadata services:

    • Hive metastore

      | Property | Required | Description |
      | --- | --- | --- |
      | type | Yes | The type of the data source. Set the value to iceberg. |
      | iceberg.catalog.type | Yes | The type of the catalog for the Iceberg data source. If you use the Hive metastore, set the value to HIVE. |
      | iceberg.catalog.hive.metastore.uris | Yes | The URI of the Hive metastore. Specify the value in the following format: thrift://<IP address of the Hive metastore>:<Port number>. The default port number is 9083. |

    • Custom metadata service

      If you use a custom metadata service, you must develop a custom catalog class in StarRocks and implement the relevant interfaces so that StarRocks can access the metadata service through that class. The custom catalog class must inherit the abstract class BaseMetastoreCatalog, and its name must be different from the names of existing classes in StarRocks. For information about how to develop a custom catalog class and implement the relevant interfaces, see IcebergHiveCatalog. After you develop the class, package it together with its related files, place the package in the fe/lib directory on all frontend nodes, and then restart all frontend nodes so that they can recognize the class.

      | Property | Required | Description |
      | --- | --- | --- |
      | type | Yes | The type of the data source. Set the value to iceberg. |
      | iceberg.catalog.type | Yes | The type of the catalog for the Iceberg data source. If you use a custom metadata service, set the value to CUSTOM. |
      | iceberg.catalog-impl | Yes | The fully qualified class name of the custom catalog. The frontend nodes find the custom catalog based on this class name. If the custom catalog defines custom parameters that must take effect when you query external data, add these parameters as key-value pairs to PROPERTIES in the SQL statement that creates the Iceberg catalog. |

Example

Execute the following statement to create an Iceberg catalog named iceberg_catalog:

CREATE EXTERNAL CATALOG iceberg_catalog
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "HIVE",
    "iceberg.catalog.hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);
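
For a custom metadata service, the statement would follow the same pattern with the properties described above. The following is only a sketch: the catalog name iceberg_custom_catalog, the class name com.example.iceberg.MyCustomCatalog, and the custom parameter custom.endpoint are hypothetical placeholders, not names defined by StarRocks or EMR. Replace them with the values of the class that you developed.

```sql
-- Sketch only. The class name and the "custom.endpoint" parameter are
-- hypothetical; use the fully qualified name of your own catalog class
-- and whatever custom parameters that class actually defines.
CREATE EXTERNAL CATALOG iceberg_custom_catalog
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "CUSTOM",
    "iceberg.catalog-impl" = "com.example.iceberg.MyCustomCatalog",
    "custom.endpoint" = "http://xx.xx.xx.xx:8080"
);
```

The class must already be packaged into the fe/lib directory on all frontend nodes, and the frontend nodes must have been restarted, before a statement like this can succeed.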

Query data in an Iceberg table

Execute the following statement to query data in a specific table of a database:

SELECT * FROM <catalog_name>.<database_name>.<table_name>;
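
As a concrete illustration, the following sketch assumes the iceberg_catalog catalog created in the earlier example, plus a hypothetical database named sales_db that contains a hypothetical table named orders. The SHOW statements are a convenient way to discover the real database and table names first; their availability may depend on the StarRocks version of your cluster.

```sql
-- Confirm that the catalog was created.
SHOW CATALOGS;

-- List the databases that the catalog exposes.
SHOW DATABASES FROM iceberg_catalog;

-- List the tables in one database (sales_db is a hypothetical name).
SHOW TABLES FROM iceberg_catalog.sales_db;

-- Query a table by its fully qualified name (orders is a hypothetical name).
-- LIMIT keeps the first exploratory query small.
SELECT * FROM iceberg_catalog.sales_db.orders LIMIT 10;
```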

References

For more information about Iceberg, see Overview.