E-MapReduce:Iceberg Catalog

Last Updated:Dec 01, 2025

An Iceberg Catalog is an external catalog that StarRocks has supported since version 2.4.

Background

With an Iceberg Catalog, you can:

  • Query data in Iceberg directly through an Iceberg Catalog without creating tables manually.

  • Process, model, and import data from Iceberg into StarRocks using INSERT INTO or asynchronous materialized views (available in v2.5 and later).

  • Create or drop Iceberg databases and tables from StarRocks. You can also write data from StarRocks tables into Parquet-formatted Iceberg tables using INSERT INTO (available in v3.1 and later).

To ensure proper access to data in Iceberg, your StarRocks cluster must be able to access the storage system and metadata service of the Iceberg cluster. StarRocks currently supports the following storage systems and metadata services:

  • Storage systems: Hadoop Distributed File System (HDFS) or Alibaba Cloud Object Storage Service (OSS).

  • Metadata services: Hive Metastore (HMS) and Data Lake Formation (DLF).

Usage notes

When you query Iceberg data from StarRocks, note the following.

| File format | Compression formats | Iceberg table version |
| --- | --- | --- |
| Parquet | SNAPPY, LZ4, ZSTD, GZIP, and NO_COMPRESSION | v1 tables: supported. v2 tables: position deletes are supported from StarRocks v3.1; equality deletes are supported from v3.1.10+, v3.2.5+, and v3.3. |
| ORC | ZLIB, SNAPPY, LZO, LZ4, ZSTD, and NO_COMPRESSION | v1 tables: supported. v2 tables: position deletes are supported from StarRocks v3.0; equality deletes are supported from v3.1.8+, v3.2.3+, and v3.3. |

Create an Iceberg catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "iceberg",
    MetastoreParams
)

Parameters

The parameters vary depending on the metadata service that Iceberg uses.

Use HMS

  • catalog_name: The name of the Iceberg catalog. This parameter is required. The name must meet the following requirements:

    • It must consist of letters (a-z or A-Z), digits (0-9), or underscores (_), and must start with a letter.

    • The total length cannot exceed 64 characters.

    • The catalog name is case-sensitive.

  • comment: The description of the Iceberg catalog. This parameter is optional.

  • type: The type of the data source. Set this to iceberg.

  • MetastoreParams: The parameters for StarRocks to access the metadata service of the Iceberg cluster.

    | Property | Description |
    | --- | --- |
    | iceberg.catalog.type | The type of catalog in Iceberg. The value must be hive. |
    | hive.metastore.uris | The URI of the Hive Metastore, in the format thrift://<ip_address_of_hive_metastore>:<port>. The default port is 9083. |

Use DLF 1.0 (Legacy)

  • catalog_name: The name of the Iceberg catalog. This parameter is required. The name must meet the following requirements:

    • It must consist of letters (a-z or A-Z), digits (0-9), or underscores (_), and must start with a letter.

    • The total length cannot exceed 64 characters.

    • The catalog name is case-sensitive.

  • comment: The description of the Iceberg catalog. This parameter is optional.

  • type: The type of the data source. Set this to iceberg.

  • MetastoreParams: The parameters for StarRocks to access the metadata service of the Iceberg cluster.

    | Property | Description |
    | --- | --- |
    | iceberg.catalog.type | The type of catalog in Iceberg. The value must be dlf. |
    | dlf.catalog.id | The ID of an existing data catalog in DLF. If you do not set dlf.catalog.id, the system uses the default DLF catalog. |

Use DLF

Important

If you use DLF, you must use a configured Resource Access Management (RAM) user to perform operations in StarRocks Manager. For more information, see Use a DLF Catalog.

  • catalog_name: The name of the Iceberg Catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), and underscores (_).

    • The length cannot exceed 64 characters.

  • comment: A description of the Iceberg Catalog. This parameter is optional.

  • type: The type of the data source. Set this to iceberg.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Iceberg cluster. The parameter settings vary depending on the metadata type used by the Iceberg cluster.

    | Parameter | Required | Description |
    | --- | --- | --- |
    | <catalog_name> | Yes | The name of the DLF data catalog, for example, dlf_catalog. |
    | type | Yes | The catalog type. For an Iceberg data source, set this to the fixed value iceberg. |
    | uri | Yes | The REST API endpoint of DLF, in the format http://<VPC_Endpoint>/iceberg, where <VPC_Endpoint> is the VPC endpoint of DLF in the specified region. For specific values, see Service endpoints. Example: http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg. |
    | iceberg.catalog.type | Yes | The Iceberg Catalog type. For a DLF scenario, set this to the fixed value dlf_rest. |
    | warehouse | Yes | The name of the Iceberg Catalog. You can obtain it from the Data Catalog page of the Data Lake Formation console. |
    | rest.signing-region | Yes | The region ID of the DLF service, for example, cn-hangzhou. |

Example

The following examples create an Iceberg catalog for each of the supported metadata services.

Use HMS

CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);

Use DLF 1.0 (Legacy)

CREATE EXTERNAL CATALOG iceberg_catalog_dlf
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);

Use DLF

CREATE EXTERNAL CATALOG iceberg_catalog
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf_rest",
    "uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg",
    "warehouse" = "iceberg_test",
    "rest.signing-region" = "cn-hangzhou"
);

View an Iceberg catalog

You can run SHOW CATALOGS to query all catalogs in the current StarRocks cluster.

SHOW CATALOGS;

You can also run SHOW CREATE CATALOG to view the creation statement of an external catalog. For example, run the following command to view the creation statement of the Iceberg catalog iceberg_catalog_hms.

SHOW CREATE CATALOG iceberg_catalog_hms;

Create an Iceberg database

Similar to the internal catalog of StarRocks, if you have the CREATE DATABASE permission on an Iceberg catalog, you can use CREATE DATABASE to create a database in that catalog. This feature is available from v3.1.

Note

You can use the GRANT and REVOKE statements to grant and revoke permissions for users and roles.

Syntax

Switch to the target Iceberg catalog and then run the following statement to create an Iceberg database.

CREATE DATABASE <database_name>
[PROPERTIES ("location" = "<prefix>://<path_to_database>/<database_name.db>/")]

Parameters

The location parameter specifies the file path of the database. It supports HDFS and Alibaba Cloud OSS:

  • If you use HDFS as the storage system, set the prefix to hdfs.

  • If you use Alibaba Cloud OSS as the storage system, set the prefix to oss.

Note

If you do not specify the location parameter, StarRocks creates the database in the default path of the current Iceberg catalog.
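As a minimal sketch, the following statement creates a database on Alibaba Cloud OSS; the bucket name iceberg-test-bucket and database name iceberg_db are placeholders for illustration, not values from your environment:

```sql
-- Hypothetical example: replace the bucket and database names with your own.
CREATE DATABASE iceberg_db
PROPERTIES ("location" = "oss://iceberg-test-bucket/iceberg_db.db/");
```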

Switch between Iceberg catalogs and databases

You can switch to the target Iceberg catalog and database in the following ways:

  • First, use SET CATALOG to specify the Iceberg catalog for the current session, and then use USE to specify the database.

    -- Switch the active catalog for the current session.
    SET CATALOG <catalog_name>;
    
    -- Specify the active database for the current session.
    USE <db_name>;
  • Run USE to directly switch the session to a specific database in the target Iceberg catalog.

    USE <catalog_name>.<db_name>;

Drop an Iceberg database

Similar to an internal StarRocks database, if you have the DROP permission on an Iceberg database, you can use DROP DATABASE to drop it. This feature is available from v3.1 and only supports dropping empty databases.

Note

You can use the GRANT and REVOKE statements to grant and revoke permissions for users and roles.

The drop database operation does not delete the corresponding file path on HDFS or OSS. Switch to the target Iceberg catalog and then run the following statement to drop an Iceberg database.

DROP DATABASE <database_name>;

Drop an Iceberg catalog

You can run DROP CATALOG to drop an external catalog. For example, run the following command to drop iceberg_catalog_hms.

DROP CATALOG iceberg_catalog_hms;

Create an Iceberg table

Similar to an internal StarRocks database, if you have the CREATE TABLE permission on an Iceberg database, you can use CREATE TABLE or CREATE TABLE AS SELECT (CTAS) to create a table in that database. This feature is available from v3.1. Switch to the target Iceberg catalog and database, and then run the following syntax to create an Iceberg table.

Syntax

CREATE TABLE [IF NOT EXISTS] [database.]table_name
(column_definition1[, column_definition2, ...
partition_column_definition1,partition_column_definition2...])
[partition_desc]
[PROPERTIES ("key" = "value", ...)]
[AS SELECT query]

Parameters

  • column_definition

    The syntax for column_definition is as follows.

    col_name col_type [COMMENT 'comment']

    The parameters are described in the following table.

    | Parameter | Description |
    | --- | --- |
    | col_name | The name of the column. |
    | col_type | The data type of the column. The following data types are currently supported: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR[(length)], ARRAY, MAP, and STRUCT. The LARGEINT, HLL, and BITMAP types are not supported. |

    Note

    The default value for all non-partition key columns is NULL. Partition key columns must be declared last in the column list and cannot be NULL.

  • partition_desc

    The syntax for partition_desc is as follows.

    PARTITION BY (par_col1[, par_col2...])

    Currently, StarRocks only supports identity transforms. This means a partition is created for each unique partition value.

    Note

    Partition key columns must be declared last in the column list. They support data types other than FLOAT, DOUBLE, DECIMAL, or DATETIME. NULL values are not supported.

  • PROPERTIES

    You can declare the properties of the Iceberg table in the PROPERTIES clause in the "key"="value" format. For more information, see Iceberg table properties. The following table lists some common properties.

    | Property | Description |
    | --- | --- |
    | location | The file path of the Iceberg table. When you use HMS as the metadata service, you do not need to specify location. |
    | file_format | The file format of the Iceberg table. Currently, only the Parquet format is supported. Default value: parquet. |
    | compression_codec | The compression format for the Iceberg table. Supported formats: SNAPPY, GZIP, ZSTD, and LZ4. Default value: gzip. This property has been deprecated since v3.2.3; from that version on, the compression algorithm for writing to Iceberg tables is controlled by the connector_sink_compression_codec session variable. |
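Because compression_codec is deprecated since v3.2.3, on later versions the write compression is set at the session level instead. A sketch, assuming a v3.2.3+ cluster:

```sql
-- From v3.2.3 on, control write compression with the session variable
-- rather than the table's PROPERTIES clause.
SET connector_sink_compression_codec = 'zstd';
```

Subsequent INSERT statements in the same session then write ZSTD-compressed files to the Iceberg table.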

Examples

  • Create a non-partitioned table unpartition_tbl that contains the id and score columns.

    CREATE TABLE unpartition_tbl
    (
        id int,
        score double
    );
  • Create a partitioned table partition_tbl_1 that contains the action, id, and dt columns, and define id and dt as partition key columns.

    CREATE TABLE partition_tbl_1
    (
        action varchar(20),
        id int NOT NULL,
        dt date NOT NULL
    )
    PARTITION BY (id,dt);
  • Query data from the source table partition_tbl_1 and create a partitioned table partition_tbl_2 based on the query result. Define id and dt as the partition key columns for partition_tbl_2.

    CREATE TABLE partition_tbl_2
    PARTITION BY (id, dt)
    AS SELECT * from partition_tbl_1;

View an Iceberg table schema

You can view the schema of an Iceberg table in the following ways.

  • View the table schema.

    DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
  • View the table schema and the storage location of the table file from the CREATE command.

    SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;
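For instance, reusing the table created earlier in this topic (the catalog name iceberg_catalog_hms comes from the preceding examples; the database name iceberg_db is assumed for illustration):

```sql
-- View the schema of partition_tbl_1.
DESCRIBE iceberg_catalog_hms.iceberg_db.partition_tbl_1;

-- View the CREATE statement, including the table file location.
SHOW CREATE TABLE iceberg_catalog_hms.iceberg_db.partition_tbl_1;
```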

Insert data into an Iceberg table

Similar to an internal StarRocks table, if you have the INSERT permission on an Iceberg table, you can use INSERT to write data from a StarRocks table into that Iceberg table. Currently, you can only write data to Parquet-formatted Iceberg tables. This feature is available from v3.1.

Note

You can use the GRANT and REVOKE statements to grant and revoke permissions for users and roles.

Switch to the target Iceberg catalog and database, and then run the following syntax to write data from a StarRocks table into a Parquet-formatted Iceberg table.

Syntax

INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }

-- Write data to a specified partition.
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
Note

Partition key columns cannot be NULL. Therefore, you must ensure that partition key columns have values during data loading.

Parameters

| Parameter | Description |
| --- | --- |
| INTO | Appends data to the target table. |
| OVERWRITE | Overwrites the data in the target table. |
| column_name | The target columns to load data into. You can specify one or more columns, separated by commas (,). The specified columns must exist in the target table and must include the partition key columns. The specified column names can differ from the source table column names, but their order must match the order of the columns in the query result. If you do not specify this parameter, data is loaded into all columns of the target table. If a non-partition key column of the target table is not included in the target column list, the default value NULL is written to it. If the data type of a column in the query result does not match that of the corresponding target column, an implicit conversion is attempted; if the conversion fails, the INSERT INTO statement reports a syntax parsing error. |
| expression | An expression that assigns a value to the corresponding column. |
| DEFAULT | Assigns the default value to the corresponding column. |
| query | A query statement whose result is loaded into the target table. Any SQL query syntax that StarRocks supports can be used. |
| PARTITION | The target partition for the data load. You must specify all partition key columns of the target table. The partition key columns can be specified in an order different from the one defined in the table creation statement. If you specify a partition, you cannot use column_name to specify the target columns. |

Examples

The following write statements use the default Parquet format as an example.

  • Insert the following three rows of data into the partition_tbl_1 table.

    INSERT INTO partition_tbl_1
    VALUES
        ("buy", 1, "2023-09-01"),
        ("sell", 2, "2023-09-02"),
        ("buy", 3, "2023-09-03");
  • Insert the result data of a SELECT query that includes a simple calculation into the partition_tbl_1 table in the specified column order.

    INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03';
  • Insert the result data of a SELECT query that reads data from the partition_tbl_1 table itself.

    INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY) FROM partition_tbl_1
    WHERE id=1;
  • Insert the result data of a SELECT query into the partition where dt='2023-09-01' and id=1 in the partition_tbl_2 table.

    • Method 1

      INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';
    • Method 2

      INSERT INTO partition_tbl_2 partition(dt='2023-09-01',id=1) SELECT 'order';
  • Overwrite all values in the action column with close in the partition where dt='2023-09-01' and id=1 in the partition_tbl_1 table:

    • Method 1

      INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';
    • Method 2

      INSERT OVERWRITE partition_tbl_1 partition(dt='2023-09-01',id=1) SELECT 'close';

Query data from an Iceberg table

  1. Run SHOW DATABASES to view the databases in the Iceberg cluster that belong to the specified catalog.

    SHOW DATABASES FROM <catalog_name>;
  2. Switch to the target Iceberg catalog and database.

  3. Run SELECT to query the target table in the target database.

    SELECT * FROM <table_name> LIMIT 10;
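Putting the three steps together, a minimal end-to-end session might look like the following; the catalog, database, and table names are assumed from the earlier examples in this topic:

```sql
-- Step 1: list the databases in the Iceberg catalog.
SHOW DATABASES FROM iceberg_catalog_hms;

-- Step 2: switch to the target catalog and database.
SET CATALOG iceberg_catalog_hms;
USE iceberg_db;

-- Step 3: query the target table.
SELECT * FROM partition_tbl_1 LIMIT 10;
```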

Drop an Iceberg table

Similar to an internal StarRocks table, if you have the DROP permission on an Iceberg table, you can use DROP TABLE to drop it. This feature is available from v3.1.

Note

You can use the GRANT and REVOKE statements to grant and revoke permissions for users and roles.

The drop table operation does not delete the corresponding file path and data on HDFS or OSS. A force delete, which is performed by adding the FORCE keyword, deletes the data on HDFS or OSS but does not delete the corresponding file path. Switch to the target Iceberg catalog and database, and then run the following statement to drop an Iceberg table.

DROP TABLE <table_name> FORCE;

References

For more information about Iceberg, see Overview.