All Products
Search
Document Center

E-MapReduce:Iceberg Catalog

Last Updated:Mar 26, 2026

An Iceberg catalog is an external catalog that lets StarRocks access Iceberg data without manual table creation. Supported since StarRocks v2.4.

With an Iceberg catalog, you can:

  • Query Iceberg data directly without creating tables manually.

  • Process, model, and load Iceberg data into StarRocks using INSERT INTO or asynchronous materialized views (v2.5 and later).

  • Create and drop Iceberg databases and tables from StarRocks, and write data to Parquet-formatted Iceberg tables using INSERT INTO (v3.1 and later).

Prerequisites

Before you begin, make sure your StarRocks cluster can reach both the storage system and the metadata service of the Iceberg cluster.

Supported storage systems:

  • Hadoop Distributed File System (HDFS)

  • Alibaba Cloud Object Storage Service (OSS)

Supported metadata services:

Usage notes

The following table summarizes supported file formats, compression formats, and Iceberg table versions when querying Iceberg data from StarRocks.

File formatCompression formatsIceberg table version
ParquetSNAPPY, LZ4, ZSTD, GZIP, NO_COMPRESSIONv1: Supported. v2: Position deletes supported from StarRocks v3.1; equality deletes from v3.1.10+, v3.2.5+, and v3.3.
ORCZLIB, SNAPPY, LZO, LZ4, ZSTD, NO_COMPRESSIONv1: Supported. v2: Position deletes supported from StarRocks v3.0; equality deletes from v3.1.8+, v3.2.3+, and v3.3.

Create an Iceberg catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "iceberg",
    MetastoreParams
)

Common parameters

The following parameters apply to all metadata service types.

ParameterRequiredDescription
catalog_nameYesThe catalog name. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), and underscores (_). Maximum 64 characters. Case-sensitive.
commentNoA description of the catalog.
typeYesThe data source type. Set to iceberg.

MetastoreParams

The MetastoreParams settings vary by metadata service. Choose the tab that matches your environment.

Use HMS

PropertyRequiredDescription
iceberg.catalog.typeYesThe catalog type in Iceberg. Set to hive.
hive.metastore.urisYesThe HMS URI. Format: thrift://<metastore_ip>:<port>. Default port: 9083.

Example

CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);

Use DLF 1.0 (Legacy)

PropertyRequiredDescription
iceberg.catalog.typeYesThe catalog type in Iceberg. Set to dlf.
dlf.catalog.idNoThe ID of an existing DLF data catalog. If not set, the system uses the default DLF catalog.

Example

CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);

Use DLF

Important

To use DLF, configure a Resource Access Management (RAM) user to perform operations in StarRocks Manager. For details, see Use a DLF catalog.

PropertyRequiredDescription
uriYesThe DLF REST API address. Format: http://<VPC_endpoint>/iceberg, where <VPC_endpoint> is the DLF VPC endpoint for your region. For endpoint values, see Service endpoints. Example: http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg.
iceberg.catalog.typeYesThe Iceberg catalog type. Set to dlf_rest.
warehouseYesThe Iceberg catalog name. Get this value from the Data Catalog page in the Data Lake Formation console.
rest.signing-regionYesThe DLF region ID. Example: cn-hangzhou.

Example

CREATE EXTERNAL CATALOG iceberg_catalog
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf_rest",
    "uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg",
    "warehouse" = "iceberg_test",
    "rest.signing-region" = "cn-hangzhou"
);

View an Iceberg catalog

Run SHOW CATALOGS to list all catalogs in the current StarRocks cluster.

SHOW CATALOGS;

To view the creation statement of a specific catalog, run SHOW CREATE CATALOG. For example:

SHOW CREATE CATALOG iceberg_catalog_hms;

Create an Iceberg database

If you have the CREATE DATABASE permission on an Iceberg catalog, use CREATE DATABASE to create a database in that catalog. Available from v3.1.

Note

Use GRANT and REVOKE to manage permissions for users and roles.

Switch to the target Iceberg catalog, then run:

CREATE DATABASE <database_name>
[PROPERTIES ("location" = "<prefix>://<path_to_database>/<database_name.db>/")]

The location parameter specifies the database file path. Set <prefix> based on your storage system:

  • HDFS: hdfs

  • OSS: oss

If you omit location, StarRocks creates the database in the default path of the current Iceberg catalog.

Switch between Iceberg catalogs and databases

Use either of the following approaches:

  • Set the catalog, then set the database:

    -- Switch the active catalog for the current session.
    SET CATALOG <catalog_name>;
    
    -- Set the active database.
    USE <db_name>;
  • Switch to a specific database in the target catalog directly:

    USE <catalog_name>.<db_name>;

Drop an Iceberg database

If you have the DROP permission on an Iceberg database, use DROP DATABASE to drop it. Available from v3.1. Only empty databases can be dropped.

Note

Use GRANT and REVOKE to manage permissions for users and roles.

Dropping a database does not delete the corresponding file path on HDFS or OSS. Switch to the target Iceberg catalog, then run:

DROP DATABASE <database_name>;

Drop an Iceberg catalog

Run DROP CATALOG to drop an external catalog. For example:

DROP CATALOG iceberg_catalog_hms;

Create an Iceberg table

If you have the CREATE TABLE permission on an Iceberg database, use CREATE TABLE or CREATE TABLE AS SELECT (CTAS) to create a table. Available from v3.1.

Switch to the target Iceberg catalog and database, then run:

Syntax

CREATE TABLE [IF NOT EXISTS] [database.]table_name
(column_definition1[, column_definition2, ...
partition_column_definition1, partition_column_definition2...])
[partition_desc]
[PROPERTIES ("key" = "value", ...)]
[AS SELECT query]

Parameters

column_definition

col_name col_type [COMMENT 'comment']
ParameterDescription
col_nameThe column name.
col_typeThe column data type. Supported types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR[(length)], ARRAY, MAP, STRUCT. Not supported: LARGEINT, HLL, BITMAP.
Note

All non-partition key columns default to NULL. Partition key columns must be declared last in the column list and cannot be NULL.

partition_desc

PARTITION BY (par_col1[, par_col2...])

StarRocks supports identity transforms only — one partition is created per unique partition value.

Note

Partition key columns must be declared last in the column list. Supported data types exclude FLOAT, DOUBLE, DECIMAL, and DATETIME. NULL values are not supported.

PROPERTIES

Declare Iceberg table properties in "key"="value" format.

PropertyDescriptionDefault
locationThe Iceberg table file path. Not required when HMS is the metadata service.
file_formatThe table file format. Currently only Parquet is supported.parquet
compression_codecThe compression format. Supported values: SNAPPY, GZIP, ZSTD, LZ4. Deprecated since v3.2.3 — use the connector_sink_compression_codec session variable instead.gzip

Examples

Create a non-partitioned table with id and score columns:

CREATE TABLE unpartition_tbl
(
    id int,
    score double
);

Create a partitioned table with id and dt as partition keys:

CREATE TABLE partition_tbl_1
(
    action varchar(20),
    id int NOT NULL,
    dt date NOT NULL
)
PARTITION BY (id, dt);

Create a partitioned table from a query result using CTAS:

CREATE TABLE partition_tbl_2
PARTITION BY (id, dt)
AS SELECT * FROM partition_tbl_1;

View an Iceberg table schema

View the table schema:

DESC[RIBE] <catalog_name>.<database_name>.<table_name>;

View the schema and storage location from the CREATE statement:

SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;

Insert data into an Iceberg table

If you have the INSERT permission on an Iceberg table, use INSERT to write data from a StarRocks table. Only Parquet-formatted Iceberg tables are supported. Available from v3.1.

Note

Use GRANT and REVOKE to manage permissions for users and roles.

Switch to the target Iceberg catalog and database, then run:

Syntax

INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }

-- Write data to a specific partition.
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
Note

Partition key columns cannot be NULL. Make sure partition key columns have values during data loading.

Parameters

ParameterDescription
INTOAppends data to the target table.
OVERWRITEOverwrites all data in the target table.
column_nameTarget columns for the import. Multiple columns are comma-separated. Specified columns must exist in the target table and include all partition key columns. Column order must match the query result. Unspecified non-partition columns default to NULL. If a data type mismatch occurs, an implicit conversion is attempted; if conversion fails, INSERT INTO returns a parsing error.
expressionAn expression that assigns a value to the corresponding column.
DEFAULTAssigns the default value to the corresponding column.
queryA query whose results are loaded into the target table. Supports any StarRocks SQL query syntax.
PARTITIONThe target partition for data loading. All partition key columns of the target table must be specified. When PARTITION is specified, column_name cannot be used.

Examples

Insert three rows into partition_tbl_1:

INSERT INTO partition_tbl_1
VALUES
    ("buy", 1, "2023-09-01"),
    ("sell", 2, "2023-09-02"),
    ("buy", 3, "2023-09-03");

Insert a calculated result in a specific column order:

INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03';

Insert a result derived from the table itself:

INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY) FROM partition_tbl_1
WHERE id=1;

Insert into a specific partition — two equivalent methods:

-- Method 1
INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';

-- Method 2
INSERT INTO partition_tbl_2 PARTITION(dt='2023-09-01', id=1) SELECT 'order';

Overwrite a specific partition — two equivalent methods:

-- Method 1
INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';

-- Method 2
INSERT OVERWRITE partition_tbl_1 PARTITION(dt='2023-09-01', id=1) SELECT 'close';

Query data from an Iceberg table

  1. List databases in the Iceberg cluster for the specified catalog:

    SHOW DATABASES FROM <catalog_name>;
  2. Switch to the target Iceberg catalog and database.

  3. Query the target table:

    SELECT count(*) FROM <table_name> LIMIT 10;

Drop an Iceberg table

If you have the DROP permission on an Iceberg table, use DROP TABLE to drop it. Available from v3.1.

Note

Use GRANT and REVOKE to manage permissions for users and roles.

Dropping a table does not delete the corresponding file path and data on HDFS or OSS. A force drop (using the FORCE keyword) deletes the data but not the file path.

Switch to the target Iceberg catalog and database, then run:

DROP TABLE <table_name> FORCE;

References

For more information about Iceberg, see Overview.