An Iceberg catalog is an external catalog that lets StarRocks access Iceberg data without manual table creation. Supported since StarRocks v2.4.
With an Iceberg catalog, you can:
Query Iceberg data directly without creating tables manually.
Process, model, and load Iceberg data into StarRocks using INSERT INTO or asynchronous materialized views (v2.5 and later).
Create and drop Iceberg databases and tables from StarRocks, and write data to Parquet-formatted Iceberg tables using INSERT INTO (v3.1 and later).
Prerequisites
Before you begin, make sure your StarRocks cluster can reach both the storage system and the metadata service of the Iceberg cluster.
Supported storage systems:
Hadoop Distributed File System (HDFS)
Alibaba Cloud Object Storage Service (OSS)
Supported metadata services:
Hive Metastore (HMS)
Data Lake Formation (DLF)
Usage notes
The following table summarizes supported file formats, compression formats, and Iceberg table versions when querying Iceberg data from StarRocks.
| File format | Compression formats | Iceberg table version |
|---|---|---|
| Parquet | SNAPPY, LZ4, ZSTD, GZIP, NO_COMPRESSION | v1: Supported. v2: Position deletes supported from StarRocks v3.1; equality deletes from v3.1.10+, v3.2.5+, and v3.3. |
| ORC | ZLIB, SNAPPY, LZO, LZ4, ZSTD, NO_COMPRESSION | v1: Supported. v2: Position deletes supported from StarRocks v3.0; equality deletes from v3.1.8+, v3.2.3+, and v3.3. |
Create an Iceberg catalog
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "iceberg",
MetastoreParams
)Common parameters
The following parameters apply to all metadata service types.
| Parameter | Required | Description |
|---|---|---|
catalog_name | Yes | The catalog name. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), and underscores (_). Maximum 64 characters. Case-sensitive. |
comment | No | A description of the catalog. |
type | Yes | The data source type. Set to iceberg. |
MetastoreParams
The MetastoreParams settings vary by metadata service. Choose the tab that matches your environment.
Use HMS
| Property | Required | Description |
|---|---|---|
iceberg.catalog.type | Yes | The catalog type in Iceberg. Set to hive. |
hive.metastore.uris | Yes | The HMS URI. Format: thrift://<metastore_ip>:<port>. Default port: 9083. |
Example
CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
"type" = "iceberg",
"iceberg.catalog.type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);Use DLF 1.0 (Legacy)
| Property | Required | Description |
|---|---|---|
iceberg.catalog.type | Yes | The catalog type in Iceberg. Set to dlf. |
dlf.catalog.id | No | The ID of an existing DLF data catalog. If not set, the system uses the default DLF catalog. |
Example
CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
"type" = "iceberg",
"iceberg.catalog.type" = "dlf",
"dlf.catalog.id" = "sr_dlf"
);Use DLF
To use DLF, configure a Resource Access Management (RAM) user to perform operations in StarRocks Manager. For details, see Use a DLF catalog.
| Property | Required | Description |
|---|---|---|
uri | Yes | The DLF REST API address. Format: http://<VPC_endpoint>/iceberg, where <VPC_endpoint> is the DLF VPC endpoint for your region. For endpoint values, see Service endpoints. Example: http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg. |
iceberg.catalog.type | Yes | The Iceberg catalog type. Set to dlf_rest. |
warehouse | Yes | The Iceberg catalog name. Get this value from the Data Catalog page in the Data Lake Formation console. |
rest.signing-region | Yes | The DLF region ID. Example: cn-hangzhou. |
Example
CREATE EXTERNAL CATALOG iceberg_catalog
PROPERTIES
(
"type" = "iceberg",
"iceberg.catalog.type" = "dlf_rest",
"uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg",
"warehouse" = "iceberg_test",
"rest.signing-region" = "cn-hangzhou"
);View an Iceberg catalog
Run SHOW CATALOGS to list all catalogs in the current StarRocks cluster.
SHOW CATALOGS;To view the creation statement of a specific catalog, run SHOW CREATE CATALOG. For example:
SHOW CREATE CATALOG iceberg_catalog_hms;Create an Iceberg database
If you have the CREATE DATABASE permission on an Iceberg catalog, use CREATE DATABASE to create a database in that catalog. Available from v3.1.
Use GRANT and REVOKE to manage permissions for users and roles.
Switch to the target Iceberg catalog, then run:
CREATE DATABASE <database_name>
[PROPERTIES ("location" = "<prefix>://<path_to_database>/<database_name.db>/")]The location parameter specifies the database file path. Set <prefix> based on your storage system:
HDFS:
hdfsOSS:
oss
If you omit location, StarRocks creates the database in the default path of the current Iceberg catalog.
Switch between Iceberg catalogs and databases
Use either of the following approaches:
Set the catalog, then set the database:
-- Switch the active catalog for the current session. SET CATALOG <catalog_name>; -- Set the active database. USE <db_name>;Switch to a specific database in the target catalog directly:
USE <catalog_name>.<db_name>;
Drop an Iceberg database
If you have the DROP permission on an Iceberg database, use DROP DATABASE to drop it. Available from v3.1. Only empty databases can be dropped.
Use GRANT and REVOKE to manage permissions for users and roles.
Dropping a database does not delete the corresponding file path on HDFS or OSS. Switch to the target Iceberg catalog, then run:
DROP DATABASE <database_name>;Drop an Iceberg catalog
Run DROP CATALOG to drop an external catalog. For example:
DROP CATALOG iceberg_catalog_hms;Create an Iceberg table
If you have the CREATE TABLE permission on an Iceberg database, use CREATE TABLE or CREATE TABLE AS SELECT (CTAS) to create a table. Available from v3.1.
Switch to the target Iceberg catalog and database, then run:
Syntax
CREATE TABLE [IF NOT EXISTS] [database.]table_name
(column_definition1[, column_definition2, ...
partition_column_definition1, partition_column_definition2...])
[partition_desc]
[PROPERTIES ("key" = "value", ...)]
[AS SELECT query]Parameters
column_definition
col_name col_type [COMMENT 'comment']| Parameter | Description |
|---|---|
col_name | The column name. |
col_type | The column data type. Supported types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR[(length)], ARRAY, MAP, STRUCT. Not supported: LARGEINT, HLL, BITMAP. |
All non-partition key columns default to NULL. Partition key columns must be declared last in the column list and cannot be NULL.
partition_desc
PARTITION BY (par_col1[, par_col2...])StarRocks supports identity transforms only — one partition is created per unique partition value.
Partition key columns must be declared last in the column list. Supported data types exclude FLOAT, DOUBLE, DECIMAL, and DATETIME. NULL values are not supported.
PROPERTIES
Declare Iceberg table properties in "key"="value" format.
| Property | Description | Default |
|---|---|---|
location | The Iceberg table file path. Not required when HMS is the metadata service. | — |
file_format | The table file format. Currently only Parquet is supported. | parquet |
compression_codec | The compression format. Supported values: SNAPPY, GZIP, ZSTD, LZ4. Deprecated since v3.2.3 — use the connector_sink_compression_codec session variable instead. | gzip |
Examples
Create a non-partitioned table with id and score columns:
CREATE TABLE unpartition_tbl
(
id int,
score double
);Create a partitioned table with id and dt as partition keys:
CREATE TABLE partition_tbl_1
(
action varchar(20),
id int NOT NULL,
dt date NOT NULL
)
PARTITION BY (id, dt);Create a partitioned table from a query result using CTAS:
CREATE TABLE partition_tbl_2
PARTITION BY (id, dt)
AS SELECT * FROM partition_tbl_1;View an Iceberg table schema
View the table schema:
DESC[RIBE] <catalog_name>.<database_name>.<table_name>;View the schema and storage location from the CREATE statement:
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;Insert data into an Iceberg table
If you have the INSERT permission on an Iceberg table, use INSERT to write data from a StarRocks table. Only Parquet-formatted Iceberg tables are supported. Available from v3.1.
Use GRANT and REVOKE to manage permissions for users and roles.
Switch to the target Iceberg catalog and database, then run:
Syntax
INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
-- Write data to a specific partition.
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }Partition key columns cannot be NULL. Make sure partition key columns have values during data loading.
Parameters
| Parameter | Description |
|---|---|
INTO | Appends data to the target table. |
OVERWRITE | Overwrites all data in the target table. |
column_name | Target columns for the import. Multiple columns are comma-separated. Specified columns must exist in the target table and include all partition key columns. Column order must match the query result. Unspecified non-partition columns default to NULL. If a data type mismatch occurs, an implicit conversion is attempted; if conversion fails, INSERT INTO returns a parsing error. |
expression | An expression that assigns a value to the corresponding column. |
DEFAULT | Assigns the default value to the corresponding column. |
query | A query whose results are loaded into the target table. Supports any StarRocks SQL query syntax. |
PARTITION | The target partition for data loading. All partition key columns of the target table must be specified. When PARTITION is specified, column_name cannot be used. |
Examples
Insert three rows into partition_tbl_1:
INSERT INTO partition_tbl_1
VALUES
("buy", 1, "2023-09-01"),
("sell", 2, "2023-09-02"),
("buy", 3, "2023-09-03");Insert a calculated result in a specific column order:
INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03';Insert a result derived from the table itself:
INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY) FROM partition_tbl_1
WHERE id=1;Insert into a specific partition — two equivalent methods:
-- Method 1
INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';
-- Method 2
INSERT INTO partition_tbl_2 PARTITION(dt='2023-09-01', id=1) SELECT 'order';Overwrite a specific partition — two equivalent methods:
-- Method 1
INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';
-- Method 2
INSERT OVERWRITE partition_tbl_1 PARTITION(dt='2023-09-01', id=1) SELECT 'close';Query data from an Iceberg table
List databases in the Iceberg cluster for the specified catalog:
SHOW DATABASES FROM <catalog_name>;Switch to the target Iceberg catalog and database.
Query the target table:
SELECT count(*) FROM <table_name> LIMIT 10;
Drop an Iceberg table
If you have the DROP permission on an Iceberg table, use DROP TABLE to drop it. Available from v3.1.
Use GRANT and REVOKE to manage permissions for users and roles.
Dropping a table does not delete the corresponding file path and data on HDFS or OSS. A force drop (using the FORCE keyword) deletes the data but not the file path.
Switch to the target Iceberg catalog and database, then run:
DROP TABLE <table_name> FORCE;References
For more information about Iceberg, see Overview.