
E-MapReduce:Paimon Catalog

Last Updated:Dec 04, 2025

A Paimon catalog is an external catalog supported by StarRocks since version 3.1. This topic describes how to create and view Paimon catalogs.

Background

A Paimon catalog lets you:

  • Query data directly in Apache Paimon.

  • Create Paimon databases and tables, and write data to Paimon.

  • Use INSERT INTO to transform Paimon data and load it into StarRocks internal tables.

To access data in Paimon, your StarRocks cluster must be able to access the storage system and metadata service of the Paimon cluster. StarRocks supports the following storage systems and metadata services:

  • Storage systems: Hadoop Distributed File System (HDFS), Alibaba Cloud Object Storage Service (OSS), and OSS-HDFS.

  • Metadata services: Data Lake Formation (DLF) 1.0 (Legacy), DLF, Hive Metastore (HMS), and File System.

Limitations

Creating Paimon databases, creating Paimon tables, and inserting data into Paimon tables are supported only in StarRocks 3.2.9 and later.

Create a Paimon catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "paimon",
    CatalogParams,
    StorageCredentialParams
);

Parameters

Note

A Paimon catalog in StarRocks has a one-to-one mapping with a catalog in the native Paimon API. The configuration items and their meanings are identical.

The parameter configuration varies depending on the metadata type used by the Paimon cluster.

Use HMS

  • catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Paimon catalog. This parameter is optional.

  • type: The type of the data source. Set this parameter to paimon.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

    | Property | Required | Description |
    | --- | --- | --- |
    | paimon.catalog.type | Yes | The type of the data source. Set the value to hive. |
    | hive.metastore.uris | Yes | The URI of the HMS, in the format thrift://<HMS IP address>:<port number>. The default port is 9083. If your HMS runs in high-availability mode, you can specify multiple comma-separated HMS addresses, for example, "thrift://<HMS IP address 1>:<HMS port 1>,thrift://<HMS IP address 2>:<HMS port 2>,thrift://<HMS IP address 3>:<HMS port 3>". |

  • StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.

    • If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.

    • If you use OSS or OSS-HDFS, you must configure StorageCredentialParams.

      "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>" 

      The parameters are described in the following table.

      | Property | Description |
      | --- | --- |
      | aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. For OSS, go to the Overview page of your bucket and find the endpoint in the Port section, or see OSS regions and endpoints for the endpoint of your region (for example, oss-cn-hangzhou.aliyuncs.com). For OSS-HDFS, go to the Overview page of your bucket and find the OSS-HDFS endpoint in the Port section (for example, cn-hangzhou.oss-dls.aliyuncs.com for the China (Hangzhou) region). |

      Important

      After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and change the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to match the value of aliyun.oss.endpoint.
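Putting the HMS and OSS settings together, a catalog definition might look like the following sketch. The metastore address, endpoint, and catalog name are placeholders, not values from your environment:

```sql
CREATE EXTERNAL CATALOG paimon_hms_oss_catalog
COMMENT "Paimon catalog backed by HMS metadata and OSS storage"
PROPERTIES
(
    "type" = "paimon",
    -- CatalogParams: HMS metadata service
    "paimon.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://192.168.0.1:9083",
    -- StorageCredentialParams: required because the data resides on OSS
    "aliyun.oss.endpoint" = "oss-cn-hangzhou.aliyuncs.com"
);
```

Remember to also align fs.oss.endpoint in core-site.xml and jindosdk.cfg with the endpoint used here.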

Use filesystem

  • catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Paimon catalog. This parameter is optional.

  • type: The type of the data source. Set this parameter to paimon.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

    | Property | Required | Description |
    | --- | --- | --- |
    | paimon.catalog.type | Yes | The type of the data source. Set the value to filesystem. |
    | paimon.catalog.warehouse | Yes | The storage path of the warehouse where Paimon data is stored. HDFS, OSS, and OSS-HDFS are supported. The format for OSS or OSS-HDFS is oss://<yourBucketName>/<yourPath>. |

    Important

    If you use OSS or OSS-HDFS as the warehouse, you must also configure the aliyun.oss.endpoint parameter. For more information, see StorageCredentialParams below.

  • StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.

    • If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.

    • If you use OSS or OSS-HDFS, you must configure StorageCredentialParams.

      "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>" 

      The parameters are described in the following table.

      | Property | Description |
      | --- | --- |
      | aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. For OSS, go to the Overview page of your bucket and find the endpoint in the Port section, or see OSS regions and endpoints for the endpoint of your region (for example, oss-cn-hangzhou.aliyuncs.com). For OSS-HDFS, go to the Overview page of your bucket and find the OSS-HDFS endpoint in the Port section (for example, cn-hangzhou.oss-dls.aliyuncs.com for the China (Hangzhou) region). |

      Important

      After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and change the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to match the value of aliyun.oss.endpoint.

Use DLF 1.0 (Legacy)

  • catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Paimon catalog. This parameter is optional.

  • type: The type of the data source. Set this parameter to paimon.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

    | Property | Required | Description |
    | --- | --- | --- |
    | paimon.catalog.type | Yes | The type of the data source. Set the value to dlf. |
    | paimon.catalog.warehouse | Yes | The storage path of the warehouse where Paimon data is stored. HDFS, OSS, and OSS-HDFS are supported. The format for OSS or OSS-HDFS is oss://<yourBucketName>/<yourPath>. |
    | dlf.catalog.id | No | The ID of an existing data catalog in DLF. If you do not configure this parameter, the default DLF catalog is used. |

    Important

    If you use OSS or OSS-HDFS as the warehouse, you must also configure the aliyun.oss.endpoint parameter. For more information, see StorageCredentialParams below.

  • StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.

    • If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.

    • If you use OSS or OSS-HDFS, you must configure StorageCredentialParams.

      "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>" 

      The parameters are described in the following table.

      | Property | Description |
      | --- | --- |
      | aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. For OSS, go to the Overview page of your bucket and find the endpoint in the Port section, or see OSS regions and endpoints for the endpoint of your region (for example, oss-cn-hangzhou.aliyuncs.com). For OSS-HDFS, go to the Overview page of your bucket and find the OSS-HDFS endpoint in the Port section (for example, cn-hangzhou.oss-dls.aliyuncs.com for the China (Hangzhou) region). |

      Important

      After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and change the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to match the value of aliyun.oss.endpoint.

Use DLF

  • catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Paimon catalog. This parameter is optional.

  • type: The type of the data source. Set this parameter to paimon.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

    Important

    If you use DLF, you must use a configured Resource Access Management (RAM) user to perform operations in StarRocks Manager. For more information, see Use a DLF catalog.

    | Property | Required | Description |
    | --- | --- | --- |
    | <catalog_name> | Yes | The name of the DLF data catalog, for example, dlf_catalog. |
    | type | Yes | The catalog type. Set this parameter to the fixed value paimon. |
    | uri | Yes | The REST API address of DLF, in the format http://<VPC endpoint>, where <VPC endpoint> is the endpoint of the DLF service for VPC access in the specified region. For example, http://cn-hangzhou-vpc.dlf.aliyuncs.com. |
    | paimon.catalog.type | Yes | The Paimon catalog type. Set this parameter to the fixed value rest. |
    | paimon.catalog.warehouse | Yes | The name of the Paimon catalog. You can obtain it from the Catalogs page in the Data Lake Formation console. |
    | token.provider | Yes | The REST service provider. Set this parameter to the fixed value dlf. |

  • StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.

    • If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.

    • If you use OSS or OSS-HDFS, you must configure StorageCredentialParams.

      "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>" 

      The parameters are described in the following table.

      | Property | Description |
      | --- | --- |
      | aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. For OSS, go to the Overview page of your bucket and find the endpoint in the Port section, or see OSS regions and endpoints for the endpoint of your region (for example, oss-cn-hangzhou.aliyuncs.com). For OSS-HDFS, go to the Overview page of your bucket and find the OSS-HDFS endpoint in the Port section (for example, cn-hangzhou.oss-dls.aliyuncs.com for the China (Hangzhou) region). |

      Important

      After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and change the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to match the value of aliyun.oss.endpoint.

Examples

The following examples show how to create a Paimon catalog named paimon_catalog to query data in a Paimon data source.

Use HMS

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);

Use filesystem

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "aliyun.oss.endpoint" = "cn-hangzhou.oss-dls.aliyuncs.com"
);

Use DLF 1.0 (Legacy)

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "dlf.catalog.id" = "paimon_dlf_test"
);

Use DLF

CREATE EXTERNAL CATALOG dlf_catalog
PROPERTIES
(
   "type"= "paimon",
   "uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com",
   "paimon.catalog.type" = "rest",
   "paimon.catalog.warehouse" = "test_paimon",
   "token.provider" = "dlf"
);

View Paimon catalogs

  • You can run SHOW CATALOGS to view all catalogs in the current StarRocks cluster.

    SHOW CATALOGS;
  • You can run SHOW CREATE CATALOG to view the creation statement of an external catalog.

    SHOW CREATE CATALOG paimon_catalog;

Delete a Paimon catalog

You can run DROP CATALOG to delete an external catalog. For example, to delete the Paimon catalog named paimon_catalog, run the following command:

DROP CATALOG paimon_catalog;

Create a Paimon database

CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;

Create a Paimon table

Syntax

CREATE TABLE IF NOT EXISTS <catalog_name>.<database_name>.<table_name>
(
   id STRING,
   name STRING,
   day INT
)
PRIMARY KEY (id, day)  -- Optional. Omit this clause to create an append-only table.
PARTITION BY (`day`)   -- Optional. Omit this clause to create a non-partitioned table.
PROPERTIES (           -- Optional. Omit this clause if no properties are needed.
  "key" = "value"
);

Parameters

Common `PROPERTIES` parameters are described in the following table.

| Key | Description |
| --- | --- |
| file.format | The data storage format of the table. The default is parquet. |
| bucket | The number of buckets for the table, used for data distribution and query optimization. The default is -1, which means dynamic bucket mode. |
| bucket_key | The column used to distribute records into buckets. |
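As a sketch, the properties above can be combined in one statement. The catalog, database, table, and column names below are illustrative; note that this is an append-only table, so bucket_key can be any non-key column:

```sql
CREATE TABLE IF NOT EXISTS paimon_catalog.paimon_db.user_events
(
    user_id STRING,
    event_type STRING,
    day INT
)
PROPERTIES (
    "file.format" = "orc",    -- store data as ORC instead of the default parquet
    "bucket" = "4",           -- use 4 fixed buckets instead of the default -1
    "bucket_key" = "user_id"  -- distribute records into buckets by user_id
);
```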

Limitations

  • When you create a partitioned table, all partition key columns must be placed at the end of the column definition.

  • For partitioned tables, all partition key columns must be included in the primary key definition.

  • The specified bucket_key cannot be part of a partition key or primary key.
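For example, the following partitioned table satisfies the first two rules: the partition column day is placed last in the column list and is included in the primary key (the names are illustrative):

```sql
CREATE TABLE IF NOT EXISTS paimon_catalog.paimon_db.orders
(
    order_id STRING,
    amount INT,
    day INT                      -- partition column placed at the end
)
PRIMARY KEY (order_id, day)      -- partition column included in the primary key
PARTITION BY (`day`);
```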

Example

CREATE TABLE dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics(
    final_gender_code STRING COMMENT 'gender',
    age_level STRING COMMENT 'age_level',
    pvalue_level STRING COMMENT 'consumption_level',
    clicks INT COMMENT 'clicks',
    total_behaviors INT COMMENT 'total_behaviors'
);

View a Paimon table schema

You can view the schema of a Paimon table in the following ways.

  • View the table schema

    DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
  • View the table schema and file storage location from the CREATE TABLE statement

    SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;

Query data in a Paimon table

  • To query data in a table within a Paimon catalog, perform the following steps:

    1. Run SHOW DATABASES to view the databases in the specified Paimon catalog.

      SHOW DATABASES FROM <catalog_name>;
    2. Run SET CATALOG to switch to the desired catalog for the current session.

      SET CATALOG <catalog_name>;

      Then, run USE to specify the desired database for the current session.

      USE <db_name>;

      Alternatively, you can run USE to switch the session directly to a specific database within the target catalog.

      USE <catalog_name>.<db_name>;
    3. Run a SELECT statement to query the target table in the target database.

      SELECT count(*) FROM <table_name> LIMIT 10;
  • You can also query the target table directly by specifying the catalog and database in the SELECT statement.

    SELECT * FROM <catalog_name>.<database_name>.<table_name>;

Write data to Paimon

Write to a Paimon table

Note
  • In StarRocks, data is written to Paimon tables in batches (Batch Write). Because of Paimon's limitations, you cannot write data to tables whose bucket mode is HASH_DYNAMIC or CROSS_PARTITION. A typical scenario is writing to a primary key table. If you do not specify the `bucket` property in the table creation statement, the default value is -1 (Dynamic Bucket Mode), which prevents write operations.

  • The Paimon software development kit (SDK) requires that partition key or primary key columns in the data to be written cannot be null. StarRocks does not precheck data before writing it to a Paimon table. Therefore, an exception is thrown if you attempt to write a null value.
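To keep a primary key table writable from StarRocks, you can pin it to fixed bucket mode by setting the bucket property at creation time. A hypothetical sketch (table and column names are placeholders):

```sql
-- A fixed-bucket primary key table that StarRocks can write to.
-- Setting bucket to a positive number avoids the default dynamic
-- bucket mode (bucket = -1), which blocks batch writes.
CREATE TABLE IF NOT EXISTS paimon_catalog.paimon_db.user_profiles
(
    user_id STRING,
    score INT
)
PRIMARY KEY (user_id)
PROPERTIES (
    "bucket" = "4"
);
```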

INSERT INTO <catalog_name>.<database_name>.<table_name> (column1, column2, ...) VALUES (value1, value2, ...);

For example, you can insert the following data directly into the `ads_age_pvalue_analytics` table.

INSERT INTO dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics (final_gender_code, age_level, pvalue_level, clicks, total_behaviors)
VALUES 
('M', '18-24', 'Low', 1500, 2500),
('F', '25-34', 'Medium', 2200, 3300),
('M', '35-44', 'High', 2800, 4000);

Import Paimon data into an internal table

Assume that an OLAP table named olap_tbl exists in StarRocks. You can transform data from a Paimon table and import it into the StarRocks table olap_tbl as follows:

INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM <paimon_catalog>.<db_name>.<table_name>;

References

For more information about Paimon, see Paimon overview.