A Paimon catalog is an external catalog supported by StarRocks since version 3.1. This topic describes how to create and view Paimon catalogs.
Background
A Paimon catalog lets you:
Query data directly in Apache Paimon.
Create Paimon databases and tables, and write data to Paimon.
Use the INSERT INTO feature to transform and import data.
To access data in Paimon, your StarRocks cluster must be able to access the storage system and metadata service of the Paimon cluster. StarRocks supports the following storage systems and metadata services:
Storage systems: Hadoop Distributed File System (HDFS) and Alibaba Cloud Object Storage Service (OSS).
Metadata services: Data Lake Formation (DLF) 1.0 (Legacy), DLF, Hive Metastore (HMS), and File System.
Limitations
Creating Paimon databases, creating Paimon tables, and inserting data into Paimon tables are supported only in StarRocks 3.2.9 and later.
Create a Paimon catalog
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "paimon",
CatalogParams,
StorageCredentialParams
);
Parameters
A Paimon catalog in StarRocks has a one-to-one mapping with a catalog in the native Paimon API. The configuration items and their meanings are identical.
The parameter configuration varies depending on the metadata type used by the Paimon cluster.
Use HMS
catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:
It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).
The total length cannot exceed 64 characters.
comment: The description of the Paimon catalog. This parameter is optional.
type: The type of the data source. Set this parameter to paimon.
CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

Property | Required | Description
--- | --- | ---
paimon.catalog.type | Yes | The type of the data source. The value is hive.
hive.metastore.uris | Yes | The URI of the HMS. The format is thrift://<HMS IP address>:<port number>. The default port is 9083. If your HMS is in high-availability mode, you can specify multiple HMS addresses separated by commas, for example, "thrift://<HMS IP address 1>:<HMS port 1>,thrift://<HMS IP address 2>:<HMS port 2>,thrift://<HMS IP address 3>:<HMS port 3>".

StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.
If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.
If you use OSS or OSS-HDFS, you must configure StorageCredentialParams:
"aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"
The parameters are described in the following table.

Property | Description
--- | ---
aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. OSS: Go to the Overview page of your bucket and find the endpoint in the Port section. You can also see OSS regions and endpoints to view the endpoint of the corresponding region, for example, oss-cn-hangzhou.aliyuncs.com. OSS-HDFS: Go to the Overview page of your bucket and find the endpoint for OSS-HDFS in the Port section. For example, the endpoint for the China (Hangzhou) region is cn-hangzhou.oss-dls.aliyuncs.com. Important: After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and modify the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to be consistent with the value of aliyun.oss.endpoint.
Use filesystem
catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:
It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).
The total length cannot exceed 64 characters.
comment: The description of the Paimon catalog. This parameter is optional.
type: The type of the data source. Set this parameter to paimon.
CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

Property | Required | Description
--- | --- | ---
paimon.catalog.type | Yes | The type of the data source. The value is filesystem.
paimon.catalog.warehouse | Yes | The storage path of the warehouse where Paimon data is stored. HDFS, OSS, and OSS-HDFS are supported. The format for OSS or OSS-HDFS is oss://<yourBucketName>/<yourPath>. Important: If you use OSS or OSS-HDFS as the warehouse, you must also configure the aliyun.oss.endpoint parameter in StorageCredentialParams.

StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.
If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.
If you use OSS or OSS-HDFS, you must configure StorageCredentialParams:
"aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"
The parameters are described in the following table.

Property | Description
--- | ---
aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. OSS: Go to the Overview page of your bucket and find the endpoint in the Port section. You can also see OSS regions and endpoints to view the endpoint of the corresponding region, for example, oss-cn-hangzhou.aliyuncs.com. OSS-HDFS: Go to the Overview page of your bucket and find the endpoint for OSS-HDFS in the Port section. For example, the endpoint for the China (Hangzhou) region is cn-hangzhou.oss-dls.aliyuncs.com. Important: After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and modify the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to be consistent with the value of aliyun.oss.endpoint.
Use DLF 1.0 (Legacy)
catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:
It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).
The total length cannot exceed 64 characters.
comment: The description of the Paimon catalog. This parameter is optional.
type: The type of the data source. Set this parameter to paimon.
CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

Property | Required | Description
--- | --- | ---
paimon.catalog.type | Yes | The type of the data source. The value is dlf.
paimon.catalog.warehouse | Yes | The storage path of the warehouse where Paimon data is stored. HDFS, OSS, and OSS-HDFS are supported. The format for OSS or OSS-HDFS is oss://<yourBucketName>/<yourPath>. Important: If you use OSS or OSS-HDFS as the warehouse, you must also configure the aliyun.oss.endpoint parameter in StorageCredentialParams.
dlf.catalog.id | No | The ID of an existing data catalog in DLF. If you do not configure the dlf.catalog.id parameter, the system uses the default DLF catalog.

StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.
If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.
If you use OSS or OSS-HDFS, you must configure StorageCredentialParams:
"aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"
The parameters are described in the following table.

Property | Description
--- | ---
aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. OSS: Go to the Overview page of your bucket and find the endpoint in the Port section. You can also see OSS regions and endpoints to view the endpoint of the corresponding region, for example, oss-cn-hangzhou.aliyuncs.com. OSS-HDFS: Go to the Overview page of your bucket and find the endpoint for OSS-HDFS in the Port section. For example, the endpoint for the China (Hangzhou) region is cn-hangzhou.oss-dls.aliyuncs.com. Important: After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and modify the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to be consistent with the value of aliyun.oss.endpoint.
Use DLF
catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:
It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).
The total length cannot exceed 64 characters.
comment: The description of the Paimon catalog. This parameter is optional.
type: The type of the data source. Set this parameter to paimon.
CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.
Important: If you use DLF, you must use a configured Resource Access Management (RAM) user to perform operations in StarRocks Manager. For more information, see Use a DLF catalog.

Parameter | Required | Description
--- | --- | ---
<catalog_name> | Yes | The name of the DLF data catalog, for example, dlf_catalog.
type | Yes | The catalog type. Set this parameter to the static field paimon.
uri | Yes | The REST API address of DLF. The format is http://<VPC endpoint>, where <VPC endpoint> is the endpoint of the DLF service in the specified region for VPC access. For example, http://cn-hangzhou-vpc.dlf.aliyuncs.com.
paimon.catalog.type | Yes | The Paimon catalog type. Set this parameter to the static field rest.
paimon.catalog.warehouse | Yes | The name of the Paimon catalog. You can obtain it from the Catalogs page in the Data Lake Formation console.
token.provider | Yes | The REST service provider. Set this parameter to the static field dlf.

StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.
If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.
If you use OSS or OSS-HDFS, you must configure StorageCredentialParams:
"aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"
The parameters are described in the following table.

Property | Description
--- | ---
aliyun.oss.endpoint | The endpoint of OSS or OSS-HDFS. OSS: Go to the Overview page of your bucket and find the endpoint in the Port section. You can also see OSS regions and endpoints to view the endpoint of the corresponding region, for example, oss-cn-hangzhou.aliyuncs.com. OSS-HDFS: Go to the Overview page of your bucket and find the endpoint for OSS-HDFS in the Port section. For example, the endpoint for the China (Hangzhou) region is cn-hangzhou.oss-dls.aliyuncs.com. Important: After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console and modify the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to be consistent with the value of aliyun.oss.endpoint.
Examples
The following examples show how to create a Paimon catalog named paimon_catalog to query data in a Paimon data source.
Use HMS
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
"type" = "paimon",
"paimon.catalog.type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);
Use filesystem
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
"type" = "paimon",
"paimon.catalog.type" = "filesystem",
"paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
"aliyun.oss.endpoint" = "cn-hangzhou.oss-dls.aliyuncs.com"
);
Use DLF 1.0 (Legacy)
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
"type" = "paimon",
"paimon.catalog.type" = "dlf",
"paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
"dlf.catalog.id" = "paimon_dlf_test"
);
Use DLF
CREATE EXTERNAL CATALOG dlf_catalog
PROPERTIES
(
"type" = "paimon",
"uri" = "http://cn-hangzhou-vpc.dlf.aliyuncs.com",
"paimon.catalog.type" = "rest",
"paimon.catalog.warehouse" = "test_paimon",
"token.provider" = "dlf"
);
View Paimon catalogs
You can run SHOW CATALOGS to view all catalogs in the current StarRocks cluster.
SHOW CATALOGS;
You can run SHOW CREATE CATALOG to view the creation statement of an external catalog.
SHOW CREATE CATALOG paimon_catalog;
Delete a Paimon catalog
You can run DROP CATALOG to delete an external catalog. For example, to delete the Paimon catalog named paimon_catalog, run the following command:
DROP CATALOG paimon_catalog;
Create a Paimon database
CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;
Create a Paimon table
Syntax
CREATE TABLE IF NOT EXISTS <catalog_name>.<database_name>.<table_name>
(
id STRING,
name STRING,
day INT
)
PRIMARY KEY (id, day) -- Optional. Omit this clause to create an append-only table.
PARTITION BY (`day`) -- Optional. Omit this clause to create a non-partitioned table.
PROPERTIES ( -- Optional. Omit this clause if no properties are specified.
"key" = "value"
);
Parameters
Common `PROPERTIES` parameters are described in the following table.

Key | Description
--- | ---
file.format | Defines the data storage format for the table, such as orc or parquet.
bucket | Sets the number of buckets for the table, used for data distribution and query optimization. The default is -1, which means dynamic bucket mode is used instead of a fixed bucket count.
bucket_key | Specifies the column used to distribute records into buckets.
Limitations
When you create a partitioned table, all partition key columns must be placed at the end of the column definition.
For partitioned tables, all partition key columns must be included in the primary key definition.
The specified bucket_key cannot be part of a partition key or primary key.
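A table definition that satisfies these constraints might look like the following sketch. The catalog, database, table, and column names are hypothetical:

```sql
-- Hypothetical names. The partition key column `day` appears last in the
-- column list and is included in the primary key, as required above.
CREATE TABLE IF NOT EXISTS paimon_catalog.example_db.user_events
(
    user_id STRING,
    event_type STRING,
    day INT
)
PRIMARY KEY (user_id, day)
PARTITION BY (`day`)
PROPERTIES (
    "file.format" = "orc"
);
```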
Example
CREATE TABLE dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics(
final_gender_code STRING COMMENT 'gender',
age_level STRING COMMENT 'age_level',
pvalue_level STRING COMMENT 'consumption_level',
clicks INT COMMENT 'clicks',
total_behaviors INT COMMENT 'total_behaviors'
);
View a Paimon table schema
You can view the schema of a Paimon table in the following ways.
View the table schema
DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
View the table schema and file storage location from the CREATE TABLE statement
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;
Query data in a Paimon table
To query data in a table within a Paimon catalog, perform the following steps:
Run SHOW DATABASES to view the databases in the specified Paimon catalog.
SHOW DATABASES FROM <catalog_name>;
Run SET CATALOG to switch to the desired catalog for the current session.
SET CATALOG <catalog_name>;
Then, run USE to specify the desired database for the current session.
USE <db_name>;
Alternatively, you can run USE to switch the session directly to a specific database within the target catalog.
USE <catalog_name>.<db_name>;
Run a SELECT statement to query the target table in the target database.
SELECT count(*) FROM <table_name> LIMIT 10;
You can also query the target table directly by specifying the catalog and database in the SELECT statement.
SELECT * FROM <catalog_name>.<database_name>.<table_name>;
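For instance, using the table from the CREATE TABLE example earlier in this topic, a filtered query might look like this (the catalog, database, and table names follow that example):

```sql
-- Query the Paimon table directly through the catalog, filtering rows
-- without switching the session catalog or database.
SELECT final_gender_code, age_level, clicks
FROM dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics
WHERE clicks > 1000;
```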
Write data to Paimon
Write to a Paimon table
In StarRocks, data is written to Paimon tables in batches (batch write). Due to Paimon's limitations, you cannot write data to tables whose bucket mode is HASH_DYNAMIC or CROSS_PARTITION. A typical scenario is writing to a primary key table: if you do not specify the `bucket` property in the table creation statement, the default value is -1 (dynamic bucket mode), which prevents write operations.
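To keep a primary key table writable from StarRocks, you can set a fixed bucket count when you create the table. A minimal sketch with hypothetical catalog, database, and table names:

```sql
-- Hypothetical names. Setting "bucket" to a positive number puts the table
-- in fixed bucket mode, so batch writes from StarRocks are allowed; leaving
-- it at the default of -1 (dynamic bucket mode) would block writes.
CREATE TABLE paimon_catalog.example_db.writable_pk_tbl
(
    id STRING,
    val INT
)
PRIMARY KEY (id)
PROPERTIES (
    "bucket" = "2"
);
```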
The Paimon software development kit (SDK) requires that partition key or primary key columns in the data to be written cannot be null. StarRocks does not precheck data before writing it to a Paimon table. Therefore, an exception is thrown if you attempt to write a null value.
INSERT INTO <catalog_name>.<database_name>.<table_name> (column1, column2, ...) VALUES (value1, value2, ...);
For example, you can insert the following data directly into the `ads_age_pvalue_analytics` table.
INSERT INTO dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics (final_gender_code, age_level, pvalue_level, clicks, total_behaviors)
VALUES
('M', '18-24', 'Low', 1500, 2500),
('F', '25-34', 'Medium', 2200, 3300),
('M', '35-44', 'High', 2800, 4000);
Import Paimon data into an internal table
Assume that an OLAP table named olap_tbl exists in the database olap_db in StarRocks. You can transform data from a Paimon table and import it into the StarRocks table olap_tbl as follows:
INSERT INTO default_catalog.olap_db.olap_tbl SELECT * FROM <paimon_catalog>.<db_name>.<table_name>;
References
For more information about Paimon, see Paimon overview.