
E-MapReduce: Paimon data source

Last updated: Sep 12, 2024

StarRocks 3.1 or later supports Paimon catalogs. A Paimon catalog is an external catalog. You can use a Paimon catalog to query data in Paimon. This topic describes how to create a Paimon catalog in an E-MapReduce (EMR) StarRocks cluster and use the Paimon catalog to query data in Paimon.

Prerequisites

  • A cluster that contains the Paimon service, such as a DataLake cluster or a custom cluster, is created. For more information, see Create a cluster.

  • A cluster that contains the StarRocks service, such as an online analytical processing (OLAP) cluster or a custom cluster, is created, and you have logged on to the cluster. For more information, see Create a cluster and Getting started.

Limits

The Paimon cluster and the StarRocks cluster must be deployed in the same virtual private cloud (VPC) and zone.

Create a Paimon catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
( 
  "key"="value", 
  ...
);

Parameter description

  • catalog_name: the name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • The name can contain letters, digits, and underscores (_). It must start with a letter.

    • The name must be 1 to 64 characters in length.

  • PROPERTIES: the properties of the Paimon catalog. This parameter is required.

    Note

    A Paimon catalog in StarRocks maps one-to-one to a catalog in the native Paimon API, and the configuration items of the two have the same names and meanings.

    The following properties are supported:

    • type (required): the type of the data source. Set the value to paimon.

    • paimon.catalog.type (required): the metadata storage type that is used by Paimon. Valid values:

      • hive: Use the Hive metastore to store metadata.

      • filesystem: Use a file system to store metadata.

      • dlf: Use Data Lake Formation (DLF) to store metadata.

    • paimon.catalog.warehouse (required): the path where the warehouse resides. HDFS and OSS paths are supported.

    • hive.metastore.uris (conditionally required): the Uniform Resource Identifier (URI) of the Hive metastore. This property is required if you set paimon.catalog.type to hive. Specify the value in the thrift://<IP address of the Hive metastore>:<port number> format. The default port number is 9083.

    • aliyun.oss.endpoint (conditionally required): the endpoint of OSS. This property is required if you set paimon.catalog.warehouse to an OSS path.

    • dlf.catalog.id (optional): the ID of the DLF data catalog. This property takes effect only if you set paimon.catalog.type to dlf. If you do not configure it, the default DLF catalog is used.
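To show how these properties combine when metadata is kept in a Hive metastore, the following statement is a minimal sketch. It assumes a hypothetical catalog name, paimon_hive_catalog, a Hive metastore listening on the default port 9083, and an HDFS warehouse; every <...> value is a placeholder to replace with your own.

```sql
-- Sketch: Paimon catalog whose metadata is stored in a Hive metastore.
-- paimon_hive_catalog and every <...> value are hypothetical placeholders.
CREATE EXTERNAL CATALOG paimon_hive_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "hive",
    -- HDFS path of the Paimon warehouse.
    "paimon.catalog.warehouse" = "hdfs://<yourNameNodeAddress>/<yourPath>/",
    -- Required when paimon.catalog.type is hive; 9083 is the default port.
    "hive.metastore.uris" = "thrift://<yourHiveMetastoreIP>:9083"
);
```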

Example

Execute the following statement to create a Paimon catalog named paimon_catalog. The paimon.catalog.type parameter is set to dlf.

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>/"
);
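If metadata is stored in the file system instead of DLF, and the warehouse is an OSS path, the property list described earlier implies that aliyun.oss.endpoint must also be set. The following is a sketch under that reading; the catalog name paimon_fs_catalog and all <...> values are hypothetical placeholders. SHOW CATALOGS then lists the catalogs in the cluster so that you can confirm the new catalog was created.

```sql
-- Sketch: Paimon catalog whose metadata is stored in the file system (OSS here).
-- paimon_fs_catalog and every <...> value are hypothetical placeholders.
CREATE EXTERNAL CATALOG paimon_fs_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>/",
    -- Required because the warehouse is an OSS path.
    "aliyun.oss.endpoint" = "<yourOSSEndpoint>"
);

-- List the catalogs in the cluster to confirm that the new catalog exists.
SHOW CATALOGS;
```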

Query data in a Paimon table

Execute the following statement to query data in a specific table of a database:

SELECT * FROM <catalog_name>.<database_name>.<table_name>;
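When you do not yet know the database and table names, you can browse the catalog first. The following sketch uses the paimon_catalog catalog from the example in this topic; <database_name> and <table_name> are placeholders, and LIMIT is added only to keep the preview small.

```sql
-- List the databases in the Paimon catalog.
SHOW DATABASES FROM paimon_catalog;

-- Switch to a database so that later statements can use bare table names.
-- <database_name> and <table_name> are placeholders.
USE paimon_catalog.<database_name>;

-- List the tables in the current database.
SHOW TABLES;

-- Preview a few rows instead of scanning the whole table.
SELECT * FROM <table_name> LIMIT 10;
```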

References

For more information about Paimon, see Overview.