EMR Serverless StarRocks can query Paimon, Iceberg, and Hive tables stored in Data Lake Formation (DLF) by connecting through an external catalog. To enable this, you set up RAM-based access control and create a DLF-backed catalog in StarRocks.
Choose your DLF version
DLF has two versions with different catalog types:
| Version | Catalog types | When to use |
|---|---|---|
| DLF (current) | Paimon catalog (REST), Iceberg catalog (REST) | Use for catalogs created in DLF with REST-based access |
| DLF 1.0 (legacy) | Hive catalog, Iceberg catalog, Paimon catalog | Use if your catalog was created in the legacy DLF 1.0 service |
Use DLF
Prerequisites
Before you begin, ensure that you have:
- A Serverless StarRocks instance at version 3.3 or later, with a Minor Version of 3.3.8-1.99 or later. To check the minor version, go to the Version Information section on the Instance Details page. If the minor version is earlier than 3.3.8-1.99, update it. To create an instance, see Create an instance.
- A data catalog in DLF.
- A RAM user. To create one, see Create a RAM user.
Step 1: Add a RAM user in StarRocks
DLF uses Resource Access Management (RAM) for access control. By default, StarRocks users have no permissions on DLF resources. Add an existing RAM user to StarRocks before granting DLF permissions.
- Go to the instance list page:
  1. Log on to the E-MapReduce console.
  2. In the navigation pane on the left, choose EMR Serverless > StarRocks.
  3. In the top menu bar, select the required region.
- On the Instance List page, find your instance and click Connect in the Actions column. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager. Connect using the admin user or a StarRocks super administrator account.
- In the left-side menu, choose Security Center > User Management, then click Create User.
- In the Create User dialog box, set the following parameters and click OK.

| Parameter | Value |
|---|---|
| User Source | RAM User |
| Username | Select the RAM user (for example, dlf-test) |
| Password / Confirm Password | Enter a custom password |
| Roles | Keep the default value public |
Step 2: Grant catalog permissions in DLF
- Log on to the Data Lake Formation console.
- On the Catalogs page, click the name of your catalog.
- Click the Permissions tab, then click Grant Permissions.
- From the Select DLF User drop-down list, select the RAM user you added in Step 1 (for example, dlf-test).
- Set Preset Permission Type to Custom and grant the ALL permission on the current data catalog and all its resources.
- Click OK.
Step 3: Create a DLF catalog in StarRocks
Reconnect to the StarRocks instance using the RAM user you added in Step 1. All catalog creation and data access in the following steps uses this RAM user.
To create a query in the SQL Editor, go to the Queries page and click the icon for creating a query.
Paimon catalog
Run the following SQL statement to create a Paimon catalog backed by DLF. Replace the uri value with the DLF endpoint of your region and paimon.catalog.warehouse with the name of your DLF catalog:

```sql
CREATE EXTERNAL CATALOG `dlf_catalog`
PROPERTIES (
    'type' = 'paimon',
    'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com',
    'paimon.catalog.type' = 'rest',
    'paimon.catalog.warehouse' = 'StarRocks_test',
    'token.provider' = 'dlf'
);
```
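After the statement succeeds, you can confirm that the catalog is reachable with standard StarRocks commands. This is a quick verification sketch; the catalog name matches the example above:

```sql
-- List all catalogs; dlf_catalog should appear alongside default_catalog.
SHOW CATALOGS;

-- Switch the current session to the new catalog.
SET CATALOG dlf_catalog;

-- List the databases exposed by the DLF Paimon catalog.
SHOW DATABASES;
```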
Iceberg catalog
Run the following SQL statement to create an Iceberg catalog backed by DLF. Replace the uri value with the DLF Iceberg REST endpoint of your region, warehouse with the name of your DLF catalog, and rest.signing-region with your region ID:

```sql
CREATE EXTERNAL CATALOG `iceberg_catalog`
PROPERTIES
(
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'dlf_rest',
    'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg',
    'warehouse' = 'iceberg_test',
    'rest.signing-region' = 'cn-hangzhou'
);
```
Iceberg foreign tables are read-only. You can run SELECT queries but cannot write data to Iceberg tables from StarRocks.
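To verify the Iceberg catalog, you can list its databases and inspect a table's schema. A small sketch, reusing the test_iceberg table queried later in this topic:

```sql
-- List the databases exposed by the DLF Iceberg catalog.
SHOW DATABASES FROM iceberg_catalog;

-- Inspect a table's schema (replace with your own database and table).
DESC iceberg_catalog.`default`.test_iceberg;
```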
Step 4: Read and write data
Read and write data (Paimon catalog)
Create a database and table, insert data, and then run a query:
```sql
-- Create a database
CREATE DATABASE IF NOT EXISTS dlf_catalog.sr_dlf_db;

-- Create a table
CREATE TABLE dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics (
    final_gender_code STRING COMMENT 'Gender',
    age_level STRING COMMENT 'Age level',
    pvalue_level STRING COMMENT 'Consumption level',
    clicks INT COMMENT 'Number of clicks',
    total_behaviors INT COMMENT 'Total number of behaviors'
);

-- Insert data
INSERT INTO dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics
    (final_gender_code, age_level, pvalue_level, clicks, total_behaviors)
VALUES
    ('M', '18-24', 'Low', 1500, 2500),
    ('F', '25-34', 'Medium', 2200, 3300),
    ('M', '35-44', 'High', 2800, 4000);

-- Query data
SELECT * FROM dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics;
```
The query returns the three inserted rows.
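Because the Paimon catalog is writable, you can also run analytical SQL directly against the table. For example, a sketch that aggregates the sample rows inserted above:

```sql
-- Total clicks and behaviors per gender, using the sample data above.
SELECT final_gender_code,
       SUM(clicks) AS total_clicks,
       SUM(total_behaviors) AS total_behaviors
FROM dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics
GROUP BY final_gender_code;
-- With the three sample rows, 'M' totals 4300 clicks and 'F' totals 2200.
```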
Query data (Iceberg catalog)
```sql
SELECT * FROM iceberg_catalog.`default`.test_iceberg;
```
The query returns the rows in the test_iceberg table.
Use DLF 1.0 (legacy)
Prerequisites
Before you begin, ensure that you have:
- A Serverless StarRocks instance. To create one, see Create an instance.
- A data catalog in DLF 1.0. To create one, see Data catalog.
Create a Hive catalog
Use the following syntax to create a Hive catalog that points to DLF 1.0 as the metastore:
Syntax
```sql
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "hive",
    GeneralParams,
    MetastoreParams
)
```
Parameters
| Parameter | Required | Description |
|---|---|---|
| catalog_name | Yes | Name of the Hive catalog. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), and underscores (_). Maximum 64 characters. |
| comment | No | Description of the Hive catalog. |
| type | Yes | Type of the data source. Set to hive. |
GeneralParams supports the following parameter:
| Parameter | Required | Description |
|---|---|---|
| enable_recursive_listing | No | Whether StarRocks recursively reads data from subdirectories of a table or partition directory. true (default): traverse subdirectories. false: read only the top-level directory. |
MetastoreParams specifies how StarRocks accesses Hive metadata:
| Parameter | Required | Description |
|---|---|---|
| hive.metastore.type | Yes | Type of the metadata service. Set to dlf. |
| dlf.catalog.id | No | ID of an existing data catalog in DLF 1.0. If not specified, the system uses the default DLF catalog. |
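The GeneralParams and MetastoreParams groups combine into a single PROPERTIES clause. As an illustration, a sketch of a catalog that disables recursive listing (the catalog name hive_catalog_flat is a placeholder; disabling recursion is only appropriate when all data files sit directly in the table or partition directories):

```sql
CREATE EXTERNAL CATALOG hive_catalog_flat
PROPERTIES
(
    "type" = "hive",
    "hive.metastore.type" = "dlf",
    -- Read only the top level of each table or partition directory.
    "enable_recursive_listing" = "false"
);
```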
Example
```sql
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES
(
    "type" = "hive",
    "hive.metastore.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);
```
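Once the catalog is created, tables registered in DLF 1.0 can be queried with three-part names. A usage sketch, where the database and table names are hypothetical placeholders:

```sql
-- Query a Hive table through the DLF-backed catalog
-- (sales_db and orders are placeholder names).
SELECT * FROM hive_catalog.sales_db.orders LIMIT 10;
```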
For more information, see Hive catalog.
Create an Iceberg catalog
Use the following syntax to create an Iceberg catalog that points to DLF 1.0 as the metastore:
Syntax
```sql
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "iceberg",
    MetastoreParams
)
```
Parameters
| Parameter | Required | Description |
|---|---|---|
| catalog_name | Yes | Name of the Iceberg catalog. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), and underscores (_). Maximum 64 characters. The name is case-sensitive. |
| comment | No | Description of the Iceberg catalog. |
| type | Yes | Type of the data source. Set to iceberg. |
MetastoreParams specifies how StarRocks accesses Iceberg metadata:
| Parameter | Required | Description |
|---|---|---|
| iceberg.catalog.type | Yes | Type of the Iceberg catalog. Set to dlf. |
| dlf.catalog.id | No | ID of an existing data catalog in DLF 1.0. If not specified, the system uses the default DLF catalog. |
Example
```sql
CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);
```
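As with the Hive catalog, you address tables with catalog.database.table names. A brief usage sketch with placeholder database and table names:

```sql
-- List the databases exposed by the catalog.
SHOW DATABASES FROM iceberg_catalog_hms;

-- Query a table (sample_db and sample_table are placeholders).
SELECT * FROM iceberg_catalog_hms.sample_db.sample_table LIMIT 10;
```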
For more information, see Iceberg catalog.
Create a Paimon catalog
Use the following syntax to create a Paimon catalog that points to DLF 1.0 as the metastore:
Syntax
```sql
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "paimon",
    CatalogParams,
    StorageCredentialParams
);
```
Parameters
| Parameter | Required | Description |
|---|---|---|
| catalog_name | Yes | Name of the Paimon catalog. Must start with a letter and contain only letters (a–z, A–Z), digits (0–9), and underscores (_). Maximum 64 characters. |
| comment | No | Description of the Paimon catalog. |
| type | Yes | Type of the data source. Set to paimon. |
CatalogParams specifies how StarRocks accesses Paimon metadata:
| Parameter | Required | Description |
|---|---|---|
| paimon.catalog.type | Yes | Type of the catalog. Set to dlf. |
| paimon.catalog.warehouse | Yes | Storage path of the warehouse where Paimon data is stored. Supports HDFS, OSS, and OSS-HDFS. For OSS or OSS-HDFS, use the format oss://<yourBucketName>/<yourPath>. |
| dlf.catalog.id | No | ID of an existing data catalog in DLF 1.0. If not specified, the system uses the default DLF catalog. |
StorageCredentialParams specifies how StarRocks accesses file storage:
- If you use HDFS, no additional configuration is required.
- If you use OSS or OSS-HDFS, add the following parameter:

  "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>"

  | Parameter | Description |
  |---|---|
  | aliyun.oss.endpoint | The endpoint of your OSS or OSS-HDFS storage. For OSS, find the endpoint on the Overview page of your bucket under the Port section, or see OSS regions and endpoints. Example: oss-cn-hangzhou.aliyuncs.com. For OSS-HDFS, find the endpoint under OSS-HDFS in the Port section. Example: cn-hangzhou.oss-dls.aliyuncs.com. |

  Important: After setting aliyun.oss.endpoint, go to the Parameter Configuration page in the EMR Serverless StarRocks console and update fs.oss.endpoint in both core-site.xml and jindosdk.cfg to match this value.
Example
```sql
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "dlf.catalog.id" = "paimon_dlf_test"
);
```
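To confirm that the catalog can reach the warehouse, list its databases and query a table. A sketch with placeholder database and table names:

```sql
-- List the databases exposed by the catalog.
SHOW DATABASES FROM paimon_catalog;

-- Query a table (paimon_db and paimon_table are placeholders).
SELECT * FROM paimon_catalog.paimon_db.paimon_table LIMIT 10;
```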
For more information, see Paimon catalog.