
E-MapReduce: Use a DLF Catalog

Last Updated: Dec 04, 2025

Data Lake Formation (DLF) is a fully managed service for storing and managing Paimon metadata and data. It supports multiple storage optimization policies to provide secure and high-performance data lake management. This topic describes how to use an Alibaba Cloud DLF catalog in EMR Serverless StarRocks.

Background

Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and management services.

Use DLF

Prerequisites

  • You have created a Serverless StarRocks instance. For more information, see Create an instance.

    The instance must be version 3.3 or later, with a Minor Version of 3.3.8-1.99 or later.

    Note

    You can view the minor version in the Version Information section on the Instance Details page. If the minor version is earlier than 3.3.8-1.99, you must update it. For more information, see Update the minor version.

  • You have created a data catalog in DLF.

Example: Use a DLF Catalog

Step 1: Add a user in Serverless StarRocks

Important

DLF uses Resource Access Management (RAM) for access control. By default, StarRocks users do not have any permissions on DLF resources. You must add an existing RAM user and grant the required permissions to that user. If you have not created a RAM user, see Create a RAM user.

  1. Go to the EMR Serverless StarRocks instance list page.

    1. Log on to the E-MapReduce console.

    2. In the navigation pane on the left, choose EMR Serverless > StarRocks.

    3. In the top menu bar, select the required region.

  2. On the Instance List page, find your instance and click Connect in the Actions column. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager.

    You can connect to the StarRocks instance using the admin user or a StarRocks super administrator account.

  3. In the left-side menu, choose Security Center > User Management, and then click Create User.

  4. In the Create User dialog box, configure the following parameters and click OK.

    • User Source: Select RAM User.

    • Username: Select the RAM user that you want to add (dlf-test in this example).

    • Password and Confirm Password: Enter a custom password.

    • Roles: Keep the default value public.
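After you add the user, you can verify it from a SQL connection. The following check is a minimal sketch that assumes the example user dlf-test maps to the user identity 'dlf-test'@'%'; SHOW GRANTS FOR is a standard StarRocks statement:

-- List the privileges currently granted to the new user.
SHOW GRANTS FOR 'dlf-test'@'%';

At this point the user holds only the default public role. The permissions on DLF resources themselves are granted on the DLF side in the next step.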

Step 2: Grant permissions on the catalog in DLF

  1. Log on to the Data Lake Formation console.

  2. On the Catalogs page, click the name of your catalog.

  3. Click the Permissions tab, and then click Grant Permissions.

  4. From the Select DLF User drop-down list, select the RAM user (dlf-test).

  5. Set Preset Permission Type to Custom and grant the ALL permission on the current data catalog and all its resources to the user.

  6. Click OK.

Step 3: Create a DLF Catalog in Serverless StarRocks

Paimon Catalog

  1. Connect to the instance. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager.

    Important

    Reconnect to the StarRocks instance using the RAM user that you added in Step 1 (dlf-test). You will use this user to create an SQL query to access the DLF foreign table.

  2. To create an SQL query, go to the Queries page in the SQL Editor and click the create icon.

  3. Create a Paimon Catalog. Enter the following SQL statement and click Run.

    CREATE EXTERNAL CATALOG `dlf_catalog`
    PROPERTIES (
        'type' = 'paimon',
        'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com',
        'paimon.catalog.type' = 'rest',
        'paimon.catalog.warehouse' = 'StarRocks_test',
        'token.provider' = 'dlf'
    );
  4. Read and write data.

    1. Create a database.

      CREATE DATABASE IF NOT EXISTS dlf_catalog.sr_dlf_db;
    2. Create a data table.

      CREATE TABLE dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics(
          final_gender_code STRING COMMENT 'Gender',
          age_level STRING COMMENT 'Age level',
          pvalue_level STRING COMMENT 'Consumption level',
          clicks INT COMMENT 'Number of clicks',
          total_behaviors INT COMMENT 'Total number of behaviors'
      );
    3. Insert data.

      INSERT INTO dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics (final_gender_code, age_level, pvalue_level, clicks, total_behaviors)
      VALUES 
      ('M', '18-24', 'Low', 1500, 2500),
      ('F', '25-34', 'Medium', 2200, 3300),
      ('M', '35-44', 'High', 2800, 4000);
    4. Query data.

      SELECT * FROM dlf_catalog.sr_dlf_db.ads_age_pvalue_analytics;

      The query returns the three rows that you inserted in the previous step.
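To confirm that the new catalog and database are visible to the instance, you can list them with standard StarRocks statements. This is a quick sketch based on the names used in this example:

-- List all catalogs on the instance; dlf_catalog should appear.
SHOW CATALOGS;
-- List the databases in the DLF catalog; sr_dlf_db should appear.
SHOW DATABASES FROM dlf_catalog;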

Iceberg Catalog

  1. Connect to the instance. For more information, see Connect to a StarRocks instance using EMR StarRocks Manager.

    Important

    Reconnect to the StarRocks instance using the RAM user that you added in Step 1 (dlf-test). You will use this user to create an SQL query to access the DLF foreign table.

  2. On the Queries page of the SQL Editor, click the create icon to create an SQL query.

  3. Create an Iceberg Catalog. Enter the following SQL statement and click Run.

    CREATE EXTERNAL CATALOG `iceberg_catalog`
    PROPERTIES
    ( 
        'type' = 'iceberg',
        'iceberg.catalog.type' = 'dlf_rest',
        'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com/iceberg',
        'warehouse' = 'iceberg_test',
        'rest.signing-region' = 'cn-hangzhou'
    );
  4. Query data.

    Note

    Iceberg foreign tables are read-only in StarRocks. You can execute SELECT queries, but you cannot write data to Iceberg tables from StarRocks.

    SELECT * FROM iceberg_catalog.`default`.test_iceberg;

    The query returns the data in the test_iceberg table.
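If you query the Iceberg catalog often, you can make it the session default instead of writing fully qualified names. The following is a minimal sketch using standard StarRocks statements, with the database and table names from the example above:

-- Switch the session to the Iceberg catalog and its default database.
SET CATALOG iceberg_catalog;
USE `default`;
-- Queries now resolve table names inside the Iceberg catalog.
SELECT COUNT(*) FROM test_iceberg;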

Use DLF 1.0 (legacy)

Prerequisites

  • You have created a Serverless StarRocks instance. For more information, see Create an instance.

  • You have created a data catalog in DLF 1.0 (legacy). For more information, see Data Catalog.

Create a catalog

Create a Hive Catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "hive",
    GeneralParams,
    MetastoreParams
)

Parameters

  • catalog_name: The name of the Hive catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), numbers (0-9), and underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Hive catalog. This parameter is optional.

  • type: The type of the data source. Set this to hive.

  • GeneralParams: A set of parameters for general settings. GeneralParams includes the following parameter.

    • enable_recursive_listing (optional): Specifies whether StarRocks recursively reads data from files in a table or partition directory, including its subdirectories. Valid values:

      • true (default): Recursively traverses the directory.

      • false: Reads data only from files at the current level of the table or partition directory.

  • MetastoreParams: Parameters related to how StarRocks accesses the metadata of the Hive cluster.

    • hive.metastore.type: The type of metadata service used by Hive. Set this to dlf.

    • dlf.catalog.id: The ID of an existing data catalog in DLF 1.0. This parameter takes effect only when hive.metastore.type is set to dlf. If you do not specify dlf.catalog.id, the system uses the default DLF catalog.

Example

CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES
(
    "type" = "hive",
    "hive.metastore.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);
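After the catalog is created, you can browse and query it like any internal catalog. The following is a minimal sketch; dlf_db and dlf_table are placeholder names for objects in your own DLF catalog:

-- List the databases that the DLF catalog exposes.
SHOW DATABASES FROM hive_catalog;
-- Query a table through the catalog with a fully qualified name.
SELECT * FROM hive_catalog.dlf_db.dlf_table LIMIT 10;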

For more information about Hive Catalogs, see Hive Catalog.

Create an Iceberg Catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "iceberg",
    MetastoreParams
)

Parameters

  • catalog_name: The name of the Iceberg catalog. This parameter is required. The name must meet the following requirements:

    • It must consist of letters (a-z or A-Z), digits (0-9), or underscores (_), and must start with a letter.

    • The total length cannot exceed 64 characters.

    • The catalog name is case-sensitive.

  • comment: The description of the Iceberg catalog. This parameter is optional.

  • type: The type of the data source. Set this to iceberg.

  • MetastoreParams: The parameters for StarRocks to access the metadata service of the Iceberg cluster.

    • iceberg.catalog.type: The type of catalog in Iceberg. The value must be dlf.

    • dlf.catalog.id: The ID of an existing data catalog in DLF. If you do not configure the dlf.catalog.id parameter, the system uses the default DLF catalog.

Example

CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "dlf",
    "dlf.catalog.id" = "sr_dlf"
);
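As with the Hive catalog, you can switch the session to the new catalog before running queries. A short sketch; iceberg_db and iceberg_table are placeholder names for objects in your own DLF catalog:

-- Make the Iceberg catalog the session default.
SET CATALOG iceberg_catalog_hms;
-- Database names now resolve inside the Iceberg catalog.
SHOW DATABASES;
SELECT * FROM iceberg_db.iceberg_table LIMIT 10;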

For more information about Iceberg Catalogs, see Iceberg Catalog.

Create a Paimon Catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
    "type" = "paimon",
    CatalogParams,
    StorageCredentialParams
);

Parameters

  • catalog_name: The name of the Paimon catalog. This parameter is required. The name must meet the following requirements:

    • It must start with a letter and can contain only letters (a-z or A-Z), digits (0-9), or underscores (_).

    • The total length cannot exceed 64 characters.

  • comment: The description of the Paimon catalog. This parameter is optional.

  • type: The type of the data source. Set this parameter to paimon.

  • CatalogParams: The parameters for StarRocks to access the metadata of the Paimon cluster.

    • paimon.catalog.type (required): The type of the data source. Set this to dlf.

    • paimon.catalog.warehouse (required): The storage path of the warehouse where Paimon data is stored. HDFS, OSS, and OSS-HDFS are supported. The format for OSS or OSS-HDFS is oss://<yourBucketName>/<yourPath>.

      Important

      If you use OSS or OSS-HDFS as the warehouse, you must configure the aliyun.oss.endpoint parameter. For more information, see StorageCredentialParams: Parameters for StarRocks to access the file storage of the Paimon cluster.

    • dlf.catalog.id (optional): The ID of an existing data catalog in DLF. If you do not configure the dlf.catalog.id parameter, the system uses the default DLF catalog.

  • StorageCredentialParams: The parameters for StarRocks to access the file storage of the Paimon cluster.

    • If you use HDFS as the storage system, you do not need to configure StorageCredentialParams.

    • If you use OSS or OSS-HDFS, you must configure StorageCredentialParams.

      "aliyun.oss.endpoint" = "<YourAliyunOSSEndpoint>" 

      The parameter is described as follows:

      • aliyun.oss.endpoint: The endpoint for OSS or OSS-HDFS. Obtain it as follows:

        • OSS: Go to the Overview page of your bucket and find the endpoint in the Port section. You can also see OSS regions and endpoints to find the endpoint for your region. For example, oss-cn-hangzhou.aliyuncs.com.

        • OSS-HDFS: Go to the Overview page of your bucket and find the OSS-HDFS endpoint in the Port section. For example, the endpoint for the China (Hangzhou) region is cn-hangzhou.oss-dls.aliyuncs.com.

        Important

        After you configure this parameter, you must also go to the Parameter Configuration page in the EMR Serverless StarRocks console. Then, modify the fs.oss.endpoint parameter in core-site.xml and jindosdk.cfg to be consistent with the value of aliyun.oss.endpoint.

Example

CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "dlf.catalog.id" = "paimon_dlf_test"
);
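If the warehouse is on OSS or OSS-HDFS, the Important callout above also requires the aliyun.oss.endpoint property, which the syntax block lists under StorageCredentialParams. The following variant is a sketch of the same example with the endpoint added; the endpoint value is a placeholder and must match your bucket's region:

CREATE EXTERNAL CATALOG paimon_catalog_oss
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "paimon.catalog.warehouse" = "oss://<yourBucketName>/<yourPath>",
    "dlf.catalog.id" = "paimon_dlf_test",
    -- Placeholder endpoint; use the endpoint of your bucket's region.
    "aliyun.oss.endpoint" = "oss-cn-hangzhou.aliyuncs.com"
);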

For more information about Paimon Catalogs, see Paimon Catalog.
