All Products
Search
Document Center

Hologres:access paimon data using dlf catalog

Last Updated:Mar 07, 2026

Hologres V3.2 and later versions support accessing Paimon data using a DLF Catalog. This feature provides a unified interface and improves the efficiency of metadata management.

Background information

Alibaba Cloud Data Lake Formation (DLF) is a fully managed and unified platform for metadata, data storage, and data management. It provides features such as metadata management, permission management, and storage optimization. This platform unifies metadata, lake table formats, and data storage, which significantly simplifies O&M tasks during data lake construction and management. This allows enterprises to focus on business innovation and data insights. For more information, see What Is Data Lake Formation?.

Feature Details

  • Hologres V3.2 and later versions support the following features:

    • Connect to a DLF Catalog using an External Database. For more information, see EXTERNAL DATABASE.

      • Create or delete databases in a DLF Catalog using Create/Drop External Schema.

      • Create or delete tables in a DLF Catalog using Create/Drop External Table.

      • View all databases and tables in a DLF Catalog.

    • Use a Dynamic Table to read Paimon table data in full and incremental modes. For more information, see CREATE DYNAMIC TABLE.

    • Use ANALYZE and AUTO ANALYZE to collect statistics information for foreign tables. This helps the optimizer generate the optimal query plan. For more information, see ANALYZE and AUTO ANALYZE.

    • External table data mirroring.

    • TimeTravel: Query historical snapshots of Paimon tables based on a timestamp, version, or tag.

    • Branch query: Query Paimon branch data and historical snapshots based on a branch.

    • Query Paimon system tables, such as Snapshots and Tags.

  • Hologres V4.0 and later versions support various DML operations:

    • Write to Paimon primary key tables with multiple concurrent operations. This feature supports both INSERT INTO and INSERT OVERWRITE. The write process is optimized to shuffle data by bucket before writing, which reduces the generation of small files.

    • Provide a memory-based caching mechanism to support writing to Paimon tables that require caching. Examples include tables where the Merge Engine is First Row, tables that generate a Changelog through a lookup mechanism, or tables with Deletion Vector enabled.

    • Support UPDATE and DELETE operations on Paimon tables.

    • Hologres V3.2.6 and V4.0.6 and later versions support reading data from tags that correspond to expired Paimon snapshots.

Prerequisites

Limits

  • You can access only DLF services in the same region.

  • You cannot visually create DLF Catalogs using the OSS data lake acceleration feature in HoloWeb.

  • Only Paimon foreign tables that are mapped using an External Database support TimeTravel queries, branch queries, and data mirroring.

  • TimeTravel queries on tables with schema evolution are not supported.

  • Dynamic Table does not support TimeTravel queries for Paimon tables and branches.

  • External table mirroring does not support Paimon primary key tables.

  • Reading default partitioned data from Paimon, such as partitions with empty or NULL values, is not supported.

  • Read-only replica instances do not support data lake acceleration.

  • You cannot execute the TRUNCATE command on foreign tables. Instead, you can use INSERT OVERWRITE to insert empty data.

  • Writing to Paimon dynamic bucketing tables is not supported. These are Paimon tables that are created without specifying the bucket parameter or with bucket = -1.

  • The UPDATE operation is supported only on Paimon primary key tables where the Merge Engine is deduplicate or partial-update. Updating primary keys is not supported.

  • Only supports DELETE operations on Paimon primary key tables where the merge engine is set to deduplicate or partial-update (requires enabling the partial-update.remove-record-on-delete property).

  • Foreign tables do not support mixed DML transactions. Only single DML operations implement a two-phase commit protocol to ensure atomicity during distributed writes.

Map DLF Catalog Using External Database

Create an External Database (SLR Mode)

A service-linked role (SLR) is a type of RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to manage authorized access across different Alibaba Cloud services. In most cases, when you use a specific feature, the associated Alibaba Cloud service automatically creates or deletes the SLR. You do not need to create or delete the role manually. A service-linked role simplifies the process of authorizing a service to access other services and reduces the risks of misoperation. For more information, see Service-linked Role.

Scenarios

The SLR mode applies to scenarios where you access External Database data using your current user identity. Hologres uses an SLR and identity pass-through by default to achieve cross-cloud product access.

Procedure

  1. For instances that are newly purchased or upgraded to Hologres V2.2 or later, the system automatically creates a service-linked role. You do not need to create it manually. You can skip this step.

    Note

    For Hologres instances that run versions earlier than V2.2, you must use the Resource Access Management Quick Authorization page to create a service-linked role.

  2. Connect to the Hologres instance and create an External Database.

    CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH
      catalog_type 'paimon',
      metastore_type 'dlf-rest',
      dlf_catalog 'DLF_CATALOG_NAME',
      comment 'dlf catalog';
    
  3. View the schemas and tables in the External Database.

    -- Query schemas in the External Database
    SELECT * FROM hologres.hg_external_schemas('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME'</span>);
    
    -- Query tables in the schema
    SELECT * FROM hologres.hg_external_tables ('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME'</span>, '<span class="var-span" contenteditable="true" data-var="EXT_SCHEMA_NAME">EXT_SCHEMA_NAME'</span>);
  4. Query table data.

    SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

Create an External Database (STS Mode)

Alibaba Cloud Security Token Service (STS) is a service that you can use to manage temporary credentials. Resource Access Management (RAM) provides two types of identities: RAM users and RAM roles. RAM roles do not have permanent identity credentials. Instead, they can obtain temporary identity credentials from STS that have customizable validity periods and access permissions. These credentials are known as Security Token Service (STS) tokens. For more information, see What is STS.

Scenarios

The STS mode applies to scenarios that involve cross-account access and scenarios where Hologres custom accounts (BASIC accounts) access External Database data.

Procedure

  1. Log on to the Resource Access Management (RAM) console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission for DLF access. For more information, see Create a RAM Role and Grant Permissions.

  2. Update the trust policy to allow the RAM role to be assumed by the trusted Alibaba Cloud service Hologres. For more information, see Modify the trust policy of a RAM role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "hologres.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }
  3. Log on to the Data Lake Formation console and grant access permissions for databases and tables to the RAM role in the target Catalog.

  4. Create an External Database in Hologres. Specify the previously created RAM role as the authentication policy.

    CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH
      catalog_type 'paimon'
      metastore_type 'dlf-rest'
      dlf_catalog 'paimon_catalog' 
      rolearn 'acs:ram::106380604****:role/***-ramrole'
      comment 'dlf catalog'
      ;
  5. Create a USER MAPPING in Hologres to associate the RAM user or BASIC account that needs to access the DLF Catalog with the RAM role.

    CREATE USER MAPPING FOR "<RAM user | BASIC account>"
    EXTERNAL DATABASE EXT_DATABASE_NAME
    OPTIONS
    (
       rolearn 'acs:ram::10638060***:role/***ramrole'
    );
    Note
  6. After you log on to the Hologres console using a RAM user or re-connecting to a Hologres instance using a BASIC account, you can execute the following SQL query.

    SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
  • For more information about operations related to an External Database, see EXTERNAL DATABASE.

  • For more information about operations related to an External Schema, see EXTERNAL SCHEMA.

  • For more information about External Table operations, see EXTERNAL TABLE.

Collect external table statistics

You can use the ANALYZE and AUTO ANALYZE commands to collect statistics about table content in your database. The optimizer uses these statistics to generate optimal query plans and improve query performance. For more information, see ANALYZE and AUTO ANALYZE.

-- Update statistics for a table. By default, this command collects statistics for all columns in the table.
analyze EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

-- Update statistics for specific columns. This command samples more data than a full-table update and provides more accurate results. Use this command mainly for columns that appear in filter conditions.
analyze EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME(COLNAME, COLNAME);

-- Enable Auto Analyze
ALTER EXTERNAL DATABASE EXT_DATABASE_NAME WITH enable_auto_analyze 'true';

TimeTravel Query

Starting from V3.2, Hologres supports historical snapshot queries on Paimon tables based on a timestamp, version, or tag, which enables time travel for data. Hologres also supports querying the Paimon Snapshots and Tags system tables to retrieve snapshot details for the target table. For more information about system tables, see Paimon System Tables.

Query Paimon Snapshots System Table

SELECT * FROM hologres.hg_list_snapshots('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

The fields in the Snapshots system table correspond to the Paimon system table fields as follows:

Snapshots System Table Field

Type

Description

Corresponding Paimon System Table Field

branch_name

TEXT

Branch name.

branch_name

snapshot_id

TEXT

Snapshot ID.

snapshot_id

schema_id

TEXT

Table schema ID.

schema_id

commit_kind

TEXT

Table data commit type.

commit_kind

commit_time

TIMESTAMPTZ

Commit time.

commit_time

extend_info

TEXT(JSON)

Remaining properties of the Paimon Snapshots system table.

Remaining properties of the Paimon Snapshots system table

You can also query the Paimon Snapshots system table using the hologres.hg_list_versions function. For Paimon table queries, Version corresponds to Snapshot ID.

SELECT * FROM hologres.hg_list_versions('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

Query Paimon Tag System Table

SELECT * FROM hologres.hg_list_tags('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

The fields in the Tag system table correspond to the Paimon system table fields as follows:

Tag System Table Field

Type

Description

Corresponding Paimon System Table Field

branch_name

TEXT

Branch name.

branch_name

tag_name

TEXT

Tag name.

tag_name

snapshot_id

TEXT

Snapshot ID.

snapshot_id

schema_id

TEXT

Table schema ID.

schema_id

commit_time

TIMESTAMPTZ

Submission time

commit_time

extend_info

TEXT(JSON)

Remaining properties of the Paimon Tag system table.

Remaining properties of the Paimon Tag system table

Query Paimon Historical Snapshots Based on Timestamp

SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TIMESTAMP AS OF 'TIMESTAMP';

Query Paimon Historical Snapshots Based on Version

For Paimon tables, Version corresponds to Snapshot ID.

SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR VERSION AS OF '<span class="var-span" contenteditable="true" data-var="VERSION">VERSION'</span>;

Query Paimon Historical Snapshots Based on Tag

SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR TAG AS OF '<span class="var-span" contenteditable="true" data-var="TAG">TAG'</span>;

Branch Query

Starting from V3.2, Hologres supports querying data from a specified branch of a Paimon table and supports the branch fallback property. For more information about Paimon branches, see Manage Branch.

Query Branch System Table

SELECT * FROM hologres.hg_list_branches('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

The fields in the Branch system table correspond to the Paimon system table fields as follows:

Field

Type

Description

Corresponding Paimon System Table Field

branch_name

TEXT

Branch name.

branch_name

create_time

TIMESTAMPTZ

Branch creation time.

create_time

extend_info

TEXT(JSON)

Remaining properties of the Paimon Branch system table.

Remaining properties of the Paimon Branch system table

Query Specified Branch Data

SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="TAG">TAG'</span>;
Note

If the scan.fallback-branch property is set on a Paimon table, when a query job reads data from the current branch and a partition is missing, the reader attempts to read that partition's data from the fallback branch. For more information about fallback branches, see Manage Branch.

Query Historical Snapshots of a Specified Branch

-- Query based on Timestamp
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME">BRANCH_NAME'</span> TIMESTAMP AS OF '<span class="var-span" contenteditable="true" data-var="TIMESTAMP">TIMESTAMP'</span>;

-- Query based on Tag
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME">BRANCH_NAME'</span> TAG AS '<span class="var-span" contenteditable="true" data-var="TAG_NAME">TAG_NAME'</span>;

Lake Table Mirroring Acceleration

Starting from V3.2, Hologres supports the lake table image feature to accelerate data lake queries. This feature supports metadata and data images and synchronizes changes from external data sources to Hologres in near-real-time or at specified intervals. It also supports full table images and partial images of partitioned tables. You can use parameters to flexibly control the scope of the image data. For more information, see Lake Table Image.

Consume Paimon Tables Using Foreign Table

  1. Create a Foreign Server.

    For more information about Foreign Server operations, see More Foreign Server Operations (SQL Method).

    DLF syntax

    Note

    This syntax applies to DLF service V2.5 and later.

    The SQL syntax is as follows:

    -- Create a foreign server.
    CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
        catalog_type 'paimon',
        metastore_type 'dlf-rest', 
        dlf_catalog '<span class="var-span" contenteditable="true" data-var="DLF_CATALOG_NAME">DLF_CATALOG_NAME</span>'
    );

    DLF 1.0 syntax

    Note

    This syntax applies to DLF service V1.0.

    The SQL syntax is as follows:

    -- Create a foreign server
    CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
      dlf_region '',
      dlf_endpoint 'dlf-share..aliyuncs.com',
      oss_endpoint 'oss-REGION_ID-internal.aliyuncs.com',
      dlf_catalog 'dlf_catalog_name'                 
    );
  2. Create a foreign table.

    • Use the IMPORT FOREIGN SCHEMA method

      For more information about how to use IMPORT FOREIGN SCHEMA, see IMPORT FOREIGN SCHEMA.

      • Create a foreign table using IMPORT FOREIGN SCHEMA

        -- Create a foreign table using IMPORT FOREIGN SCHEMA
        IMPORT FOREIGN SCHEMA DLF_DB_NAME
        FROM SERVER SERVER_NAME 
        INTO <holo+schema_name> 
        options (if_table_exist 'update');
      • Create a foreign table by specifying table names

        IMPORT FOREIGN SCHEMA DLF_DB_NAME
        limit to (table_name[,....]) 
        FROM SERVER SERVER_NAME 
        into <holo+schema_name>
        options (if_table_exist 'update');
    • Use the CREATE FOREIGN TABLE method

      For more information, see CREATE FOREIGN TABLE.

      CREATE FOREIGN TABLE FOREIGN_TABLE_NAME
      (
        { column_name data_type }
        [, ... ]
      ) 
      SERVER SERVER_NAME
      options
      (
        schema_name 'DLF_DB_NAME',
        table_name 'DLF_TABLE_NAME'
      );

Consume Paimon Tables Using Dynamic Table

A Dynamic Table can automatically process and store aggregated data from one or more base tables and includes built-in data refresh strategies. You can configure different data refresh strategies based on your business needs to enable automatic data flow from base tables to the Dynamic Table to meet requirements such as unified development, automatic data flow, and timely processing. Starting with Hologres V3.0, MaxCompute foreign tables and DLF foreign tables are supported as base tables for a Dynamic Table. For more information, see Dynamic Table.

Full Refresh

CREATE DYNAMIC TABLE paimon_dt_table
WITH (
      auto_refresh_mode='full',
      freshness='3 minutes'
)AS SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

Incremental Refresh

CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
      auto_refresh_mode='incremental',
      freshness='3 minutes'
)AS SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

Auto-Refresh

CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
      auto_refresh_mode='auto',
      freshness='3 minutes'
)AS * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

DML Operations

INSERT INTO

-- Create a Paimon primary key table and insert data using Hologres SQL

CREATE EXTERNAL TABLE  EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- This is an example merge policy. You can modify it as needed.
"bucket" = '4' -- This is an example number of buckets.
);

INSERT INTO EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');

INSERT OVERWRITE

-- Create a Paimon append-only table and insert data using Hologres SQL

CREATE EXTERNAL TABLE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate', 
"bucket-key" = 'id',
"bucket" = '2'
);

INSERT OVERWRITE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');

UPDATE

UPDATE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME SET amount= 5 where id= 1;

DELETE

DELETE FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME WHERE id = 2;