All Products
Search
Document Center

Hologres:Access a catalog in DLF

Last Updated:Dec 01, 2025

Hologres V3.2 and later lets you access Paimon catalog data in Data Lake Formation (DLF). This feature provides a unified interface for more efficient metadata management.

Background information

DLF is a fully managed platform that provides unified metadata, data storage, and data management. It offers features such as metadata management, permission management, and storage optimization. By unifying metadata, lake table formats, and data storage, DLF simplifies the operations and maintenance (O&M) involved in building and managing data lakes. This allows businesses to focus on innovation and data insights.

Features

  • Hologres V3.2+ support these features:

    • Connect to a catalog in DLF using an external database.

      • Use CREATE EXTERNAL SCHEMA or DROP EXTERNAL SCHEMA to cascade the creation or deletion of databases in the catalog.

      • Use CREATE EXTERNAL TABLE or DROP EXTERNAL TABLE to cascade the creation or deletion of tables.

      • Query all databases and tables in a catalog.

    • Read data from Paimon tables in full or incrementally using dynamic tables.

    • Collect statistics for external tables by using ANALYZE and AUTO ANALYZE. This helps the optimizer generate optimal query plans.

    • Data mirroring of external tables.

    • Time Travel: Query historical snapshots of Paimon tables based on a timestamp, version, or tag.

    • Branch query: Query Paimon branch data and historical snapshots based on a branch.

    • Query Paimon system tables, such as snapshots and tags tables.

  • Hologres V4.0 and later supports various DML operations:

    • Perform concurrent writes to Paimon primary key tables using INSERT INTO or INSERT OVERWRITE. The write process is optimized to support shuffling by bucket before writing, which reduces the generation of small files.

    • A memory-based caching mechanism is provided to support writes to Paimon tables that require caching. Examples include tables with the first-row merge engine, lookup changelog producer, or with the deletion vector enabled.

    • UPDATE and DELETE operations on Paimon tables.

    • Read data from tags corresponding to expired Paimon snapshots (requires V3.2.6+ and V4.0.6+).

Prerequisites

Limitations

  • Your Hologres instance can access only DLF catalogs in the same region.

  • You cannot use OSS Data Lake Acceleration in HoloWeb to create a catalog in DLF.

  • Only Paimon tables mapped to Hologres via CREATE EXTERNAL DATABASE support Time Travel, branch queries, and data mirroring.

  • Time Travel is not supported for tables with evolved schema.

  • Paimon-based dynamic tables do not support Time Travel or branch queries.

  • Paimon primary-key tables cannot be mapped as external tables.

  • Accessing the Paimon default partition is not supported.

  • Read-only secondary instances do not support data lake acceleration.

  • The TRUNCATE command is not supported for external/foreign tables. Use INSERT OVERWRITE with empty data instead.

  • Writing to Paimon tables with dynamic bucket strategies is not supported. These are tables created without specifying the bucket parameter or with bucket = -1.

  • The UPDATE operation is supported only for Paimon primary key tables with the deduplicate or partial-update merge engine. Updating the primary key is not supported.

  • The DELETE operation is supported only for Paimon primary key tables that use either the deduplicate merge engine or the partial-update merge engine with partial-update.remove-record-on-delete enabled.

  • External/foreign tables do not support multiple DML statements in a transaction. Two-phase commit is implemented within a DML operation to ensure the atomicity of distributed writes.

Map a Paimon catalog as an external database

Use service-linked role

A service-linked roles (SLR) is a RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to resolve authorization issues for cross-service access. In most cases, an SLR is automatically created or deleted when you use a specific feature. SLRs aid with permission configurations, ensuring a service runs as expected while preventing misoperations.

Use case

SLR suits users who use their current identity to access an external database. By default, Hologres uses SLRs and identity pass-through to access another Alibaba Cloud service.

Procedure

  1. (Optional) Create an SLR in Quick Authorization for Resource Access Management.

    Note

    This step is required if you use Holores V2.2 or earlier instances. For newly created instances or V2.2 or later instances, skip this step.

  2. Connect to the Hologres instance and map your Paimon catalog as an external database.

    CREATE EXTERNAL DATABASE <ext_database_name> WITH
      catalog_type 'paimon'
      metastore_type 'dlf-rest'
      dlf_catalog '<dlf_catalog_name>' 
      comment 'paimon catalog on dlf'
      ;
  3. View the schemas and tables in the Paimon catalog.

    -- Query the schemas in the catalog.
    SELECT * FROM hologres.hg_external_schemas('<ext_database_name>');
    
    -- Query the tables in the schema.
    SELECT * FROM hologres.hg_external_tables ('<ext_database_name>', '<ext_schema_name>');
  4. Query data in a table.

    SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Use STS

Resource Access Management (RAM) provides two types of identities: RAM users and RAM roles. RAM roles don't have permanent identity credentials. Instead, they obtain temporary identity credentials, called STS tokens, from Service Token Service (STS). You can define the validity period and access permissions of STS tokens.

Use case

STS is suitable for accessing a Paimon catalog across accounts or using a custom Hologres account (BASIC account).

Procedure

  1. Log on to the Resource Access Management (RAM) console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission to the role. For more information, see Create a RAM role and grant permissions to the RAM role.

  2. Add a trust policy to allow the RAM role to be assumed by Hologres. For more information, see Modify the trust policy of a RAM role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "hologres.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }
  3. Log on to the DLF console and grant the RAM role the permissions to access the database and tables in the Paimon catalog.

  4. Create an external database in Hologres and specify the RAM role for authentication.

    CREATE EXTERNAL DATABASE <ext_database_name> WITH
      catalog_type 'paimon'
      metastore_type 'dlf-rest'
      dlf_catalog 'paimon_catalog' 
      rolearn 'acs:ram::106380604****:role/***-ramrole'
      comment 'paimon catalog on dlf'
      ;
  5. Create a user mapping in Hologres to associate the RAM role with the RAM user or BASIC account who accesses the DLF catalog.

    CREATE USER MAPPING FOR "<RAM_user|BASIC_account>"
    EXTERNAL DATABASE <ext_database_name>
    OPTIONS
    (
       rolearn 'acs:ram::10638060***:role/***ramrole'
    );
    Note
  6. Log on to the Hologres Management Console as a RAM user or reconnect to the Hologres instance as a BASIC account. Then, run the following SQL statement.

    SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Related operations for external databases

View external databases in an instance

SELECT database_name, options FROM hologres.hg_external_databases();

Refresh the metadata of an external database

REFRESH CACHE FOR EXTERNAL DATABASE <ext_database_name> WITH( cache_level = 'metadata');

View the schemas in an external database

SELECT * FROM hologres.hg_external_schemas('<ext_database_name>');

Related operations for external schemas

Create an external schema in a Paimon catalog

When you create an external schema, a database is physically created in the DLF Paimon catalog:

-- Create an external schema.
CREATE EXTERNAL SCHEMA [IF NOT EXISTS] <ext_database_name>.<ext_schema_name>;

Refresh the metadata of an external schema

REFRESH CACHE FOR EXTERNAL SCHEMA <ext_database_name>.<ext_schema_name> WITH( cache_level = 'metadata'); 

Drop an external schema

When you drop an external schema, the database and all tables in the database are also dropped from the Paimon catalog. Perform this operation with caution.

-- Delete an external schema.
DROP EXTERNAL SCHEMA [IF NOT EXISTS] <ext_database_name>.<ext_schema_name>;

View tables in a schema

SELECT * FROM hologres.hg_external_tables ('<ext_database_name>', '<ext_schema_name>');

Related operations for external tables

Create an external table in a Paimon catalog

-- Create a non-partitioned table.

CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name>(
    id TEXT,                                        
    created_at BIGINT,                             
    type TEXT,                                      
    actor_id TEXT,                                  
    actor_login TEXT,                               
    repo_id TEXT,                                   
    repo_name TEXT,                                 
    org TEXT,                                       
    org_login TEXT,                                 
    PRIMARY KEY(id)
) WITH (
  "changelog-producer"='input',
  "bucket"=6,
  "bucket-key"='id'
);

-- Create a partitioned table.

CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name>(
    id TEXT,                                        
    created_at BIGINT,                              
    type TEXT,                                      
    actor_id TEXT,                                  
    actor_login TEXT,                               
    repo_id TEXT,                                  
    repo_name TEXT,                                 
    org TEXT,                                       
    org_login TEXT                                
)
LOGICAL PARTITION BY LIST(created_at)
 WITH (
  "file_format" = 'orc',
  "bucket"=6,
  "bucket-key"='id'
);

Refresh the metadata of an external table

REFRESH CACHE FOR EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> WITH( cache_level = 'metadata');

Drop an external table

When you drop an external table, the table is physically removed from the DLF Paimon catalog. Perform this operation with caution.

-- Delete the external table.
DROP EXTERNAL TABLE IF EXISTS <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Collect statistics for foreign tables

The ANALYZE and AUTO ANALYZE commands collect statistics about the content of tables in a database. The optimizer uses these statistics to generate optimal query plans and improve query efficiency.

-- Update the statistics of a table. By default, the statistics of all columns in the table are collected.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>;

-- Update the statistics of a specific column. More data is sampled than when you update the statistics of the entire table, which makes the statistics more accurate. This is mainly used to update conditional columns.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>(<colname>, <colname>);

-- Enable auto-analyze.
ALTER EXTERNAL DATABASE <ext_database_name> WITH enable_auto_analyze 'true';

Time Travel

In Hologres V3.2 and later, you can query historical snapshots of Paimon tables based on a timestamp, version, or tag. You can also query the Snapshots and Tags system tables in Hologres to obtain the snapshot details of a table.

Query the Snapshots table

SELECT * FROM hologres.hg_list_snapshots('<ext_database_name>.<ext_schema_name>.<ext_table_name>;');

The columns in the Snapshots table map to the Paimon system table columns as follows.

Snapshots table column

Type

Description

Corresponding column in the Paimon system table

branch_name

TEXT

The branch name.

branch_name

snapshot_id

TEXT

The snapshot ID.

snapshot_id

schema_id

TEXT

The schema ID of the table.

schema_id

commit_kind

TEXT

The commit type of the table data.

commit_kind

commit_time

TIMESTAMPTZ

The commit time.

commit_time

extend_info

TEXT (JSON)

Other properties of the Paimon Snapshots table.

Other properties of the Paimon Snapshots table

You can also query the Snapshots table using the hologres.hg_list_versions function. For Paimon table queries, a version corresponds to a snapshot ID.

SELECT * FROM hologres.hg_list_versions('<ext_database_name>.<ext_schema_name>.<ext_table_name>');

Query the Tags table

SELECT * FROM hologres.hg_list_tags('<ext_database_name>.<ext_schema_name>.<ext_table_name>');

The columns in the Tags table map to the Paimon system table columns as follows.

Tags table column

Type

Description

Corresponding column in the Paimon system table

branch_name

TEXT

The branch name.

branch_name

tag_name

TEXT

The tag name.

tag_name

snapshot_id

TEXT

The snapshot ID.

snapshot_id

schema_id

TEXT

The schema ID of the table.

schema_id

commit_time

TIMESTAMPTZ

The commit time.

commit_time

extend_info

TEXT (JSON)

Other properties of the Tags table.

Other properties of the Tags table.

Query historical snapshots based on a timestamp

SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR TIMESTAMP AS OF '<timestamp>';

Query historical snapshots based on a version

For Paimon tables, a version corresponds to a snapshot ID.

SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR VERSION AS OF '<version>';

Query historical snapshots based on a tag

SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR TAG AS OF '<tag>';

Query branches

In Hologres V3.2 and later, you can query data from a branch. The fallback branch is also supported. For more information, see Manage branches.

Query the Branches table

SELECT * FROM hologres.hg_list_branches('<ext_database_name>.<ext_schema_name>.<ext_table_name>');

The columns in the branches table map to the Paimon system table columns as follows.

Branches table column

Type

Description

Corresponding column in the Paimon system table

branch_name

TEXT

The branch name.

branch_name

create_time

TIMESTAMPTZ

The time when the branch was created.

create_time

extend_info

TEXT (JSON)

Other properties of the Branches table.

Other properties of the Branches table.

Query data from a branch

SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<tag>';
Note

If the scan.fallback-branch option is set for a Paimon table and a partition does not exist when a query job reads data from the current branch, the reader attempts to read the partition data from the fallback branch. For more information, see Manage branches.

Query historical snapshot from a branch

-- Query based on a timestamp.
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<branch_name>' TIMESTAMP AS OF '<timestamp>';

-- Query based on a tag.
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<branch_name>' TAG AS OF '<tag_name>';

Data mirroring

Starting with V3.2, Hologres offers data and metadata mirroring to accelerate queries on data lake data. This feature enables near real-time or interval-based synchronization of changes from external data sources to Hologres. It supports both full table mirroring and partial mirroring of partitioned tables, allowing flexible control over the mirrored data scope. For more information, see Data mirroring.

Access Paimon tables by using foreign tables

  1. Create a foreign server.

    For more information, see More operations on foreign servers by using SQL statements.

    -- create foreign server
    
    CREATE SERVER IF NOT EXISTS <server_name> FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
        catalog_type 'paimon',
        metastore_type 'dlf-rest', 
        dlf_catalog '<dlf_catalog_name>'
    );
  2. Create a foreign table.

    • IMPORT FOREIGN SCHEMA

      For more information, see IMPORT FOREIGN SCHEMA.

      -- Create a foreign table using IMPORT FOREIGN SCHEMA.
      IMPORT FOREIGN SCHEMA <dlf_db_name>
      FROM SERVER <server_name> 
      INTO <holo_schema_name> 
      options (if_table_exist 'update');
    • CREATE FOREIGN TABLE

      For more information, see CREATE FOREIGN TABLE.

      CREATE FOREIGN TABLE <foreign_table_name>
      (
        { column_name data_type }
        [, ... ]
      ) 
      SERVER <server_name>
      options
      (
        schema_name '<dlf_database_name>',
        table_name '<dlf_table_name>'
      );

Access Paimon tables using dynamic tables

A dynamic table can automatically process and store the data aggregation results of one or more base tables. It provides various data refresh policies. This enables automatic data flow from base tables to the dynamic table, supporting unified development, automatic data flow, and processing timeliness. In Hologres V3.0 and later, you can use MaxCompute and DLF external tables as the base tables of a dynamic table.

Full refresh

CREATE DYNAMIC TABLE paimon_dt_table
WITH (
      auto_refresh_mode='full',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Incremental refresh

CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
      auto_refresh_mode='incremental',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Automatic refresh

CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
      auto_refresh_mode='auto',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

DML operations

INSERT INTO

-- Use Hologres SQL to create a Paimon primary key table and insert data

CREATE EXTERNAL TABLE  <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- Set merge engine. Modify as needed.
"bucket" = '4' -- Example number of buckets
);

INSERT INTO <ext_database_name>.<ext_schema_name>.<ext_table_name> (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');

INSERT OVERWRITE

-- Use Hologres SQL to create a Paimon append-only table and insert data

CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate', 
"bucket-key" = 'id',
"bucket" = '2'
);

INSERT OVERWRITE <ext_database_name>.<ext_schema_name>.<ext_table_name>(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');

UPDATE

UPDATE <ext_database_name>.<ext_schema_name>.<ext_table_name> SET amount= 5 where id= 1;

DELETE

DELETE FROM <ext_database_name>.<ext_schema_name>.<ext_table_name> WHERE id = 2;