Access a Paimon catalog using DLF - Hologres - Alibaba Cloud Documentation Center

Hologres V3.2 and later lets you access Paimon catalog data using Data Lake Formation (DLF). This feature provides a unified interface and improves the efficiency of metadata management.

Background information

Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and data management. It offers features such as metadata management, permission management, and storage optimization. By unifying metadata, lake table formats, and data storage, DLF simplifies the Operations and Maintenance (O&M) involved in building and managing data lakes. This allows businesses to focus on innovation and data insights. For more information, see What is Data Lake Formation?.

Features

Hologres 3.2 and later supports the following features:
- Connect to a Paimon catalog in DLF using an external database. For more information, see EXTERNAL DATABASE.
  - You can use CREATE/DROP EXTERNAL SCHEMA to cascade the creation or deletion of databases in the Paimon catalog.
  - You can use CREATE/DROP EXTERNAL TABLE to cascade the creation or deletion of tables in the Paimon catalog.
  - You can view all databases and tables in the Paimon catalog.
- You can read data from Paimon tables in full or incrementally using dynamic tables. For more information, see CREATE DYNAMIC TABLE.
- You can collect statistics for external tables using ANALYZE and AUTO ANALYZE. This helps the optimizer generate optimal query plans. For more information, see ANALYZE and AUTO ANALYZE.
- You can mirror data from external tables.
- Time Travel: You can query historical snapshots of Paimon tables based on a timestamp, version, or tag.
- Branch query: You can query Paimon branch data and historical snapshots based on a branch.
- You can query Paimon system tables, such as the Snapshots and Tags tables.
Hologres 4.0 and later supports the following DML operations:
- You can perform concurrent writes to Paimon primary key tables using INSERT INTO or INSERT OVERWRITE. The write process is optimized to support shuffling by bucket before writing. This reduces the generation of small files.
- A memory-based caching mechanism is provided to support writes to Paimon tables that require caching. Examples include tables with the First Row Merge Engine, tables that generate a Changelog using the lookup mechanism, or tables with Deletion Vector enabled.
- You can use UPDATE and DELETE operations on Paimon tables.
- Starting from V3.2.6 and V4.0.6, you can read data from tags that correspond to expired Paimon snapshots.

Prerequisites

DLF V2.5 or later is activated, and a Paimon catalog is created. For more information, see Authorize and activate DLF and Data catalog.
Note
For information about the supported regions, see Endpoints.
You have purchased a Hologres instance and enabled the data lake acceleration service. For more information, see Accelerate access to OSS data using DLF.

Limits

You can access only DLF services in the same region as your Hologres instance.
You cannot use the OSS data lake acceleration feature in HoloWeb to visually create a DLF Paimon catalog.
Only Paimon external tables that are mapped using an external database support Time Travel queries, branch queries, and data mirroring.
Time Travel queries are not supported for tables on which schema evolution has occurred.
Dynamic tables do not support Time Travel queries for Paimon tables and branches.
Data mirroring for external tables does not support Paimon primary key tables.
Reading data from the default partition of a Paimon table is not supported. This includes partitions with empty or NULL values.
The data lake acceleration feature cannot be enabled for read-only secondary instances.
The TRUNCATE command cannot be executed on external tables. You can use INSERT OVERWRITE to insert empty data instead.
Writing data to Paimon dynamic bucket tables is not supported. These are tables created without specifying the bucket parameter or with bucket = -1.
The UPDATE operation is supported only for Paimon primary key tables with the deduplicate or partial-update merge engine. Updating the primary key is not supported.
The DELETE operation is supported only for Paimon primary key tables that use the deduplicate merge engine or the partial-update merge engine with the partial-update.remove-record-on-delete property enabled.
External tables do not support mixed DML transactions. The two-phase commit protocol is implemented only within a single DML operation to ensure the atomicity of distributed writes.

Map a Paimon catalog using an external database

Create an external database (SLR mode)

A service-linked role (SLR) is a RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to resolve authorization issues for cross-service access. In most cases, an SLR is automatically created or deleted when you use a specific feature. You do not need to manually create or delete it. SLRs help you configure the permissions that are required for an Alibaba Cloud service to run as expected and prevent risks that are caused by misoperations. For more information, see Service-linked roles.

Scenarios

The SLR mode is suitable for scenarios where you use your current identity to access data in an external database. By default, Hologres uses SLRs and identity pass-through to implement cross-service access.

Procedure

For newly purchased instances or instances upgraded to Hologres V2.2 or later, the system automatically creates an SLR. You can skip this step.
Note
For Hologres instances earlier than V2.2, go to the Quick Authorization for Access Control page to create an SLR.

Connect to the Hologres instance and create an external database.

CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH
  catalog_type 'paimon',
  metastore_type 'dlf-rest',
  dlf_catalog 'DLF_CATALOG_NAME',
  comment 'paimon catalog on dlf';

View the schemas and tables in the external database.

-- Query the schemas in the external database.
SELECT * FROM hologres.hg_external_schemas('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>'</span>);

-- Query the tables in a schema.
SELECT * FROM hologres.hg_external_tables ('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>'</span>, '<span class="var-span" contenteditable="true" data-var="EXT_SCHEMA_NAME"><ext_schema_name>'</span>);

Query the table data.

SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;

Create an external database (STS mode)

Alibaba Cloud Security Token Service (STS) is a service that manages temporary access permissions. RAM provides two types of identities: RAM users and RAM roles. RAM roles do not have permanent identity credentials. Instead, they obtain temporary identity credentials, called STS tokens, from STS. You can customize the validity period and access permissions of STS tokens. For more information, see What is STS?.

Scenarios

The STS mode is suitable for scenarios where you use cross-account access or a Hologres custom account (BASIC account) to access data in an external database.

Procedure

Log on to the RAM console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission to the role. For more information, see Create a RAM role and grant permissions to it.

Add a trust policy to allow the RAM role to be assumed by the trusted Alibaba Cloud service, Hologres. For more information, see Modify the trust policy of a RAM role.

{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "hologres.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}

Log on to the Data Lake Formation console. In the destination catalog, grant the RAM role the permissions to access the database and tables.

In Hologres, create an external database and specify the preceding RAM role in the authentication policy.

CREATE EXTERNAL DATABASE <ext_database_name> WITH
  catalog_type 'paimon'
  metastore_type 'dlf-rest'
  dlf_catalog 'paimon_catalog' 
  rolearn 'acs:ram::106380604****:role/***-ramrole'
  comment 'paimon catalog on dlf'
  ;

In Hologres, create a user mapping to associate the RAM user or BASIC account that needs to access the DLF catalog with the RAM role.
```
CREATE USER MAPPING FOR "<RAM user | basic account>"
EXTERNAL DATABASE EXT_DATABASE_NAME
OPTIONS
(
   rolearn 'acs:ram::10638060***:role/***ramrole'
);
```
Note
- For more information about user mappings, see CREATE USER MAPPING.
- For more information about how to create a RAM user, see Create a RAM user.
- For more information about how to create a BASIC account, see User management.
Log on to the Hologres management console as a RAM user or reconnect to the Hologres instance as a BASIC account. Then, execute the following SQL query statement.
```
SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;
```

For operations related to external databases, see EXTERNAL DATABASE.
For operations related to external schemas, see EXTERNAL SCHEMA.
For operations related to external tables, see EXTERNAL TABLE.

Collect statistics for external tables

The ANALYZE and AUTO ANALYZE commands collect statistics about the content of tables in a database. The optimizer uses these statistics to generate optimal query plans and improve query efficiency. For more information, see ANALYZE and AUTO ANALYZE.

-- Update the statistics of a table. By default, the statistics of all columns in the table are collected.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>;

-- Update the statistics of specific columns. More data is sampled than when you update the statistics of the entire table, which makes the statistics more accurate. This is mainly used to update conditional columns.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>(<colname>, <colname>);

-- Enable Auto Analyze.
ALTER EXTERNAL DATABASE <ext_database_name> WITH enable_auto_analyze 'true';

Time Travel query

Hologres V3.2 and later supports Time Travel. You can query historical snapshots of Paimon tables based on a timestamp, version, or tag. You can also query the Paimon Snapshots and Tags system tables in Hologres to obtain the snapshot details of a destination table. For more information about system tables, see Paimon System Tables.

Query the Paimon Snapshots system table

SELECT * FROM hologres.hg_list_snapshots('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

The following table describes the mapping between the fields in the Snapshots system table and the fields in the Paimon system table.

Snapshots system table field	Type	Description	Corresponding Paimon system table field
branch_name	TEXT	The branch name.	branch_name
snapshot_id	TEXT	The snapshot ID.	snapshot_id
schema_id	TEXT	The schema ID of the table.	schema_id
commit_kind	TEXT	The commit type of the table data.	commit_kind
commit_time	TIMESTAMPTZ	The commit time.	commit_time
extend_info	TEXT (JSON)	The remaining properties of the Paimon Snapshots system table.	The remaining properties of the Paimon Snapshots system table

You can also use the hologres.hg_list_versions function to query the Paimon Snapshots system table. For Paimon table queries, a version corresponds to a snapshot ID.

SELECT * FROM hologres.hg_list_versions('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');

Query the Paimon Tag system table

SELECT * FROM hologres.hg_list_tags('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span>);

The following table describes the mapping between the fields in the Tag system table and the fields in the Paimon system table.

Tag system table field	Type	Description	Corresponding Paimon system table field
branch_name	TEXT	The branch name.	branch_name
tag_name	TEXT	The tag name.	tag_name
snapshot_id	TEXT	The snapshot ID.	snapshot_id
schema_id	TEXT	The schema ID of the table.	schema_id
commit_time	TIMESTAMPTZ	The commit time.	commit_time
extend_info	TEXT (JSON)	The remaining properties of the Paimon Tag system table.	The remaining properties of the Paimon Tag system table

Query a historical Paimon snapshot based on a timestamp

SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TIMESTAMP AS OF 'TIMESTAMP';

Query a historical Paimon snapshot based on a version

For Paimon tables, a version corresponds to a snapshot ID.

SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR VERSION AS OF '<span class="var-span" contenteditable="true" data-var="VERSION"><version>'</span>;

Query a historical Paimon snapshot based on a tag

SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TAG AS OF 'TAG';

Branch query

Hologres V3.2 and later supports querying data from a specified branch of a Paimon table and supports the branch fallback property. For more information about Paimon branches, see Manage Branch.

Query the Branch system table

SELECT * FROM hologres.hg_list_branches('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span>);

The following table describes the mapping between the fields in the Branch system table and the fields in the Paimon system table.

Field	Type	Description	Corresponding Paimon system table field
branch_name	TEXT	The branch name.	branch_name
create_time	TIMESTAMPTZ	The time when the branch was created.	create_time
extend_info	TEXT (JSON)	The remaining properties of the Paimon Branch system table.	The remaining properties of the Paimon Branch system table

Query data from a specified branch

SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR branch AS OF 'TAG';

Note

If the scan.fallback-branch property is set for a Paimon table and a partition does not exist when a query job reads data from the current branch, the reader attempts to read the partition data from the fallback branch. For more information about fallback branches, see Manage Branch.

Query historical snapshot data from a specified branch

-- Query based on a timestamp.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME"><branch_name>'</span> TIMESTAMP AS OF '<span class="var-span" contenteditable="true" data-var="TIMESTAMP"><timestamp>'</span>;

-- Query based on a tag.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME"><branch_name>'</span> TAG AS OF '<span class="var-span" contenteditable="true" data-var="TAG_NAME"><tag_name>'</span>;

Lakehouse table image acceleration

Hologres V3.2 and later provides the data lake table mirroring feature to accelerate queries on data in data lakes. This feature supports metadata and data mirroring and can synchronize changes from external data sources to Hologres in near-real time or at specified intervals. It also supports full table mirroring and partial mirroring of partitioned tables. You can use parameters to flexibly control the scope of mirrored data. For more information, see Data lake table mirroring.

Access Paimon tables using foreign tables

Create a foreign server.

For more information about operations on foreign servers, see More operations on foreign servers (SQL).

DLF syntax

Note

Applicable to DLF V2.5 and later.

The SQL syntax is as follows:

-- create foreign server
CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
    catalog_type 'paimon',
    metastore_type 'dlf-rest', 
    dlf_catalog 'DLF_CATALOG_NAME'
);

DLF 1.0 syntax

Note

Applicable to DLF V1.0.

The SQL syntax is as follows:

-- create foreign server
CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
  dlf_region '',
  dlf_endpoint 'dlf-share..aliyuncs.com',
  oss_endpoint 'oss-REGION_ID-internal.aliyuncs.com',
  dlf_catalog 'dlf_catalog_name'                 
);

Create a foreign table.

Use IMPORT FOREIGN SCHEMA

For more information about how to use IMPORT FOREIGN SCHEMA, see IMPORT FOREIGN SCHEMA.

Create a foreign table using IMPORT FOREIGN SCHEMA

-- Create a foreign table using IMPORT FOREIGN SCHEMA.
IMPORT FOREIGN SCHEMA <dlf_db_name>
FROM SERVER <server_name> 
INTO <holo_schema_name> 
options (if_table_exist 'update');

Create a foreign table by specifying a table name

IMPORT FOREIGN SCHEMA <dlf_db_name>
limit to (table_name[,....]) 
FROM SERVER <server_name> 
into <holo_schema_name>
options (if_table_exist 'update');

CREATE FOREIGN TABLE statement

For more information about how to use CREATE FOREIGN TABLE, see CREATE FOREIGN TABLE.

CREATE FOREIGN TABLE FOREIGN_TABLE_NAME
(
  column_name data_type
  [, ... ]
) 
SERVER SERVER_NAME
options
(
  schema_name 'DLF_DB_NAME',
  table_name 'DLF_TABLE_NAME'
);

Access Paimon tables using dynamic tables

A dynamic table can automatically process and store the data aggregation results of one or more base tables. It has built-in data refresh policies. You can set different data refresh policies as needed to implement automatic data flow from base tables to the dynamic table. This meets business requirements for unified development, automatic data flow, and processing timeliness. Hologres V3.0 and later lets you use MaxCompute external tables and DLF external tables as the base tables of a dynamic table. For more information, see Dynamic tables.

Full refresh

CREATE DYNAMIC TABLE paimon_dt_table
WITH (
      auto_refresh_mode='full',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Incremental refresh

CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
      auto_refresh_mode='incremental',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

Automatic refresh

CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
      auto_refresh_mode='auto',
      freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;

DML operations

INSERT INTO

-- Use Hologres SQL to create a Paimon primary key table and insert data.

CREATE EXTERNAL TABLE  <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- Example merge engine. Modify as needed.
"bucket" = '4' -- Example number of buckets.
);

INSERT INTO <ext_database_name>.<ext_schema_name>.<ext_table_name> (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');

INSERT OVERWRITE

-- Use Hologres SQL to create a Paimon append-only table and insert data.

CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate', 
"bucket-key" = 'id',
"bucket" = '2'
);

INSERT OVERWRITE <ext_database_name>.<ext_schema_name>.<ext_table_name>(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');

UPDATE

UPDATE <ext_database_name>.<ext_schema_name>.<ext_table_name> SET amount= 5 where id= 1;

DELETE

DELETE FROM <ext_database_name>.<ext_schema_name>.<ext_table_name> WHERE id = 2;