Hologres V3.2 and later versions support accessing Paimon data using a DLF Catalog. This feature provides a unified interface and improves the efficiency of metadata management.
Background information
Alibaba Cloud Data Lake Formation (DLF) is a fully managed and unified platform for metadata, data storage, and data management. It provides features such as metadata management, permission management, and storage optimization. This platform unifies metadata, lake table formats, and data storage, which significantly simplifies O&M tasks during data lake construction and management. This allows enterprises to focus on business innovation and data insights. For more information, see What Is Data Lake Formation?.
Feature Details
-
Hologres V3.2 and later versions support the following features:
-
Connect to a DLF Catalog using an External Database. For more information, see EXTERNAL DATABASE.
-
Create or delete databases in a DLF Catalog using Create/Drop External Schema.
-
Create or delete tables in a DLF Catalog using Create/Drop External Table.
-
View all databases and tables in a DLF Catalog.
-
-
Use a Dynamic Table to read Paimon table data in full and incremental modes. For more information, see CREATE DYNAMIC TABLE.
-
Use ANALYZE and AUTO ANALYZE to collect statistics information for foreign tables. This helps the optimizer generate the optimal query plan. For more information, see ANALYZE and AUTO ANALYZE.
-
External table data mirroring.
-
TimeTravel: Query historical snapshots of Paimon tables based on a timestamp, version, or tag.
-
Branch query: Query Paimon branch data and historical snapshots based on a branch.
-
Query Paimon system tables, such as Snapshots and Tags.
-
-
Hologres V4.0 and later versions support various DML operations:
-
Write to Paimon primary key tables with multiple concurrent operations. This feature supports both
INSERT INTOandINSERT OVERWRITE. The write process is optimized to shuffle data by bucket before writing, which reduces the generation of small files. -
Provide a memory-based caching mechanism to support writing to Paimon tables that require caching. Examples include tables where the
Merge EngineisFirst Row, tables that generate aChangelogthrough alookupmechanism, or tables withDeletion Vectorenabled. -
Support
UPDATEandDELETEoperations on Paimon tables. -
Hologres V3.2.6 and V4.0.6 and later versions support reading data from tags that correspond to expired Paimon snapshots.
-
Prerequisites
-
You have activated DLF service V2.5 or later and created a DLF Catalog. For more information, see Authorize and activate DLF and Data catalog.
NoteFor more information, see Service Endpoints.
-
You have purchased a Hologres instance and enabled the data lake acceleration service. For more information, see Hologres instance and Data Lake Acceleration for OSS Access Using DLF.
Limits
-
You can access only DLF services in the same region.
-
You cannot visually create DLF Catalogs using the OSS data lake acceleration feature in HoloWeb.
-
Only Paimon foreign tables that are mapped using an External Database support TimeTravel queries, branch queries, and data mirroring.
-
TimeTravel queries on tables with schema evolution are not supported.
-
Dynamic Table does not support TimeTravel queries for Paimon tables and branches.
-
External table mirroring does not support Paimon primary key tables.
-
Reading default partitioned data from Paimon, such as partitions with empty or NULL values, is not supported.
-
Read-only replica instances do not support data lake acceleration.
-
You cannot execute the
TRUNCATEcommand on foreign tables. Instead, you can useINSERT OVERWRITEto insert empty data. -
Writing to Paimon dynamic bucketing tables is not supported. These are Paimon tables that are created without specifying the
bucketparameter or withbucket = -1. -
The
UPDATEoperation is supported only on Paimon primary key tables where the Merge Engine isdeduplicateorpartial-update. Updating primary keys is not supported. -
Only supports
DELETEoperations on Paimon primary key tables where the merge engine is set todeduplicateorpartial-update(requires enabling thepartial-update.remove-record-on-deleteproperty). -
Foreign tables do not support mixed DML transactions. Only single DML operations implement a two-phase commit protocol to ensure atomicity during distributed writes.
Map DLF Catalog Using External Database
Create an External Database (SLR Mode)
A service-linked role (SLR) is a type of RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to manage authorized access across different Alibaba Cloud services. In most cases, when you use a specific feature, the associated Alibaba Cloud service automatically creates or deletes the SLR. You do not need to create or delete the role manually. A service-linked role simplifies the process of authorizing a service to access other services and reduces the risks of misoperation. For more information, see Service-linked Role.
Scenarios
The SLR mode applies to scenarios where you access External Database data using your current user identity. Hologres uses an SLR and identity pass-through by default to achieve cross-cloud product access.
Procedure
-
For instances that are newly purchased or upgraded to Hologres V2.2 or later, the system automatically creates a service-linked role. You do not need to create it manually. You can skip this step.
NoteFor Hologres instances that run versions earlier than V2.2, you must use the Resource Access Management Quick Authorization page to create a service-linked role.
-
Connect to the Hologres instance and create an External Database.
CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH catalog_type 'paimon', metastore_type 'dlf-rest', dlf_catalog 'DLF_CATALOG_NAME', comment 'dlf catalog'; -
View the schemas and tables in the External Database.
-- Query schemas in the External Database SELECT * FROM hologres.hg_external_schemas('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME'</span>); -- Query tables in the schema SELECT * FROM hologres.hg_external_tables ('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME'</span>, '<span class="var-span" contenteditable="true" data-var="EXT_SCHEMA_NAME">EXT_SCHEMA_NAME'</span>); -
Query table data.
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
Create an External Database (STS Mode)
Alibaba Cloud Security Token Service (STS) is a service that you can use to manage temporary credentials. Resource Access Management (RAM) provides two types of identities: RAM users and RAM roles. RAM roles do not have permanent identity credentials. Instead, they can obtain temporary identity credentials from STS that have customizable validity periods and access permissions. These credentials are known as Security Token Service (STS) tokens. For more information, see What is STS.
Scenarios
The STS mode applies to scenarios that involve cross-account access and scenarios where Hologres custom accounts (BASIC accounts) access External Database data.
Procedure
-
Log on to the Resource Access Management (RAM) console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission for DLF access. For more information, see Create a RAM Role and Grant Permissions.
-
Update the trust policy to allow the RAM role to be assumed by the trusted Alibaba Cloud service Hologres. For more information, see Modify the trust policy of a RAM role.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "hologres.aliyuncs.com" ] } } ], "Version": "1" } -
Log on to the Data Lake Formation console and grant access permissions for databases and tables to the RAM role in the target Catalog.
-
Create an External Database in Hologres. Specify the previously created RAM role as the authentication policy.
CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH catalog_type 'paimon' metastore_type 'dlf-rest' dlf_catalog 'paimon_catalog' rolearn 'acs:ram::106380604****:role/***-ramrole' comment 'dlf catalog' ; -
Create a USER MAPPING in Hologres to associate the RAM user or BASIC account that needs to access the DLF Catalog with the RAM role.
CREATE USER MAPPING FOR "<RAM user | BASIC account>" EXTERNAL DATABASE EXT_DATABASE_NAME OPTIONS ( rolearn 'acs:ram::10638060***:role/***ramrole' );Note-
For more information about USER MAPPING, see CREATE USER MAPPING.
-
For more information about how to create a RAM user, see Create a RAM user.
-
For more information about how to create a BASIC account, see User Management.
-
-
After you log on to the Hologres console using a RAM user or re-connecting to a Hologres instance using a BASIC account, you can execute the following SQL query.
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
-
For more information about operations related to an External Database, see EXTERNAL DATABASE.
-
For more information about operations related to an External Schema, see EXTERNAL SCHEMA.
-
For more information about External Table operations, see EXTERNAL TABLE.
Collect external table statistics
You can use the ANALYZE and AUTO ANALYZE commands to collect statistics about table content in your database. The optimizer uses these statistics to generate optimal query plans and improve query performance. For more information, see ANALYZE and AUTO ANALYZE.
-- Update statistics for a table. By default, this command collects statistics for all columns in the table.
analyze EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
-- Update statistics for specific columns. This command samples more data than a full-table update and provides more accurate results. Use this command mainly for columns that appear in filter conditions.
analyze EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME(COLNAME, COLNAME);
-- Enable Auto Analyze
ALTER EXTERNAL DATABASE EXT_DATABASE_NAME WITH enable_auto_analyze 'true';
TimeTravel Query
Starting from V3.2, Hologres supports historical snapshot queries on Paimon tables based on a timestamp, version, or tag, which enables time travel for data. Hologres also supports querying the Paimon Snapshots and Tags system tables to retrieve snapshot details for the target table. For more information about system tables, see Paimon System Tables.
Query Paimon Snapshots System Table
SELECT * FROM hologres.hg_list_snapshots('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');
The fields in the Snapshots system table correspond to the Paimon system table fields as follows:
|
Snapshots System Table Field |
Type |
Description |
Corresponding Paimon System Table Field |
|
branch_name |
TEXT |
Branch name. |
branch_name |
|
snapshot_id |
TEXT |
Snapshot ID. |
snapshot_id |
|
schema_id |
TEXT |
Table schema ID. |
schema_id |
|
commit_kind |
TEXT |
Table data commit type. |
commit_kind |
|
commit_time |
TIMESTAMPTZ |
Commit time. |
commit_time |
|
extend_info |
TEXT(JSON) |
Remaining properties of the Paimon Snapshots system table. |
Remaining properties of the Paimon Snapshots system table |
You can also query the Paimon Snapshots system table using the hologres.hg_list_versions function. For Paimon table queries, Version corresponds to Snapshot ID.
SELECT * FROM hologres.hg_list_versions('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');
Query Paimon Tag System Table
SELECT * FROM hologres.hg_list_tags('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');
The fields in the Tag system table correspond to the Paimon system table fields as follows:
|
Tag System Table Field |
Type |
Description |
Corresponding Paimon System Table Field |
|
branch_name |
TEXT |
Branch name. |
branch_name |
|
tag_name |
TEXT |
Tag name. |
tag_name |
|
snapshot_id |
TEXT |
Snapshot ID. |
snapshot_id |
|
schema_id |
TEXT |
Table schema ID. |
schema_id |
|
commit_time |
TIMESTAMPTZ |
Submission time |
commit_time |
|
extend_info |
TEXT(JSON) |
Remaining properties of the Paimon Tag system table. |
Remaining properties of the Paimon Tag system table |
Query Paimon Historical Snapshots Based on Timestamp
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TIMESTAMP AS OF 'TIMESTAMP';
Query Paimon Historical Snapshots Based on Version
For Paimon tables, Version corresponds to Snapshot ID.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR VERSION AS OF '<span class="var-span" contenteditable="true" data-var="VERSION">VERSION'</span>;
Query Paimon Historical Snapshots Based on Tag
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR TAG AS OF '<span class="var-span" contenteditable="true" data-var="TAG">TAG'</span>;
Branch Query
Starting from V3.2, Hologres supports querying data from a specified branch of a Paimon table and supports the branch fallback property. For more information about Paimon branches, see Manage Branch.
Query Branch System Table
SELECT * FROM hologres.hg_list_branches('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');
The fields in the Branch system table correspond to the Paimon system table fields as follows:
|
Field |
Type |
Description |
Corresponding Paimon System Table Field |
|
branch_name |
TEXT |
Branch name. |
branch_name |
|
create_time |
TIMESTAMPTZ |
Branch creation time. |
create_time |
|
extend_info |
TEXT(JSON) |
Remaining properties of the Paimon Branch system table. |
Remaining properties of the Paimon Branch system table |
Query Specified Branch Data
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="TAG">TAG'</span>;
If the scan.fallback-branch property is set on a Paimon table, when a query job reads data from the current branch and a partition is missing, the reader attempts to read that partition's data from the fallback branch. For more information about fallback branches, see Manage Branch.
Query Historical Snapshots of a Specified Branch
-- Query based on Timestamp
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME">BRANCH_NAME'</span> TIMESTAMP AS OF '<span class="var-span" contenteditable="true" data-var="TIMESTAMP">TIMESTAMP'</span>;
-- Query based on Tag
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME">EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME">BRANCH_NAME'</span> TAG AS '<span class="var-span" contenteditable="true" data-var="TAG_NAME">TAG_NAME'</span>;
Lake Table Mirroring Acceleration
Starting from V3.2, Hologres supports the lake table image feature to accelerate data lake queries. This feature supports metadata and data images and synchronizes changes from external data sources to Hologres in near-real-time or at specified intervals. It also supports full table images and partial images of partitioned tables. You can use parameters to flexibly control the scope of the image data. For more information, see Lake Table Image.
Consume Paimon Tables Using Foreign Table
-
Create a Foreign Server.
For more information about Foreign Server operations, see More Foreign Server Operations (SQL Method).
DLF syntax
NoteThis syntax applies to DLF service V2.5 and later.
The SQL syntax is as follows:
-- Create a foreign server. CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS ( catalog_type 'paimon', metastore_type 'dlf-rest', dlf_catalog '<span class="var-span" contenteditable="true" data-var="DLF_CATALOG_NAME">DLF_CATALOG_NAME</span>' );DLF 1.0 syntax
NoteThis syntax applies to DLF service V1.0.
The SQL syntax is as follows:
-- Create a foreign server CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS ( dlf_region '', dlf_endpoint 'dlf-share..aliyuncs.com', oss_endpoint 'oss-REGION_ID-internal.aliyuncs.com', dlf_catalog 'dlf_catalog_name' ); -
Create a foreign table.
-
Use the IMPORT FOREIGN SCHEMA method
For more information about how to use IMPORT FOREIGN SCHEMA, see IMPORT FOREIGN SCHEMA.
-
Create a foreign table using IMPORT FOREIGN SCHEMA
-- Create a foreign table using IMPORT FOREIGN SCHEMA IMPORT FOREIGN SCHEMA DLF_DB_NAME FROM SERVER SERVER_NAME INTO <holo+schema_name> options (if_table_exist 'update'); -
Create a foreign table by specifying table names
IMPORT FOREIGN SCHEMA DLF_DB_NAME limit to (table_name[,....]) FROM SERVER SERVER_NAME into <holo+schema_name> options (if_table_exist 'update');
-
-
Use the CREATE FOREIGN TABLE method
For more information, see CREATE FOREIGN TABLE.
CREATE FOREIGN TABLE FOREIGN_TABLE_NAME ( { column_name data_type } [, ... ] ) SERVER SERVER_NAME options ( schema_name 'DLF_DB_NAME', table_name 'DLF_TABLE_NAME' );
-
Consume Paimon Tables Using Dynamic Table
A Dynamic Table can automatically process and store aggregated data from one or more base tables and includes built-in data refresh strategies. You can configure different data refresh strategies based on your business needs to enable automatic data flow from base tables to the Dynamic Table to meet requirements such as unified development, automatic data flow, and timely processing. Starting with Hologres V3.0, MaxCompute foreign tables and DLF foreign tables are supported as base tables for a Dynamic Table. For more information, see Dynamic Table.
Full Refresh
CREATE DYNAMIC TABLE paimon_dt_table
WITH (
auto_refresh_mode='full',
freshness='3 minutes'
)AS SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
Incremental Refresh
CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
auto_refresh_mode='incremental',
freshness='3 minutes'
)AS SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
Auto-Refresh
CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
auto_refresh_mode='auto',
freshness='3 minutes'
)AS * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
DML Operations
INSERT INTO
-- Create a Paimon primary key table and insert data using Hologres SQL
CREATE EXTERNAL TABLE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- This is an example merge policy. You can modify it as needed.
"bucket" = '4' -- This is an example number of buckets.
);
INSERT INTO EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');
INSERT OVERWRITE
-- Create a Paimon append-only table and insert data using Hologres SQL
CREATE EXTERNAL TABLE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate',
"bucket-key" = 'id',
"bucket" = '2'
);
INSERT OVERWRITE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');
UPDATE
UPDATE EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME SET amount= 5 where id= 1;
DELETE
DELETE FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME WHERE id = 2;