Hologres V3.2 and later lets you access Paimon catalog data using Data Lake Formation (DLF). This feature provides a unified interface and improves the efficiency of metadata management.
Background information
Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and data management. It offers features such as metadata management, permission management, and storage optimization. By unifying metadata, lake table formats, and data storage, DLF simplifies the Operations and Maintenance (O&M) involved in building and managing data lakes. This allows businesses to focus on innovation and data insights. For more information, see What is Data Lake Formation?.
Features
Hologres
3.2and later supports the following features:Connect to a Paimon catalog in DLF using an external database. For more information, see EXTERNAL DATABASE.
You can use CREATE/DROP EXTERNAL SCHEMA to cascade the creation or deletion of databases in the Paimon catalog.
You can use CREATE/DROP EXTERNAL TABLE to cascade the creation or deletion of tables in the Paimon catalog.
You can view all databases and tables in the Paimon catalog.
You can read data from Paimon tables in full or incrementally using dynamic tables. For more information, see CREATE DYNAMIC TABLE.
You can collect statistics for external tables using ANALYZE and AUTO ANALYZE. This helps the optimizer generate optimal query plans. For more information, see ANALYZE and AUTO ANALYZE.
You can mirror data from external tables.
Time Travel: You can query historical snapshots of Paimon tables based on a timestamp, version, or tag.
Branch query: You can query Paimon branch data and historical snapshots based on a branch.
You can query Paimon system tables, such as the Snapshots and Tags tables.
Hologres
4.0and later supports the following DML operations:You can perform concurrent writes to Paimon primary key tables using
INSERT INTOorINSERT OVERWRITE. The write process is optimized to support shuffling by bucket before writing. This reduces the generation of small files.A memory-based caching mechanism is provided to support writes to Paimon tables that require caching. Examples include tables with the
First RowMerge Engine, tables that generate aChangelogusing thelookupmechanism, or tables withDeletion Vectorenabled.You can use
UPDATEandDELETEoperations on Paimon tables.Starting from V3.2.6 and V4.0.6, you can read data from tags that correspond to expired Paimon snapshots.
Prerequisites
DLF V2.5 or later is activated, and a Paimon catalog is created. For more information, see Authorize and activate DLF and Data catalog.
NoteFor information about the supported regions, see Endpoints.
You have purchased a Hologres instance and enabled the data lake acceleration service. For more information, see Accelerate access to OSS data using DLF.
Limits
You can access only DLF services in the same region as your Hologres instance.
You cannot use the OSS data lake acceleration feature in HoloWeb to visually create a DLF Paimon catalog.
Only Paimon external tables that are mapped using an external database support Time Travel queries, branch queries, and data mirroring.
Time Travel queries are not supported for tables on which schema evolution has occurred.
Dynamic tables do not support Time Travel queries for Paimon tables and branches.
Data mirroring for external tables does not support Paimon primary key tables.
Reading data from the default partition of a Paimon table is not supported. This includes partitions with empty or NULL values.
The data lake acceleration feature cannot be enabled for read-only secondary instances.
The
TRUNCATEcommand cannot be executed on external tables. You can useINSERT OVERWRITEto insert empty data instead.Writing data to Paimon dynamic bucket tables is not supported. These are tables created without specifying the
bucketparameter or withbucket = -1.The
UPDATEoperation is supported only for Paimon primary key tables with thededuplicateorpartial-updatemerge engine. Updating the primary key is not supported.The
DELETEoperation is supported only for Paimon primary key tables that use thededuplicatemerge engine or thepartial-updatemerge engine with thepartial-update.remove-record-on-deleteproperty enabled.External tables do not support mixed DML transactions. The two-phase commit protocol is implemented only within a single DML operation to ensure the atomicity of distributed writes.
Map a Paimon catalog using an external database
Create an external database (SLR mode)
A service-linked role (SLR) is a RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to resolve authorization issues for cross-service access. In most cases, an SLR is automatically created or deleted when you use a specific feature. You do not need to manually create or delete it. SLRs help you configure the permissions that are required for an Alibaba Cloud service to run as expected and prevent risks that are caused by misoperations. For more information, see Service-linked roles.
Scenarios
The SLR mode is suitable for scenarios where you use your current identity to access data in an external database. By default, Hologres uses SLRs and identity pass-through to implement cross-service access.
Procedure
For newly purchased instances or instances upgraded to Hologres V2.2 or later, the system automatically creates an SLR. You can skip this step.
NoteFor Hologres instances earlier than V2.2, go to the Quick Authorization for Access Control page to create an SLR.
Connect to the Hologres instance and create an external database.
CREATE EXTERNAL DATABASE EXT_DATABASE_NAME WITH catalog_type 'paimon', metastore_type 'dlf-rest', dlf_catalog 'DLF_CATALOG_NAME', comment 'paimon catalog on dlf';View the schemas and tables in the external database.
-- Query the schemas in the external database. SELECT * FROM hologres.hg_external_schemas('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>'</span>); -- Query the tables in a schema. SELECT * FROM hologres.hg_external_tables ('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>'</span>, '<span class="var-span" contenteditable="true" data-var="EXT_SCHEMA_NAME"><ext_schema_name>'</span>);Query the table data.
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME;
Create an external database (STS mode)
Alibaba Cloud Security Token Service (STS) is a service that manages temporary access permissions. RAM provides two types of identities: RAM users and RAM roles. RAM roles do not have permanent identity credentials. Instead, they obtain temporary identity credentials, called STS tokens, from STS. You can customize the validity period and access permissions of STS tokens. For more information, see What is STS?.
Scenarios
The STS mode is suitable for scenarios where you use cross-account access or a Hologres custom account (BASIC account) to access data in an external database.
Procedure
Log on to the RAM console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission to the role. For more information, see Create a RAM role and grant permissions to it.
Add a trust policy to allow the RAM role to be assumed by the trusted Alibaba Cloud service, Hologres. For more information, see Modify the trust policy of a RAM role.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "hologres.aliyuncs.com" ] } } ], "Version": "1" }Log on to the Data Lake Formation console. In the destination catalog, grant the RAM role the permissions to access the database and tables.
In Hologres, create an external database and specify the preceding RAM role in the authentication policy.
CREATE EXTERNAL DATABASE <ext_database_name> WITH catalog_type 'paimon' metastore_type 'dlf-rest' dlf_catalog 'paimon_catalog' rolearn 'acs:ram::106380604****:role/***-ramrole' comment 'paimon catalog on dlf' ;In Hologres, create a user mapping to associate the RAM user or BASIC account that needs to access the DLF catalog with the RAM role.
CREATE USER MAPPING FOR "<RAM user | basic account>" EXTERNAL DATABASE EXT_DATABASE_NAME OPTIONS ( rolearn 'acs:ram::10638060***:role/***ramrole' );NoteFor more information about user mappings, see CREATE USER MAPPING.
For more information about how to create a RAM user, see Create a RAM user.
For more information about how to create a BASIC account, see User management.
Log on to the Hologres management console as a RAM user or reconnect to the Hologres instance as a BASIC account. Then, execute the following SQL query statement.
SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;
For operations related to external databases, see EXTERNAL DATABASE.
For operations related to external schemas, see EXTERNAL SCHEMA.
For operations related to external tables, see EXTERNAL TABLE.
Collect statistics for external tables
The ANALYZE and AUTO ANALYZE commands collect statistics about the content of tables in a database. The optimizer uses these statistics to generate optimal query plans and improve query efficiency. For more information, see ANALYZE and AUTO ANALYZE.
-- Update the statistics of a table. By default, the statistics of all columns in the table are collected.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>;
-- Update the statistics of specific columns. More data is sampled than when you update the statistics of the entire table, which makes the statistics more accurate. This is mainly used to update conditional columns.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>(<colname>, <colname>);
-- Enable Auto Analyze.
ALTER EXTERNAL DATABASE <ext_database_name> WITH enable_auto_analyze 'true';Time Travel query
Hologres V3.2 and later supports Time Travel. You can query historical snapshots of Paimon tables based on a timestamp, version, or tag. You can also query the Paimon Snapshots and Tags system tables in Hologres to obtain the snapshot details of a destination table. For more information about system tables, see Paimon System Tables.
Query the Paimon Snapshots system table
SELECT * FROM hologres.hg_list_snapshots('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');The following table describes the mapping between the fields in the Snapshots system table and the fields in the Paimon system table.
Snapshots system table field | Type | Description | Corresponding Paimon system table field |
branch_name | TEXT | The branch name. | branch_name |
snapshot_id | TEXT | The snapshot ID. | snapshot_id |
schema_id | TEXT | The schema ID of the table. | schema_id |
commit_kind | TEXT | The commit type of the table data. | commit_kind |
commit_time | TIMESTAMPTZ | The commit time. | commit_time |
extend_info | TEXT (JSON) | The remaining properties of the Paimon Snapshots system table. | The remaining properties of the Paimon Snapshots system table |
You can also use the hologres.hg_list_versions function to query the Paimon Snapshots system table. For Paimon table queries, a version corresponds to a snapshot ID.
SELECT * FROM hologres.hg_list_versions('EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME');Query the Paimon Tag system table
SELECT * FROM hologres.hg_list_tags('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span>);The following table describes the mapping between the fields in the Tag system table and the fields in the Paimon system table.
Tag system table field | Type | Description | Corresponding Paimon system table field |
branch_name | TEXT | The branch name. | branch_name |
tag_name | TEXT | The tag name. | tag_name |
snapshot_id | TEXT | The snapshot ID. | snapshot_id |
schema_id | TEXT | The schema ID of the table. | schema_id |
commit_time | TIMESTAMPTZ | The commit time. | commit_time |
extend_info | TEXT (JSON) | The remaining properties of the Paimon Tag system table. | The remaining properties of the Paimon Tag system table |
Query a historical Paimon snapshot based on a timestamp
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TIMESTAMP AS OF 'TIMESTAMP';Query a historical Paimon snapshot based on a version
For Paimon tables, a version corresponds to a snapshot ID.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR VERSION AS OF '<span class="var-span" contenteditable="true" data-var="VERSION"><version>'</span>;Query a historical Paimon snapshot based on a tag
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR TAG AS OF 'TAG';Branch query
Hologres V3.2 and later supports querying data from a specified branch of a Paimon table and supports the branch fallback property. For more information about Paimon branches, see Manage Branch.
Query the Branch system table
SELECT * FROM hologres.hg_list_branches('<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span>);The following table describes the mapping between the fields in the Branch system table and the fields in the Paimon system table.
Field | Type | Description | Corresponding Paimon system table field |
branch_name | TEXT | The branch name. | branch_name |
create_time | TIMESTAMPTZ | The time when the branch was created. | create_time |
extend_info | TEXT (JSON) | The remaining properties of the Paimon Branch system table. | The remaining properties of the Paimon Branch system table |
Query data from a specified branch
SELECT * FROM EXT_DATABASE_NAME.EXT_SCHEMA_NAME.EXT_TABLE_NAME FOR branch AS OF 'TAG';If the scan.fallback-branch property is set for a Paimon table and a partition does not exist when a query job reads data from the current branch, the reader attempts to read the partition data from the fallback branch. For more information about fallback branches, see Manage Branch.
Query historical snapshot data from a specified branch
-- Query based on a timestamp.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME"><branch_name>'</span> TIMESTAMP AS OF '<span class="var-span" contenteditable="true" data-var="TIMESTAMP"><timestamp>'</span>;
-- Query based on a tag.
SELECT * FROM '<span class="var-span" contenteditable="true" data-var="EXT_DATABASE_NAME"><ext_database_name>.<ext_schema_name>.<ext_table_name>'</span> FOR branch AS OF '<span class="var-span" contenteditable="true" data-var="BRANCH_NAME"><branch_name>'</span> TAG AS OF '<span class="var-span" contenteditable="true" data-var="TAG_NAME"><tag_name>'</span>;Lakehouse table image acceleration
Hologres V3.2 and later provides the data lake table mirroring feature to accelerate queries on data in data lakes. This feature supports metadata and data mirroring and can synchronize changes from external data sources to Hologres in near-real time or at specified intervals. It also supports full table mirroring and partial mirroring of partitioned tables. You can use parameters to flexibly control the scope of mirrored data. For more information, see Data lake table mirroring.
Access Paimon tables using foreign tables
Create a foreign server.
For more information about operations on foreign servers, see More operations on foreign servers (SQL).
DLF syntax
NoteApplicable to DLF V2.5 and later.
The SQL syntax is as follows:
-- create foreign server CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS ( catalog_type 'paimon', metastore_type 'dlf-rest', dlf_catalog 'DLF_CATALOG_NAME' );DLF 1.0 syntax
NoteApplicable to DLF V1.0.
The SQL syntax is as follows:
-- create foreign server CREATE SERVER IF NOT EXISTS SERVER_NAME FOREIGN DATA WRAPPER dlf_fdw OPTIONS ( dlf_region '', dlf_endpoint 'dlf-share..aliyuncs.com', oss_endpoint 'oss-REGION_ID-internal.aliyuncs.com', dlf_catalog 'dlf_catalog_name' );Create a foreign table.
Use IMPORT FOREIGN SCHEMA
For more information about how to use IMPORT FOREIGN SCHEMA, see IMPORT FOREIGN SCHEMA.
Create a foreign table using IMPORT FOREIGN SCHEMA
-- Create a foreign table using IMPORT FOREIGN SCHEMA. IMPORT FOREIGN SCHEMA <dlf_db_name> FROM SERVER <server_name> INTO <holo_schema_name> options (if_table_exist 'update');Create a foreign table by specifying a table name
IMPORT FOREIGN SCHEMA <dlf_db_name> limit to (table_name[,....]) FROM SERVER <server_name> into <holo_schema_name> options (if_table_exist 'update');
CREATE FOREIGN TABLE statement
For more information about how to use CREATE FOREIGN TABLE, see CREATE FOREIGN TABLE.
CREATE FOREIGN TABLE FOREIGN_TABLE_NAME ( column_name data_type [, ... ] ) SERVER SERVER_NAME options ( schema_name 'DLF_DB_NAME', table_name 'DLF_TABLE_NAME' );
Access Paimon tables using dynamic tables
A dynamic table can automatically process and store the data aggregation results of one or more base tables. It has built-in data refresh policies. You can set different data refresh policies as needed to implement automatic data flow from base tables to the dynamic table. This meets business requirements for unified development, automatic data flow, and processing timeliness. Hologres V3.0 and later lets you use MaxCompute external tables and DLF external tables as the base tables of a dynamic table. For more information, see Dynamic tables.
Full refresh
CREATE DYNAMIC TABLE paimon_dt_table
WITH (
auto_refresh_mode='full',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;Incremental refresh
CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
auto_refresh_mode='incremental',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;Automatic refresh
CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
auto_refresh_mode='auto',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;DML operations
INSERT INTO
-- Use Hologres SQL to create a Paimon primary key table and insert data.
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- Example merge engine. Modify as needed.
"bucket" = '4' -- Example number of buckets.
);
INSERT INTO <ext_database_name>.<ext_schema_name>.<ext_table_name> (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');INSERT OVERWRITE
-- Use Hologres SQL to create a Paimon append-only table and insert data.
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate',
"bucket-key" = 'id',
"bucket" = '2'
);
INSERT OVERWRITE <ext_database_name>.<ext_schema_name>.<ext_table_name>(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');UPDATE
UPDATE <ext_database_name>.<ext_schema_name>.<ext_table_name> SET amount= 5 where id= 1;DELETE
DELETE FROM <ext_database_name>.<ext_schema_name>.<ext_table_name> WHERE id = 2;