Hologres V3.2 and later lets you access Paimon catalog data in Data Lake Formation (DLF). This feature provides a unified interface for more efficient metadata management.
Background information
DLF is a fully managed platform that provides unified metadata, data storage, and data management. It offers features such as metadata management, permission management, and storage optimization. By unifying metadata, lake table formats, and data storage, DLF simplifies the operations and maintenance (O&M) involved in building and managing data lakes. This allows businesses to focus on innovation and data insights.
Features
Hologres V3.2+ support these features:
Connect to a catalog in DLF using an external database.
Use
CREATE EXTERNAL SCHEMAorDROP EXTERNAL SCHEMAto cascade the creation or deletion of databases in the catalog.Use
CREATE EXTERNAL TABLEorDROP EXTERNAL TABLEto cascade the creation or deletion of tables.Query all databases and tables in a catalog.
Read data from Paimon tables in full or incrementally using dynamic tables.
Collect statistics for external tables by using ANALYZE and AUTO ANALYZE. This helps the optimizer generate optimal query plans.
Data mirroring of external tables.
Time Travel: Query historical snapshots of Paimon tables based on a timestamp, version, or tag.
Branch query: Query Paimon branch data and historical snapshots based on a branch.
Query Paimon system tables, such as snapshots and tags tables.
Hologres V4.0 and later supports various DML operations:
Perform concurrent writes to Paimon primary key tables using
INSERT INTOorINSERT OVERWRITE. The write process is optimized to support shuffling by bucket before writing, which reduces the generation of small files.A memory-based caching mechanism is provided to support writes to Paimon tables that require caching. Examples include tables with the first-row merge engine, lookup changelog producer, or with the deletion vector enabled.
UPDATEandDELETEoperations on Paimon tables.Read data from tags corresponding to expired Paimon snapshots (requires V3.2.6+ and V4.0.6+).
Prerequisites
You have activated latest DLF and created a catalog.
NoteFor information about the supported regions, see Regions and endpoints.
You have purchased a Hologres instance and enabled data lake acceleration. For more information, see Environment configuration.
Limitations
Your Hologres instance can access only DLF catalogs in the same region.
You cannot use OSS Data Lake Acceleration in HoloWeb to create a catalog in DLF.
Only Paimon tables mapped to Hologres via
CREATE EXTERNAL DATABASEsupport Time Travel, branch queries, and data mirroring.Time Travel is not supported for tables with evolved schema.
Paimon-based dynamic tables do not support Time Travel or branch queries.
Paimon primary-key tables cannot be mapped as external tables.
Accessing the Paimon default partition is not supported.
Read-only secondary instances do not support data lake acceleration.
The
TRUNCATEcommand is not supported for external/foreign tables. UseINSERT OVERWRITEwith empty data instead.Writing to Paimon tables with dynamic bucket strategies is not supported. These are tables created without specifying the
bucketparameter or withbucket = -1.The
UPDATEoperation is supported only for Paimon primary key tables with the deduplicate or partial-update merge engine. Updating the primary key is not supported.The
DELETEoperation is supported only for Paimon primary key tables that use either the deduplicate merge engine or the partial-update merge engine withpartial-update.remove-record-on-deleteenabled.External/foreign tables do not support multiple DML statements in a transaction. Two-phase commit is implemented within a DML operation to ensure the atomicity of distributed writes.
Map a Paimon catalog as an external database
Use service-linked role
A service-linked roles (SLR) is a RAM role whose trusted entity is an Alibaba Cloud service. SLRs are designed to resolve authorization issues for cross-service access. In most cases, an SLR is automatically created or deleted when you use a specific feature. SLRs aid with permission configurations, ensuring a service runs as expected while preventing misoperations.
Use case
SLR suits users who use their current identity to access an external database. By default, Hologres uses SLRs and identity pass-through to access another Alibaba Cloud service.
Procedure
(Optional) Create an SLR in Quick Authorization for Resource Access Management.
NoteThis step is required if you use Holores V2.2 or earlier instances. For newly created instances or V2.2 or later instances, skip this step.
Connect to the Hologres instance and map your Paimon catalog as an external database.
CREATE EXTERNAL DATABASE <ext_database_name> WITH catalog_type 'paimon' metastore_type 'dlf-rest' dlf_catalog '<dlf_catalog_name>' comment 'paimon catalog on dlf' ;View the schemas and tables in the Paimon catalog.
-- Query the schemas in the catalog. SELECT * FROM hologres.hg_external_schemas('<ext_database_name>'); -- Query the tables in the schema. SELECT * FROM hologres.hg_external_tables ('<ext_database_name>', '<ext_schema_name>');Query data in a table.
SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;
Use STS
Resource Access Management (RAM) provides two types of identities: RAM users and RAM roles. RAM roles don't have permanent identity credentials. Instead, they obtain temporary identity credentials, called STS tokens, from Service Token Service (STS). You can define the validity period and access permissions of STS tokens.
Use case
STS is suitable for accessing a Paimon catalog across accounts or using a custom Hologres account (BASIC account).
Procedure
Log on to the Resource Access Management (RAM) console, create a RAM role, and grant the AliyunDLFFullAccess or AliyunDLFReadOnlyAccess permission to the role. For more information, see Create a RAM role and grant permissions to the RAM role.
Add a trust policy to allow the RAM role to be assumed by Hologres. For more information, see Modify the trust policy of a RAM role.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "hologres.aliyuncs.com" ] } } ], "Version": "1" }Log on to the DLF console and grant the RAM role the permissions to access the database and tables in the Paimon catalog.
Create an external database in Hologres and specify the RAM role for authentication.
CREATE EXTERNAL DATABASE <ext_database_name> WITH catalog_type 'paimon' metastore_type 'dlf-rest' dlf_catalog 'paimon_catalog' rolearn 'acs:ram::106380604****:role/***-ramrole' comment 'paimon catalog on dlf' ;Create a user mapping in Hologres to associate the RAM role with the RAM user or BASIC account who accesses the DLF catalog.
CREATE USER MAPPING FOR "<RAM_user|BASIC_account>" EXTERNAL DATABASE <ext_database_name> OPTIONS ( rolearn 'acs:ram::10638060***:role/***ramrole' );NoteFor more information about user mappings, see CREATE USER MAPPING.
For more information about creating a RAM user, see Create a RAM user.
For more information about creating a BASIC account, see User management.
Log on to the Hologres Management Console as a RAM user or reconnect to the Hologres instance as a BASIC account. Then, run the following SQL statement.
SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;
Related operations for external databases
View external databases in an instance
SELECT database_name, options FROM hologres.hg_external_databases();Refresh the metadata of an external database
REFRESH CACHE FOR EXTERNAL DATABASE <ext_database_name> WITH( cache_level = 'metadata');View the schemas in an external database
SELECT * FROM hologres.hg_external_schemas('<ext_database_name>');Related operations for external schemas
Create an external schema in a Paimon catalog
When you create an external schema, a database is physically created in the DLF Paimon catalog:
-- Create an external schema.
CREATE EXTERNAL SCHEMA [IF NOT EXISTS] <ext_database_name>.<ext_schema_name>;Refresh the metadata of an external schema
REFRESH CACHE FOR EXTERNAL SCHEMA <ext_database_name>.<ext_schema_name> WITH( cache_level = 'metadata'); Drop an external schema
When you drop an external schema, the database and all tables in the database are also dropped from the Paimon catalog. Perform this operation with caution.
-- Delete an external schema.
DROP EXTERNAL SCHEMA [IF NOT EXISTS] <ext_database_name>.<ext_schema_name>;View tables in a schema
SELECT * FROM hologres.hg_external_tables ('<ext_database_name>', '<ext_schema_name>');Related operations for external tables
Create an external table in a Paimon catalog
-- Create a non-partitioned table.
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name>(
id TEXT,
created_at BIGINT,
type TEXT,
actor_id TEXT,
actor_login TEXT,
repo_id TEXT,
repo_name TEXT,
org TEXT,
org_login TEXT,
PRIMARY KEY(id)
) WITH (
"changelog-producer"='input',
"bucket"=6,
"bucket-key"='id'
);
-- Create a partitioned table.
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name>(
id TEXT,
created_at BIGINT,
type TEXT,
actor_id TEXT,
actor_login TEXT,
repo_id TEXT,
repo_name TEXT,
org TEXT,
org_login TEXT
)
LOGICAL PARTITION BY LIST(created_at)
WITH (
"file_format" = 'orc',
"bucket"=6,
"bucket-key"='id'
);Refresh the metadata of an external table
REFRESH CACHE FOR EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> WITH( cache_level = 'metadata');Drop an external table
When you drop an external table, the table is physically removed from the DLF Paimon catalog. Perform this operation with caution.
-- Delete the external table.
DROP EXTERNAL TABLE IF EXISTS <ext_database_name>.<ext_schema_name>.<ext_table_name>;Collect statistics for foreign tables
The ANALYZE and AUTO ANALYZE commands collect statistics about the content of tables in a database. The optimizer uses these statistics to generate optimal query plans and improve query efficiency.
-- Update the statistics of a table. By default, the statistics of all columns in the table are collected.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>;
-- Update the statistics of a specific column. More data is sampled than when you update the statistics of the entire table, which makes the statistics more accurate. This is mainly used to update conditional columns.
analyze <ext_database_name>.<ext_schema_name>.<ext_table_name>(<colname>, <colname>);
-- Enable auto-analyze.
ALTER EXTERNAL DATABASE <ext_database_name> WITH enable_auto_analyze 'true';Time Travel
In Hologres V3.2 and later, you can query historical snapshots of Paimon tables based on a timestamp, version, or tag. You can also query the Snapshots and Tags system tables in Hologres to obtain the snapshot details of a table.
Query the Snapshots table
SELECT * FROM hologres.hg_list_snapshots('<ext_database_name>.<ext_schema_name>.<ext_table_name>;');The columns in the Snapshots table map to the Paimon system table columns as follows.
Snapshots table column | Type | Description | Corresponding column in the Paimon system table |
branch_name | TEXT | The branch name. | branch_name |
snapshot_id | TEXT | The snapshot ID. | snapshot_id |
schema_id | TEXT | The schema ID of the table. | schema_id |
commit_kind | TEXT | The commit type of the table data. | commit_kind |
commit_time | TIMESTAMPTZ | The commit time. | commit_time |
extend_info | TEXT (JSON) | Other properties of the Paimon Snapshots table. | Other properties of the Paimon Snapshots table |
You can also query the Snapshots table using the hologres.hg_list_versions function. For Paimon table queries, a version corresponds to a snapshot ID.
SELECT * FROM hologres.hg_list_versions('<ext_database_name>.<ext_schema_name>.<ext_table_name>');Query the Tags table
SELECT * FROM hologres.hg_list_tags('<ext_database_name>.<ext_schema_name>.<ext_table_name>');The columns in the Tags table map to the Paimon system table columns as follows.
Tags table column | Type | Description | Corresponding column in the Paimon system table |
branch_name | TEXT | The branch name. | branch_name |
tag_name | TEXT | The tag name. | tag_name |
snapshot_id | TEXT | The snapshot ID. | snapshot_id |
schema_id | TEXT | The schema ID of the table. | schema_id |
commit_time | TIMESTAMPTZ | The commit time. | commit_time |
extend_info | TEXT (JSON) | Other properties of the Tags table. | Other properties of the Tags table. |
Query historical snapshots based on a timestamp
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR TIMESTAMP AS OF '<timestamp>';Query historical snapshots based on a version
For Paimon tables, a version corresponds to a snapshot ID.
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR VERSION AS OF '<version>';Query historical snapshots based on a tag
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR TAG AS OF '<tag>';Query branches
In Hologres V3.2 and later, you can query data from a branch. The fallback branch is also supported. For more information, see Manage branches.
Query the Branches table
SELECT * FROM hologres.hg_list_branches('<ext_database_name>.<ext_schema_name>.<ext_table_name>');The columns in the branches table map to the Paimon system table columns as follows.
Branches table column | Type | Description | Corresponding column in the Paimon system table |
branch_name | TEXT | The branch name. | branch_name |
create_time | TIMESTAMPTZ | The time when the branch was created. | create_time |
extend_info | TEXT (JSON) | Other properties of the Branches table. | Other properties of the Branches table. |
Query data from a branch
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<tag>';If the scan.fallback-branch option is set for a Paimon table and a partition does not exist when a query job reads data from the current branch, the reader attempts to read the partition data from the fallback branch. For more information, see Manage branches.
Query historical snapshot from a branch
-- Query based on a timestamp.
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<branch_name>' TIMESTAMP AS OF '<timestamp>';
-- Query based on a tag.
SELECT * FROM '<ext_database_name>.<ext_schema_name>.<ext_table_name>' FOR branch AS OF '<branch_name>' TAG AS OF '<tag_name>';Data mirroring
Starting with V3.2, Hologres offers data and metadata mirroring to accelerate queries on data lake data. This feature enables near real-time or interval-based synchronization of changes from external data sources to Hologres. It supports both full table mirroring and partial mirroring of partitioned tables, allowing flexible control over the mirrored data scope. For more information, see Data mirroring.
Access Paimon tables by using foreign tables
Create a foreign server.
For more information, see More operations on foreign servers by using SQL statements.
-- create foreign server CREATE SERVER IF NOT EXISTS <server_name> FOREIGN DATA WRAPPER dlf_fdw OPTIONS ( catalog_type 'paimon', metastore_type 'dlf-rest', dlf_catalog '<dlf_catalog_name>' );Create a foreign table.
IMPORT FOREIGN SCHEMA
For more information, see IMPORT FOREIGN SCHEMA.
-- Create a foreign table using IMPORT FOREIGN SCHEMA. IMPORT FOREIGN SCHEMA <dlf_db_name> FROM SERVER <server_name> INTO <holo_schema_name> options (if_table_exist 'update');CREATE FOREIGN TABLE
For more information, see CREATE FOREIGN TABLE.
CREATE FOREIGN TABLE <foreign_table_name> ( { column_name data_type } [, ... ] ) SERVER <server_name> options ( schema_name '<dlf_database_name>', table_name '<dlf_table_name>' );
Access Paimon tables using dynamic tables
A dynamic table can automatically process and store the data aggregation results of one or more base tables. It provides various data refresh policies. This enables automatic data flow from base tables to the dynamic table, supporting unified development, automatic data flow, and processing timeliness. In Hologres V3.0 and later, you can use MaxCompute and DLF external tables as the base tables of a dynamic table.
Full refresh
CREATE DYNAMIC TABLE paimon_dt_table
WITH (
auto_refresh_mode='full',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;Incremental refresh
CREATE DYNAMIC TABLE paimon_dt_table_increamental
WITH (
auto_refresh_mode='incremental',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;Automatic refresh
CREATE DYNAMIC TABLE paimon_dt_table_auto
WITH (
auto_refresh_mode='auto',
freshness='3 minutes'
)AS SELECT * FROM <ext_database_name>.<ext_schema_name>.<ext_table_name>;DML operations
INSERT INTO
-- Use Hologres SQL to create a Paimon primary key table and insert data
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
customer_id BIGINT,
item TEXT,
amount FLOAT,
dt TEXT,
PRIMARY KEY (id)
)
WITH (
"table_format" = 'paimon',
"file_format" = 'orc',
"merge-engine" = 'deduplicate', -- Set merge engine. Modify as needed.
"bucket" = '4' -- Example number of buckets
);
INSERT INTO <ext_database_name>.<ext_schema_name>.<ext_table_name> (id, customer_id, item, amount, dt) VALUES
(1, 1001, 'apple', 3.50, '2025-09-10'),
(2, 1002, 'banana', 2.00, '2025-09-10'),
(3, 1003, 'orange', 4.20, '2025-09-11');INSERT OVERWRITE
-- Use Hologres SQL to create a Paimon append-only table and insert data
CREATE EXTERNAL TABLE <ext_database_name>.<ext_schema_name>.<ext_table_name> (
id BIGINT,
name TEXT,
amount FLOAT8,
dt TEXT
)
LOGICAL PARTITION BY LIST(dt)
WITH (
"table_format" = 'paimon',
"file_format" = 'parquet',
"merge-engine" = 'deduplicate',
"bucket-key" = 'id',
"bucket" = '2'
);
INSERT OVERWRITE <ext_database_name>.<ext_schema_name>.<ext_table_name>(id, name, amount, dt) VALUES
(1, 'Alice', 100.50, '2025-09-01'),
(2, 'Bob', 200.00, '2025-09-01'),
(3, 'Carol', 150.75, '2025-09-02');UPDATE
UPDATE <ext_database_name>.<ext_schema_name>.<ext_table_name> SET amount= 5 where id= 1;DELETE
DELETE FROM <ext_database_name>.<ext_schema_name>.<ext_table_name> WHERE id = 2;