All Products
Search
Document Center

Hologres:Data lake table mirroring

Last Updated:Oct 20, 2025

Hologres V3.2 introduces data lake table mirroring to accelerate data lake queries. This feature mirrors metadata and data by synchronizing changes from external data sources to Hologres in near real-time or at set intervals. You can mirror entire tables or specific partitions of partitioned tables. You can use parameters to control the scope of the mirrored data.

Feature details

When you access a data lake table through Hologres, the system automatically triggers a metadata mirror sync task. This task periodically synchronizes the latest metadata from the source table and caches it in Hologres. The default interval is one minute. On subsequent access, Hologres automatically uses the mirrored metadata to accelerate metadata access.

To mirror the data of a data lake table, you must manually enable the feature using the ALTER EXTERNAL TABLE command. The supported capabilities are as follows:

  • Supports enabling mirroring for Paimon append-only tables and primary key tables with deletion vectors.

  • Supports mirroring for entire partitioned and non-partitioned tables, and for specific partitions.

  • Supports schema evolution for external tables. Supported operations include adding, deleting, reordering, and renaming columns. For unsupported operations, the system automatically falls back to reading the external table directly.

  • Supports automatic discovery and synchronization of data lake table partitions.

  • Supports setting common internal table indexes on the table mirror. This allows SQL queries to quickly hit the data, which reduces I/O consumption and achieves faster query performance with fewer computing resources. For more information, see Set table properties and indexes. The supported indexes include the following:

    • data_mirroring_clustering_key

    • data_mirroring_bitmap_columns

    • data_mirroring_dictionary_encoding_columns

  • Supports querying mirrored data using secondary instances, compute group instances, and Serverless resources.

  • Supports viewing the progress of mirror synchronization.

  • Supports viewing the latest partition of a full mirror.

  • Supports viewing the mirror storage usage at the instance, DB, and table levels.

  • Supports modifying the data mirror configuration for a specific table.

  • Supports disabling the data mirroring feature for a specific table.

Limitations

  • External tables support metadata mapping using the External Database method, but not the Foreign Table method.

  • Supports external tables with metadata managed by Data Lake Formation (DLF).

  • Mirroring data from Paimon branches is not currently supported.

Syntax

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
SET(
   -- Enable table mirroring for the external table
  data_mirroring_speed_up_enable = 'on|off', 
  [data_mirroring_partition_num='1~N|all',] |[data_mirroring_partition_list='pt1,pt2...',] 
  
  -- Specify the resources used for the mirror build process
  [data_mirroring_guc_hg_computing_resource='[serverless | local]',]
  [data_mirroring_guc_hg_experimental_serverless_computing_required_cores='<num>',] 

  -- Set indexes
  [data_mirroring_clustering_key='[columnName{:asc]} [,...]]',]
  [data_mirroring_dictionary_encoding_columns='[columnName [,...]]',]
  [data_mirroring_bitmap_columns='[columnName [,...]]',]
  );

Parameters

Parameter

Required

Description

data_mirroring_speed_up_enable

Yes

Specifies whether to enable table mirroring for the external table. Valid values:

  • on: Enables mirroring.

  • off: Disables mirroring.

data_mirroring_partition_num

No

Specifies the n most recent partitions to mirror. This parameter applies only to partitioned tables. Valid values:

  • all: All partitions.

  • 1 to N: The n most recent partitions.

data_mirroring_partition_list

No

Specifies a list of partitions to mirror. This parameter applies only to partitioned tables.

data_mirroring_guc_hg_computing_resource

No

Specifies the resources to use for the mirror build process. The default value is Local.

Serverless: Uses Serverless resources for data mirroring.

Local: Uses the resources of the current instance.

data_mirroring_guc_hg_experimental_serverless_computing_required_cores

No

Specifies the number of cores required for Serverless resources.

Note

This parameter is required only when data_mirroring_guc_hg_computing_resource is set to Serverless.

Usage

Prerequisites

  • You have activated DLF V2.5 or later and created a Paimon catalog.

  • You have completed the following operations:

    1. Purchase a Hologres instance and activate the data lake acceleration service. For more information, see OSS data lake acceleration.

    2. Create an EXTERNAL DATABASE to map the data lake table metadata to Hologres. For more information, see CREATE EXTERNAL DATABASE.

Enable table mirroring for an external table

Non-partitioned tables

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
SET(
   -- Enable table mirroring for the external table
  data_mirroring_speed_up_enable = 'on', 
   
  -- Specify the resources for the mirror build process
  data_mirroring_guc_hg_computing_resource='local'
 
  -- Set indexes
  data_mirroring_clustering_key='c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
  data_mirroring_bitmap_columns='c_int:on'
  );

Partitioned tables

  • Specify the number of partitions

    ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
    SET(
       -- Enable table mirroring for the external table
      data_mirroring_speed_up_enable = 'on', 
      data_mirroring_partition_num='7', 
      
      -- Specify the resources for the mirror build process
      data_mirroring_guc_hg_computing_resource='serverless'
      data_mirroring_guc_hg_experimental_serverless_computing_required_cores='16', 
    
      -- Set indexes
      data_mirroring_clustering_key='c_int:asc,c_char:desc',
      data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
      data_mirroring_bitmap_columns='c_int:on'
      );
  • Specify all partitions

    ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
    SET(
       -- Enable table mirroring for the external table
      data_mirroring_speed_up_enable = 'on', 
      data_mirroring_partition_num='all', 
      
      -- Specify the resources for the mirror build process
      data_mirroring_guc_hg_computing_resource='local',
    
      -- Set indexes
      data_mirroring_clustering_key='c_int:asc,c_char:desc,
      data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
      data_mirroring_bitmap_columns='c_int:on'
      );
  • Specify a list of partitions

    ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
    SET(
       -- Enable table mirroring for the external table
      data_mirroring_speed_up_enable = 'on', 
      data_mirroring_partition_list='pt1,pt2...', 
      
      -- Specify the resources for the mirror build process
      data_mirroring_guc_hg_computing_resource='local',
    
      -- Set indexes
      data_mirroring_clustering_key='c_int:asc,c_char:desc,
      data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
      data_mirroring_bitmap_columns='c_int:on'
      );
  • Specify the number of partitions and specific partitions to retain historical partition mirrors

    ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
    SET(
       -- Enable table mirroring for the external table
      data_mirroring_speed_up_enable = 'on',
      data_mirroring_partition_num='7', 
      data_mirroring_partition_list='pt1', 
      
      -- Specify the resources for the mirror build process
      data_mirroring_guc_hg_computing_resource='local',
    
      -- Set indexes
      data_mirroring_clustering_key='c_int:asc,c_char:desc,
      data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
      data_mirroring_bitmap_columns='c_int:on'
      );

View the mirror build status

Run the following command to view the mirror build status.

SELECT * FROM hologres.hg_datalake_get_mirror_status();

The following table describes the fields in the output.

Field

Description

external_db_name

The External Database where the table for the mirror build task resides.

external_schema_name

The external schema where the table for the mirror build task resides.

external_table_name

The external table for the mirror build task.

partition

The partition in the external table for the mirror build task.

mirror_data_size

The size of the mirror file for the table or partition.

mirror_start_time

The start time of the mirror build task.

mirror_last_update_time

The last update time of the mirrored data.

total_file_count

Total file size

mirrored_file_count

Mirror file size.

Modify the mirror configuration

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table> 
SET(
   -- Enable table mirroring for the external table
  data_mirroring_speed_up_enable = 'on', 
  data_mirroring_partition_num='12'
  );

Check if a query uses mirror files

explain analyze  SELECT xx FROM <holo_ext_db.ext_schema.ext_table>;

The following result is returned.

QUERY PLAN
Gather  (cost=0.00..5.80 rows=1000 width=116)
[128:1 id=100002 dop=1 time=26/26/26ms rows=107995(107995/107995/107995) mem=0/0/0B open=0/0/0ms get_next=26/26/26ms * ]
  ->  Local Gather  (cost=0.00..5.01 rows=1000 width=116)
      [id=2 dop=128 time=13/0/0ms rows=107995(97688/843/0) mem=0/0/0B open=0/0/0ms get_next=13/0/0ms local_dop=0/0/0]
        ->  Seq Scan on sls_ads_table  (cost=0.00..5.01 rows=1000 width=116)
              Foreign Table Type: DLF
            [id=1 split_count=128 time=13/0/0ms rows=107995(97688/843/0) mem=37/5/4KB open=0/0/0ms get_next=13/0/0ms physical_reads=154(140/77/14) scan_rows=107995(97688/8999/671)]

ADVICE: 
"[node id : 1] Table default."sls_ads_table" Miss Stats! please run 'analyze default."sls_ads_table"'. "

Query id:[1002007457093288810]
QE version: 2.0
Query Queue: init_warehouse.default_queue
======================cost======================
Total cost:[150] ms
Fast statistics cost:[0] ms
Optimizer cost:[73] ms
Build execution plan cost:[10] ms
Init execution plan cost:[9] ms
Start query cost:[13] ms
- Queue cost: [0] ms
- Wait schema cost:[0] ms
- Lock query cost:[0] ms
- Create dataset reader cost:[0] ms
- Create split reader cost:[3] ms
Get result cost:[45] ms
- Get the first block cost:[1] ms
====================resource====================
Memory: total 7 MB. Worker stats: max 3 MB, avg 3 MB, min 3 MB, max memory worker id: 1889346452478863373.
CPU time: total 96 ms. Worker stats: max 82 ms, avg 48 ms, min 14 ms, max CPU time worker id: 1889346412692090879.
DAG CPU time stats: max 77 ms, avg 38 ms, min 0 ms, cnt 2, max CPU time dag id: 1.
Fragment CPU time stats: max 66 ms, avg 25 ms, min 0 ms, cnt 3, max CPU time fragment id: 2.
Ec wait time: total 139 ms. Worker stats: max 79 ms, max(max) 2 ms, avg 69 ms, min 60 ms, max ec wait time worker id: 1889346452478863373, max(max) ec wait time worker id: 1889346412692090879.
Physical read bytes: total 1 MB. Worker stats: max 1 MB, avg 0 MB, min 0 MB, max physical read bytes worker id: 1889346412692090879.
Read bytes: total 7 MB. Worker stats: max 7 MB, avg 3 MB, min 0 MB, max read bytes worker id: 1889346412692090879.
DAG instance count: total 3. Worker stats: max 2, avg 1, min 1, max DAG instance count worker id: 1889346452478863373.
Fragment instance count: total 257. Worker stats: max 129, avg 128, min 128, max fragment instance count worker id: 1889346452478863373.
Meta mirror table count: use 1 miss 0.
Data mirror file count: use 12 miss 0.

An image hit is indicated by the last two metrics in the returned result: use and count.

Disable table mirroring for an external table

After you disable mirroring, the system stops synchronizing data to Hologres. Queries on the external table directly access the data lake data. The internally mirrored data files are asynchronously deleted after 30 minutes.

ALTER EXTERNAL TABLE <holo_ext_db.ext_schema.ext_table> SET(
  data_mirroring_speed_up_enable = 'off'
  );

View the mirror configuration

Run the following SQL statement to view which tables have data mirroring enabled. SELECT * FROM hologres.hg_datalake_get_mirror_config();

SELECT * FROM hologres.hg_datalake_get_mirror_config();

View the mirror progress

Run the following SQL statement to view the progress of the mirroring process.

SELECT * 
FROM hologres.hg_user_datalake_mirror_cron_tasks 
WHERE command::jsonb->>'external_db_name'='external_database_name'
ORDER BY start_time DESC;

In the statement, external_database_name is the name of the External Database.