All Products
Search
Document Center

Hologres:Lake data mirroring

Last Updated:Mar 26, 2026

Data lake table mirroring, introduced in Hologres V3.2, accelerates queries on data lake tables by syncing metadata and data from external sources into Hologres in near real-time or at scheduled intervals. Mirror entire tables or specific partitions of partitioned tables, and control the scope of mirrored data with parameters.

How it works

Mirroring operates at two levels: metadata and data.

Metadata mirroring starts automatically the first time you query a data lake table. Hologres triggers a background sync task that pulls the latest metadata from the source table and caches it locally. The default sync interval is one minute. Subsequent queries use the cached metadata, eliminating repeated round-trips to the external catalog.

Data mirroring must be enabled manually using ALTER EXTERNAL TABLE. Once enabled, Hologres copies the actual table data—not just metadata—into local storage and keeps it in sync with the source. Data mirroring supports:

  • Paimon append-only tables and primary key tables with deletion vectors

  • Non-partitioned tables and partitioned tables, including specific partition subsets

  • Schema evolution: adding, deleting, reordering, and renaming columns. Unsupported schema changes fall back to reading the external table directly.

  • Automatic discovery and synchronization of new partitions

  • Common Hologres indexes (clustering key, dictionary encoding, bitmap) on the mirrored data to reduce I/O and accelerate queries

  • Querying mirrored data from secondary instances, compute group instances, and Serverless resources

  • Viewing the latest partition of a full mirror

  • Viewing the mirror storage usage at the instance, DB, and table levels

When to use data mirroring

Data mirroring provides the most benefit when:

  • Queries repeatedly scan the same data lake tables or partitions

  • Network latency to the remote storage is high or unstable

  • Query performance is bottlenecked by remote I/O, not by Hologres compute

For tables accessed only occasionally, metadata mirroring alone may be sufficient.

Limitations

  • External tables must use the External Database mapping method. The foreign table method is not supported.

  • Only external tables whose metadata is managed by Data Lake Formation (DLF) are supported.

  • Mirroring data from Apache Paimon branches is not currently supported.

Enable data mirroring

Before you enable data mirroring, ensure that you have:

  • Data Lake Formation (DLF) V2.5 or later activated, with an Apache Paimon catalog created

  • A Hologres instance with the data lake acceleration service enabled (see OSS data lake acceleration)

  • An External Database created to map data lake table metadata to Hologres (see CREATE EXTERNAL DATABASE)

Use ALTER EXTERNAL TABLE with data_mirroring_speed_up_enable = 'on' to enable mirroring on an external table. All examples below use this parameter.

Non-partitioned table

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  -- Enable data mirroring
  data_mirroring_speed_up_enable = 'on',

  -- Specify the resources for the mirror build process
  data_mirroring_guc_hg_computing_resource = 'local',

  -- Set indexes
  data_mirroring_clustering_key = 'c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns = 'c_int:on,c_char:auto',
  data_mirroring_bitmap_columns = 'c_int:on'
);

Partitioned table

Choose one of the following partition scopes based on your needs.

Mirror the N most recent partitions (recommended for large tables with time-based partitions):

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'on',
  data_mirroring_partition_num = '7',

  -- Use Serverless resources for the mirror build process
  data_mirroring_guc_hg_computing_resource = 'serverless',
  data_mirroring_guc_hg_experimental_serverless_computing_required_cores = '16',

  data_mirroring_clustering_key = 'c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns = 'c_int:on,c_char:auto',
  data_mirroring_bitmap_columns = 'c_int:on'
);

Mirror all partitions:

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'on',
  data_mirroring_partition_num = 'all',

  data_mirroring_guc_hg_computing_resource = 'local',

  data_mirroring_clustering_key = 'c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns = 'c_int:on,c_char:auto',
  data_mirroring_bitmap_columns = 'c_int:on'
);

Mirror a specific list of partitions:

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'on',
  data_mirroring_partition_list = 'pt1,pt2',

  data_mirroring_guc_hg_computing_resource = 'local',

  data_mirroring_clustering_key = 'c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns = 'c_int:on,c_char:auto',
  data_mirroring_bitmap_columns = 'c_int:on'
);

Combine `partition_num` and `partition_list` to retain both recent partitions and specific historical partitions:

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'on',
  data_mirroring_partition_num = '7',
  data_mirroring_partition_list = 'pt1',

  data_mirroring_guc_hg_computing_resource = 'local',

  data_mirroring_clustering_key = 'c_int:asc,c_char:desc',
  data_mirroring_dictionary_encoding_columns = 'c_int:on,c_char:auto',
  data_mirroring_bitmap_columns = 'c_int:on'
);

Parameters

Parameter Required Description
data_mirroring_speed_up_enable Yes Enables or disables data mirroring. Valid values: on, off.
data_mirroring_partition_num No Applies to partitioned tables. Specifies the number of most recent partitions to mirror. Valid values: all (all partitions) or an integer 1 to N (the N most recent partitions).
data_mirroring_partition_list No Applies to partitioned tables. Specifies a comma-separated list of partition names to mirror.
data_mirroring_guc_hg_computing_resource No Specifies the resources used for the mirror build process. Default: local. Valid values: local (current instance resources) or serverless (Serverless resources).
data_mirroring_guc_hg_experimental_serverless_computing_required_cores No Specifies the number of cores for the Serverless build process. Required only when data_mirroring_guc_hg_computing_resource is set to serverless.
data_mirroring_clustering_key No Sets a clustering key on the mirrored data to accelerate range queries.
data_mirroring_dictionary_encoding_columns No Sets dictionary encoding on specified columns to reduce storage and speed up queries on low-cardinality data.
data_mirroring_bitmap_columns No Sets bitmap indexes on specified columns to accelerate filter queries.

Monitor mirroring

Use the following tools depending on what you want to check.

Goal Tool
See which tables have mirroring enabled hg_datalake_get_mirror_config()
Check the build status and progress of active mirror tasks hg_datalake_get_mirror_status()
View the history and scheduling of mirror sync tasks hg_user_datalake_mirror_cron_tasks
Confirm whether a query hit the mirror EXPLAIN ANALYZE

View mirroring configuration

Run the following statement to list all tables that have data mirroring enabled:

SELECT * FROM hologres.hg_datalake_get_mirror_config();

Check mirror build status

Run the following statement to check the build status and file sync progress of active mirror tasks:

SELECT * FROM hologres.hg_datalake_get_mirror_status();

The output includes the following fields:

Field Description
external_db_name The External Database where the mirrored table resides.
external_schema_name The external schema where the mirrored table resides.
external_table_name The mirrored external table.
partition The partition being mirrored, if applicable.
mirror_data_size The size of the mirrored data for the table or partition.
mirror_start_time The start time of the mirror build task.
mirror_last_update_time The last time the mirrored data was updated.
total_file_count Total number of files in the source table or partition.
mirrored_file_count Number of files that have been mirrored so far.

To estimate sync progress, compare mirrored_file_count against total_file_count. When both values are equal, the mirror build is complete.

View mirror sync history

Run the following statement to view the scheduling and execution history of mirror sync tasks for a specific External Database:

SELECT *
FROM hologres.hg_user_datalake_mirror_cron_tasks
WHERE command::jsonb->>'external_db_name' = '<external_database_name>'
ORDER BY start_time DESC;

Replace <external_database_name> with the name of your External Database.

Confirm query hits the mirror

Run EXPLAIN ANALYZE on your query to check whether Hologres served the results from the mirrored data:

EXPLAIN ANALYZE SELECT <columns> FROM <holo_ext_db.ext_schema.ext_table>;

At the end of the output, look for the following two lines:

Meta mirror table count: use 1 miss 0.
Data mirror file count: use 12 miss 0.

A miss value of 0 means all data was served from the mirror. If miss is greater than 0, some data was still read from the remote data lake.

Modify the mirror configuration

To update the mirror configuration—for example, to change the number of partitions—run ALTER EXTERNAL TABLE again with the updated parameters:

ALTER EXTERNAL TABLE <ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'on',
  data_mirroring_partition_num = '12'
);

Disable data mirroring

To disable mirroring on a table, set data_mirroring_speed_up_enable to off:

ALTER EXTERNAL TABLE <holo_ext_db.ext_schema.ext_table>
SET(
  data_mirroring_speed_up_enable = 'off'
);

After mirroring is disabled, Hologres stops syncing data and queries access the data lake directly. The locally mirrored data files are asynchronously deleted within 30 minutes.

What's next