Hologres V4.0 and later lets you mirror MaxCompute tables. After you enable mirroring, Hologres synchronizes data to its local storage in near-real time or at specified intervals. If the metadata or data in the external data source changes, Hologres automatically updates the mirrored data. You can mirror an entire table or specific partitions of a partitioned table. You can also use parameters to limit the data range for mirroring. This topic describes how to use the mirroring feature.
Features
You can use the ALTER EXTERNAL TABLE command to manually enable data mirroring for MaxCompute tables. This feature supports the following capabilities:
Enable mirroring for standard MaxCompute tables. This feature does not support MaxCompute tables of the Append2.0, atomicity, consistency, isolation, and durability (ACID) 1.0, or Delta Table types.
Mirror entire partitioned and non-partitioned tables, or mirror specific partitions of a partitioned table.
When you enable data mirroring, you can set
hg_experimental_maxcompute_sdk_split_file_num_limit = 1to ensure split stability.This feature supports schema evolution for foreign tables. Supported operations include adding, deleting, reordering, and renaming columns. For unsupported operations, Hologres automatically falls back to directly reading the foreign table. If a schema evolution occurs, the mirroring process automatically restarts.
Automatically discover and synchronize partitions of MaxCompute tables.
You can set common indexes on mirrored lake tables to improve query performance. Indexes help SQL queries hit data faster, reduce I/O, and use fewer computing resources. For more information, see CREATE TABLE. The supported indexes include the following:
data_mirroring_clustering_key
data_mirroring_bitmap_columns
data_mirroring_dictionary_encoding_columns
You can query mirrored data using primary and secondary instances, compute group instances, and Serverless resources.
You can view the progress of mirror synchronization.
You can view the storage usage of mirrored data at the instance, DB, and table levels.
You can modify the data mirroring configuration for a specific table.
You can disable the data mirroring feature for a specific table.
Prerequisites
A Hologres instance is created. For more information, see Purchase a Hologres instance.
MaxCompute is activated and a project is created. For more information, see Activate MaxCompute.
Permissions to access the MaxCompute project and tables have been granted to the user. For more information, see Manage user permissions using commands.
Notes
Only foreign tables whose metadata is mapped using an External Database are supported. Foreign tables created using Import Foreign Schema or Create Foreign Table are not supported.
Mirroring is not supported for MaxCompute tables of the Append2.0, ACID 1.0, or Delta Table types.
The MaxCompute table cannot contain data types that Hologres does not support. Otherwise, the mirroring operation fails. For more information about the data types that are supported by Hologres, see Data types.
Syntax
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on|off',
[data_mirroring_partition_num='1~N|all',]
[data_mirroring_partition_list='pt1,pt2...',]
-- Specify the resources used for the mirror build process
[data_mirroring_guc_hg_computing_resource='[serverless | local]',]
[data_mirroring_guc_hg_experimental_serverless_computing_required_cores='<num>',]
-- Set indexes
[data_mirroring_clustering_key='[columnName{:asc]} [,...]]',]
[data_mirroring_dictionary_encoding_columns='[columnName [,...]]',]
[data_mirroring_bitmap_columns='[columnName [,...]]',]
);Parameters
Parameter | Description | Required | Details |
data_mirroring_speed_up_enable | Enables or disables the mirroring feature for the foreign table. | Yes |
|
data_mirroring_partition_num | Specifies the n most recent partitions to mirror. This parameter applies only to partitioned tables. | No |
|
data_mirroring_partition_list | Specifies a list of partitions to mirror. This parameter applies only to partitioned tables. | No | Partition name |
data_mirroring_guc_hg_computing_resource | Specifies the resources used for the mirror build process. | No |
|
data_mirroring_guc_hg_experimental_serverless_computing_required_cores | Specifies the number of cores required for the Serverless resource. | No | Data type: positive integer. This parameter is required only when data_mirroring_guc_hg_computing_resource is set to Serverless. |
Usage
Create an external database
You can create an external database to map the metadata of MaxCompute tables to Hologres. For more information, see CREATE EXTERNAL DATABASE.
The SQL command is as follows:
-- Globally enable the bigmeta link.
alter role all set hg_experimental_enable_big_meta_on_maxcompute_sdk = on;
-- Globally enable the CommonTable direct read link. You must manually enable this for instances upgraded to version 4.0. This is not required for new instances because it is enabled by default.
alter role all set hg_experimental_external_catalog_routing = 'odps:common_table';
-- Generate stable data shards.
alter role all set hg_experimental_maxcompute_sdk_split_file_num_limit = 1;
-- Create an external database.
CREATE EXTERNAL DATABASE ext_db WITH
metastore_type 'maxcompute'
mc_project '<mc_project_name>'
;Enable mirroring for a foreign table
Non-partitioned table
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
-- Specify the resources used for the mirror build process
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);Partitioned table
Specify the number of partitions
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='7',
-- Specify the resources used for the mirror build process
data_mirroring_guc_hg_computing_resource='serverless',
data_mirroring_guc_hg_experimental_serverless_computing_required_cores='16',
-- Set indexes
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);Specify all partitions
The following is the SQL command:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='all',
-- Specify the resources used for the mirror build process
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);You can specify a list of partitions.
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_list='pt1,pt2...',
-- Specify the resources used for the mirror build process
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);You can specify the number of partitions or the specific partitions to retain as historical mirrored partitions.
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='7',
data_mirroring_partition_list='pt1',
-- Specify the resources used for the mirror build process
data_mirroring_guc_hg_computing_resource='local',
-- Set indexes
data_mirroring_clustering_key='c_int:asc,c_char:desc',
data_mirroring_dictionary_encoding_columns='c_int:on,c_char:auto',
data_mirroring_bitmap_columns='c_int:on'
);View the mirror build status
The SQL command is as follows:
select * from hologres.hg_datalake_get_mirror_status();The following result is returned:

Parameters
Parameter | Description |
external_db_name | The external database where the table for the mirror build task resides. |
external_schema_name | The external schema where the table for the mirror build task resides. |
external_table_name | The external table for the mirror build task. |
partition | The partition in the external table for the mirror build task. |
mirror_data_size | The size of the mirror file for the table or partition. |
mirror_start_time | The start time of the mirror build task. |
mirror_last_update_time | The last update time of the mirrored data. |
total_file_count | The total number of files. |
mirrored_file_count | The number of mirrored files. |
Modify the mirror configuration
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
-- Enable mirroring for the foreign table
data_mirroring_speed_up_enable = 'on',
data_mirroring_partition_num='12'
);Check whether a query uses mirror files
You can use explain analyze to view the execution plan. In the results, check the values of Meta mirror table count and Data mirror file count to check whether the mirror was hit.
The SQL command is as follows:
explain analyze select xx from <ext_db.ext_schema.ext_table>;The following result is returned:
QUERY PLAN
Gather (cost=0.00..5.80 rows=1000 width=116)
[128:1 id=100002 dop=1 time=26/26/26ms rows=107995(107995/107995/107995) mem=0/0/0B open=0/0/0ms get_next=26/26/26ms * ]
-> Local Gather (cost=0.00..5.01 rows=1000 width=116)
[id=2 dop=128 time=13/0/0ms rows=107995(97688/843/0) mem=0/0/0B open=0/0/0ms get_next=13/0/0ms local_dop=0/0/0]
-> Seq Scan on sls_ads_table (cost=0.00..5.01 rows=1000 width=116)
Foreign Table Type: DLF
[id=1 split_count=128 time=13/0/0ms rows=107995(97688/843/0) mem=37/5/4KB open=0/0/0ms get_next=13/0/0ms physical_reads=154(140/77/14) scan_rows=107995(97688/8999/671)]
ADVICE:
"[node id : 1] Table default."sls_ads_table" Miss Stats! please run 'analyze default."sls_ads_table"'. "
Query id:[1002007457093288810]
QE version: 2.0
Query Queue: init_warehouse.default_queue
======================cost======================
Total cost:[150] ms
Fast statistics cost:[0] ms
Optimizer cost:[73] ms
Build execution plan cost:[10] ms
Init execution plan cost:[9] ms
Start query cost:[13] ms
- Queue cost: [0] ms
- Wait schema cost:[0] ms
- Lock query cost:[0] ms
- Create dataset reader cost:[0] ms
- Create split reader cost:[3] ms
Get result cost:[45] ms
- Get the first block cost:[1] ms
====================resource====================
Memory: total 7 MB. Worker stats: max 3 MB, avg 3 MB, min 3 MB, max memory worker id: 1889346452478863373.
CPU time: total 96 ms. Worker stats: max 82 ms, avg 48 ms, min 14 ms, max CPU time worker id: 1889346412692090879.
DAG CPU time stats: max 77 ms, avg 38 ms, min 0 ms, cnt 2, max CPU time dag id: 1.
Fragment CPU time stats: max 66 ms, avg 25 ms, min 0 ms, cnt 3, max CPU time fragment id: 2.
Ec wait time: total 139 ms. Worker stats: max 79 ms, max(max) 2 ms, avg 69 ms, min 60 ms, max ec wait time worker id: 1889346452478863373, max(max) ec wait time worker id: 1889346412692090879.
Physical read bytes: total 1 MB. Worker stats: max 1 MB, avg 0 MB, min 0 MB, max physical read bytes worker id: 1889346412692090879.
Read bytes: total 7 MB. Worker stats: max 7 MB, avg 3 MB, min 0 MB, max read bytes worker id: 1889346412692090879.
DAG instance count: total 3. Worker stats: max 2, avg 1, min 1, max DAG instance count worker id: 1889346452478863373.
Fragment instance count: total 257. Worker stats: max 129, avg 128, min 128, max fragment instance count worker id: 1889346452478863373.
Meta mirror table count: use 1 miss 0.
Data mirror file count: use 12 miss 0.Disable mirroring for a foreign table
The SQL command is as follows:
alter external table <ext_db.ext_schema.ext_table>
set(
data_mirroring_speed_up_enable = 'off'
);After you disable mirroring, the system no longer synchronizes data to Hologres. Queries on the foreign table directly access the data in the data lake. The mirrored data files within Hologres are asynchronously deleted after 30 minutes.
View the mirror configuration
The SQL command is as follows:
SELECT * FROM hologres.hg_datalake_get_mirror_config();View the mirror progress
You can use the following SQL command to view the progress. In the command, external_database_name is the name of the External Database.
The SQL command is as follows:
SELECT *
FROM hologres.hg_user_datalake_mirror_cron_tasks
WHERE command::jsonb->>'external_db_name'='<ext_db>'
ORDER BY start_time DESC;