Hologres integrates natively with MaxCompute. Its vector engine reads MaxCompute data directly, using multi-level partition filtering and operator pushdown to speed up queries.
Two approaches are available to accelerate MaxCompute queries in Hologres:
| Approach | Performance gain | Storage overhead | Best for |
|---|---|---|---|
| Import data into Hologres (recommended) | 10–100x faster than direct MaxCompute queries | Requires storage for a copy of the data | Frequent analytical queries, joins with internal tables |
| Query via foreign table | 2–5x faster than direct MaxCompute queries | No additional storage | Ad hoc queries, infrequent access, large datasets not worth copying |
Foreign table queries are always slower than queries on Hologres internal tables. If query performance is your top priority, import the data. Use foreign tables when storage cost or data freshness requirements make importing impractical.
Import data into Hologres (recommended)
Importing MaxCompute data into a Hologres internal table delivers 10–100x better query performance than foreign table queries. Use this approach when you run frequent analytical queries or join MaxCompute data with Hologres internal tables.
To import data, create a Hologres internal table with a suitable distribution key and index property, then use SQL to load the data from MaxCompute. For details, see Import data from MaxCompute to Hologres by executing SQL statements.
Query via foreign table
When importing data is not practical — for example, when data freshness requirements are strict or the dataset is too large to copy — optimize foreign table query performance using the strategies below. Apply them in the order listed: strategies earlier in the list yield the largest performance gains.
| Strategy | When to use | Performance impact |
|---|---|---|
| Use Serverless Computing | Large-scale ETL, bulk imports, queries on large datasets | Offloads compute; prevents OOM errors |
| Use the HQE execution engine | Tables in ORC format, Hologres V0.10+ | 30–100% improvement |
| Avoid full table scans | All queries | Reduces scanned data immediately |
| Optimize MaxCompute table structure for SQE | HQE unsupported (non-ORC, transactional, Schema Evolution) | Enables pruning and aggregation pushdown |
| Merge small files | Tables with >100 files and avg file size <64 MB | Reduces scan overhead |
| Tune query parameters | Specific workload characteristics | Fine-grained resource control |
Use Serverless Computing
For large-scale workloads — such as importing large volumes of MaxCompute data, running extract, transform, and load (ETL) jobs, or querying large foreign tables — use Serverless Computing (available in Hologres V2.1.17 and later).
Serverless Computing draws on additional compute resources outside your instance, so:
Your instance resources remain available for other workloads.
No need to reserve extra compute capacity for peak loads.
OOM (out-of-memory) errors are less likely.
You are charged only for the compute used by the task.
For details, see Serverless Computing overview and Use Serverless Computing.
Use the HQE execution engine
Hologres V0.10 and later use the HQE execution engine for foreign table queries, which improves performance by approximately 30–100% compared to earlier versions.
HQE has the following limitations:
HQE only accelerates queries on MaxCompute tables in ORC format. Other formats, such as CFile, are not supported.
HQE does not accelerate queries on MaxCompute transactional tables or tables with Schema Evolution. The system automatically falls back to the SQE engine in these cases.
Make sure the data type mapping between MaxCompute and Hologres tables is correct. Mismatches reduce the acceleration effect.
Avoid full table scans
Full table scans are the most common cause of slow foreign table queries. To avoid them:
Query only the columns you need. Use
SELECT col1, col2 FROM tableinstead ofSELECT * FROM table.Filter on partition columns. Add partition conditions in your
WHEREclause to limit the number of partitions scanned.
We strongly recommend using partition filters in every query on partitioned tables. Partition pruning is the single most effective way to reduce the data that Hologres must read from MaxCompute.
Optimize MaxCompute table structure for SQE
When HQE does not support a query, the system falls back to the SQE engine. To improve SQE performance, restructure the underlying MaxCompute table. Two clustering strategies are available:
| Strategy | Best for | Constraint |
|---|---|---|
| Hash clustering | Aggregation queries, bucket pruning | No data skew guarantee |
| Range clustering | Range queries, composite key queries, skew prevention | Globally ordered layout |
Convert to a hash clustering table
Hash clustering enables Bucket Pruning, aggregation optimization, and storage optimization. When you create or alter a table with CLUSTERED BY, MaxCompute hashes the specified columns and distributes data into buckets.
For the hash key, choose columns with few duplicate key values.
ALTER TABLE <table_name>
[CLUSTERED BY (<col_name> [, <col_name>, ...])
[SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])]
INTO <number_of_buckets> BUCKETS];After the ALTER TABLE statement, only new partitions use hash clustering. For more information, see Hash Clustering.
Hash clustering tables have the following constraints:
Use
INSERT OVERWRITEto write data.INSERT INTOis not supported.Do not upload data to these tables using Tunnel. Data uploaded through Tunnel is unordered and incompatible with clustering requirements.
Convert to a range clustering table
Range clustering distributes data in a globally ordered layout, which prevents the data skew that hash clustering can cause. Its ordered distribution creates a two-level index that supports range queries and composite key queries.
ALTER TABLE <table_name>
[RANGE CLUSTERED BY (<col_name> [, <col_name>, ...])
[SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])]
[INTO <number_of_buckets> BUCKETS];After the ALTER TABLE statement, only new partitions use range clustering. For more information, see Range Clustering.
Range clustering tables have the same constraints as hash clustering tables:
Use
INSERT OVERWRITEto write data.INSERT INTOis not supported.Do not upload data to these tables using Tunnel. Data uploaded through Tunnel is unordered and incompatible with clustering requirements.
Merge small files
Too many small files in a MaxCompute table slow down queries because Hologres must open and read each file individually. Run the following command in MaxCompute to check the file count:
DESC EXTENDED <table_name>;In the output, FileNum shows the number of files and Size shows the total size in bytes. If the table has more than 100 files and the average file size is below 64 MB, merge the small files.
Merging small files is safe to run at any time — it reorganizes data without changing query results. Schedule it as a regular maintenance task for tables that receive frequent small writes.
For guidance on how to merge files, see FAQ about small file optimization and job diagnostics.
Tune query parameters
Hologres sets default parameter values based on internal benchmarking. These defaults work well for most workloads — change them only if you have a specific reason.
| Parameter | Default | Notes |
|---|---|---|
hg_foreign_table_max_partition_limit | 512 (max: 1024) | Maximum partitions scanned per query. Setting this too high degrades performance. |
hg_experimental_query_batch_size | 8192 | Batch size for each read from a MaxCompute table. |
hg_foreign_table_split_size | 64 MB | Size per split when accessing a MaxCompute table. For large tables, increase this value to avoid excessive splits. Available in Hologres V1.1 and later. |
hg_foreign_table_executor_max_dop | Number of instance cores (max: 128) | Maximum concurrency for foreign table access. Keep this low to prevent foreign table queries from starving other workloads. Available in Hologres V1.1 and later. |
hg_foreign_table_executor_dml_max_dop | 32 | Maximum concurrency for DML statements on foreign tables. Limits resource consumption during data import and export. Available in Hologres V1.1 and later. |
Do not set hg_foreign_table_executor_max_dop to a very low value such as 1. Doing so can concentrate load on specific workers during concurrent read and write operations, triggering OOM errors.
Example: Adjust the maximum partition limit:
-- Default: 512. Maximum: 1024. Do not set too high.
SET hg_foreign_table_max_partition_limit = 128;Example: Adjust the split size for large tables:
-- Default: 64 MB. Increase for large tables to reduce the number of splits.
SET hg_foreign_table_split_size = 128;Example: Limit concurrency for foreign table access:
-- Default: number of instance cores. Maximum: 128.
SET hg_foreign_table_executor_max_dop = 32;