Hologres uses table statistics to generate efficient query plans. The ANALYZE command collects those statistics manually; AUTO ANALYZE collects them in the background whenever it detects meaningful data changes.
How it works
The query optimizer relies on statistics—such as row count, column width, most common values (MCV), histogram, and number of distinct values (NDV)—to estimate costs, choose join order, and set the degree of parallelism. Without up-to-date statistics, the optimizer makes poor estimates and generates suboptimal execution plans.
ANALYZE collects statistics by sampling table data. AUTO ANALYZE runs ANALYZE in the background whenever it detects meaningful data changes, so you don't have to schedule it manually.
ANALYZE
Syntax
-- Collect statistics for all columns in a table
ANALYZE <tablename>;
-- Collect statistics for specific columns only
ANALYZE <tablename>(<colname>, <colname>);Choose the right form
ANALYZE <tablename> | ANALYZE <tablename>(<colname>, ...) | |
|---|---|---|
| NDV calculation | Sampling | APPROX_COUNT_DISTINCT (more accurate in many cases) |
| Histogram and width | Sampling | Sampling |
| Overhead | Lower | Higher |
| Best for | General use after data imports | JOIN columns, GROUP BY columns, and filter columns that need precision |
Both forms overwrite statistics only for the specified columns and leave all other columns unchanged. For example, ANALYZE t(col1) updates col1 but does not affect col2.
ANALYZE tableis not fully equivalent toANALYZE table(col1, col2)even when you list all columns, because the NDV calculation method differs.
For wide-column tables (Bitmap, Bytea, or Text columns larger than 1 KB), statistics on those columns provide little benefit and increase memory consumption during sampling. Skip wide columns and analyze only JOIN columns, GROUP BY columns, and filter columns:
ANALYZE <tablename>(<join_col>, <group_by_col>, <filter_col>);1 KB is an empirical threshold. Adjust based on your workload.
When to run ANALYZE
Run ANALYZE in the following situations:
After bulk data changes: After large-scale INSERT, UPDATE, or DELETE operations, such as data imports.
Before multi-table JOINs with performance issues: Run
ANALYZE <tablename>(<colname>, <colname>)on the JOIN columns and GROUP BY columns.After creating or importing foreign schema: After
CREATE FOREIGN TABLE,IMPORT FOREIGN SCHEMA, orCREATE EXTERNAL DATABASE, run ANALYZE on the tables you plan to query.When queries run out of memory (OOM): If you see
Query executor exceeded total memory limitation xxxxx: yyyy bytes used, run ANALYZE before the import task.When import or query performance is unexpectedly slow: Run ANALYZE to refresh statistics before investigating further.
Usage notes
Partitioned tables (V0.10 and V1.1): If you query the parent partitioned table, run ANALYZE on the parent table. If you query child tables directly, run ANALYZE on the child tables. If you do both, run ANALYZE on both.
External Database foreign tables: Starting with Hologres V3.1.0, ANALYZE supports collecting statistics for foreign tables in External Databases.
HMS foreign tables: ANALYZE does not support HMS foreign tables.
AUTO ANALYZE
AUTO ANALYZE (available in Hologres V0.10 and later) monitors table activity in the background and triggers ANALYZE automatically. It reduces the need for manual ANALYZE and helps prevent the query performance degradation that comes from stale statistics.
Prerequisites
Before you begin, ensure that you have:
Hologres V0.10 or later. Check your version on the instance details page in the Hologres Management Console. If your instance is on an earlier version, see Common upgrade preparation failure errors or contact support via online channels
Superuser privileges to enable, disable, or configure AUTO ANALYZE
Check and enable AUTO ANALYZE
-- Check whether AUTO ANALYZE is enabled (V1.1 and later)
SHOW hg_enable_start_auto_analyze_worker;
-- Check whether AUTO ANALYZE is enabled (V0.10)
SHOW hg_experimental_enable_start_auto_analyze_worker;-- Enable or disable at the database level (V1.1 and later)
ALTER DATABASE <dbname> SET hg_enable_start_auto_analyze_worker = ON; -- Enable (default)
ALTER DATABASE <dbname> SET hg_enable_start_auto_analyze_worker = OFF; -- Disable
-- Enable or disable at the database level (V0.10)
ALTER DATABASE <dbname> SET hg_experimental_enable_start_auto_analyze_worker = ON;
ALTER DATABASE <dbname> SET hg_experimental_enable_start_auto_analyze_worker = OFF;
-- Enable AUTO ANALYZE for an External Database
ALTER EXTERNAL DATABASE <dbname> WITH enable_auto_analyze 'true';Trigger conditions
Internal tables (non-partitioned and partitioned):
Every minute, the system checks for recent DML activity. ANALYZE is triggered if any of the following conditions are met:
A DML operation changes more than 10% of the table's current row count (for partition child tables, more than 10% of the partition's row count).
TRUNCATE TABLEempties the table.A DDL change occurs, such as
CREATE TABLEor anALTER TABLEschema change. (CALL SET_TABLE_PROPERTYdoes not count.)
Every 10 minutes, the system checks all internal tables for row count changes since the last check. If a table's row count changed by more than 10%, ANALYZE is triggered.
The 10-minute check detects data written by non-explicit DML operations, such as real-time writes from Flink, data integration pipelines, or HoloClient.
Foreign tables:
Every 4 hours, the system checks foreign table metadata for changes. ANALYZE is triggered when the last_modify_time of the corresponding external table (such as a MaxCompute or DLF table) falls within the inspection interval.
Each inspection cycle starts only after the previous ANALYZE finishes. As long as the time since the last inspection start meets the scheduling interval, the next inspection begins.
Limitations
Partitioned tables: When a child table changes, the system analyzes the parent table. Sampling scans up to 2^24 rows (16,777,216 rows) by default. If the total row count across all partitions exceeds this limit, the system applies partition pruning and samples a subset of partitions totaling no more than 16,777,216 rows.
Partition key column statistics are always complete and unaffected by pruning. However, pruning may affect statistics for columns co-located with the partition key (such as columns with identical data), which can cause some values to be missed and row count estimates to be inaccurate. To adjust the maximum scan row limit, contact Hologres support by joining DingTalk group 32314975.
Column limit: AUTO ANALYZE collects statistics for up to 256 columns by default. For tables with more than 256 columns, it processes the first 256. Adjust with
hg_experimental_auto_analyze_max_columns_count.Memory limit: AUTO ANALYZE limits memory to 4 GB per worker by default. Tables with very wide columns may exceed this limit and cause ANALYZE to fail. Adjust with
auto_analyze_work_memory_mb, but consider the impact on overall system memory.Foreign tables in External Databases: Supported from Hologres V3.1.0 onward.
HMS foreign tables: Not supported.
Configure AUTO ANALYZE behavior
All parameters apply at the database level and take effect within one minute. Only Superusers can modify them.
ALTER DATABASE <dbname> SET <parameter> = <value>;| Parameter | Description | Default | Version |
|---|---|---|---|
autovacuum_naptime | Interval (in seconds) between checks for recent DML activity | 60s | V1.1.0+ (requires backend adjustment; contact support via DingTalk group 32314975) |
hg_auto_check_table_changes_interval | Interval for checking row count changes in all internal tables | 600s (10 min) | V1.1.0+ |
hg_auto_check_foreign_table_changes_interval | Interval for checking data changes in all foreign tables | 14400s (4 hours) | V1.1.0+ |
hg_experimental_auto_analyze_max_columns_count | Maximum number of columns for automatic statistics collection | 256 | V1.1.0+ |
auto_analyze_work_memory_mb | Memory limit per worker for AUTO ANALYZE, in MB | 4096 MB (4 GB) | V1.1.54+ |
hg_experimental_auto_analyze_start_time | Daily start time for AUTO ANALYZE | 00:00 +0800 | V1.1.54+ |
hg_experimental_auto_analyze_end_time | Daily end time for AUTO ANALYZE | 23:59 +0800 | V1.1.54+ |
autovacuum_enabled | Enable or disable AUTO ANALYZE for a specific table | true | V1.1.54+ |
hg_experimental_auto_analyze_start_timeandhg_experimental_auto_analyze_end_timemust use the same time zone, and start time must be less than or equal to end time.
Examples:
-- Run AUTO ANALYZE only between 00:00 and 06:00 (UTC+8) when data is not changing
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_start_time = '00:00 +0800';
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_end_time = '06:00 +0800';
-- Increase the per-worker memory limit to 9 GB for wide-column tables
ALTER DATABASE <dbname> SET auto_analyze_work_memory_mb = 9216;
-- Disable AUTO ANALYZE for a specific table (internal tables only)
ALTER TABLE <tablename> SET (autovacuum_enabled = false);V0.10 uses different parameter names:hg_experimental_auto_check_table_changes_intervalandhg_experimental_auto_check_foreign_table_changes_interval. These were renamed in V1.1.
Fast Rows (V3.1 and later)
When a table in a query has no statistics, the optimizer defaults to estimating 1,000 rows—which can lead to poor query plans.
Fast Rows retrieves the actual row count directly from the storage engine (~10 ms per call) and passes it to the optimizer, reducing the chance of plan errors caused by missing statistics.
-- Enable Fast Rows (disabled by default)
ALTER DATABASE <dbname> SET hg_experimental_get_fast_num_of_rows = ON;
-- Disable Fast Rows
ALTER DATABASE <dbname> SET hg_experimental_get_fast_num_of_rows = OFF;When Fast Rows is disabled, a table without statistics shows a row count of 1,000 in the query plan:

When Fast Rows is enabled, the system retrieves the actual row count from storage:

Monitor statistics
Table statistics are stored in hologres_statistic.hg_table_statistic. Query this table to check ANALYZE status. Sort by analyze_timestamp to get the most recent results.
SELECT schema_name, -- Table schema
table_name, -- Table name
schema_version, -- Current schema version
statistic_version, -- Version of the most recent ANALYZE run
total_rows, -- Row count recorded by the most recent ANALYZE
analyze_timestamp -- End time of the most recent ANALYZE
FROM hologres_statistic.hg_table_statistic
WHERE table_name = '<tablename>'
ORDER BY analyze_timestamp DESC;Reading the results:
Zero rows returned: ANALYZE has never run on this table.
Multiple rows: The table schema changed (for example, via
ADD COLUMN), creating a new schema version. Rows with a lowerschema_versionare obsolete and can be ignored.
schema_name | table_name | schema_version | statistic_version | total_rows | analyze_timestamp
-------------+------------------+----------------+-------------------+------------+---------------------
public | tbl_name_example | 13 | 8580 | 10002 | 2022-04-29 16:03:18
public | tbl_name_example | 10 | 8576 | 10002 | 2022-04-29 15:41:20In this example, the second row has a lower schema_version and is obsolete.
Hologres V0.10 and V1.1 do not automatically clean up expired records. Ignore rows with outdated schema_version values.To identify tables in your database that lack statistics, use the HG_STATS_MISSING view. For details, see HG_STATS_MISSING view.
FAQ
The `hg_table_statistic` table returns no rows for my table.
ANALYZE has not run on this table yet. Either AUTO ANALYZE is not running, or the table hasn't met the trigger conditions (for example, fewer than 10% of rows changed). Run ANALYZE manually:
ANALYZE <tablename>;If the table still has no statistics after AUTO ANALYZE has had time to run, or if you suspect a feature issue, submit a ticket.
The `analyze_timestamp` is much older than the current time.
AUTO ANALYZE likely failed or was manually disabled. Run ANALYZE manually to refresh statistics, then submit a ticket to investigate the root cause.