Improve Query Performance with ANALYZE and AUTO ANALYZE - Hologres

Hologres uses table statistics to generate efficient query plans. The ANALYZE command collects those statistics manually; AUTO ANALYZE collects them in the background whenever it detects meaningful data changes.

How it works

The query optimizer relies on statistics—such as row count, column width, most common values (MCV), histogram, and number of distinct values (NDV)—to estimate costs, choose join order, and set the degree of parallelism. Without up-to-date statistics, the optimizer makes poor estimates and generates suboptimal execution plans.

ANALYZE collects statistics by sampling table data. AUTO ANALYZE runs ANALYZE in the background whenever it detects meaningful data changes, so you don't have to schedule it manually.

ANALYZE

Syntax

-- Collect statistics for all columns in a table
ANALYZE <tablename>;

-- Collect statistics for specific columns only
ANALYZE <tablename>(<colname>, <colname>);

Choose the right form

	`ANALYZE <tablename>`	`ANALYZE <tablename>(<colname>, ...)`
NDV calculation	Sampling	APPROX_COUNT_DISTINCT (more accurate in many cases)
Histogram and width	Sampling	Sampling
Overhead	Lower	Higher
Best for	General use after data imports	JOIN columns, GROUP BY columns, and filter columns that need precision

Both forms overwrite statistics only for the specified columns and leave all other columns unchanged. For example, ANALYZE t(col1) updates col1 but does not affect col2.

ANALYZE table is not fully equivalent to ANALYZE table(col1, col2) even when you list all columns, because the NDV calculation method differs.

For wide-column tables (Bitmap, Bytea, or Text columns larger than 1 KB), statistics on those columns provide little benefit and increase memory consumption during sampling. Skip wide columns and analyze only JOIN columns, GROUP BY columns, and filter columns:

ANALYZE <tablename>(<join_col>, <group_by_col>, <filter_col>);

1 KB is an empirical threshold. Adjust based on your workload.

When to run ANALYZE

Run ANALYZE in the following situations:

After bulk data changes: After large-scale INSERT, UPDATE, or DELETE operations, such as data imports.
Before multi-table JOINs with performance issues: Run ANALYZE <tablename>(<colname>, <colname>) on the JOIN columns and GROUP BY columns.
After creating or importing foreign schema: After CREATE FOREIGN TABLE, IMPORT FOREIGN SCHEMA, or CREATE EXTERNAL DATABASE, run ANALYZE on the tables you plan to query.
When queries run out of memory (OOM): If you see Query executor exceeded total memory limitation xxxxx: yyyy bytes used, run ANALYZE before the import task.
When import or query performance is unexpectedly slow: Run ANALYZE to refresh statistics before investigating further.

Usage notes

Partitioned tables (V0.10 and V1.1): If you query the parent partitioned table, run ANALYZE on the parent table. If you query child tables directly, run ANALYZE on the child tables. If you do both, run ANALYZE on both.
External Database foreign tables: Starting with Hologres V3.1.0, ANALYZE supports collecting statistics for foreign tables in External Databases.
HMS foreign tables: ANALYZE does not support HMS foreign tables.

AUTO ANALYZE

AUTO ANALYZE (available in Hologres V0.10 and later) monitors table activity in the background and triggers ANALYZE automatically. It reduces the need for manual ANALYZE and helps prevent the query performance degradation that comes from stale statistics.

Prerequisites

Before you begin, ensure that you have:

Hologres V0.10 or later. Check your version on the instance details page in the Hologres Management Console. If your instance is on an earlier version, see Common upgrade preparation failure errors or contact support via online channels
Superuser privileges to enable, disable, or configure AUTO ANALYZE

Check and enable AUTO ANALYZE

-- Check whether AUTO ANALYZE is enabled (V1.1 and later)
SHOW hg_enable_start_auto_analyze_worker;

-- Check whether AUTO ANALYZE is enabled (V0.10)
SHOW hg_experimental_enable_start_auto_analyze_worker;

-- Enable or disable at the database level (V1.1 and later)
ALTER DATABASE <dbname> SET hg_enable_start_auto_analyze_worker = ON;   -- Enable (default)
ALTER DATABASE <dbname> SET hg_enable_start_auto_analyze_worker = OFF;  -- Disable

-- Enable or disable at the database level (V0.10)
ALTER DATABASE <dbname> SET hg_experimental_enable_start_auto_analyze_worker = ON;
ALTER DATABASE <dbname> SET hg_experimental_enable_start_auto_analyze_worker = OFF;

-- Enable AUTO ANALYZE for an External Database
ALTER EXTERNAL DATABASE <dbname> WITH enable_auto_analyze 'true';

Trigger conditions

Internal tables (non-partitioned and partitioned):

Every minute, the system checks for recent DML activity. ANALYZE is triggered if any of the following conditions are met:

A DML operation changes more than 10% of the table's current row count (for partition child tables, more than 10% of the partition's row count).
TRUNCATE TABLE empties the table.
A DDL change occurs, such as CREATE TABLE or an ALTER TABLE schema change. (CALL SET_TABLE_PROPERTY does not count.)

Every 10 minutes, the system checks all internal tables for row count changes since the last check. If a table's row count changed by more than 10%, ANALYZE is triggered.

The 10-minute check detects data written by non-explicit DML operations, such as real-time writes from Flink, data integration pipelines, or HoloClient.

Foreign tables:

Every 4 hours, the system checks foreign table metadata for changes. ANALYZE is triggered when the last_modify_time of the corresponding external table (such as a MaxCompute or DLF table) falls within the inspection interval.

Each inspection cycle starts only after the previous ANALYZE finishes. As long as the time since the last inspection start meets the scheduling interval, the next inspection begins.

Limitations

Partitioned tables: When a child table changes, the system analyzes the parent table. Sampling scans up to 2^24 rows (16,777,216 rows) by default. If the total row count across all partitions exceeds this limit, the system applies partition pruning and samples a subset of partitions totaling no more than 16,777,216 rows.
Partition key column statistics are always complete and unaffected by pruning. However, pruning may affect statistics for columns co-located with the partition key (such as columns with identical data), which can cause some values to be missed and row count estimates to be inaccurate. To adjust the maximum scan row limit, contact Hologres support by joining DingTalk group 32314975.
Column limit: AUTO ANALYZE collects statistics for up to 256 columns by default. For tables with more than 256 columns, it processes the first 256. Adjust with hg_experimental_auto_analyze_max_columns_count.
Memory limit: AUTO ANALYZE limits memory to 4 GB per worker by default. Tables with very wide columns may exceed this limit and cause ANALYZE to fail. Adjust with auto_analyze_work_memory_mb, but consider the impact on overall system memory.
Foreign tables in External Databases: Supported from Hologres V3.1.0 onward.
HMS foreign tables: Not supported.

Configure AUTO ANALYZE behavior

All parameters apply at the database level and take effect within one minute. Only Superusers can modify them.

ALTER DATABASE <dbname> SET <parameter> = <value>;

Parameter	Description	Default	Version
`autovacuum_naptime`	Interval (in seconds) between checks for recent DML activity	60s	V1.1.0+ (requires backend adjustment; contact support via DingTalk group 32314975)
`hg_auto_check_table_changes_interval`	Interval for checking row count changes in all internal tables	600s (10 min)	V1.1.0+
`hg_auto_check_foreign_table_changes_interval`	Interval for checking data changes in all foreign tables	14400s (4 hours)	V1.1.0+
`hg_experimental_auto_analyze_max_columns_count`	Maximum number of columns for automatic statistics collection	256	V1.1.0+
`auto_analyze_work_memory_mb`	Memory limit per worker for AUTO ANALYZE, in MB	4096 MB (4 GB)	V1.1.54+
`hg_experimental_auto_analyze_start_time`	Daily start time for AUTO ANALYZE	00:00 +0800	V1.1.54+
`hg_experimental_auto_analyze_end_time`	Daily end time for AUTO ANALYZE	23:59 +0800	V1.1.54+
`autovacuum_enabled`	Enable or disable AUTO ANALYZE for a specific table	true	V1.1.54+

hg_experimental_auto_analyze_start_time and hg_experimental_auto_analyze_end_time must use the same time zone, and start time must be less than or equal to end time.

Examples:

-- Run AUTO ANALYZE only between 00:00 and 06:00 (UTC+8) when data is not changing
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_start_time = '00:00 +0800';
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_end_time = '06:00 +0800';

-- Increase the per-worker memory limit to 9 GB for wide-column tables
ALTER DATABASE <dbname> SET auto_analyze_work_memory_mb = 9216;

-- Disable AUTO ANALYZE for a specific table (internal tables only)
ALTER TABLE <tablename> SET (autovacuum_enabled = false);

V0.10 uses different parameter names: hg_experimental_auto_check_table_changes_interval and hg_experimental_auto_check_foreign_table_changes_interval. These were renamed in V1.1.

Fast Rows (V3.1 and later)

When a table in a query has no statistics, the optimizer defaults to estimating 1,000 rows—which can lead to poor query plans.

Fast Rows retrieves the actual row count directly from the storage engine (~10 ms per call) and passes it to the optimizer, reducing the chance of plan errors caused by missing statistics.

-- Enable Fast Rows (disabled by default)
ALTER DATABASE <dbname> SET hg_experimental_get_fast_num_of_rows = ON;

-- Disable Fast Rows
ALTER DATABASE <dbname> SET hg_experimental_get_fast_num_of_rows = OFF;

When Fast Rows is disabled, a table without statistics shows a row count of 1,000 in the query plan:

When Fast Rows is enabled, the system retrieves the actual row count from storage:

Monitor statistics

Table statistics are stored in hologres_statistic.hg_table_statistic. Query this table to check ANALYZE status. Sort by analyze_timestamp to get the most recent results.

SELECT schema_name,        -- Table schema
       table_name,         -- Table name
       schema_version,     -- Current schema version
       statistic_version,  -- Version of the most recent ANALYZE run
       total_rows,         -- Row count recorded by the most recent ANALYZE
       analyze_timestamp   -- End time of the most recent ANALYZE
FROM   hologres_statistic.hg_table_statistic
WHERE  table_name = '<tablename>'
ORDER BY analyze_timestamp DESC;

Reading the results:

Zero rows returned: ANALYZE has never run on this table.
Multiple rows: The table schema changed (for example, via ADD COLUMN), creating a new schema version. Rows with a lower schema_version are obsolete and can be ignored.

schema_name |    table_name    | schema_version | statistic_version | total_rows |  analyze_timestamp
-------------+------------------+----------------+-------------------+------------+---------------------
 public      | tbl_name_example |             13 |              8580 |      10002 | 2022-04-29 16:03:18
 public      | tbl_name_example |             10 |              8576 |      10002 | 2022-04-29 15:41:20

In this example, the second row has a lower schema_version and is obsolete.

Hologres V0.10 and V1.1 do not automatically clean up expired records. Ignore rows with outdated schema_version values.

To identify tables in your database that lack statistics, use the HG_STATS_MISSING view. For details, see HG_STATS_MISSING view.

FAQ

The `hg_table_statistic` table returns no rows for my table.

ANALYZE has not run on this table yet. Either AUTO ANALYZE is not running, or the table hasn't met the trigger conditions (for example, fewer than 10% of rows changed). Run ANALYZE manually:

ANALYZE <tablename>;

If the table still has no statistics after AUTO ANALYZE has had time to run, or if you suspect a feature issue, submit a ticket.

The `analyze_timestamp` is much older than the current time.

AUTO ANALYZE likely failed or was manually disabled. Run ANALYZE manually to refresh statistics, then submit a ticket to investigate the root cause.