This topic describes how to use the ANALYZE command and the auto-analyze feature.
Analyze
Statistics are crucial for generating an optimal execution plan. Hologres collects sample statistics about data, such as data distribution, table and column characteristics, number of rows, number of columns, field width, cardinality, frequency, maximum and minimum values, high-frequency values, and bucketing distribution features. This information helps the optimizer update cost estimates for operator execution, prune the search space, determine the optimal join order, and estimate memory overhead and the degree of parallelism. This process helps generate a more efficient execution plan.
The ANALYZE command collects statistics about the content of tables in a database. The optimizer then uses these statistics to generate the most efficient query plan and improve query performance.
Syntax
-- Update the statistics of a table. By default, statistics are collected for all columns in the table. analyze <tablename>; -- Update the statistics of specific columns. This command samples more data than the command for updating table statistics, which makes the collected statistics more accurate. This command is mainly used to update statistics for columns that are used in conditional clauses. analyze <tablename>(<colname>, <colname>);Parameters
tablename specifies the name of the table for which you want to update statistics. colname specifies the name of the column for which you want to update statistics.
Syntax details
The two ANALYZE commands are described as follows.
Similarities
Both commands collect column statistics, including the number of rows, column width, most common values (MCVs), histograms, and the number of distinct values (NDV).
Both commands overwrite the statistics of the specified columns but do not affect the statistics of other columns. For example, the
analyze <tablename>(<colname1>);command overwrites (updates) the previously collected statistics for thecolname1column but does not change the statistics for thecolname2column.
Differences
The
analyze <tablename>;command calculates statistics based on sampled data.The
analyze <tablename>(<colname>, <colname>);command runs an `APPROX_COUNT_DISTINCT` calculation for the number of distinct values (NDV) of the specified columns. In many cases, the resulting value is more accurate than sampling but creates more overhead. Therefore, you should use this command only for specific, important columns. Statistics other than the NDV, such as histograms and width, are still collected through sampling.
Therefore, for a table defined as
table (colname1, colname2),analyze table;is not completely equivalent toanalyze table(colname1, colname2);.For frequently used JOIN columns and GROUP BY columns, use the
analyze <tablename>(<colname>, <colname>);command to collect additional statistics.
When to run ANALYZE
You should run the
analyze <tablename>;command in the following scenarios.After many INSERT, UPDATE, or DELETE operations are performed on a table, including data imports.
Before you perform a multi-table join query if performance has degraded. You should analyze the JOIN and GROUP BY columns.
After you execute the
CREATE FOREIGN TABLEcommand, use Analyze to collect statistics information for the foreign table.After you run the
IMPORT FOREIGN SCHEMAcommand to analyze the tables that you will query.After you run the
CREATE EXTERNAL DATABASEcommand to collect statistics for the new foreign table.
Usage notes
In Hologres V0.10 and V1.1, if you query a parent table, you must analyze the parent table. If you query a child table directly, you must analyze the child table. If you query both, you must analyze both. Otherwise, statistics may be incomplete.
If you encounter the following issues, you can run ANALYZE before an import task to improve efficiency.
A multi-table JOIN operation causes an out-of-memory (OOM) error. The typical error message is
Query executor exceeded total memory limitation xxxxx: yyyy bytes used.Low import efficiency. When you query or import data in Hologres, performance is low and running tasks take a long time to complete.
If a table has ultra-wide columns, such as BYTEA data for bitmaps or TEXT data that exceeds 1 KB, the statistics for these columns are not useful and cause sampling to consume more memory. For tables with such columns, avoid running the
analyze <tablename>;command. Instead, useanalyze <tablename>(<colname>, <colname>);to exclude the ultra-wide columns and analyze only the necessary columns, such as JOIN, GROUP BY, and filter columns.Note1 KB is an empirical value. You can adjust this width standard as needed.
Auto-analyze
To reduce repetitive manual ANALYZE operations, Hologres V0.10 and later support the auto-analyze feature. When auto-analyze is enabled, the system automatically determines whether to run ANALYZE on relevant tables in the background based on table creation, data writes, and modifications. This eliminates the need to manually run ANALYZE on tables, which reduces operational complexity and prevents missing statistics from forgotten ANALYZE operations.
Syntax
Check whether auto-analyze is enabled
SHOW hg_enable_start_auto_analyze_worker; -- Syntax for V1.1 and later to check the current enabling status SHOW hg_experimental_enable_start_auto_analyze_worker; -- Syntax for V0.10 to check the current enabling statusA superuser can use the following syntax to enable or disable the feature.
-- DB-level. Takes effect for the entire DB after execution. Syntax for enabling/disabling in V1.1 and later. ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = ON; -- Enable (default) ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = OFF; -- Disable -- DB-level. Takes effect for the entire DB after execution. Syntax for enabling/disabling in V0.10. ALTER DATABASE dbname SET hg_experimental_enable_start_auto_analyze_worker = ON; -- Enable (default) ALTER DATABASE dbname SET hg_experimental_enable_start_auto_analyze_worker = OFF; -- Disable -- DB-level. Enable auto-analyze for an external database. ALTER EXTERNAL DATABASE dbname WITH enable_auto_analyze 'true';
Limits
The auto-analyze feature in Hologres has the following limits:
The auto-analyze feature is supported only in Hologres V0.10 and later. You can view your Hologres instance version on the instance details page in the Hologres console. If your instance version is earlier than V0.10, you must upgrade it. For information about how to resolve common upgrade preparation failures, see Common upgrade preparation failures. You can also join the Hologres DingTalk group for feedback. For more information, see How do I get more online support?.
Only a superuser can enable or disable the auto-analyze feature.
Auto-analyze has the following limits on partitioned tables.
If a child partition table changes and triggers auto-analyze, its parent table is analyzed.
Partitioned tables have a row scan limit. During data sampling, the default maximum number of records scanned is 224 (16,777,216). If the total number of records across all child partition tables exceeds this limit, partition pruning is performed. In this case, only some partitions are sampled, and the total number of records in the sampled partitions does not exceed 16,777,216.
NoteStatistics for partition key columns are always complete and are not affected by pruning. However, this may affect the statistics of columns that have a data distribution similar to the partition key column, such as a column containing the same data as the partition key column. This means some values may not be sampled, and the row count estimate may be inaccurate. If needed, you can search for the DingTalk group ID 32314975 to join the Real-time Data Warehouse Hologres group and contact technical support. The technical support team can evaluate your instance and adjust the maximum number of scanned records.
By default, auto-analyze collects statistics for a maximum of 256 columns. If a table has more than 256 columns, the first 256 columns are selected. You can adjust this value by changing the
hg_experimental_auto_analyze_max_columns_countparameter.By default, a single worker is limited to 4 GB of memory. If ultra-wide columns exist, sampling may exceed this memory limit and cause ANALYZE to fail. You can adjust the
auto_analyze_work_memory_mbparameter to change this limit, but be aware of the impact on system memory. Higher instance types have more workers and a larger available memory limit for auto-analyze.Starting from Hologres V3.1.0, you can use ANALYZE and auto-analyze to collect statistics for foreign tables within an external database.
ANALYZE and auto-analyze are not supported for HMS foreign tables.
How auto-analyze works
After you enable auto-analyze, the system background periodically checks whether any tables need to be analyzed.
Standard tables (internal tables, including non-partitioned and partitioned tables)
Every minute, the system checks for recent actions on tables. These actions are primarily DML operations, such as INSERT, UPDATE, and DELETE, that can change the data volume. If any of the following conditions are met, the system triggers an ANALYZE operation for the table to collect statistics.
A DML operation has completed on the table, and the number of changed rows exceeds 10% of the total rows in the table. If the table is a child partition table, this threshold applies to the number of rows in that specific partition.
The table is cleared using the `TRUNCATE TABLE` command.
The table's Data Definition Language (DDL) changes. Examples include creating a table with `CREATE TABLE` or modifying the table schema with `ALTER TABLE`. This trigger does not include modifying table properties with `CALL SET_TABLE_PROPERTY`.
Every 10 minutes, the system checks for data changes in all internal tables. If the number of changed rows exceeds 10% since the last check, an ANALYZE operation is triggered for the table in the background.
NoteThis step detects data updated by non-explicit DML operations, such as real-time writes from Flink, Data Integration, or HoloClient.
Foreign tables
Every 4 hours, the system periodically checks for metadata or data changes in foreign tables. If the following condition is met, the system triggers an ANALYZE operation for the table to collect statistics.
The corresponding table in the external system, such as a MaxCompute foreign table or a DLF foreign table, has changed within the inspection interval (for example, the last 4 hours). A change is detected if the
last_modify_timeof the corresponding MaxCompute table falls within this interval.
NoteThe inspection and execution are part of the same scheduling task. Therefore, the start of the next inspection depends on the completion of the ANALYZE execution. The next inspection can begin as long as the scheduling epoch has passed since the previous inspection started.
Configuration parameters
After you enable auto-analyze, the system, by default, automatically performs periodic checks to determine which tables need to be analyzed. It then performs sampling calculations to collect statistics, which consumes some system resources.
In some scenarios, the default mechanism may not be suitable. For example, in scenarios where data is not written or updated frequently, you can reduce the frequency of auto-analyze by modifying the default parameters. You can change the default parameters as needed to optimize performance.
NoteOnly a superuser can adjust the default behavior of auto-analyze. All parameters must be set at the database level and take effect within one minute.
Syntax
-- A superuser modifies the default value of an auto-analyze parameter. ALTER DATABASE <dbname> SET <GUC>=<values>;dbname specifies the database name. GUC specifies the parameter name. values specifies the parameter value.
Parameter list
Parameter
Description
Supported versions
Default value
Example
autovacuum_naptime
The interval at which the system checks for the latest actions on tables. Unit: seconds (s).
V1.1.0 and later
NoteThis parameter must be adjusted by the backend. To make adjustments, search for the DingTalk group ID 32314975 to join the Real-time Data Warehouse Hologres group and submit a request.
60s
ALTER DATABASE <dbname> SET autovacuum_naptime = 60;ALTER DATABASE <dbname> SET autovacuum_naptime = '60s';ALTER DATABASE <dbname> SET autovacuum_naptime = '10min';
hg_auto_check_table_changes_interval
The interval at which the system checks for data changes in all internal tables. Unit: seconds (s).
V1.1.0 and later
600s (10 min)
-- Command syntax for V1.1 and later ALTER DATABASE <dbname> SET hg_auto_check_table_changes_interval = '600s'; -- Command syntax for V0.10 ALTER DATABASE <dbname> SET hg_experimental_auto_check_table_changes_interval = '600s';hg_auto_check_foreign_table_changes_interval
The interval at which the system checks for data changes in all foreign tables. Unit: seconds (s).
V1.1.0 and later
14400s (4 hours)
-- Command syntax for V1.1 and later ALTER DATABASE <dbname> SET hg_auto_check_foreign_table_changes_interval = '14400s'; -- Command syntax for V0.10 ALTER DATABASE <dbname> SET hg_experimental_auto_check_foreign_table_changes_interval = '14400s';hg_experimental_auto_analyze_max_columns_count
The number of columns for which statistics are automatically collected.
V1.1.0 and later
256
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_max_columns_count =300;auto_analyze_work_memory_mb
The memory limit for auto-analyze on a single table. Unit: MB.
V1.1.54 and later
The default is 4,096 MB (4 GB) for a single worker. The higher the instance type, the more workers are available, and the larger the actual memory limit.
Change the memory limit for auto-analyze on a single table to 9 GB.
ALTER DATABASE <dbname> SETauto_analyze_work_memory_mb =9216;hg_experimental_auto_analyze_start_time
The start time for auto-analyze to run each day.
NoteThe time zone must be the same as end_time, and the start time must be earlier than or equal to the end_time.
V1.1.54 and later
00:00 +0800
Change the settings so that auto-analyze runs only from 00:00 to 06:00. This is suitable for scenarios where data in internal and foreign tables does not change during the day and does not require analysis.
ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_start_time = '00:00 +0800';ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_end_time = '06:00 +0800';
hg_experimental_auto_analyze_end_time
The end time for auto-analyze to run each day.
V1.1.54 and later
23:59 +0800
autovacuum_enabled
The enabling status of auto-analyze for a table.
V1.1.54 and later
true, which means it is enabled for all tables by default.
Disable auto-analyze for a specific table. Future ANALYZE operations will skip this table.
NoteYou can use only the following command to disable auto-analyze for a Hologres internal table.
ALTER TABLE <tablename> SET (autovacuum_enabled = false);
Other optimizations
Hologres V3.1 introduces the Fast Rows feature. If Hologres finds that a table in a query lacks statistics, it directly retrieves the number of rows for that table from the storage engine. The cost of retrieving the row count is about 10 ms. The commands are as follows.
-- DB-level. Takes effect for the entire DB after execution. Syntax for enabling.
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = ON; -- Enable (disabled by default)
-- DB-level. Takes effect for the entire DB after execution. Syntax for disabling.
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = OFF; -- Disable (disabled by default)If the Fast Rows feature is disabled, you can reduce the likelihood of an incorrect plan in the following two cases.
If a table has no statistics, the query plan shows `rows=1000`. This indicates that statistics are missing, and the plan is generated based on an estimate of 1,000 rows.
If the system considers the table to be empty, the query plan shows `rows=1`. This indicates that statistics are missing, and the plan is generated based on an estimate of one row.

When the Fast Rows feature is enabled and a table has no statistics, the query plan shows a row count other than 1,000. This is because the system has called the storage engine to retrieve the actual number of rows for the table.

Query statistics
Table statistics are recorded in the hologres_statistic.hg_table_statistic table. You can query this table to check the status of ANALYZE operations using the following command.
To query information about the most recent ANALYZE operation, sort the results by analyze_timestamp.
SELECT schema_name, -- Table schema
table_name, -- Table name
schema_version, -- Table version
statistic_version, -- Statistics version from the last ANALYZE
total_rows, -- Number of rows from the last ANALYZE
analyze_timestamp -- End time of the last ANALYZE
FROM hologres_statistic.hg_table_statistic
WHERE table_name = '<tablename>'
ORDER BY analyze_timestamp DESC;Each table has
0 to nrecords in the hologres_statistic.hg_table_statistic table. Zero records means that ANALYZE has never been run. One or more records means that ANALYZE has been run at least once.If there are two or more records for a table, their schema_version values will be different. This occurs when the table schema has changed. For example, running a command such as
ADD COLUMNcreates a new schema version. A new statistics record is added, and the record corresponding to the old schema_version is no longer used.The following example shows a query result where the same table has two records. The schema_version of the second record is lower than that of the first. Therefore, the second record is obsolete and can be ignored.
schema_name | table_name | schema_version | statistic_version | total_rows | analyze_timestamp -------------+------------------+----------------+-------------------+------------+--------------------- public | tbl_name_example | 13 | 8580 | 10002 | 2022-04-29 16:03:18 public | tbl_name_example | 10 | 8576 | 10002 | 2022-04-29 15:41:20 (2 rows)Hologres V0.10 and V1.1 do not clean up historical expired records from the hg_table_statistic table. You can ignore this old data.
View tables with missing statistics
You can use the HG_STATS_MISSING view to identify which tables in the current database are missing statistics. For more information, see HG_STATS_MISSING View.
FAQ
If you encounter the following issues, auto-analyze may not be working correctly. Follow the suggested solutions to resolve them.
The table has no statistics records
Symptom: When you query the hologres_statistic.hg_table_statistic table for a table's statistics, no data is returned.
Possible causes:
Auto-analyze is not working, or the table does not meet the trigger conditions for auto-analyze.
There is an issue with the auto-analyze feature itself. You can submit a ticket to investigate the cause.
Solution: Manually trigger an ANALYZE operation.
analyze_timestampis too oldSymptom: In the query result, the
analyze_timestampis much earlier than the current time. This means that ANALYZE has not been run for a long time.Possible causes:
Auto-analyze failed to run.
Auto-analyze was manually disabled.
Solution: First, manually trigger an ANALYZE operation. Then, you can submit a ticket to investigate the cause.