All Products
Search
Document Center

Hologres:ANALYZE and auto-analyze

Last Updated:Sep 17, 2025

This topic describes how to use the ANALYZE command and the auto-analyze feature.

Analyze

Statistics are crucial for generating an optimal execution plan. Hologres collects sample statistics about data, such as data distribution, table and column characteristics, number of rows, number of columns, field width, cardinality, frequency, maximum and minimum values, high-frequency values, and bucketing distribution features. This information helps the optimizer update cost estimates for operator execution, prune the search space, determine the optimal join order, and estimate memory overhead and the degree of parallelism. This process helps generate a more efficient execution plan.

The ANALYZE command collects statistics about the content of tables in a database. The optimizer then uses these statistics to generate the most efficient query plan and improve query performance.

  • Syntax

    -- Update the statistics of a table. By default, statistics are collected for all columns in the table.
    analyze <tablename>;
    
    -- Update the statistics of specific columns. This command samples more data than the command for updating table statistics, which makes the collected statistics more accurate. This command is mainly used to update statistics for columns that are used in conditional clauses.
    analyze <tablename>(<colname>, <colname>);
  • Parameters

    tablename specifies the name of the table for which you want to update statistics. colname specifies the name of the column for which you want to update statistics.

  • Syntax details

    The two ANALYZE commands are described as follows.

    • Similarities

      • Both commands collect column statistics, including the number of rows, column width, most common values (MCVs), histograms, and the number of distinct values (NDV).

      • Both commands overwrite the statistics of the specified columns but do not affect the statistics of other columns. For example, the analyze <tablename>(<colname1>); command overwrites (updates) the previously collected statistics for the colname1 column but does not change the statistics for the colname2 column.

    • Differences

      • The analyze <tablename>; command calculates statistics based on sampled data.

      • The analyze <tablename>(<colname>, <colname>); command runs an `APPROX_COUNT_DISTINCT` calculation for the number of distinct values (NDV) of the specified columns. In many cases, the resulting value is more accurate than sampling but creates more overhead. Therefore, you should use this command only for specific, important columns. Statistics other than the NDV, such as histograms and width, are still collected through sampling.

      Therefore, for a table defined as table (colname1, colname2), analyze table; is not completely equivalent to analyze table(colname1, colname2);.

      For frequently used JOIN columns and GROUP BY columns, use the analyze <tablename>(<colname>, <colname>); command to collect additional statistics.

  • When to run ANALYZE

    You should run the analyze <tablename>; command in the following scenarios.

    • After many INSERT, UPDATE, or DELETE operations are performed on a table, including data imports.

    • Before you perform a multi-table join query if performance has degraded. You should analyze the JOIN and GROUP BY columns.

    • After you execute the CREATE FOREIGN TABLE command, use Analyze to collect statistics information for the foreign table.

    • After you run the IMPORT FOREIGN SCHEMA command to analyze the tables that you will query.

    • After you run the CREATE EXTERNAL DATABASE command to collect statistics for the new foreign table.

  • Usage notes

    • In Hologres V0.10 and V1.1, if you query a parent table, you must analyze the parent table. If you query a child table directly, you must analyze the child table. If you query both, you must analyze both. Otherwise, statistics may be incomplete.

    • If you encounter the following issues, you can run ANALYZE before an import task to improve efficiency.

      • A multi-table JOIN operation causes an out-of-memory (OOM) error. The typical error message is Query executor exceeded total memory limitation xxxxx: yyyy bytes used.

      • Low import efficiency. When you query or import data in Hologres, performance is low and running tasks take a long time to complete.

    • If a table has ultra-wide columns, such as BYTEA data for bitmaps or TEXT data that exceeds 1 KB, the statistics for these columns are not useful and cause sampling to consume more memory. For tables with such columns, avoid running the analyze <tablename>; command. Instead, use analyze <tablename>(<colname>, <colname>); to exclude the ultra-wide columns and analyze only the necessary columns, such as JOIN, GROUP BY, and filter columns.

      Note

      1 KB is an empirical value. You can adjust this width standard as needed.

Auto-analyze

To reduce repetitive manual ANALYZE operations, Hologres V0.10 and later support the auto-analyze feature. When auto-analyze is enabled, the system automatically determines whether to run ANALYZE on relevant tables in the background based on table creation, data writes, and modifications. This eliminates the need to manually run ANALYZE on tables, which reduces operational complexity and prevents missing statistics from forgotten ANALYZE operations.

  • Syntax

    • Check whether auto-analyze is enabled

      SHOW hg_enable_start_auto_analyze_worker;  -- Syntax for V1.1 and later to check the current enabling status
      
      SHOW hg_experimental_enable_start_auto_analyze_worker;  -- Syntax for V0.10 to check the current enabling status
    • A superuser can use the following syntax to enable or disable the feature.

      -- DB-level. Takes effect for the entire DB after execution. Syntax for enabling/disabling in V1.1 and later.
      ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = ON;  -- Enable (default)
      ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = OFF; -- Disable
      
      -- DB-level. Takes effect for the entire DB after execution. Syntax for enabling/disabling in V0.10.
      ALTER DATABASE dbname SET hg_experimental_enable_start_auto_analyze_worker = ON;  -- Enable (default)
      ALTER DATABASE dbname SET hg_experimental_enable_start_auto_analyze_worker = OFF; -- Disable
      
      -- DB-level. Enable auto-analyze for an external database.
      ALTER EXTERNAL DATABASE dbname WITH enable_auto_analyze 'true';
  • Limits

    The auto-analyze feature in Hologres has the following limits:

    • The auto-analyze feature is supported only in Hologres V0.10 and later. You can view your Hologres instance version on the instance details page in the Hologres console. If your instance version is earlier than V0.10, you must upgrade it. For information about how to resolve common upgrade preparation failures, see Common upgrade preparation failures. You can also join the Hologres DingTalk group for feedback. For more information, see How do I get more online support?.

    • Only a superuser can enable or disable the auto-analyze feature.

    • Auto-analyze has the following limits on partitioned tables.

      • If a child partition table changes and triggers auto-analyze, its parent table is analyzed.

      • Partitioned tables have a row scan limit. During data sampling, the default maximum number of records scanned is 224 (16,777,216). If the total number of records across all child partition tables exceeds this limit, partition pruning is performed. In this case, only some partitions are sampled, and the total number of records in the sampled partitions does not exceed 16,777,216.

        Note

        Statistics for partition key columns are always complete and are not affected by pruning. However, this may affect the statistics of columns that have a data distribution similar to the partition key column, such as a column containing the same data as the partition key column. This means some values may not be sampled, and the row count estimate may be inaccurate. If needed, you can search for the DingTalk group ID 32314975 to join the Real-time Data Warehouse Hologres group and contact technical support. The technical support team can evaluate your instance and adjust the maximum number of scanned records.

    • By default, auto-analyze collects statistics for a maximum of 256 columns. If a table has more than 256 columns, the first 256 columns are selected. You can adjust this value by changing the hg_experimental_auto_analyze_max_columns_count parameter.

    • By default, a single worker is limited to 4 GB of memory. If ultra-wide columns exist, sampling may exceed this memory limit and cause ANALYZE to fail. You can adjust the auto_analyze_work_memory_mb parameter to change this limit, but be aware of the impact on system memory. Higher instance types have more workers and a larger available memory limit for auto-analyze.

    • Starting from Hologres V3.1.0, you can use ANALYZE and auto-analyze to collect statistics for foreign tables within an external database.

    • ANALYZE and auto-analyze are not supported for HMS foreign tables.

  • How auto-analyze works

    After you enable auto-analyze, the system background periodically checks whether any tables need to be analyzed.

    • Standard tables (internal tables, including non-partitioned and partitioned tables)

      • Every minute, the system checks for recent actions on tables. These actions are primarily DML operations, such as INSERT, UPDATE, and DELETE, that can change the data volume. If any of the following conditions are met, the system triggers an ANALYZE operation for the table to collect statistics.

        • A DML operation has completed on the table, and the number of changed rows exceeds 10% of the total rows in the table. If the table is a child partition table, this threshold applies to the number of rows in that specific partition.

        • The table is cleared using the `TRUNCATE TABLE` command.

        • The table's Data Definition Language (DDL) changes. Examples include creating a table with `CREATE TABLE` or modifying the table schema with `ALTER TABLE`. This trigger does not include modifying table properties with `CALL SET_TABLE_PROPERTY`.

      • Every 10 minutes, the system checks for data changes in all internal tables. If the number of changed rows exceeds 10% since the last check, an ANALYZE operation is triggered for the table in the background.

        Note

        This step detects data updated by non-explicit DML operations, such as real-time writes from Flink, Data Integration, or HoloClient.

    • Foreign tables

      Every 4 hours, the system periodically checks for metadata or data changes in foreign tables. If the following condition is met, the system triggers an ANALYZE operation for the table to collect statistics.

      The corresponding table in the external system, such as a MaxCompute foreign table or a DLF foreign table, has changed within the inspection interval (for example, the last 4 hours). A change is detected if the last_modify_time of the corresponding MaxCompute table falls within this interval.

    Note

    The inspection and execution are part of the same scheduling task. Therefore, the start of the next inspection depends on the completion of the ANALYZE execution. The next inspection can begin as long as the scheduling epoch has passed since the previous inspection started.

  • Configuration parameters

    After you enable auto-analyze, the system, by default, automatically performs periodic checks to determine which tables need to be analyzed. It then performs sampling calculations to collect statistics, which consumes some system resources.

    In some scenarios, the default mechanism may not be suitable. For example, in scenarios where data is not written or updated frequently, you can reduce the frequency of auto-analyze by modifying the default parameters. You can change the default parameters as needed to optimize performance.

    Note

    Only a superuser can adjust the default behavior of auto-analyze. All parameters must be set at the database level and take effect within one minute.

    • Syntax

      -- A superuser modifies the default value of an auto-analyze parameter.
      ALTER DATABASE <dbname> SET <GUC>=<values>;

      dbname specifies the database name. GUC specifies the parameter name. values specifies the parameter value.

    • Parameter list

      Parameter

      Description

      Supported versions

      Default value

      Example

      autovacuum_naptime

      The interval at which the system checks for the latest actions on tables. Unit: seconds (s).

      V1.1.0 and later

      Note

      This parameter must be adjusted by the backend. To make adjustments, search for the DingTalk group ID 32314975 to join the Real-time Data Warehouse Hologres group and submit a request.

      60s

      • ALTER DATABASE <dbname> SET autovacuum_naptime = 60;

      • ALTER DATABASE <dbname> SET autovacuum_naptime = '60s';

      • ALTER DATABASE <dbname> SET autovacuum_naptime = '10min';

      hg_auto_check_table_changes_interval

      The interval at which the system checks for data changes in all internal tables. Unit: seconds (s).

      V1.1.0 and later

      600s (10 min)

      -- Command syntax for V1.1 and later
      ALTER DATABASE <dbname> SET hg_auto_check_table_changes_interval = '600s';
      -- Command syntax for V0.10
      ALTER DATABASE <dbname> SET hg_experimental_auto_check_table_changes_interval = '600s';

      hg_auto_check_foreign_table_changes_interval

      The interval at which the system checks for data changes in all foreign tables. Unit: seconds (s).

      V1.1.0 and later

      14400s (4 hours)

      -- Command syntax for V1.1 and later
      ALTER DATABASE <dbname> SET hg_auto_check_foreign_table_changes_interval = '14400s';
      -- Command syntax for V0.10
      ALTER DATABASE <dbname> SET hg_experimental_auto_check_foreign_table_changes_interval = '14400s';

      hg_experimental_auto_analyze_max_columns_count

      The number of columns for which statistics are automatically collected.

      V1.1.0 and later

      256

      ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_max_columns_count =300;

      auto_analyze_work_memory_mb

      The memory limit for auto-analyze on a single table. Unit: MB.

      V1.1.54 and later

      The default is 4,096 MB (4 GB) for a single worker. The higher the instance type, the more workers are available, and the larger the actual memory limit.

      Change the memory limit for auto-analyze on a single table to 9 GB.

      ALTER DATABASE <dbname> SETauto_analyze_work_memory_mb =9216;

      hg_experimental_auto_analyze_start_time

      The start time for auto-analyze to run each day.

      Note

      The time zone must be the same as end_time, and the start time must be earlier than or equal to the end_time.

      V1.1.54 and later

      00:00 +0800

      Change the settings so that auto-analyze runs only from 00:00 to 06:00. This is suitable for scenarios where data in internal and foreign tables does not change during the day and does not require analysis.

      • ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_start_time = '00:00 +0800';
      • ALTER DATABASE <dbname> SET hg_experimental_auto_analyze_end_time = '06:00 +0800';

      hg_experimental_auto_analyze_end_time

      The end time for auto-analyze to run each day.

      V1.1.54 and later

      23:59 +0800

      autovacuum_enabled

      The enabling status of auto-analyze for a table.

      V1.1.54 and later

      true, which means it is enabled for all tables by default.

      Disable auto-analyze for a specific table. Future ANALYZE operations will skip this table.

      Note

      You can use only the following command to disable auto-analyze for a Hologres internal table.

      ALTER TABLE <tablename> SET (autovacuum_enabled = false);

Other optimizations

Hologres V3.1 introduces the Fast Rows feature. If Hologres finds that a table in a query lacks statistics, it directly retrieves the number of rows for that table from the storage engine. The cost of retrieving the row count is about 10 ms. The commands are as follows.

-- DB-level. Takes effect for the entire DB after execution. Syntax for enabling.
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = ON;  -- Enable (disabled by default)

-- DB-level. Takes effect for the entire DB after execution. Syntax for disabling.
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = OFF;  -- Disable (disabled by default)
  • If the Fast Rows feature is disabled, you can reduce the likelihood of an incorrect plan in the following two cases.

    • If a table has no statistics, the query plan shows `rows=1000`. This indicates that statistics are missing, and the plan is generated based on an estimate of 1,000 rows.

    • If the system considers the table to be empty, the query plan shows `rows=1`. This indicates that statistics are missing, and the plan is generated based on an estimate of one row.

    image

  • When the Fast Rows feature is enabled and a table has no statistics, the query plan shows a row count other than 1,000. This is because the system has called the storage engine to retrieve the actual number of rows for the table.

    image

Query statistics

Table statistics are recorded in the hologres_statistic.hg_table_statistic table. You can query this table to check the status of ANALYZE operations using the following command.

Note

To query information about the most recent ANALYZE operation, sort the results by analyze_timestamp.

SELECT schema_name,                -- Table schema
       table_name,                 -- Table name
       schema_version,             -- Table version
       statistic_version,          -- Statistics version from the last ANALYZE
       total_rows,                 -- Number of rows from the last ANALYZE
       analyze_timestamp           -- End time of the last ANALYZE
FROM   hologres_statistic.hg_table_statistic
WHERE  table_name = '<tablename>'
ORDER BY analyze_timestamp DESC;
  • Each table has 0 to n records in the hologres_statistic.hg_table_statistic table. Zero records means that ANALYZE has never been run. One or more records means that ANALYZE has been run at least once.

  • If there are two or more records for a table, their schema_version values will be different. This occurs when the table schema has changed. For example, running a command such as ADD COLUMN creates a new schema version. A new statistics record is added, and the record corresponding to the old schema_version is no longer used.

  • The following example shows a query result where the same table has two records. The schema_version of the second record is lower than that of the first. Therefore, the second record is obsolete and can be ignored.

     schema_name |    table_name    | schema_version | statistic_version | total_rows |  analyze_timestamp
    -------------+------------------+----------------+-------------------+------------+---------------------
     public      | tbl_name_example |             13 |              8580 |      10002 | 2022-04-29 16:03:18
     public      | tbl_name_example |             10 |              8576 |      10002 | 2022-04-29 15:41:20
    (2 rows)
  • Hologres V0.10 and V1.1 do not clean up historical expired records from the hg_table_statistic table. You can ignore this old data.

View tables with missing statistics

You can use the HG_STATS_MISSING view to identify which tables in the current database are missing statistics. For more information, see HG_STATS_MISSING View.

FAQ

If you encounter the following issues, auto-analyze may not be working correctly. Follow the suggested solutions to resolve them.

  • The table has no statistics records

    • Symptom: When you query the hologres_statistic.hg_table_statistic table for a table's statistics, no data is returned.

    • Possible causes:

      • Auto-analyze is not working, or the table does not meet the trigger conditions for auto-analyze.

      • There is an issue with the auto-analyze feature itself. You can submit a ticket to investigate the cause.

    • Solution: Manually trigger an ANALYZE operation.

  • analyze_timestamp is too old

    • Symptom: In the query result, the analyze_timestamp is much earlier than the current time. This means that ANALYZE has not been run for a long time.

    • Possible causes:

      • Auto-analyze failed to run.

      • Auto-analyze was manually disabled.

    • Solution: First, manually trigger an ANALYZE operation. Then, you can submit a ticket to investigate the cause.