This topic describes how to use the ANALYZE statement and the auto-analyze feature.

Analyze

Statistics determine whether a suitable execution plan can be generated. Hologres collects statistics on sample data to analyze the data to be queried with respect to the following aspects: data distribution and characteristics, table statistics, column statistics, number of rows, number of columns, column width, cardinality, frequency, maximum value, minimum value, long key value, and bucket distribution and characteristics. These statistics can help update the estimated cost for operator execution, narrow down the query range, and estimate the optimal join order, the memory overhead, and the degree of parallelism (DOP). This way, the optimizer can generate an optimal execution plan.

You can use the ANALYZE statement to collect statistics of table data in your database. Then, the query optimizer of Hologres generates an optimal query plan based on the statistics to improve query efficiency.
  • Syntax
    -- Update the statistics on a table. By default, statistics on all columns in the table are collected.
    analyze <tablename>;
    
    -- Update the statistics on a column. This statement collects statistics on more sample data and is more precise than that used to update the statistics on a table.
    analyze <tablename>(<colname>, <colname>);
  • Parameters

    The tablename parameter specifies the name of the table for which to update statistics, and the colname parameter specifies the name of the column for which to update statistics.

In the following scenarios, we recommend that you execute the analyze <tablename>; statement on a table:
  • Data is imported into the table.
  • A large number of INSERT, UPDATE, and DELETE operations are performed on the table.
  • Statistics need to be collected for both internal and foreign tables.
  • Statistics need to be collected for a parent partitioned table.
  • If you encounter one of the following problems, you must execute the ANALYZE <tablename> statement before you run an import task. This can improve efficiency.
    • An out-of-memory (OOM) exception occurs when multiple tables are joined. The Query executor exceeded total memory limitation xxxxx: yyyy bytes used error is reported.
    • Import efficiency is low. When you query or import data in Hologres, tasks time out without completing. This results in low efficiency.
Table statistics are recorded in a table named hologres_statistic.hg_table_statistic. You can execute the following statement to understand the execution status of the ANALYZE statement:
SELECT schema_name, table_name, sample_rows, analyze_timestamp
FROM hologres_statistic.hg_table_statistic WHERE table_name='<tablename>';

Auto-analyze

Hologres V0.10 and later provide the auto-analyze feature so that you do not need to manually execute the ANALYZE statement to analyze tables. You can enable the auto-analyze feature by executing a statement. After the auto-analyze feature is enabled, Hologres determines whether to automatically analyze tables based on the write, modify, and query operations that you performed on tables. This simplifies table management.

Limits

When you use the auto-analyze feature of Hologres, take note of the following limits:
  • The auto-analyze feature is supported only in Hologres V0.10 and later. You can view the version of your Hologres instance on the instance details page in the Hologres console. If the version of your instance is earlier than V0.10, submit a ticket to upgrade your instance.
  • The auto-analyze feature is enabled by default in Hologres V1.1 and later. In Hologres V0.10, the auto-analyze feature is disabled by default, and you must execute a statement to enable the feature.
  • Only the superuser can enable or disable the auto-analyze feature.
  • The auto-analyze feature takes effect on all tables in a database. You cannot enable or disable the auto-analyze feature for specific tables.

Enable the auto-analyze feature

You can execute the following statement to check whether the auto-analyze feature is enabled:
-- Syntax in Hologres V1.1
show hg_enable_start_auto_analyze_worker;

-- Syntax in Hologres V0.10
show hg_experimental_enable_start_auto_analyze_worker;
You can execute a statement to enable the auto-analyze feature.
  • In Hologres V1.1, the auto-analyze feature is enabled by default. If the auto-analyze feature is disabled, you can execute the following statement to enable the feature:
    -- The feature setting takes effect on the specified database. Use the following syntax in Hologres V1.1 and later:
    alter database <databasename> set hg_enable_start_auto_analyze_worker = on;
  • In Hologres V0.10, the auto-analyze feature is disabled by default. You can execute the following statement to enable the feature:
    -- The feature setting takes effect on the specified database. Use the following syntax in Hologres V0.10:
    alter database <databasename> set hg_experimental_enable_start_auto_analyze_worker = on;
The auto-analyze feature is triggered in the following scenarios. If you are the superuser, you can modify the default settings of the parameters based on your business requirements. This allows you to optimize the feature performance in a fine-grained manner. For more information about how this feature is triggered and how to configure the parameters, see the "Modify the default settings of the auto-analyze feature" section of this topic.
  • A data manipulation language (DML) statement is executed.
  • A table is created.
  • A foreign table is updated.
  • The feature is triggered based on the following default settings:
    • An internal table is automatically analyzed every 10 minutes.
    • A foreign table is automatically analyzed every 4 hours.
    • A maximum of 224 (16,777,216) entries can be scanned each time a table is automatically analyzed. If the table contains more than 224 entries, the excess entries are not scanned.
    • If no data has been written to an internal table in real time from Blink-based Realtime Compute, Realtime Compute for Apache Flink, or Apache Flink for more than three days, the internal table is ignored by the auto-analyze feature.

Disable the auto-analyze feature

In some business scenarios such as a scenario in which you use fixed query statements, the auto-analyze feature may not be required. In this case, if you are the superuser of a Hologres instance, you can disable the auto-analyze feature by executing one of the following statements:
-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V1.1 and later:
alter database <databasename> set hg_enable_start_auto_analyze_worker = off;

-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V0.10:
alter database <databasename> set hg_experimental_enable_start_auto_analyze_worker = off;

Modify the default settings of the auto-analyze feature

After you enable the auto-analyze feature, Hologres inspects tables and triggers the feature based on the default settings. This consumes resources. The default settings may not apply to specific business scenarios. For example, if data is not frequently written to your Hologres instance, you can decrease the frequency of automatic analysis.

You can modify the default settings of the parameters based on your business requirements. This allows you to optimize the feature performance in a fine-grained manner.

  • The ALTER DATABASE <databasename> set xxx statement takes effect on the specified database. You must be the superuser to execute this statement.
  • The SET xxx statement takes effect only on the specified session, and must be executed together with other SQL statements.
  • The default time unit is seconds (s). If the default unit is used, the unit s must be omitted.
  • In some business scenarios, a smaller or larger time unit is required. For example, it is inconvenient to express days in seconds. To resolve this issue, Hologres provides more time units, including milliseconds (ms), minutes (min), hours (h), and days (day), in addition to the default time unit seconds (s). To use these time units, you must enclose parameter values in single quotation marks (' '). Examples:
    -- Syntax in Hologres V1.1 and later
    set hg_auto_check_table_changes_interval = '10min';
    
    -- Syntax in Hologres V0.10
    set hg_experimental_auto_check_table_changes_interval = '10min';

Examples

  • Execute one of the following statements to specify the interval at which Hologres automatically analyzes internal tables. By default, an internal table is automatically analyzed every 10 minutes.
    • Sample statements that use s as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_auto_check_table_changes_interval = 600;
      
      -- Syntax in Hologres V0.10
      set hg_experimental_auto_check_table_changes_interval = 600;
    • Sample statements that use min as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_auto_check_table_changes_interval = '10min';
      
      -- Syntax in Hologres V0.10
      set hg_experimental_auto_check_table_changes_interval = '10min';
  • Execute one of the following statements to specify the interval at which Hologres automatically analyzes foreign tables. By default, a foreign table is automatically analyzed every 4 hours.
    • Sample statements that use s as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_auto_check_foreign_table_changes_interval = 14400;
      
      -- Syntax in Hologres V0.10
      set hg_experimental_auto_check_foreign_table_changes_interval = 14400;
    • Sample statements that use min as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_auto_check_foreign_table_changes_interval = '4h';
      
      -- Syntax in Hologres V0.10
      set hg_experimental_auto_check_foreign_table_changes_interval = '4h';
    -- The feature setting takes effect only on the specified session. 
    set hg_experimental_auto_check_foreign_table_changes_interval = '240min';
    
    -- The feature setting takes effect on the specified database. Replace <databasename> with the name of your database. 
    alter database <databasename> set hg_experimental_auto_check_foreign_table_changes_interval = '240min';
  • Execute one of the following statements to specify the maximum latency allowed for the data that is written from Blink-based Realtime Compute, Realtime Compute for Apache Flink, or Apache Flink to an internal table. By default, if no such data has been written to the internal table in real time for more than three days, the internal table is ignored by the auto-analyze feature.
    • Sample statements that use s as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_fixed_api_modify_max_delay_interval = 259200;
      
      -- Syntax in Hologres V0.10
      set hg_experimental_fixed_api_modify_max_delay_interval = 259200;
    • Sample statements that use day as the time unit
      -- Syntax in Hologres V1.1 and later
      set hg_fixed_api_modify_max_delay_interval = '3day';
      
      -- Syntax in Hologres V0.10
      set hg_experimental_fixed_api_modify_max_delay_interval = '3day';
                                  
  • Execute one of the following statements to specify the maximum number of entries that can be scanned each time a table is automatically analyzed. The default value is 224 (16,777,216). In the default setting, if a table contains more than 224 entries, the excess entries are not scanned.
    Note For a partitioned table, the limit of 224 entries in the default setting is imposed on the parent table.
    -- Syntax in Hologres V1.1 and later
    set hg_auto_analyze_max_sample_row_count = 16777216;
    
    -- Syntax in Hologres V0.10
    set hg_experimental_auto_analyze_max_sample_row_count = 16777216;