This topic describes how to use the ANALYZE statement and the auto-analyze feature.
Analyze
Statistics determine whether a suitable execution plan can be generated. Hologres collects statistics on sample data to analyze the data to be queried with respect to the following aspects: data distribution and characteristics, table statistics, column statistics, number of rows, number of columns, column width, cardinality, frequency, maximum value, minimum value, long key value, and bucket distribution and characteristics. These statistics can help update the estimated cost for operator execution, narrow down the query range, and estimate the optimal join order, the memory overhead, and the degree of parallelism (DOP). This way, the optimizer can generate an optimal execution plan.
- Syntax
-- Update the statistics on a table. By default, statistics on all columns in the table are collected. analyze <tablename>; -- Update the statistics on a column. This statement collects statistics on more sample data and is more precise than that used to update the statistics on a table. analyze <tablename>(<colname>, <colname>);
- Parameters
The tablename parameter specifies the name of the table for which to update statistics, and the colname parameter specifies the name of the column for which to update statistics.
analyze <tablename>;
statement on a table:
- Data is imported into the table.
- A large number of INSERT, UPDATE, and DELETE operations are performed on the table.
- Statistics need to be collected for both internal and foreign tables.
- Statistics need to be collected for a parent partitioned table.
- If you encounter one of the following problems, you must execute the ANALYZE <tablename>
statement before you run an import task. This can improve efficiency.
- An out-of-memory (OOM) exception occurs when multiple tables are joined. The
Query executor exceeded total memory limitation xxxxx: yyyy bytes used
error is reported. - Import efficiency is low. When you query or import data in Hologres, tasks time out without completing. This results in low efficiency.
- An out-of-memory (OOM) exception occurs when multiple tables are joined. The
SELECT schema_name, table_name, sample_rows, analyze_timestamp
FROM hologres_statistic.hg_table_statistic WHERE table_name='<tablename>';
Auto-analyze
Hologres V0.10 and later provide the auto-analyze feature so that you do not need to manually execute the ANALYZE statement to analyze tables. You can enable the auto-analyze feature by executing a statement. After the auto-analyze feature is enabled, Hologres determines whether to automatically analyze tables based on the write, modify, and query operations that you performed on tables. This simplifies table management.
Limits
- The auto-analyze feature is supported only in Hologres V0.10 and later. You can view the version of your Hologres instance on the instance details page in the Hologres console. If the version of your instance is earlier than V0.10, submit a ticket to upgrade your instance.
- The auto-analyze feature is enabled by default in Hologres V1.1 and later. In Hologres V0.10, the auto-analyze feature is disabled by default, and you must execute a statement to enable the feature.
- Only the superuser can enable or disable the auto-analyze feature.
- The auto-analyze feature takes effect on all tables in a database. You cannot enable or disable the auto-analyze feature for specific tables.
Enable the auto-analyze feature
-- Syntax in Hologres V1.1
show hg_enable_start_auto_analyze_worker;
-- Syntax in Hologres V0.10
show hg_experimental_enable_start_auto_analyze_worker;
- In Hologres V1.1, the auto-analyze feature is enabled by default. If the auto-analyze
feature is disabled, you can execute the following statement to enable the feature:
-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V1.1 and later: alter database <databasename> set hg_enable_start_auto_analyze_worker = on;
- In Hologres V0.10, the auto-analyze feature is disabled by default. You can execute
the following statement to enable the feature:
-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V0.10: alter database <databasename> set hg_experimental_enable_start_auto_analyze_worker = on;
- A data manipulation language (DML) statement is executed.
- A table is created.
- A foreign table is updated.
- The feature is triggered based on the following default settings:
- An internal table is automatically analyzed every 10 minutes.
- A foreign table is automatically analyzed every 4 hours.
- A maximum of 224 (16,777,216) entries can be scanned each time a table is automatically analyzed. If the table contains more than 224 entries, the excess entries are not scanned.
- If no data has been written to an internal table in real time from Blink-based Realtime Compute, Realtime Compute for Apache Flink, or Apache Flink for more than three days, the internal table is ignored by the auto-analyze feature.
Disable the auto-analyze feature
-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V1.1 and later:
alter database <databasename> set hg_enable_start_auto_analyze_worker = off;
-- The feature setting takes effect on the specified database. Use the following syntax in Hologres V0.10:
alter database <databasename> set hg_experimental_enable_start_auto_analyze_worker = off;
Modify the default settings of the auto-analyze feature
After you enable the auto-analyze feature, Hologres inspects tables and triggers the feature based on the default settings. This consumes resources. The default settings may not apply to specific business scenarios. For example, if data is not frequently written to your Hologres instance, you can decrease the frequency of automatic analysis.
You can modify the default settings of the parameters based on your business requirements. This allows you to optimize the feature performance in a fine-grained manner.
- The
ALTER DATABASE <databasename> set xxx
statement takes effect on the specified database. You must be the superuser to execute this statement. - The
SET xxx
statement takes effect only on the specified session, and must be executed together with other SQL statements. - The default time unit is seconds (s). If the default unit is used, the unit s must be omitted.
- In some business scenarios, a smaller or larger time unit is required. For example,
it is inconvenient to express days in seconds. To resolve this issue, Hologres provides
more time units, including milliseconds (ms), minutes (min), hours (h), and days (day),
in addition to the default time unit seconds (s). To use these time units, you must
enclose parameter values in single quotation marks (' '). Examples:
-- Syntax in Hologres V1.1 and later set hg_auto_check_table_changes_interval = '10min'; -- Syntax in Hologres V0.10 set hg_experimental_auto_check_table_changes_interval = '10min';
Examples
- Execute one of the following statements to specify the interval at which Hologres
automatically analyzes internal tables. By default, an internal table is automatically
analyzed every 10 minutes.
- Sample statements that use s as the time unit
-- Syntax in Hologres V1.1 and later set hg_auto_check_table_changes_interval = 600; -- Syntax in Hologres V0.10 set hg_experimental_auto_check_table_changes_interval = 600;
- Sample statements that use min as the time unit
-- Syntax in Hologres V1.1 and later set hg_auto_check_table_changes_interval = '10min'; -- Syntax in Hologres V0.10 set hg_experimental_auto_check_table_changes_interval = '10min';
- Sample statements that use s as the time unit
- Execute one of the following statements to specify the interval at which Hologres
automatically analyzes foreign tables. By default, a foreign table is automatically
analyzed every 4 hours.
- Sample statements that use s as the time unit
-- Syntax in Hologres V1.1 and later set hg_auto_check_foreign_table_changes_interval = 14400; -- Syntax in Hologres V0.10 set hg_experimental_auto_check_foreign_table_changes_interval = 14400;
- Sample statements that use min as the time unit
-- Syntax in Hologres V1.1 and later set hg_auto_check_foreign_table_changes_interval = '4h'; -- Syntax in Hologres V0.10 set hg_experimental_auto_check_foreign_table_changes_interval = '4h';
-- The feature setting takes effect only on the specified session. set hg_experimental_auto_check_foreign_table_changes_interval = '240min'; -- The feature setting takes effect on the specified database. Replace <databasename> with the name of your database. alter database <databasename> set hg_experimental_auto_check_foreign_table_changes_interval = '240min';
- Sample statements that use s as the time unit
- Execute one of the following statements to specify the maximum latency allowed for
the data that is written from Blink-based Realtime Compute, Realtime Compute for Apache
Flink, or Apache Flink to an internal table. By default, if no such data has been
written to the internal table in real time for more than three days, the internal
table is ignored by the auto-analyze feature.
- Sample statements that use s as the time unit
-- Syntax in Hologres V1.1 and later set hg_fixed_api_modify_max_delay_interval = 259200; -- Syntax in Hologres V0.10 set hg_experimental_fixed_api_modify_max_delay_interval = 259200;
- Sample statements that use day as the time unit
-- Syntax in Hologres V1.1 and later set hg_fixed_api_modify_max_delay_interval = '3day'; -- Syntax in Hologres V0.10 set hg_experimental_fixed_api_modify_max_delay_interval = '3day';
- Sample statements that use s as the time unit
- Execute one of the following statements to specify the maximum number of entries that
can be scanned each time a table is automatically analyzed. The default value is 224 (16,777,216). In the default setting, if a table contains more than 224 entries, the excess entries are not scanned.
Note For a partitioned table, the limit of 224 entries in the default setting is imposed on the parent table.
-- Syntax in Hologres V1.1 and later set hg_auto_analyze_max_sample_row_count = 16777216; -- Syntax in Hologres V0.10 set hg_experimental_auto_analyze_max_sample_row_count = 16777216;