All Products
Search
Document Center

AnalyticDB for PostgreSQL:Statistics collection

Last Updated:Apr 19, 2024

Statistics are important information about data distribution and query performance. You can collect statistics to optimize and execute query plans for your database systems to improve query performance and overall database performance. AnalyticDB for PostgreSQL collects statistics by using ANALYZE statements.

AnalyticDB for PostgreSQL statistics include the database metadata such as the total number of rows, the max and min values, and the number of null values in a table. Statistics are crucial for the query optimizer to generate optimal execution plans. Inaccurate or missing statistics may cause the query optimizer to generate inefficient execution plans. For example, if an indexed table has a large number of rows and the index column is referenced in the WHERE clause for a point query, invalid statistics may cause the query optimizer to traverse the table instead of using indexes. This results in much lower query performance. Appropriate statistics collection can ensure high query performance.

Statistics collection modes

AnalyticDB for PostgreSQL uses the gp_autostats_mode parameter to determine the statistics collection mode.

  • none: disables automatic statistics collection. All statistics must be manually collected by using statements.

  • on_no_stats: triggers statistics collection for CREATE TABLE AS SELECT, INSERT, or COPY operations on a table that has no existing statistics. In other cases, statistics must be manually collected by using statements.

  • on_change: triggers statistics collection for CREATE TABLE AS SELECT, INSERT, UPDATE, DELETE, or COPY operations on a table when the number of rows affected exceeds the threshold specified by the gp_autostats_on_change_threshold parameter. This mode is completely automatic and does not need manual statistics collection.

SQL statements for manual statistics collection

If the gp_autostats_mode parameter is not set to on_change, the following statements are available for manual statistics collection:

  • To collect statistics on all tables, execute an ANALYZE statement without parameters. This statement may take a long time.

  • To collect statistics on a single table, execute an ANALYZE statement with the table name parameter. Example: ANALYZE t1;.

  • To collect statistics on multiple columns of a table, execute an ANALYZE statement with the table and column name parameters. Example: ANALYZE t1(a, b);.

Scenarios for manual statistics collection

Statistics do not need to be frequently collected. You can execute ANALYZE statements to collect statistics in the following scenarios:

  • Data is imported, more than 20% of data is inserted, updated, or deleted, or indexes are created.

  • Data is frequently inserted, updated, or deleted during extract, transform, load (ETL) tasks.

  • An execution plan shows a single row in a table and contains multiple Broadcast, Sort and GroupByAgg, or NestLoop operators.

References

Use the ANALYZE statement to collect statistics on AnalyticDB for PostgreSQL