AnalyticDB for PostgreSQL statistics include the database metadata such as the total number of rows, the max and min values, and the number of null values in a table. Statistics are crucial for the query optimizer to generate optimal execution plans. Inaccurate or missing statistics can cause the optimizer to generate inefficient execution plans. For example, assume that an indexed table has a large number of rows and the index column is referenced in the WHERE clause for a point query. Invalid statistics may cause the optimizer to traverse the table instead of using indexes. This results in much lower query performance. Appropriate statistics collection can ensure high query performance.

Statistics collection modes

AnalyticDB for PostgreSQL uses the gp_autostats_mode parameter to determine the statistics collection mode.

  • none: disables automatic statistics collection. All statistics must be manually collected by using statements.
  • on_no_stats: triggers statistics collection for CREATE TABLE AS SELECT, INSERT, or COPY operations on a table that has no existing statistics. In other cases, statistics must be manually collected by using statements.
  • on_change: triggers statistics collection for CREATE TABLE AS SELECT, INSERT, UPDATE, DELETE, or COPY operations on a table when the number of rows affected exceeds the threshold defined by the gp_autostats_on_change_threshold parameter. This mode is completely automatic and does not need manual statistics collection.

SQL statements for manual statistics collection

If the gp_autostats_mode parameter is not set to on_change, the following statements are available for manual statistics collection:

  • To collect statistics on all tables, execute the ANALYZE statement without parameters. This statement may take a long time.
  • To collect statistics on a single table, execute the ANALYZE statement with the table name parameter. Example: ANALYZE t1;.
  • To collect statistics on multiple columns of a table, execute the ANALYZE statement with the table and column name parameters. Example: ANALYZE t1(a, b);.

Scenarios for manual statistics collection

Statistics do not need to be frequently collected. You can execute ANALYZE statements to collect statistics in the following scenarios:

  • Data is imported, more than 20% of data is inserted, updated, or deleted, or indexes are created.
  • Data is frequently inserted, updated, or deleted during extract, transform, load (ETL) tasks.
  • The execution plan shows a single row in a table and contains multiple Broadcast, Sort and GroupByAgg, or NestLoop operators.