Statistics include information about database and table metadata. Query optimizers use the most recent statistics to optimize and run execution plans to improve database performance. This topic describes how to use the ANALYZE statement to manually collect statistics in AnalyticDB for PostgreSQL.
Scenarios
You do not need to frequently perform manual statistics collection. You can execute the ANALYZE statement to manually collect statistics in the following scenarios:
More than 20% of data in a table is changed by
INSERT,UPDATE, orDELETEstatements or an index is created for the table.Multiple
INSERT,UPDATE, orDELETEstatements are executed during an extract, transform, load (ETL) job.Data is written to a new table by using operations including functions and stored procedures.
An execution plan shows that only one row is estimated to be returned. The execution plan contains multiple operators, such as Broadcast, Sort+GroupByAgg, and NestLoop.
Usage
You can execute the ANALYZE statement to manually collect statistics in AnalyticDB for PostgreSQL.
Syntax
ANALYZE [VERBOSE] [ROOTPARTITION ALL|[table [(column[, ... ])]] ]The ANALYZE statement can be executed without parameters. In this case, the statement collects statistics on all tables in a database. This operation may require an extended period of time.
Parameters
VERBOSE: displays statistics about the execution process of theANALYZEstatement, such as the number of scanned rows and the number of visible rows. If you do not specify the VERBOSE parameter, no statistics about the execution process are displayed.ROOTPARTITION ALL|[table [(column[, ... ])]]: collects statistics on the parent tables of partitioned tables.ROOTPARTITION ALL: collects statistics on the parent tables of all partitioned tables in the database.ROOTPARTITION [table]: collects statistics on the parent table of a specific partitioned table in the database.If you do not specify the
ROOTPARTITIONparameter, theANALYZEstatement collects statistics on all parent and subpartitioned tables.
table[(column1, column2, ...)]: collects statistics on multiple columns of a table. If you do not specify a column name, the ANALYZE statement collects statistics on all columns of the table. To quickly execute theANALYZEstatement, you can collect statistics only on columns that involveJOIN,WHERE,ORDER BY,GROUP BY, orHAVINGstatements. In this case, you must specify the table and column names in theANALYZEstatement. Example:ANALYZE t1(a, b).
Example
Collect statistics on the o_orderdate and o_orderpriority columns of the orders table.
ANALYZE orders(o_orderdate, o_orderpriority);