This topic describes the policies that can be used to collect statistics and provides suggestions on how to configure statistics collection.

The quality of SQL query execution plans has an effect on database performance. To provide accurate statistics for the cost-based optimizer (CBO) to generate execution plans, PolarDB-X collects statistics in the background.

Statistics collection policies

PolarDB-X supports the statistics collection policies that are described in the following table.

Name Collection frequency Type of SQL statement Parameter
Auto analyze Once a week. The default collection time is 02:00-05:00.
  1. SQL statements that include the WHERE (RAND() < 3.9623395E-4) clause.
  2. SQL statements in which the IP address is specified as null in the comment.
  3. SQL statements that include the HYPERLOGLOG function.
ENABLE_BACKGROUND_STATISTIC_COLLECTION
Queries on INFORMATION_SCHEMA Once a week. The default collection time is 02:00-05:00. SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_schema IN (xxx,xxx,xxx....) AND table_name IN (yyy,yyy,yyy...) N/A
Real-time feedback statistics Statistics collection is triggered when the number of data rows that are inserted or deleted exceeds 20% of the number of data rows that are previously stored in the table. The requirements are the same as the requirements of the auto analyze policy. ENABLE_STATISTIC_FEEDBACK

Considerations

  • If the data volume of a table is large, the IOPS on data nodes increase during the specified hours for statistics collection.
  • If the IOPS increase during non-specified statistics collection hours, the following issues may occur:
    1. The data volume of the table is excessively large. As a result, the statistics collection task takes an excessively long period of time to collect data from a single logical table. Therefore, the statistics collection task runs from the specified hours to the non-specified hours.
    2. Real-time statistics collection is being performed. If excessively large amounts of data is updated and the system takes an excessively long period of time to import the updated data, real-time statistics collection is triggered.

Suggestions

  1. If you want to not collect statistics in the early morning, you can set the value of the ENABLE_BACKGROUND_STATISTIC_COLLECTION parameter and the value of the ENABLE_STATISTIC_FEEDBACK parameter to False on the Parameter Settings page in the PolarDB-X console. This way, the scheduled statistic collection feature and real-time statistics collection feature are disabled. If the two features are disabled, we recommend that you manually execute analyze table [table_name]; statements on a regular basis to collect statistics during off-peak hours.
  2. If you want to immediately stop a collection task, perform the following steps:
    1. Execute the show full physical_processlist where info like '%rand%'; statement to view the SQL statements that are being executed on the data nodes.
    2. Execute a kill statement to terminate the SQL query that is performed to collect statistics. For example, you can execute kill '0-0-222'; to terminate the corresponding SQL query. In '0-0-222', the first 0 indicates the ID of the data node, the second 0 indicates the ID of the database shard, and 222 indicates the ID of the SQL statement for which you want to terminate the execution.