This topic describes the functionality and classification of statistics in AnalyticDB for MySQL, the configurations of automatic statistics collection, and the methods for manually collecting statistics and querying statistics.
Overview
The query optimizer converts queries into execution plans that are executed by the execution engine. The quality of an execution plan impacts its query performance. Statistics about data columns can be used to help the query optimizer generate high-quality execution plans.
AnalyticDB for MySQL provides the automatic statistics collection feature. By default, this feature is enabled. AnalyticDB for MySQL clusters of V3.1.9.2 or later also provide the column group statistics feature. By default, this feature is disabled. You can manually enable this feature. Within the maintenance window, AnalyticDB for MySQL automatically performs a full or sampled collection of basic, histogram, and column group statistics based on the data volume of the current table. If a large number of columns are involved, it may take a couple of days to perform a full collection on the columns. Outside the maintenance window, AnalyticDB for MySQL incrementally collects basic statistics on a regular basis.
The statistics collection policies of AnalyticDB for MySQL vary based on the data import methods.
If data is batch imported by using
INSERT OVERWRITE
, AnalyticDB for MySQL immediately collects basic statistics after the data import is complete.If data is imported in real time by using
INSERT INTO
orREPLACE INTO
, AnalyticDB for MySQL does not initiate an incremental collection task until the next maintenance window or the incremental collection period after each BUILD task. We recommend that you manually collect basic statistics after data is imported.
You can also disable the automatic statistics collection feature and execute the ANALYZE TABLE
statement to collect statistics. For more information, see the "Manually collect statistics" section of this topic.
The automatic statistics collection feature of AnalyticDB for MySQL manages internal tables but not external tables. The manual statistics collection feature of AnalyticDB for MySQL manages both internal tables and external tables.
Usage notes
Statistics collection is supported for both Data Lakehouse Edition (V3.0) and Data Warehouse Edition (V3.0) clusters of V3.1.6.1 or later.
For information about how to query the minor version of a cluster, see How do I query the version of an AnalyticDB for MySQL cluster?
To update the minor version of a cluster, submit a ticket.
Classification and selection of statistics
AnalyticDB for MySQL collects three types of statistics: basic statistics, histogram statistics, and column group statistics. For histogram and column group statistics, you can use the full or sampled collection method. For basic statistics, you can use the full, sampled, or automatic incremental collection method. By default, automatic incremental collection is enabled.
Only AnalyticDB for MySQL clusters of V3.1.9.2 or later support sampled collection for basic, histogram, and column group statistics.
Basic statistics
Basic statistics include the maximum value, the minimum value, the average length (in bytes), the number of distinct values, and the proportion of NULL values in a column.
Collection of basic statistics is suitable for the following columns:
Columns that are not involved in filtering or join operations.
Columns in which data is evenly distributed, such as primary key columns.
Histogram statistics
Histograms are created by dividing basic statistics into data buckets by data range. Each bucket in a histogram describes the characteristics of the data within a specific range.
Histograms are divided into the following types:
Hybrid histogram: similar to an equal-height histogram. It can better describe hot spot values.
Frequency histogram: suitable for columns that have a small number of distinct values. Each value corresponds to a bucket.
AnalyticDB for MySQL automatically selects appropriate types of histograms.
Histogram statistics are suitable for the columns that contain unevenly distributed data and take part in filtering and join operations. If data is evenly distributed, basic statistics can be used to replace histogram statistics in filtering and join operation scenarios.
Compared with basic statistics, histograms can more accurately reflect the statistics about tables. If a large number of tables are involved, histogram statistics of all columns may decrease the cache hit ratio. Histogram statistics occupy more statistics cache and incur higher costs than basic statistics. By default, the statistics cache can contain about 20,000 columns of histogram statistics or 2 million columns of basic statistics.
Column group statistics
Only AnalyticDB for MySQL clusters of V3.1.9.2 or later support column group statistics collection.
Basic and histogram statistics are about a single column. Column group statistics are about multiple columns of a table, describing how these columns correlate with each other.
Column group statistics are suitable for aggregating multiple columns. If the columns highly correlate with each other, you can use column group statistics to estimate the number of output rows. This helps you choose an appropriate execution plan.
Automatic statistics collection
Enable or disable the automatic statistics collection feature
AnalyticDB for MySQL provides the automatic statistics collection feature. By default, this feature is enabled. You can execute the following statement to disable or re-enable the automatic statistics collection feature:
SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = [false | true];
Enable or disable the automatic column group statistics collection feature
AnalyticDB for MySQL clusters of V3.1.9.2 or later provide the automatic column group statistics collection feature. By default, this feature is disabled. You can execute the following statement to enable or disable the automatic column group statistics collection feature:
SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECT_GROUP_STATS_ENABLED = [false | true];
Before you can use the automatic column group statistics collection feature, you must enable the automatic statistics collection feature.
Set a maintenance window
The default maintenance window for automatic statistics collection is from 04:00 to 05:00. You can execute the following statement to modify the maintenance window. We recommend that you set the maintenance window to off-peak hours. The interval between the start time and end time can range from 1 minute to 3 hours. The start time must be earlier than the end time. If the specified maintenance window does not meet the preceding requirements, the default maintenance window is used.
The maintenance time must use the same time zone as the current time of the cluster.
SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];
Set a data volume threshold for statistics collection
You can execute the following statement to set a data volume threshold for statistics collection. By default, the threshold is 5 billion rows.
SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECTOR_ROW_LIMIT = 10000;
If the number of rows in a table exceeds the data volume threshold, the following policies apply:
For AnalyticDB for MySQL clusters of versions earlier than 3.1.9.2, statistics collection skips the table.
For AnalyticDB for MySQL clusters of V3.1.9.2 or later, a sampled collection is executed on the table.
Enable or disable scanning throttling for statistics collection
Within the maintenance window, AnalyticDB for MySQL limits the scanning rate for statistics collection to reduce I/O resource usage. By default, scanning throttling is enabled. If resources are idle within the maintenance window, you can disable scanning throttling to accelerate statistics collection.
SET adb_config O_CBO_AUTONOMOUS_STATS_SCAN_RATE_LIMIT_ENABLED = [false | true];
Collect statistics from a specified resource group
By default, automatic statistics collection uses a system account. If you want to perform automatic statistics collection in a specified resource group, you can execute the following statement to specify a database account. After you specify a database account, AnalyticDB for MySQL performs automatic statistics collection in the resource group with which the database account is associated. Make sure that the database account has permissions to query all columns of all tables and has been associated with a resource group. For more information, see Create a resource group.
SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];
Set an expiration ratio for columns
The default expiration ratio of columns is 0.1 (10%). If the proportion of the updated, deleted, inserted, or replaced rows to the total rows of a table is greater than the expiration ratio, the statistics of the table are considered expired. Then, AnalyticDB for MySQL re-collects statistics on all columns of the expired table within the maintenance window. If the expiration ratio of columns does not exceed the specified value, statistics are not automatically collected within the maintenance window.
SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;
Manually collect statistics
Collect statistics on the entire table
The ANALYZE TABLE
statement scans the entire table to collect statistics. For tables that have large volumes of data, a large amount of time is required to collect statistics on the entire table. We recommend that you execute this statement during off-peak hours or perform a sampled collection of statistics.
Syntax
ANALYZE TABLE [schema_name.]table_name [UPDATE [BASIC|HISTOGRAM|GROUP_STATS]] [ON column_name[,...]] [WITH ENABLE SAMPLING]
Parameters
Parameter | Required | Description |
| No | The name of the database. |
| Yes | The name of the table on which you want AnalyticDB for MySQL to collect statistics. You can specify only a single table in each |
| No | The type of statistics. Valid values:
Important Only AnalyticDB for MySQL clusters of V3.1.9.2 or later support column group statistics collection. |
| No | The column on which to collect statistics. If no column is specified, statistics are collected on all columns of the table. |
| No | Enables sampled collection. Important Only AnalyticDB for MySQL clusters of V3.1.9.2 or later support sampled collection for basic, histogram, and column group statistics. |
Examples
Collect basic statistics on all columns of the
adb_demo.customer
table. You can execute one of the following statements:ANALYZE TABLE adb_demo.customer;
ANALYZE TABLE adb_demo.customer UPDATE BASIC;
Collect basic statistics on the
customer_id
column of theadb_demo.customer
table.ANALYZE TABLE adb_demo.customer UPDATE BASIC ON customer_id;
Collect histogram statistics on the
customer_id
andlogin_time
columns of theadb_demo.customer
table.ANALYZE TABLE adb_demo.customer UPDATE HISTOGRAM ON customer_id,login_time;
Perform a sampled collection of column group statistics on the
customer_id
andlogin_time
columns of theadb_demo.customer
table.ANALYZE TABLE adb_demo.customer UPDATE GROUP_STATS ON customer_id,login_time with enable sampling;
Collect statistics on partitions
Limits
Only AnalyticDB for MySQL Data Lakehouse Edition (V3.0) clusters of V3.1.9.1 or later allow you to execute the ANALYZE TABLE
statement to collect basic statistics on partitions of Object Storage Service (OSS) external tables.
For information about how to query the minor version of a cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.
Syntax
ANALYZE TABLE table_name WITH PARTITIONS = ARRAY[ARRAY[PARTITION_KEYS] [, PARTITION_KEYS, ....]]
Parameters
Parameter | Required | Description |
| Yes | The name of the table on which you want AnalyticDB for MySQL to collect statistics. You can specify only a single table in each |
| Yes | The partition on which to collect statistics. |
Examples
Collect statistics on the 2023-01 and 2023-02 partitions of the
test1
table.ANALYZE TABLE test1 WITH PARTITIONS = ARRAY[ARRAY['2023-01'], ARRAY['2023-02']];
Collect statistics on the (1,1) and (1,0) partitions of the
test2
table.ANALYZE TABLE test2 WITH PARTITIONS = ARRAY[ARRAY[1, 1], ARRAY[1, 0]];
Query statistics
Statistics are stored in AnalyticDB for MySQL in the binary format. You can use the system table INFORMATION_SCHEMA
to query statistics.
Execute the following statement to query table-level statistics:
SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS;
Execute the following statement to query column-level statistics, including basic statistics, histogram statistics, and column group statistics:
SELECT * FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;
FAQ
You may encounter issues when you use the statistics feature. For information about how to solve these issues, see Statistics FAQ.