The query optimizer uses statistics to generate execution plans. Better statistics lead to better plans—and faster queries. AnalyticDB for MySQL collects statistics automatically during a configurable maintenance window and lets you trigger collection manually at any time using ANALYZE TABLE.
Supported editions
The statistics feature is supported on Enterprise Edition, Basic Edition, Data Lakehouse Edition, and Data Warehouse Edition clusters with kernel version 3.1.6.1 or later.
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
Statistics types
AnalyticDB for MySQL supports three statistics types. Choosing the right type for each column balances accuracy against cache usage.
| Type | What it captures | Best for |
|---|---|---|
Basic (BASIC) | Max, min, average byte length, number of distinct values, NULL ratio per column | Columns not used in filtering or joins; columns with evenly distributed data such as primary keys |
Histogram (HISTOGRAM) | Data distribution across buckets; automatically selects Hybrid Histogram or Frequency Histogram | Columns with skewed data used in filtering or joins |
Column group (GROUP_STATS) | Correlation between multiple columns in the same table | Multi-column aggregations where columns are highly correlated |
Column group statistics and sampled collection require kernel version 3.1.9.2 or later.
Basic statistics
Basic statistics capture the maximum value, minimum value, average byte length, number of distinct values, and NULL ratio for a column. This is the lightest statistics type—collected by full scan, sampling, or incremental update (default: incremental).
Use basic statistics for:
Columns not involved in filtering or join operations
Columns with evenly distributed data, such as primary key columns
Histogram
A histogram divides a column's data range into buckets. Each bucket describes the data distribution for that range, giving the optimizer a more precise picture of value frequency than basic statistics alone.
AnalyticDB for MySQL automatically selects the histogram type:
Hybrid Histogram: A variation of an equal-height histogram that handles hot spot values more accurately.
Frequency Histogram: For columns with few distinct values—each unique value gets its own bucket.
Use histograms for columns with skewed data that appear in WHERE clauses or join conditions. For evenly distributed columns, basic statistics are sufficient.
Cache trade-off: Histograms are more accurate but use more cache than basic statistics. The default cache holds statistics for approximately 20,000 histogram columns or 2 million basic statistics columns. In schemas with many tables, collecting histograms for all columns can lower the cache hit ratio.
Column group statistics
Column group statistics require kernel version 3.1.9.2 or later.
The optimizer assumes columns are independent when estimating row counts for multi-column conditions. When columns are correlated, this assumption breaks down and the optimizer produces inaccurate estimates—leading to suboptimal join and aggregation plans.
Column group statistics capture the correlation between multiple columns in the same table. When columns are highly correlated, they allow the optimizer to produce more accurate row estimates.
Use column group statistics for multi-column aggregations where the involved columns are highly correlated.
How automatic collection works
Automatic statistics collection is enabled by default. AnalyticDB for MySQL runs two collection modes:
Maintenance window: Full or sampled collection of basic, histogram, and column group statistics, based on table data volume. A full collection across many columns can take several days.
Outside the maintenance window: Periodic incremental collection of basic statistics only.
Collection behavior by import method
| Import method | Automatic behavior | Recommended action |
|---|---|---|
INSERT OVERWRITE (batch) | Basic statistics collected immediately after import | No action needed |
INSERT INTO / REPLACE INTO (real-time) | Incremental collection triggered at next maintenance window, or during the incremental collection period after a BUILD task completes | Manually collect basic statistics after importing |
Automatic collection applies only to internal tables. Manual collection applies to both internal and external tables.
Data volume threshold and sampling
Sampled collection requires kernel version 3.1.9.2 or later.
By default, tables exceeding 5,000,000,000 rows (5 billion) trigger different behavior depending on kernel version:
Kernel version earlier than 3.1.9.2: Statistics collection skips the table.
Kernel version 3.1.9.2 or later: Statistics are collected by sampling.
Configure automatic collection
Enable or disable automatic statistics collection
Automatic statistics collection is enabled by default. To disable or re-enable it:
SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = [false|true];Enable or disable column group statistics collection
Column group statistics collection requires kernel version 3.1.9.2 or later, and automatic statistics collection must be enabled.
Column group statistics collection is disabled by default. To enable or disable it:
SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECT_GROUP_STATS_ENABLED = [false|true];Set the maintenance window
The default maintenance window is 04:00–05:00, using the cluster's time zone. Set it to off-peak hours to minimize impact on query performance.
Constraints:
The interval between start and end times must be at least 1 minute and no more than 3 hours.
The start time must be earlier than the end time.
An invalid time falls back to the default.
SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];Set the data volume threshold
To change the row count threshold that triggers sampling (default: 5,000,000,000):
SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECTOR_ROW_LIMIT = 10000;Enable or disable I/O throttling
During the maintenance window, AnalyticDB for MySQL applies I/O throttling to limit resource usage when scanning tables. Throttling is enabled by default. If resources are idle during the maintenance window, disable throttling to speed up collection:
SET adb_config O_CBO_AUTONOMOUS_STATS_SCAN_RATE_LIMIT_ENABLED = [false|true];Run collection in a specific resource group
By default, automatic collection runs under a system account. To run it in a specific resource group, specify a database account:
SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];The specified account must have query permissions on all columns of all tables and be attached to a resource group. For details, see Create and manage resource groups.
Set the expiration ratio
Statistics are considered expired when the fraction of rows modified by UPDATE, DELETE, INSERT, or REPLACE operations exceeds the expiration ratio. Expired statistics are recollected in full during the next maintenance window.
The default expiration ratio is 0.1 (10%). To adjust it:
SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;Valid range: 0–1. Avoid setting this above 0.5.
Collect statistics manually
Use ANALYZE TABLE to collect statistics on demand. Manual collection applies to both internal and external tables.
For large tables, use sampling (WITH ENABLE SAMPLING) instead of a full scan. Run full-table collection during off-peak hours when sampling is not available.
Collect statistics on a table
ANALYZE TABLE [schema_name.]table_name [UPDATE [BASIC|HISTOGRAM|GROUP_STATS]] [ON column_name[,...]] [WITH ENABLE SAMPLING]| Parameter | Required | Description |
|---|---|---|
schema_name | No | The database name. |
table_name | Yes | The table to analyze. Each statement can specify only one table. Supports internal and external tables. |
UPDATE [BASIC|HISTOGRAM|GROUP_STATS] | No | The statistics type to collect. Defaults to BASIC. GROUP_STATS requires kernel version 3.1.9.2 or later. |
ON column_name[,...] | No | The columns to analyze. Omit to analyze all columns. |
WITH ENABLE SAMPLING | No | Collect by sampling instead of full scan. Requires kernel version 3.1.9.2 or later. |
Examples
Collect basic statistics on all columns of adb_demo.customer:
ANALYZE TABLE adb_demo.customer;
-- Equivalent:
ANALYZE TABLE adb_demo.customer UPDATE BASIC;Collect basic statistics on the customer_id column only:
ANALYZE TABLE adb_demo.customer UPDATE BASIC ON customer_id;Collect histogram statistics on customer_id and login_time:
ANALYZE TABLE adb_demo.customer UPDATE HISTOGRAM ON customer_id, login_time;Collect column group statistics by sampling on customer_id and login_time:
ANALYZE TABLE adb_demo.customer UPDATE GROUP_STATS ON customer_id, login_time WITH ENABLE SAMPLING;Collect statistics on partitions
Partition-level statistics collection is supported on Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters with kernel version 3.1.9.1 or later. This performs a full collection of basic statistics on partitions of OSS external tables.
ANALYZE TABLE table_name WITH PARTITIONS = ARRAY[ARRAY[PARTITION_KEYS] [, PARTITION_KEYS, ...]]| Parameter | Required | Description |
|---|---|---|
table_name | Yes | The table to analyze. Each statement can specify only one table. |
PARTITION_KEYS | Yes | The partitions to analyze. |
Examples
Collect statistics on the 2023-01 and 2023-02 partitions of test1:
ANALYZE TABLE test1 WITH PARTITIONS = ARRAY[ARRAY['2023-01'], ARRAY['2023-02']];Collect statistics on the (1,1) and (1,0) partitions of test2:
ANALYZE TABLE test2 WITH PARTITIONS = ARRAY[ARRAY[1, 1], ARRAY[1, 0]];View statistics
Statistics are stored in binary format. Query the INFORMATION_SCHEMA system tables to inspect them.
View table-level statistics:
SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS;View column-level statistics (basic, histogram, and column group):
SELECT * FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;FAQ
Why does ANALYZE show up as a slow query?
ANALYZE statements run during the maintenance window with I/O throttling and low CPU priority. This means they can run for extended periods and get flagged as slow queries—but this does not affect regular query performance. Resources are preferentially allocated to user queries when they arrive.
If CPU load is not elevated outside the maintenance window, ignore the slow query flag. If CPU overload is affecting query response time, see the troubleshooting steps below.
What causes high CPU load during statistics collection?
Two common causes:
During the maintenance window (04:00–05:00 by default): The system performs a full table scan to collect column statistics. This temporarily raises CPU utilization.
After a minor version upgrade to 3.1.6 or later for Data Warehouse Edition clusters: If the cluster is upgraded from an earlier version, AnalyticDB for MySQL performs a one-time full collection. CPU load may be elevated for the first few days, then returns to normal after the initial scan completes.
In both cases, check whether average query response time is actually affected. Because ANALYZE runs at low CPU priority, peak CPU utilization during collection does not mean queries are being starved.
What do I do if statistics collection is degrading query performance?
Try these steps in order:
Reschedule the maintenance window to your actual off-peak hours:
SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];Adjust the I/O limit for system queries if you cannot identify a clear off-peak window. We recommend changing the I/O limit to a value that is greater than or equal to 16 MB. The default is 50 MB:
SET adb_config CSTORE_IO_LIMIT_SYSTEM_QUERY_BPS = 52428800;Isolate collection to a low-priority resource group to prevent it from competing with user queries:
SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];Raise the expiration ratio to reduce how much data gets recollected each cycle. The default is 0.1; avoid setting it above 0.5:
SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;
If none of the above resolves the issue, disable automatic collection and switch to manual collection:
SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = false;After disabling, collect statistics manually using ANALYZE TABLE. Without automatic collection, statistics can become stale, which may degrade query plan quality over time.
Why haven't statistics been updated for several days?
Two likely explanations:
Statistics have not expired yet. Statistics expire only when the fraction of modified rows (from
UPDATE,DELETE,INSERT, orREPLACE) crosses the expiration ratio (default: 10%). If data changes are small relative to the total table size, statistics are considered current and are not recollected.Large schemas take longer than one day. The system allocates roughly 1 hour per day for non-incremental statistics collection. If a cluster has more than 1,000 columns, a full pass can take up to a week. This is expected behavior—queries continue to run, and statistics are updated as the cycle completes.
Are statistics collected automatically when data is imported into a new table?
It depends on how you import:
`INSERT OVERWRITE` (batch import): Basic statistics are collected automatically right after the import completes.
`INSERT INTO` or `REPLACE INTO` (real-time import): Statistics are collected at the next maintenance window, or during the incremental collection period after a BUILD task completes. Run
ANALYZE TABLEimmediately after importing to collect basic statistics right away.