All Products
Search
Document Center

AnalyticDB:Statistics

Last Updated:Mar 28, 2026

The query optimizer uses statistics to generate execution plans. Better statistics lead to better plans—and faster queries. AnalyticDB for MySQL collects statistics automatically during a configurable maintenance window and lets you trigger collection manually at any time using ANALYZE TABLE.

Supported editions

The statistics feature is supported on Enterprise Edition, Basic Edition, Data Lakehouse Edition, and Data Warehouse Edition clusters with kernel version 3.1.6.1 or later.

To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.

Statistics types

AnalyticDB for MySQL supports three statistics types. Choosing the right type for each column balances accuracy against cache usage.

TypeWhat it capturesBest for
Basic (BASIC)Max, min, average byte length, number of distinct values, NULL ratio per columnColumns not used in filtering or joins; columns with evenly distributed data such as primary keys
Histogram (HISTOGRAM)Data distribution across buckets; automatically selects Hybrid Histogram or Frequency HistogramColumns with skewed data used in filtering or joins
Column group (GROUP_STATS)Correlation between multiple columns in the same tableMulti-column aggregations where columns are highly correlated
Important

Column group statistics and sampled collection require kernel version 3.1.9.2 or later.

Basic statistics

Basic statistics capture the maximum value, minimum value, average byte length, number of distinct values, and NULL ratio for a column. This is the lightest statistics type—collected by full scan, sampling, or incremental update (default: incremental).

Use basic statistics for:

  • Columns not involved in filtering or join operations

  • Columns with evenly distributed data, such as primary key columns

Histogram

A histogram divides a column's data range into buckets. Each bucket describes the data distribution for that range, giving the optimizer a more precise picture of value frequency than basic statistics alone.

AnalyticDB for MySQL automatically selects the histogram type:

  • Hybrid Histogram: A variation of an equal-height histogram that handles hot spot values more accurately.

  • Frequency Histogram: For columns with few distinct values—each unique value gets its own bucket.

Use histograms for columns with skewed data that appear in WHERE clauses or join conditions. For evenly distributed columns, basic statistics are sufficient.

Cache trade-off: Histograms are more accurate but use more cache than basic statistics. The default cache holds statistics for approximately 20,000 histogram columns or 2 million basic statistics columns. In schemas with many tables, collecting histograms for all columns can lower the cache hit ratio.

Column group statistics

Important

Column group statistics require kernel version 3.1.9.2 or later.

The optimizer assumes columns are independent when estimating row counts for multi-column conditions. When columns are correlated, this assumption breaks down and the optimizer produces inaccurate estimates—leading to suboptimal join and aggregation plans.

Column group statistics capture the correlation between multiple columns in the same table. When columns are highly correlated, they allow the optimizer to produce more accurate row estimates.

Use column group statistics for multi-column aggregations where the involved columns are highly correlated.

How automatic collection works

Automatic statistics collection is enabled by default. AnalyticDB for MySQL runs two collection modes:

  • Maintenance window: Full or sampled collection of basic, histogram, and column group statistics, based on table data volume. A full collection across many columns can take several days.

  • Outside the maintenance window: Periodic incremental collection of basic statistics only.

Collection behavior by import method

Import methodAutomatic behaviorRecommended action
INSERT OVERWRITE (batch)Basic statistics collected immediately after importNo action needed
INSERT INTO / REPLACE INTO (real-time)Incremental collection triggered at next maintenance window, or during the incremental collection period after a BUILD task completesManually collect basic statistics after importing
Automatic collection applies only to internal tables. Manual collection applies to both internal and external tables.

Data volume threshold and sampling

Important

Sampled collection requires kernel version 3.1.9.2 or later.

By default, tables exceeding 5,000,000,000 rows (5 billion) trigger different behavior depending on kernel version:

  • Kernel version earlier than 3.1.9.2: Statistics collection skips the table.

  • Kernel version 3.1.9.2 or later: Statistics are collected by sampling.

Configure automatic collection

Enable or disable automatic statistics collection

Automatic statistics collection is enabled by default. To disable or re-enable it:

SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = [false|true];

Enable or disable column group statistics collection

Important

Column group statistics collection requires kernel version 3.1.9.2 or later, and automatic statistics collection must be enabled.

Column group statistics collection is disabled by default. To enable or disable it:

SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECT_GROUP_STATS_ENABLED = [false|true];

Set the maintenance window

The default maintenance window is 04:00–05:00, using the cluster's time zone. Set it to off-peak hours to minimize impact on query performance.

Constraints:

  • The interval between start and end times must be at least 1 minute and no more than 3 hours.

  • The start time must be earlier than the end time.

  • An invalid time falls back to the default.

SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];

Set the data volume threshold

To change the row count threshold that triggers sampling (default: 5,000,000,000):

SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECTOR_ROW_LIMIT = 10000;

Enable or disable I/O throttling

During the maintenance window, AnalyticDB for MySQL applies I/O throttling to limit resource usage when scanning tables. Throttling is enabled by default. If resources are idle during the maintenance window, disable throttling to speed up collection:

SET adb_config O_CBO_AUTONOMOUS_STATS_SCAN_RATE_LIMIT_ENABLED = [false|true];

Run collection in a specific resource group

By default, automatic collection runs under a system account. To run it in a specific resource group, specify a database account:

SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];

The specified account must have query permissions on all columns of all tables and be attached to a resource group. For details, see Create and manage resource groups.

Set the expiration ratio

Statistics are considered expired when the fraction of rows modified by UPDATE, DELETE, INSERT, or REPLACE operations exceeds the expiration ratio. Expired statistics are recollected in full during the next maintenance window.

The default expiration ratio is 0.1 (10%). To adjust it:

SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;

Valid range: 0–1. Avoid setting this above 0.5.

Collect statistics manually

Use ANALYZE TABLE to collect statistics on demand. Manual collection applies to both internal and external tables.

For large tables, use sampling (WITH ENABLE SAMPLING) instead of a full scan. Run full-table collection during off-peak hours when sampling is not available.

Collect statistics on a table

ANALYZE TABLE [schema_name.]table_name [UPDATE [BASIC|HISTOGRAM|GROUP_STATS]] [ON column_name[,...]] [WITH ENABLE SAMPLING]
ParameterRequiredDescription
schema_nameNoThe database name.
table_nameYesThe table to analyze. Each statement can specify only one table. Supports internal and external tables.
UPDATE [BASIC|HISTOGRAM|GROUP_STATS]NoThe statistics type to collect. Defaults to BASIC. GROUP_STATS requires kernel version 3.1.9.2 or later.
ON column_name[,...]NoThe columns to analyze. Omit to analyze all columns.
WITH ENABLE SAMPLINGNoCollect by sampling instead of full scan. Requires kernel version 3.1.9.2 or later.

Examples

Collect basic statistics on all columns of adb_demo.customer:

ANALYZE TABLE adb_demo.customer;
-- Equivalent:
ANALYZE TABLE adb_demo.customer UPDATE BASIC;

Collect basic statistics on the customer_id column only:

ANALYZE TABLE adb_demo.customer UPDATE BASIC ON customer_id;

Collect histogram statistics on customer_id and login_time:

ANALYZE TABLE adb_demo.customer UPDATE HISTOGRAM ON customer_id, login_time;

Collect column group statistics by sampling on customer_id and login_time:

ANALYZE TABLE adb_demo.customer UPDATE GROUP_STATS ON customer_id, login_time WITH ENABLE SAMPLING;

Collect statistics on partitions

Partition-level statistics collection is supported on Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters with kernel version 3.1.9.1 or later. This performs a full collection of basic statistics on partitions of OSS external tables.
ANALYZE TABLE table_name WITH PARTITIONS = ARRAY[ARRAY[PARTITION_KEYS] [, PARTITION_KEYS, ...]]
ParameterRequiredDescription
table_nameYesThe table to analyze. Each statement can specify only one table.
PARTITION_KEYSYesThe partitions to analyze.

Examples

Collect statistics on the 2023-01 and 2023-02 partitions of test1:

ANALYZE TABLE test1 WITH PARTITIONS = ARRAY[ARRAY['2023-01'], ARRAY['2023-02']];

Collect statistics on the (1,1) and (1,0) partitions of test2:

ANALYZE TABLE test2 WITH PARTITIONS = ARRAY[ARRAY[1, 1], ARRAY[1, 0]];

View statistics

Statistics are stored in binary format. Query the INFORMATION_SCHEMA system tables to inspect them.

View table-level statistics:

SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS;

View column-level statistics (basic, histogram, and column group):

SELECT * FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;

FAQ

Why does ANALYZE show up as a slow query?

ANALYZE statements run during the maintenance window with I/O throttling and low CPU priority. This means they can run for extended periods and get flagged as slow queries—but this does not affect regular query performance. Resources are preferentially allocated to user queries when they arrive.

If CPU load is not elevated outside the maintenance window, ignore the slow query flag. If CPU overload is affecting query response time, see the troubleshooting steps below.

What causes high CPU load during statistics collection?

Two common causes:

  • During the maintenance window (04:00–05:00 by default): The system performs a full table scan to collect column statistics. This temporarily raises CPU utilization.

  • After a minor version upgrade to 3.1.6 or later for Data Warehouse Edition clusters: If the cluster is upgraded from an earlier version, AnalyticDB for MySQL performs a one-time full collection. CPU load may be elevated for the first few days, then returns to normal after the initial scan completes.

In both cases, check whether average query response time is actually affected. Because ANALYZE runs at low CPU priority, peak CPU utilization during collection does not mean queries are being starved.

What do I do if statistics collection is degrading query performance?

Try these steps in order:

  1. Reschedule the maintenance window to your actual off-peak hours:

    SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];
  2. Adjust the I/O limit for system queries if you cannot identify a clear off-peak window. We recommend changing the I/O limit to a value that is greater than or equal to 16 MB. The default is 50 MB:

    SET adb_config CSTORE_IO_LIMIT_SYSTEM_QUERY_BPS = 52428800;
  3. Isolate collection to a low-priority resource group to prevent it from competing with user queries:

    SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];
  4. Raise the expiration ratio to reduce how much data gets recollected each cycle. The default is 0.1; avoid setting it above 0.5:

    SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;

If none of the above resolves the issue, disable automatic collection and switch to manual collection:

SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = false;

After disabling, collect statistics manually using ANALYZE TABLE. Without automatic collection, statistics can become stale, which may degrade query plan quality over time.

Why haven't statistics been updated for several days?

Two likely explanations:

  • Statistics have not expired yet. Statistics expire only when the fraction of modified rows (from UPDATE, DELETE, INSERT, or REPLACE) crosses the expiration ratio (default: 10%). If data changes are small relative to the total table size, statistics are considered current and are not recollected.

  • Large schemas take longer than one day. The system allocates roughly 1 hour per day for non-incremental statistics collection. If a cluster has more than 1,000 columns, a full pass can take up to a week. This is expected behavior—queries continue to run, and statistics are updated as the cycle completes.

Are statistics collected automatically when data is imported into a new table?

It depends on how you import:

  • `INSERT OVERWRITE` (batch import): Basic statistics are collected automatically right after the import completes.

  • `INSERT INTO` or `REPLACE INTO` (real-time import): Statistics are collected at the next maintenance window, or during the incremental collection period after a BUILD task completes. Run ANALYZE TABLE immediately after importing to collect basic statistics right away.