All Products
Search
Document Center

AnalyticDB:Statistics

Last Updated:Nov 25, 2025

This topic describes the purpose and types of statistics in AnalyticDB for MySQL. It also describes how statistics are collected automatically, how to collect them manually, and how to view them.

Function introduction

The query optimizer uses statistics to generate high-quality execution plans for the execution engine. The quality of an execution plan directly affects query performance.

AnalyticDB for MySQL includes an automatic statistics collection feature that is enabled by default. Clusters with kernel version 3.1.9.2 or later also support column group statistics. This feature is disabled by default and must be enabled manually. During the maintenance window, AnalyticDB for MySQL automatically performs a full or sampled collection of basic, histogram, and column group statistics based on the table's data volume. A full collection may take several days to complete if statistics are required for many columns. Outside the maintenance window, AnalyticDB for MySQL periodically performs automatic incremental collections of basic statistics.

AnalyticDB for MySQL uses different collection policies depending on the data import method:

  • For batch imports using INSERT OVERWRITE, AnalyticDB for MySQL immediately and automatically collects basic statistics.

  • For real-time imports using INSERT INTO or REPLACE INTO, AnalyticDB for MySQL triggers an incremental collection task during the maintenance window or the incremental collection period after a build is complete. We recommend that you manually collect basic statistics once after importing data.

You can also disable automatic statistics collection and run the ANALYZE TABLE statement to collect statistics manually. For more information, see Manually collect statistics.

Automatic statistics collection in AnalyticDB for MySQL applies only to internal tables, not external tables. Manual statistics collection applies to both internal and external tables.

Usage notes

The statistics feature is supported on Enterprise Edition, Basic Edition, Data Lakehouse Edition, and Data Warehouse Edition clusters with kernel version 3.1.6.1 or later.

Note

To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.

Classification and selection of statistics

AnalyticDB for MySQL collects three types of statistics: basic (BASIC), histogram (HISTOGRAM), and column group (GROUP_STATS). Histograms and column group statistics are collected by full or sampled collection. Basic statistics are collected by full, sampled, or automatic incremental collection. The default method is automatic incremental collection.

Important

Only clusters with kernel version 3.1.9.2 or later support sampled collection for basic, histogram, and column group statistics.

Basic statistics

Basic statistics include the maximum value, minimum value, average length in bytes, number of distinct values, and proportion of NULL values in a column.

Scenarios:

  • Columns that are not used in filtering or join operations.

  • Columns with evenly distributed data, such as primary key columns.

Histogram

A histogram is created by dividing data within a range into buckets based on the basic statistics of a table. Each bucket in a histogram describes the data characteristics of a specific range.

Classification:

  • Hybrid Histogram: A variation of an equal-height histogram that can better describe hot spot values.

  • Frequency Histogram: Suitable for columns with a small number of distinct values. Each value corresponds to a bucket.

Note

AnalyticDB for MySQL automatically selects a suitable histogram for you.

Scenarios: Columns with unevenly distributed data that are used in filtering and join operations. If data is evenly distributed, basic statistics can replace histograms in filtering and join scenarios.

Compared with basic statistics, histograms more accurately reflect table statistics but are more costly and use more cache space. In scenarios with many tables, collecting histograms for all columns can decrease the cache hit ratio. The default cache can hold statistics for approximately 20,000 columns of histograms or 2 million columns of basic statistics.

Column group statistics

Important

Only clusters with kernel version 3.1.9.2 or later support the column group statistics feature.

Basic statistics and histograms are collected for single columns. Column group statistics are collected for multiple columns in the same table and describe the correlation between them.

Scenarios: Use for multi-column aggregations. If multiple columns are highly correlated, use column group statistics to more accurately estimate the number of output rows and choose a better execution method.

Automatic statistics collection

Enable or disable automatic statistics collection

AnalyticDB for MySQL supports automatic statistics collection. This feature is enabled by default. You can run the following command to disable or re-enable it.

SET adb_config O_CBO_AUTONOMOUS_STATS_ENABLED = [false|true];

Enable or disable automatic collection of column group statistics

Clusters with kernel version 3.1.9.2 or later support automatic collection of column group statistics. This feature is disabled by default. You can run the following command to enable or disable it.

SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECT_GROUP_STATS_ENABLED = [false|true];
Note

Before you use automatic collection of column group statistics, make sure that automatic statistics collection is enabled.

Set a maintenance window

The default maintenance window for automatic statistics collection is 04:00 to 05:00. You can run the following command to change the maintenance window. We recommend that you set the window to off-peak hours. The interval between the start and end times must be at least 1 minute and no more than 3 hours. The start time must be earlier than the end time. If you enter an invalid time, the default time is used.

Note

The maintenance window uses the same time zone as the cluster.

SET adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];

Set a data volume threshold for statistics collection

You can use the following command to set a data volume threshold for statistics collection. The default value is 5,000,000,000 (5 billion) rows.

SET adb_config O_CBO_MAINTENANCE_WINDOW_COLLECTOR_ROW_LIMIT = 10000;

If the number of rows in a table exceeds the data volume threshold:

  • For clusters with a kernel version earlier than 3.1.9.2, statistics collection skips the table.

  • For clusters with kernel version 3.1.9.2 or later, statistics for the table are collected by sampling.

Enable or disable load limits for statistics collection

During the maintenance window, AnalyticDB for MySQL limits the load during automatic statistics collection to reduce I/O resource usage when scanning tables. This feature is enabled by default. If resources are idle during the maintenance window, you can disable the load limit to speed up statistics updates.

SET adb_config O_CBO_AUTONOMOUS_STATS_SCAN_RATE_LIMIT_ENABLED = [false|true];

Collect statistics in a specified resource group

By default, automatic statistics collection uses a system account to run commands. To run automatic statistics collection commands in a specific resource group, you can run the following command to specify a database account. After you specify a database account, AnalyticDB for MySQL runs the commands in the resource group attached to that account. The database account must have query permissions on all columns of all tables and be attached to a resource group. For more information about how to attach a resource group, see Create and manage resource groups.

SET adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];

Set the expiration ratio for columns

The default expiration ratio for columns is 0.1 (10%). When the ratio of rows affected by `UPDATE`, `DELETE`, `INSERT`, or `REPLACE` operations to the total number of rows in a table exceeds the expiration ratio, the statistics for the table are considered expired. AnalyticDB for MySQL then recollects statistics for all columns of the expired table during the maintenance window. If the expiration ratio is not exceeded, statistics are not automatically collected during the maintenance window.

SET adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;

Manually collect statistics

Collect statistics on an entire table

The ANALYZE TABLE command scans an entire table to collect statistics. For large tables, collecting statistics on the entire table can be time-consuming. We recommend that you run this command during off-peak hours or collect statistics by sampling.

Syntax

ANALYZE TABLE [schema_name.]table_name [UPDATE [BASIC|HISTOGRAM|GROUP_STATS]] [ON column_name[,...]] [WITH ENABLE SAMPLING]

Parameters

Parameter

Required

Description

schema_name

No

The database name.

table_name

Yes

The table name. AnalyticDB for MySQL collects statistics for this table. Each ANALYZE TABLE statement can specify only one table. The table can be an internal or external table.

UPDATE [BASIC|HISTOGRAM|GROUP_STATS]

No

The type of statistics. Valid values:

  • BASIC (default): basic statistics.

  • HISTOGRAM: histogram.

  • GROUP_STATS: column group statistics.

Important

Only clusters with kernel version 3.1.9.2 or later support the column group statistics feature.

ON column_name[,...]

No

The columns for which to collect statistics. If you do not specify columns, statistics are collected for all columns in the table.

WITH ENABLE SAMPLING

No

Collect statistics by sampling.

Important

Only clusters with kernel version 3.1.9.2 or later support sampled collection for basic, histogram, and column group statistics.

Examples

  • Collect basic statistics on all columns of the adb_demo.customer table. You can use either of the following statements:

    • ANALYZE TABLE adb_demo.customer;
    • ANALYZE TABLE adb_demo.customer UPDATE BASIC;
  • Collect basic statistics on the customer_id column of the adb_demo.customer table.

    ANALYZE TABLE adb_demo.customer UPDATE BASIC ON customer_id;
  • Collect histogram information on the customer_id and login_time columns of the adb_demo.customer table.

    ANALYZE TABLE adb_demo.customer UPDATE HISTOGRAM ON customer_id,login_time;
  • Collect column group statistics by sampling on the customer_id and login_time columns of the adb_demo.customer table.

    ANALYZE TABLE adb_demo.customer UPDATE GROUP_STATS ON customer_id,login_time with enable sampling;

Collect statistics on partitions

Version limits

Only Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters with kernel version 3.1.9.1 or later support using the ANALYZE TABLE command to perform a full collection of basic statistics on partitions of OSS external tables.

Note

To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.

Syntax

ANALYZE TABLE table_name WITH PARTITIONS = ARRAY[ARRAY[PARTITION_KEYS] [, PARTITION_KEYS, ....]]

Parameter description

Parameter

Required

Description

table_name

Yes

The table name. AnalyticDB for MySQL collects statistics for this table. Each ANALYZE TABLE statement can specify only one table. The table can be an internal or external table.

PARTITION_KEYS

Yes

The partitions for which to collect statistics.

Examples

  • Collect statistics on the 2023-01 and 2023-02 partitions of the test1 partitioned table.

    ANALYZE TABLE test1 WITH PARTITIONS = ARRAY[ARRAY['2023-01'], ARRAY['2023-02']];
  • Collect statistics on the (1,1) and (1,0) partitions of the test2 partitioned table.

    ANALYZE TABLE test2 WITH PARTITIONS = ARRAY[ARRAY[1, 1], ARRAY[1, 0]];

View statistics

Statistics are stored in binary format in AnalyticDB for MySQL. You can view statistics in the INFORMATION_SCHEMA system table.

  • Run the following command to view table-level statistics:

    SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS;
  • Run the following command to view column-level statistics, including basic statistics, histograms, and column group statistics:

    SELECT * FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;

FAQ

Why is the ANALYZE command diagnosed as a slow query?

ANALYZE statements that are automatically initiated during the maintenance window are executed with I/O throttling and low CPU priority. The statements may be diagnosed as slow queries because they are executed for extended periods of time. However, this does not affect services. If the CPU load is not high or CPU overload is not closely associated with the maintenance window, you can ignore this issue. If the CPU is continuously overloaded, refer to the "What do I do if the query response time is affected by CPU overload that is caused by statistics collection?" section of this topic to resolve this issue.

What causes high CPU load when the statistics feature is used?

A CPU may become overloaded due to the following reasons:

  • During the default maintenance window from 04:00 to 05:00, the system performs a full scan on each table to collect column statistics. During this period, the CPU is overloaded.

  • In most cases, statistics are incrementally collected, which does not consume large amounts of resources. By default, the statistics feature is enabled for AnalyticDB for MySQL Data Warehouse Edition clusters that run V3.1.6 or later. When a cluster is updated from an earlier minor version to V3.1.6 or later, full statistics are collected. This may cause CPU overloads during the first days after the minor version update. The CPU load decreases after the full data scan is complete.

When the CPU is overloaded, check whether the query response time is affected. If the average query response time does not significantly change, the query response time is not affected. The value of the CPU utilization metric may be high during statistics collection, but when queries are executed, resources are preferentially allocated for the execution. This is because ANALYZE statements are executed with I/O throttling and low CPU priority.

What do I do if high CPU load affects query response time?

You can resolve this issue by using the following solutions:

  • Change the maintenance window to off-peak hours.

    set adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];
  • If you cannot determine the appropriate off-peak hours, we recommend that you change the I/O limit for system queries to a value that is greater than or equal to 16 MB. The default value is 50 MB.

    set adb_config CSTORE_IO_LIMIT_SYSTEM_QUERY_BPS = 52428800;
  • Assign statistics collection to a low-priority resource group to isolate loads. For more information, see the "Automatic statistics collection" section of the Statistics topic.

    set adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];
  • Increase the expiration ratio for columns to reduce the data to be collected. The default value is 0.1. The value ranges between 0 and 1. We recommend that you do not set the expiration ratio to a value that is greater than 0.5.

    set adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;

If none of the preceding solutions resolve this issue, execute the set adb_config O_CBO_AUTONOMOUS_STATS_ENABLED=false; statement to disable automatic statistics collection. However, performance may be degraded. If you want to collect statistics in the future, you must manually collect statistics. For more information, see the "Manually collect statistics" section of the Statistics topic.

Why are statistics not updated for several days?

This issue is caused by the following reasons:

  • The statistics are not expired.

    Statistics expire when the amount of data that is updated, inserted, or replaced reaches the expiration ratio. The default expiration ratio is 0.1 (10%). If only a small amount of your data is changed, you can continue to use your cluster as expected and observe this issue for another week.

  • A large amount of data is contained in many columns and tables.

    By default, only 1 hour per day is required to collect statistics that are not included in incremental updates. If a large number of columns and tables are involved, such as more than 1,000 columns, the system may not be able to complete an update within one day. The update may require one week to complete. In this case, the absence of statistics updates within several days is normal. You can continue to use the cluster as expected and observe this issue.

Are statistics automatically updated when data is imported into a new table?

If data is batch imported by using the INSERT OVERWRITE statement, basic statistics are automatically collected after the data import is complete. If data is imported in real time by using the INSERT INTO or REPLACE INTO statement, statistics are collected during the next maintenance window or an incremental collection task is triggered during the incremental collection period after each BUILD task is complete. We recommend that you manually collect basic statistics after data is imported. For more information, see the "Manually collect statistics" section of the Statistics topic.