All Products
Search
Document Center

MaxCompute:ANALYZE

Last Updated:Mar 26, 2026

The ANALYZE TABLE statement collects column statistics for a table. The query optimizer uses these statistics to generate efficient execution plans.

Syntax

ANALYZE TABLE <table_name> [PARTITION (<pt_spec>)]
    COMPUTE STATISTICS FOR COLUMNS [(<column_name> [, <column_name> ...])];

Omit PARTITION for non-partitioned tables. Omit the column list to collect statistics for all columns.

Delete statistics

To delete statistics instead of collecting them:

-- Delete statistics for all columns
ANALYZE TABLE <table_name> DELETE STATISTICS FOR COLUMNS;

-- Delete statistics for specific columns
ANALYZE TABLE <table_name> DELETE STATISTICS FOR COLUMNS (<column_name> [, <column_name> ...]);

Parameters

Parameter Description
table_name The name of the table.
column_name The name of a column to analyze. Omit to analyze all columns.
pt_spec The partition to analyze. Format: <partition_key>='<value>'[, <partition_key>='<value>' ...]. For example, ds='20201221' or ds='20201221', hr='11'.

Statistics output fields

After running ANALYZE TABLE, use SHOW STATISTIC to view the results. The output includes the following fields:

Field Applies to Description
MaxValue Numeric types Maximum value in the column.
MinValue Numeric types Minimum value in the column.
DistinctNum All types Number of distinct values (NDV).
NullNum All types Number of null values.
TopK All types Up to 20 values with the highest occurrence frequency. For example, 10=1.0 means the value 10 appears once.
MaxLength STRING, VARCHAR Maximum column value length.
AvgLength STRING, VARCHAR Average column value length.

Examples

Collect statistics on a non-partitioned table

The following minimal example shows the basic syntax:

ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS (col1, col2);

The following complete example creates a table, inserts data, collects statistics, and verifies the results.

-- Create a non-partitioned table
CREATE TABLE IF NOT EXISTS analyze2_test (
    tinyint1   TINYINT,
    smallint1  SMALLINT,
    int1       INT,
    bigint1    BIGINT,
    double1    DOUBLE,
    decimal1   DECIMAL,
    decimal2   DECIMAL(20, 10),
    string1    STRING,
    varchar1   VARCHAR(10),
    boolean1   BOOLEAN,
    timestamp1 TIMESTAMP,
    datetime1  DATETIME
) LIFECYCLE 30;

-- Insert sample data
INSERT OVERWRITE TABLE analyze2_test
SELECT * FROM VALUES
    (1Y,  20S, 4, 8L,         123452.3, 12.4, 52.5,  'str1',   'str21',  false, TIMESTAMP '2018-09-17 00:00:00', DATETIME '2018-09-17 00:59:59'),
    (10Y,  2S, 7, 11111118L,   67892.3, 22.4, 42.5,  'str12',  'str200', true,  TIMESTAMP '2018-09-17 00:00:00', DATETIME '2018-09-16 00:59:59'),
    (20Y,  7S, 4, 2222228L,      12.3,   2.4,  2.57, 'str123', 'str2',   false, TIMESTAMP '2018-09-18 00:00:00', DATETIME '2018-09-17 00:59:59'),
    (null, null, null, null, null, null, null, null, null, null, null, null)
AS t(tinyint1, smallint1, int1, bigint1, double1, decimal1, decimal2, string1, varchar1, boolean1, timestamp1, datetime1);

-- Collect statistics for a single column
ANALYZE TABLE analyze2_test COMPUTE STATISTICS FOR COLUMNS (tinyint1);

-- Collect statistics for multiple columns
ANALYZE TABLE analyze2_test COMPUTE STATISTICS FOR COLUMNS (smallint1, string1, boolean1, timestamp1);

-- Collect statistics for all columns
ANALYZE TABLE analyze2_test COMPUTE STATISTICS FOR COLUMNS;

Verify the results with SHOW STATISTIC:

-- Verify single-column statistics
SHOW STATISTIC analyze2_test COLUMNS (tinyint1);

-- Verify multi-column statistics
SHOW STATISTIC analyze2_test COLUMNS (smallint1, string1, boolean1, timestamp1);

-- Verify all-column statistics
SHOW STATISTIC analyze2_test COLUMNS;

The following result is returned for the tinyint1 column:

ID = 20201126085225150gnqo****
tinyint1:MaxValue:     20
tinyint1:DistinctNum:  4.0
tinyint1:MinValue:     1
tinyint1:NullNum:      1.0
tinyint1:TopK:         {1=1.0, 10=1.0, 20=1.0}

Collect statistics on a partitioned table

The following complete example creates a partitioned table, inserts data across multiple partitions, collects statistics for a specific partition, and verifies the results.

-- Create a partitioned table
CREATE TABLE IF NOT EXISTS srcpart_test (
    key   STRING,
    value STRING
)
PARTITIONED BY (ds STRING, hr STRING)
LIFECYCLE 30;

-- Insert data into multiple partitions
INSERT INTO TABLE srcpart_test PARTITION(ds='20201220', hr='11') VALUES
    ('123', 'val_123'), ('76', 'val_76'), ('447', 'val_447'), ('1234', 'val_1234');

INSERT INTO TABLE srcpart_test PARTITION(ds='20201220', hr='12') VALUES
    ('3', 'val_3'), ('12331', 'val_12331'), ('42', 'val_42'), ('12', 'val_12');

INSERT INTO TABLE srcpart_test PARTITION(ds='20201221', hr='11') VALUES
    ('543', 'val_543'), ('2', 'val_2'), ('4', 'val_4'), ('9', 'val_9');

INSERT INTO TABLE srcpart_test PARTITION(ds='20201221', hr='12') VALUES
    ('23', 'val_23'), ('56', 'val_56'), ('4111', 'val_4111'), ('12333', 'val_12333');

-- Collect statistics for the ds='20201221' partition
ANALYZE TABLE srcpart_test PARTITION(ds='20201221') COMPUTE STATISTICS FOR COLUMNS (key, value);

Verify the results:

SHOW STATISTIC srcpart_test PARTITION (ds='20201221') COLUMNS (key, value);

The following result is returned, with statistics broken out by sub-partition:

ID = 20210105121800689g28p****
(ds=20201221,hr=11) key:MaxLength      3.0
(ds=20201221,hr=11) key:AvgLength:     1.0
(ds=20201221,hr=11) key:DistinctNum:   4.0
(ds=20201221,hr=11) key:NullNum:       0.0
(ds=20201221,hr=11) key:TopK:          {2=1.0, 4=1.0, 543=1.0, 9=1.0}

(ds=20201221,hr=11) value:MaxLength    7.0
(ds=20201221,hr=11) value:AvgLength:   5.0
(ds=20201221,hr=11) value:DistinctNum: 4.0
(ds=20201221,hr=11) value:NullNum:     0.0
(ds=20201221,hr=11) value:TopK:        {val_2=1.0, val_4=1.0, val_543=1.0, val_9=1.0}

(ds=20201221,hr=12) key:MaxLength      5.0
(ds=20201221,hr=12) key:AvgLength:     3.0
(ds=20201221,hr=12) key:DistinctNum:   4.0
(ds=20201221,hr=12) key:NullNum:       0.0
(ds=20201221,hr=12) key:TopK:          {12333=1.0, 23=1.0, 4111=1.0, 56=1.0}

(ds=20201221,hr=12) value:MaxLength    9.0
(ds=20201221,hr=12) value:AvgLength:   7.0
(ds=20201221,hr=12) value:DistinctNum: 4.0
(ds=20201221,hr=12) value:NullNum:     0.0
(ds=20201221,hr=12) value:TopK:        {val_12333=1.0, val_23=1.0, val_4111=1.0, val_56=1.0}

When you analyze a partition, MaxCompute returns separate statistics for each sub-partition within the specified range.