All Products
Search
Document Center

MaxCompute:Clustering optimization recommendations

Last Updated:Dec 05, 2025

MaxCompute analyzes the read and write characteristics of tables and generates clustering suggestions to improve job performance and reduce compute unit (CU) consumption. You can evaluate the estimated benefits and recommendation details to decide whether to apply these suggestions.

Usage notes

  • Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), and China (Chengdu).

  • Clustering optimization recommendations do not support Layer 3 model projects.

  • Clustering optimization statistics are intended to cover the run history of all jobs. However, these statistics do not include jobs that involve multiple Fuxi Job instances, so a small number of jobs may not be included.

  • For more information about clustered table optimization jobs, see Hash Clustering.

View clustering optimization recommendations

You can view the recommended tables for all or specific projects in the current region. The estimated benefits and recommendation details can help you decide whether to adopt the suggestions. The procedure is as follows:

  1. Log on to the MaxCompute console and select a region in the top-left corner.

  2. In the navigation pane on the left, choose Intelligent Optimization > Data Layout Optimization.

  3. On the Clustering Optimization tab, click Estimated Benefits and filter the parameters to find tables that are recommended for clustering.

    Parameter

    Description

    Project Name

    Select a MaxCompute project name from the drop-down list. If you do not select a project, all projects are included by default.

    Table Name

    Enter the table name. Fuzzy search is supported. Separate multiple table names with a comma ,.

    Recommendation Generation Date

    The date the recommendation was generated. The default is the previous day.

    • Estimated benefit indicators

      Indicator

      Description

      Estimated Number of Beneficial Jobs/Day

      The estimated number of jobs that can benefit each day after converting the recommended table to a clustering table.

      Estimated Shuffle Reduction/Day

      The estimated daily savings in shuffle volume after converting the recommended table to a clustered table.

      Saving shuffle volume can reduce the CU-hours that jobs use. Typically, saving 1 TB of shuffle volume can save 2 to 4 CU-hours per day.

    • Optimization recommendation list

      Review the suggested parameter values in the list and view the suggestions for more details about table optimization.

      Column Name

      Description

      Project

      The project where the table with recommended clustering property modifications is located.

      Table Name

      The name of the table with recommended clustering property modifications.

      Type

      The recommended clustering type for the table. Only recommendations based on Hash Clustering are supported.

      Suggested Cluster Key

      The recommended ClusterKey for the table, mainly related to Shuffle Removal and point query data filtering.

      Suggested Sort Key

      The recommended SortKey for the table. Mainly related to data filtering and storage compression rate.

      Bucket Count

      The recommended number of Buckets for the table. Mainly related to the parallelism of writing tables and reading tables after Shuffle Removal.

      Recommendation Index

      From 1 star to 5 stars, the more stars, the more recommended it is to modify the clustering properties of the table. The calculation method of the star rating is as follows:

      Evaluation dimension

      Deduction rule

      Weight coefficient

      Timeliness of the optimizable pattern

      The observation window is less than 14 days.

      Deduct 1 star

      Shuffle data savings (read)

      Savings are less than 1 TB.

      Deduct 1 star

      Write job popularity

      No write job records are found for the day.

      Deduct 1 star

      Number of optimizable partitions

      • More than 3 optimizable partitions: Deduct 1 star

      • More than 31 optimizable partitions: Deduct 2 stars

      Dynamic adjustment

      Note
      • If no write job records are counted on the day, it is impossible to estimate the daily write increase cost, resulting in a deduction.

      • When there are many partitions that can be optimized, Cluster optimization must be effective after rewriting or actively rewriting many existing partitions with new data. This situation will result in a deduction.

      Estimated Shuffle Reduction/Day

      The estimated Shuffle volume savings per day after converting the table to a clustering table.

      Observation Interval

      The number of days the same optimization recommendation appears within a period.

      Actions

      Click a suggestion to go to the Details of Table with Optimization Recommendations page, which includes the following:

      • Optimization Recommendations

      • Current Status of Table

      • Overview of Estimated Benefits

        • Estimated Shuffle Reduction/Day

        • Beneficial Table Read Jobs

        • Full Table Write Jobs

        • Full Table Read Jobs

Apply clustering optimization recommendations

Directly apply clustering recommendations to the original table

  • Interface Operation

    For partitioned tables, you can directly apply the recommendations to convert the original table into the recommended clustering table with one click. Here's how:

    1. In the navigation pane on the left, choose Intelligent Optimization > Data Layout Optimization.

    2. On the Clustering Optimization tab, click Estimated Benefits.

    3. Click View Details in the Actions column for the target table to open the Details of Table with Optimization Recommendations page.

    4. Click Apply Recommendations in the upper-right corner to complete the conversion.

  • Operate through SQL commands as follows:

    --Change the table to a Hash Clustering table
    ALTER TABLE <table_name> [CLUSTERED BY (<col_name> [, <col_name>, ...])
                           [SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])]
                           INTO <number_of_buckets> BUCKETS];
  • After the conversion is complete, you can view the optimized clustered table and run read jobs to ensure that all jobs run as expected. If any issues occur, immediately roll back the operation. The command is as follows:

    --Change the Hash Clustering table to a non-Hash Clustering table
    ALTER TABLE <table_name> NOT CLUSTERED;
Important
  • After converting to a clustered table, you cannot perform incremental write operations such as INSERT INTO or Tunnel upload.

  • You cannot directly apply clustering recommendations to non-partitioned tables.

  • Modifying clustering properties will result in increased latency and CU consumption for write table jobs. Conversely, CU consumption for read table jobs will decrease, leading to overall CU savings.

  • In certain scenarios, we recommend rewriting partitions and verifying the benefits for downstream jobs that can be optimized. For more information, see (Recommended) Rewrite partition data.

(Recommended) Rewrite partition data

After the clustering properties of a table are modified, the changes take effect only on new partitions. To apply the optimization to existing partitions, you must rewrite the data in those partitions. For the following scenarios, you must rewrite partitions and verify the benefits for downstream jobs that can be optimized:

  • Tables used by high-priority and latency-sensitive jobs.

  • Tables with large partition data, where a single write exceeds 10TB.

  • Tables where downstream jobs read multiple partitions and need simultaneous rewriting to cluster partitions for optimization.

  1. For large tables in jobs with daily full incremental merge writes, rewrite the last day's partition to mitigate increased costs and slower speeds when the full incremental job runs the next day.

    -- Assume the partition key column is ds, and the daily new partitions are 20241015, 20241016. The new partition data is derived from merging incremental data from the previous day's partition.
    INSERT OVERWRITE TABLE <table_name> PARTITION(ds) 
    SELECT * FROM <table_name> WHERE ds = max_pt('<table_name>');
  2. When optimizable jobs read multiple existing partitions, rewrite those within the read range to apply cluster optimization sooner.

    -- Assume the partition column is ds, and the daily new partitions are 20241015, 20241016, ..., with the read range starting from 20241015
    INSERT OVERWRITE TABLE <table_name> PARTITION(ds) 
    SELECT * FROM <table_name> WHERE ds >='20241015';
  3. After you rewrite the data, run a trial of the optimizable job to verify the effectiveness of the optimization.

Apply clustering recommendations to a new table

For non-partitioned tables, you cannot directly modify the original table's clustering properties. Instead, you can manually apply them by creating a new cluster table. Follow these steps:

  1. Create a new cluster table.

    -- Confirm the creation statement of the existing table
    SHOW CREATE TABLE <orignal_table>;
    -- Insert CLUSTER information in the appropriate position according to the CREATE TABLE syntax to create a new Cluster table
    CREATE TABLE <new_table> [CLUSTERED BY (<col_name> [, <col_name>, ...])
                              [SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])]
                              INTO <number_of_buckets> BUCKETS];
  2. Import data into the new table.

    INSERT OVERWRITE TABLE <new_table>
    SELECT * FROM <original_table>;
  3. Rename the original table to a backup table.

    ALTER TABLE <orignal_table> RENAME TO <orignal_table_backup>;
  4. Rename the new table to the original table.

    ALTER TABLE <new_table> RENAME TO <orignal_table>;

The new table has the same name as the original table, and its data is clustered based on the new clustering properties.

Clustering recommendation rollback

You can view the new clustered table to ensure that all jobs run as expected. If any issues occur, immediately roll back the changes. The commands are as follows:

  1. Delete the new table (with the same name as the original table).

    DROP TABLE IF EXISTS <orignal_table>;
  2. Rename the backup table to the original table.

    ALTER TABLE <orignal_table_backup> RENAME TO <orignal_table>;

View clustering optimization benefits

On the Clustering Optimization tab, click Actual Benefits to view the actual benefits of clustering optimization. Follow these steps:

  1. Log on to the MaxCompute console and select a region in the top-left corner.

  2. In the navigation pane on the left, choose Intelligent Optimization > Data Layout Optimization.

  3. On the Clustering Optimization tab, click Actual Benefits.

  4. You can filter by Project Name and Analysis Time to view a summary and details of the benefits for clustered tables whose clustering properties have been modified.

    • Benefit indicator descriptions

      Metric

      Description

      Number of Benefited Jobs

      The number of times jobs read the recently modified clustered table during the benefit calculation period.

      Saved CU-hours

      The reduction in CU-hours consumed by jobs that read the recently modified clustered table. This compares consumption during the benefit calculation period to consumption before the table was clustered.

      Reduced Shuffle Amount

      The reduction in Shuffle consumption by jobs that read the recently modified clustered table. This compares consumption during the benefit calculation period to consumption before the table was clustered.

      Clustering optimization benefits are calculated by comparing the average consumption of jobs with the same signature before and after the modification. The statistics cover clustered tables that were modified based on recommendations within the last 365 days.

    • Optimized list

      Column Name

      Description

      Project

      The project where the clustering table with modified clustering properties is located.

      Table Name

      The name of the table with modified clustering properties.

      Clustering Attribute Modification Time

      The date of the most recent modification of the table's clustering properties.

      Number of Benefited Jobs

      The number of times the table was read within the benefit statistics interval after modifying the clustering properties.

      Saved Computing Duration

      The savings in computation time for jobs reading the table within the benefit statistics interval compared to before.

      Saved CU-hours

      The savings in CU hours consumed by jobs reading the table within the benefit statistics interval compared to before.

      Reduced Shuffle Amount

      The savings in Shuffle volume consumed by jobs reading the table within the benefit statistics interval compared to before.

      Actions

      Click a recommendation to go to the Details of Optimized Table page, which includes the following information:

      • Current Status of Table

        • Clustering Type

        • ClusterKey

        • SortKey

        • Number of Buckets

        • Partition Key

      • Benefit Overview

        • Number of Benefited Jobs

        • Saved CU-hours

        • Reduced Shuffle Amount

      • Beneficial read table jobs list

        • Signature

        • Saved Computing Duration

        • Saved CU-hours

        • Reduced Shuffle Amount

    Important
    • After converting a table to a clustered table, the daily job benefit statistics are updated on the following day (T+1). You can view the real-time effects of optimization through job operation and maintenance tools or LogView.

    • Statistics on the benefits of clustering optimization are derived from historical job run records with the same Signature. These statistics are subject to various factors, such as daily job performance fluctuations. If the benefits fall short of expectations, please review the execution details of jobs on different dates before and after optimization to identify influencing factors.

    • Please note that clustering optimization benefit statistics are provided for guidance only. The actual CU savings will be calculated based on your bill.