
MaxCompute:Compaction

Last Updated: Mar 26, 2026

Frequent writes to delta tables generate many small files, which reduce I/O throughput and increase storage pressure. Compaction merges all data files into a consolidated set of base files, keeping only the latest version of each row, so queries scan fewer files and run faster.

Overview

Delta tables support two data merging methods:

  • Clustering: Merges only delta files into a larger file without changing the data content. Delta files contain data from committed operations. Trigger: automatic. The system runs clustering periodically based on file size and count; no manual action is required.

  • Compaction: Merges all data files into a batch of base files. For rows with the same primary key, only the latest row is kept; historical row versions and system column data are not retained. Trigger: manual. Run ALTER TABLE ... COMPACT MAJOR.

Prerequisites

Before you begin, ensure that you have:

  • A delta table (created with tblproperties ("transactional"="true"))

  • The session property odps.merge.task.mode=service set in your session

Syntax

ALTER TABLE <table_name>
  [PARTITION (<partition_key> = '<partition_value>' [, ...])]
  COMPACT MAJOR;
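For example, the following statements apply the syntax above to an entire table and to a single partition. The table and partition names are illustrative, not part of this document's example schema:

-- Compact an entire delta table (hypothetical table name).
ALTER TABLE sales_delta COMPACT MAJOR;

-- Compact only one partition of a partitioned delta table.
ALTER TABLE sales_delta PARTITION (ds = '20240101') COMPACT MAJOR;

Compacting a single partition limits the work (and the temporary extra storage) to the data in that partition.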

Parameters

  • table_name: Required. The name of the delta table to compact.

  • partition_key: Optional. The name of a partition key column. Specify a partition to compact only that partition.

  • partition_value: Optional. The value of the partition key.

Usage notes

  • Session property required: Set odps.merge.task.mode=service before running compaction.

    SET odps.merge.task.mode=service;
  • Storage cost: Compaction writes new base files before removing the old ones, so storage usage increases temporarily. Schedule compaction at a frequency that suits your business requirements.

  • Write amplification and conflicts: The compaction policy is designed to prevent serious write amplification and conflict-caused failures in frequent compaction scenarios.
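One way to decide whether a table or partition needs compaction is to check how many files it has accumulated. The sketch below assumes that DESC EXTENDED is available in your environment and that its output includes file count and size for the partition; verify the exact output fields in your MaxCompute version:

-- Inspect extended partition metadata before deciding to compact.
-- A large file count with small total size suggests small-file buildup.
DESC EXTENDED mf_dt PARTITION (dd='01', hh='01');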

Billing

  • Pay-as-you-go: Charged as the amount of data scanned × 1 × unit price.

  • Subscription: Deducted from the subscription computing quota.

Example

The following example creates a partitioned delta table, inserts data across three batches, runs compaction, and then queries historical data using time travel.

-- Create a delta table with a primary key and two partition columns.
CREATE TABLE mf_dt (pk BIGINT NOT NULL PRIMARY KEY, val BIGINT NOT NULL)
  PARTITIONED BY (dd STRING, hh STRING)
  TBLPROPERTIES ("transactional"="true");

-- Insert data in three batches to simulate incremental writes.
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (1, 1), (2, 2);
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (2, 20), (3, 3);
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (3, 30), (4, 4);

-- Run compaction on the partition.
SET odps.merge.task.mode=service;
ALTER TABLE mf_dt PARTITION(dd='01', hh='01') COMPACT MAJOR;

-- Query the latest snapshot after compaction.
SELECT * FROM mf_dt TIMESTAMP AS OF get_latest_timestamp('mf_dt')
  WHERE dd='01' AND hh='01';

-- Query the second-to-latest snapshot (time travel still works after compaction).
SELECT * FROM mf_dt TIMESTAMP AS OF get_latest_timestamp('mf_dt', 2)
  WHERE dd='01' AND hh='01';
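Because inserts into a delta table act as upserts on the primary key, the latest snapshot should contain one row per key, with the most recently written value winning for duplicate keys. A verification query, with the result expected from the three insert batches above (exact output formatting may differ):

-- Verify the merged result: one row per primary key.
SELECT pk, val FROM mf_dt WHERE dd='01' AND hh='01' ORDER BY pk;
-- Expected rows: (1, 1), (2, 20), (3, 30), (4, 4)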