Frequent writes to delta tables generate many small files, which reduce I/O throughput and increase storage pressure. Compaction merges all data files into a consolidated set of base files, keeping only the latest version of each row, so queries scan fewer files and run faster.
Overview
Delta tables support two data merging methods:
| Method | What it does | Trigger |
|---|---|---|
| Clustering | Merges only delta files into a larger file without changing data content. Delta files contain data from committed operations. | Automatic — the system runs clustering periodically based on file size and count. No manual action required. |
| Compaction | Merges all data files into a batch of base files. For rows with the same primary key, only the latest row is kept. Historical row versions and system column data are not retained. | Manual — run ALTER TABLE ... COMPACT MAJOR. |
Prerequisites
Before you begin, ensure that you have:
- A delta table, created with `tblproperties ("transactional"="true")`
- The session property `odps.merge.task.mode=service` set in your session
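Taken together, a minimal setup might look like the following sketch (the table name `mf_orders` and its columns are illustrative, not part of any existing schema):

```sql
-- Create a delta (transactional) table with a primary key.
CREATE TABLE mf_orders (
  id  BIGINT NOT NULL PRIMARY KEY,
  amt BIGINT NOT NULL
) TBLPROPERTIES ("transactional"="true");

-- Enable service-mode merge tasks for this session.
SET odps.merge.task.mode=service;
```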
Syntax
ALTER TABLE <table_name>
[PARTITION (<partition_key> = '<partition_value>' [, ...])]
COMPACT MAJOR;
Parameters
| Parameter | Required | Description |
|---|---|---|
| table_name | Yes | Name of the delta table to compact. |
| partition_key | No | Partition key name. Specify to compact a single partition. |
| partition_value | No | Value of the partition key. |
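For example, compaction can target a whole table or a single partition (the table and partition names below are illustrative):

```sql
-- Compact an entire delta table.
ALTER TABLE mf_orders COMPACT MAJOR;

-- Compact only one partition of a partitioned delta table.
ALTER TABLE mf_events PARTITION (dd = '01', hh = '01') COMPACT MAJOR;
```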
Usage notes
Session property required: Set odps.merge.task.mode=service before running compaction:
SET odps.merge.task.mode=service;
Storage cost: Compaction generates new base files before removing old ones, so storage usage increases temporarily. Schedule compaction at a frequency that suits your business requirements.
Write amplification and conflicts: The compaction policy is designed to prevent severe write amplification and conflict-caused failures when compaction runs frequently.
Billing
| Billing method | How compaction is charged |
|---|---|
| Pay-as-you-go | Amount of data scanned × 1 × unit price |
| Subscription | Deducted from the subscription computing quota |
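As a rough illustration of the pay-as-you-go formula, assuming a hypothetical unit price of $0.03 per GB (actual prices depend on your region and contract):

```
cost = data scanned × 1 × unit price
     = 100 GB × 1 × $0.03/GB
     = $3.00
```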
Example
The following example creates a partitioned delta table, inserts data across three batches, runs compaction, and then queries historical data using time travel.
-- Create a delta table with a primary key and two partition columns.
CREATE TABLE mf_dt (pk BIGINT NOT NULL PRIMARY KEY, val BIGINT NOT NULL)
PARTITIONED BY (dd STRING, hh STRING)
TBLPROPERTIES ("transactional"="true");
-- Insert data in three batches to simulate incremental writes.
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (1, 1), (2, 2);
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (2, 20), (3, 3);
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') VALUES (3, 30), (4, 4);
-- Run compaction on the partition.
SET odps.merge.task.mode=service;
ALTER TABLE mf_dt PARTITION(dd='01', hh='01') COMPACT MAJOR;
-- Query the latest snapshot after compaction.
SELECT * FROM mf_dt TIMESTAMP AS OF get_latest_timestamp('mf_dt')
WHERE dd='01' AND hh='01';
-- Query the second-to-latest snapshot (time travel still works after compaction).
SELECT * FROM mf_dt TIMESTAMP AS OF get_latest_timestamp('mf_dt', 2)
WHERE dd='01' AND hh='01';