A Delta Table is a high-performance table format in MaxCompute designed for large-scale analytical datasets. It includes two types: the Append Delta Table for tables without a primary key and the PK Delta Table for tables with a primary key. This topic describes the basic features and operations of Delta Tables.
Overview
A Delta Table is a high-performance table format in Alibaba Cloud MaxCompute. It supports features such as atomicity, consistency, isolation, and durability (ACID) transactions, incremental queries, time travel, dynamic cluster bucketing, real-time data updates, and schema evolution. The native data lakehouse and near-real-time computing capabilities in MaxCompute allow you to use standard SQL to create, update, and query Delta Tables. You do not need to manage complex underlying storage and metadata. MaxCompute automatically maintains and optimizes them to provide a balance between ease of use and cost-effectiveness.
Features
Category | Feature | Append Delta Table | PK Delta Table |
Basic DML | INSERT, UPDATE, DELETE, and MERGE INTO. | Supported | Supported |
ACID transactions | Read Committed/Snapshot Isolation. For more information, see ACID transaction management. | Supported | Supported |
Primary key | Defines a primary key. | Not supported | Supported (including partial column updates) |
Schema evolution | Add, delete, rename, reorder, and change the data type of columns. For more information, see ALTER TABLE. | Supported | Supported |
Data import | Batch upload (Upload) and stream upload (Stream Upload). Data is visible immediately after a stream upload. | Supported | Supported (also supports Upsert and Delete) |
Time travel | Supports querying historical snapshots by time point or version number to reproduce results or perform audit analysis. For more information, see Time Travel. | Supported | Supported |
Incremental computing | Delta Live Materialized View (MV) and incremental read. For more information, see Incremental computing and Incremental query. | Supported (incremental materialized views are still being adapted) | Supported |
Data organization optimization | Automatically maintains incremental data files, including optimizations such as small file merging, multi-level compaction, and data sorting, to keep data storage and computation stable and efficient. For more information, see Append Delta Table data organization optimization and PK Delta Table data organization optimization. | Supported. You do not need to configure the number of buckets; Dynamic Bucketing automatically adapts to the data distribution. | Supported |
Query performance optimization | Partition-level and file-level statistics (such as Min/Max), partition pruning, column pruning, and predicate pushdown. | Supported | Supported |
Security and compliance | Storage encryption, dynamic data masking, row-level access control, and more. | Supported | Supported |
Disaster recovery and backup | Table snapshots (invitational preview), recycle bin-based local backup, and zone-disaster recovery. | Supported | Supported |
Cost | AliORC column-store compression and tiered storage. | Supported | Supported |
User experience
Real-time data updates for real-time services
Delta Tables support real-time data writes and updates (upserts) through Stream Upload. Data becomes visible immediately after it is written. To balance the real-time performance of data writes with query performance, MaxCompute uses a tiered storage and optimization strategy:
Ensure real-time writes: New data is first written quickly to non-clustered buckets without being sorted. This process ensures low latency and high throughput for data writes.
Improve SQL query performance: The background Incremental Reclustering service asynchronously reorganizes and optimizes incremental data into sorted, clustered buckets. When you run a query, the query engine can efficiently prune the sorted base data and scan only a small amount of incremental data. This approach balances data freshness with query efficiency.
Efficient incremental data processing and analysis
Built on its underlying incremental data read and write capabilities, MaxCompute provides a rich set of high-level features to improve the timeliness of end-to-end data analytics. You can combine advanced features, such as incremental computation and Dynamic Materialized View (Delta Live MV) (Invitational Preview), to build efficient real-time data processing pipelines and accelerate the transformation of data into business insights.
Adapt to business growth and overcome the limits of previous table formats
Dynamic bucket allocation: Append Delta Tables support dynamic bucket allocation. You do not need to specify the number of buckets in the Data Definition Language (DDL) statement. You also do not need to estimate the future data volume of each partition to determine a suitable number of buckets. As you write more data, the Dynamic Bucketing service automatically splits existing buckets or creates new ones. This feature dynamically adapts to changes in business data volume and resolves issues, such as data skew and fragmentation, that are caused by buckets being too large or too small.
Schema evolution: Delta Tables support schema evolution to meet evolving business requirements for adjusting data fields and improving data precision. This feature supports adding, deleting, modifying, and renaming columns, provides full backward compatibility, and prevents accidental data deletion or loss. A syntax sketch follows this list.
Overcomes the limitations of traditional tables: A single Delta Table supports operations such as INSERT INTO, UPDATE, DELETE, and MERGE INTO, along with clustering and sorting. Traditional partitioned tables, clustered tables, and Transaction Tables do not support all of these capabilities simultaneously.
Multi-engine computing: Delta Tables support MaxCompute SQL, MaxFrame, and Spark on MaxCompute. Open-source engines such as Flink, Spark, and StarRocks can also access Delta Tables through connectors and the open storage APIs.
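The following is a minimal sketch of schema evolution syntax, assuming a hypothetical Delta Table named t_delta with an existing column old_name; see ALTER TABLE for the full syntax.
-- Add a column; existing rows read NULL (or the declared default) for it.
ALTER TABLE t_delta ADD COLUMNS (new_attr STRING COMMENT 'illustrative new column');
-- Rename a column without rewriting data.
ALTER TABLE t_delta CHANGE COLUMN old_name RENAME TO renamed_col;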
Balance performance and reliability
Delta Tables are suitable for managing massive data volumes, from terabytes to petabytes. Even with extremely large data volumes, metadata operations remain fast. Queries support features such as partition pruning, column pruning, and predicate pushdown to avoid unnecessary data scans.
ACID transaction management: Delta Tables use optimistic concurrency control to support concurrent operations from multiple writers. Write conflicts are detected and retried to ensure data consistency.
Security and compliance: Delta Tables meet data security and compliance requirements. They support storage encryption, table-level and column-level access control lists (ACLs), row-level permissions, and dynamic data masking.
Backup and rollback: The versioned backup and rollback mechanism, which uses a recycle bin mode, ensures that if data corruption or accidental deletion occurs, you can quickly roll back the table to a previous healthy state. This reduces operations and maintenance (O&M) and management risks.
SQL operations
DDL
Create an Append Delta Table
-- Create an Append Delta Table
CREATE TABLE <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [COMMENT <col_comment>], ...
)
[COMMENT <table_comment>]
[RANGE CLUSTERED BY (<col_name> [, <col_name>, ...])]
TBLPROPERTIES (
    "table.format.version"="2"
    [, "acid.data.retain.hours"="<hours>", ...]
)
[LIFECYCLE <days>];
The following table describes the TBLPROPERTIES parameters.
Parameter | Required | Description | Notes |
"table.format.version"="2" | Yes | Declares the table format as Delta Table. | |
acid.data.retain.hours | No | The default value is 24. | The time range, in hours, for which historical data states can be queried using Time Travel. |
acid.incremental.query.out.of.time.range.enabled | No | The default value is false. | If set to true, the endTimestamp specified in an incremental query can be later than the latest commit time of the table. If the endTimestamp is later than the current time, repeated queries may return different results because new data may be inserted. You can modify the value of this parameter for an existing table. |
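For reference, the following is a concrete sketch that creates an Append Delta Table; the table and column names (events, event_time, and so on) are illustrative.
-- Hypothetical Append Delta Table for event logs, range-clustered by event time.
CREATE TABLE events (
    event_time DATETIME NOT NULL,
    event_type STRING,
    payload    STRING
)
RANGE CLUSTERED BY (event_time)
TBLPROPERTIES (
    "table.format.version"="2",
    "acid.data.retain.hours"="48"  -- Time Travel window of 48 hours
)
LIFECYCLE 30;  -- 30 days covers the 48-hour Time Travel window, as required in Notes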
Create a PK Delta Table
-- Create a PK Delta Table
CREATE TABLE <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [COMMENT <col_comment>], ...
    PRIMARY KEY (<pk_col_name> [, <pk_col_name2>, ...])
)
[COMMENT <table_comment>]
TBLPROPERTIES (
    "table.format.version"="2"
    [, "write.bucket.num"="N", "acid.data.retain.hours"="<hours>", ...]
)
[LIFECYCLE <days>];
The parameters are described as follows:
PRIMARY KEY (PK): Required. You must specify this parameter when you create a PK Delta Table. The primary key can contain one or more columns, and the combination of values in these columns must be unique within the table. The syntax follows the standard SQL primary key syntax. The primary key columns must be set to NOT NULL and cannot be modified.
After you set the primary key, data in the table is deduplicated based on the primary key columns. The unique constraint is effective within a single partition or for an entire non-partitioned table.
The following table describes the TBLPROPERTIES parameters.
Parameter | Required | Description | Notes |
"table.format.version"="2" | Yes | Declares the table format as Delta Table. |
|
write.bucket.num | No | The default value is 16. The value range is | The number of buckets for each partition or for a non-partitioned table. This also indicates the number of concurrent nodes for data writes. You can modify this parameter for partitioned tables, and the change takes effect for new partitions by default. You cannot modify this parameter for non-partitioned tables. Consider the following suggestions when you use this parameter:
|
acid.data.retain.hours | No | The default value is 24. The value range is | The time range in hours for which historical data states can be queried using Time Travel.
|
acid.incremental.query.out.of.time.range.enabled | No | The default value is | If set to true, the endTimestamp specified in an incremental query can be later than the latest commit time of the table. If the endTimestamp is later than the current time, multiple queries may return different results because new data may be inserted. You can modify the value of this parameter for a table. |
acid.write.precombine.field | No | You can specify the name of one column. | If a column name is specified, the system deduplicates data based on the primary key (PK) columns and the specified column during file processing for the same commit. This ensures data uniqueness and consistency. Note If the data volume of a single commit exceeds 128 MB, multiple files are generated. This parameter does not apply to multiple files. |
acid.partial.fields.update.enable | No | When set to | Set this parameter when you create the table. You cannot modify it after the table is created. |
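For reference, the following is a concrete sketch that creates a PK Delta Table; user_profile and its columns are illustrative names.
-- Hypothetical PK Delta Table keyed on user_id.
CREATE TABLE user_profile (
    user_id    BIGINT NOT NULL,
    user_name  STRING,
    updated_at DATETIME,
    PRIMARY KEY (user_id)
)
TBLPROPERTIES (
    "table.format.version"="2",
    "write.bucket.num"="16",       -- explicit here, though 16 is the default
    "acid.data.retain.hours"="24"
)
LIFECYCLE 7;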
Notes
Item | Append Delta Table | PK Delta Table | Clustered table |
Number of buckets | You do not need to specify write.bucket.num. The number of buckets changes dynamically based on the actual data volume. | You can specify the number of buckets with write.bucket.num in the DDL statement. The default value is 16. | / |
Data organization policy | RANGE CLUSTERED BY is supported; CLUSTERED BY is not. You do not need to specify a SORT BY field; data within a bucket is sorted by default on the fields specified in RANGE CLUSTERED BY. | You cannot set CLUSTERED BY. By default, a hash cluster is created based on the primary key columns. | CLUSTERED BY |
Lifecycle | Must be greater than or equal to the Time Travel query window. That is, <days> × 24 must be greater than or equal to acid.data.retain.hours. | / | / |
You cannot directly convert an existing standard table into a Delta Table.
PK Delta Tables do not support schema evolution for primary key (PK) columns.
PK Delta Tables do not currently support the JSON data type.
CREATE TABLE AS is not supported.
DML
Delta Tables support Data Manipulation Language (DML) syntax such as Insert or overwrite data (INSERT INTO | INSERT OVERWRITE), Update or delete data (UPDATE | DELETE), and MERGE INTO.
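A minimal sketch of these operations against the hypothetical user_profile table from the DDL example; user_profile_staging is an assumed source table with the same schema.
-- Insert rows.
INSERT INTO user_profile VALUES (1, 'alice', GETDATE());
-- Update non-primary-key columns for matching rows.
UPDATE user_profile SET user_name = 'alice_2' WHERE user_id = 1;
-- Delete rows by predicate.
DELETE FROM user_profile WHERE updated_at < DATEADD(GETDATE(), -30, 'dd');
-- Upsert from a staging table: update matched rows, insert the rest.
MERGE INTO user_profile AS t
USING user_profile_staging AS s
ON t.user_id = s.user_id
WHEN MATCHED THEN UPDATE SET t.user_name = s.user_name, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.user_id, s.user_name, s.updated_at);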
DQL
Delta Tables support general-purpose query analysis. For more information, see DQL operations (SELECT).
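A sketch of Time Travel queries on the hypothetical user_profile table, assuming the requested timestamp falls within the acid.data.retain.hours window and that snapshot version 2 exists.
-- Query the table state as of a point in time (illustrative timestamp).
SELECT * FROM user_profile TIMESTAMP AS OF '2024-01-01 00:00:00';
-- Query a specific historical snapshot by version number.
SELECT * FROM user_profile VERSION AS OF 2;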
Data import
Append Delta Tables support data import using batch data upload (Upload) and stream data upload (Stream Upload). Because Append Delta Tables do not have a primary key, data transmission does not support Upsert or Delete operations.
PK Delta Tables support writing data through the Tunnel Upsert and Delete APIs. If no row with the same primary key value exists, the Upsert operation inserts a new row. If such a row already exists, the Upsert operation updates its non-primary-key fields.
Data organization optimization
An Append Delta Table uses an underlying Range Clustering structure for data organization. For more information about this optimization, see Append Delta Table data organization optimization. By default, Row_ID is used as the clustering key, and the number of buckets is dynamically allocated as the data volume grows. After you specify a Cluster Key, a background clustering job performs incremental reclustering on the data to maintain its overall order.
For information about the data organization structure of a PK Delta Table, see PK Delta Table data organization optimization. The table uses an underlying Hash Clustering structure to enable efficient data writes and updates by hash-bucketing the primary key (PK) field.