A Delta Table is a high-performance table format in MaxCompute designed for large-scale analytical datasets. It includes two types: the Append Delta Table for tables without a primary key and the PK Delta Table for tables with a primary key. This topic describes the basic features and operations of Delta Tables.
Overview
A Delta Table is a high-performance table format in Alibaba Cloud MaxCompute. It supports features such as atomicity, consistency, isolation, and durability (ACID) transactions, incremental queries, time travel, dynamic cluster bucketing, real-time data updates, and schema evolution. The native data lakehouse and near-real-time computing capabilities in MaxCompute allow you to use standard SQL to create, update, and query Delta Tables. You do not need to manage complex underlying storage and metadata. MaxCompute automatically maintains and optimizes them to provide a balance between ease of use and cost-effectiveness.
Features
Category | Feature | Append Delta Table | PK Delta Table |
Basic DML | INSERT, UPDATE, DELETE, and MERGE INTO. | Supported | Supported |
ACID transactions | Read Committed/Snapshot Isolation. For more information, see ACID transaction management. | Supported | Supported |
Primary key | Defines a primary key. | Not supported | Supported (including partial column updates) |
Schema evolution | Add, delete, rename, reorder, and change the data type of columns. For more information, see ALTER TABLE. | Supported | Supported |
Data import | Batch upload (Upload) and stream upload (Stream Upload). Data is visible immediately after a stream upload. | Supported | Supported (also supports Upsert and Delete) |
Time travel | Supports querying historical snapshots by time point or version number to reproduce results or perform audit analysis. For more information, see Time Travel. | Supported | Supported |
Incremental computing | Delta Live Materialized View (MV) and incremental read. For more information, see Incremental computing and Incremental query. | Supported (incremental materialized views are still being adapted) | Supported |
Data organization optimization | Automatically maintains incremental data files, including optimizations such as small file merging, multi-level compaction, and data sorting, to keep data storage and computation stable and efficient. For more information, see Append Delta Table data organization optimization and PK Delta Table data organization optimization. | Supported. You do not need to configure the number of buckets; Dynamic Bucketing automatically adapts to the data distribution. | Supported |
Query performance optimization | Partition-level and file-level statistics (such as Min/Max), partition pruning, column pruning, and predicate pushdown. | Supported | Supported |
Security and compliance | Storage encryption, dynamic data masking, row-level access control, and more. | Supported | Supported |
Disaster recovery and backup | Table snapshots (invitational preview), recycle bin-based local backup, and zone-disaster recovery. | Supported | Supported |
Cost | AliORC column-store compression and tiered storage. | Supported | Supported |
User experience
Real-time data updates for real-time services
Delta Tables support real-time data writes and updates (upserts) through Stream Upload. Data becomes visible immediately after it is written. To balance the real-time performance of data writes with query performance, MaxCompute uses a tiered storage and optimization strategy:
Ensure real-time writes: New data is first written quickly to non-clustered buckets without being sorted. This process ensures low latency and high throughput for data writes.
Improve SQL query performance: The background Incremental Reclustering service asynchronously reorganizes and optimizes incremental data into sorted, clustered buckets. When you run a query, the query engine can efficiently prune the sorted base data and scan only a small amount of incremental data. This approach balances data freshness with query efficiency.
Efficient incremental data processing and analysis
Built on its underlying incremental data read and write capabilities, MaxCompute provides a rich set of high-level features to improve the timeliness of end-to-end data analytics. You can combine advanced features, such as incremental computation and Dynamic Materialized View (Delta Live MV) (Invitational Preview), to build efficient real-time data processing pipelines and accelerate the transformation of data into business insights.
Adapt to business growth and overcome the limits of previous table formats
Dynamic bucket allocation: Append Delta Tables support dynamic bucket allocation. You do not need to specify the number of buckets in the Data Definition Language (DDL) statement. You also do not need to estimate the future data volume of each partition to determine a suitable number of buckets. As you write more data, the Dynamic Bucketing service automatically splits existing buckets or creates new ones. This feature dynamically adapts to changes in business data volume and resolves issues, such as data skew and fragmentation, that are caused by buckets being too large or too small.
Schema evolution: Delta Tables support schema evolution to meet evolving business requirements for adjusting data fields and improving data precision. This feature supports adding, deleting, modifying, and renaming columns, provides full backward compatibility, and prevents accidental data deletion or loss. A syntax sketch follows this list.
Overcomes the limitations of traditional tables: A single Delta Table supports operations such as INSERT INTO, UPDATE, DELETE, and MERGE INTO, along with clustering and sorting. Traditional partitioned tables, clustered tables, and Transaction Tables do not support all of these capabilities simultaneously.
Multi-engine computing: Delta Tables support MaxCompute SQL, MaxFrame, and Spark on MaxCompute. Open-source engines such as Flink, Spark, and StarRocks can also access Delta Tables through connectors and the open storage APIs.
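The following is a minimal sketch of schema evolution syntax, assuming a hypothetical Delta Table named t_delta with an existing column old_name; see ALTER TABLE for the full syntax.
-- Add a column; existing rows read NULL (or the declared default) for it.
ALTER TABLE t_delta ADD COLUMNS (new_attr STRING COMMENT 'illustrative new column');
-- Rename a column without rewriting data.
ALTER TABLE t_delta CHANGE COLUMN old_name RENAME TO renamed_col;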
Balance performance and reliability
Delta Tables are suitable for managing massive data volumes, from terabytes to petabytes. Even with extremely large data volumes, metadata operations remain fast. Queries support features such as partition pruning, column pruning, and predicate pushdown to avoid unnecessary data scans.
ACID transaction management: Delta Tables use optimistic concurrency control to support concurrent operations from multiple writers. Write conflicts are detected and retried to ensure data consistency.
Security and compliance: Delta Tables meet data security and compliance requirements. They support storage encryption, table-level and column-level access control lists (ACLs), row-level permissions, and dynamic data masking.
Backup and rollback: The versioned backup and rollback mechanism, which uses a recycle bin mode, ensures that if data corruption or accidental deletion occurs, you can quickly roll back the table to a previous healthy state. This reduces operations and maintenance (O&M) and management risks.
SQL operations
DDL
Create an Append Delta Table
-- Create an Append Delta Table
CREATE TABLE <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [COMMENT <col_comment>], ...
)
[COMMENT <table_comment>]
[RANGE CLUSTERED BY (<col_name> [, <col_name>, ...])]
TBLPROPERTIES (
    "table.format.version"="2"
    [, "acid.data.retain.hours"="<hours>", ...]
)
[LIFECYCLE <days>];
The following table describes the TBLPROPERTIES parameters.
Parameter | Required | Description | Notes |
"table.format.version"="2" | Yes | Declares the table format as Delta Table. | |
acid.data.retain.hours | No | The default value is 24. | The time range, in hours, for which historical data states can be queried using Time Travel. |
acid.incremental.query.out.of.time.range.enabled | No | The default value is false. | If set to true, the endTimestamp specified in an incremental query can be later than the latest commit time of the table. If the endTimestamp is later than the current time, repeated queries may return different results because new data may be inserted. You can modify the value of this parameter for an existing table. |
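For reference, the following is a concrete sketch that creates an Append Delta Table; the table and column names (events, event_time, and so on) are illustrative.
-- Hypothetical Append Delta Table for event logs, range-clustered by event time.
CREATE TABLE events (
    event_time DATETIME NOT NULL,
    event_type STRING,
    payload    STRING
)
RANGE CLUSTERED BY (event_time)
TBLPROPERTIES (
    "table.format.version"="2",
    "acid.data.retain.hours"="48"  -- Time Travel window of 48 hours
)
LIFECYCLE 30;  -- 30 days covers the 48-hour Time Travel window, as required in Notes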
Create a PK Delta Table
-- Create a PK Delta Table
CREATE TABLE <table_name> (
    <col_name> <data_type> [NOT NULL] [DEFAULT <default_value>] [COMMENT <col_comment>], ...
    PRIMARY KEY (<pk_col_name> [, <pk_col_name2>, ...])
)
[COMMENT <table_comment>]
TBLPROPERTIES (
    "table.format.version"="2"
    [, "write.bucket.num"="N", "acid.data.retain.hours"="<hours>", ...]
)
[LIFECYCLE <days>];
The parameters are described as follows:
PRIMARY KEY (PK): Required. You must specify this parameter when you create a PK Delta Table. The primary key can contain one or more columns, and the combination of values in these columns must be unique within the table. The syntax follows the standard SQL primary key syntax. The primary key columns must be set to NOT NULL and cannot be modified.
After you set the primary key, data in the table is deduplicated based on the primary key columns. The unique constraint is effective within a single partition or for an entire non-partitioned table.
The following table describes the TBLPROPERTIES parameters.
Parameter | Required | Description | Notes |
"table.format.version"="2" | Yes | Declares the table format as Delta Table. |
|
write.bucket.num | No | The default value is 16. The value range is | The number of buckets for each partition or for a non-partitioned table. This also indicates the number of concurrent nodes for data writes. You can modify this parameter for partitioned tables, and the change takes effect for new partitions by default. You cannot modify this parameter for non-partitioned tables. Consider the following suggestions when you use this parameter:
|
acid.data.retain.hours | No | The default value is 24. The value range is | The time range in hours for which historical data states can be queried using Time Travel.
|
acid.incremental.query.out.of.time.range.enabled | No | The default value is | If set to true, the endTimestamp specified in an incremental query can be later than the latest commit time of the table. If the endTimestamp is later than the current time, multiple queries may return different results because new data may be inserted. You can modify the value of this parameter for a table. |
acid.write.precombine.field | No | You can specify the name of one column. | If a column name is specified, the system deduplicates data based on the primary key (PK) columns and the specified column during file processing for the same commit. This ensures data uniqueness and consistency. Note If the data volume of a single commit exceeds 128 MB, multiple files are generated. This parameter does not apply to multiple files. |
acid.partial.fields.update.enable | No | When set to | Set this parameter when you create the table. You cannot modify it after the table is created. |
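For reference, the following is a concrete sketch that creates a PK Delta Table; user_profile and its columns are illustrative names.
-- Hypothetical PK Delta Table keyed on user_id.
CREATE TABLE user_profile (
    user_id    BIGINT NOT NULL,
    user_name  STRING,
    updated_at DATETIME,
    PRIMARY KEY (user_id)
)
TBLPROPERTIES (
    "table.format.version"="2",
    "write.bucket.num"="16",       -- explicit here, though 16 is the default
    "acid.data.retain.hours"="24"
)
LIFECYCLE 7;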
Notes
Item | Append Delta Table | PK Delta Table | Clustered table |
Number of buckets | You do not need to specify write.bucket.num. The number of buckets changes dynamically based on the actual data volume. | You can specify the number of buckets with write.bucket.num in the DDL statement. The default value is 16. | / |
Data organization policy | RANGE CLUSTERED BY is supported; CLUSTERED BY is not. You do not need to specify a SORT BY field; data within a bucket is sorted by default on the fields specified in RANGE CLUSTERED BY. | You cannot set CLUSTERED BY. By default, a hash cluster is created based on the primary key columns. | CLUSTERED BY |
Lifecycle | Must be greater than or equal to the Time Travel query window. That is, <days> × 24 must be greater than or equal to acid.data.retain.hours. | / | / |
You cannot directly convert an existing standard table into a Delta Table.
PK Delta Tables do not support schema evolution for primary key (PK) columns.
PK Delta Tables do not currently support the JSON data type.
CREATE TABLE AS is not supported.
DML
Delta Tables support Data Manipulation Language (DML) syntax such as Insert or overwrite data (INSERT INTO | INSERT OVERWRITE), Update or delete data (UPDATE | DELETE), and MERGE INTO.
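A minimal sketch of these operations against the hypothetical user_profile table from the DDL example; user_profile_staging is an assumed source table with the same schema.
-- Insert rows.
INSERT INTO user_profile VALUES (1, 'alice', GETDATE());
-- Update non-primary-key columns for matching rows.
UPDATE user_profile SET user_name = 'alice_2' WHERE user_id = 1;
-- Delete rows by predicate.
DELETE FROM user_profile WHERE updated_at < DATEADD(GETDATE(), -30, 'dd');
-- Upsert from a staging table: update matched rows, insert the rest.
MERGE INTO user_profile AS t
USING user_profile_staging AS s
ON t.user_id = s.user_id
WHEN MATCHED THEN UPDATE SET t.user_name = s.user_name, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES (s.user_id, s.user_name, s.updated_at);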
DQL
Delta Tables support general-purpose query analysis. For more information, see DQL operations (SELECT).
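A sketch of Time Travel queries on the hypothetical user_profile table, assuming the requested timestamp falls within the acid.data.retain.hours window and that snapshot version 2 exists.
-- Query the table state as of a point in time (illustrative timestamp).
SELECT * FROM user_profile TIMESTAMP AS OF '2024-01-01 00:00:00';
-- Query a specific historical snapshot by version number.
SELECT * FROM user_profile VERSION AS OF 2;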
Data import
Append Delta Tables support data import using batch data upload (Upload) and stream data upload (Stream Upload). Because Append Delta Tables do not have a primary key, data transmission does not support Upsert or Delete operations.
PK Delta Tables support writing data through the Tunnel Upsert and Delete APIs. If no row with the same primary key value exists, the Upsert operation inserts a new row. If such a row already exists, the Upsert operation updates its non-primary-key fields.
Data organization optimization
An Append Delta Table uses an underlying Range Clustering structure for data organization. For more information about this optimization, see Append Delta Table data organization optimization. By default, Row_ID is used as the clustering key, and the number of buckets is dynamically allocated as the data volume grows. After you specify a Cluster Key, a background clustering job performs incremental reclustering on the data to maintain its overall order.
For information about the data organization structure of a PK Delta Table, see PK Delta Table data organization optimization. The table uses an underlying Hash Clustering structure to enable efficient data writes and updates by hash-bucketing the primary key (PK) field.