All Products
Search
Document Center

MaxCompute:Manage transactions

Last Updated:Dec 25, 2023

All data modification operations on Delta transactional tables (DTTs) are managed by MetaService as transactions in a unified manner to ensure atomicity, consistency, isolation, durability (ACID) semantics of DTTs. The multiversion concurrency control (MVCC) model is used to ensure read/write snapshot isolation, and the optimistic concurrent control (OCC) model is used to control optimistic transaction concurrency.

Conflict detection rules

The following table describes the rules that are used to handle the conflicts between concurrent jobs on the same non-partitioned table or partition.

Job type

INSERT OVERWRITE/TRUNCATE

(job that ends later)

INSERT INTO

(job that ends later)

UPDATE/DELETE

(job that ends later)

MINOR COMPACT

(job that ends later)

MAJOR COMPACT

(job that ends later)

INSERT

OVERWRITE/TRUNCAT (job that ends earlier)

Both jobs succeed.

The result data of the job that ends later overwrites that of the job that ends earlier.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

INSERT INTO

(job that ends earlier)

Both jobs succeed.

The result data of the job that ends later overwrites that of the job that ends earlier.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

Both jobs succeed.

The compaction operation of the job that ends later succeeds.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

UPDATE/DELETE

(job that ends earlier)

Both jobs succeed.

The result data of the job that ends later overwrites that of the job that ends earlier.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

Both jobs succeed.

The compaction operation of the job that ends later succeeds.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

MINOR

COMPACT

(job that ends earlier)

Both jobs succeed.

The result data of the job that ends later overwrites that of the job that ends earlier.

Both jobs succeed.

The result data of the job that ends earlier does not affect the job that ends later.

Both jobs succeed.

The result data of the job that ends earlier does not affect the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

Both jobs succeed.

The compaction operation of the job that ends later succeeds.

MAJOR

COMPACT

(job that ends earlier)

Both jobs succeed.

The result data of the job that ends later overwrites that of the job that ends earlier.

Both jobs succeed.

The result data of the job that ends earlier does not affect the job that ends later.

Both jobs succeed.

The result data of the job that ends earlier does not affect the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

The job that ends earlier succeeds.

An error is returned for the job that ends later.

Concurrency conflict optimization

The preceding conflict detection rules do not apply to concurrent operations at the row or file level. A sequence of data processing operations are managed as a transaction. The transaction conflict logic of some frequently performed operations is optimized based on semantics of the operations to better support concurrency control while ensuring correctness. For example, you perform a clustering operation and an INSERT INTO operation in a transaction at the same time. The transaction does not fail even if the start time and commit time of the transaction overlap. This is because the clustering operation changes the method of organizing data but does not change the data status. The clustering operation and the INSERT INTO operation do not cause status inconsistency and can be concurrently performed. The transaction conflict logic will continue to be optimized to adapt to more scenarios.

If a conflict detection fails, a meta-level retry is performed on the premise of data correctness. Data does not need to be read or written again. This improves user experience and reduces resource consumption.

Metadata is atomically updated to ensure data consistency.

The concurrency conflict optimization applies to only single-table transactions.

Management of data file versions

Each transaction generates a batch of new data files. These data files are associated with the following transaction versions:

  • Time version: the transaction commit time of the TIMESTAMP type. A new time version is generated only for operations that are triggered by users and have logical data changes. Clustering and compaction operations only reorganize and optimize physical data, and do not add or modify data. Therefore, no new time version is generated for clustering and compaction operations. This way, when users perform incremental queries based on time versions, data files generated by clustering and compaction operations are not queried. This meets business requirements.

  • ID version: an auto-increment integer. An incremental ID version is generated for any data operation in a transaction, including clustering and compaction operations performed within the engine. The ID version is mainly used for internal transaction management. The ID version is also public to users and can be used for incremental queries.