The Beam storage engine is designed with row-oriented Delta storage and PAX-based column-oriented Base storage to handle online transaction processing (OLTP) and online analytical processing (OLAP) workloads. OLTP involves high-concurrency reads and writes. OLAP involves batch writes and large-scale scanning.
Beam is a next-generation storage engine developed in-house for AnalyticDB for PostgreSQL. It is built on the table access method interface of PostgreSQL 12.
The Beam storage engine consists of two parts:
A row-oriented Delta storage that handles real-time writes.
A PAX-based column-oriented Base storage that handles batch writes and large-scale scanning.
Compared with row-oriented heap tables, Beam tables require fewer disk I/O operations and deliver much better query performance in analytical scenarios. The Beam storage engine supports primary keys, write deduplication, and concurrent update and delete operations. You can use Data Transmission Service (DTS) to synchronize data to Beam tables. The Beam storage engine lets you handle OLTP and OLAP workloads at the same time by using a single copy of storage. This suits different business scenarios without the need to synchronize data between row-oriented and column-oriented engines.
Usage notes
Only AnalyticDB for PostgreSQL V7.0.x instances in elastic storage mode support the Beam storage engine.
The Beam storage engine has been officially available since AnalyticDB for PostgreSQL V7.0.6.2 in elastic storage mode, which marked the end of the public preview. This version fixes the issues that occurred during the public preview. We recommend that you update your AnalyticDB for PostgreSQL instance to V7.0.6.2 or later.
Features
High-performance real-time writes
The Beam storage engine consists of a row-oriented Delta storage and a PAX-based column-oriented Base storage. When data is written, the engine selects the storage that fits the write method. If you use a real-time streaming write method such as INSERT INTO VALUES, data is written to the row-oriented Delta storage, which delivers real-time write performance comparable to that of row-oriented heap tables.
High-throughput batch import
If you use a batch processing method such as the COPY or INSERT INTO SELECT statement, data is written to the column-oriented Base storage to achieve higher throughput and write performance.
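The routing described in the two preceding features can be pictured with a small toy model. This is only an illustrative sketch, not Beam's implementation: the class ToyBeamTable, its fields, and the string labels for the write methods are hypothetical names introduced here. It shows streaming writes landing in a row-oriented list and batch writes being split into per-column lists.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the write methods named in the text.
STREAMING_METHODS = {"INSERT INTO VALUES"}
BATCH_METHODS = {"COPY", "INSERT INTO SELECT"}


@dataclass
class ToyBeamTable:
    """Toy model only: one row-oriented store and one column-oriented store."""
    delta_rows: list = field(default_factory=list)    # one tuple per row
    base_columns: dict = field(default_factory=dict)  # one list per column

    def write(self, method, rows, column_names):
        """Route rows to a store based on the write method; return its name."""
        if method in STREAMING_METHODS:
            # Row-oriented: append each row as a whole.
            self.delta_rows.extend(rows)
            return "delta"
        if method in BATCH_METHODS:
            # Column-oriented: split each row into per-column lists.
            for i, name in enumerate(column_names):
                self.base_columns.setdefault(name, []).extend(r[i] for r in rows)
            return "base"
        raise ValueError(f"unknown write method: {method}")


table = ToyBeamTable()
table.write("INSERT INTO VALUES", [(1, "a")], ["id", "v"])
table.write("COPY", [(2, "b"), (3, "c")], ["id", "v"])
```

After these two calls, the single streamed row sits in delta_rows as a tuple, while the two batch rows are stored as separate id and v lists in base_columns.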
High-performance OLAP query
The Beam storage engine uses the following methods to optimize query performance:
Column pruning
Multiple compression algorithms
Zone map filtering
I/O prefetch
Together, these methods greatly reduce the number of disk I/O operations that a query requires, increase I/O utilization, and improve query performance.
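To make two of these methods concrete, the following sketch shows how zone map filtering and column pruning reduce the data that a scan touches. It is a simplified model under stated assumptions, not Beam's on-disk format: the Block class, the make_blocks and scan helpers, and the block size are hypothetical. Each block keeps a (min, max) pair per column, a scan skips any block whose range cannot match the filter, and only the two columns named in the query are read.

```python
from dataclasses import dataclass


@dataclass
class Block:
    columns: dict   # column name -> list of values in this block
    zone_map: dict  # column name -> (min, max) of those values


def make_blocks(rows, column_names, block_size):
    """Split rows into column-oriented blocks and record a zone map per block."""
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        cols = {name: [r[i] for r in chunk] for i, name in enumerate(column_names)}
        zmap = {name: (min(vals), max(vals)) for name, vals in cols.items()}
        blocks.append(Block(cols, zmap))
    return blocks


def scan(blocks, filter_col, lo, hi, select_col):
    """Return (values of select_col where lo <= filter_col <= hi, blocks read)."""
    result, blocks_read = [], 0
    for block in blocks:
        bmin, bmax = block.zone_map[filter_col]
        if bmax < lo or bmin > hi:
            continue  # zone map filtering: this block cannot contain a match
        blocks_read += 1
        # Column pruning: only the filter and select columns are touched.
        for f, s in zip(block.columns[filter_col], block.columns[select_col]):
            if lo <= f <= hi:
                result.append(s)
    return result, blocks_read


rows = [(i, i * 10, f"n{i}") for i in range(12)]
blocks = make_blocks(rows, ["id", "amount", "name"], block_size=4)
values, blocks_read = scan(blocks, "id", 5, 6, "amount")
```

In this example the 12 rows form three blocks, and the filter on id between 5 and 6 reads only the middle block. The name column is never accessed.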
Primary key and write deduplication
The Beam storage engine supports the primary key feature of PostgreSQL and allows you to build primary key indexes for Beam tables. This is similar to how heap tables implement data deduplication. In addition, the Beam storage engine supports the UPSERT syntax.
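The difference between a plain insert and an UPSERT on a table with a primary key can be sketched as follows. This is a conceptual model only, assuming a dictionary as a stand-in for a primary key index. The class ToyKeyedTable and its methods are hypothetical and do not reflect how Beam or PostgreSQL implement the feature.

```python
class ToyKeyedTable:
    """Toy model of primary-key deduplication; not an actual storage engine."""

    def __init__(self):
        self._rows = {}  # primary key -> row; stands in for a primary key index

    def insert(self, key, row):
        """Plain insert: reject a row whose primary key already exists."""
        if key in self._rows:
            raise KeyError(f"duplicate primary key: {key!r}")
        self._rows[key] = row

    def upsert(self, key, row):
        """UPSERT: insert the row, or overwrite the existing row for this key."""
        existed = key in self._rows
        self._rows[key] = row
        return "updated" if existed else "inserted"

    def rows(self):
        return dict(self._rows)


table = ToyKeyedTable()
table.insert(1, "first")
table.upsert(1, "second")  # key 1 exists, so the row is replaced
table.upsert(2, "third")   # key 2 is new, so the row is added
```

Either way, the table never holds two rows with the same key: a repeated plain insert fails, and an upsert replaces the earlier row.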
DTS synchronization
By using primary keys and DTS, you can synchronize data to Beam tables in the same manner as you do to heap tables. You can set the destination table to a Beam table in a DTS task to achieve better query performance without the need to synchronize data between row-oriented and column-oriented tables.
References
For information about how to use the Beam storage engine, see Beam usage.
The Beam storage engine supports multiple compression algorithms. The dictionary encoding feature can compress string data to integer data to improve storage efficiency and accelerate filter-based and aggregate-based queries. For more information, see Dictionary encoding.
If you frequently run range queries or equality filters on one or more columns, you can specify a Beam sort key to improve query performance. For more information, see Beam sorting optimization (V7.0).
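As background for the dictionary encoding reference above, the following sketch illustrates the general idea of replacing strings with integer codes. It is a generic illustration of the technique, not Beam's encoding format, and the function names dict_encode, filter_equals, and count_by_value are hypothetical. It shows why filters and aggregates can run on compact integers and consult the dictionary only once.

```python
from collections import Counter


def dict_encode(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    dictionary = {}  # string -> integer code
    codes = []
    for value in values:
        # len(dictionary) is evaluated before insertion, so codes run 0, 1, 2, ...
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, codes


def filter_equals(dictionary, codes, needle):
    """Return row positions equal to needle, comparing integers instead of strings."""
    code = dictionary.get(needle)
    if code is None:
        return []  # value absent from the dictionary: no row can match
    return [i for i, c in enumerate(codes) if c == code]


def count_by_value(dictionary, codes):
    """Aggregate on codes, then translate each code back to its string once."""
    reverse = {code: value for value, code in dictionary.items()}
    return {reverse[code]: n for code, n in Counter(codes).items()}


dictionary, codes = dict_encode(["cn", "us", "cn", "de", "us", "cn"])
matches = filter_equals(dictionary, codes, "us")
counts = count_by_value(dictionary, codes)
```

Here six strings are stored as six small integers plus a three-entry dictionary. The equality filter and the group count both work on the integer codes.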