The Xuanwu analytical storage engine provides high-reliability, high-availability, high-performance, and low-cost enterprise-class data storage capabilities. This engine provides underlying support for AnalyticDB for MySQL to implement high throughput for real-time data write and high performance for real-time query.

High-throughput real-time data write

AnalyticDB for MySQL utilizes a three-layer architecture to provide strong throughput capability. The access layer, storage node layer, and persistent distributed storage layer can be expanded in parallel. AnalyticDB for MySQL implements high throughput and high concurrency for real-time data write by using the hybrid row-column storage engine and asynchronously merging incremental data into full data.

AnalyticDB for MySQL uses the Raft consensus protocol and the apply method to synchronously write data to ensure the write consistency. This allows you to query data immediately after it is written and ensure the write consistency. The storage engine allows you to update and delete data in real time with high throughput by using the Mark-for-delete technology. The storage engine also ensures the data atomicity and integrity by using the multiversion concurrency control (MVCC) technology.

Hybrid row-column storage

Xuanwu supports hybrid row-column storage and row storage. Hybrid row-column storage is a column-based storage mode that also supports row storage, which is similar to the Optimized Row Columnar (ORC) or Parquet format in Apache Hadoop. Hybrid row-column storage of Xuanwu supports analytical column pruning, data scanning with high throughput and high performance, and row alignment. This provides good random query performance and excellent performance in scenarios that involve multidimensional index filtering.

The following figure shows the storage format of hybrid row-column storage.

Hybrid row-column storage

Adaptive indexing

In online analytical processing (OLAP) scenarios, multidimensional query must be supported. Traditional single-column or combined indexes in the online transaction processing (OLTP) scenario are insufficient to meet the requirement. Xuanwu uses adaptive indexing on columns to automatically configure the index data structure for column types such as string, number, text, JSON, and vector. Additionally, Xuanwu allows you to retrieve data by using column-level indexes from combined dimensions and multiway merges by means of progressive streaming. This greatly improves the data filtering performance.

Indexes have the following types: inverted index, BKD-tree index, and bitmap index. Index performance is affected by data distribution characteristics, including cardinality (hash degree), and number of records for range queries or table records. In some scenarios, overheads of indexing are higher than those of data scanning. Example: age > 0 and age <100. Xuanwu determines whether to select indexing or scanning based on Cost-Based Optimization (CBO).

The following figure shows how to use multiway merges for indexes of multiple types.

Adaptive indexing

Fusion of structured and unstructured indexes

The index manager of Xuanwu at the storage layer manages structured and unstructured indexes in a centralized manner. These indexes include BKD indexes of numerical values, inverted indexes of strings, unstructured JSON indexes and vector indexes, and full-text indexes of text data. A unified expression is provided for the computing layer. This makes the SQL logic of the computing layer compatible with heterogeneous data types and accelerates indexes. AnalyticDB for MySQL performs correlation analysis between full-text data and structured tables. The complex logic of SQL statements can all be supported.

After the preceding SQL statement is executed, correlation analysis is performed on the result set of full-text indexes in the subquery. The analyzed results are sorted in descending order by score, and the first 10,000 rows are exported.