
Letting the Executor "See" Data Locations: Lizard Physical Addressing Optimization

This article introduces how PolarDB-X's physical addressing optimization lets the executor directly access data locations to eliminate redundant B+ tree traversals.

By Yuchen

Background

Alibaba Cloud PolarDB for Xscale (PolarDB-X) is a shared-nothing database that is fully compatible with the open source MySQL ecosystem. It supports both centralized and distributed deployment modes and delivers financial-grade high availability, reliability, and distributed scalability. Within this architecture, the Data Node (DN) optimizes the InnoDB storage structure to greatly improve data access efficiency. This article outlines the technical details of executor optimization in the DN engine.


Before analyzing specific executor optimization policies, it is worth reviewing the basic data positioning mechanism of the InnoDB storage engine, which is the foundation of all the performance optimizations discussed here.

Data Positioning

In the InnoDB storage engine, user-defined tables rely entirely on B+ tree indexes. There are two types of B+ tree indexes:

Clustered index: This corresponds to the primary key B+ tree. Leaf nodes store complete row records.

Secondary index: This corresponds to a B+ tree built on non-primary-key columns. Leaf nodes store only the secondary index key values and the primary key values.

These two index types play different roles in data retrieval and determine different query paths and performance characteristics.

To understand the performance differences, examine how each index behaves on the query path, starting with the clustered index positioning flow, which is the shortest and most efficient path.

Clustered Index Positioning Process

Clustered index retrieval is a top-down process:

  1. First, use binary search in non-leaf nodes to locate the target primary key.
  2. Continue traversing downward based on pointers within the node.
  3. Because non-leaf nodes serve only as index paths and do not store actual row data, the search continues to the leaf nodes.
  4. Finally, perform a binary search in the leaf node to obtain the complete row record data.

This process requires only one B+ tree traversal to get the full data row. Therefore, clustered index query efficiency is high.
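The top-down descent described above can be modeled in a few lines of code. The sketch below is an illustrative model of a B+ tree point lookup, not actual InnoDB code; the Node class and function names are invented for the example:

```python
import bisect

class Node:
    """A B+ tree node: internal nodes hold child pointers, leaves hold rows."""
    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # sorted keys within this node
        self.children = children  # child nodes (internal node only)
        self.rows = rows          # complete row records (leaf node only)

def clustered_lookup(root, key):
    node = root
    # Descend from the root, binary-searching separators in each non-leaf
    # node; non-leaf nodes only route the search, they hold no row data.
    while node.children is not None:
        i = bisect.bisect_right(node.keys, key)
        node = node.children[i]
    # A final binary search inside the leaf yields the complete row.
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rows[i]
    return None
```

A tree of height h costs a single root-to-leaf descent (h node visits), which is why one traversal suffices to obtain the full row.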


Secondary Index Table Lookup Issues

However, querying with a secondary index is more complex. Because leaf nodes of secondary indexes do not contain complete data rows and store only index key values and primary key values, using a secondary index usually requires a table lookup.

Take the query statement select * from t1 where k > 1 as an example. The complete query process is as follows:

  1. Secondary index scan: Selecting secondary index k for filtering avoids a full table scan. First, search the B+ tree of the secondary index from the root node to the leaf node.
  2. Obtain primary key value: After scanning records where k > 1, fetch the primary key value from the leaf node of the secondary index.
  3. Table lookup: Use the obtained primary key value to perform another B+ tree search on the clustered index. This retrieves the complete row data.

This is a typical scenario of "secondary index table lookup". Because the secondary index stores incomplete information, the system must first scan the secondary index B+ tree to locate records. Then, it scans the clustered index B+ tree using the primary key value to obtain complete data.

Compared to the single traversal of a clustered index, a secondary index query requires two B+ tree traversals. This increases query overhead. The performance impact is significant when many records require table lookups.
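The two-traversal cost can be seen in a small model. This is an illustrative sketch, not InnoDB code; the dict-based clustered index stands in for a full clustered index B+ tree traversal, and all names are invented:

```python
def lookup_via_secondary(secondary_leaf, clustered_index, k_min):
    """Model of `select * from t1 where k > 1`-style execution.

    secondary_leaf:  sorted list of (k, primary_key) pairs, i.e. what a
                     secondary index leaf actually stores.
    clustered_index: primary_key -> full row; each access here stands in
                     for a second root-to-leaf B+ tree search.
    """
    rows = []
    for k, pk in secondary_leaf:
        if k > k_min:                       # range condition on the index
            # Table lookup: the secondary index has no row data, so every
            # matching record costs another clustered index search.
            rows.append(clustered_index[pk])
    return rows
```

Every record matched in the secondary index triggers one extra lookup, which is why batch table lookups amplify the overhead.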


Optimization Scheme of the Lizard Transaction System

To address these issues, the Lizard transaction system introduces system columns. It maintains the page number (page_no) of the corresponding clustered index record within the secondary index record. This implements the Guess Primary PageNo (GPP) scheme and significantly reduces the overhead of secondary index table lookups.

The process is as follows:

  1. Locate the record in the secondary index B+ tree. Then, obtain its gpp_no (guessed primary key page number).
  2. Access the physical page directly. Attempt to retrieve the corresponding primary key record using a binary search.
  3. If the guessed page_no is valid, the system skips the complete search of the clustered index B+ tree; otherwise, it falls back to a normal clustered index lookup.


Using the GPP scheme, secondary index table lookups are optimized from "secondary index B+ tree traversal + clustered index B+ tree traversal" to "secondary index B+ tree traversal + direct page access". This optimization reduces the complexity of table lookup operations from O(log N) based on tree height to O(1) for direct access. It significantly reduces disk I/O counts and CPU overhead. Performance improves notably in batch table lookup scenarios. This effectively resolves the performance bottlenecks of traditional secondary index table lookups.
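The guess-and-verify idea behind GPP can be sketched as follows. This is a hypothetical model, not the actual Lizard implementation; function and variable names are invented for illustration:

```python
def fetch_row(pk, gpp_no, pages, btree_lookup):
    """Fetch a clustered index row using a guessed page number.

    pk:           primary key value taken from the secondary index record
    gpp_no:       guessed page number stored alongside that record
    pages:        page_no -> {pk: row}, modeling direct physical page access
    btree_lookup: fallback that performs a full clustered index traversal
    """
    page = pages.get(gpp_no)
    if page is not None and pk in page:
        # Guess was right: O(1) direct page access, no tree descent.
        return page[pk]
    # The record moved (e.g. after a page split) or the page was reused:
    # fall back to the normal O(log N) clustered index B+ tree search.
    return btree_lookup(pk)
```

The fallback keeps the scheme correct even when the hint goes stale, so the guessed page number is purely an optimization, never a correctness requirement.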

Executor Optimization

The GPP scheme of the Lizard transaction system enables direct access to secondary index physical pages. However, the upper-layer SQL executor must use this physical location information efficiently. This requires optimizing the interaction between the SQL layer and the InnoDB storage tier. The following section compares "logical addressing" and "physical addressing." It explains how the Lizard execution engine eliminates redundant overhead during interactions.

Traditional mode (logical addressing)

In the traditional MySQL executor architecture, the SQL layer and the InnoDB layer interact using "logical identifiers." InnoDB implements these identifiers as follows:

Clustered index primary key: If a table has a primary key, the system uses the primary key as the logical identifier.

Implicit RowID: If a table lacks an explicit primary key, the system uses an internal 6-byte globally incrementing ID generated by InnoDB as the implicit primary key.

Typical index query interaction logic works as follows:

  1. Data positioning: The SQL layer executor invokes APIs such as index_read to query the index. InnoDB navigates the index B+ tree to locate the leaf node. It then extracts the logical identifier (Primary key / RowID) for the record.
  2. Logical identifier pass-through: InnoDB passes the extracted logical identifiers to the SQL layer executor. The executor stores these identifiers in a memory buffer or spills them to a temporary file if needed. It then performs operations such as sorting, intersection, and union.
  3. Logical addressing: The SQL layer passes the logical identifiers back to InnoDB. Upon receiving them, InnoDB must perform another complete top-down search of the clustered index B+ tree, from the root node to the leaf node, with in-page binary searches and potential disk I/O along the way, to retrieve the complete row data.

[Figure: traditional logical addressing requires two complete B+ tree scans]

The figure shows that traditional logical addressing is unaware of the physical distribution of data. Therefore, two complete B+ tree scans are required.

Decoupling and cost

MySQL was designed with a pluggable storage engine architecture. The SQL layer and the storage engine interact using the abstract handler API. In this architecture, the executor is unaware of underlying physical storage details. It uses logical identifiers for addressing. This decouples the SQL layer from the storage engine layer and provides flexibility. A single database instance can use different storage engines as needed. Different storage engines can also evolve and optimize independently without requiring changes to the SQL layer code.

However, this decoupling comes with performance costs:

  1. For clustered indexes, during the data location phase, InnoDB scans the B+ tree once to get the logical identifier. It also gets the exact physical location of the record. However, this physical information is not passed through to the SQL layer. As a result, the B+ tree scan must run again during the second interaction between SQL and InnoDB.
  2. For secondary indexes, the executor must first get the primary key value from the secondary index. It passes this value to the SQL layer for sorting or set operations. Then, it sends the primary key value back to the storage engine to retrieve data from the table. During this process, the storage engine cannot use physical location information for optimization. Each table lookup requires a full B+ tree traversal starting from the root node of the clustered index.

Optimization Pattern (Physical Addressing)

To address performance bottlenecks caused by "logical addressing" in the traditional mode, the PolarDB-X storage engine team delved into the kernel. They fully refactored and optimized everything from underlying interaction mechanisms to physical storage access paths:

1.  Data positioning: In the data positioning interaction logic, InnoDB still needs to perform a complete B+ tree traversal. It selects a secondary index or a clustered index for the query based on conditions. However, when locating data at the leaf node, we pass through more than just the logical identifier to the SQL layer. We pass logical information (Primary Key / RowID) plus physical information (page_no).

• For clustered indexes, we return the Primary Key / RowID of the record plus the page_no of the physical page where the record is located.

• For secondary indexes, we use our existing GPP (Guess Primary Pageno) capability. We return the page_no of the physical page containing the clustered index record that corresponds to the secondary index record.

2.  Logical + physical information pass-through: To let the SQL layer carry the engine's physical information, we defined a reg struct that encapsulates both the logical and the physical information passed up from the InnoDB layer. During SQL layer processing, comparison still uses only the logical information; the physical information does not participate in comparison.

3.  Physical addressing: After processing the SQL layer logic, the reg struct carries logical and physical information back to the InnoDB layer. InnoDB uses the physical information for physical addressing to access the physical page directly. This avoids the second complete B+ tree scan process.

Throughout the optimization process, we did not change the original interaction logic between the MySQL SQL layer and the InnoDB layer. Instead, we passed physical information through during the interaction, letting the SQL layer perceive the underlying physical distribution. Carrying this physical information into the second interaction eliminates one of the two B+ tree scans and significantly improves query performance.
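The role of the struct can be illustrated with a small model. The sketch below is hypothetical (the Reg class and function names are invented; the real struct lives in the server's C++ code): the logical identifier participates in sorting and comparison, while the physical page number rides along and is used only at fetch time:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Reg:
    """Logical + physical information passed between the layers."""
    pk: int                               # logical identifier: compared
    page_no: int = field(compare=False)   # physical hint: never compared

def fetch_all(regs, pages, btree_lookup):
    """pages: page_no -> {pk: row}; btree_lookup: full-traversal fallback."""
    rows = []
    for r in sorted(regs):                # SQL-layer sort uses pk only
        # Physical addressing: go straight to the hinted page; fall back
        # to a full clustered index search only if the hint is stale.
        row = pages.get(r.page_no, {}).get(r.pk)
        if row is None:
            row = btree_lookup(r.pk)
        rows.append(row)
    return rows
```

Because the physical hint is excluded from comparison, sorting, intersection, and union at the SQL layer behave exactly as before; only the final fetch path changes.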


Performance

Based on the discussion above, we selected index merge and multi range read as typical scenarios to verify the optimization effect of physical addressing:

• In a single query, index merge uses multiple independent indexes on one table. It then merges the results from each index.

• Multi range read targets scenarios where secondary index queries require table lookups. It converts the random I/O of secondary indexes into an ordered arrangement of primary keys. This transforms random I/O into ordered I/O.
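The key idea of multi range read can be sketched in a few lines. This is an illustrative model, not the actual MySQL MRR implementation, and the function names are invented:

```python
def mrr_fetch(pks_from_secondary, fetch_by_pk):
    """Buffer primary keys from a secondary index range scan, sort them,
    and only then perform the table lookups, so the clustered index is
    visited in key order (random I/O becomes ordered I/O)."""
    ordered = sorted(pks_from_secondary)
    return [fetch_by_pk(pk) for pk in ordered]
```

Combined with GPP, each of those ordered lookups can additionally skip the tree descent when the guessed page number is still valid, which is where the stacked gains in the table below come from.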

| Query Type         | Metric      | Base      | GPP       | Improvement |
|--------------------|-------------|-----------|-----------|-------------|
| Intersection Merge | QPS         | 413968.34 | 481810.13 | 16.38%      |
|                    | 95% Latency | 1.91      | 1.73      | /           |
| Union Merge        | QPS         | 147093.55 | 218326.52 | 48.43%      |
|                    | 95% Latency | 5.09      | 3.68      | /           |
| Sort-Union Merge   | QPS         | 1799.21   | 3520.24   | 95.64%      |
|                    | 95% Latency | 390.30    | 227.40    | /           |
| MRR                | QPS         | 183932.70 | 263934.09 | 43.49%      |
|                    | 95% Latency | 4.49      | 3.02      | /           |

Testing results show that Lizard's executor optimization, which introduces physical addressing as a new interaction model, delivers significant performance improvements.
