An In-Depth Analysis of UNDO Logs in InnoDB

This article introduces Undo Log in InnoDB, including its role, design ideas, record content, organizational structure, and various functional implementations.

By Hanzhi

Undo Log is an important component of InnoDB. It affects the two most important parts of InnoDB: Concurrency Control and Crash Recovery. The implementation of Undo Log in InnoDB is log and data. This article will introduce Undo Log in InnoDB, including its role, design ideas, record content, organizational structure, and various functional implementations. The article will go deep into certain code implementations, but it still hopes to replace concrete code with abstract implementation ideas in detail. This article is based on MySQL 8.0, but MySQL versions are consistent in most design ideas. Considering the limited space and avoiding the interference of too much information, some content is only mentioned or intentionally omitted in this article, such as indexes, transaction systems, temporary tables, XA transactions, Virtual Columns, external records, and Blob.

1. The Role of Undo Log

Undo Log records the historical value before each modification and applies for fault recovery with Redo Log. This is the first role of Undo Log in InnoDB.

1.1 Transaction Rollback

When designing the database, we assume the database may suddenly crash at any time due to reasons, such as hardware failures, software bugs, and O&M operations. Transactions that have not been committed yet may already have some data written to disk, which would violate the atomicity guarantee if left unprocessed. Any transaction modification is committed or canceled altogether. In response to this problem, the intuitive idea is to wait until the transaction is committed before any modification is allowed, which is the No-Steal strategy. On the one hand, this approach causes a lot of memory space pressure. On the other hand, a large amount of random I/O will affect performance at the time of submission. Therefore, the database implementation usually writes the Undo Log while the normal transaction is in progress to record the historical values before this modification. When Crash occurs, you can erase the modifications of the uncommitted transaction by playing back the Undo Log during the Recovery process. InnoDB uses this approach.

Since Undo Log supports transaction rollback during Crash Recovery, naturally, during normal operation, deadlock processing or transaction rollback requested by users can also be completed using this part of data.

1.2 Multi-Version Concurrency Control (MVCC)

Almost all mainstream databases adopt the multi-version concurrency control (MVCC) method to avoid conflicts between read-only transactions and write transactions and to avoid write operations waiting for reading operations. The database saves multiple historical data for each record for reading transactions to access, and new writes only need to add new versions without waiting. InnoDB reuses the historical version data that has been recorded in Undo Log to meet the requirements of MVCC.

2. What Kind of Undo Log?

The page-based Redo Log mentioned in An In-Depth Analysis of REDO Logs in InnoDB can help support concurrent Redo applications, thus shortening the Crash Recovery time of databases. InnoDB uses Undo Log to implement MVCC, and version history data is allowed during database operation. Therefore, the transaction rollback of Undo logs during Crash Recovery can be rolled back asynchronously in the background like a normal transaction so the database can resume services first. The design idea of Undo Log is different from Redo Log. Undo Log requires concurrency between transactions and convenient multi-version data maintenance. Its replay logic does not want to change due to changes in the physical storage of the database. Undo Log in InnoDB uses the transaction-based Logical Logging method.

At the same time, more responsibilities mean more complex management logic. InnoDB maintains and uses Undo Log as a kind of data. In other words, Undo Log, like other database data, will write its corresponding Redo Log and ensure its atomicity through Redo Log. Therefore, a more appropriate name for it should be Undo Data.

3. Content in Undo Record

Whenever a Record needs to be modified in InnoDB, its historical version is written to an Undo Log. The corresponding Undo Record is of the Update type. When a new Record is inserted, there is no historical version, but an Undo Record of the Insert type is still written here to facilitate the Delete operation during transaction rollback.

3.1 Undo Record of the Insert Type

This Undo Record corresponds to the TRX_UNDO_INSERT_REC type in the code. Unlike the record of the Update type, this Undo Record is only prepared for possible transaction rollback and does not play a role in the MVCC function. Therefore, you only need to record the Key of the corresponding Record for finding the Record position during rollback.

Undo Number is an incremental number of an Undo record. Table ID is used to indicate which table is modified. The length of the following set of Key Fields is variable because the primary key of the corresponding table may consist of multiple fields. Here, you need to record the complete primary key information of the Record. When rolling back, you can use this information to locate the corresponding Record in the index. In addition, two bytes are left at the beginning and end of the Undo Record to record the positions of its predecessor and successor Undo Records.

3.2 Undo Record of the Update Type

Since MVCC needs to keep multiple historical versions of the Record, when the historical version of a Record is still in use, it cannot be deleted. Therefore, when you need to delete it, you only modify the Delete Mark of the corresponding Record. If the Record is reinserted at this time, it is just to modify the Delete Mark. It will convert the delete and insert operations of these two cases into the update operation. Together with the regular Record modification, the Undo Record of the Update type here corresponds to three types: TRX_UNDO_UPD_EXIST_REC, TRX_UNDO_DEL_MARK_REC, and TRX_UNDO_UPD_DEL_REC. Their storage content is similar:

In addition to the same header and tail information as the Insert Undo Record and the primary key Key Fileds, Update Undo Record adds something:

Transaction Id records the transaction II that generated this historical version, which is used for version visibility judgment in subsequent MVCC.
Rollptr points to the location of the previous version of the record, including space number, page number, and offset within the page. All historical versions of a Record can be found along Rollptr.
Update Fields records the Delta information of the current Record version relative to a subsequent modification, including the number, length, and historical value of all modified Fields.

4. The Organization Mode of Undo Record

The preceding part introduces the content stored in an Undo Record. Each modification will produce at least one Undo Record. How can so many Undo Records be organized to support efficient access and management? This part will introduce this from several levels. First, we will introduce the logical organization mode without considering physical storage. Then, we will introduce how to store records in the actual 16-KB physical block in the physical organization section. Then, the file organization method section introduces the overall file structure. Finally, we will introduce how they are organized in memory.

4.1 Logical Organization Mode – Undo Log

Each transaction will modify a set of Records, and a corresponding set of Undo Records will be generated. These Undo Records that connect one by one will form the Undo Log of this transaction. In addition to the Undo Records, an Undo Log Header is added at the beginning to record necessary control information. The following figure shows the structure of an Undo Log:

The Trx ID of the transaction that generated this Undo Log is recorded in the Undo Log Header. Trx No is the commitment order of the transaction, and this will also be used to judge whether Purge can be performed, which will be described later. Delete Mark indicates whether there is an Undo Record of the TRX_UNDO_DEL_MARK_REC type in the Undo Log to avoid unnecessary scanning during Purge. The end position of the Undo Log Header is recorded in the Log Start Offset to facilitate compatibility when adding content to the Header later. This is followed by flag information. Next, Undo Log and Prev Undo Log label the two Undo Logs before and after, which will be introduced later. Finally, mount to the History List prepared for the Purge by History List Node.

If different transactions modify the same record in the index, different historical versions will be generated. These historical versions are used to form a linked list through Rollptr for MVCC to use. It is shown in the following figure:

In the example, three transactions operate on table t, and the primary key id records 1. First, transaction I inserts this record and sets the value of field a to A. Then, transactions J and K modify the value of field a in this record with an id of 1 to B and C, respectively. Transactions I, J, and K have their own logically consecutive Undo logs, and each Undo log has its own Undo Log Header. From this Record in the index, you can find the historical versions of this record in the Undo Log of these three transactions along Rollptr. At the same time, only the corresponding primary key value (id=1) is recorded in the Undo Record of the Insert type, while the Undo Record of the Update type also records the corresponding historical version of the generated transaction Trx_id and the modified historical value of field a.

4.2 Physical Organization Mode – Undo Segment

The structure of an Undo Log is described before. We cannot control the size of an Undo Log that a transaction generates, but the final write to disk is based on a fixed block size. In InnoDB, the block size is 16 KB by default. So, how can we use a fixed block size to carry an indefinite Undo Log to gain efficient space allocation and reuse to avoid space waste? The basic idea of InnoDB is to make multiple smaller Undo Logs compact in one Undo Page, while for larger Undo Logs, Undo Pages are allocated as needed to distribute the load. Now, take a look at the physical storage:

The preceding figure shows the schematic diagram of an Undo Segment. An Undo Segment needs to be held whenever a write transaction is started. The allocation and release of all disk space in an Undo Segment (the application and release of a 16-KB Page) are managed by an FSP Segment. This is consistent with the management of the Leaf Node Segment and Non-Leaf Node Segment in the index. A separate article will introduce this topic later.

Undo Segment will hold at least one Undo Page, and each Undo Page will record the Undo Page Header from the 38th byte to the 56th byte, where the type of Undo Page is recorded, and the position of the last Undo Record and the beginning of the idle part of the current Page (the position to be written by the next Undo Record). The first Undo Page in the Undo Segment will also record Undo Segment Header from the 56th byte to the 86th byte, which is the Handle of disk space managed in the Undo Segment. The state of the Undo Segment (such as TRX_UNDO_CACHED and TRX_UNDO_TO_PURGE), the position of the last Undo Record in this Undo Segment, the Header of this FSP Segment, and a linked list of all Undo Pages currently allocated are recorded.

The remaining space of Undo Page is used to store Undo Logs. For shorter Undo Logs (such as Undo Log 1 and Undo Log 2 in the preceding figure), InnoDB will reuse Undo Page to store multiple Undo Logs to avoid wasting space of the Page. For longer Undo Logs (such as Undo Log 3), multiple Undo Pages may be allocated for storage. Note: The reuse of Undo Page only occurs on the first Page.

4.3 File Organization Mode – Undo Tablespace

At every moment, one Undo Segment is exclusive to one transaction. Each write transaction holds at least one Undo Segment. When many write transactions are concurrently running, multiple Undo Segments are required. The Undo file in InnoDB prepares many slots for Undo Segments, and every 1,024 slots form a Rollback. Each Undo Tablespace will contain up to 128 Rollback Segments. The third Page in the Undo Tablespace file will be fixed as the directory of these 128 Rollback Segments (the Rollback Segment Array Header), in which up to 128 pointers point to the Page where each Rollback Segment Header is located. The Rollback Segment Header is allocated on demand, which contains 1,024 Slots. Each Slot occupies four bytes and points to the First Page of one Undo Segment. In addition, the History List of committed transactions in the Rollback Segment will be recorded. The subsequent Purge process will start the recycling work from here in sequence.

The number of Rollback Segments affects the maximum transaction parallelism supported by InnoDB. MySQL 8.0 supports up to 127 independent Undo Tablespaces. On the one hand, it avoids the expansion of ibdata1 and facilitates undo space reclamation. On the other hand, it increases the maximum number of Rollback Segments and concurrent write transactions that can be supported. It is shown in the following figure:

4.4 Memory Organization Structure

The preceding introduction is the organizational structure of Undo data on the disk. Except for that, the corresponding data structure will be maintained in memory to manage Undo Log, as shown in the following figure:

There will be an undo::Tablespace memory structure for each Undo Tablespace on the disk, the most important of which is a set of trx_rseg_t, which corresponds to a Rollback Segment Header described before. In addition to basic meta-information, trx_rseg_t maintains four trx_undo_t linked lists, and the Update List contains the Undo Segment being used to write Undo records of the Update type. There is more free space in the Update Cache List. Subsequent transactions can reuse Undo Segments of the Update type. Correspondingly, Insert List and Insert Cache List are Undo Segments that are in use and Undo Segments that have more space and can be reused later of the Insert type. trx_undo_t corresponds to the Undo Segment described before. Next, we will introduce the roles and functions of Undo in InnoDB from the aspects of Undo writing, Undo for Rollback, MVCC, Crash Recovery, and Undo cleaning.

5. Undo Writing

When a write transaction starts, a Rollback Segment will be allocated through the trx_assign_rseg_durable. The memory structure trx_t of the transaction will also point to the corresponding trx_rseg_t memory structure through the rsegs pointer. The allocation strategy here is simple. Try the next Active Rollback Segment in turn. After that, when the first real modification needs to be written to Undo Record, trx_undo_assign_undo will be called to obtain an Undo Segment. Here, trx_undo_t in Cached List on trx_rseg_t will be reused first. In other words, Undo Segment has been allocated but is not being used. If not, trx_undo_create will be called to create a new Undo Segment. In the trx_undo_create, the Slot available in the current Rollback Segment will be polled to select, which is also the Slot with the value FIL_NUL. Then, apply for a new Undo Page, initialize information (such as Undo Page Header and Undo Segment Header), create a new trx_undo_t memory structure, and add it to the corresponding list of trx_rseg_t.

After the available Undo Segment is obtained, the transaction will initialize its own Undo Log Header at the appropriate location. After that, the Undo Records generated by all its modifications will be written in the current Undo Log through the trx_undo_report_row_operation sequence. It will call the trx_undo_page_report_insert or trx_undo_page_report_modify, respectively, according to whether it is of the Insert or Update type. We introduced the specific Undo Record content at the beginning of this article. Simply put, the insert type records the primary key of the inserted Record. In addition to recording the primary key, the Update type has update fields to record the diff between this historical value and index value. Rollptr that points to the current Undo Record position is returned on the Record written to the index.

When a Page is full, trx_undo_add_page will be called to add a new Page to the current Undo Segment. After the new Page is written to the Undo Page Header, it will continue to be used for transactions to write to the Undo Record. There is a restriction that a single Undo Record does not span the page for maintenance convenience. If the current Page cannot be put down, the entire Undo Record will be written to the next Page.

After the transaction ends (commit or rollback), if only one Undo Page is occupied and the current Undo Page usage space is less than 75% of the page, the Undo Segment is retained and added to the corresponding insert/update cached list. Otherwise, the Undo Segment of the insert type is recycled, while the Undo Segment of the Update type is recycled after the Purge in the background is finished. According to different situations, the state in the Undo Segment Header will be changed from TRX_UNDO_ACTIVE to TRX_UNDO_TO_FREE, TRX_UNDO_TO_PURGE, or TRX_UNDO_CACHED. This modification is a sign of the end of the transaction of InnoDB. Whether it is Rollback or Commit, the user result can be returned after the corresponding Redo of this modification is written to the disk, and no rollback processing will be performed after Crash Recovery.

6. Undo for Rollback

The user may actively trigger Rollback in the transactions of InnoDB. It may also be Rollback due to deadlock exceptions. If Crash occurs, uncommitted transactions will be rolled back after restart. These rollback operations are consistent at the Undo level. The basic process is to read the Undo Record backward from the Undo Log of the transaction and perform reverse operations according to the contents to restore the index record.

The rollback entry is the row_undo function, in which the trx_roll_pop_top_rec_of_trx is first called to obtain and delete the last Undo Record of the transaction. The Undo Log in the following example includes three Undo Records, of which Record 1 is in Undo Page 1, and Records 2 and 3 are in Undo Page 2. First, find the Header of the last Undo Page of the current transaction through the Page List recorded in the Undo Segment Header. Next, locate the position of the end of the last Undo Record according to the Free Space Offset recorded on the Header of Undo Page 2. During actual running, these two values are cached in the top_page_no and top_offset of trx_undo_t. Use the Prev Record Offset to find Undo Record 3. After the corresponding rollback operation is completed, find the previous Undo Record through the preceding pointer Prev Record Offset and process them in sequence. After processing all Undo Records on the current Page, find the previous Undo Page along with the List in the Undo Page Header. Then, repeat the previous process and complete the rollback of all Undo Records of all Pages on a transaction.

After getting an Undo Record, it is natural to analyze its content. row_undo_ins_parse_undo_rec will be called here to obtain the table of the modified row from Undo Record and to parse the primary key information recorded in it. If it is an Update type, an update vector will also be obtained to record its changes relative to the updated version.

TRX_UNDO_INSERT_REC types of Undo rollback are carried out in row_undo_ins. The reverse operation of insert is delete. According to the primary key parsed from Undo Record, row_undo_search_clust_to_pcur is used to locate the corresponding ROW. row_undo_ins_remove_sec_rec and row_undo_ins_remove_clust_rec are called respectively to delete the current row on the secondary index and the primary index.

Undo of the Update type includes TRX_UNDO_UPD_EXIST_REC, TRX_UNDO_DEL_MARK_REC, and TRX_UNDO_UPD_DEL_REC. Their Undo rollback is carried out in row_undo_mod. First, row_undo_mod_del_unmark_sec_and_undo_update will be called, in which the impact of this operation on all secondary indexes will be rolled back according to the update vector parsed from Undo Record. This may include reinserting deleted secondary index records, removing Delete Mark, and the previous value using the diff information in the update vector to modify the secondary index record. After that, the call of row_undo_mod_clust also uses the diff information recorded in the update vector to modify the main index record back to the previous value.

After the Undo Log part of the rollback is completed, the trx_roll_try_truncate is called for recycling, and the disk space is returned to the Undo Segment by calling trx_undo_free_last_page to the page that is no longer used. This is an inverse operation of trx_undo_add_page during writing.

7. Undo for MVCC

The purpose of multi-version is to avoid waiting for each other between write and read transactions. Each read transaction needs to find the corresponding historical version that should be seen without adding Lock to Record. The historical version is assumed to take a snapshot of the entire DB at the beginning of the read-only transaction, and then all read requests of the transaction are obtained from this snapshot. It cannot take a snapshot of each transaction for implementation. The time and space costs are too high. The approach of InnoDB is to obtain a ReadView when reading transactions for the first time and hold it all the time, which records all currently active write transaction IDs. Since the ID of write transactions is self-increasing allocation, through this ReadView, we can know which transactions have been committed and which are still running at this moment. According to the requirements of Reading Committed, the modification of uncommitted transactions should not be seen. Correspondingly, modifications to transactions that have been committed should be seen.

As the Undo Record stores historical versions, the trx_id of the record is used to make this visibility judgment. This value is also available on the Record of the corresponding primary index. When a read transaction takes its ReadView to access a record on a table index, it will determine whether it is a visible version by comparing the trx_id on the Record. If it is invisible, it will look for an earlier historical version along the Rollptr recorded in Record or Undo Record. The following figure shows that transaction R needs to query the record with id 1 on table t at the beginning. Transaction I has been committed at the beginning of R. Transaction J is still running. Transaction K has not started. This information is recorded in the ReadView of transaction R. Transaction R finds the corresponding record [1, C] from the index. The corresponding trx_id is K and is invisible. The previous version [1, B] in Undo is found along Rollptr. The corresponding trx_id is J and is invisible. [1, A] is found along Rollptr, whose trx_id is I, which is visible, and the result is returned.

As mentioned earlier, as Logical Log, Undo records the diff information of the two versions before and after, and the read operation needs to obtain the complete Record content. That is to say, in the process of searching along with the Rollptr pointer, the diff content in Undo Record needs to be used to construct the corresponding historical version in turn. This process is in the row_search_mvcc function, where the trx_undo_prev_version_build will find the corresponding Undo Record position according to the current Rollptr. Here, if Rollptr points to insert type or finds the location where Purge has already been operated, it means it is over and will return failure. Otherwise, the corresponding Undo Record is parsed, and the information (such as trx_id), Rollptr pointing to the next Undo Record, primary key information, and the update vector of diff information is restored. After that, use the update vector to modify the information in the currently held Record copy through row_upd_rec_in_place to obtain this historical version of Record. Then, call changes_visible of your ReadView to determine visibility and return to the user if visible. Finally, complete the reading of this history version.

8. Undo for Crash Recovery

With Crash Recovery, you need to use the information in Undo to roll back all the effects of uncommitted transactions to ensure the Failure Atomic of the database. As mentioned earlier, Undo in InnoDB is processed like data. It can also be seen from the preceding organizational structure that Undo has a much more complicated organizational structure than Redo Log, which is distributed according to transactions rather than written sequentially. Its Durability, like other data in InnoDB, needs to be guaranteed by Redo, as described in An In-Depth Analysis of REDO Logs in InnoDB. In addition to common MLOG_2BYTES and MLOG_4BYTES types, Undo a corresponding Redo Log type. MLOG_UNDO_INIT type records initialization when Undo Page is comfortable. During the allocation of Undo Log, Undo Log Header needs to be reused, or when a new Undo Log Header needs to be created, Redo Record of MLOG_UNDO_HDR_REUSE and MLOG_UNDO_HDR_CREATE types will be recorded, respectively. MLOG_UNDO_INSERT is the most common one. Writing a new Undo Record in the Undo Log corresponds to writing all the contents in the Undo log entry. Finally, MLOG_UNDO_ERASE_END corresponds to the operation of erasing the last incomplete Undo Record when the Undo Log crosses the Undo Page.

For example, in the ARIES process mentioned in Past and Present of the Recovery Mechanism of Database Failure, all Redo logs will be replayed first in the Crash Recovery process. The entire Undo disk organization structure will also be restored as a data type through the replay of these Redo types mentioned before. After that, the disk structure of Undo is scanned in the trx_sys_init_at_db_start, and all Rollback Segments and Undo Segments are traversed. You can know the transaction state of the Undo Segment before Crash by reading the State in the Undo Segment Header. If it is TRX_UNDO_ACTIVE, the transaction needs to be rolled back at that time. Otherwise, the transaction has ended, and the logic of Undo Segment can be cleaned up. After that, the memory organization mode of Undo Log can be restored, including the memory structure trx_t of active transactions, the memory structure trx_rseg_t of Rollback Segment, and the four linked lists of trx_undo_t.

Before Crash Recovery is completed, an asynchronous rollback thread trx_recovery_rollback_thread will be started in the srv_dict_recover_on_restart. Transactions that were active before Crash are rolled back through trx_rollback_active. This process is consistent with the Undo for Rollback mentioned before.

9. Undo Cleanup

We know that InnoDB has saved multiple historical versions in Undo Log to implement MVCC. When a historical version has been confirmed that it will not be seen by any existing and future transactions, it should be cleaned up. Therefore, there needs to be a way to determine which Undo Logs will not be seen again. At the end of each write transaction in InnoDB, an incremental number trx_no is taken as the commit sequence number of the transaction, and each read transaction records the maximum trx_no seen at the beginning in its ReadView as the m_low_limit_no. Then, if the trx_no of a transaction is less than m_low_limit_no in all currently active read transaction Readview, it means the transaction has been committed before all read starts, and its modified new version is visible. Thus, it is no longer necessary to build the previous version through Undo, and the Undo Log of this transaction can be cleaned up. As shown in the following figure, since Transaction J has been already Committed when the earliest ReadView in the ReadView List is obtained, all read transactions can be satisfied by the version in Index or the first Undo version history, so no earlier Undo is needed. The Undo Log of the entire Transaction J can be cleaned up.

The cleanup of Undo is scanned and distributed by a dedicated background thread srv_purge_coordinator_thread and is cleaned up by multiple srv_worker_thread. The coordinator first scans the innodb_purge_batch_size in the function trx_purge_attach_undo_recs and configures Undo Records, which is distributed to the worker as a round of cleanup tasks.

9.1 Scan a Batch of Undo Records to be Cleaned up

At the end of the transaction, Undo Logs of the Update type that require Purge are attached to the History List of the Rollback Segment Header in the order (trx_no) in which the transaction is committed. The basic idea of Undo Log recycling is to traverse all Undo Logs from small to large according to trx_no to clean up. As mentioned earlier, if there are multiple Rollback Segments in InnoDB, there will be multiple History Lists. Each History List has orderly internal transactions, but it is also necessary to find a trx_no globally ordered sequence from multiple History Lists, as shown in the following figure:

The transaction number in the figure is based on InnoDB. A heap structure purge_queue is introduced here to find the next transaction with the smallest trx_no from all History lists in turn. Purge_queue records all Rollback Segments waiting for Purge and transactions with the smallest trx_no in History. The trx_purge_choose_next_log pops the Undo Log with the global smallest trx_no from purge_queue. Call the trx_purge_get_next_rec to traverse the corresponding Undo Log and process each Undo Record. After that, continue to call the trx_purge_rseg_get_next_history_log to obtain the next Undo Log with the smallest trx_no from purge_queue. Continue to push the next Undo Log on the current Rollback Segment into purge_queue, waiting for subsequent sequential processing. The corresponding processing procedure and function calls in the preceding figure are shown in the following figure:

Among them, the trx_purge_get_next_rec will traverse all Undo Records in an Undo Log from top to bottom, which is opposite to the traversal direction from bottom to top when Rollback was mentioned earlier. Let's take the same scenario as an example. An Undo Log to be Purged spans two Undo Pages: Undo Record 1 on Page 1 and Undo Records 2 and 3 on Page 2. The following figure shows that the location Log Start Offset of the first Undo Record will be found from the current Undo Log Header. After processing Undo Record 1, the next Undo Record will be found along with the Next Record Offset. When the end of the Page is found, the next Page will be found through the Page List Node. The first Undo Record on the Page will be found, and the preceding process will be repeated until all Undo Records are found.

It may need to process the information on the index before it is deleted for each Undo Record to be Purged. This is because, when a Record needs to be deleted during normal operation, we have to ensure that its previous version can also be found through Rollptr and that Record is not deleted, which is only marked with Delete Mark, and Undo Record is recorded as a special Update operation. Before the corresponding TRX_UNDO_DEL_MARK_REC type of Undo Record is cleared, the Delete Mark record needs to be deleted from the index. Therefore, the cleanup of Undo Record will be divided into two processes:

The actual deletion of the record corresponding to the Undo Record of the TRX_UNDO_DEL_MARK_REC type is called Undo Purge.
The deletion of Undo Record itself from old to new is called Undo Truncate.

In addition, when the configured independent Undo Tablespace is greater than two, InnoDB supports rebuilding to reduce the Undo Tablespace that exceeds the configured size.

The reconstruction of Undo Tablespace is reduced, which is called Undo Tablespace Truncate.

9.2 Undo Purge

This step is mainly aimed at TRX_UNDO_DEL_MARK_REC type Undo Record, which is used to delete the Record marked Delete Mark on the index. In the row_purge function, the worker thread processes each Undo Record allocated by the coordinator in a loop. First, the type, table_id, Rollptr, primary key information of the corresponding record, and update vector are parsed from the Undo Record by row_purge_parse_undo_rec. After that, for the TRX_UNDO_DEL_MARK_REC type, call row_purge_remove_sec_if_poss to delete the records to be deleted from all secondary indexes and call row_purge_remove_clust_if_poss to delete them from the primary index. Although the Undo record of the TRX_UNDO_UPD_EXIST_REC type does not involve the deletion of the primary index, the deletion of the secondary index may be required, which is also processed here.

9.3 Undo Truncate

The coordinator thread will wait for all workers to complete the Purge work of a batch of Undo Records and try to clean up the Undo Logs that are no longer needed. The trx_purge_truncate function will traverse all Undo Segments in all Rollback Segments. If its state is TRX_UNDO_TO_PURGE, call trx_purge_free_segment to release the occupied disk space and delete them from the History List. Otherwise, it means the Undo Segment is being used or is still being cached (TRX_UNDO_CACHED type). Then, it is only deleted from the History List by trx_purge_remove_log_hd.

Note: The action of Undo Truncate is not performed every time. The parameter innodb_rseg_truncate_frequency controls its frequency. In other words, it takes innodb_rseg_truncate_frequency batch to perform it once. As mentioned earlier, innodb_purge_batch_size Undo Records will be processed in each batch, so the shortening of the Undo History List we see from the show engine InnoDB status is hopping.

9.4 Undo Tablespace Truncate

If innodb_trx_purge_truncate is enabled, the function trx_purge_truncate will also try to rebuild Undo Tablespaces to reduce the file space occupied. After Undo Truncate, all Undo Tablespace will be scanned in the function trx_purge_mark_undo_for_truncate. The Tablespace whose file size is greater than the configured innodb_max_undo_log_size will be marked as inactive. At most one Tablespace will be inactive at each moment. All Rollback Segments on inactive Undo Tablespace will not participate in the allocation of new things. After all active transactions on the file exit and all Undo Log have completed Purge, this Tablespace will be rebuilt through trx_purge_initiate_truncate, including the rebuilding of the file structure and memory structure in Undo Tablespace, and re-marked as active to participate in allocation to new transactions.

10. Summary

This article first briefly introduces the role of Undo Log, the content of an Undo Record, and its logical, physical, file, and memory organization modes. This article also describes the relationship and hierarchy among Undo Tablespace, Rollback Segment, Undo Segment, Undo Log, and Undo Record. These are organized to use and maintain Undo information better. Finally, on this basis, the role and implementation of Undo in various important database functions are introduced, including transaction rollback, MVCC, Crash Recovery, and Purge.

References

[1] MySQL 8.0.11 Source Code Documentation: Format of redo log
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html

[2] MySQL Source Code
https://github.com/mysql/mysql-server

[3] The basics of the InnoDB undo logging and history system
https://blog.jcole.us/2014/04/16/the-basics-of-the-innodb-undo-logging-and-history-system/

[4] MySQL • Engine Features • InnoDB Undo Log
http://mysql.taobao.org/monthly/2015/04/01/

[5] Past and Present of the Recovery Mechanism of Database Failure
http://catkang.github.io/2019/01/16/crash-recovery.html

[6] Mechanism of Database Concurrency Control
http://catkang.github.io/2018/09/19/concurrency-control.html

[7] An In-Depth Analysis of REDO Logs in InnoDB
https://www.alibabacloud.com/blog/an-in-depth-analysis-of-redo-logs-in-innodb_598965

Community