[MySQL]MySQL engine features - about InnoDB transaction locks (3)
Created#More Posted time:Nov 4, 2016 13:49 PM
Manage transaction locks
All the transaction lock objects in InnoDB are mounted on the global object lock_sys. At the same time, every transaction object also keeps its own transaction lock. Every table object (dict_table_t) keeps the table lock objects constructed on it,
as shown in the figure below:
Apply a table lock
• First, check whether a table lock of an equal or higher level has been applied in the trx_lock_t::table_locks of the current transaction. If yes, success is returned directly. (lock_table_has)
• Check whether a table lock object that is in conflict with the locking mode currently applied for exists at present (lock_table_other_has_incompatible)
o Directly traverse the chain table dict_table_t::locks
• If a conflicting lock object exists, the lock request needs to enter the waiting queue (lock_table_enqueue_waiting)
o Create a wait lock object (lock_table_create)
o Check whether a deadlock exists (DeadlockChecker::check_and_resolve). When a deadlock exists: if the current session is selected as a victim, remove the lock request (lock_table_remove_low), reset the wait_lock of the current transaction to empty, and return the error code DB_DEADLOCK; if the current session is selected as a victor, the lock wait is removed. It can be deemed that the current session has obtained the lock and success is returned.
o If no deadlock occurs, after you set relevant variables of the transaction object, the error code DB_LOCK_WAIT will be returned and the lock request enters the lock wait state.
• If no conflicting lock exists, the lock object will be directly created (lock_table_create) and added to the queue.
lock_table_create: Create a lock object
• When an AUTO-INC lock is currently requested
o Increment dict_table_t::n_waiting_or_granted_auto_inc_locks. We have mentioned earlier that when this value is not 0, insert operations to the auto increment columns will regress to the old style locking mode.
o The lock object directly references the pre-created dict_table_t::autoinc_lock, and adds it to the trx_t::autoinc_locks set.
• For non-AUTO-INC locks, the lock object will be allocated from a pool.
o In the transaction object trx_t::lock, two pools are available. One is trx_lock_t::rec_pool which pre-allocates a group of lock objects for record lock distribution. The other is trx_lock_t::table_pool, used for allocation of table lock objects. Through the re-allocated memory approach, the resource-consuming memory allocation operations can be avoided when a global lock is held (lock_sys->mutex). The pre-allocated sizes of rec_pool and table_pool are both eight lock objects. (lock_trx_alloc_locks)
o If table_pool is used up, the memory will be used for allocation to create a lock object.
• The constructed lock objects will be added to the trx_t::lock.trx_locks chain table of the transaction and the dict_table_t::locks chain table of the table object.
• The constructed lock objects are added to the trx_t::lock.table_locks set of the current transaction.
We can see that the lock objects will be added to different sets or chain tables. By mounting lock objects to transaction objects, we can quickly check for any table locks held by the current transaction; by mounting lock objects to the lock chain tables of the table object, we can check the global conflict situation on the table.
Apply a row lock
The entry function of row locking is lock_rec_lock. In it, if the first parameter impl is TRUE, and the existing lock on the current record is not in conflict with LOCK_X | LOCK_REC_NOT_GAP, no lock object needs to be created. (See previous descriptions about record lock LOCK_X.) For clearer descriptions, the impl is false by default in the process descriptions below.
• First, try fast lock. For a scenario with few lock conflicts, this is a general locking mode (lock_rec_lock_fast). Fast lock can apply when the following conditions are met:
o There are no record locks on the page where the record is located. The lock object will be directly created and added to rec_hash, and success will be returned.
o When there is only one record lock on the page where the record is located, and the lock belongs to the current transaction, and the pre-allocated bitmap of this record lock can describe the current heap no (The number of pre-allocated bits = number of records on the page when the lock object is created + 64. See the RecLock::lock_size function.), the number of bits is directly set and returned.
• When the fast lock is not applicable, call the slow lock logic (lock_rec_lock_slow).
o Judge whether the current transaction has held a lock of a higher priority. If yes, success is returned directly (lock_rec_has_expl).
o Check whether a lock in conflict with the lock mode currently requested exists (lock_rec_other_has_conflicting). If yes, a lock object is created (RecLock::RecLock) and added to the waiting queue (RecLock::add_to_waitq). Here the deadlock detection will be performed.
o If no conflicting locks exist, the lock will be added to the queue (lock_rec_add_to_queue): When a lock object on the same page already exists and there is no other session waiting for the same heap no, you can set the corresponding bitmap (lock_rec_find_similar_on_page); otherwise, you need to create a new lock object.
• Returned error code such as DB_LOCK_WAIT and DB_DEADLOCK will be handled in the upper layer.
Wait and deadlock identification
When a conflicting lock is discovered, you can call the RecLock::add_to_waitq function for identification.
• If the thread holding the conflicting lock is an internal background thread (such as the background dict_state thread), the thread will not be canceled by a transaction of a higher priority, because internal threads are always prioritized for normal execution.
• Compare the transaction priority of the current session and the session holding the lock, and call the trx_arbitrate function to return the transaction selected as the victim.
o The requesting session is a background thread. But when the session holding the lock is set with a higher priority, the current thread is selected as the victim.
o When the lock-holding thread is a background thread, the identification has been performed in Step 1 and it will not be selected as the victim.
o If both sessions are set with priority, the lower-priority session will be selected as the victim. When their priority levels are equal, the requester will be selected as the victim (thd_tx_arbitrate).
o P.S.: At present, the latest V5.7 does not support setting the thread priority on the user end yet (but it is very easy to add an interface for configuring session variables).
• When the priority level of the current session is low, or the other lock-holding session is a background thread, if the current session has a priority level set at this time, an error will be reported directly and the error code DB_DEADLOCK will be returned.
o When no priority is set by default, the lock-requesting session will also be selected as the victim_trx. But in this case, only the lock wait object will be created, and no error is returned directly.
• When a lock-holding session is selected as the victim, it indicates that the current session must have set a higher priority level. At this time, the RecLock::enqueue_priority logic will apply.
o If the lock-holding session is waiting for another different lock, or the lock-holding transaction is not read-only, the current session will be rolled back.
o The jump queue will start until the current session meets the condition for locking (RecLock::jump_queue).
The requested lock object jumps over the lock objects that block its way. The hash chain table is directly manipulated to move the lock object forward.
Traverse the chain tables forward from the current lock and sequentially identify whether any other sessions hold the lock on the same record (RecLock::is_on_row). If such sessions exist, they will be marked for rollback (mark_trx_for_rollback). At the same time, these transaction objects will be collected for subsequent handling. (But transactions that block the current session will be immediately rolled back.)
o High-priority sessions are highly lethal since even other sessions have obtained the lock, they will be killed by high-priority sessions.
But in real scenarios, we don't have many opportunities to set the priority of a transaction. This topic can be skipped here and we only consider the default scenario, that is, all the transactions do not have a priority level set.
After a lock object in the wait state is created, we need to perform the deadlock check (RecLock::deadlock_check). The deadlock check adopts Depth-First Traversal and constructs the wait-for graph of the transaction for identification through the trx_t::lock.wait_lock on the transaction object. When it finally discovers a closed loop of lock request wait, it identifies that a deadlock has occurred. The other case is that if the check depth is too deep (that is, the check chain formed by lock wait sessions is very long), the deadlock check also deems that a deadlock has occurred. The maximum depth is LOCK_MAX_DEPTH_IN_DEADLOCK_CHECK by default, and the value is 200.
When a deadlock occurs, you need to select a victim (DeadlockChecker::select_victim()) to remove the deadlock. Usually a transaction with a low weight will be rolled back (trx_weight_ge).
• Sessions that have modified non-transaction tables enjoy higher weights;
• If the sessions have modified both the transaction and non-transaction tables, or none of them have modified transaction tables, the weights are determined according to the volume of undo of the transaction and the number of transaction locks they hold. (TRX_WEIGHT)
• Transactions with low weights will be rolled back, and those with high weights obtain the lock objects.
Tips: For a well-designed application, we can avoid deadlocks from the business layer. The deadlock detection is performed by holding the global lock, which is quite costly. Among Alibaba's internal applications, there are professional teams to ensure the business SQL quality, so we can selectively disable the deadlock detection feature to improve performance, especially in the scenario of hotspot updating which witnesses significant performance enhancement. In extremely high concurrency, the performance may be improved by several times.
When a lock cannot be obtained immediately, the error code will be passed to the upper layer for processing (row_mysql_handle_errors).
o The transaction with a higher priority has collected the chain table of transactions that will block it. At this time, these transactions will be rolled back in a uniform way (trx_kill_blocking);
o The current thread will be suspended (lock_wait_suspend_thread). The wait timeout duration is dependent on the level configuration of the session (innodb_lock_wait_timeout). The value is 50 seconds by default.
o If the current session status is set to running: One case is that it is selected as the victim of deadlock detection, and the current transaction needs to be rolled back. The other case is that it has obtained the transaction lock before it enters the wait status, so it does not need to wait.
o Obtain an idle slot of the waiting queue. (lock_wait_table_reserve_slot)
When the system is started, it has created enough number of slot arrays for use. The slot type is srv_slot_t, mounted on lock_sys->waiting_threads.
When the slots are allocated, the traversal starts from the first element of the slot array until it finds an idle slot. Note: A performance issue exists here: if there are many threads suspended, every new suspended or waiting thread should be traversed until an idle slot is found. In fact, if every traversal starts from the last allocation position, and continues to the array beginning after it reaches the end of the array, the performance will be improved to some extent in a high-concurrency scenario with many lock conflicts.
o If the session is in the InnoDB layer (usually true), it will be forced to exit from the InnoDB layer to make sure it doesn’t occupy the slots for innodb_thread_concurrency. And the session will enter the wait state. After awakened, it will be forced to enter the InnoDB layer again.
o After the session is awakened, the slot will be released (lock_wait_table_release_slot).
o If the session is selected as the victim to solve the deadlock, return to the upper lay to roll back the transaction; if the wait times out, according to the configuration of the innodb_rollback_on_timeout parameter (OFF by default), only the current SQL statement will be rolled back. When the parameter is set to ON, it indicates the whole transaction will be rolled back.
• DB_DEADLOCK: Roll back the current transaction directly.
Release and awake locks
In most cases, the transaction lock is released when the transaction is committed. But there are two exceptions:
• The AUTO-INC lock is directly released when the SQL statement is completed (innobase_commit --> lock_unlock_table_autoinc).
• When you execute DML statements at the RC isolation level, records returned from the engine layer to the server layer should be immediately unlocked if they do not meet the where condition. (ha_innobase::unlock_row)
Except the two cases above, all other transaction locks are released when the transaction is committed. (lock_trx_release_locks --> lock_release). All the locks held by the transaction are maintained in the trx_t::lock.trx_locks chain table. You can traverse and release them in order.
After a row lock is deleted from the global hash, you also need to identify whether there are any other waiting sessions that can be awakened (lock_rec_dequeue_from_page). If the lock currently released is an X lock of a record, all the sessions requesting S locks can be awakened.
Here, the logic overhead for removing locks and checks is huge, especially when a large number of threads are waiting for a few locks. When removing a lock from the hash chain, InnoDB actually traverses all the waiting locks on the same page and identifies whether these waiting locks can be awakened. However, the logic for identifying whether to awaken the locks goes through another traversal, because the maintenance of the current chain table is not constructed based on the heap no. Discussions about this topic can be found in bug#53825. It is also officially mentioned that although Sunny was developed to construct the chain table, it will waste more memory to remove Bitmap. But it boasts a high performance and the memory is not as expensive as it was in the past.
If the type of a table lock is not LOCK_IS, and the current transaction has modified the data, the dict_table_t::query_cache_inv_id of the table object should be set to the maximum transaction ID currently. The value will be used for judgment when you check whether the query cache of the table can be used (row_search_check_if_query_cache_permitted). If the low_limit_id (that is, the maximum transaction ID visible) of a transaction object of the user session is smaller than this value, it indicates that it should not use the content in the current table cache, nor should it be stored into the query cache.
To release a table lock object, call the lock_table_dequeue function.
Note: When you release a lock, if the transaction holds too many lock objects, the lock_sys->mutex will be released temporarily and then held again after every 1,000 lock objects are released (LOCK_RELEASE_INTERVAL), in a bid to prevent InnoDB from being hung.
Two interesting cases
In this section, we will analyze several interesting deadlock cases.
A deadlock caused by general concurrent inserts
create table t1 (a int primary key);
Enable three sessions and execute: insert into t1(a) values (2);
The table above describes a scenario of mutual waits, because inserting the intention X lock and inserting the S lock are not compatible. This is also a typical deadlock caused by lock upgrading. If session1 executes COMMIT, both of the other two threads will fail because of the duplicate key.
Here I need to explain why we want to apply for inserting the intension lock. Because during rollback, the original record is marked to be deleted. The record we tried to insert and the record marked to be deleted are adjacent (with the same key value). According to the rules for inserting intention locks, when the next record of the inserted location holds a lock in conflict with the inserted intention X lock, the inserted intention X lock should be obtained.
Another similar (but with different causes for the deadlock) scenario is that on a table with both clustered indexes and unique indexes, insert conflicting unique keys through replace into, and a deadlock may occur. In the March monthly report, I have specifically described this issue.
Another deadlock caused by concurrent inserts
Two sessions are involved. At the RR isolation level
The example table is as follows:
create table t1 (a int primary key ,b int);
insert into t1 values (1,2),(2,3),(3,4),(11,22);
The cause of this deadlock is that the GAP X lock with no insert intension and the X lock with insert intention are conflicted.