Distributed Transaction Middleware Fescar: Interpreting the Global Write Exclusive Lock

Foreword
Database transactions are generally run at the read committed isolation level, which satisfies most business requirements, so branch (local) transactions in Fescar are read committed as well. What, then, is the isolation level of a global transaction in Fescar? If you have carefully read the source code interpretation of the Txc/Fescar RM module, you should be able to infer that Fescar defines the default isolation level of global transactions as read uncommitted. The business impact of read uncommitted is well known: dirty data can be read, and the classic bank transfer example shows how data inconsistency follows. For Fescar, if no other technical measures were taken, this would cause serious problems, such as:


As shown in the figure above, to what state should global transaction A finally roll resource R1 back? Obviously, rolling back according to the UndoLog causes a serious problem: the changes that global transaction B made to resource R1 are overwritten. How does Fescar solve this? The answer is its global write exclusive lock: while global transaction A is executing, global transaction B cannot acquire the global lock and therefore waits.
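The overwrite problem can be reproduced with a small simulation. This is only an illustrative sketch: the class name, the method names, and the in-memory "table" are hypothetical and do not reflect Fescar's actual UndoLog implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: one row (R1) and an undo log that stores only the before image.
class LostUpdateDemo {
    // Simulated table: primary key -> column value.
    static final Map<String, Integer> table = new HashMap<>();

    // Applies an update and returns the before image, as an undo log would record it.
    static int updateWithUndo(String pk, int newValue) {
        int beforeImage = table.get(pk);
        table.put(pk, newValue);
        return beforeImage;
    }

    // Rolls back by blindly restoring the before image.
    static void rollback(String pk, int beforeImage) {
        table.put(pk, beforeImage);
    }

    static int run() {
        table.put("R1", 100);
        int undoA = updateWithUndo("R1", 80); // global transaction A: 100 -> 80, branch committed locally
        updateWithUndo("R1", 50);             // global transaction B: 80 -> 50, nothing stops it
        rollback("R1", undoA);                // A restores 100 and silently discards B's change
        return table.get("R1");
    }

    public static void main(String[] args) {
        System.out.println("R1 after A's rollback: " + run());
    }
}
```

Without write isolation between the two global transactions, A's rollback leaves R1 at its original value and B's update is lost.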

The official documentation explains Fescar's isolation level as follows:

The isolation of global transactions is based on the local isolation level of branch transactions.

On the premise that the local database isolation level is read committed or above, Fescar uses a global write exclusive lock maintained by the transaction coordinator to guarantee write isolation between transactions, and defines global transactions at the read uncommitted isolation level by default.

Our consensus on isolation levels is that most applications work without problems under read committed. In practice, the vast majority of application scenarios also work without problems under read uncommitted.

In extreme scenarios where an application needs global read committed, Fescar also provides a corresponding mechanism. By default, Fescar works at the read uncommitted isolation level to keep most scenarios efficient.

Next, this article interprets the implementation of Fescar's global write exclusive lock at the source code level. The lock is maintained in the TC (Transaction Coordinator) module. Wherever a lock is required, the RM (Resource Manager) module asks the TC module for the global lock to guarantee write isolation between transactions. The rest of the article is divided into two parts: the implementation of the global write exclusive lock in the TC, and its use in the RM.

1. TC: Implementation of the Global Write Exclusive Lock

First, let's take a look at the entrance of the TC module interacting with the outside world. The following figure is the main function of the TC module:


The figure above shows that RpcServer handles the logic related to the communication protocol, while the real processor of the TC module is DefaultCoordinator. It contains all the functions the TC exposes to the outside world, such as doGlobalBegin (global transaction creation), doGlobalCommit (global transaction commit), doGlobalRollback (global transaction rollback), doBranchReport (branch transaction status reporting), doBranchRegister (branch transaction registration), and doLockCheck (global write exclusive lock check). Of these, doBranchRegister, doLockCheck, and doGlobalCommit are the entry points of the global write exclusive lock implementation.

/**
 * Branch transaction registration. During registration, the global lock
 * resources required by the branch transaction are acquired.
 */
@Override
protected void doBranchRegister(BranchRegisterRequest request, BranchRegisterResponse response,
                                RpcContext rpcContext) throws TransactionException {
    response.setTransactionId(request.getTransactionId());
    response.setBranchId(core.branchRegister(request.getBranchType(), request.getResourceId(), rpcContext.getClientId(),
        XID.generateXID(request.getTransactionId()), request.getLockKey()));
}

/**
 * Checks whether the global lock can be acquired.
 */
@Override
protected void doLockCheck(GlobalLockQueryRequest request, GlobalLockQueryResponse response, RpcContext rpcContext)
    throws TransactionException {
    response.setLockable(core.lockQuery(request.getBranchType(), request.getResourceId(),
        XID.generateXID(request.getTransactionId()), request.getLockKey()));
}

/**
 * Global transaction commit. Releases the lock records held by all branch
 * transactions under the global transaction.
 */
@Override
protected void doGlobalCommit(GlobalCommitRequest request, GlobalCommitResponse response, RpcContext rpcContext)
    throws TransactionException {
    response.setGlobalStatus(core.commit(XID.generateXID(request.getTransactionId())));
}

The logic above is ultimately delegated to DefaultCore for execution.


As shown in the figure above, both acquiring a lock and checking lock status are eventually handled by LockManager. LockManager is implemented by DefaultLockManagerImpl, and all of the design related to the global write exclusive lock is maintained there.

First, let's take a look at the structure of the global write exclusive lock:

private static final ConcurrentHashMap<String, ConcurrentHashMap<String, ConcurrentHashMap<Integer, Map<String, Long>>>> LOCK_MAP = new ConcurrentHashMap<>();

Overall, the lock structure is built from nested maps: the outer levels are ConcurrentHashMap and the innermost level is HashMap. Ultimately it is a lock ownership mark: for a given primary key of a table on a ResourceId (database resource ID), it records which global transaction holds the global write exclusive lock on that row. Next, let's look at the source code that acquires the lock:


As annotated in the figure above, the acquireLock logic is quite clear: the lock resources required by a branch transaction are either all acquired in one go or all fail; there is no partial success. This explanation may raise two questions:

Why does the lock structure use ConcurrentHashMap for the outer levels and HashMap for the innermost level?
The outer levels use ConcurrentHashMap for better concurrent processing, which is easy to understand. The question is why the innermost level uses HashMap rather than ConcurrentHashMap. A likely reason is that this level must determine whether the current global transaction already owns the lock for a given PK, which is a compound operation. Even with ConcurrentHashMap, a synchronized block would still be needed around the check, so the lighter HashMap is the better choice.

Why does BranchSession store the lock resources it holds?
This one is simpler. The lock structure itself does not show which lock records a branch transaction occupies, so when the global transaction commits, how would the branch transaction release them? For this reason, the lock resources occupied by a branch transaction are saved in its BranchSession.
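The all-or-nothing acquisition and the two design questions above can be illustrated with a simplified sketch. It is not Fescar's DefaultLockManagerImpl: the class SketchLockManager, its methods, and the three-level map (without the extra nesting of the real LOCK_MAP) are assumptions made for illustration, and the check-then-lock two-pass approach is just one simple way to get all-or-nothing behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified lock table: resourceId -> tableName -> (pk -> owning transactionId).
class SketchLockManager {
    private final ConcurrentHashMap<String, ConcurrentHashMap<String, Map<String, Long>>> lockMap =
            new ConcurrentHashMap<>();

    // Plays the role of BranchSession's lock holder: branchId -> keys that branch locked.
    private final Map<Long, List<String>> heldByBranch = new ConcurrentHashMap<>();

    // All-or-nothing: either every pk is locked for transactionId, or nothing changes.
    boolean acquireLock(long transactionId, long branchId, String resourceId,
                        String tableName, List<String> pks) {
        Map<String, Long> rowLocks = lockMap
                .computeIfAbsent(resourceId, r -> new ConcurrentHashMap<>())
                .computeIfAbsent(tableName, t -> new HashMap<>());
        // Check-then-put is a compound operation, so the plain HashMap is guarded by synchronized.
        synchronized (rowLocks) {
            for (String pk : pks) {
                Long owner = rowLocks.get(pk);
                if (owner != null && owner != transactionId) {
                    return false; // held by another global transaction: fail without locking anything
                }
            }
            List<String> held = heldByBranch.computeIfAbsent(branchId, b -> new ArrayList<>());
            for (String pk : pks) {
                if (rowLocks.put(pk, transactionId) == null) {
                    held.add(resourceId + ":" + tableName + ":" + pk); // remember for later release
                }
            }
            return true;
        }
    }

    // Lock records remembered for a branch, or an empty list if it holds none.
    List<String> locksHeldBy(long branchId) {
        return heldByBranch.getOrDefault(branchId, new ArrayList<>());
    }
}
```

Because the innermost map is only touched inside the synchronized block, a plain HashMap is sufficient there, which matches the reasoning given for the first question.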

The following figure shows the logic to verify whether the global lock resource can be acquired:


The following figure shows the logic by which a branch transaction releases its global lock resources:


That is how the global write exclusive lock is implemented in the TC module. When a branch transaction is registered, the RM passes along the lock resources the branch requires, and the TC acquires the global locks (all succeed at once or all fail; there is no partial success). When the global transaction commits, the TC module automatically releases the lock resources held by all of its branch transactions. In addition, to reduce the probability that acquiring the global write exclusive lock fails, the TC module exposes an interface for checking whether lock resources can be acquired; the RM module can call it at an appropriate point to lower the failure rate of branch registration.
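The lock check and the release on global commit can be sketched in the same spirit. The class SketchLockTable, the flat "resourceId:table:pk" key, and the method names below are hypothetical simplifications for illustration, not the TC's real data structure or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat lock table: "resourceId:table:pk" -> owning global transactionId.
class SketchLockTable {
    private final Map<String, Long> owners = new HashMap<>();
    // Global transactionId -> keys locked by its branches (stands in for the BranchSession records).
    private final Map<Long, List<String>> heldByTx = new HashMap<>();

    // Acquires the lock on one key during branch registration.
    synchronized boolean lock(long transactionId, String key) {
        if (!isLockable(transactionId, key)) {
            return false;
        }
        if (owners.put(key, transactionId) == null) {
            heldByTx.computeIfAbsent(transactionId, t -> new ArrayList<>()).add(key);
        }
        return true;
    }

    // Lock query: a read-only check. A key is lockable if it is free or already owned by the caller.
    synchronized boolean isLockable(long transactionId, String key) {
        Long owner = owners.get(key);
        return owner == null || owner == transactionId;
    }

    // Global commit: release everything recorded for the transaction's branches.
    synchronized void releaseOnGlobalCommit(long transactionId) {
        List<String> keys = heldByTx.remove(transactionId);
        if (keys != null) {
            for (String key : keys) {
                owners.remove(key, transactionId);
            }
        }
    }
}
```

Note that the query changes no state: a positive answer says only that the lock was free at that moment, which is why the RM combines it with a local row lock.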

2. RM: Use of the Global Write Exclusive Lock

The RM module mainly uses two functions of the TC module's global lock: checking whether the global lock can be acquired, and registering a branch transaction to occupy the global lock. The global lock is released automatically when the global transaction commits. Before a branch transaction is registered, the RM checks the global lock state so that branch registration does not hit a lock conflict.

When Update, Insert, and Delete statements are executed, data snapshots are generated before and after the SQL runs in order to build the UndoLog, and the snapshots are essentially produced with Select ... For Update. The logic by which the RM checks whether the global lock can be acquired lives in the executor for that statement, SelectForUpdateExecutor, as shown in the following figure:



The basic logic is as follows:

First, execute the Select ... For Update statement so that the local transaction holds the corresponding database row locks. Other local transactions then cannot take those row locks and therefore cannot go on to take the global lock.
Second, check in a loop whether the global lock can be acquired. The global lock may already be held by an earlier global transaction, in which case the RM must wait for that transaction to release it. Once the check reports that the global lock can be acquired, the first step guarantees that no other local transaction will acquire it before the current local transaction ends, so the branch registration performed before the current local transaction commits will not fail because of a global lock conflict.
Note: Careful readers may notice that UpdateExecutor and DeleteExecutor, which handle Update and Delete statements, execute a Select ... For Update statement to obtain the beforeImage and then check the global lock state, whereas InsertExecutor, which handles Insert statements, has no global lock check. The likely reason is that an Insert creates a new row whose PK is new, so its global lock cannot already be held, and the lock will certainly be available when the branch transaction is registered before the local commit.
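The two steps can be sketched as a retry loop. Everything here is hypothetical: the class LockCheckLoop and its functional parameters stand in for the JDBC statement and the RM-to-TC lock query. Releasing the row locks before each wait, and the retry limit and sleep interval, are assumptions of this sketch rather than behavior described above.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of the RM-side loop; the functional parameters stand in for JDBC and TC calls.
class LockCheckLoop {
    /**
     * @param selectForUpdate     runs SELECT ... FOR UPDATE and returns the lock keys of the rows it locked
     * @param lockable            asks the TC whether the given lock keys could be acquired
     * @param rollbackToSavepoint releases the local row locks taken in this attempt
     * @return the number of attempts used
     */
    static int selectWithGlobalLockCheck(Supplier<String> selectForUpdate,
                                         Predicate<String> lockable,
                                         Runnable rollbackToSavepoint,
                                         int maxRetries,
                                         long sleepMillis) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            String lockKeys = selectForUpdate.get(); // step 1: take the local row locks
            if (lockable.test(lockKeys)) {           // step 2: verify the global lock is free
                return attempt;                      // keep the row locks; the caller continues
            }
            rollbackToSavepoint.run();               // conflict: give up the row locks before waiting
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the global lock", e);
            }
        }
        throw new IllegalStateException("global lock still held after " + maxRetries + " attempts");
    }
}
```

The longer the global lock stays contended, the more attempts the loop needs, which is the source of the performance cost discussed later in this section.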

Next, let's look at how a branch transaction is committed, and how the global lock resources it needs are generated and saved. After the business SQL is executed, the UndoLog is generated from the beforeImage and afterImage. At the same time, the identifiers of the global lock resources that the current local transaction needs are generated and stored in the ConnectionContext of the ConnectionProxy, as shown in the following figure.



In ConnectionProxy.commit, when the branch transaction is registered, the global lock identifiers saved in the ConnectionProxy's context are passed to the TC to acquire the global lock.
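A rough sketch of this flow follows. The classes SketchConnectionContext and SketchConnectionProxy are hypothetical stand-ins, the "table:pk" key format joined with ";" is an assumption of the sketch, and UndoLog persistence is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the connection context that collects lock keys during SQL execution.
class SketchConnectionContext {
    private final List<String> lockKeys = new ArrayList<>();

    // Called after each business SQL; the "table:pk" format is an assumption of this sketch.
    void appendLockKey(String tableName, String pk) {
        lockKeys.add(tableName + ":" + pk);
    }

    // Joins all collected keys into the single string sent with branch registration.
    String buildLockKeys() {
        return String.join(";", lockKeys);
    }
}

// Hypothetical stand-in for the connection proxy that wraps the local commit.
class SketchConnectionProxy {
    final SketchConnectionContext context = new SketchConnectionContext();
    boolean localCommitted = false;

    // registerBranch stands in for the RM -> TC branch registration call. It returns
    // a branch id, or null when the TC could not acquire the global lock.
    void commit(Function<String, Long> registerBranch) {
        Long branchId = registerBranch.apply(context.buildLockKeys());
        if (branchId == null) {
            throw new IllegalStateException("branch registration failed: global lock conflict");
        }
        // UndoLog persistence is not modeled here; only the ordering matters:
        // the global lock is obtained first, then the local transaction commits.
        localCommitted = true;
    }
}
```

The point of the ordering is that the local transaction commits only after the TC has granted the global lock for every row the branch changed.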


That is how the global write exclusive lock is used in the RM module. Because the state of the global lock is checked in a loop before the lock is actually acquired, the actual acquisition should not fail due to a lock conflict. The drawback is also obvious: under heavy lock contention, the local transaction holds its database locks for longer, which costs the business interface some performance.

3. Summary

This article has described in detail the global write exclusive lock that Fescar uses to achieve write isolation at the read uncommitted isolation level, covering both how the lock is implemented in the TC module and how it is used in the RM module. While reading the source code, the author was left with two questions:

The global write exclusive lock data structure is stored in memory. What happens if the server restarts or goes down? In other words, what is the high availability solution for the TC module?
What happens if a lock conflict occurs between a global transaction managed by Fescar and a local transaction not managed by Fescar? The specific situation is shown in the figure below; the question is how to roll back global transaction A.

Question 1 needs further study. Question 2 has an answer, but Fescar has not implemented it yet: rolling back global transaction A will report an error. When branch transaction A1 of global transaction A is rolled back, it checks whether the afterImage is consistent with the current row data in the table. If they are consistent, the rollback is allowed; if not, the rollback fails, the business party is notified by an alarm, and the business party handles it.
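The consistency check described for question 2 could look roughly like the sketch below. Since the text says Fescar has not implemented it, this is purely illustrative: the class RollbackCheck, the representation of a row as a column-to-value map, and the exception used to signal failure are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch of the afterImage check; a row is modeled as column name -> value.
class RollbackCheck {
    // Returns true when rollback is safe: the row still equals the after image written by the branch.
    static boolean canRollback(Map<String, Object> afterImage, Map<String, Object> currentRow) {
        return afterImage.equals(currentRow);
    }

    // Returns the before image to restore, or fails if the row was changed by someone else.
    static Map<String, Object> rollback(Map<String, Object> beforeImage,
                                        Map<String, Object> afterImage,
                                        Map<String, Object> currentRow) {
        if (!canRollback(afterImage, currentRow)) {
            // In the scheme described above, this is where the business party would be alerted.
            throw new IllegalStateException("row changed outside the global transaction; manual handling required");
        }
        return beforeImage;
    }
}
```

If a local transaction outside Fescar has changed the row, the current data no longer matches the afterImage, and the rollback is refused instead of overwriting that change.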
