An Interpretation of PolarDB-X Source Codes (6): Distributed Deadlock Detection

This article focuses on the source code of the distributed deadlock detection function in PolarDB-X.

Lifecycle of Deadlock Detection Task

The deadlock detection function belongs to the transaction module, and the deadlock detection task is hosted by the TransactionManager. The TransactionManager is initialized during the PolarDB-X CN startup process: MatrixConfigHolder#doInit calls the corresponding TransactionManager#init method, and the TransactionManager then sets up a scheduled task for deadlock detection. Each compute node (CN) has only one such task, and deadlock detection runs at regular intervals (one second by default).

Next, let's look at what this deadlock detection task does.

Code Logic: DeadlockDetectionTask

The code of the deadlock detection task is in DeadlockDetectionTask. Each time the task is scheduled, execution enters through the run method.

This method first checks whether the current CN is the leader. Only the leader node executes the deadlock detection task.

if (!hasLeadership()) {
    return;
}

A single round of deadlock detection involves three steps.

The first step is to obtain information about all distributed transactions.

// Get all global transaction information
final TrxLookupSet lookupSet = fetchTransInfo();

This step sends a sync request to all CNs to collect information about every distributed transaction. The fetchTransInfo method triggers FetchTransForDeadlockDetectionSyncAction#sync on each CN. To avoid returning excessive transaction information, only distributed transactions that started more than one second ago are returned.

final long beforeTimeMillis = System.currentTimeMillis() - 1000L;
final long beforeTxid = IdGenerator.assembleId(beforeTimeMillis, 0, 0);

for (ITransaction tran : transactions) {
    if (!tran.isDistributed()) {
        continue;
    }
    // Do deadlock detection only for transactions that take longer than 1s.
    if (tran.getId() >= beforeTxid) {
        continue;
    }
    // Get information from this tran.
    ......
}
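The comparison on transaction IDs works as an age filter because the ID is assembled from a timestamp. The following is a minimal sketch of that idea, assuming a hypothetical layout in which the timestamp occupies the high bits of the ID. The class, the makeId helper, and the 22-bit split are illustrative assumptions; the actual bit layout of IdGenerator may differ.

```java
// Hypothetical sketch; the real IdGenerator bit layout may differ.
final class TimeOrderedIdSketch {
    // Assumed layout: timestamp in the high bits, 22 low bits for everything else.
    static final int LOW_BITS = 22;

    // Build an id whose ordering follows the timestamp first.
    static long makeId(long timeMillis, long low) {
        return (timeMillis << LOW_BITS) | (low & ((1L << LOW_BITS) - 1));
    }

    // A transaction counts as old enough when its id is below the cutoff id,
    // mirroring the "tran.getId() >= beforeTxid -> skip" check in the excerpt.
    static boolean olderThan(long txid, long nowMillis, long minAgeMillis) {
        long cutoff = makeId(nowMillis - minAgeMillis, 0);
        return txid < cutoff;
    }
}
```

With such a layout, no separate start-time field needs to be read per transaction; a single integer comparison is enough.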

The returned TrxLookupSet records information about all of these transactions, mainly the transaction ID, the ID of the transaction's frontend connection, the shards involved in the transaction, and the connection ID of each branch transaction (a transaction in MySQL) on those shards. This information is used in the subsequent deadlock detection steps.

The second step is to obtain the lock wait information of each data node (DN) and update the global transaction waiting graph based on the distributed transaction information from the previous step.

First, the data sources for all DNs are obtained. In the Map returned by the following call, the key is the ID of a DN, and the value is a list of the data sources for all schemas on that DN. Any of these data sources can be used to access the DN. The physical shard names stored in these data sources are also used, together with the connection IDs of branch transactions, to determine the corresponding distributed transactions. Note: The corresponding distributed transaction cannot be determined from the connection ID of a branch transaction alone, because the same connection ID can exist on different DNs at the same time.

// Get all group data sources, and group by DN's ID (host:port)
final Map<String, List<TGroupDataSource>> instId2GroupList = ExecUtils.getInstId2GroupList(allSchemas);

Then, for each DN, we obtain the lock and branch transaction information on its physical shards and, combined with the distributed transaction information in the TrxLookupSet from the previous step, update a global transaction waiting graph (DiGraph in the code). Each vertex in the graph is a transaction, and each directed edge represents one transaction waiting for another.

final DiGraph<TrxLookupSet.Transaction> graph = new DiGraph<>();
for (List<TGroupDataSource> groupDataSources : instId2GroupList.values()) {
    if (CollectionUtils.isNotEmpty(groupDataSources)) {
        // Since all data sources are in the same DN, any data source is ok.
        final TGroupDataSource groupDataSource = groupDataSources.get(0);

        // Get all group names in this DN.
        final Set<String> groupNames =
            groupDataSources.stream().map(TGroupDataSource::getDbGroupKey).collect(Collectors.toSet());

        // Fetch lock-wait information for this DN,
        // and update the lookup set and the graph with the information.
        fetchLockWaits(groupDataSource, groupNames, lookupSet, graph);
    }
}

The fetchLockWaits method queries three views on the DN (information_schema.innodb_locks, innodb_trx, and innodb_lock_waits) to obtain the row lock details. The result includes the connection IDs of the branch transactions involved in each lock wait, along with some information that eventually goes into the deadlock log. The corresponding distributed transactions are then found from these branch connection IDs, and each branch-level waiting relationship is converted into a waiting relationship between distributed transactions and added to the waiting graph.
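The conversion from branch-level waits to distributed-level waits can be sketched as follows. This is a simplified illustration, not the TrxLookupSet or DiGraph code: the class, its methods, and the string key are hypothetical, and it assumes both sides of a lock wait are looked up under the same group name.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions for illustration.
final class BranchWaitMappingSketch {
    // Maps "groupName#connId" of a branch transaction to its distributed transaction id.
    // The group name is part of the key because a connection id alone is not unique across DNs.
    private final Map<String, Long> branchToGlobal = new HashMap<>();

    // Edges of the global waiting graph: waiting transaction id -> blocking transaction ids.
    final Map<Long, Set<Long>> edges = new HashMap<>();

    private static String key(String groupName, long connId) {
        return groupName + "#" + connId;
    }

    void registerBranch(long globalTxId, String groupName, long connId) {
        branchToGlobal.put(key(groupName, connId), globalTxId);
    }

    // Translate one branch-level lock wait into an edge between distributed transactions.
    // Returns false when either side is unknown or both branches belong to the same transaction.
    boolean addLockWait(String groupName, long waitingConnId, long blockingConnId) {
        Long waiting = branchToGlobal.get(key(groupName, waitingConnId));
        Long blocking = branchToGlobal.get(key(groupName, blockingConnId));
        if (waiting == null || blocking == null || waiting.equals(blocking)) {
            return false;
        }
        edges.computeIfAbsent(waiting, k -> new LinkedHashSet<>()).add(blocking);
        return true;
    }
}
```

Lock waits whose branches cannot be mapped to a known distributed transaction are simply ignored in this sketch, so they never contribute an edge to the graph.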

The last step is to detect whether the graph contains a cycle. If so, one transaction in the cycle is rolled back.

graph.detect().ifPresent((cycle) -> {
    // If a cycle is detected, record a deadlock log.
    DeadlockParser.parseGlobalDeadlock(cycle);
    // Then, select a transaction to roll back. Currently, the first transaction is chosen by default.
    killByFrontendConnId(cycle.get(0));
});

Here, graph.detect() is the cycle detection algorithm, which is implemented in DiGraph. Simply put, it performs a depth-first search on the directed graph to check whether a cycle exists. The retained deadlock logs can be viewed with SHOW GLOBAL DEADLOCKS.
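Depth-first cycle detection on a waiting graph can be sketched as below. This is a generic illustration under simplifying assumptions (transactions are represented by Long IDs, and the class and method names are hypothetical); it is not the DiGraph implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of DFS-based cycle detection; not the DiGraph code.
final class CycleDetectionSketch {
    private final Map<Long, List<Long>> adjacency = new HashMap<>();

    // Add an edge "from waits for to"; both endpoints become vertices.
    void addEdge(long from, long to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Returns the vertices of one cycle, in waiting order, if any cycle exists.
    Optional<List<Long>> detect() {
        Set<Long> finished = new HashSet<>();
        for (Long start : adjacency.keySet()) {
            Optional<List<Long>> cycle =
                dfs(start, new ArrayList<>(), new HashSet<>(), finished);
            if (cycle.isPresent()) {
                return cycle;
            }
        }
        return Optional.empty();
    }

    private Optional<List<Long>> dfs(Long node, List<Long> path,
                                     Set<Long> onPath, Set<Long> finished) {
        if (onPath.contains(node)) {
            // Back edge found: the cycle is the path suffix that starts at this vertex.
            return Optional.of(new ArrayList<>(path.subList(path.indexOf(node), path.size())));
        }
        if (finished.contains(node)) {
            return Optional.empty(); // already fully explored, no cycle through it
        }
        onPath.add(node);
        path.add(node);
        for (Long next : adjacency.get(node)) {
            Optional<List<Long>> cycle = dfs(next, path, onPath, finished);
            if (cycle.isPresent()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        onPath.remove(node);
        finished.add(node);
        return Optional.empty();
    }
}
```

The finished set keeps the search linear in the size of the graph, since a vertex that has been fully explored without finding a cycle is never visited again.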

Finally, one transaction in the cycle is rolled back to resolve the deadlock. Currently, the first transaction in the detected cycle is chosen by default, which is effectively the same as rolling back a random transaction.

The code for MDL deadlock detection is similar; its main logic is in MdlDeadlockDetectionTask.

Other Details

1. How is the sync mechanism implemented in the first step?

In simple terms, the semantics of sync is to have all CNs execute the same action and to collect the results. The node that initiates the sync serializes the action object and sends it to all CNs. Each CN deserializes the object and executes its sync method, and the results from all CNs are returned to the initiator. This logic is mainly implemented in the ClusterSyncManager#doSync method. Reading this part of the code shows that sending a sync request to another CN amounts to executing a SQL statement of the form SYNC schema_name serializedAction. The results are returned in much the same way as the results of an ordinary query.
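The serialize, broadcast, and execute pattern can be sketched as follows. This is only an illustration of the semantics: the interface and class names are hypothetical, Java serialization is used as a stand-in because the actual wire format is not described here, and the "nodes" are invoked locally instead of through a SYNC statement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical action interface; not the PolarDB-X sync action API.
interface SketchSyncAction extends Serializable {
    String sync(String nodeName);
}

// A trivial action that reports which node executed it.
final class EchoAction implements SketchSyncAction {
    private static final long serialVersionUID = 1L;
    private final String payload;

    EchoAction(String payload) {
        this.payload = payload;
    }

    @Override
    public String sync(String nodeName) {
        return nodeName + ":" + payload;
    }
}

final class SyncBroadcastSketch {
    // Initiator side: turn the action object into bytes that could be sent to other nodes.
    static byte[] serialize(SketchSyncAction action) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(action);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: rebuild the action and run its sync method.
    static String executeOnNode(String nodeName, byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            SketchSyncAction action = (SketchSyncAction) in.readObject();
            return action.sync(nodeName);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Send the same serialized action to every node and gather one result per node.
    static List<String> doSync(SketchSyncAction action, List<String> nodes) {
        byte[] wire = serialize(action);
        List<String> results = new ArrayList<>();
        for (String node : nodes) {
            results.add(executeOnNode(node, wire));
        }
        return results;
    }
}
```

For example, SyncBroadcastSketch.doSync(new EchoAction("ping"), Arrays.asList("cn-1", "cn-2")) yields one result string per node, which mirrors how the initiator of a sync receives one result set per CN.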

2. How is the transaction rolled back in Step 3?

When a deadlock occurs, the statement currently being executed by the transaction to be rolled back is stuck in a lock wait. Therefore, the deadlock detection task first initiates a sync request for a kill query to kill this statement, and sets the error code of the kill query to ERR_TRANS_DEADLOCK. The entry of the kill query is in KillSyncAction, and the code that finally runs the kill query logic is the doCancel method in ServerConnection.CancelQueryTask.

private void doCancel() throws SQLException {
    // futureCancelErrorCode is checked later during error handling.
    // For a kill caused by a deadlock, the error code is always ERR_TRANS_DEADLOCK.
    futureCancelErrorCode = this.errorCode;

    // Kill all SQL statements that are running in physical connections.
    if (conn != null) {
        conn.kill();
    }

    // Here, f is the task that is running a logical SQL statement.
    Future f = executingFuture;
    if (f != null) {
        f.cancel(true);
    }
}

This method runs the kill query logic on each physical connection (a connection between the CN and a DN), killing all physical statements that belong to the logical statement, including the physical statement that is waiting for the lock. Finally, the thread executing the logical statement is interrupted.
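The last part of this behavior relies on standard Java semantics: Future.cancel(true) interrupts the thread that is running the task. The following minimal sketch shows only that mechanism. The class and method names are hypothetical, and a latch that is never released stands in for a statement blocked on a lock wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of interrupting a blocked task; not the ServerConnection code.
final class CancelSketch {
    // Returns true if the blocked worker observed the interrupt caused by cancel(true).
    static boolean cancelBlockedTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        Future<?> future = pool.submit(() -> {
            started.countDown();
            try {
                // Blocks forever, standing in for a statement stuck in a lock wait.
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                // The worker is interrupted here and can start its error handling.
                interrupted.countDown();
            }
        });
        try {
            started.await();     // make sure the task is actually running
            future.cancel(true); // true = interrupt the thread running the task
            return interrupted.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```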

The interrupted thread handles the exception in the handleError method of ServerConnection. If the error code is ERR_TRANS_DEADLOCK, it rolls back the current transaction and sends the client the error message "Deadlock found when trying to get lock; try restarting the transaction".

// Handle deadlock error.
if (isDeadLockException(t)) {
    // Prevent this transaction from committing.
    this.conn.getTrx().setCrucialError(ERR_TRANS_DEADLOCK);

    // Rollback this trx.
    try {
        innerRollback();
    } catch (SQLException exception) {
        logger.warn("rollback failed when deadlock found", exception);
    }
}

Summary

This article walks through the source code of the PolarDB-X distributed deadlock detection function. If you are interested, you can follow along with this interpretation, set a breakpoint in the DeadlockDetectionTask#run method, and observe the result of each step, which makes the implementation easier to understand.
