×
Community Blog MySQL Deep Dive - Implementation and Acquisition Mechanism of Metadata Locking

MySQL Deep Dive - Implementation and Acquisition Mechanism of Metadata Locking

This article introduces the commonly used data structures and meanings in MDL systems and discusses the acquisition mechanism and deadlock detection of MDL.

By Boge

1. Background

The Metadata Locking (MDL) mechanism is implemented at the server layer to meet the transaction isolation and consistency requirements of databases under concurrent requests. It provides a variety of storage engines that can be used as plug-ins for MySQL. For example, it can restrict other concurrent transactions to delete a resource of the database when the transaction accesses the resource. This is a lock in the logical sense. Unlike the limited type of mutex provided by the operating system kernel, MDL can flexibly customize the object of the lock, the type of the lock, and the priority of different lock types. It even can be used in different systems. Dynamic adjustment of the compatibility of different lock types facilitates the database to perform reasonable concurrency control on various query requests.

This article introduces the commonly used data structures and meanings in MDL systems and discusses the acquisition mechanism and deadlock detection of MDL from the perspective of implementation. Finally, it explains how to monitor MDL status in practice.

2. Basic Concepts

2.1 MDL_key

Objects of MDL are described through key-value pairs. Each key value uniquely represents a lock object. (Value represents a resource of a database.) The key is represented by MDL_key, which represents the name of the object in the form of a string.

1

A complete string consists of namespaces and names at each level of the hierarchy. Multiple namespaces can distinguish different types of objects with the same name. Namespaces include different object types that can be created in databases such as GLOBAL, SCHEMA, TABLE, FUNCTION, and PROCEDURE. The name of an object consists of a variety of layers depending on the type. For example, the table object is uniquely described by the database name and the table name. If it is a SCHEMA object, there is only one level of the database name. The names are separated by the string terminator '0'. Therefore, the whole string composed of these parts can be used as a unique key to represent a certain object in the database.

2.2 enum_mdl_type

Different queries have different access modes for the same database object. For example, the SELECT statement is to read the content of the object, the INSERT/UPDATE statement is to modify the content of the object, and the DDL statement is to modify the structure and definition of the object. These statements have different requirements for object impact and concurrency isolation. Therefore, MySQL defines different types of MDL and their compatibility to control concurrent access to these statements.

The type of MDL is represented by enum_mdl_type. The most commonly used types include:

  • MDL_SHARED(S): It can share metadata of an access object, such as SHOW CREATE TABLE statements.
  • MDL_SHARED_READ(SR): It can share the data of the access object, such as the SELECT statement.
  • MDL_SHARED_WRITE(SW): You can modify the data of an object, such as the INSERT / UPDATE statement.
  • MDL_SHARED_UPGRADABLE(SU): It is a shared lock that can be upgraded and later can be upgraded to a stronger lock (such as X lock, blocking concurrent access), such as the first stage of DDL.
  • MDL_EXCLUSIVE(X): It is an exclusive lock that blocks concurrent access from other threads to the object. You can modify the metadata of the object, such as in the second phase of DDL.

Different query statements can request different types of MDLs and flexibly customize compatibility between different types of MDLs to control the concurrency of conflicting statements. The default compatibility among different types of MDLs for the same object is described below.

Compatibility of Different Types of MDL

MySQL divides lock types into range locks and object locks.

(1) Range Lock

Range lock has a few types (IX, S, X), mainly used for objects in the GLOBAL, COMMIT, TABLESPACE, BACKUP_LOCK, and SCHEMA namespaces. The compatibility of these types is simple, which restricts concurrent operations as a whole. For example, blocking transaction commit by global read locks and DDL update table object metadata blocks SCHEMA-level modification operations by requesting SCHEMA scope intention exclusive lock (IX).

The MDL compatibility relationship of these types is defined by two matrices. For the same object, one is the compatibility of the MDL type that has been obtained with the new request type. The other is the compatibility of the MDL request type that has not been obtained and is waiting for the new request type. Since IS (INTENTION_SHARE) is compatible with other locks in all cases, it can be ignored in MDL systems.

        | Type of active   |
Request |   scoped lock    |
 type   | IS(*)  IX   S  X |
---------+------------------+
IS       |  +      +   +  + |
IX       |  +      +   -  - |
S        |  +      -   +  - |
X        |  +      -   -  - |
        
         |    Pending      |
 Request |  scoped lock    |
  type   | IS(*)  IX  S  X |
 ---------+-----------------+
IS       |  +      +  +  + |
IX       |  +      +  -  - |
S        |  +      +  +  - |
X        |  +      +  +  + |
        
Here: "+" -- means that request can be satisfied
 "-" -- means that request can't be satisfied and should wait

(2) Object Lock

Object lock contains a wide range of MDL types and applies to most basic objects in the database. Their compatibility matrix is listed below:

  Request  |  Granted requests for lock            |
   type    | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
 ----------+---------------------------------------------+
 S         | +   +   +   +    +    +   +    +    +    -  |
 SH        | +   +   +   +    +    +   +    +    +    -  |
 SR        | +   +   +   +    +    +   +    +    -    -  |
 SW        | +   +   +   +    +    +   -    -    -    -  |
 SWLP      | +   +   +   +    +    +   -    -    -    -  |
 SU        | +   +   +   +    +    -   +    -    -    -  |
 SRO       | +   +   +   -    -    +   +    +    -    -  |
 SNW       | +   +   +   -    -    -   +    -    -    -  |
 SNRW      | +   +   -   -    -    -   -    -    -    -  |
 X         | -   -   -   -    -    -   -    -    -    -  |
        
  Request  |         Pending requests for lock          |
   type    | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X |
 ----------+--------------------------------------------+
 S         | +   +   +   +    +    +   +    +     +   - |
 SH        | +   +   +   +    +    +   +    +     +   + |
 SR        | +   +   +   +    +    +   +    +     -   - |
 SW        | +   +   +   +    +    +   +    -     -   - |
 SWLP      | +   +   +   +    +    +   -    -     -   - |
 SU        | +   +   +   +    +    +   +    +     +   - |
 SRO       | +   +   +   -    +    +   +    +     -   - |
 SNW       | +   +   +   +    +    +   +    +     +   - |
 SNRW      | +   +   +   +    +    +   +    +     +   - |
 X         | +   +   +   +    +    +   +    +     +   + |
        
  Here: "+" -- means that request can be satisfied
        "-" -- means that request can't be satisfied and should wait

During the acquisition process of MDL, the two compatibility matrices can be used to determine whether there is an MDL in the granted/pending state that is incompatible with the requested MDL. Then, it can determine whether the request can be satisfied. If not, enter the pending waiting state.

The MDL system also uses the compatibility matrix to determine the strength of the lock type. The method is listed below:

/**
  Check if ticket represents metadata lock of "stronger" or equal type
  than specified one. I.e. if metadata lock represented by ticket won't
  allow any of locks which are not allowed by specified type of lock.

  @return true  if ticket has stronger or equal type
          false otherwise.
*/
bool MDL_ticket::has_stronger_or_equal_type(enum_mdl_type type) const {
  const MDL_lock::bitmap_t *granted_incompat_map =
      m_lock->incompatible_granted_types_bitmap();

  return !(granted_incompat_map[type] & ~(granted_incompat_map[m_type]));
}

The expression can be understood that if the type is not compatible with the MDL compatible with a certain m_type, the type is stronger. Otherwise, the m_type is the same or stronger. In another case, if weaker types are incompatible with MDL types, stronger MDLs are incompatible.

3. Important Data Structures

3.1 Relationship Diagram

2

3.2 MDL_request

It represents a statement request for MDL, consisting of MDL_key, enum_mdl_type, and enum_mdl_duration. MDL_key and enum_mdl_type determine the MDL object and lock type.

There are three types of enum_mdl_duration, which represent the holding period of MDL, including a single statement-level period, transaction-level period, and explicit period.

The lifecycle of MDL_request is outside the MDL system and is controlled by the user. It can be a temporary variable. However, the lifecycle of MDL obtained through this request is persistent, controlled by the MDL system, and will not be released with the destruction of MDL_request.

3.3 MDL_lock

For an object in a database, only one lock object named MDL_lock that corresponds to its name (MDL_key) exists. When an object of the database is accessed for the first time, the MDL_lock is created and managed by the lock-free HASH in its memory. When subsequent accesses come, accesses to the same object will refer to the same MDL_lock.

MDL_lock contains both the m_waiting queue that is currently waiting for the lock object and the m_granted queue that has been granted to the object. The elements in the queue are represented by MDL_ticket.

An MDL_lock_strategy composed of static bitmap objects is used to store the compatibility matrix between the range lock and object lock above. The compatibility of the lock can be obtained according to the namespace of MDL_lock.

3.4 MDL_ticket

MDL_lock and enum_mdl_type form MDL_ticket, which represents the current thread's access to database objects. MDL_ticket is created when each query requests an MDL lock. Memory is allocated by the MDL system and destroyed at the end of the transaction.

MDL_ticket contains two sets of pointers to connect all tickets obtained by the thread and the tickets in the waiting state or granted state of the lock object the ticket participates in.

3.5 MDL_context

The context in which a thread obtains an MDL lock. Each connection corresponds to a context in which all MDL_tickets obtained by the connection are included. They are stored in their linked lists according to different lifecycles and managed by MDL_ticket_store.

All locks obtained by a connection can be divided into three types according to the lifecycle: statement-level, transaction-level, and explicit lock. Both statement-level and transaction-level locks have automatic lifecycle and scope of action, and they accumulate in a transaction process. Statement-level locks are automatically released after the outermost statement ends. Transaction-level locks are released after COMMIT, ROLLBACK, and ROLLBACK TO SAVEPOINT. They are not manually released. Tickets with explicit lifecycle are acquired for locks across transactions and checkpoints, including HANDLER SQL locks, LOCK TABLES locks, and user-level locks GET_LOCK()/RELEASE_LOCK(). Statement-level and transaction-level locks will be added to the front of the corresponding linked list in reverse chronological order. When we roll back to a certain checkpoint, the corresponding ticket will be released from the front of the linked list until the last ticket obtained before the checkpoint is created. When a thread wants to acquire an MDL lock, it will first find out in its MDL_ticket_store whether a stronger type of MDL_ticket with the same lock object has been acquired in the transaction.

Therefore, the MDL_ticket_store provides an interface to find MDL_ticket based on MDL_request requests. One is to find MDL_ticket linked lists in the different lifecycle. If the number of MDL_ticket obtained by the current thread exceeds the threshold (256 by default), all MDL_ticket will be maintained in additional std::unordered_multimap to accelerate the search.

MDL_ticket_store::MDL_ticket_handle MDL_ticket_store::find(
    const MDL_request &req) const {
#ifndef DBUG_OFF
  if (m_count >= THRESHOLD) {
    MDL_ticket_handle list_h = find_in_lists(req);
    MDL_ticket_handle hash_h = find_in_hash(req);

    DBUG_ASSERT(equivalent(list_h.m_ticket, hash_h.m_ticket, req.duration));
  }
#endif /*! DBUG_OFF */
  return (m_map == nullptr || m_count < THRESHOLD) ? find_in_lists(req)
                                                   : find_in_hash(req);
}

4. Acquisition of MDL

Almost all query statements (including DML and DDL statements) are in the parse phase. LEX and YACC initialize MDL lock requests for tables that need to be accessed based on the type of the statement. For example, the SELECT statement is SR, the INSERT statement is SW, and the ALTER TABLE statement is SU. This process is in the following call stack:

PT_table_factor_table_ident::contextualize()
  |--SELECT_LEX::add_table_to_list()
    |--MDL_REQUEST_INIT -> MDL_request::init_with_source()

Before the statement is executed, all tables that need to be accessed are opened by the open_tables_for_query function to obtain the TABLE table object. In this process, MDL locks are obtained first. Then, table resources are obtained. It prevents concurrent reads and writes to the metadata of the same table. Requests for MDL locks are made by the current thread's context MDL_context call MDL_context::acquire_lock. The call stack is listed below:

open_tables_for_query()
  |--open_table() // Open each table in a loop.
    |--open_table_get_mdl_lock()
      |--MDL_context::acquire_lock() // Obtain the lock. If a lock conflict occurs, wait for the lock to be released.
        |--MDL_context::try_acquire_lock_impl()

4.1 MDL_context::try_acquire_lock_impl

Next, let's focus on the process of MDL_context::try_acquire_lock_impl. This function includes the acquisition of various types of locks (with good compatibility and poor compatibility) and lock conflict detection. The input parameter is the current MDL_request, and the output parameter is the obtained MDL_ticket.

First, it will look for tickets of stronger type, the same lifecycle, or different types in the same object MDL_ticket held by the current thread according to MDL_request. If you already have the same lifecycle, return it directly. If you have different lifecycles, you can clone a return for the same period based on the ticket.

We mentioned earlier it can be divided into unobtrusive and obtrusive locks according to the compatibility of lock types. In the lock acquisition process, it also corresponds to the fast path and slow path (respectively), which means the acquisition difficulty is different.

Unobtrusive (Fast Path)

For some MDL requests of weak types (unobtrusive, such as SR/SW), these requests account for the vast majority and have good compatibility. You do not need to record the specific MDL_ticket but only record the number of requests that have been obtained. Therefore, the integer atomic variable std::atomic m_fast_path_state is used in MDL_lock to count the number of all unobtrusive lock types granted by the lock. Each unobtrusive lock has different numerical values, leaving a fixed bit range to store the accumulated results of the lock types, which is equivalent to counting the number of all unobtrusive locks granted with one longlong type and can be modified without locks by CAS. In addition, there are three status indication bits in the m_fast_path_state high bit, which are IS_DESTROYED/HAS_OBTRUSIVE/HAS_SLOW_PATH.

/**
   Array of increments for "unobtrusive" types of lock requests for
   per-object locks.

   @sa MDL_lock::get_unobtrusive_lock_increment().

   For per-object locks:
   - "unobtrusive" types: S, SH, SR and SW
   - "obtrusive" types: SU, SRO, SNW, SNRW, X

   Number of locks acquired using "fast path" are encoded in the following
   bits of MDL_lock::m_fast_path_state:

   - bits 0 .. 19  - S and SH (we don't differentiate them once acquired)
   - bits 20 .. 39 - SR
   - bits 40 .. 59 - SW and SWLP (we don't differentiate them once acquired)

   Overflow is not an issue as we are unlikely to support more than 2^20 - 1
   concurrent connections in foreseeable future.

   This encoding defines the below contents of increment array.
*/
{0, 1, 1, 1ULL << 20, 1ULL << 40, 1ULL << 40, 0, 0, 0, 0, 0},

Obtain the increment value of the unobtrusive integer type based on the request type of MDL_request. If the increment value is 0, the value is an obtrusive lock. The slow path is required.

/**
  @returns "Fast path" increment for request for "unobtrusive" type
            of lock, 0 - if it is request for "obtrusive" type of
            lock.

  @sa Description at method declaration for more details.
*/
MDL_lock::fast_path_state_t MDL_lock::get_unobtrusive_lock_increment(
    const MDL_request *request) {
  return MDL_lock::get_strategy(request->key)
      ->m_unobtrusive_lock_increment[request->type];
}

If it is not 0, it means the lock type is unobtrusive. Then, the fast path is used, and the corresponding integer value is incremented by the MDL_lock::m_fast_path_state directly through CAS. However, it is necessary to confirm a condition that the object is not locked by other threads in the form of obtrusive because some of the lock types of unobtrusive and obtrusive are mutually exclusive. Other unobtrusive locks are compatible with each other only when there is no obtrusive lock. Only in this condition can they be directly obtained without judging the lock holding status of other threads.

MDL_lock::fast_path_state_t old_state = lock->m_fast_path_state;

do {
  /*
    Check if hash look-up returned object marked as destroyed or
    it was marked as such while it was pinned by us. If yes we
    need to unpin it and retry look-up.
  */
  if (old_state & MDL_lock::IS_DESTROYED) {
    if (pinned) lf_hash_search_unpin(m_pins);
    goto retry;
  }

  /*
    Check that there are no granted/pending "obtrusive" locks and nobody
    even is about to try to check if such lock can be acquired.

    In these cases we need to take "slow path".
  */
  if (old_state & MDL_lock::HAS_OBTRUSIVE) goto slow_path;

  } while (!lock->fast_path_state_cas(
      &old_state, old_state + unobtrusive_lock_increment));

After the CAS is complete, set the status and reference of the relevant data structure, and add the current MDL_ticket to the thread's MDL_ticket_store to return:

/*
  Since this MDL_ticket is not visible to any threads other than
  the current one, we can set MDL_ticket::m_lock member without
  protect of MDL_lock::m_rwlock. MDL_lock won't be deleted
  underneath our feet as MDL_lock::m_fast_path_state serves as
  reference counter in this case.
*/
ticket->m_lock = lock;
ticket->m_is_fast_path = true;
m_ticket_store.push_front(mdl_request->duration, ticket);
mdl_request->ticket = ticket;

mysql_mdl_set_status(ticket->m_psi, MDL_ticket::GRANTED);

Obtrusive (Slow Path)

For some MDL requests of relatively strong types (obtrusive, such as SU, SRO, and X), the corresponding MDL_ticket is stored in the m_granted linked list of the corresponding MDL_lock. Therefore, it is also necessary to traverse this linked list and other bitmaps to determine whether there is a lock conflict with MDL_ticket that other threads are waiting for or have acquired.

Before you use the slow path to obtain a lock, the current thread needs to materialize the MDL_lock::m_fast_path_state that was obtained by the current thread through the fast path, remove the lock from the bitmap, and add it to the MDL_lock::m_granted file. The bitmap contained in MDL_lock::m_fast_path_state cannot distinguish threads, and multiple locks acquired by the current thread do not constitute lock conflicts. Therefore, before judging by bitmap, you need to ensure that the tickets m_fast_path_state MDL_lock:: belong to other threads.

/**
  "Materialize" requests for locks which were satisfied using
  "fast path" by properly including them into corresponding
  MDL_lock::m_granted bitmaps/lists and removing it from
  packed counter in MDL_lock::m_fast_path_state.
*/
void MDL_context::materialize_fast_path_locks() {
  int i;

  for (i = 0; i < MDL_DURATION_END; i++) {
    MDL_ticket_store::List_iterator it = m_ticket_store.list_iterator(i);

    MDL_ticket *matf = m_ticket_store.materialized_front(i);
    for (MDL_ticket *ticket = it++; ticket != matf; ticket = it++) {
      if (ticket->m_is_fast_path) {
        MDL_lock *lock = ticket->m_lock;
        MDL_lock::fast_path_state_t unobtrusive_lock_increment =
            lock->get_unobtrusive_lock_increment(ticket->get_type());
        ticket->m_is_fast_path = false;
        mysql_prlock_wrlock(&lock->m_rwlock);
        lock->m_granted.add_ticket(ticket);
        /*
          Atomically decrement counter in MDL_lock::m_fast_path_state.
          This needs to happen under protection of MDL_lock::m_rwlock to make
          it atomic with addition of ticket to MDL_lock::m_granted list and
          to enforce invariant [INV1].
        */
        MDL_lock::fast_path_state_t old_state = lock->m_fast_path_state;
        while (!lock->fast_path_state_cas(
            &old_state, ((old_state - unobtrusive_lock_increment) |
                         MDL_lock::HAS_SLOW_PATH))) {
        }
        mysql_prlock_unlock(&lock->m_rwlock);
      }
    }
  }
  m_ticket_store.set_materialized();
}

After materialization is completed, you can determine whether the requested lock type can be obtained based on the ticket type (m_waiting), ticket type (m_granted), and lock type status (MDL_lock::m_fast_path_state) of unobtrusive. This process is mainly in the MDL_lock::can_grant_lock based on the compatibility matrix.

bool MDL_lock::can_grant_lock(enum_mdl_type type_arg,
                              const MDL_context *requestor_ctx) const {
  bool can_grant = false;
  bitmap_t waiting_incompat_map = incompatible_waiting_types_bitmap()[type_arg];
  bitmap_t granted_incompat_map = incompatible_granted_types_bitmap()[type_arg];

  /*
    New lock request can be satisfied iff:
    - There are no incompatible types of satisfied requests
    in other contexts
    - There are no waiting requests which have higher priority
    than this request.
  */
  if (!(m_waiting.bitmap() & waiting_incompat_map)) {
    if (!(fast_path_granted_bitmap() & granted_incompat_map)) {
      if (!(m_granted.bitmap() & granted_incompat_map))
        can_grant = true;
      else {
        Ticket_iterator it(m_granted);
        MDL_ticket *ticket;

        /*
          There is an incompatible lock. Check that it belongs to some
          other context.
        */
        while ((ticket = it++)) {
          if (ticket->get_ctx() != requestor_ctx &&
              ticket->is_incompatible_when_granted(type_arg))
            break;
        }
        if (ticket == NULL) /* Incompatible locks are our own. */
          can_grant = true;
      }
    }
  }
  return can_grant;
}

In m_waiting and m_granted, in addition to a linked list to connect tickets, the bitmap is used to collect all ticket types in the linked list. It is convenient for direct comparison. After finding an incompatible type in m_granted, you also need to iterate through the linked list to determine whether the ticket of the incompatible type is obtained by the current thread. Lock conflicts occur only when the ticket of the incompatible type is obtained by the non-current thread. If the unobtrusive lock can be obtained, it is directly added to the MDL_lock::m_granted linked list.

4.2 Lock Wait and Notification

In the preceding procedure, if the MDL_ticket is successfully obtained, the MDL is obtained, and the query process can be continued. If the lock cannot be obtained (whether the lock of unobtrusive is forced to take the slow path due to the existence of the obtrusive lock, or the lock of obtrusive itself cannot be obtained), lock waiting is required. The lock waiting process does not distinguish whether it is unobtrusive or obtrusive and will handle it uniformly.

The MDL_context of each thread contains one MDL_wait member because lock wait and deadlock detection are both threads as objects. Notifications are subscribed by adding the MDL_ticket of the corresponding request to the lock wait queue. There is a set of mutex, condition variable, and enumeration states used to complete waiting and notification between threads. There are five states of waiting:

// WS_EMPTY since EMPTY conflicts with #define in system headers on some
// platforms.
enum enum_wait_status { WS_EMPTY = 0, GRANTED, VICTIM, TIMEOUT, KILLED };

WS_EMPTY is the initial state. The rest is the resulting state of waiting. As seen in the command, the result of waiting may be:

  • GRANTED: The thread acquires the MDL lock that is waiting.
  • VICTIM: The thread (as a victim of a deadlock) is asked to re-execute the transaction.
  • TIMEOUT: Timeout for waiting
  • KILLED: The thread is killed during the wait.

First, the waiting thread adds the ticket it wants to obtain to the m_waiting queue of MDL_lock and then calls the MDL_wait function according to the configured waiting time to wait out:

/**
  Wait for the status to be assigned to this wait slot.
*/
MDL_wait::enum_wait_status MDL_wait::timed_wait(
    MDL_context_owner *owner, struct timespec *abs_timeout,
    bool set_status_on_timeout, const PSI_stage_info *wait_state_name) {
  enum_wait_status result;
  int wait_result = 0;

  mysql_mutex_lock(&m_LOCK_wait_status);

  while (!m_wait_status && !owner->is_killed() && !is_timeout(wait_result)) {
    wait_result = mysql_cond_timedwait(&m_COND_wait_status, &m_LOCK_wait_status,
                                       abs_timeout);
  }

  if (m_wait_status == WS_EMPTY) {
    if (owner->is_killed())
      m_wait_status = KILLED;
    else if (set_status_on_timeout)
      m_wait_status = TIMEOUT;
  }
  result = m_wait_status;

  mysql_mutex_unlock(&m_LOCK_wait_status);

  return result;
}

When other threads that hold incompatible locks complete a query or a transaction ends, all the locks that are held are released. The MDL_lock::m_fast_path_state and the MDL_lock::m_granted linked list are restored based on whether they are obtained from the fast path or slow path. In addition, if MDL_lock::m_waiting has a waiting ticket, MDL_lock::reschedule_waiters() will be called to wake up the thread that can acquire the lock and set the waiting state to GRANTED:

void MDL_lock::reschedule_waiters() {
  MDL_lock::Ticket_iterator it(m_waiting);
  MDL_ticket *ticket;

  while ((ticket = it++)) {
    if (can_grant_lock(ticket->get_type(), ticket->get_ctx())) {
      if (!ticket->get_ctx()->m_wait.set_status(MDL_wait::GRANTED)) {
        m_waiting.remove_ticket(ticket);
        m_granted.add_ticket(ticket);
 ...
/**
  Set the status unless it's already set. Return false if set,
  true otherwise.
*/
bool MDL_wait::set_status(enum_wait_status status_arg) {
  bool was_occupied = true;
  mysql_mutex_lock(&m_LOCK_wait_status);
  if (m_wait_status == WS_EMPTY) {
    was_occupied = false;
    m_wait_status = status_arg;
    mysql_cond_signal(&m_COND_wait_status);
  }
  mysql_mutex_unlock(&m_LOCK_wait_status);
  return was_occupied;
}

If the wake-up waiting thread finds that the ticket is in the GRANTED state, it will continue to execute. Otherwise, an error will be reported according to different situations.

4.3 Deadlock Detection

Before each thread enters a lock wait, deadlock detection is performed to prevent the current thread from falling to death. Before detecting deadlocks, materialize the unobtrusive locks obtained by the current thread. This way, these locks appear in the MDL_lock::m_granted linked list, and deadlock detection is possible. Set the waiting lock MDL_context::m_waiting_for of the current thread to the current ticket. Each thread that enters the waiting will set a waiting object, and deadlocks can be detected along this waiting chain.

/** Inform the deadlock detector there is an edge in the wait-for graph. */
void will_wait_for(MDL_wait_for_subgraph *waiting_for_arg) {
  /*
    Before starting wait for any resource we need to materialize
    all "fast path" tickets belonging to this thread. Otherwise
    locks acquired which are represented by these tickets won't
    be present in wait-for graph and could cause missed deadlocks.

    It is OK for context which doesn't wait for any resource to
    have "fast path" tickets, as such context can't participate
    in any deadlock.
  */
  materialize_fast_path_locks();

  mysql_prlock_wrlock(&m_LOCK_waiting_for);
  m_waiting_for = waiting_for_arg;
  mysql_prlock_unlock(&m_LOCK_waiting_for);
}

MDL_wait_for_subgraph

It represents the abstract class of an edge in a waiting graph, which is traversed by the deadlock detection algorithm. MDL_ticket is derived from MDL_wait_for_subgraph. The auxiliary detection class looks for the waiting ring along the edge by implementing the accept_visitor() function.

Deadlock_detection_visitor

The auxiliary class used to detect the waiting ring in the waiting chart contains the status information during the detection process, such as the starting thread m_start_node for deadlock detection. After a lock is detected in the search process, the victim thread m_victim is selected based on the weight. The thread depth of the search. If the waiting chain of the thread is too long and exceeds the threshold (32 by default), a deadlock occurs even if no deadlock is detected.

Implement the enter_node() and leave_node() functions to enter the next thread node and exit and use inspect_edge() to find out whether the current thread node is already the start node and determine whether it is a ring. The victim is determined by opt_change_victim_to() to compare the deadlock weight of the victim.

/**
  Inspect a wait-for graph edge from one MDL context to another.

  @retval true   A loop is found.
  @retval false  No loop is found.
*/
bool Deadlock_detection_visitor::inspect_edge(MDL_context *node) {
  m_found_deadlock = node == m_start_node;
  return m_found_deadlock;
}

/**
  Change the deadlock victim to a new one if it has lower deadlock
  weight.

  @param new_victim New candidate for deadlock victim.
*/
void Deadlock_detection_visitor::opt_change_victim_to(MDL_context *new_victim) {
  if (m_victim == NULL ||
      m_victim->get_deadlock_weight() >= new_victim->get_deadlock_weight()) {
    /* Swap victims, unlock the old one. */
    MDL_context *tmp = m_victim;
    m_victim = new_victim;
    m_victim->lock_deadlock_victim();
    if (tmp) tmp->unlock_deadlock_victim();
  }
}

Detection Process

The method of deadlock detection is to take breadth first, start from the lock that the current thread is waiting for, traverse the MDL_lock waiting queue and grant queue, and check whether any locks are not acquired by the current thread and are incompatible with the waiting lock. If this thread is the same as the starting thread traversed by the algorithm, the lock waiting chain has a deadlock. Start from the holding or waiting thread of an incompatible lock. If the thread is also waiting, recursively repeat the preceding process until the waiting starting thread is found. Otherwise, it is determined that there is no deadlock.

The code logic is listed below:

MDL_context::find_deadlock()
  |--MDL_context::visit_subgraph(MDL_wait_for_graph_visitor *) // If m_waiting_for exists, call the accept_visitor() function of the ticket.
    |--MDL_ticket::accept_visitor(MDL_wait_for_graph_visitor *) // Check the lock acquisition based on the corresponding MDL_lock.
      |--MDL_lock::visit_subgraph() // Recursively traverse the lock grant linked list (m_granted) and wait linked list (m_waiting) to determine whether there is a wait start node (deadlock).
         // Recursively grant the MDL_context of the linked list and the waiting list to find the deadlock.
        |--Deadlock_detection_visitor::enter_node() // Enter the current node first.
        |-- Traverse the granted linked list (m_granted) to determine compatibility.
          |-- If this parameter is not compatible, call Deadlock_detection_visitor::inspect_edge() to determine whether a deadlock occurs.
        |-- Traverse the waiting list (m_waiting), as above.
        |-- Traverse the granted linked list to determine compatibility.
          |--If it is not compatible, recursively call MDL_context::visit_subgraph() to find the connected subgraph. If the ticket that the thread is waiting for already has a clear state and is not WS_EMPTY, it can directly return.
        |-- Traverse the waiting list, as above.
        |--Deadlock_detection_visitor::leave_node() // Leave the current node.
/*
  We do a breadth-first search first -- that is, inspect all
  edges of the current node, and only then follow up to the next
  node. In workloads that involve wait-for graph loops this
  has proven to be a more efficient strategy [citation missing].
*/
while ((ticket = granted_it++)) {
  /* Filter out edges that point to the same node. */
  if (ticket->get_ctx() != src_ctx &&
      ticket->is_incompatible_when_granted(waiting_ticket->get_type()) &&
      gvisitor->inspect_edge(ticket->get_ctx())) {
    goto end_leave_node;
  }
}
...
/* Recurse and inspect all adjacent nodes. */
granted_it.rewind();
while ((ticket = granted_it++)) {
  if (ticket->get_ctx() != src_ctx &&
      ticket->is_incompatible_when_granted(waiting_ticket->get_type()) &&
      ticket->get_ctx()->visit_subgraph(gvisitor)) {
    goto end_leave_node;
  }
}
...

Victim Weight

When an exits along the thread wait chain after a deadlock is detected, it will choose the least weight as the victim based on the weight of each thread waiting ticket. The selected victim will give up the wait and release the lock it holds in the Deadlock_detection_visitor::opt_change_victim_to function.

In terms of weight, it is still relatively rough. It does not consider the stage of the transaction and the content of the statement executed. It only has a preset weight according to the type of lock resource and lock type in the MDL_ticket::get_deadlock_weight() function.

  • DEADLOCK_WEIGHT_DML – DML statements have a minimum weight of 0.
  • DEADLOCK_WEIGHT_ULL – The weight of the user's manual lock is set to 50.
  • DEADLOCK_WEIGHT_DDL – DDL statements have a maximum weight of 100.

It shows that when a deadlock occurs, DML statements are more likely to be rolled back to continue to execute DDL statements. When statements of the same type constitute a deadlock, the threads that enter the waiting chain are more likely to become victims. The threads that have been waiting for a long time continue to wait.

After the current thread sets the status of the victim thread on the deadlock ring to VICTIM and wakes up, the current thread can enter the waiting state.

5. MDL Monitoring

You can use the MySQL performance_schema to monitor the acquisition of the current MDL lock. The performance_schema is a read-only variable. The setting requires a restart. In the configuration file, add:

[mysqld]
performance_schema = ON

Set the MDL metric by using the performance_schema.setup_instruments table:

UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';

After that, you can access the performance_schema.metadata_locks table to monitor MDL acquisition. For example, two threads are in the following states:

connect-1 > BEGIN;                    |
Query OK, 0 rows affected (0.00 sec)  |
                                      |
connect-1 > SELECT * FROM t1;         |
+------+------+------+                |
| a    | b    | c    |                |       
+------+------+------+                |
|    1 |    2 |    3 |                |
|    4 |    5 |    6 |                |
+------+------+------+                |
2 rows in set (0.00 sec)              | # DDL will hang
                                      | connect-2 > ALTER TABLE t1 ADD INDEX i1(a);

Thread 1 transaction is not committed, resulting in thread 2 doing DDL hangs. The access performance_schema.metadata_locks shows that thread 1 holds the SHARED_READ lock of t1, causing thread 2 (which needs to obtain the EXCLUSIVE lock) to wait.

mysql > SELECT * FROM performance_schema.metadata_locks;
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE           | LOCK_DURATION | LOCK_STATUS | SOURCE             | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| TABLE       | test               | t1             | NULL        |       140734873224192 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse.cc:6759  |              68 |             23 |
| GLOBAL      | NULL               | NULL           | NULL        |       140734862726080 | INTENTION_EXCLUSIVE | STATEMENT     | GRANTED     | sql_base.cc:6137   |              69 |              6 |
| SCHEMA      | test               | NULL           | NULL        |       140734862726240 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | sql_base.cc:6124   |              69 |              6 |
| TABLE       | test               | t1             | NULL        |       140734862726400 | SHARED_UPGRADABLE   | TRANSACTION   | GRANTED     | sql_parse.cc:6759  |              69 |              6 |
| BACKUP LOCK | NULL               | NULL           | NULL        |       140734862726560 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | sql_base.cc:6144   |              69 |              6 |
| TABLESPACE  | NULL               | test/t1        | NULL        |       140734862727040 | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | lock.cc:811        |              69 |              6 |
| TABLE       | test               | #sql-5a52_a    | NULL        |       140734862726720 | EXCLUSIVE           | STATEMENT     | GRANTED     | sql_table.cc:17089 |              69 |              6 |
| TABLE       | test               | t1             | NULL        |       140734862726880 | EXCLUSIVE           | TRANSACTION   | PENDING     | mdl.cc:4337        |              69 |              6 |
| TABLE       | performance_schema | metadata_locks | NULL        |       140734887891904 | SHARED_READ         | TRANSACTION   | GRANTED     | sql_parse.cc:6759  |              67 |              4 |
+-------------+--------------------+----------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
9 rows in set (0.00 sec)

6. Optimization of PolarDB on MDL

Partition table data access operations (DML) and partition maintenance operations (DDL) are mutually blocked in the MySQL Community. The main reason is that DDL needs to obtain the MDL_EXCLUSIVE lock on the partition table. This allows partition maintenance operations only to be performed during low peak hours. It requires a partition table to create or delete partitions frequently, which limits the use of a partition table significantly.

In PolarDB, we introduced partition-level MDL locks to reduce the lock granularity of DML and DDL acquisition to the partition level, improve concurrency, and implement the online partition maintenance function. This makes partition table data access and partition maintenance not affect each other. Users can perform partition maintenance more freely without affecting partition table business traffic, which enhances the flexibility of partition table use significantly.

This feature has been released in PolarDB 8.0.2.2.0 and later. You are welcome to use it.

7. References

[1] Source code MySQL / mysql-server 8.0.18:https://github.com/mysql/mysql-server/tree/mysql-8.0.18

[2] MySQL • Source Code Analysis • MDL Locked Source Code Analysis of Common SQL Statements: http://mysql.taobao.org/monthly/2018/02/01/

[3] Online Partition Maintenance Function: https://www.alibabacloud.com/help/en/polardb-for-mysql/latest/online-partition-maintenance

1 0 0
Share on

ApsaraDB

454 posts | 98 followers

You may also like

Comments

HrimfaxiYKW July 21, 2023 at 9:46 am

Great article! Additionally, I have a question to ask the author. Will all the paths for acquiring mdl converge to acquire_lock? If so, does it mean that I only need to obtain all the mdl_key and their corresponding durations within one thread (thd), in order to construct an mdl_request in another MySQL process and fully replicate the mdl information?