Solutions to the 'failed to obtain a metadata lock' error when executing a DDL statement - PolarDB

This topic describes how to resolve the "failed to obtain a metadata lock" error that occurs when you execute a DDL operation in PolarDB for MySQL.

Problem description

When you execute a DDL operation in a PolarDB for MySQL database, an error message is returned, indicating that a metadata lock cannot be obtained. The following error messages may be returned:

ERROR HY000: Fail to get MDL on replica during DDL synchronize

ERROR HY000: Fail to get table lock on replica; you can 'set polar_support_mdl_sync_preemption = ON' and try restarting transaction

Causes

This error occurs because a read-only node of the PolarDB for MySQL cluster has ongoing queries or uncommitted transactions.

Solutions

Choose one of the following solutions to resolve the issue:

Enable the preemptive DDL feature for the read-only node. For more information, see Preemptive DDL.
Commit or roll back the uncommitted transactions on the read-only node.
Enable the polar_slave_work_on_nonblock_mdl_mode parameter on the read-only node to prevent uncommitted long-running transactions from blocking DDL operations. For more information, see Prevent long-running transactions on read-only nodes from blocking DDL operations.
Log on to the PolarDB console. In the navigation pane on the left, choose Diagnostics and Optimization > Quick Diagnostics. On the Session Manager tab, check for abnormal sessions. If abnormal sessions exist, click View Details in the anomaly section to view the session details. End the sessions that have long-running uncommitted transactions. For other abnormal sessions, you can optimize or end them as needed. For more information, see Session Manager.
Use the Polar Performance Schema feature to query the metadata lock (MDL) status of the target table. Then, stop the threads related to uncommitted transactions on the read-only node. The steps are as follows:
1. Check the status of the Polar Performance Schema feature for the cluster. You can check the status in one of the following two ways:
  - Log on to the PolarDB console. In the navigation pane on the left, choose Settings and Management > Parameter Settings to view the value of the loose_polar_performance_schema parameter.
    To ensure compatibility with MySQL configuration files, all cluster parameters in the console are prefixed with loose_.
  - Execute the following SQL statement:
```
SHOW VARIABLES LIKE 'polar_performance_schema';
```
```
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| polar_performance_schema | ON    |
+--------------------------+-------+
```
2. The procedure varies depending on whether the parameter is enabled or disabled.
  Enabled (ON)
  Query the performance_schema.metadata_locks table to check the MDL status on the target table and determine whether any transactions are uncommitted. Then, use the kill command to stop the threads for the uncommitted transactions.
  Use a hint to specify a read-only node and execute the following SQL statement to query the MDL status on the target table.
  /*force_node='pi-bp10k7631d6k3****'*/ SELECT t.PROCESSLIST_ID, m.OBJECT_TYPE, m.OBJECT_SCHEMA, m.OBJECT_NAME, m.LOCK_TYPE, m.LOCK_DURATION, m.LOCK_STATUS FROM performance_schema.metadata_locks m LEFT JOIN performance_schema.threads t ON m.owner_thread_id=t.thread_id;
  Sample result:
  The preceding figure shows that the test01/t1 table holds a lock with LOCK_TYPE set to SHARED_READ due to a large query or an uncommitted transaction. The test/t1 table also holds a lock in the PENDING state with LOCK_TYPE set to EXCLUSIVE.
  Use a hint to specify a read-only node and execute the kill command to stop the thread that corresponds to the connection.
  Note
  If the uncommitted transaction on the read-only node is important, do not use the kill command to stop it directly. Wait for the transaction to complete before you execute the DDL operation.
  If you cannot stop the thread using the kill command, the thread ID does not exist, or the following error occurs, contact us for assistance.
  ERROR 1094 (HY000): Unknown thread id: xxx
  /*force_node='pi-bp10k7631d6k3****'*/ kill 536976473;
  Disabled (OFF)
  Query the information_schema.innodb_trx table to check the MDL status on the target table and determine whether any transactions are uncommitted. Then, use the kill command to stop the threads for the uncommitted transactions.
  Use a hint to specify a read-only node and execute the following SQL statement to query the information_schema.innodb_trx table.
  /*force_node='pi-bp10k7631d6k3****'*/ SELECT * FROM information_schema.innodb_trx\G
  Check the result to determine whether the issue is caused by a large query or a large transaction:
  DDL execution failed due to a large query. Sample result:
  The preceding figure shows that a large query exists on the t1 table. This indicates that the current connection holds the MDL on the t1 table.
  DDL execution failed due to a large transaction. Sample result:
  The preceding figure shows that transaction 537247177 is uncommitted. Because the trx_query field is empty, you cannot determine whether this transaction holds the MDL on the current table. In this case, check the trx_started field. If the time in the trx_started field is much earlier than the current time, it is highly likely that transaction 537247177 holds the MDL.
  Use a hint to specify a read-only node and execute the kill command to stop the trx_mysql_thread_id thread that corresponds to the connection.
  /*force_node='pi-bp10k7631d6k3****'*/ kill 537247177

Contact us

If you have any questions about DDL operations, please contact technical support.

More information

The DDL operation process in the cloud-native database PolarDB is as follows:

During a DDL operation, if the table schema needs to be changed, the primary node first obtains an MDL and then writes a redo log entry before the change.
When a read-only node parses this redo log entry, it attempts to obtain the MDL on the same table. One of the following two situations occurs:
- The retrieval was successful. You can now proceed to the next step.
- The lock cannot be obtained. The read-only node sends feedback to the primary node.
The primary node waits for all read-only nodes to synchronize with the latest replication offset. Within a specific period, the primary node determines whether all read-only nodes have parsed the redo log entry and successfully obtained the lock. One of the following two situations occurs:
- All read-only nodes synchronize with the latest replication offset. The DDL operation succeeds.
- Some read-only nodes fail to synchronize with the latest replication offset. The DDL operation is rolled back and an error is reported. Two types of errors can be reported: a synchronization timeout error or the lock acquisition failure error described in this topic. Lock acquisition fails on a read-only node primarily because of ongoing queries or uncommitted transactions.

FAQ

Why can't the kill command stop a specific session?

When you execute the kill command, the system marks the session for termination and waits for the kernel to complete the stop operation. If the session involves a large transaction, the session is stopped only after the transaction rollback is complete. You can query the information_schema.innodb_trx table to check whether the session is in a rollback state. The query statement is:

SELECT TRX_ID, TRX_STATE, TRX_STARTED, TRX_QUERY FROM information_schema.innodb_trx WHERE trx_mysql_thread_id = <Session ID>;

If the TRX_STATE value in the query result is ROLLING BACK, you must wait for the transaction to finish rolling back. The kill operation is then completed automatically.

Problem description

Causes

Solutions

Enabled (ON)

Disabled (OFF)

Contact us

More information

FAQ