This topic describes how to troubleshoot the error message "ERROR HY000: Fail to get MDL on replica during DDL synchronize" when you execute a DDL statement in PolarDB for MySQL.
Problem description
The following error message is returned when I execute a DDL statement in PolarDB for MySQL:
ERROR HY000: Fail to get MDL on replica during DDL synchronize
Cause
A read-only node of the PolarDB for MySQL cluster has ongoing queries or uncommitted transactions.
Solution
To resolve this issue, PolarDB for MySQL 8.0.1 and 8.0.2 support the preemptible DDL feature. For more information about the feature, see Preemptible DDL.
Commit (commit) or roll back (rollback) the transactions.
You can also set the
polar_slave_work_on_nonblock_mdl_mode
parameter to ON on the read-only node to prevent long-running transactions that are not committed from blocking DDL operations. For more information, see Prevent long-running transactions on read-only nodes from blocking DDL operations.Execute the DDL statement.
If the DDL statement is executed, the task ends.
If the DDL statement fails to be executed, proceed to the next step.
If the Polar performance schema feature is enabled for your cluster, you can use the feature to query the MDL status in a specified table and then kill the threads related to uncommitted transactions on the read-only nodes. Perform the following operations:
NoteYou can execute the following SQL statement to check whether the Polar performance schema feature is enabled for the cluster:
SHOW VARIABLES LIKE 'polar_performance_schema';
If the uncommitted transaction on the read-only node is very important, we recommend that you do not directly kill the transaction. Wait until the transaction is complete before you execute the DDL statement.
Enable the Polar performance schema feature. For more information about how to enable the Polar performance schema feature, see Procedure.
If the threads cannot be killed, the thread IDs do not exist, or the following error occurs. Submit a ticket to contact technical support.
ERROR 1094 (HY000): Unknown thread id: xxx
Use the hint to specify a read-only node on which you can execute the following SQL statement to query the MDL status on the specified table.
/*force_node='pi-bp10k7631d6k3****'*/ SELECT t.PROCESSLIST_ID, m.OBJECT_TYPE, m.OBJECT_SCHEMA, m.OBJECT_NAME, m.LOCK_TYPE, m.LOCK_DURATION, m.LOCK_STATUS FROM performance_schema.metadata_locks m LEFT JOIN performance_schema.threads t ON m.owner_thread_id=t.thread_id;
Sample result:
The result indicates that the
test/t
table holds the lock withLOCK_TYPE
set toSHARED_READ
because a large query or uncommitted transactions occur on the table. Thetest/t
table also holds a lock in thePENGING
state withLOCK_TYPE
set toEXCLUSIVE
. Use the hint to specify a read-only node on which you can execute the/*force_node='pi-bp10k7631d6k3****'*/ kill 14
statement to kill the thread.If the Polar performance schema feature is disabled for the cluster, you can query the MDL status on the
information_schema.innodb_trx
table to determine whether any transactions are not committed. If yes, kill the threads. Perform the following operations:Use the hint to specify a read-only node on which you can execute the following SQL statement to query the MDL status on the
information_schema.innodb_trx
table./*force_node='pi-bp10k7631d6k3****'*/ SELECT * FROM information_schema.innodb_trx\G
Sample result for the failed DDL execution caused by large queries:
The result indicates that the
t1
table holds the MDL because a large query occurs on thet1
table. In this case, use the hint to specify a read-only node on which you can execute the/*force_node='pi-bp10k7631d6k3****'*/ kill 6
statement to kill thetrx_mysql_thread_id
thread.Sample result for the failed DDL execution caused by long-running transactions:
The result indicates that transaction 14 is not committed. However, it cannot be determined that the current transaction causes the table to hold the MDL because the
trx_query
field is empty. You can perform operations based on thetrx_started
field value. If the time in thetrx_started
field is significantly different from the current time, it is very likely that transaction 14 causes the table to hold the MDL. In this case, use the hint to specify a read-only node on which you can execute the/*force_node='pi-bp10k7631d6k3****'*/ kill 14
statement to kill thetrx_mysql_thread_id
thread.
Additional information
The execution process of a DDL statement in PolarDB:
If you want to change the schema of a table in the execution process of a DDL statement, the primary node obtains the MDL and then writes a redo log.
When a read-only node parses the redo log, it tries to obtain the MDL in the same table. The following situations may occur:
If the read-only node obtains the MDL, the next step starts.
Otherwise, the read-only node returns the error to the primary node.
The primary node waits until all read-only nodes are synchronized to the latest log sequence number (LSN). Within a period of time, the primary node determines whether all read-only nodes have parsed the redo log and obtained the MDL. The following situations may occur:
All read-only nodes are synchronized to the latest LSN, the DDL statement is executed as expected.
Some read-only nodes are not synchronized to the latest LSN. The DDL statement is rolled back and an error is reported. Two types of errors may occur: one is due to synchronization timeout, and the other is due to failure to obtain the MDL. The main cause for failure to obtain the MDL is that ongoing queries or uncommitted transactions occur on the read-only node.