On a healthy ApsaraDB RDS for MySQL instance, the active thread count typically stays below 10, or up to 20 to 30 on high-spec, high-QPS instances. When the count climbs into the hundreds or thousands, SQL queries pile up, response times degrade, and in severe cases the instance stops processing queries entirely.
This topic covers the four most common root causes and how to resolve each one.
Check the active thread count
Log on to the ApsaraDB RDS console and use one of the following methods:
Monitoring and Alerts: In the left-side navigation pane, click Monitoring and Alerts. On the Standard Monitoring tab, click Standard View.
DAS: In the left-side navigation pane, choose Autonomy Service > Dashboard. On the Performance Trends tab, a high thread count indicates that some sessions are blocked.
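You can also check from any SQL client. The Threads_running status variable counts threads that are currently executing (not sleeping), which is a reasonable proxy for the active thread count shown in the console; the console metric may be computed slightly differently.

```sql
-- Threads currently executing a statement (not sleeping).
SHOW GLOBAL STATUS LIKE 'Threads_running';

-- Total connected threads, for comparison.
SHOW GLOBAL STATUS LIKE 'Threads_connected';
```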
Slow SQL queries
Symptom: Run SHOW PROCESSLIST. If many sessions are scanning an excessively large number of rows, slow queries are driving up the active thread count. To see a detailed log, navigate to Autonomy Service > Slow Query Logs.
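To filter the process list down to the sessions that matter, you can query information_schema.processlist instead of scanning the raw SHOW PROCESSLIST output. The 60-second threshold below is illustrative; pick one that matches your workload.

```sql
-- Active sessions that have been running longer than 60 seconds
-- (threshold is illustrative, not a recommendation).
SELECT id, user, host, db, time, state, LEFT(info, 100) AS query_head
FROM information_schema.processlist
WHERE command <> 'Sleep'
  AND time > 60
ORDER BY time DESC;
```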
Resolution: Enable SQL throttling or terminate the offending sessions to stop the pile-up. For details, see SQL throttling.
Table cache exhaustion
Symptom: Sessions switch to the Opening table state when the table cache is too small to handle the current queries per second (QPS) or the total number of open tables.
Resolution: Increase table_open_cache and table_open_cache_instances.
Changing table_open_cache takes effect immediately without restarting the instance. Changing table_open_cache_instances requires a restart.
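Before raising the parameter, confirm the cache is actually churning: if Opened_tables grows quickly while Open_tables sits at the cache limit, the cache is too small. On ApsaraDB RDS you normally change parameters on the console's parameter settings page; the SET GLOBAL statement below applies to self-managed servers where you hold the required privilege, and the value 4096 is purely illustrative.

```sql
-- Open_tables near table_open_cache plus a fast-growing Opened_tables
-- counter indicates cache churn.
SHOW GLOBAL STATUS LIKE 'Open%tables';

-- Current cache configuration.
SHOW GLOBAL VARIABLES LIKE 'table_open_cache%';

-- Self-managed servers only; on RDS, change this in the console.
SET GLOBAL table_open_cache = 4096;  -- illustrative value
```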
Metadata lock contention
Symptom: Sessions show the Waiting for table metadata lock state. In the Prepare and Commit phases, DDL statements must acquire metadata locks on tables. If a table is involved in an uncommitted transaction or a slow query, the DDL statement is blocked—which in turn blocks all subsequent queries on that table.
Resolution: Abort all uncommitted transactions, slow queries, and ongoing DDL statements on the affected table.
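A sketch of how to locate the blockers, assuming MySQL 5.7 or later with the sys schema available (the metadata_locks instrument in performance_schema must be enabled for the lock-wait view to return rows; 12345 is a placeholder thread ID):

```sql
-- Long-running transactions that may still hold metadata locks.
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;

-- Pair each waiting session with the session blocking it
-- (requires the metadata_locks instrument to be enabled).
SELECT blocking_pid, waiting_pid, waiting_query
FROM sys.schema_table_lock_waits;

-- Terminate the blocking session; replace 12345 with the actual blocking_pid.
KILL 12345;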
Row lock conflicts
Symptom: High values for Innodb_row_lock_waits and Innodb_row_lock_time indicate row lock contention. To view these metrics, navigate to Autonomy Service > Dashboard, click the Performance Trends tab, and check the Row Lock section.
Diagnosis: Run SHOW ENGINE INNODB STATUS; and look for sessions in the Lock wait state. A large number of such sessions confirms severe row lock conflicts.
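On MySQL 5.7 and later, the sys schema offers a more direct view than parsing SHOW ENGINE INNODB STATUS output; this sketch assumes the sys schema is available on your instance:

```sql
-- Cumulative row lock counters.
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';

-- Each waiting session paired with the session blocking it.
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits;
```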
Resolution: Apply the following measures to reduce contention:
Optimize hot data updates: Spread write load across rows instead of concentrating updates on the same rows.
Reduce transaction sizes: Break large transactions into smaller ones so locks are held for shorter periods.
Commit promptly: Commit transactions as quickly as possible to release row locks and unblock waiting sessions.