Redundant or pre-computable subqueries in SQL statements increase the overhead of query analysis and execution, which degrades database performance. This issue is common when Object-Relational Mapping (ORM) frameworks automatically generate complex nested queries.PolarDB for MySQL provides a subquery optimization feature that rewrites SQL statements during the optimizer stage. It automatically removes unnecessary subqueries and pre-computes constant subqueries to simplify execution plans and significantly improve query performance.
Scope
Product series: Cluster Edition and Standard Edition.
Kernel version: MySQL 8.0.2, revision 8.0.2.2.19 or later.
Enable subquery optimization
You can set the loose_simplify_subq_mode parameter to control the behavior of this optimization feature.
The method for modifying PolarDB cluster parameters differs between the console and a database session. The differences are as follows:
In the PolarDB console
Compatibility: Some cluster parameters in the PolarDB console have the loose_ prefix for compatibility with MySQL configuration files.
Procedure: Find and modify the parameters that have the
loose_prefix.
In a database session (using the command line or a client)
Procedure: When you connect to the database and use the
SETcommand to modify a parameter, remove theloose_prefix and use the original parameter name.
Parameter | Level | Description |
| Global/Session | The main switch for this feature. Valid values are:
|
Optimization scenarios and examples
Scenario 1: Eliminate redundant SELECT nesting
If a subquery only wraps an aggregate function or expression and has no other complex logic, such as in SELECT (SELECT SUM(a) FROM t2) FROM dual, the optimizer removes the outer SELECT and runs the core expression directly.
-- In a projection column
SELECT (SELECT SUM(a) FROM t2) FROM dual;
-- After optimization
SELECT SUM(`test`.`t2`.`a`) AS `sum(a)` FROM `test`.`t2`
-- In a HAVING clause
SELECT SUM(a) FROM t2 HAVING (SELECT(SELECT(SELECT count(b))));
-- After optimization
SELECT SUM(`testdb`.`t2`.`a`) AS `SUM(a)` from `testdb`.`t2` HAVING (0 <> count(`testdb`.`t2`.`b`))Prepare test data:
DROP TABLE IF EXISTS t2; CREATE TABLE t2 ( id INT PRIMARY KEY AUTO_INCREMENT, a INT, b INT ); INSERT INTO t2 (a, b) VALUES (10, 100), (20, NULL), (50, 200), (120, NULL);Disable optimization and run the query: First, disable the subquery optimization feature in the session to view the execution plan before optimization.
SET simplify_subq_mode = 'OFF'; EXPLAIN SELECT SUM(a) FROM t2 HAVING (SELECT(SELECT(SELECT count(b))));SHOW warnings; -- Result /* select#1 */ select sum(`testdb`.`t2`.`a`) AS `SUM(a)` from `testdb`.`t2` having (0 <> (/* select#2 */ select (/* select#3 */ select (/* select#4 */ select count(`testdb`.`t2`.`b`)))))Enable optimization and run the query: Now, enable the subquery optimization feature in the session and run the same query.
SET simplify_subq_mode = 'ON'; EXPLAIN SELECT SUM(a) FROM t2 HAVING (SELECT(SELECT(SELECT count(b))));SHOW warnings; -- Result /* select#1 */ select sum(`testdb`.`t2`.`a`) AS `SUM(a)` from `testdb`.`t2` having (0 <> count(`testdb`.`t2`.`b`))
Scenario 2: Pre-evaluate [NOT] EXISTS subqueries
If the optimizer determines that a [NOT] EXISTS subquery will always evaluate to true (non-empty result) or false (empty result), it replaces the [NOT] EXISTS clause with TRUE or FALSE. This replacement prevents the subquery from being run.
-- Non-empty set
SELECT * FROM t1 WHERE EXISTS(SELECT MAX(a) FROM t2);
-- After optimization
SELECT * FROM t1
-- Empty set, WHERE/HAVING clause=false, or LIMIT 0
SELECT * FROM t1 WHERE EXISTS(SELECT max(a) FROM t2 HAVING 1=2 );
-- After optimization
SELECT * FROM t1 WHERE falsePrepare test data:
DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id INT); CREATE TABLE t2 (val INT); INSERT INTO t1 VALUES (1), (2);Disable optimization and run the query: First, disable the subquery optimization feature in the session to view the execution plan before optimization.
SET simplify_subq_mode = 'OFF'; EXPLAIN SELECT * FROM t1 WHERE EXISTS(SELECT MAX(a) FROM t2);SHOW warnings; -- Result /* select#1 */ select `testdb`.`t1`.`id` AS `id` from `testdb`.`t1` where exists(/* select#2 */ select max(`testdb`.`t1`.`id`) from `testdb`.`t2`)Enable optimization and run the query: Now, enable the subquery optimization feature in the session and run the same query.
SET simplify_subq_mode = 'ON'; EXPLAIN SELECT * FROM t1 WHERE EXISTS(SELECT MAX(a) FROM t2);SHOW warnings; -- Result /* select#1 */ select `testdb`.`t1`.`id` AS `id` from `testdb`.`t1`
Scenario 3: Add LIMIT 1 to constant projections in ANY/ALL subqueries
If a subquery in an ANY or ALL clause queries only a constant and does not involve any columns, retrieving multiple identical rows is unnecessary. The optimizer automatically adds LIMIT 1 to these subqueries to avoid unnecessary full table scans on the table in the subquery.
-- Before optimization
SELECT * FROM t1 WHERE a > ANY (SELECT 1 FROM t2);
-- After optimization
SELECT * FROM t1 WHERE a > ANY (SELECT 1 FROM t2 LIMIT 1);Recommendations for production environments
Test thoroughly: Although this optimization guarantees logical equivalence in most cases, a risk exists in rare scenarios that depend on a specific execution order or the number of subquery executions. Before you use this feature in a production environment for core business operations, enable it in a staging environment and perform full regression testing.
Update statistics regularly: Some optimizations, such as inferring empty or non-empty sets, rely on table statistics. Outdated statistics can cause the optimizer to make suboptimal choices. To prevent this, run
ANALYZE TABLEon your tables regularly.