When a query combines an outer WHERE condition with an IN subquery that uses GROUP BY, PolarDB for MySQL can push the outer condition directly into the subquery before materialization. This reduces the number of rows materialized and improves query performance.
Prerequisites
Before you begin, ensure that your cluster runs one of the following engine versions. To check the version, see Query the engine version.
MySQL 8.0.1, revision version 8.0.1.1.42 or later
MySQL 8.0.2, revision version 8.0.2.19 or later
How it works
When a query matches the following pattern, the optimizer pushes the outer WHERE condition into the subquery before materialization:
SELECT ... FROM t
WHERE <condition on t.col>
AND (t.col, ...) IN (SELECT col, ... FROM t2 GROUP BY ...);Because every row in the result set must satisfy t.col = subquery_col, the condition on t.col implies an equivalent constraint on subquery_col. The optimizer rewrites the subquery to apply this constraint earlier, so fewer rows are materialized and scanned.
Example of the rewrite: The query
SELECT * FROM t WHERE a < 5 AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);is internally rewritten to:
SELECT * FROM t WHERE a < 5 AND (a, b) IN (SELECT c, MAX(d) FROM t2 WHERE c < 5 GROUP BY c);Where the pushed condition lands depends on whether the correlated column appears in GROUP BY:
| Column position | Pushed to |
|---|---|
In GROUP BY | Subquery WHERE clause |
Not in GROUP BY (for example, MAX(col)) | Subquery HAVING clause |
Limitations
The following cases do not support condition pushdown:
LIMITclause in the subquery — the subquery result is already bounded, so applying the filter afterLIMITwould change query semantics.SELECT * FROM t WHERE a > 1 AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c LIMIT 2);Non-deterministic outer condition or subquery reference — the outer condition column references a subquery or is non-deterministic (for example,
RAND()), potentially yielding different results under the same input conditions.SELECT * FROM t WHERE a > 5*RAND() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);Stored function in the outer condition — complex stored functions might not benefit from repeated executions and are therefore not considered for pushdown.
CREATE FUNCTION f1() RETURNS INT BEGIN ... END; SELECT * FROM t WHERE a > f1() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);
Enable condition pushdown
Configure the parameter
Set the loose_subquery_cond_pushdown_mode parameter at the Global level. For instructions, see Configure cluster and node parameters.
| Parameter | Level | Description |
|---|---|---|
loose_subquery_cond_pushdown_mode | Global | Controls condition pushdown from WHERE clauses to IN subqueries. Valid values: OFF (default, feature disabled), ON (enables condition pushdown), REPLICA_ON (enabled on read-only nodes only). |
Use optimizer hints
To control pushdown for individual queries without changing loose_subquery_cond_pushdown_mode, use optimizer hints with QB_NAME to name the target query block:
SUBQUERY_CONDITION_PUSHDOWN(@<qb_name>)— forces pushdown for the named query block.NO_SUBQUERY_CONDITION_PUSHDOWN(@<qb_name>)— suppresses pushdown for the named query block.
CREATE TABLE t1 (a INT, b INT, c INT, d INT);
CREATE TABLE t2 (e INT, f INT, g INT);
-- Suppress pushdown for the subquery named subq1
SELECT /*+ NO_SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c < 25
AND (t1.a, t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e, MAX(t2.g) FROM t2 WHERE t2.e < 5 GROUP BY t2.e);
-- Force pushdown for the subquery named subq1
SELECT /*+ SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c < 25
AND (t1.a, t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e, MAX(t2.g) FROM t2 WHERE t2.e < 5 GROUP BY t2.e);Examples
The examples below use the following tables:
CREATE TABLE t1 (a INT, b INT, c INT, d INT);
CREATE TABLE t2 (e INT, f INT, g INT);Example 1: Condition pushed to HAVING
Run EXPLAIN FORMAT=TREE to compare execution plans before and after enabling condition pushdown:
EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE t1.c < 25
AND (t1.a, t1.c) IN (SELECT t2.e, MAX(t2.g) FROM t2 WHERE t2.e < 5 GROUP BY t2.e);Output:
-- Before enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
-- After enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Filter: (max(t2.g) < 25)
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)t1.c is equivalent to MAX(t2.g) in the subquery, so the outer condition t1.c < 25 implies MAX(t2.g) < 25. Because g is not in GROUP BY, the optimizer pushes the condition as a HAVING filter — visible as Filter: (max(t2.g) < 25) inside the materialization step.
Example 2: Condition pushed to WHERE
EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE ((t1.a < 2 OR t1.a = 5) AND t1.b > 3)
AND (t1.a, t1.b, t1.c) IN (
SELECT t2.e, t2.f, MAX(t2.g)
FROM t2
WHERE t2.e < 5
GROUP BY t2.e, t2.f
);Output:
-- Before enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
-- After enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: ((t2.e < 5) and (t2.f > 3) and ((t2.e < 2) or (t2.e = 5))) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)t1.a and t1.b map to t2.e and t2.f, both of which are in GROUP BY. The optimizer pushes the compound condition ((t1.a < 2 OR t1.a = 5) AND t1.b > 3) directly to the subquery's WHERE clause as ((t2.e < 2 OR t2.e = 5) AND t2.f > 3).
Key rule: Whether a pushed condition becomes a WHERE filter or a HAVING filter depends on where the correlated column appears. Columns in GROUP BY produce a WHERE filter (Example 2); columns outside GROUP BY, such as aggregate results, produce a HAVING filter (Example 1).