All Products
Search
Document Center

PolarDB:Push down WHERE clauses to IN subqueries

Last Updated:Jun 13, 2025

PolarDB allows you to push down WHERE clauses to IN subqueries to optimize complex queries and enhancing query performance, particularly for nested subqueries.

Prerequisites

This feature is supported in clusters that use the following database engines. For more information about how to query the database engine version, see Query the engine version.

  • MySQL 8.0.1 with revision version 8.0.1.1.42 and later.

  • MySQL 8.0.2 with revision version 8.0.2.19 and later.

Scenarios

In SQL queries, subqueries that are used with GROUP BY often undergo materialization. Consider the case where the subquery is an IN subquery, as in: (FIELD[1],...) IN (SELECT col[1], ... FROM ... GROUP BY ...), and it forms part of the WHERE condition within the main query's AND clause, exemplified by: COND(FIELD[1])... AND (FIELD[1],...) IN (SELECT col[1], ... FROM ... GROUP BY ...). Each row in the result set must satisfy FIELD[1] = col[1]. As a result, the condition COND(FIELD[1]) can be pushed down into the subquery.

Limits

  • Condition pushdown is not supported if the subquery has a clause.

    SELECT * FROM t WHERE a > 1 AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c LIMIT 2);
  • Condition pushdown is not supported if the columns in the outer condition and the columns in the materialized table do not meet certain criteria, such as when the column references a subquery or is non-deterministic, or when the column is part of a stored procedure or function.

    • The column may reference a subquery or be non-deterministic, potentially yielding different results under the same input conditions. For instance:

      SELECT * FROM t WHERE a > 5*RAND() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);
    • The column could be associated with a stored procedure or a function. However, complex stored functions might not benefit from repeated executions and are therefore not considered for pushdown. Example:

      CREATE FUNCTION f1() RETURNS INT
      BEGIN
      ...
      END;
      
      SELECT * FROM t WHERE a > f1() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);

Usage

Preparations

To utilize condition pushdown, configure the loose_subquery_cond_pushdown_mode parameter based on your requirements. For more information, see Configure cluster and node parameters.

The following table describes the parameter.

Parameter name

Level

Description

loose_subquery_cond_pushdown_mode

Global

The control switch for the condition pushdown feature from the WHERE clause to the IN subquery. Valid values:

  • REPLICA_ON: enables the condition pushdown feature only on read-only nodes.

  • ON: enables the condition pushdown feature.

  • OFF (default): disables the feature of condition pushdown.

Alternatively, control pushdown for specific subqueries using OPTIMIZE HINT (SUBQUERY_CONDITION_PUSHDOWN or NO_SUBQUERY_CONDITION_PUSHDOWN) without altering loose_subquery_cond_pushdown_mode.

Example:

CREATE TABLE t1 (a INT, b INT, c INT, d INT);
CREATE TABLE t2 (e INT, f INT, g INT);
--This query statement indicates that the current query block prohibits pushing conditions down to the subquery @subq1--
SELECT  /*+ NO_SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);

--This query statement indicates that the current query block will push conditions that meet the rule down to the subquery @subq1--
SELECT  /*+ SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);

Example 1

To compare the execution plan before and after enabling the feature for pushing downWHERE clauses to IN subqueries, execute the follwing code:

EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);

Sample result:

-- Before enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2)))  (cost=*** rows=***)
    -> Table scan on t1  (cost=*** rows=***)
    -> Select #2 (subquery in condition; run only once)
        -> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
            -> Limit: 1 row(s)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
                    -> Materialize with deduplication
                        -> Table scan on <temporary>
                            -> Aggregate using temporary table
                                -> Filter: (t2.e < 5)  (cost=*** rows=***)
                                    -> Table scan on t2  (cost=*** rows=***)
                                
-- After enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2)))  (cost=*** rows=***)
    -> Table scan on t1  (cost=*** rows=***)
    -> Select #2 (subquery in condition; run only once)
        -> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
            -> Limit: 1 row(s)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
                    -> Materialize with deduplication
                        -> Filter: (max(t2.g) < 25)
                            -> Table scan on <temporary>
                                -> Aggregate using temporary table
                                    -> Filter: (t2.e < 5)  (cost=*** rows=***)
                                        -> Table scan on t2  (cost=*** rows=***)                        
Note

Parsing the SQL semantics reveals that the query result set for table t1 must adhere to the equivalence relationship between t1.c and the projected column MAX(t2.g) within the subquery. Consequently, given the main query's WHERE condition t1.c < 25, the subquery is also bound to fulfill MAX(t2.g)<25. The optimizer leverages the capability to push down WHERE conditions into subqueries, allowing the condition t1.c<25 to cascade to the subquery. However, because the GROUP BY clause does not encompass the g column, this condition can only be applied to the subquery's HAVING clause.

Example 2

To compare the execution plan before and after enabling the feature for pushing downWHERE clauses to IN subqueries, execute the follwing code:

EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE ((t1.a<2 OR t1.a=5) AND t1.b>3) AND
(t1.a,t1.b,t1.c) IN
(
SELECT t2.e,t2.f,MAX(t2.g)
FROM t2
WHERE t2.e<5
GROUP BY t2.e,t2.f
);

Sample result:

-- Before enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2)))  (cost=*** rows=***)
    -> Table scan on t1  (cost=*** rows=***)
    -> Select #2 (subquery in condition; run only once)
        -> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
            -> Limit: 1 row(s)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
                    -> Materialize with deduplication
                        -> Table scan on <temporary>
                            -> Aggregate using temporary table
                                -> Filter: (t2.e < 5)  (cost=*** rows=***)
                                    -> Table scan on t2  (cost=*** rows=***)

-- After enabling loose_subquery_cond_pushdown_mode
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2)))  (cost=*** rows=***)
    -> Table scan on t1  (cost=*** rows=***)
    -> Select #2 (subquery in condition; run only once)
        -> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
            -> Limit: 1 row(s)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
                    -> Materialize with deduplication
                        -> Table scan on <temporary>
                            -> Aggregate using temporary table
                                -> Filter: ((t2.e < 5) and (t2.f > 3) and ((t2.e < 2) or (t2.e = 5)))  (cost=*** rows=***)
                                    -> Table scan on t2  (cost=*** rows=***)
                                                                                                               
Note

In the main query of an SQL query, the WHERE condition is slightly more complex. However, by analyzing the principle of pushing down subqueries under the WHERE condition, it can be found that the columns t1.a, t1.b, t1.c in the main query table t1 are equivalent to the projection columns t2.e, t2.f, MAX(t2.g) in the subquery. Therefore, for the WHERE condition ((t1.a<2 OR t1.a=5) AND t1.b>3) connected by AND in the main query, it can be deduced that the corresponding columns in the subquery should meet the condition ((t2.e<2 OR t2.e=5) AND t2.f.b>3). After enabling the feature of pushing down subqueries under the WHERE condition, the optimizer can push down the WHERE condition ((t1.a<2 OR t1.a=5) AND t1.b>3) to the subquery. The analysis shows that the columns in the subquery GROUP BY include t2.e, t2.f, so it is pushed down to the WHERE clause of the subquery.