All Products
Search
Document Center

PolarDB:Correlated subquery pull-up

Last Updated:Mar 28, 2026

When an IN or ANY subquery references variables from the outer query, the standard PostgreSQL optimizer cannot pull it up. The subquery runs once per row in the outer result set—execution time scales linearly with table size. PolarDB for PostgreSQL adds the polar_enable_pullup_with_lateral parameter to remove this restriction: the optimizer rewrites the subquery as a semi-join and processes both tables in a single pass.

Prerequisites

Before you begin, make sure you have:

  • A PolarDB for PostgreSQL cluster running PostgreSQL 14

  • Minor engine version 2.0.14.8.11.0 or later

To check your minor engine version, run SHOW polardb_version; or view the minor engine version in the console. If your cluster does not meet the version requirement, upgrade the minor engine version.

How it works

The PostgreSQL optimizer uses SubLink to represent a subquery and its associated operators in an expression. The three types relevant to pull-up are:

TypeImplements
EXISTS_SUBLINKEXISTS (SELECT ...)
ALL_SUBLINKALL (SELECT ...)
ANY_SUBLINKANY (SELECT ...) or IN (SELECT ...)

The optimizer normally pulls up correlated subqueries that use ANY, IN, EXISTS, or NOT EXISTS, rewriting them as semi-joins or anti-joins so both the subquery and the outer query are optimized together.

For ANY_SUBLINK, standard PostgreSQL cannot pull up the subquery when it references variables from the outer query. In that case, the subquery runs as a correlated SubPlan—once per row produced by the outer query—and filter conditions inside the subquery have no effect on the outer query's result set. Execution time grows linearly with the outer table size.

With polar_enable_pullup_with_lateral enabled, PolarDB for PostgreSQL lifts this restriction. Correlated IN or ANY subqueries that reference outer-query variables can be pulled up, merged with the outer query into a single semi-join plan, and optimized together.

When pull-up applies

Pull-up is triggered when all of the following conditions hold:

  • The subquery uses IN or ANY (that is, it is an ANY_SUBLINK)

  • The subquery references at least one variable from the outer query (a correlated subquery)

  • polar_enable_pullup_with_lateral is ON

If your subquery uses EXISTS or NOT EXISTS, the standard optimizer already handles pull-up without this parameter.

Enable or disable correlated subquery pull-up

The polar_enable_pullup_with_lateral parameter controls whether the optimizer pulls up correlated ANY_SUBLINK subqueries.

ValueBehavior
ON (default)Enables pull-up of correlated ANY_SUBLINK subqueries
OFFDisables pull-up of correlated ANY_SUBLINK subqueries

Set the parameter at the session level:

SET polar_enable_pullup_with_lateral TO ON;

Example

The following example shows the performance difference for the same query with pull-up disabled and enabled.

Prepare the test tables

CREATE TABLE t1 (a INT, b INT);
INSERT INTO t1 SELECT i, 1 FROM generate_series(1, 100000) i;
CREATE TABLE t2 AS SELECT * FROM t1;

Run the query with pull-up disabled

SET polar_enable_pullup_with_lateral TO OFF;

EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1);
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Seq Scan on t1 (actual time=67.631..1641827.119 rows=100000 loops=1)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Result (actual time=0.005..13.124 rows=50000 loops=100000)
           One-Time Filter: (t1.b = 1)
           ->  Seq Scan on t2 (actual time=0.005..7.718 rows=50000 loops=100000)
                 Filter: (b = 1)
 Planning Time: 0.145 ms
 Execution Time: 1641847.702 ms

SubPlan 1 runs 100,000 times—once for each row in t1. Because the subquery cannot filter the outer query's rows, all 100,000 rows pass through and the subquery executes in full for each one. Execution time is approximately 27 minutes.

Run the query with pull-up enabled

SET polar_enable_pullup_with_lateral TO ON;

EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1);
                                  QUERY PLAN
----------------------------------------------------------------------------
 Hash Semi Join (actual time=64.783..173.482 rows=100000 loops=1)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1 (actual time=0.016..25.440 rows=100000 loops=1)
         Filter: (b = 1)
   ->  Hash (actual time=64.550..64.551 rows=100000 loops=1)
         Buckets: 131072  Batches: 2  Memory Usage: 2976kB
         ->  Seq Scan on t2 (actual time=0.010..30.330 rows=100000 loops=1)
               Filter: (b = 1)
 Planning Time: 0.195 ms
 Execution Time: 178.050 ms

With pull-up enabled, the optimizer rewrites the query as a Hash Semi Join. Both tables are scanned once, the filter b = 1 is pushed down to each scan, and the join discards non-matching rows efficiently. Execution time drops from approximately 27 minutes to under 200 milliseconds.