When an IN or ANY subquery references variables from the outer query, the standard PostgreSQL optimizer cannot pull it up. The subquery runs once per row in the outer result set—execution time scales linearly with table size. PolarDB for PostgreSQL adds the polar_enable_pullup_with_lateral parameter to remove this restriction: the optimizer rewrites the subquery as a semi-join and processes both tables in a single pass.
Prerequisites
Before you begin, make sure you have:
A PolarDB for PostgreSQL cluster running PostgreSQL 14
Minor engine version 2.0.14.8.11.0 or later
To check your minor engine version, run SHOW polardb_version; or view the minor engine version in the console. If your cluster does not meet the version requirement, upgrade the minor engine version.
How it works
The PostgreSQL optimizer uses SubLink to represent a subquery and its associated operators in an expression. The three types relevant to pull-up are:
| Type | Implements |
|---|---|
EXISTS_SUBLINK | EXISTS (SELECT ...) |
ALL_SUBLINK | ALL (SELECT ...) |
ANY_SUBLINK | ANY (SELECT ...) or IN (SELECT ...) |
The optimizer normally pulls up correlated subqueries that use ANY, IN, EXISTS, or NOT EXISTS, rewriting them as semi-joins or anti-joins so both the subquery and the outer query are optimized together.
For ANY_SUBLINK, standard PostgreSQL cannot pull up the subquery when it references variables from the outer query. In that case, the subquery runs as a correlated SubPlan—once per row produced by the outer query—and filter conditions inside the subquery have no effect on the outer query's result set. Execution time grows linearly with the outer table size.
With polar_enable_pullup_with_lateral enabled, PolarDB for PostgreSQL lifts this restriction. Correlated IN or ANY subqueries that reference outer-query variables can be pulled up, merged with the outer query into a single semi-join plan, and optimized together.
When pull-up applies
Pull-up is triggered when all of the following conditions hold:
The subquery uses
INorANY(that is, it is anANY_SUBLINK)The subquery references at least one variable from the outer query (a correlated subquery)
polar_enable_pullup_with_lateralisON
If your subquery uses EXISTS or NOT EXISTS, the standard optimizer already handles pull-up without this parameter.
Enable or disable correlated subquery pull-up
The polar_enable_pullup_with_lateral parameter controls whether the optimizer pulls up correlated ANY_SUBLINK subqueries.
| Value | Behavior |
|---|---|
ON (default) | Enables pull-up of correlated ANY_SUBLINK subqueries |
OFF | Disables pull-up of correlated ANY_SUBLINK subqueries |
Set the parameter at the session level:
SET polar_enable_pullup_with_lateral TO ON;Example
The following example shows the performance difference for the same query with pull-up disabled and enabled.
Prepare the test tables
CREATE TABLE t1 (a INT, b INT);
INSERT INTO t1 SELECT i, 1 FROM generate_series(1, 100000) i;
CREATE TABLE t2 AS SELECT * FROM t1;Run the query with pull-up disabled
SET polar_enable_pullup_with_lateral TO OFF;
EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1); QUERY PLAN
---------------------------------------------------------------------------------
Seq Scan on t1 (actual time=67.631..1641827.119 rows=100000 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Result (actual time=0.005..13.124 rows=50000 loops=100000)
One-Time Filter: (t1.b = 1)
-> Seq Scan on t2 (actual time=0.005..7.718 rows=50000 loops=100000)
Filter: (b = 1)
Planning Time: 0.145 ms
Execution Time: 1641847.702 msSubPlan 1 runs 100,000 times—once for each row in t1. Because the subquery cannot filter the outer query's rows, all 100,000 rows pass through and the subquery executes in full for each one. Execution time is approximately 27 minutes.
Run the query with pull-up enabled
SET polar_enable_pullup_with_lateral TO ON;
EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1); QUERY PLAN
----------------------------------------------------------------------------
Hash Semi Join (actual time=64.783..173.482 rows=100000 loops=1)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1 (actual time=0.016..25.440 rows=100000 loops=1)
Filter: (b = 1)
-> Hash (actual time=64.550..64.551 rows=100000 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2976kB
-> Seq Scan on t2 (actual time=0.010..30.330 rows=100000 loops=1)
Filter: (b = 1)
Planning Time: 0.195 ms
Execution Time: 178.050 msWith pull-up enabled, the optimizer rewrites the query as a Hash Semi Join. Both tables are scanned once, the filter b = 1 is pushed down to each scan, and the join discards non-matching rows efficiently. Execution time drops from approximately 27 minutes to under 200 milliseconds.