Boost Query Performance with Sublink Pushdown for IN and ANY - PolarDB

When a subquery containing a GROUP BY clause is referenced by an IN or ANY clause, standard PostgreSQL cannot push the join condition into the subquery. This forces a full scan of large tables, with execution time growing proportionally to table size. PolarDB for PostgreSQL solves this with sublink pushdown: the optimizer rewrites the query to move the IN or ANY clause inside the subquery, enabling an index-based parameterized path that dramatically reduces the rows scanned.

Prerequisites

Before enabling sublink pushdown, verify that your PolarDB for PostgreSQL cluster runs one of the following engine versions:

PolarDB for PostgreSQL 14, revision version 2.0.14.13.28.0 or later
PolarDB for PostgreSQL 11, revision version 2.0.11.15.44.0 or later

To check your revision version, run SHOW polardb_version; or view it in the console. To upgrade, see Version management.

How sublink pushdown works

In standard PostgreSQL, sublinks of the ANY type — including IN and ANY clauses — are pulled up as semi-joins. When the referenced table is a subquery that cannot be pulled up (for example, one with a GROUP BY clause), PostgreSQL cannot generate a parameterized path and the subquery runs independently, triggering a full table scan.

The following example shows the problem. The GROUP BY clause prevents pull-up, so PostgreSQL scans and sorts all 1,000,000 rows in t_big:

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.55..59523.15 rows=10000 width=12) (actual time=0.064..1237.621 rows=2 loops=1)
   Merge Cond: (t_big.a = t_small.a)
   ->  GroupAggregate  (cost=0.42..46910.99 rows=1000000 width=12) (actual time=0.033..1113.615 rows=1000000 loops=1)
         Group Key: t_big.a
         ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.99 rows=1000000 width=8) (actual time=0.024..420.575 rows=1000000 loops=1)
   ->  Index Only Scan using t_small_a_idx on t_small  (cost=0.13..12.16 rows=2 width=4) (actual time=0.028..0.030 rows=2 loops=1)
         Heap Fetches: 2
 Planning Time: 0.256 ms
 Execution Time: 1237.700 ms
(9 rows)

When sublink pushdown is enabled, the optimizer moves a IN (SELECT a FROM t_small) inside the subquery. The planner generates a parameterized path for t_big and uses the index, scanning only 2 rows instead of 1,000,000. Execution time drops from ~1,237 ms to ~0.14 ms.

Use cases

Sublink pushdown is most effective when:

A GROUP BY subquery is referenced by an IN or ANY clause
The subquery scans a large table

Pushing the IN or ANY clause into the subquery lets the planner use the large table's index, reducing the rows scanned from the full table to only those matching the subquery result.

Limitations

Limitation	Details
Requires `GROUP BY` in subquery	The `IN` or `ANY` clause must reference a subquery that contains a `GROUP BY` clause. Without `GROUP BY`, PostgreSQL generates a parameterized path directly and skips sublink pushdown.
Column must appear in `GROUP BY`	The columns in the `IN` or `ANY` clause must be included in the `GROUP BY` columns. Otherwise, the rewritten query is semantically different from the original.
No outer joins allowed	The current query block must not contain outer joins, as they make the rewritten query semantically different from the original.
Single-column reference only	Only one column can be referenced — for example, `a IN (SELECT a FROM t)` or `a = ANY(SELECT a FROM t)`.
Supported statement types	Only `SELECT` and `CREATE TABLE AS` statements are supported.

Configure sublink pushdown

Use the polar_cbqt_pushdown_sublink parameter to control sublink pushdown behavior.

Parameter	Default	Allowed values	Description
`polar_cbqt_pushdown_sublink`	`OFF`	`OFF`, `ON`, `FORCE`	Controls sublink pushdown. `OFF` disables it. `ON` enables it through cost-based query transformation (CBQT) — the original query cost must exceed `polar_cbqt_cost_threshold`. `FORCE` bypasses the CBQT cost check and always pushes down sublinks; use this as a per-query hint rather than a global setting.

Examples

Prepare sample data

CREATE TABLE t_small(a int);
CREATE TABLE t_big(a int, b int, c int);

CREATE INDEX ON t_big(a);

INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i;
INSERT INTO t_small VALUES(1), (1000000);

ANALYZE t_small, t_big;

Original query (no pushdown)

Without pushdown, the join condition t_big.a = t_small.a cannot become a parameterized path, so the planner performs a Merge Semi Join that scans all 1,000,000 rows of t_big:

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=1.46..59510.27 rows=10000 width=12) (actual time=0.049..1239.128 rows=2 loops=1)
   Merge Cond: (t_big.a = t_small.a)
   ->  GroupAggregate  (cost=0.42..46909.23 rows=1000000 width=12) (actual time=0.034..1113.324 rows=1000000 loops=1)
         Group Key: t_big.a
         ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31909.23 rows=1000000 width=8) (actual time=0.025..412.650 rows=1000000 loops=1)
   ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.012..0.013 rows=2 loops=1)
         Sort Key: t_small.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
 Planning Time: 0.219 ms
 Execution Time: 1239.208 ms
(11 rows)

Enable pushdown with CBQT

Enable both CBQT and sublink pushdown. The optimizer moves a IN (SELECT a FROM t_small) into the subquery and generates an index-based parameterized path for t_big.

Note

The original query plan cost must exceed polar_cbqt_cost_threshold for pushdown to take effect.

-- Enable CBQT
SET polar_enable_cbqt TO on;

-- Enable sublink pushdown
SET polar_cbqt_pushdown_sublink TO on;

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.056..0.059 rows=2 loops=1)
   Group Key: t_big.a
   ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.051..0.052 rows=2 loops=1)
         Sort Key: t_big.a
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.045 rows=2 loops=1)
               ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1)
                           Sort Key: t_small.a
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.007..0.008 rows=2 loops=1)
               ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=2)
                     Index Cond: (a = t_small.a)
 Planning Time: 0.518 ms
 Execution Time: 0.141 ms
(15 rows)

The plan switches from a Merge Semi Join scanning 1,000,000 rows to a Nested Loop scanning 2 rows, reducing execution time from 1,239 ms to 0.141 ms.

Verify the transformation

To confirm that pushdown is active, inspect the query plan:

Plan element	Pushdown active	Pushdown inactive
Join type	`Nested Loop`	`Merge Semi Join`
Rows scanned on `t_big`	2	1,000,000
`t_big` access method	`Index Scan` with `Index Cond`	`Index Scan` (full)

When pushdown is active, the plan shows a Nested Loop with an Index Cond on t_big. When inactive, the plan shows a Merge Semi Join with a GroupAggregate processing all rows.

Enable pushdown with a hint

Use a hint to enable pushdown for a single query without changing the global setting. This is the recommended approach for the FORCE value.

-- Global setting remains off
SET polar_cbqt_pushdown_sublink TO off;

EXPLAIN ANALYZE /*+ Set(polar_cbqt_pushdown_sublink force) */ SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.073..0.076 rows=2 loops=1)
   Group Key: t_big.a
   ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.067..0.069 rows=2 loops=1)
         Sort Key: t_big.a
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.026..0.040 rows=2 loops=1)
               ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.011..0.015 rows=2 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.010..0.011 rows=2 loops=1)
                           Sort Key: t_small.a
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
               ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)
                     Index Cond: (a = t_small.a)
 Planning Time: 0.788 ms
 Execution Time: 0.156 ms
(15 rows)

What's next

CBQT — learn about other cost-based query transformations in PolarDB for PostgreSQL
Version management — upgrade to a supported revision version