PolarDB PostgreSQL版支援子串連下推的查詢改寫功能,能夠提升含有IN/ANY子句的SQL的執行效率。
背景
在PostgreSQL中,ANY類型的子串連(對應IN和ANY子句)會被嘗試上拉為半串連(semi join)。然而,如果與之串連的表是一個無法上拉的子查詢,PostgreSQL不支援為其產生參數化路徑,只能作為一個整體獨立執行,當子查詢中的資料量較大時,可能導致整個SQL執行效率顯著降低。
例如,以下SQL由於子查詢中存在GROUP BY語句導致無法上拉,其執行時間主要在t_big表的掃描和排序上,並且隨著t_big表規模的增大,執行時間亦相應延長。
EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Merge Semi Join (cost=0.55..59523.15 rows=10000 width=12) (actual time=0.064..1237.621 rows=2 loops=1)
Merge Cond: (t_big.a = t_small.a)
-> GroupAggregate (cost=0.42..46910.99 rows=1000000 width=12) (actual time=0.033..1113.615 rows=1000000 loops=1)
Group Key: t_big.a
-> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.99 rows=1000000 width=8) (actual time=0.024..420.575 rows=1000000 loops=1)
-> Index Only Scan using t_small_a_idx on t_small (cost=0.13..12.16 rows=2 width=4) (actual time=0.028..0.030 rows=2 loops=1)
Heap Fetches: 2
Planning Time: 0.256 ms
Execution Time: 1237.700 ms
(9 rows)如果能夠將ANY類型的子串連下推至子查詢中,則可以利用子查詢中的索引,從而提升執行效率。
EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big WHERE a IN (SELECT a FROM t_small) GROUP BY a)v;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=17.96..17.99 rows=2 width=12) (actual time=0.061..0.064 rows=2 loops=1)
Group Key: t_big.a
-> Sort (cost=17.96..17.96 rows=2 width=8) (actual time=0.054..0.056 rows=2 loops=1)
Sort Key: t_big.a
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.46..17.95 rows=2 width=8) (actual time=0.031..0.045 rows=2 loops=1)
-> Unique (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.017 rows=2 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.013..0.014 rows=2 loops=1)
Sort Key: t_small.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
-> Index Scan using t_big_a_idx on t_big (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=2)
Index Cond: (a = t_small.a)
Planning Time: 0.527 ms
Execution Time: 0.143 ms
(15 rows)前提條件
支援的PolarDB PostgreSQL版的版本如下:
PostgreSQL 14(核心小版本2.0.14.13.28.0及以上)
PostgreSQL 11(核心小版本2.0.11.15.44.0及以上)
應用情境
子串連下推功能適用於IN/ANY子句引用的包含GROUP BY語句的子查詢,並且該子查詢涉及一個大表。將IN/ANY子句下推至子查詢中,能夠利用大表的索引,從而減少對大表資料的訪問。
功能限制
子串連下推功能存在以下限制:
IN/ANY子句必須引用一個有GROUP BY語句的子查詢,否則當前原生PostgreSQL將直接產生參數化路徑,不使用子串連下推功能。
IN/ANY子句的列必須包含在GROUP BY語句的列中,否則下推後的SQL不等價。
當前查詢塊不能存在外串連,否則下推後的SQL不等價。
目前只支援單列的情境。例如:
a in (select a from t)或者a = any(select a from t)。目前只支援
SELECT和CREATE TABLE AS語句。
使用說明
子串連下推功能受參數控制,相關的參數名稱及其作用如下所示:
參數名稱 | 描述 |
polar_cbqt_pushdown_sublink | 用於控制子串連下推功能的開關,取值如下:
|
樣本
資料準備
CREATE TABLE t_small(a int);
CREATE TABLE t_big(a int, b int, c int);
CREATE INDEX ON t_big(a);
INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i;
INSERT INTO t_small VALUES(1), (1000000);
ANALYZE t_small, t_big;原始查詢
在原始查詢計劃中,t_big.a = t_small.a串連條件無法作為參數化路徑進行下推,導致t_big表需要進行全表掃描,執行效率很低。
EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Merge Semi Join (cost=1.46..59510.27 rows=10000 width=12) (actual time=0.049..1239.128 rows=2 loops=1)
Merge Cond: (t_big.a = t_small.a)
-> GroupAggregate (cost=0.42..46909.23 rows=1000000 width=12) (actual time=0.034..1113.324 rows=1000000 loops=1)
Group Key: t_big.a
-> Index Scan using t_big_a_idx on t_big (cost=0.42..31909.23 rows=1000000 width=8) (actual time=0.025..412.650 rows=1000000 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.012..0.013 rows=2 loops=1)
Sort Key: t_small.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
Planning Time: 0.219 ms
Execution Time: 1239.208 ms
(11 rows)通過CBQT開啟子串連下推
開啟CBQT和子串連下推功能後,將a in (select a from t_small)子句下推到子查詢中,可以利用串連條件為t_big表產生參數化路徑,掃描的資料大大減少,執行時間明顯縮短。
原計劃的代價需要大於參數polar_cbqt_cost_threshold,子串連下推查詢改寫才會應用。
-- 開啟CBQT功能
SET polar_enable_cbqt to on;
-- 開啟子串連下推功能
SET polar_cbqt_pushdown_sublink to on;
EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=17.96..17.99 rows=2 width=12) (actual time=0.056..0.059 rows=2 loops=1)
Group Key: t_big.a
-> Sort (cost=17.96..17.96 rows=2 width=8) (actual time=0.051..0.052 rows=2 loops=1)
Sort Key: t_big.a
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.045 rows=2 loops=1)
-> Unique (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1)
Sort Key: t_small.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.007..0.008 rows=2 loops=1)
-> Index Scan using t_big_a_idx on t_big (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=2)
Index Cond: (a = t_small.a)
Planning Time: 0.518 ms
Execution Time: 0.141 ms
(15 rows)通過HINT開啟子串連下推
通過HINT的方式,在SQL層級強制開啟子串連下推功能,同樣將a in (select a from t_small)子句下推到子查詢中,可以利用串連條件為t_big表產生參數化路徑,掃描的資料大大減少,執行時間明顯縮短。
-- polar_cbqt_pushdown_sublink參數預設關閉
SET polar_cbqt_pushdown_sublink to off;
EXPLAIN ANALYZE /*+ Set(polar_cbqt_pushdown_sublink force) */ SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=17.96..17.99 rows=2 width=12) (actual time=0.073..0.076 rows=2 loops=1)
Group Key: t_big.a
-> Sort (cost=17.96..17.96 rows=2 width=8) (actual time=0.067..0.069 rows=2 loops=1)
Sort Key: t_big.a
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.46..17.95 rows=2 width=8) (actual time=0.026..0.040 rows=2 loops=1)
-> Unique (cost=1.03..1.04 rows=2 width=4) (actual time=0.011..0.015 rows=2 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.010..0.011 rows=2 loops=1)
Sort Key: t_small.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
-> Index Scan using t_big_a_idx on t_big (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)
Index Cond: (a = t_small.a)
Planning Time: 0.788 ms
Execution Time: 0.156 ms
(15 rows)