PolarDB PostgreSQL版實現了基於代價的查詢變換(CBQT,Cost Based Query Transformation)架構,能夠基於代價選擇是否應用某種查詢變換,大幅提升某些複雜查詢的執行效率。
背景
查詢變換是指依據等價規則,將一個查詢語句重寫為語義上等價的另一種形式。在社區PostgreSQL中,常見的查詢變換包括子查詢上拉、外串連消除、運算式預先處理、消除無用串連、謂詞下推等,這些變換均基於等價規則進行。經過這些查詢變換後,產生的執行計畫必然更優,因此PostgreSQL一定會嘗試應用這些變換。
但另一些查詢變換,如子串連下推、OR轉UNION ALL等,應用後的計劃不一定更優。對此,PolarDB PostgreSQL版實現基於代價的查詢變換(CBQT)架構,能夠基於執行代價選擇是否做這些變換。
對於複雜查詢,CBQT會收集該查詢在各查詢塊中能夠進行的基於代價的查詢變換,這些變換將匯總成一個狀態空間。CBQT將依據指定的策略對狀態空間進行搜尋,以選擇代價最低的狀態。
如下圖所示,對於輸入的SQL語句,CBQT在Query Block 1和Query Block 2中收集到基於代價的查詢變換A和B,這兩個變換組成的狀態空間有:
None:均不作變換。
[1,A]:Query Block 1做變換A。
[1,B]:Query Block 1做變換B。
[2,A]:Query Block 2做變換A。
CBQT會依次嘗試應用狀態空間的各種變換,在預設的搜尋策略linear中,狀態[1,B]能使計劃更優,因此在產生Plan4時,也會保持[1,B]的應用。
前提條件
支援的PolarDB PostgreSQL版的版本如下:
PostgreSQL 14(核心小版本2.0.14.13.28.0及以上)
PostgreSQL 11(核心小版本2.0.11.15.44.0及以上)
使用說明
PolarDB PostgreSQL版提供以下參數用於控制CBQT行為:
參數名 | 描述 |
polar_enable_cbqt | 是否開啟CBQT功能,取值如下:
|
polar_cbqt_cost_threshold | 設定開啟CBQT代價的閾值,取值範圍: |
polar_cbqt_strategy | 設定CBQT狀態空間的搜尋策略,取值如下:
|
polar_cbqt_iteration_limit | 設定CBQT迭代次數。取值範圍: 迭代次數越多選擇出最優計劃的可能性越大,但最佳化時間會更長。反之,選擇出最優計劃的可能性越小,最佳化時間更短。 |
目前已經支援基於代價的查詢改寫功能如下:
功能開關 | 描述 |
polar_cbqt_convert_or_to_union_all_mode | OR子句轉UNION ALL:嘗試將合適的OR子句轉換成UNION ALL的形式,提升查詢的效率。 |
polar_cbqt_pushdown_sublink | 子串連下推:嘗試將子串連下推到子查詢中,利用子查詢中的索引產生參數化路徑,提升查詢的效率。 |
樣本
此處以子串連下推功能為例,介紹CBQT功能及各個參數的使用。
準備測試表並插入資料。
CREATE TABLE t_small(a int); CREATE TABLE t_big(a int, b int, c int); CREATE INDEX ON t_big(a); INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i; INSERT INTO t_small VALUES(1), (1000000); ANALYZE t_small, t_big;關閉CBQT功能,並開啟子串連下推功能。此時子串連下推功能並不會開啟,
t_big表仍然需要全表掃描,執行效率很低。-- 關閉CBQT功能 SET polar_enable_cbqt to off; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Merge Semi Join (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.052..1274.435 rows=2 loops=1) Merge Cond: (t_big.a = t_small.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.033..1151.005 rows=1000000 loops=1) Group Key: t_big.a -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.022..433.821 rows=1000000 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.015..0.016 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1) Planning Time: 0.904 ms Execution Time: 1274.539 ms (11 rows)開啟CBQT功能,並開啟子串連下推功能,子串連下推功能正確開啟。
a in (select a from t_small)子句被下推到子查詢中,可以利用串連條件為t_big表產生參數化路徑,掃描的資料大大減少,執行時間明顯縮短。-- 開啟CBQT功能 SET polar_enable_cbqt to on; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=17.96..17.99 rows=2 width=12) (actual time=0.060..0.063 rows=2 loops=1) Group Key: t_big.a -> Sort (cost=17.96..17.96 rows=2 width=8) (actual time=0.052..0.053 rows=2 loops=1) Sort Key: t_big.a Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.046 rows=2 loops=1) -> Unique (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.013..0.014 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1) -> Index Scan using t_big_a_idx on t_big (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=2) Index Cond: (a = t_small.a) Planning Time: 0.644 ms Execution Time: 0.150 ms (15 rows)當SQL語句不滿足CBQT的代價閾值時,不會啟用CBQT功能,仍然會選擇原始的計劃。例如,原始計劃的代價為59511.17,設定
polar_cbqt_cost_threshold參數為500000時:-- 開啟CBQT功能 SET polar_enable_cbqt to on; --- 設定CBQT代價閾值 SET polar_cbqt_cost_threshold to 500000; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Merge Semi Join (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.059..1253.452 rows=2 loops=1) Merge Cond: (t_big.a = t_small.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.041..1127.255 rows=1000000 loops=1) Group Key: t_big.a -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.029..414.488 rows=1000000 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1) Planning Time: 0.280 ms Execution Time: 1253.558 ms (11 rows)設定CBQT狀態空間的搜尋策略。以下樣本中有兩個能夠下推的子串連,但只有第二個子串連下推才是更優的。
設定polar_cbqt_strategy為linear,即線性搜尋策略,CBQT選擇了最優的計劃。
-- 開啟CBQT功能 SET polar_enable_cbqt to on; --- 設定CBQT狀態空間的搜尋策略 SET polar_cbqt_strategy to linear; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------- Append (cost=0.85..105692.60 rows=500002 width=12) -> Merge Semi Join (cost=0.85..98174.56 rows=500000 width=12) Merge Cond: (t_big_1.a = t_big.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) Group Key: t_big_1.a -> Index Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)設定polar_cbqt_strategy為twophase,即兩遍搜尋策略,兩個子串連都選擇了下推。
-- 開啟CBQT功能 SET polar_enable_cbqt to on; --- 設定CBQT狀態空間的搜尋策略 SET polar_cbqt_strategy to twophase; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Append (cost=0.85..113192.60 rows=1000002 width=12) -> GroupAggregate (cost=0.85..88174.56 rows=1000000 width=12) Group Key: t_big.a -> Merge Semi Join (cost=0.85..73174.56 rows=1000000 width=8) Merge Cond: (t_big.a = t_big_1.a) -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)
限制CBQT迭代次數。設定
polar_cbqt_iteration_limit參數為1,限制CBQT的迭代次數為1。對於上述情境,即使已知第二個子串連下推會使計劃更優,但由於迭代次數的限制,並不會進行嘗試。-- 開啟CBQT功能 SET polar_enable_cbqt to on; --- 設定CBQT狀態空間的搜尋策略 SET polar_cbqt_strategy to twophase; --- 設定CBQT迭代次數 SET polar_cbqt_iteration_limit to 1; -- 開啟子串連下推功能 SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);返回結果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Append (cost=0.85..113192.60 rows=1000002 width=12) -> GroupAggregate (cost=0.85..88174.56 rows=1000000 width=12) Group Key: t_big.a -> Merge Semi Join (cost=0.85..73174.56 rows=1000000 width=8) Merge Cond: (t_big.a = t_big_1.a) -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)