當MySQL最佳化器處理包含OR/IN的複雜查詢(特別是多表JOIN)時,可能無法有效利用索引,導致全表掃描,影響查詢效能。PolarDB MySQL版的查詢改寫最佳化功能,通過將合格OR/IN運算式改寫為UNION ALL結構,並基於代價選擇最優執行路徑,從而讓查詢能夠充分利用索引,顯著提升執行效能。
工作原理
在MySQL中,最佳化器對OR子句的處理能力有限。特別是當OR條件涉及多個表時,最佳化器往往只能將其作為串連後的過濾條件,無法有效利用單個條件上的索引,最終可能退化為全表掃描,導致查詢效能急劇下降。
例如,對於以下查詢,最佳化器無法使用t1.b或t3.c1列上的索引,只能執行全表掃描和雜湊串連,效率很低。
-- 最佳化前,執行計畫為全表掃描,耗時較長
EXPLAIN ANALYZE SELECT * FROM t1,t3 WHERE t3.c1 > 98 OR t1.b <= 0;
-> Filter: ((t3.c1 > 98) or (t1.b <= 0)) ... (actual time=115.259..5416.434 ...)
-> Inner hash join ...
-> Table scan on t3 ...
-> Hash
-> Table scan on t1 ...從邏輯上講,這個OR查詢等價於將兩個獨立查詢的結果合并,即UNION ALL。如果手動改寫,查詢可以利用各自的索引,效能會大幅提升。
-- 手動改寫為 UNION ALL,執行計畫可以使用索引,耗時顯著縮短
EXPLAIN ANALYZE
SELECT * FROM t1 ,t3 WHERE t1.b <= 0
UNION ALL
SELECT * FROM t1,t3 WHERE t3.c1 > 98 AND (t1.b > 0 OR (t1.b <= 0) IS NULL);
-> Append (actual time=58.272..302.546 ...)
...
-> Index range scan on t3 using idx_c1 ...PolarDB的OR/IN轉UNION ALL功能正是將上述手動最佳化過程自動化。最佳化器會在計劃產生階段評估OR改寫為UNION ALL的潛在收益,並與原始執行計畫進行代價對比,最終選擇成本更低的方案來執行,從而在不修改SQL的情況下實現查詢加速。
適用範圍
產品系列:叢集版、標準版。
核心版本:MySQL 8.0.2,且修訂版本需為8.0.2.2.32及以上版本。
當前功能處於灰階階段,在唯讀(RO)節點上預設開啟,讀寫(RW)節點上需要額外設定。如您有相關需求,請提交工單聯絡我們處理。
開啟與配置查詢改寫最佳化
通過設定相關參數來控制此最佳化功能的行為。
PolarDB叢集參數在控制台與會話中修改方式存在差異,詳細區別如下:
在PolarDB控制台上修改:
相容性說明:部分叢集參數在PolarDB控制台上均已添加MySQL設定檔的相容性首碼loose_。
操作方法:找到並修改這些帶
loose_首碼的參數。
在資料庫會話中修改(使用命令列或用戶端):
操作方法:當您串連到資料庫,使用
SET命令修改參數時,請去掉loose_首碼,直接使用參數的原始名稱進行修改。
參數名稱 | 層級 | 描述與建議 |
| Global/Session | 控制該功能的主開關。
|
| Global/Session | 控制最佳化的觸發閾值,只有當原始查詢的估算代價(可通過 取值範圍:0~18446744073709551615。 預設值:100000。 說明 建議保持預設值。若設定為 |
使用限制
該功能僅在滿足以下所有條件時觸發:
通用限制:
OR子句或IN-LIST中的參數個數不能超過10個。查詢塊(Query Block)中不能包含子查詢、
GROUP BY、視窗函數、DISTINCT或彙總函式。
通用
UNION ALL轉換(主要針對多表JOIN):OR子句:OR條件必須涉及2張及以上的表。OR子句均採用field=const模式,或均能有效利用索引。field=const模式:field是指表中的某個列,const是指一個常量值。有效利用索引:例如,
t1.f1=t2.f2中f1是t1某個索引首碼且f2是t2某個索引首碼。
IN-LIST:無需轉換為UNION ALL,因為range訪問方式更優。
Top-K轉換(主要針對單表ORDER BY...LIMIT):OR子句:OR條件必須作用於同一列,且該列與ORDER BY排序列必須是同一個索引的首碼。例如,索引為(c2, c3),查詢為WHERE c2=... OR c2=... ORDER BY c3。IN-LIST:IN-LIST的左運算式列與ORDER BY排序列必須是同一個索引的首碼。
樣本:驗證最佳化效果
資料準備
-- 建立並填充表t1
CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB;
-- 插入資料
INSERT INTO `t1` VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
-- 重複執行以增加資料量
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
-- 建立並填充表t3
CREATE TABLE `t3` (
`c1` int(11) NOT NULL,
`c2` int(11) DEFAULT NULL,
`c3` int(11) DEFAULT NULL,
`c4` int(11) DEFAULT NULL,
KEY `idx_c1`(`c1`),
KEY `idx_c2_c3` (`c2`,`c3`)
) ENGINE=InnoDB;
-- 插入大量資料
INSERT INTO `t3` VALUES (1,0,1,0),(2,0,2,0),(3,0,3,0),(4,0,4,0),(5,0,5,0),(6,0,6,0),(7,0,7,0),(8,0,8,0),(9,0,9,0),(10,0,10,0),(11,0,11,0),(12,0,12,0),(13,0,13,0),(14,0,14,0),(15,0,15,0),(16,0,16,0),(17,0,17,0),(18,0,18,0),(19,0,19,0),(20,0,20,0),(21,0,21,0),(22,0,22,0),(23,0,23,0),(24,0,24,0),(25,1,25,0),(26,1,26,0),(27,1,27,0),(28,1,28,0),(29,1,29,0),(30,1,30,0),(31,1,31,0),(32,1,32,0),(33,1,33,0),(34,1,34,0),(35,1,35,0),(36,1,36,0),(37,1,37,0),(38,1,38,0),(39,1,39,0),(40,1,40,0),(41,1,41,0),(42,1,42,0),(43,1,43,0),(44,1,44,0),(45,1,45,0),(46,1,46,0),(47,1,47,0),(48,1,48,0),(49,1,49,0),(50,1,50,1),(51,1,51,1),(52,1,52,1),(53,1,53,1),(54,1,54,1),(55,1,55,1),(56,1,56,1),(57,1,57,1),(58,1,58,1),(59,1,59,1),(60,1,60,1),(61,1,61,1),(62,1,62,1),(63,1,63,1),(64,1,64,1),(65,1,65,1),(66,1,66,1),(67,1,67,1),(68,1,68,1),(69,1,69,1),(70,1,70,1),(71,1,71,1),(72,1,72,1),(73,1,73,1),(74,1,74,1),(75,2,75,1),(76,2,76,1),(77,2,77,1),(78,2,78,1),(79,2,79,1),(80,2,80,1),(81,2,81,1),(82,2,82,1),(83,2,83,1),(84,2,84,1),(85,2,85,1),(86,2,86,1),(87,2,87,1),(88,2,88,1),(89,2,89,1),(90,2,90,1),(91,2,91,1),(92,2,92,1),(93,2,93,1),(94,2,94,1),(95,2,95,1),(96,2,96,1),(97,2,97,1),(98,2,98,1),(99,2,99,1),(100,2,100,1);
-- 重複執行以增加資料量
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
-- 分析表
ANALYZE TABLE t1, t3;情境一:最佳化多表JOIN查詢
此情境展示了OR條件跨越兩張表時,最佳化器如何通過改寫來利用索引。
關閉最佳化功能,觀察原始執行計畫。
-- 關閉最佳化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析語句 DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;結果分析:執行計畫顯示為
Hash Join,對t1和t3進行了全表掃描。最佳化器未能利用t1.a和t3.c1上的索引。+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 1280 | 100.00 | NULL | | 1 | SIMPLE | t3 | NULL | ALL | idx_c1 | NULL | NULL | NULL | 6591 | 100.00 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+開啟最佳化功能,查看改寫後的執行計畫。
-- 開啟最佳化功能 SET polar_optimizer_switch='or_expansion=on'; -- 將閾值調低以便觸發最佳化 SET cbqt_cost_threshold=1; -- 分析語句 DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;結果分析:執行計畫已調整為
UNION ALL,這使得能夠利用t1.a和t3.c1上的索引,從而實現與手動改寫為UNION ALL相同的效果。+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | 1 | PRIMARY | t1 | NULL | range | idx_a | idx_a | 5 | NULL | 256 | 100.00 | Using index condition; Using MRR | | 1 | PRIMARY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 6400 | 100.00 | Using join buffer (hash join) | | 2 | UNION | t3 | NULL | range | idx_c1 | idx_c1 | 4 | NULL | 128 | 100.00 | Using index condition; Using MRR | | 2 | UNION | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 66.67 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
情境二:最佳化Top-K查詢(OR子句)
此情境展示了對單表ORDER BY ... LIMIT查詢,最佳化器如何將OR條件改寫為UNION ALL並下推LIMIT,從而避免大範圍排序。
關閉最佳化功能,觀察原始執行計畫。
-- 關閉最佳化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析語句 DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;結果分析:執行計畫通過
Index range scan擷取所有滿足c2=2或c2=0的行後進行Sort操作,耗時約200毫秒。+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=193.389..193.393 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=193.386..193.388 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: ((t3.c2 = 2) or (t3.c2 = 0)) (actual time=0.348..187.455 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)開啟最佳化功能,查看改寫後的執行計畫。
-- 關閉最佳化功能 SET polar_optimizer_switch='or_expansion=on'; -- 將閾值調低以便觸發最佳化 SET cbqt_cost_threshold=1; -- 分析語句 DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;結果分析:執行計畫變為
UNION ALL,並對每個分支(c2=2和c2=0)分別執行Index lookup並應用LIMIT 5。最佳化器合并兩個已排序的5行結果集,無需全域排序,執行耗時降至約1毫秒。+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.249..1.254 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.104..0.106 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.006..0.013 rows=10 loops=1) -> Union materialize (actual time=1.246..1.249 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.336..0.571 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.333..0.566 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.215..0.431 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.214..0.427 rows=5 loops=1) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec)
情境三:最佳化Top-K查詢(IN-LIST)
IN-LIST在邏輯上等同於OR,因此也支援Top-K最佳化。
關閉最佳化功能,觀察原始執行計畫。
-- 關閉最佳化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析語句 DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;結果分析:執行計畫通過
Index range scan擷取所有滿足t3.c2 in (2,0)的行後進行Sort操作,耗時約200毫秒。+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=197.497..197.501 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=197.494..197.496 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: (t3.c2 in (2,0)) (actual time=0.319..191.560 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)開啟最佳化功能,查看改寫後的執行計畫。
-- 關閉最佳化功能 SET polar_optimizer_switch='or_expansion=on'; -- 將閾值調低以便觸發最佳化 SET cbqt_cost_threshold=1; -- 分析語句 DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;結果分析:執行計畫變為
UNION ALL,並對每個分支(c2=2和c2=0)分別執行Index lookup並應用LIMIT 5。最佳化器合并兩個已排序的5行結果集,無需全域排序。+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.256..1.260 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.090..0.093 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.005..0.012 rows=10 loops=1) -> Union materialize (actual time=1.252..1.255 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.259..0.545 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.256..0.540 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.237..0.455 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.236..0.451 rows=5 loops=1) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
使用HINT進行手動幹預
在特定情境下,可使用HINT控制單個查詢是否啟用此最佳化。
NO_OR_EXPAND(@QB_NAME):強制對指定查詢塊禁用OR展開最佳化。DESC SELECT /*+NO_OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t1.a = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 640 | 19.00 | Using where | | 1 | SIMPLE | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | test2.t1.b | 64 | 100.00 | Using index; FirstMatch(t1) | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+如果
WHERE子句中存在多個OR運算式,可以使用OR_EXPAND(@QB_NAME idx)強制將特定運算式轉換為UNION ALL。其中,idx表示該運算式在WHERE子句中所處的位置,索引從0開始。樣本中,將運算式(t3.c2 = 1 OR t1.b = 2)展開為UNION ALL形式。DESC format=tree SELECT /*+OR_EXPAND(@subq1 3) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c2 = 999 OR t1.b = 999) AND t3.c1 < 5 AND t1.b = t3.c1 AND (t3.c2= 1 OR t1.b = 2)); +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: exists(select #2) (cost=64.75 rows=640) -> Table scan on t1 (cost=64.75 rows=640) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Append -> Stream results -> Filter: (t3.c2 = 1) (cost=17.45 rows=32) -> Index lookup on t3 using idx_c1 (c1=t1.b), with index condition: ((t1.b = 999) and (t3.c1 < 5)) (cost=17.45 rows=64) -> Stream results -> Filter: (t3.c1 = 2) (cost=0.51 rows=0) -> Index lookup on t3 using idx_c2_c3 (c2=999), with index condition: ((t1.b = 2) and lnnvl((t3.c2 = 1))) (cost=0.51 rows=1)OR_EXPAND(@QB_NAME):強制對指定查詢塊(qb_name)啟用OR展開最佳化。DESC SELECT /*+OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c1 = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using where; Using index | | 3 | DEPENDENT UNION | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using index condition; Using index | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+