The MySQL optimizer may not use indexes effectively when processing complex queries that contain OR/IN expressions, especially in multi-table JOIN operations. This can lead to a full table scan and degrade query performance. The query rewrite optimization feature in PolarDB for MySQL rewrites eligible OR/IN expressions into a UNION ALL structure and selects the optimal execution path based on cost. This allows the query to fully utilize indexes and significantly improves execution performance.
How it works
In MySQL, the optimizer has limited capabilities when handling OR clauses. When an OR condition involves multiple tables, the optimizer often treats it only as a filter condition after the join operation. It cannot effectively use an index for a single condition. This can result in a full table scan and cause a sharp drop in query performance.
For example, in the following query, the optimizer cannot use the indexes on the t1.b or t3.c1 columns. It can only perform a full table scan and a hash join, which is very inefficient.
-- Before optimization, the execution plan is a full table scan, and the execution duration is long.
EXPLAIN ANALYZE SELECT * FROM t1,t3 WHERE t3.c1 > 98 OR t1.b <= 0;
-> Filter: ((t3.c1 > 98) or (t1.b <= 0)) ... (actual time=115.259..5416.434 ...)
-> Inner hash join ...
-> Table scan on t3 ...
-> Hash
-> Table scan on t1 ...Logically, this OR query is equivalent to merging the results of two separate queries using UNION ALL. If you rewrite the query manually, the query can use the respective indexes, and the performance improves significantly.
-- Manually rewritten to UNION ALL, the execution plan can use indexes, and the execution duration is significantly shorter.
EXPLAIN ANALYZE
SELECT * FROM t1 ,t3 WHERE t1.b <= 0
UNION ALL
SELECT * FROM t1,t3 WHERE t3.c1 > 98 AND (t1.b > 0 OR (t1.b <= 0) IS NULL);
-> Append (actual time=58.272..302.546 ...)
...
-> Index range scan on t3 using idx_c1 ...The PolarDB feature that transforms OR/IN expressions into a UNION ALL structure automates this manual optimization process. During the plan generation phase, the optimizer evaluates the potential benefits of rewriting an OR expression into a UNION ALL structure. It compares the cost with the original execution plan and chooses the lower-cost option for execution. This accelerates the query without requiring you to make SQL modifications.
Applicability
Product series: Cluster Edition, Standard Edition.
Engine version: MySQL 8.0.2, and the revision version must be 8.0.2.2.32 or later.
This feature is in canary release. It is enabled by default on read-only (RO) nodes and requires additional settings on read-write (RW) nodes. To use this feature, you can submit a ticket.
Enable and configure query rewrite optimization
You can control the behavior of this optimization feature by setting the relevant parameters.
The method for modifying PolarDB cluster parameters differs between the console and a database session. The differences are as follows:
In the PolarDB console
Compatibility: Some cluster parameters in the PolarDB console have the loose_ prefix for compatibility with MySQL configuration files.
Procedure: Find and modify the parameters that have the
loose_prefix.
In a database session (using the command line or a client)
Procedure: When you connect to the database and use the
SETcommand to modify a parameter, remove theloose_prefix and use the original parameter name.
Parameter | Level | Description and suggestions |
| Global/Session | The main switch for this feature.
|
| Global/Session | Controls the optimization trigger threshold. The optimizer attempts a rewrite only when the estimated cost of the original query, which you can view using Value range: 0 to 18446744073709551615. Default value: 100000. Note Keep the default value. If you set this parameter to |
Limits
This feature is triggered only if all the following conditions are met:
General limits:
The number of parameters in the
ORclause or IN-LIST cannot exceed 10.The query block cannot contain subqueries,
GROUP BYclauses, window functions,DISTINCTclauses, or aggregate functions.
General
UNION ALLtransformation (mainly for multi-table JOINs):ORclause:The
ORcondition must involve two or more tables.All
ORclauses must use thefield=constpattern or be able to use an index effectively.field=constpattern:fieldrefers to a column in a table, andconstrefers to a constant value.Effective index usage: For example, in
t1.f1=t2.f2,f1is a prefix of an index ont1andf2is a prefix of an index ont2.
IN-LIST: Does not need to be transformed intoUNION ALLbecause therangeaccess method is more optimal.
Top-Ktransformation (mainly for single-tableORDER BY...LIMITqueries):ORclause: TheORconditions must apply to the same column. This column and theORDER BYcolumn must be a prefix of the same index. For example, if the index is(c2, c3), the query isWHERE c2=... OR c2=... ORDER BY c3.IN-LIST: The column in the left expression of theIN-LISTand theORDER BYcolumn must be a prefix of the same index.
Example: Verify the optimization effect
Data preparation
-- Create and populate table t1
CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB;
-- Insert data
INSERT INTO `t1` VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
-- Run this statement repeatedly to increase the data volume
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
-- Create and populate table t3
CREATE TABLE `t3` (
`c1` int(11) NOT NULL,
`c2` int(11) DEFAULT NULL,
`c3` int(11) DEFAULT NULL,
`c4` int(11) DEFAULT NULL,
KEY `idx_c1`(`c1`),
KEY `idx_c2_c3` (`c2`,`c3`)
) ENGINE=InnoDB;
-- Insert a large amount of data
INSERT INTO `t3` VALUES (1,0,1,0),(2,0,2,0),(3,0,3,0),(4,0,4,0),(5,0,5,0),(6,0,6,0),(7,0,7,0),(8,0,8,0),(9,0,9,0),(10,0,10,0),(11,0,11,0),(12,0,12,0),(13,0,13,0),(14,0,14,0),(15,0,15,0),(16,0,16,0),(17,0,17,0),(18,0,18,0),(19,0,19,0),(20,0,20,0),(21,0,21,0),(22,0,22,0),(23,0,23,0),(24,0,24,0),(25,1,25,0),(26,1,26,0),(27,1,27,0),(28,1,28,0),(29,1,29,0),(30,1,30,0),(31,1,31,0),(32,1,32,0),(33,1,33,0),(34,1,34,0),(35,1,35,0),(36,1,36,0),(37,1,37,0),(38,1,38,0),(39,1,39,0),(40,1,40,0),(41,1,41,0),(42,1,42,0),(43,1,43,0),(44,1,44,0),(45,1,45,0),(46,1,46,0),(47,1,47,0),(48,1,48,0),(49,1,49,0),(50,1,50,1),(51,1,51,1),(52,1,52,1),(53,1,53,1),(54,1,54,1),(55,1,55,1),(56,1,56,1),(57,1,57,1),(58,1,58,1),(59,1,59,1),(60,1,60,1),(61,1,61,1),(62,1,62,1),(63,1,63,1),(64,1,64,1),(65,1,65,1),(66,1,66,1),(67,1,67,1),(68,1,68,1),(69,1,69,1),(70,1,70,1),(71,1,71,1),(72,1,72,1),(73,1,73,1),(74,1,74,1),(75,2,75,1),(76,2,76,1),(77,2,77,1),(78,2,78,1),(79,2,79,1),(80,2,80,1),(81,2,81,1),(82,2,82,1),(83,2,83,1),(84,2,84,1),(85,2,85,1),(86,2,86,1),(87,2,87,1),(88,2,88,1),(89,2,89,1),(90,2,90,1),(91,2,91,1),(92,2,92,1),(93,2,93,1),(94,2,94,1),(95,2,95,1),(96,2,96,1),(97,2,97,1),(98,2,98,1),(99,2,99,1),(100,2,100,1);
-- Run this statement repeatedly to increase the data volume
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
-- Analyze the tables
ANALYZE TABLE t1, t3;Scenario 1: Optimize a multi-table JOIN query
This scenario shows how the optimizer uses rewriting to leverage indexes when an OR condition spans two tables.
Disable the optimization feature and observe the original execution plan.
-- Disable the optimization feature SET polar_optimizer_switch='or_expansion=off'; -- Analyze the statement DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;Result analysis: The execution plan shows a
Hash Joinand full table scans ont1andt3. The optimizer failed to use the indexes ont1.aandt3.c1.+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 1280 | 100.00 | NULL | | 1 | SIMPLE | t3 | NULL | ALL | idx_c1 | NULL | NULL | NULL | 6591 | 100.00 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+Enable the optimization feature and view the rewritten execution plan.
-- Enable the optimization feature SET polar_optimizer_switch='or_expansion=on'; -- Lower the threshold to trigger the optimization SET cbqt_cost_threshold=1; -- Analyze the statement DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;Result analysis: The execution plan is adjusted to use
UNION ALL. This allows the query to use the indexes ont1.aandt3.c1, achieving the same effect as a manual rewrite toUNION ALL.+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | 1 | PRIMARY | t1 | NULL | range | idx_a | idx_a | 5 | NULL | 256 | 100.00 | Using index condition; Using MRR | | 1 | PRIMARY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 6400 | 100.00 | Using join buffer (hash join) | | 2 | UNION | t3 | NULL | range | idx_c1 | idx_c1 | 4 | NULL | 128 | 100.00 | Using index condition; Using MRR | | 2 | UNION | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 66.67 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
Scenario 2: Optimize a Top-K query (OR clause)
This scenario shows how the optimizer rewrites an OR condition to UNION ALL and pushes down the LIMIT clause for a single-table ORDER BY ... LIMIT query. This process avoids large-scale sorting.
Disable the optimization feature and observe the original execution plan.
-- Disable the optimization feature SET polar_optimizer_switch='or_expansion=off'; -- Analyze the statement DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;Result analysis: The execution plan uses an
Index range scanto retrieve all rows that satisfy thec2=2orc2=0condition, and then performs aSortoperation. The execution duration is about 200 milliseconds.+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=193.389..193.393 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=193.386..193.388 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: ((t3.c2 = 2) or (t3.c2 = 0)) (actual time=0.348..187.455 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)Enable the optimization feature and view the rewritten execution plan.
-- Disable the optimization feature SET polar_optimizer_switch='or_expansion=on'; -- Lower the threshold to trigger the optimization SET cbqt_cost_threshold=1; -- Analytic statement DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;Result analysis: The execution plan changes to use
UNION ALL. It performs anIndex lookupand appliesLIMIT 5to each branch (c2=2andc2=0). The optimizer then merges the two sorted 5-row result sets, which eliminates the need for a global sort. The execution duration drops to about 1 millisecond.+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.249..1.254 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.104..0.106 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.006..0.013 rows=10 loops=1) -> Union materialize (actual time=1.246..1.249 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.336..0.571 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.333..0.566 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.215..0.431 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.214..0.427 rows=5 loops=1) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec)
Scenario 3: Optimize a Top-K query (IN-LIST)
An IN-LIST is logically equivalent to an OR clause, so it also supports Top-K optimization.
Disable the optimization feature and observe the original execution plan.
-- Disable the optimization feature SET polar_optimizer_switch='or_expansion=off'; -- Analyze the statement DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;Result analysis: The execution plan uses an
Index range scanto retrieve all rows that satisfy thet3.c2 in (2,0)condition, and then performs aSortoperation. The execution duration is about 200 milliseconds.+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=197.497..197.501 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=197.494..197.496 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: (t3.c2 in (2,0)) (actual time=0.319..191.560 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)Enable the optimization feature and view the rewritten execution plan.
-- Disable the optimization feature SET polar_optimizer_switch='or_expansion=on'; -- Lower the threshold to trigger the optimization SET cbqt_cost_threshold=1; -- Analytic statement DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;Result analysis: The execution plan changes to use
UNION ALL. It performs anIndex lookupand appliesLIMIT 5to each branch (c2=2andc2=0). The optimizer then merges the two sorted 5-row result sets, which eliminates the need for a global sort.+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.256..1.260 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.090..0.093 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.005..0.012 rows=10 loops=1) -> Union materialize (actual time=1.252..1.255 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.259..0.545 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.256..0.540 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.237..0.455 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.236..0.451 rows=5 loops=1) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
Use HINTs for manual intervention
In specific scenarios, you can use HINTs to control whether this optimization is enabled for a single query.
NO_OR_EXPAND(@QB_NAME): Forcibly disables theORexpansion optimization for the specified query block.DESC SELECT /*+NO_OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t1.a = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 640 | 19.00 | Using where | | 1 | SIMPLE | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | test2.t1.b | 64 | 100.00 | Using index; FirstMatch(t1) | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+If a
WHEREclause contains multipleORexpressions, you can useOR_EXPAND(@QB_NAME idx)to force a specific expression to be transformed into aUNION ALLstructure. Theidxparameter specifies the position of the expression in theWHEREclause, where the index starts from 0. In the example, the expression(t3.c2 = 1 OR t1.b = 2)is expanded into aUNION ALLstructure.DESC format=tree SELECT /*+OR_EXPAND(@subq1 3) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c2 = 999 OR t1.b = 999) AND t3.c1 < 5 AND t1.b = t3.c1 AND (t3.c2= 1 OR t1.b = 2)); +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: exists(select #2) (cost=64.75 rows=640) -> Table scan on t1 (cost=64.75 rows=640) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Append -> Stream results -> Filter: (t3.c2 = 1) (cost=17.45 rows=32) -> Index lookup on t3 using idx_c1 (c1=t1.b), with index condition: ((t1.b = 999) and (t3.c1 < 5)) (cost=17.45 rows=64) -> Stream results -> Filter: (t3.c1 = 2) (cost=0.51 rows=0) -> Index lookup on t3 using idx_c2_c3 (c2=999), with index condition: ((t1.b = 2) and lnnvl((t3.c2 = 1))) (cost=0.51 rows=1)OR_EXPAND(@QB_NAME): Forcibly enables theORexpansion optimization for the specified query block (qb_name).DESC SELECT /*+OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c1 = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using where; Using index | | 3 | DEPENDENT UNION | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using index condition; Using index | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+