PolarDB for MySQL automatically rewrites eligible OR/IN expressions into a UNION ALL structure so that queries can use indexes effectively—without requiring any changes to your SQL.
How it works
A single index scan can use AND conditions to filter on index columns, but not OR conditions that span multiple tables. When an OR condition involves columns from two or more tables, the MySQL optimizer treats it as a post-join filter and falls back to a full table scan and a hash join.
For example, the following query cannot use the indexes on t1.b or t3.c1:
-- Before optimization: full table scan, high execution time
EXPLAIN ANALYZE SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.b <= 0;
-> Filter: ((t3.c1 > 98) or (t1.b <= 0)) ... (actual time=115.259..5416.434 ...)
-> Inner hash join ...
-> Table scan on t3 ...
-> Hash
-> Table scan on t1 ...
This OR query is logically equivalent to two separate queries merged with UNION ALL. When rewritten, each branch can use its respective index:
-- Manually rewritten to UNION ALL: indexes used, execution time drops significantly
EXPLAIN ANALYZE
SELECT * FROM t1, t3 WHERE t1.b <= 0
UNION ALL
SELECT * FROM t1, t3 WHERE t3.c1 > 98 AND (t1.b > 0 OR (t1.b <= 0) IS NULL);
-> Append (actual time=58.272..302.546 ...)
...
-> Index range scan on t3 using idx_c1 ...
PolarDB automates this rewrite. During the plan generation phase, the optimizer evaluates whether rewriting an OR expression into a UNION ALL structure would lower the query cost. It compares both cost estimates and executes the cheaper plan. No SQL modifications are needed.
Applicability
-
Product series: Cluster Edition, Standard Edition
-
Engine version: MySQL 8.0.2, revision version 8.0.2.2.32 or later
This feature is in canary release. It is enabled by default on read-only (RO) nodes and requires additional settings on read-write (RW) nodes. To use this feature on RW nodes, submit a ticket.
Enable and configure query rewrite optimization
Control the optimization behavior using the following parameters.
Parameter names differ between the PolarDB console and a database session:
-
PolarDB console: Parameters use the
loose_prefix for compatibility with MySQL configuration files. Find and modify parameters with theloose_prefix. -
Database session (command line or client): Remove the
loose_prefix when using theSETcommand.
| Parameter | Level | Description |
|---|---|---|
loose_polar_optimizer_switch |
Global/Session | Main switch for OR expansion. Set or_expansion=on to enable (default) or or_expansion=off to disable. |
loose_cbqt_cost_threshold |
Global/Session | Cost threshold that triggers the rewrite. The optimizer attempts a rewrite only when the estimated cost of the original query (visible in EXPLAIN) exceeds this value. Value range: 0–18,446,744,073,709,551,615. Default: 100000. |
Keeploose_cbqt_cost_thresholdat its default value. Setting it to0causes the optimizer to attempt rewriting all eligible queries, which increases optimization time for simple queries and can affect business performance.
Limitations
This feature applies only when all the following conditions are met.
General limits (apply to all transformation types):
-
The number of items in the
ORclause orIN-list cannot exceed 10. -
The query block cannot contain subqueries,
GROUP BYclauses, window functions,DISTINCTclauses, or aggregate functions.
General `UNION ALL` transformation (multi-table JOINs):
-
ORclause:-
The
ORcondition must involve two or more tables. -
Each
ORclause must use thefield=constpattern or be able to use an index effectively.-
field=const:fieldis a table column;constis a constant value. -
Effective index usage: for example, in
t1.f1=t2.f2,f1must be a prefix of an index ont1andf2must be a prefix of an index ont2.
-
-
-
IN-list: Not transformed intoUNION ALLbecause therangeaccess method is more optimal.
Top-K transformation (single-table `ORDER BY ... LIMIT` queries):
-
ORclause: AllORconditions must apply to the same column. That column and theORDER BYcolumn must both be a prefix of the same index. For example, with index(c2, c3), the query must beWHERE c2 = ... OR c2 = ... ORDER BY c3. -
IN-list: The column in theIN-list and theORDER BYcolumn must both be a prefix of the same index.
Example: Verify the optimization effect
Data preparation
-- Create and populate table t1
CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB;
INSERT INTO `t1` VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
-- Run repeatedly to increase data volume
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
-- Create and populate table t3
CREATE TABLE `t3` (
`c1` int(11) NOT NULL,
`c2` int(11) DEFAULT NULL,
`c3` int(11) DEFAULT NULL,
`c4` int(11) DEFAULT NULL,
KEY `idx_c1`(`c1`),
KEY `idx_c2_c3` (`c2`,`c3`)
) ENGINE=InnoDB;
INSERT INTO `t3` VALUES (1,0,1,0),(2,0,2,0),(3,0,3,0),(4,0,4,0),(5,0,5,0),(6,0,6,0),(7,0,7,0),(8,0,8,0),(9,0,9,0),(10,0,10,0),(11,0,11,0),(12,0,12,0),(13,0,13,0),(14,0,14,0),(15,0,15,0),(16,0,16,0),(17,0,17,0),(18,0,18,0),(19,0,19,0),(20,0,20,0),(21,0,21,0),(22,0,22,0),(23,0,23,0),(24,0,24,0),(25,1,25,0),(26,1,26,0),(27,1,27,0),(28,1,28,0),(29,1,29,0),(30,1,30,0),(31,1,31,0),(32,1,32,0),(33,1,33,0),(34,1,34,0),(35,1,35,0),(36,1,36,0),(37,1,37,0),(38,1,38,0),(39,1,39,0),(40,1,40,0),(41,1,41,0),(42,1,42,0),(43,1,43,0),(44,1,44,0),(45,1,45,0),(46,1,46,0),(47,1,47,0),(48,1,48,0),(49,1,49,0),(50,1,50,1),(51,1,51,1),(52,1,52,1),(53,1,53,1),(54,1,54,1),(55,1,55,1),(56,1,56,1),(57,1,57,1),(58,1,58,1),(59,1,59,1),(60,1,60,1),(61,1,61,1),(62,1,62,1),(63,1,63,1),(64,1,64,1),(65,1,65,1),(66,1,66,1),(67,1,67,1),(68,1,68,1),(69,1,69,1),(70,1,70,1),(71,1,71,1),(72,1,72,1),(73,1,73,1),(74,1,74,1),(75,2,75,1),(76,2,76,1),(77,2,77,1),(78,2,78,1),(79,2,79,1),(80,2,80,1),(81,2,81,1),(82,2,82,1),(83,2,83,1),(84,2,84,1),(85,2,85,1),(86,2,86,1),(87,2,87,1),(88,2,88,1),(89,2,89,1),(90,2,90,1),(91,2,91,1),(92,2,92,1),(93,2,93,1),(94,2,94,1),(95,2,95,1),(96,2,96,1),(97,2,97,1),(98,2,98,1),(99,2,99,1),(100,2,100,1);
-- Run repeatedly to increase data volume
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
-- Analyze the tables
ANALYZE TABLE t1, t3;
Scenario 1: Optimize a multi-table JOIN query
This scenario shows how the optimizer rewrites an OR condition that spans two tables to leverage indexes.
Without optimization: The execution plan falls back to a hash join with full table scans on both t1 and t3. Neither idx_a on t1.a nor idx_c1 on t3.c1 is used.
SET polar_optimizer_switch='or_expansion=off';
DESC SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.a < 5;+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 1280 | 100.00 | NULL |
| 1 | SIMPLE | t3 | NULL | ALL | idx_c1 | NULL | NULL | NULL | 6591 | 100.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
With optimization: The execution plan switches to UNION ALL. Each branch uses its respective index (idx_a on t1.a and idx_c1 on t3.c1)—the same result as a manual UNION ALL rewrite.
SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1; -- Lower the threshold to trigger the optimization
DESC SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.a < 5;+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
| 1 | PRIMARY | t1 | NULL | range | idx_a | idx_a | 5 | NULL | 256 | 100.00 | Using index condition; Using MRR |
| 1 | PRIMARY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 6400 | 100.00 | Using join buffer (hash join) |
| 2 | UNION | t3 | NULL | range | idx_c1 | idx_c1 | 4 | NULL | 128 | 100.00 | Using index condition; Using MRR |
| 2 | UNION | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 66.67 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
MRR (Multi-Range Read) in the output indicates a MySQL optimization that batches random disk reads to improve I/O efficiency.
Scenario 2: Optimize a Top-K query (OR clause)
This scenario shows how the optimizer rewrites an OR condition into UNION ALL and pushes the LIMIT clause down to each branch, eliminating a large-scale global sort.
Without optimization: The plan uses an index range scan to retrieve all rows matching c2=2 or c2=0, then sorts all rows before applying the limit. Execution time is about 200 milliseconds.
SET polar_optimizer_switch='or_expansion=off';
DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2 = 0) ORDER BY t3.c3 DESC LIMIT 5;| -> Limit: 5 row(s) (actual time=193.389..193.393 rows=5 loops=1)
-> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=193.386..193.388 rows=5 loops=1)
-> Index range scan on t3 using idx_c2_c3, with index condition: ((t3.c2 = 2) or (t3.c2 = 0)) (actual time=0.348..187.455 rows=3200 loops=1)
|
1 row in set (0.20 sec)
With optimization: The plan rewrites to UNION ALL. It applies LIMIT 5 to each branch (c2=2 and c2=0) separately using an index lookup, then merges the two 5-row result sets. No global sort is needed. Execution time drops to about 1 millisecond.
SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1;
DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2 = 0) ORDER BY t3.c3 DESC LIMIT 5;| -> Limit: 5 row(s) (actual time=1.249..1.254 rows=5 loops=1)
-> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.104..0.106 rows=5 loops=1)
-> Table scan on derived_1_2 (actual time=0.006..0.013 rows=10 loops=1)
-> Union materialize (actual time=1.246..1.249 rows=5 loops=1)
-> Limit: 5 row(s) (actual time=0.336..0.571 rows=5 loops=1)
-> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.333..0.566 rows=5 loops=1)
-> Limit: 5 row(s) (actual time=0.215..0.431 rows=5 loops=1)
-> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.214..0.427 rows=5 loops=1)
|
1 row in set (0.01 sec)
Scenario 3: Optimize a Top-K query (IN-list)
An IN-list is logically equivalent to an OR clause and supports the same Top-K optimization.
Without optimization: The plan uses an index range scan across all rows matching c2 IN (2, 0), then sorts before applying the limit. Execution time is about 200 milliseconds.
SET polar_optimizer_switch='or_expansion=off';
DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;| -> Limit: 5 row(s) (actual time=197.497..197.501 rows=5 loops=1)
-> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=197.494..197.496 rows=5 loops=1)
-> Index range scan on t3 using idx_c2_c3, with index condition: (t3.c2 in (2,0)) (actual time=0.319..191.560 rows=3200 loops=1)
|
1 row in set (0.20 sec)
With optimization: The plan applies LIMIT 5 per branch and merges the results, eliminating the global sort. Execution time drops to near 0 milliseconds.
SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1;
DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;| -> Limit: 5 row(s) (actual time=1.256..1.260 rows=5 loops=1)
-> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.090..0.093 rows=5 loops=1)
-> Table scan on derived_1_2 (actual time=0.005..0.012 rows=10 loops=1)
-> Union materialize (actual time=1.252..1.255 rows=5 loops=1)
-> Limit: 5 row(s) (actual time=0.259..0.545 rows=5 loops=1)
-> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.256..0.540 rows=5 loops=1)
-> Limit: 5 row(s) (actual time=0.237..0.455 rows=5 loops=1)
-> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.236..0.451 rows=5 loops=1)
|
1 row in set (0.00 sec)
Use hints for manual control
The optimizer applies OR expansion automatically when the estimated query cost exceeds cbqt_cost_threshold. Use optimizer hints to override this behavior for a specific query without changing global or session parameters.
The following hints are available:
| Hint | Effect |
|---|---|
NO_OR_EXPAND(@QB_NAME) |
Force-disables OR expansion for the specified query block |
OR_EXPAND(@QB_NAME) |
Force-enables OR expansion for the specified query block |
OR_EXPAND(@QB_NAME idx) |
Force-expands only the OR expression at 0-based position idx in the WHERE clause |
Disable OR expansion for a query block:
DESC SELECT /*+ NO_OR_EXPAND(@subq1) */ * FROM t1
WHERE EXISTS (
SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
WHERE (t1.a = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1
);
+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 640 | 19.00 | Using where |
| 1 | SIMPLE | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | test2.t1.b | 64 | 100.00 | Using index; FirstMatch(t1) |
+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+
Enable OR expansion for a query block:
DESC SELECT /*+ OR_EXPAND(@subq1) */ * FROM t1
WHERE EXISTS (
SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
WHERE (t3.c1 = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1
);
+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using where; Using index |
| 3 | DEPENDENT UNION | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using index condition; Using index |
+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+
Expand a single `OR` expression when a `WHERE` clause contains multiple:
Use OR_EXPAND(@QB_NAME idx) when you want to expand only one expression. The idx parameter is the 0-based position of the target expression in the WHERE clause. In the following example, OR_EXPAND(@subq1 3) expands the expression at position 3, which is (t3.c2 = 1 OR t1.b = 2):
DESC FORMAT=TREE SELECT /*+ OR_EXPAND(@subq1 3) */ * FROM t1
WHERE EXISTS (
SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
WHERE (t3.c2 = 999 OR t1.b = 999) AND t3.c1 < 5 AND t1.b = t3.c1 AND (t3.c2 = 1 OR t1.b = 2)
);
| -> Filter: exists(select #2) (cost=64.75 rows=640)
-> Table scan on t1 (cost=64.75 rows=640)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s)
-> Append
-> Stream results
-> Filter: (t3.c2 = 1) (cost=17.45 rows=32)
-> Index lookup on t3 using idx_c1 (c1=t1.b), with index condition: ((t1.b = 999) and (t3.c1 < 5)) (cost=17.45 rows=64)
-> Stream results
-> Filter: (t3.c1 = 2) (cost=0.51 rows=0)
-> Index lookup on t3 using idx_c2_c3 (c2=999), with index condition: ((t1.b = 2) and lnnvl((t3.c2 = 1))) (cost=0.51 rows=1)