All Products
Search
Document Center

PolarDB:Transforming OR/IN expressions into UNION ALL

Last Updated:Mar 30, 2026

PolarDB for MySQL automatically rewrites eligible OR/IN expressions into a UNION ALL structure so that queries can use indexes effectively—without requiring any changes to your SQL.

How it works

A single index scan can use AND conditions to filter on index columns, but not OR conditions that span multiple tables. When an OR condition involves columns from two or more tables, the MySQL optimizer treats it as a post-join filter and falls back to a full table scan and a hash join.

For example, the following query cannot use the indexes on t1.b or t3.c1:

-- Before optimization: full table scan, high execution time
EXPLAIN ANALYZE SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.b <= 0;

-> Filter: ((t3.c1 > 98) or (t1.b <= 0)) ... (actual time=115.259..5416.434 ...)
    -> Inner hash join ...
        -> Table scan on t3 ...
        -> Hash
          -> Table scan on t1 ...

This OR query is logically equivalent to two separate queries merged with UNION ALL. When rewritten, each branch can use its respective index:

-- Manually rewritten to UNION ALL: indexes used, execution time drops significantly
EXPLAIN ANALYZE
SELECT * FROM t1, t3 WHERE t1.b <= 0
UNION ALL
SELECT * FROM t1, t3 WHERE t3.c1 > 98 AND (t1.b > 0 OR (t1.b <= 0) IS NULL);

-> Append (actual time=58.272..302.546 ...)
    ...
    -> Index range scan on t3 using idx_c1 ...

PolarDB automates this rewrite. During the plan generation phase, the optimizer evaluates whether rewriting an OR expression into a UNION ALL structure would lower the query cost. It compares both cost estimates and executes the cheaper plan. No SQL modifications are needed.

Applicability

  • Product series: Cluster Edition, Standard Edition

  • Engine version: MySQL 8.0.2, revision version 8.0.2.2.32 or later

This feature is in canary release. It is enabled by default on read-only (RO) nodes and requires additional settings on read-write (RW) nodes. To use this feature on RW nodes, submit a ticket.

Enable and configure query rewrite optimization

Control the optimization behavior using the following parameters.

Parameter names differ between the PolarDB console and a database session:

  • PolarDB console: Parameters use the loose_ prefix for compatibility with MySQL configuration files. Find and modify parameters with the loose_ prefix.

  • Database session (command line or client): Remove the loose_ prefix when using the SET command.

Parameter Level Description
loose_polar_optimizer_switch Global/Session Main switch for OR expansion. Set or_expansion=on to enable (default) or or_expansion=off to disable.
loose_cbqt_cost_threshold Global/Session Cost threshold that triggers the rewrite. The optimizer attempts a rewrite only when the estimated cost of the original query (visible in EXPLAIN) exceeds this value. Value range: 0–18,446,744,073,709,551,615. Default: 100000.
Keep loose_cbqt_cost_threshold at its default value. Setting it to 0 causes the optimizer to attempt rewriting all eligible queries, which increases optimization time for simple queries and can affect business performance.

Limitations

This feature applies only when all the following conditions are met.

General limits (apply to all transformation types):

  • The number of items in the OR clause or IN-list cannot exceed 10.

  • The query block cannot contain subqueries, GROUP BY clauses, window functions, DISTINCT clauses, or aggregate functions.

General `UNION ALL` transformation (multi-table JOINs):

  • OR clause:

    • The OR condition must involve two or more tables.

    • Each OR clause must use the field=const pattern or be able to use an index effectively.

      • field=const: field is a table column; const is a constant value.

      • Effective index usage: for example, in t1.f1=t2.f2, f1 must be a prefix of an index on t1 and f2 must be a prefix of an index on t2.

  • IN-list: Not transformed into UNION ALL because the range access method is more optimal.

Top-K transformation (single-table `ORDER BY ... LIMIT` queries):

  • OR clause: All OR conditions must apply to the same column. That column and the ORDER BY column must both be a prefix of the same index. For example, with index (c2, c3), the query must be WHERE c2 = ... OR c2 = ... ORDER BY c3.

  • IN-list: The column in the IN-list and the ORDER BY column must both be a prefix of the same index.

Example: Verify the optimization effect

Data preparation

-- Create and populate table t1
CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `idx_a` (`a`)
) ENGINE=InnoDB;

INSERT INTO `t1` VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
-- Run repeatedly to increase data volume
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

-- Create and populate table t3
CREATE TABLE `t3` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  `c4` int(11) DEFAULT NULL,
  KEY `idx_c1`(`c1`),
  KEY `idx_c2_c3` (`c2`,`c3`)
) ENGINE=InnoDB;

INSERT INTO `t3` VALUES (1,0,1,0),(2,0,2,0),(3,0,3,0),(4,0,4,0),(5,0,5,0),(6,0,6,0),(7,0,7,0),(8,0,8,0),(9,0,9,0),(10,0,10,0),(11,0,11,0),(12,0,12,0),(13,0,13,0),(14,0,14,0),(15,0,15,0),(16,0,16,0),(17,0,17,0),(18,0,18,0),(19,0,19,0),(20,0,20,0),(21,0,21,0),(22,0,22,0),(23,0,23,0),(24,0,24,0),(25,1,25,0),(26,1,26,0),(27,1,27,0),(28,1,28,0),(29,1,29,0),(30,1,30,0),(31,1,31,0),(32,1,32,0),(33,1,33,0),(34,1,34,0),(35,1,35,0),(36,1,36,0),(37,1,37,0),(38,1,38,0),(39,1,39,0),(40,1,40,0),(41,1,41,0),(42,1,42,0),(43,1,43,0),(44,1,44,0),(45,1,45,0),(46,1,46,0),(47,1,47,0),(48,1,48,0),(49,1,49,0),(50,1,50,1),(51,1,51,1),(52,1,52,1),(53,1,53,1),(54,1,54,1),(55,1,55,1),(56,1,56,1),(57,1,57,1),(58,1,58,1),(59,1,59,1),(60,1,60,1),(61,1,61,1),(62,1,62,1),(63,1,63,1),(64,1,64,1),(65,1,65,1),(66,1,66,1),(67,1,67,1),(68,1,68,1),(69,1,69,1),(70,1,70,1),(71,1,71,1),(72,1,72,1),(73,1,73,1),(74,1,74,1),(75,2,75,1),(76,2,76,1),(77,2,77,1),(78,2,78,1),(79,2,79,1),(80,2,80,1),(81,2,81,1),(82,2,82,1),(83,2,83,1),(84,2,84,1),(85,2,85,1),(86,2,86,1),(87,2,87,1),(88,2,88,1),(89,2,89,1),(90,2,90,1),(91,2,91,1),(92,2,92,1),(93,2,93,1),(94,2,94,1),(95,2,95,1),(96,2,96,1),(97,2,97,1),(98,2,98,1),(99,2,99,1),(100,2,100,1);

-- Run repeatedly to increase data volume
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;

-- Analyze the tables
ANALYZE TABLE t1, t3;

Scenario 1: Optimize a multi-table JOIN query

This scenario shows how the optimizer rewrites an OR condition that spans two tables to leverage indexes.

Without optimization: The execution plan falls back to a hash join with full table scans on both t1 and t3. Neither idx_a on t1.a nor idx_c1 on t3.c1 is used.

SET polar_optimizer_switch='or_expansion=off';
DESC SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.a < 5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | idx_a         | NULL | NULL    | NULL | 1280 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t3    | NULL       | ALL  | idx_c1        | NULL | NULL    | NULL | 6591 |   100.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+

With optimization: The execution plan switches to UNION ALL. Each branch uses its respective index (idx_a on t1.a and idx_c1 on t3.c1)—the same result as a manual UNION ALL rewrite.

SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1;  -- Lower the threshold to trigger the optimization
DESC SELECT * FROM t1, t3 WHERE t3.c1 > 98 OR t1.a < 5;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
|  1 | PRIMARY     | t1    | NULL       | range | idx_a         | idx_a  | 5       | NULL |  256 |   100.00 | Using index condition; Using MRR           |
|  1 | PRIMARY     | t3    | NULL       | ALL   | NULL          | NULL   | NULL    | NULL | 6400 |   100.00 | Using join buffer (hash join)              |
|  2 | UNION       | t3    | NULL       | range | idx_c1        | idx_c1 | 4       | NULL |  128 |   100.00 | Using index condition; Using MRR           |
|  2 | UNION       | t1    | NULL       | ALL   | NULL          | NULL   | NULL    | NULL |  640 |    66.67 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+

MRR (Multi-Range Read) in the output indicates a MySQL optimization that batches random disk reads to improve I/O efficiency.

Scenario 2: Optimize a Top-K query (OR clause)

This scenario shows how the optimizer rewrites an OR condition into UNION ALL and pushes the LIMIT clause down to each branch, eliminating a large-scale global sort.

Without optimization: The plan uses an index range scan to retrieve all rows matching c2=2 or c2=0, then sorts all rows before applying the limit. Execution time is about 200 milliseconds.

SET polar_optimizer_switch='or_expansion=off';
DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2 = 0) ORDER BY t3.c3 DESC LIMIT 5;
| -> Limit: 5 row(s)  (actual time=193.389..193.393 rows=5 loops=1)
    -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk  (cost=641.82 rows=3200) (actual time=193.386..193.388 rows=5 loops=1)
        -> Index range scan on t3 using idx_c2_c3, with index condition: ((t3.c2 = 2) or (t3.c2 = 0))  (actual time=0.348..187.455 rows=3200 loops=1)
|
1 row in set (0.20 sec)

With optimization: The plan rewrites to UNION ALL. It applies LIMIT 5 to each branch (c2=2 and c2=0) separately using an index lookup, then merges the two 5-row result sets. No global sort is needed. Execution time drops to about 1 millisecond.

SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1;
DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2 = 0) ORDER BY t3.c3 DESC LIMIT 5;
| -> Limit: 5 row(s)  (actual time=1.249..1.254 rows=5 loops=1)
    -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk  (actual time=0.104..0.106 rows=5 loops=1)
        -> Table scan on derived_1_2  (actual time=0.006..0.013 rows=10 loops=1)
            -> Union materialize  (actual time=1.246..1.249 rows=5 loops=1)
                -> Limit: 5 row(s)  (actual time=0.336..0.571 rows=5 loops=1)
                    -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards)  (cost=0.00 rows=5) (actual time=0.333..0.566 rows=5 loops=1)
                -> Limit: 5 row(s)  (actual time=0.215..0.431 rows=5 loops=1)
                    -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards)  (cost=0.00 rows=5) (actual time=0.214..0.427 rows=5 loops=1)
 |
1 row in set (0.01 sec)

Scenario 3: Optimize a Top-K query (IN-list)

An IN-list is logically equivalent to an OR clause and supports the same Top-K optimization.

Without optimization: The plan uses an index range scan across all rows matching c2 IN (2, 0), then sorts before applying the limit. Execution time is about 200 milliseconds.

SET polar_optimizer_switch='or_expansion=off';
DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;
| -> Limit: 5 row(s)  (actual time=197.497..197.501 rows=5 loops=1)
    -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk  (cost=641.82 rows=3200) (actual time=197.494..197.496 rows=5 loops=1)
        -> Index range scan on t3 using idx_c2_c3, with index condition: (t3.c2 in (2,0))  (actual time=0.319..191.560 rows=3200 loops=1)
 |
1 row in set (0.20 sec)

With optimization: The plan applies LIMIT 5 per branch and merges the results, eliminating the global sort. Execution time drops to near 0 milliseconds.

SET polar_optimizer_switch='or_expansion=on';
SET cbqt_cost_threshold=1;
DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;
| -> Limit: 5 row(s)  (actual time=1.256..1.260 rows=5 loops=1)
    -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk  (actual time=0.090..0.093 rows=5 loops=1)
        -> Table scan on derived_1_2  (actual time=0.005..0.012 rows=10 loops=1)
            -> Union materialize  (actual time=1.252..1.255 rows=5 loops=1)
                -> Limit: 5 row(s)  (actual time=0.259..0.545 rows=5 loops=1)
                    -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards)  (cost=0.00 rows=5) (actual time=0.256..0.540 rows=5 loops=1)
                -> Limit: 5 row(s)  (actual time=0.237..0.455 rows=5 loops=1)
                    -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards)  (cost=0.00 rows=5) (actual time=0.236..0.451 rows=5 loops=1)
|
1 row in set (0.00 sec)

Use hints for manual control

The optimizer applies OR expansion automatically when the estimated query cost exceeds cbqt_cost_threshold. Use optimizer hints to override this behavior for a specific query without changing global or session parameters.

The following hints are available:

Hint Effect
NO_OR_EXPAND(@QB_NAME) Force-disables OR expansion for the specified query block
OR_EXPAND(@QB_NAME) Force-enables OR expansion for the specified query block
OR_EXPAND(@QB_NAME idx) Force-expands only the OR expression at 0-based position idx in the WHERE clause

Disable OR expansion for a query block:

DESC SELECT /*+ NO_OR_EXPAND(@subq1) */ * FROM t1
WHERE EXISTS (
  SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
  WHERE (t1.a = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1
);

+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key    | key_len | ref        | rows | filtered | Extra                       |
+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | idx_a         | NULL   | NULL    | NULL       |  640 |    19.00 | Using where                 |
|  1 | SIMPLE      | t3    | NULL       | ref  | idx_c1        | idx_c1 | 4       | test2.t1.b |   64 |   100.00 | Using index; FirstMatch(t1) |
+----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+

Enable OR expansion for a query block:

DESC SELECT /*+ OR_EXPAND(@subq1) */ * FROM t1
WHERE EXISTS (
  SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
  WHERE (t3.c1 = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1
);

+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+
| id | select_type        | table | partitions | type | possible_keys | key    | key_len | ref   | rows | filtered | Extra                              |
+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+
|  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL   | NULL    | NULL  |  640 |   100.00 | Using where                        |
|  2 | DEPENDENT SUBQUERY | t3    | NULL       | ref  | idx_c1        | idx_c1 | 4       | const |   64 |   100.00 | Using where; Using index           |
|  3 | DEPENDENT UNION    | t3    | NULL       | ref  | idx_c1        | idx_c1 | 4       | const |   64 |   100.00 | Using index condition; Using index |
+----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+

Expand a single `OR` expression when a `WHERE` clause contains multiple:

Use OR_EXPAND(@QB_NAME idx) when you want to expand only one expression. The idx parameter is the 0-based position of the target expression in the WHERE clause. In the following example, OR_EXPAND(@subq1 3) expands the expression at position 3, which is (t3.c2 = 1 OR t1.b = 2):

DESC FORMAT=TREE SELECT /*+ OR_EXPAND(@subq1 3) */ * FROM t1
WHERE EXISTS (
  SELECT /*+ QB_NAME(subq1) */ 1 FROM t3
  WHERE (t3.c2 = 999 OR t1.b = 999) AND t3.c1 < 5 AND t1.b = t3.c1 AND (t3.c2 = 1 OR t1.b = 2)
);

| -> Filter: exists(select #2)  (cost=64.75 rows=640)
    -> Table scan on t1  (cost=64.75 rows=640)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Append
                -> Stream results
                    -> Filter: (t3.c2 = 1)  (cost=17.45 rows=32)
                        -> Index lookup on t3 using idx_c1 (c1=t1.b), with index condition: ((t1.b = 999) and (t3.c1 < 5))  (cost=17.45 rows=64)
                -> Stream results
                    -> Filter: (t3.c1 = 2)  (cost=0.51 rows=0)
                        -> Index lookup on t3 using idx_c2_c3 (c2=999), with index condition: ((t1.b = 2) and lnnvl((t3.c2 = 1)))  (cost=0.51 rows=1)