All Products
Search
Document Center

Cost-based query rewrite

Last Updated: Jun 18, 2021

OceanBase Database V2.2.76 supports only one type of cost-based query rewrite: OR-Expansion.

It will support advanced cost-based rewriting rules for database administration such as complex view merge and window function rewrite in later versions.

OR-Expansion

OR-Expansion rewrites a query into several subqueries and the result sets of these subqueries are combined into the same result set through the UNION operator. This provides more room for the optimization of each subquery. However, the rewrite also results in the execution of multiple subqueries, which requires an analysis based on the cost.

Purposes of the OR-Expansion rewrite:

  • To allow subqueries to use different indexes to speed up the query.

    In the following example, Q1 is rewritten to Q2, where LNNVL(t1.a = 1), the predicate in Q2 ensures that the two subqueries do not generate duplicate results. Before the rewrite, Q1 generally accesses the primary table. After the rewrite, if indexes (a) and (b) are created on t1, subqueries of Q2 are allowed to use different indexes for data access.

    Q1: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    Q2: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 UNION ALL SELECT * FROM t1.b = 1 
           AND LNNVL(t1.a = 1);

    Example:

    obclient>CREATE TABLE t1(a INT, b INT, c INT, d INT, e INT, INDEX IDX_a(a), 
             INDEX IDX_b(b));
    Query OK, 0 rows affected (0.17 sec)
    
    /*Without OR-EXPANSION rewrite, primary access path is the only option for the query*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    +--------------------------------------------------------------+
    | Query Plan                                                                         |
    +--------------------------------------------------------------+
    | ===================================
    |ID|OPERATOR  |NAME|EST. ROWS|COST|
    -----------------------------------
    |0 |TABLE SCAN|t1  |4        |649 |
    ===================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
    
    /*After the rewrite, different index access paths are available for each subquery*/
    obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.b = 1;
    +------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +------------------------------------------------------------------------+
    | =========================================
    |ID|OPERATOR   |NAME     |EST. ROWS|COST|
    -----------------------------------------
    |0 |UNION ALL  |         |3        |190 |
    |1 | TABLE SCAN|t1(idx_a)|2        |94  |
    |2 | TABLE SCAN|t1(idx_b)|1        |95  |
    =========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
      1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
      2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
          access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02
  • To allow subqueries to use different join algorithms to speed up the query and to avoid the use of CARTESIAN JOIN.

    In the following example, Q1 is rewritten to Q2. For Q1, Nested Loop Join (Cartesian product) is the only join option available. After the rewrite, Nested Loop Join, Hash Join, and Merge Join are optional for each subquery, leaving more room for optimization.

    Q1: 
    obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
    
    Q2: 
    obclient>SELECT * FROM t1, t2 WHERE t1.a = t2.a UNION ALL
         SELECT * FROM t1, t2 WHERE t1.b = t2.b AND LNNVL(t1.a = t2.a);

    Example:

    obclient> CREATE TABLE t1(a INT, b INT);
    Query OK, 0 rows affected (0.17 sec)
    
    obclient> CREATE TABLE t2(a INT, b INT);
    Query OK, 0 rows affected (0.13 sec)
    
    /*Without the rewrite, Nested Loop Join is the only choice*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1, t2 
           WHERE t1.a = t2.a OR t1.b = t2.b;
    +--------------------------------------------------------------------------+
    | Query Plan                                                                                          |
    +--------------------------------------------------------------------------+
    | ===========================================
    |ID|OPERATOR        |NAME|EST. ROWS|COST  |
    -------------------------------------------
    |0 |NESTED-LOOP JOIN|    |3957     |585457|
    |1 | TABLE SCAN     |t1  |1000     |499   |
    |2 | TABLE SCAN     |t2  |4        |583   |
    ===========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          conds(nil), nl_params_([t1.a], [t1.b])
      1 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
          access([t2.a], [t2.b]), partitions(p0)
    
    /*After the rewrite, every subquery can use HASH JOIN*/
    obclient>  EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.a OR t1.b = t2.b;
    +--------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +--------------------------------------------------------------------------+
    |ID|OPERATOR    |NAME|EST. ROWS|COST|
    -------------------------------------
    |0 |UNION ALL   |    |2970     |9105|
    |1 | HASH JOIN  |    |1980     |3997|
    |2 |  TABLE SCAN|t1  |1000     |499 |
    |3 |  TABLE SCAN|t2  |1000     |499 |
    |4 | HASH JOIN  |    |990      |3659|
    |5 |  TABLE SCAN|t1  |1000     |499 |
    |6 |  TABLE SCAN|t2  |1000     |499 |
    =====================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
      1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          equal_conds([t1.a = t2.a]), other_conds(nil)
      2 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      3 - output([t2.a], [t2.b]), filter(nil),
          access([t2.a], [t2.b]), partitions(p0)
      4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
          equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
      5 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
      6 - output([t2.a], [t2.b]), filter(nil),
          access([t2.a], [t2.b]), partitions(p0)
  • To allow each subquery to separately cancel the sorting to retrieve the TOP K results faster.

    In the following example, Q1 is rewritten to Q2. For Q1, the only way of execution is to find the rows that fit the condition, sort them, and then retrieve the TOP 10 results. For Q2, assume that it has two subqueries. If indexes a and b are available, each subquery of Q2 can remove the sort operator by using an index and retrieve the TOP 10 results. Finally, sort the 20 rows retrieved by these subqueries to retrieve the final TOP 10 rows.

    Q1: 
    obclient>SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 ORDER BY b LIMIT 10;
    
    Q2: 
    obclient>SELECT * FROM  
        (SELECT * FROM  t1 WHERE t1.a = 1 ORDER BY b LIMIT 10 UNION ALL
         SELECT * FROM  t1 WHERE t1.a = 2 ORDER BY b LIMIT 10) AS TEMP
        ORDER BY temp.b LIMIT 10;

    Example:

    obclient> CREATE TABLE t1(a INT, b INT, INDEX IDX_a(a, b));
    Query OK, 0 rows affected (0.20 sec)
    
    /*Before the rewrite, data is sorted to retrieve the TOP K results*/
    obclient> EXPLAIN SELECT/*+NO_REWRITE()*/ * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
            ORDER BY b LIMIT 10;
    +-------------------------------------------------------------------------+
    | Query Plan                                                                                         |
    +-------------------------------------------------------------------------+
    | ==========================================
    |ID|OPERATOR    |NAME     |EST. ROWS|COST|
    ------------------------------------------
    |0 |LIMIT       |         |4        |77  |
    |1 | TOP-N SORT |         |4        |76  |
    |2 |  TABLE SCAN|t1(idx_a)|4        |73  |
    ==========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
      1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
      2 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0)
    
    /*After the rewrite, the subqueries remove the SORT operator and eventually retrieve the TOP K results*/
    obclient>EXPLAIN SELECT * FROM t1 WHERE t1.a = 1 OR t1.a = 2 
            ORDER BY b LIMIT 10;
    +-------------------------------------------------------------------------+
    | Query Plan                                                                                          |
    +-------------------------------------------------------------------------+
    | ===========================================
    |ID|OPERATOR     |NAME     |EST. ROWS|COST|
    -------------------------------------------
    |0 |LIMIT        |         |3        |76  |
    |1 | TOP-N SORT  |         |3        |76  |
    |2 |  UNION ALL  |         |3        |74  |
    |3 |   TABLE SCAN|t1(idx_a)|2        |37  |
    |4 |   TABLE SCAN|t1(idx_a)|1        |37  |
    ===========================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
      1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
      2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
      3 - output([t1.a], [t1.b]), filter(nil),
          access([t1.a], [t1.b]), partitions(p0),
          limit(10), offset(nil)
      4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
          access([t1.a], [t1.b]), partitions(p0),
          limit(10), offset(nil)