All Products
Search
Document Center

Join order

Last Updated: Jun 18, 2021

For queries to join multiple tables, an important task of the optimizer is to determine the Join Order of tables, which has effects on the size of the intermediate result sets and the overall cost of plan execution.

To reduce the search range and memory usage for plan execution, the OceanBase optimizer uses the left-deep join tree to determine the join order. The following figure shows the shapes of a left-deep tree, right-deep tree, and bushy tree.

Tree

OceanBase Database uses the System-R dynamic programming algorithm to determine the join order. It takes into account many factors, such as the possible access paths of each table, Interesting Order, join selectivity of different tables, and join algorithms, such as Nested Loop Join, Block Based Nested Loop Join, and Sort Merge Join.

To join N tables, OceanBase Database determines the join order through though following steps:

  1. Generate an access path for every base table and reserve the path that has the lowest cost and those with the Interesting Order. If a path has an Interesting Order, its order is usable by subsequent operators.

  2. Generate all plans with a table set size of i (1 < i <= N). OceanBase Database only supports the left-deep tree. A plan with a table set size of i can comprise a plan with a table set size of i and a base table plan. Based on this policy, OceanBase Database generates all plans with a table set size of i, taking into account factors such as various types of join algorithms and the inheritance of the Interesting Order. Again, it only keeps the plan that has the lowest cost and plans with an Interesting Order.

OceanBase Database also enables you to use the hint /*+LEADING(table_name_list)*/ to control the order of a multi-table join.

In the following example,table t1 and t2 are joined first, followed by table t3. If you want to join table t2 and t3 first before the join of table t1, you can add the hint /*+LEADING(t2,t3,t1)*/ to the query. If you want to join table t1 and t3 first before the join of table t2, you can add the hint /*+LEADING(t1,t3,t2)*/.

obclient>CREATE TABLE t1(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.31 sec)

obclient>CREATE TABLE t2(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.33 sec)

obclient>CREATE TABLE t3(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.44 sec)

obclient>EXPLAIN SELECT * FROM t1,t2,t3 WHERE t1.c1 = t2.c2 AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| =======================================
|ID|OPERATOR    |NAME|EST. ROWS|COST  |
---------------------------------------
|0 |HASH JOIN   |    |98010    |926122|
|1 | TABLE SCAN |T3  |100000   |61860 |
|2 | HASH JOIN  |    |99000    |494503|
|3 |  TABLE SCAN|T1  |100000   |61860 |
|4 |  TABLE SCAN|T2  |100000   |61860 |
=======================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T2.C1 = T3.C2]), other_conds(nil)
  1 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  2 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2]), other_conds(nil)
  3 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)
  4 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)

obclient>EXPLAIN SELECT /*+LEADING(t2,t3,t1)*/* FROM t1,t2,t3 WHERE t1.c1 = t2.c2
        AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ========================================
|ID|OPERATOR    |NAME|EST. ROWS|COST   |
----------------------------------------
|0 |HASH JOIN   |    |98010    |1096613|
|1 | HASH JOIN  |    |99000    |494503 |
|2 |  TABLE SCAN|T2  |100000   |61860  |
|3 |  TABLE SCAN|T3  |100000   |61860  |
|4 | TABLE SCAN |T1  |100000   |61860  |
========================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2]), other_conds(nil)
  1 - output([T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T2.C1 = T3.C2]), other_conds(nil)
  2 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)
  3 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  4 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)

obclient>EXPLAIN SELECT /*+LEADING(t1,t3,t2)*/* FROM t1,t2,t3 WHERE t1.c1 = t2.c2 
       AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| =============================================================
|ID|OPERATOR                   |NAME|EST. ROWS  |COST       |
-------------------------------------------------------------
|0 |HASH JOIN                  |    |98010      |53098071243|
|1 | NESTED-LOOP JOIN CARTESIAN|    |10000000000|7964490204 |
|2 |  TABLE SCAN               |T1  |100000     |61860      |
|3 |  MATERIAL                 |    |100000     |236426     |
|4 |   TABLE SCAN              |T3  |100000     |61860      |
|5 | TABLE SCAN                |T2  |100000     |61860      |
=============================================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2], [T2.C1 = T3.C2]), other_conds(nil)
  1 - output([T1.C1], [T1.C2], [T3.C1], [T3.C2]), filter(nil), 
      conds(nil), nl_params_(nil)
  2 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T3.C1], [T3.C2]), filter(nil)
  4 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  5 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)