All Products
Search
Document Center

JOIN

Last Updated: Jun 18, 2021

the JOIN operator is used to join data in two tables based on specified conditions.

The JOIN operations are classified into three types: Inner Join, Outer Join, and Semi/Anti Join.

JOIN operators supported in OceanBaseNESTED LOOP JOIN (NLJ),MERGE JOIN (MJ) and HASH JOIN (HJ).

NESTED LOOP JOIN (NLJ)

In the following example, queries Q1 and Q2 use NLJ based on the hint. The No. 0 operator is an NLJ operator, which has two subnodes, the No. 1 and No. 2 operators. Its execution logic is:

  1. Read a row from the No. 1 operator.

  2. Run the No. 2 operator and read all rows.

  3. Join the result sets of the No. 1 and No. 2 operators, and apply the filter condition to export the results.

  4. Repeat Step 1 until the No. 1 operator stops iteration.

obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected (0.12 sec)

obclient>CREATE TABLE t2 (d1 INT, d2 INT, PRIMARY KEY (d1));
Query OK, 0 rows affected (0.12 sec)

Q1: 
obclient>EXPLAIN SELECT /*+USE_NL(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 
     WHERE c2 = d2\G;
*************************** 1. row ***************************
Query Plan:
===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |9782     |411238|
|1 | TABLE SCAN     |T1  |999      |647   |
|2 | MATERIAL       |    |999      |1519  |
|3 |  TABLE SCAN    |T2  |999      |647   |
===========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds([T1.C2 = T2.D2]), nl_params_(nil)
  1 - output([T1.C2]), filter(nil),
      access([T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil)
  3 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

The MATERIAL operator materializes the data output of subsequent operators. For more information, see MATERIAL.

Q2: 
obclient>EXPLAIN SELECT /*+USE_NL(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 
      WHERE c1 = d1\G;
*************************** 1. row ***************************
Query Plan:
| ==========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST |
------------------------------------------
|0 |NESTED-LOOP JOIN|    |990      |37346|
|1 | TABLE SCAN     |T1  |999      |669  |
|2 | TABLE GET      |T2  |1        |36   |
==========================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds(nil), nl_params_([T1.C1])
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display shows in detail the output information of the NESTED LOOP JOIN operator.

Field

Description

output

The output expression of the operator.

filter

The filter conditions of the operator.

In this example, the filter is set to nil because no filter condition is configured for the NLJ operator.

conds

Join conditions.

For example, t1.c2 = t2.d2 in query Q1.

nl_params_

Pushdown parameters based on the data of the table on the left of the NLJ.

For example, t1.c1 in query Q2.

For the iteration of each row of the table on the left, NLJ creates a parameter based on nl_params and, based on this parameter and the original join condition c1= d1, creates a filter condition applicable to the table on the right: d1=?. The filter condition is pushed down to the table on the right, extracting the query range of the index. The query range is the range of data to be scanned. In query Q2, the No. 2 operator is a TABLE GET operator because of the pushdown condition d1=?.

In the following example, no join condition is specified in query Q3, the No. 0 operator is displayed as a NESTED LOOP JOIN CARTESIAN, which is logically an NLJ operator that does not have any join conditions.

Q3: 
obclient>EXPLAIN SELECT t1.c2 + t2.d2 FROM t1, t2\G;
*************************** 1. row ***************************
Query Plan:
| =====================================================
|ID|OPERATOR                  |NAME|EST. ROWS|COST  |
-----------------------------------------------------
|0 |NESTED-LOOP JOIN CARTESIAN|    |998001   |747480|
|1 | TABLE SCAN               |T1  |999      |647   |
|2 | MATERIAL                 |    |999      |1519  |
|3 |  TABLE SCAN              |T2  |999      |647   |
=====================================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      conds(nil), nl_params_(nil)
  1 - output([T1.C2]), filter(nil),
      access([T1.C2]), partitions(p0)
  2 - output([T2.D2]), filter(nil)
  3 - output([T2.D2]), filter(nil),
      access([T2.D2]), partitions(p0)

MERGE JOIN (MJ)

In the following example, query Q4 uses MJ by adding the hint USE_MERGE. The No. 0 operator is an MJ operator and has two subnodes: the No. 1 and No. 3 operators. The operator merges data of the left and right subnodes, and therefore requires that the data of the two subnodes is ordered in relation to the JOIN column.

Take query Q4 for example. The join condition is t1.c2 = t2.d2, which means sorting data of t1 by c2, and data of t2 by d2. In query Q4, the output of the No. 2 operator is unordered, and the output of the No. 4 operator is sorted by d2. Both of them do not meet the requirement of MERGE JOIN. Therefore, the No. 1 and No. 3 operators are assigned for sorting.

Q4: 
obclient>EXPLAIN SELECT /*+USE_MERGE(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 
                WHERE c2 = d2 AND c1 + d1 > 10\G;
*************************** 1. row ***************************
Query Plan:
| ======================================
|ID|OPERATOR    |NAME|EST. ROWS|COST |
--------------------------------------
|0 |MERGE JOIN  |    |3261     |14199|
|1 | SORT       |    |999      |4505 |
|2 |  TABLE SCAN|T1  |999      |669  |
|3 | SORT       |    |999      |4483 |
|4 |  TABLE SCAN|T2  |999      |647  |
======================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C2 = T2.D2]), other_conds([T1.C1 + T2.D1 > 10])
  1 - output([T1.C2], [T1.C1]), filter(nil), sort_keys([T1.C2, ASC])
  2 - output([T1.C2], [T1.C1]), filter(nil),
      access([T1.C2], [T1.C1]), partitions(p0)
  3 - output([T2.D2], [T2.D1]), filter(nil), sort_keys([T2.D2, ASC])
  4 - output([T2.D2], [T2.D1]), filter(nil),
      access([T2.D2], [T2.D1]), partitions(p0)

In the following example, the join condition in query Q5 is t1.c1 = t2.d1, which means sorting data of t1 by c1, and data of t2 by d1. In this execution plan, a primary table scan is selected for t2, and the results are sorted by d1. Therefore, it is unnecessary to assign an additional SORT operator. Ideally, proper indexes are selected for tables on the left and right of the join, and the data order specified by the indexes can meet the requirements of MJ. In this case, no SORT operator is needed.

Q5: 
obclient>EXPLAIN SELECT /*+USE_MERGE(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 
     WHERE c1 = d1\G;
*************************** 1. row ***************************
Query Plan:
| =====================================
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |MERGE JOIN  |    |990      |6096|
|1 | SORT       |    |999      |4505|
|2 |  TABLE SCAN|T1  |999      |669 |
|3 | TABLE SCAN |T2  |999      |647 |
=====================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C1 = T2.D1]), other_conds(nil)
  1 - output([T1.C2], [T1.C1]), filter(nil), sort_keys([T1.C1, ASC])
  2 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T2.D1], [T2.D2]), filter(nil),
      access([T2.D1], [T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display shows in detail the output information of the MERGE JOIN operator.

Field

Description

output

The output expression of the operator.

filter

The filter conditions of the operator.

In this example, the filter is set to nil because no filter condition is configured for the MJ operator.

equal_conds

Equivalent join conditions for MJ. The result sets of the subnodes on the left and right must be ordered in relation to the JOIN column.

other_conds

Other join conditions.

For example, t1.c1 + t2.d1 > 10 in query Q4.

HASH JOIN (HJ)

In the following example, query Q6 uses HJ based on the USE_HASH hint. The No. 0 operator is an HJ operator and has two subnodes, the No. 1 and No. 2 operators. Execution logic of this operator:

  1. Read data from the subnode on the left to generate the hash value based on the JOIN column, such ast1.c1, and create a hash table.

  2. Read data from the subnode on the right to generate a hash value based on the JOIN column, such ast2.d1, and try to join with the data in the corresponding hash table t1.

Q6: 
obclient>EXPLAIN SELECT /*+USE_HASH(t1, t2)*/ t1.c2 + t2.d2 FROM t1, t2 
      WHERE c1 = d1 AND c2 + d2 > 1\G;
*************************** 1. row ***************************
Query Plan:
| ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |HASH JOIN  |    |330      |4850|
|1 | TABLE SCAN|T1  |999      |669 |
|2 | TABLE SCAN|T2  |999      |647 |
====================================
Outputs & filters:
-------------------------------------
  0 - output([T1.C2 + T2.D2]), filter(nil),
      equal_conds([T1.C1 = T2.D1]), other_conds([T1.C2 + T2.D2 > 1])
  1 - output([T1.C1], [T1.C2]), filter(nil),
      access([T1.C1], [T1.C2]), partitions(p0)
  2 - output([T2.D1], [T2.D2]), filter(nil),
      access([T2.D1], [T2.D2]), partitions(p0)

In the preceding example, the Outputs & filters section in the execution plan display shows in detail the output information of the HASH JOIN operator.

Field

Description

output

The output expression of the operator.

filter

The filter conditions of the operator.

In this example, the filter is set to nil because no filter condition is configured for the HJ operator.

equal_conds

Equivalent join: the join columns on both the left and right sides are used to calculate the hash value.

other_conds

Other join conditions.

For example, t1.c2 + t2.d2 > 1 in query Q6.