All Products
Search
Document Center

JOIN

Last Updated: Sep 29, 2021

The JOIN statement is used in the database to combine two or more tables in the database based on the join conditions. The set generated by JOIN can be saved as a table or used as a table. A JOIN statement combines the attributes of two tables based on their values. JOIN types in the database generally include inner join, outer join, semi-join, and anti join. Among them, you can rewrite subqueries to implement semi-join and anti-join queries. SQL does not have special a syntax for anti-join or semi-join queries.

Join conditions

Join conditions can be divided into two types: equijoin conditions (for example, t1.a = t2.b) and non-equijoin conditions (for example, t1.a < t2.b). Unlike non-equijoin conditions, equijoin conditions allow the database to use efficient join algorithms, such as Hash join and Merge-Sort join.

Self-joins

A self-join is a join of a table to itself. The following example shows how a self-join looks.

obclient> create table t1(a int primary key, b int, c int);
Query OK, 0 rows affected (0.70 sec)

obclient> select * from t1 as ta, t1 as tb where ta.b = tb.b;

Inner joins

An inner join is the most basic join operation in a database. An inner join combines the columns of two tables (such as tables A and B) based on the join conditions to generate a new result table. The query compares each row of Table A with each row of Table B and returns the combinations that meet the join conditions. When the join conditions are met, the matching rows in Table A and Table B are combined by column (aligned) into rows in the result set. The join first generates the Cartesian product of the two tables, where each row in Table A is paired with each row in Table B, and then returns records that meet the join conditions.

Outer joins

An outer join does not require that each record in either of the two joined tables has a matching record in the other table. A table that needs to reserve all records (including records without a matching record) is called a reserved table. . Outer-Join operations are further divided into left outer joins, right outer joins, and full joins based on whether the result table contains rows from the table on the left or right side of JOIN, or both. In a left outer join, if a row in the table on the left side is not found in the table on the right side, NULL is automatically filled in the table on the right side. In a right outer join, if a row in the table on the right side is not found in the table on the left side, NULL is automatically filled in the table on the left side. In a full join, NULL is automatically filled if no matching row is found in the table on the left or right side.

Semi-joins

A left or right semi-join for Table A and Table B returns only rows in Table A that match rows in Table B and rows in Table B that match rows in Table A. You can get a semi-join query only by unnesting and rewriting a subquery. Example:

obclient> create table t1(a int primary key, b int, c int);
Query OK, 0 rows affected (0.70 sec)

obclient> create table t2(a int primary key, b int, c int);
Query OK, 0 rows affected (0.92 sec)

obclient> insert into t1 values(1, 1, 1);
obclient> insert into t1 values(2, 2, 2);
obclient> insert into t2 values(1, 1, 1);
obclient> insert into t2 values(2, 2, 2);

--- A semi-join query obtained by unnesting and rewriting a dependent subquery
obclient> explain select * from t1 where t1.a in (select t2.b from t2 where t2.c = t1.c);
| ========================================
|ID|OPERATOR       |NAME|EST. ROWS|COST|
----------------------------------------
|0 |MERGE SEMI JOIN|    |2        |76  |
|1 | TABLE SCAN    |t1  |2        |37  |
|2 | SORT          |    |2        |38  |
|3 |  TABLE SCAN   |t2  |2        |37  |
========================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
      equal_conds([t1.a = t2.b], [t2.c = t1.c]), other_conds(nil)
  1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)
  2 - output([t2.b], [t2.c]), filter(nil), sort_keys([t2.b, ASC], [t2.c, ASC])
  3 - output([t2.c], [t2.b]), filter(nil),
      access([t2.c], [t2.b]), partitions(p0)

Anti-Join

A left or right anti-join for Table A and Table B returns only rows in Table A that match rows in Table B and rows in Table B that match rows in Table A. Similar to a semi-join, you can get an anti-join query only by unnesting and rewriting a subquery. Example:

--- An anti join query obtained by unnesting and rewriting a dependent subquery
obclient> explain select * from t1 where t1.a not in (select t2.b from t2 where t2.c = t1.c);
| =============================================
|ID|OPERATOR            |NAME|EST. ROWS|COST|
---------------------------------------------
|0 |HASH RIGHT ANTI JOIN|    |0        |77  |
|1 | TABLE SCAN         |t2  |2        |37  |
|2 | TABLE SCAN         |t1  |2        |37  |
=============================================

Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
      equal_conds([t2.c = t1.c]), other_conds([t1.a = t2.b OR (T_OP_IS, t2.b, NULL, 0)])
  1 - output([t2.c], [t2.b]), filter(nil),
      access([t2.c], [t2.b]), partitions(p0)
  2 - output([t1.c], [t1.a], [t1.b]), filter(nil),
      access([t1.c], [t1.a], [t1.b]), partitions(p0)