All Products
Search
Document Center

Join operations

Last Updated: Jan 10, 2020

The JOIN statement in the database is used to combine two or more tables in the database based on the join conditions. Collections created by JOIN statements can be saved as tables or used as tables. The meaning of a JOIN statement is to combine the fields of two tables by their values. The join types in the database include inner join, outer join, semi-join, and anti-join. Semi-join and anti-join are obtained by rewriting subqueries. SQL itself does not support the anti-join and semi-join expressions.

Join conditions

Join conditions can be divided into equi joins (for example, t1.a = t2.b) and non-equi joins (for example, t1.a < t2.b). Compared with non-equi joins, one of the benefits of equi joins is allowing efficient join algorithms such as hash joins and merge-sort joins to be used in the database.

Self-join

Self-join is a join in which a table is joined with itself. The following example shows a self-join.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.70 sec)
  3. --- An example of self-join
  4. OceanBase (root@test)> select * from t1 as ta, t1 as tb where ta.b = tb.b

Inner join

Inner join is the most basic join operation in the database. An inner join combines the columns of two tables (such as A and B) based on the join conditions to produce a new table. The query compares each row of Table A with each row of Table B and finds combinations that satisfy the join conditions. If a row in Table A and a row in Table B satisfy the join conditions, the two rows are merged into one row in the new table. The result set produced by the join can be defined as the Cartesian product (cross join) of the two tables. This involves combining each row of Table A and Table B and then returning entries that satisfy the join conditions.

Outer join

Outer join does not require each record of the two joining tables to have a matching record in the other table. A table that keeps all entries (even if no matching entries exist in other tables for the entries) is called a key preserved table. Outer joins can be further divided into left outer joins, right outer joins, and full outer joins to retain the left table, right table, or all table rows respectively In a left outer join, when a row in the left table is not found in the right table, NULL is automatically filled in the right table. In a right outer join, when a row in the right table is not found in the left table, NULL is automatically filled in the left table. In a full outer join, all rows from both tables are preserved and both tables are null-supplying.

Semi-join

When Table A and Table B are left semi-joined, the result only returns all rows in Table A that can be found in B. Whereas, in a right semi-join, the result only returns all rows in Table B that can be found in A. Semi-join can only be obtained by subquery unnesting, shown as follows.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.70 sec)
  3. OceanBase (root@test)> create table t2(a int primary key, b int, c int);
  4. Query OK, 0 rows affected (0.92 sec)
  5. --- Dependent subquery is unnested and rewritten as a semi-join
  6. OceanBase (root@test)> explain select * from t1 where t1.a in (select t2.b from t2 where t2.c = t1.c);
  7. | =======================================
  8. |ID|OPERATOR |NAME|EST. ROWS|COST|
  9. ---------------------------------------
  10. |0 |HASH SEMI JOIN| |1 |2924|
  11. |1 | TABLE SCAN |t1 |1000 |455 |
  12. |2 | TABLE SCAN |t2 |1000 |455 |
  13. =======================================
  14. Outputs & filters:
  15. -------------------------------------
  16. 0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
  17. equal_conds([t1.a = t2.b], [t2.c = t1.c]), other_conds(nil)
  18. 1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
  19. access([t1.c], [t1.a], [t1.b]), partitions(p0)
  20. 2 - output([t2.c], [t2.b]), filter(nil),
  21. access([t2.c], [t2.b]), partitions(p0)

Anti-join

When Table A and Table B are left anti-joined, it only returns all rows in Table A that cannot be found in Table B. Whereas, in a left semi-join, the result only returns all rows in Table B that cannot be found in Table A. Similar to semi-join, anti-join can only be obtained by subquery unnesting, shown as follows.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.70 sec)
  3. OceanBase (root@test)> create table t2(a int primary key, b int, c int);
  4. Query OK, 0 rows affected (0.92 sec)
  5. --- Dependent subqueries are rewritten with an anti-join
  6. OceanBase (root@test)> explain select * from t1 where t1.a not in (select t2.b from t2 where t2.c = t1.c);
  7. | =======================================
  8. |ID|OPERATOR |NAME|EST. ROWS|COST|
  9. ---------------------------------------
  10. |0 |HASH ANTI JOIN| |995 |3262|
  11. |1 | TABLE SCAN |t1 |1000 |455 |
  12. |2 | TABLE SCAN |t2 |1000 |455 |
  13. =======================================
  14. Outputs & filters:
  15. -------------------------------------
  16. 0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
  17. equal_conds([t2.c = t1.c]), other_conds([t1.a = t2.b OR (T_OP_IS, t2.b, NULL, 0)])
  18. 1 - output([t1.c], [t1.a], [t1.b]), filter(nil),
  19. access([t1.c], [t1.a], [t1.b]), partitions(p0)
  20. 2 - output([t2.c], [t2.b]), filter(nil),
  21. access([t2.c], [t2.b]), partitions(p0)