All Products
Search
Document Center

Joins

Last Updated: Jun 18, 2021

A join is a query that combines two or more tables, views, or materialized views. If the FROM clause of a query contains multiple tables, ApsaraDB for OceanBase performs a join query. The output columns of the query can be selected from a table that is contained in the FROM clause. If multiple tables have a same column name, you must use table names to qualify all the references to these columns during the query. The join types in databases generally include inner join, outer join, semi-join, and anti-join. Among the preceding join types, you can obtain semi-join and anti-join by rewriting subqueries. ApsaraDB for OceanBase does not provide syntax for expressing anti-join and semi-join.

Join conditions

Join conditions are rules to combine multiple tables and exist in FROM clauses or WHERE clauses. A join condition is used to compare two columns from different tables. Most joins contain at least one join condition. Join conditions can be divided into equi joins (for example, t1.a = t2.b) and non-equi joins (for example, t1.a < t2.b). Compared with non-equi join conditions, equi join conditions allow databases to use efficient join algorithms, such as Hash Join and Merge-Sort join.

To perform a join, ApsaraDB for OceanBase retrieves rows from different tables to combine the rows into pairs and matches them based on join conditions. To perform a join on more than two tables, ApsaraDB for OceanBase first joins two of the tables based on the join conditions that compare their columns. Then, ApsaraDB for OceanBase joins the result to another table based on the join conditions that contain the columns of the joined tables and the new table. The optimizer determines the join order in ApsaraDB for OceanBase based on the join conditions, indexes on the base table, and available statistics.

A WHERE clause may contain other conditions in addition to join conditions. These conditions that reference only one table can further limit the number of rows returned by the join query.

Equi joins

Equi joins are joins for which join conditions contain equality operators. In an equi join, the rows that meet equality conditions for the specified columns are combined for output.

Self joins

A self join joins a table to itself. The table appears twice in the FROM clause and is followed by the table alias that qualify column names in the join condition. If you perform a self join, ApsaraDB for OceanBase combines and returns the rows that meet the join condition.

Cartesian products

If two tables in a join query have no join condition, ApsaraDB for OceanBase returns their Cartesian products. ApsaraDB for OceanBase combines each row of one table with each row of the other table for output. A Cartesian product always generates a number of rows. This is rarely useful. For example, the Cartesian product of two tables that each contain 100 rows is 10,000 rows. Always specify a join condition unless you particularly need a Cartesian product. If a query joins three or more tables and no join condition is specified for a specific pair, the optimizer can choose a join order that prevents from generating an intermediate Cartesian product.

Inner joins

Inner joins are the most basic joins in databases. An inner join combines the columns of two tables, such as Table A and Table B, based on join conditions to generate a new result table. The query compares each row of Table A with each row of Table B and finds combinations that meet the join conditions. If the join conditions are met, the matched rows in Table A and Table B are combined side by side based on columns into one row in the result set. The result set that is generated by the join equals to the result that is generated in the following process: The Cartesian products of the two tables are calculated first. Each row of Table A is combined with each row of Table B. Then, the records that meet the join conditions are returned.

Outer joins

An outer join returns all the rows that meet the join condition, also returns unused rows from one table, and fills in NULL at the corresponding positions in the other table. Outer joins can be further divided into left joins, right joins, and full joins. The join type depends on whether the joined table retains rows of the left table, right table, or both tables. In a LEFT [OUTER] JOIN, if the rows in the left table have no matching rows in the right table, the right table is automatically filled with NULL. In a RIGHT [OUTER] JOIN, if the rows in the right table have no matching rows in the left table, the left table is automatically filled with NULL. In a FULL [OUTER] JOIN, if the rows in the left or right table have no matching rows in the other table, both tables are automatically filled with NULL.

Semi-joins

A LEFT or RIGHT ANTI-JOIN on Table A and Table B returns only the rows in Table A or Table B that have matching rows in Table B or Table A. A semi-join can be implemented by only subquery unnesting.

Anti-joins

A LEFT or RIGHT ANTI-JOIN on Table A and Table B returns all the rows in Table A or Table B that have no matching rows in Table B or Table A. An anti-join can be implemented by only subquery unnesting. This is similar to the semi-join.

Examples

Create tables table_a and table_b and insert data. Execute the following statements:

CREATE TABLE table_a(PK INT, name VARCHAR(25));
INSERT INTO table_a VALUES(1,'Fox');
INSERT INTO table_a VALUES(2,'Police');  
INSERT INTO table_a VALUES(3,'Taxi');  
INSERT INTO table_a VALUES(4,'Lincoln');  
INSERT INTO table_a VALUES(5,'Arizona');  
INSERT INTO table_a VALUES(6,'Washington');  
INSERT INTO table_a VALUES(7,'Dell');  
INSERT INTO table_a VALUES(10,'Lucent'); 
CREATE TABLE table_b(PK INT, name VARCHAR(25));
INSERT INTO table_b VALUES(1,'Fox');
INSERT INTO table_b VALUES(2,'Police');  
INSERT INTO table_b VALUES(3,'Taxi');  
INSERT INTO table_b VALUES(6,'Washington');  
INSERT INTO table_b VALUES(7,'Dell');  
INSERT INTO table_b VALUES(8,'Microsoft');  
INSERT INTO table_b VALUES(9,'Apple'); 
INSERT INTO table_b VALUES(11,'Scotch whisky');

Self join query:

SELECT * FROM table_a ta, table_a tb WHERE ta.NAME = tb.NAME;

The following query result is returned:

+------+-----------------+------+-----------------+
| PK   | NAME            | PK   | NAME            |
+------+-----------------+------+-----------------+
|    1 | Fox          |    1 | Fox          |
|    2 | Police            |    2 | Police            |
|    3 | Taxi            |    3 | Taxi            |
|    4 | Lincoln            |    4 | Lincoln            |
|    5 | Arizona      |    5 | Arizona      |
|    6 | Washington          |    6 | Washington          |
|    7 | Dell            |    7 | Dell            |
|   10 | Lucent            |   10 | Lucent            |
+------+-----------------+------+-----------------+

Inner join query:

SELECT A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value 
FROM table_a A INNER JOIN table_b B ON A.PK = B.PK;

The following query result is returned:

+------+-----------+------+-----------+
| A_PK | A_VALUE   | B_PK | B_VALUE   |
+------+-----------+------+-----------+
|    1 | Fox    |    1 | Fox    |
|    2 | Police      |    2 | Police      |
|    3 | Taxi      |    3 | Taxi      |
|    6 | Washington    |    6 | Washington    |
|    7 | Dell      |    7 | Dell      |
+------+-----------+------+-----------+

Left join query:

SELECT  A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value 
FROM table_a A LEFT JOIN  table_b B ON A.PK = B.PK;

The following query result is returned:

+------+-----------------+------+-----------+
| A_PK | A_VALUE         | B_PK | B_VALUE   |
+------+-----------------+------+-----------+
|    1 | Fox          |    1 | Fox    |
|    2 | Police            |    2 | Police      |
|    3 | Taxi            |    3 | Taxi      |
|    6 | Washington          |    6 | Washington    |
|    7 | Dell            |    7 | Dell      |
|    4 | Lincoln            | NULL | NULL      |
|    5 | Arizona      | NULL | NULL      |
|   10 | Lucent            | NULL | NULL      |
+------+-----------------+------+-----------+

Right join query:

obclient> SELECT  A.PK AS A_PK, A.name AS A_Value, B.PK AS B_PK, B.name AS B_Value FROM table_a A RIGHT JOIN   table_b B ON A.PK = B.PK;

The following query result is returned:

+------+-----------+------+--------------------+
| A_PK | A_VALUE   | B_PK | B_VALUE            |
+------+-----------+------+--------------------+
|    1 | Fox    |    1 | Fox             |
|    2 | Police      |    2 | Police               |
|    3 | Taxi      |    3 | Taxi               |
|    6 | Washington    |    6 | Washington             |
|    7 | Dell      |    7 | Dell               |
| NULL | NULL      |    8 | Microsoft               |
| NULL | NULL      |   11 | Scotch whisky       |
| NULL | NULL      |    9 | Apple               |
+------+-----------+------+--------------------+

Full join query:

obclient> SELECT  A.PK AS A_PK,A.name AS A_Value,B.PK AS B_PK,B.name AS B_Value FROM table_a A FULL  JOIN table_b B ON A.PK = B.PK;

The following query result is returned:

+------+-----------------+------+--------------------+
| A_PK | A_VALUE         | B_PK | B_VALUE            |
+------+-----------------+------+--------------------+
|    1 | Fox          |    1 | Fox             |
|    2 | Police            |    2 | Police               |
|    3 | Taxi            |    3 | Taxi               |
|    6 | Washington          |    6 | Washington             |
|    7 | Dell            |    7 | Dell               |
| NULL | NULL            |    8 | Microsoft               |
| NULL | NULL            |    9 | Apple               |
| NULL | NULL            |   11 | Scotch whisky       |
|    4 | Lincoln            | NULL | NULL               |
|    5 | Arizona      | NULL | NULL               |
|   10 | Lucent            | NULL | NULL               |
+------+-----------------+------+--------------------+

Semi-join: A dependent subquery is unnested and rewritten into a semi-join.

explain SELECT * FROM table_a t1 WHERE t1.PK IN (SELECT t2.PK FROM table_b t2 WHERE t2.NAME = t1.NAME);

The following query result is returned:

+------------------------------------------------+
| Query Plan                                     |
+------------------------------------------------+
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN|    |8        |114 |
|1 | TABLE SCAN   |T1  |8        |38  |
|2 | TABLE SCAN   |T2  |8        |38  |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([T1.PK], [T1.NAME]), filter(nil),
      equal_conds([T1.PK = T2.PK], [T2.NAME = T1.NAME]), other_conds(nil)
  1 - output([T1.NAME], [T1.PK]), filter(nil),
      access([T1.NAME], [T1.PK]), partitions(p0)
  2 - output([T2.NAME], [T2.PK]), filter(nil),
      access([T2.NAME], [T2.PK]), partitions(p0)
+------------------------------------------------+

Anti-join: A dependent subquery is rewritten into an anti-join.

EXPLAIN SELECT * FROM table_a t1 WHERE t1.PK NOT IN (SELECT t2.PK 
FROM table_b t2 WHERE t2.name = t1.name);

The following query result is returned:

+------------------------------------------------+
| Query Plan                                     |
+------------------------------------------------+
=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH ANTI JOIN|    |0        |112 |
|1 | TABLE SCAN   |T1  |8        |38  |
|2 | TABLE SCAN   |T2  |8        |38  |
=======================================
Outputs & filters:
-------------------------------------
  0 - output([T1.PK], [T1.NAME]), filter(nil),
      equal_conds([T2.NAME = T1.NAME]), other_conds([(T_OP_OR, T1.PK = T2.PK, 
      (T_OP_IS, T1.PK, NULL, 0), (T_OP_IS, T2.PK, NULL, 0))])
  1 - output([T1.NAME], [T1.PK]), filter(nil),
      access([T1.NAME], [T1.PK]), partitions(p0)
  2 - output([T2.NAME], [T2.PK]), filter(nil),
      access([T2.NAME], [T2.PK]), partitions(p0)
+---------------------------------------------------------+