MaxCompute allows you to use JOIN operations to join tables and return the data that meets join and query conditions. This topic describes the following JOIN operations: LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, INNER JOIN, NATURAL JOIN, implicit JOIN, and multiple JOIN operations.

Description

MaxCompute supports the following types of JOIN operations:
  • LEFT OUTER JOIN

    It is also called LEFT JOIN. LEFT OUTER JOIN returns all rows in the left table, including the rows that do not match any rows in the right table.

    Note In most cases, the left table is a large table, and the right table is a small table. If the values in some rows of the right table are duplicate, we recommend that you do not perform multiple consecutive LEFT JOIN operations. If you perform multiple consecutive LEFT JOIN operations, data expansion may occur and interrupt your jobs.
  • RIGHT OUTER JOIN

    It is also called RIGHT JOIN. RIGHT OUTER JOIN returns all rows in the right table, including the rows that do not match any rows in the left table.

  • FULL OUTER JOIN

    It is also called FULL JOIN. FULL OUTER JOIN returns all rows in both the left and right tables.

  • INNER JOIN

    The INNER keyword can be omitted. INNER JOIN returns data rows if a match exists between the left and right tables. If no rows match, no result is returned.

  • NATURAL JOIN

    In a NATURAL JOIN operation, the conditions that are used to join two tables are determined based on the common fields between the two tables. MaxCompute supports OUTER NATURAL JOIN. If you use the USING clause, the JOIN operation returns common fields only once.

  • Implicit JOIN operation

    You can perform an implicit JOIN operation without the need to specify the JOIN keyword.

  • Multiple JOIN operations

    MaxCompute supports multiple JOIN operations. You can use parentheses () to specify the priorities of JOIN operations. A JOIN operation that is enclosed in parentheses () has a higher priority.

Note If an SQL statement contains the WHERE clause and you use the JOIN clause before the WHERE clause, the JOIN operation is performed first. Then, the results obtained from the JOIN operation are filtered based on the conditions specified by the WHERE clause. The final result is the intersection of two tables, not all rows in the left table.

Limits

Limits on the use of JOIN:
  • MaxCompute does not support CROSS JOIN. A CROSS JOIN operation joins two tables without requiring you to specify conditions in the ON clause.
  • You must use equi-joins and combine conditions by using AND. You can use non-equi joins or combine multiple conditions by using OR in a MAPJOIN operation. For more information, see MAPJOIN.

Syntax

<table_reference> join <table_factor> [<join_condition>]
| <table_reference> {left outer|right outer|full outer|inner|natural} join <table_reference> <join_condition>
  • table_reference: required. The query statement for the left table on which the JOIN operation is performed. The value of this parameter is in the table_name [alias] | table_query [alias] |... format.
  • table_factor: required. The query statement for the right table or a table on which the JOIN operation is performed. The value of this parameter is in the table_name [alias] | table_subquery [alias] |... format.
  • join_condition: optional. A JOIN condition is a combination of one or more equality expressions. The value of this parameter is in the on equality_expression [and equality_expression]... format. equality_expression is an equality expression.
Note If partition pruning conditions are specified in the WHERE clause, partition pruning takes effect on both tables. If partition pruning conditions are specified in the ON clause, partition pruning takes effect only on the secondary table. As a result, a full table scan is run for the primary table. For more information, see Check whether partition pruning is effective.

Sample data

The following sample source data is provided to help you understand the examples in this topic. The following statements show how to create the sale_detail and sale_detail_jt tables and insert data into the tables.
-- Create two partitioned tables named sale_detail and sale_detail_jt. 
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions to the partitioned tables sale_detail and sale_detail_jt. 
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');

-- Insert data into the partitioned tables sale_detail and sale_detail_jt. 
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);
Query data in the sale_detail and sale_detail_jt table. Sample statement:
select * from sale_detail;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
-- Create a table that you want to join. 
create table shop as select shop_name, customer_id, total_price from sale_detail;

Examples

  • Example 1: LEFT OUTER JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. 
    select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
           left outer join sale_detail b on a.shop_name=b.shop_name;
    Returned result:
    +------------+------------+
    | ashop      | bshop      |
    +------------+------------+
    | s2         | s2         |
    | s1         | s1         |
    | s5         | NULL       |
    +------------+------------+
  • Example 2: RIGHT OUTER JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. 
    select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
           right outer join sale_detail b on a.shop_name=b.shop_name;
    Returned result:
    +------------+------------+
    | ashop      | bshop      |
    +------------+------------+
    | NULL       | s3         |
    | NULL       | s6         |
    | NULL       | null       |
    | s2         | s2         |
    | NULL       | s7         |
    | s1         | s1         |
    +------------+------------+
  • Example 3: FULL OUTER JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. 
    select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
           full outer join sale_detail b on a.shop_name=b.shop_name;
    Returned result:
    +------------+------------+
    | ashop      | bshop      |
    +------------+------------+
    | NULL       | s3         |
    | NULL       | s6         |
    | s2         | s2         |
    | NULL       | null       |
    | NULL       | s7         |
    | s1         | s1         |
    | s5         | NULL       |
    +------------+------------+
  • Example 4: INNER JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. 
    select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a 
           inner join sale_detail b on a.shop_name=b.shop_name;
    Returned result:
    +------------+------------+
    | ashop      | bshop      |
    +------------+------------+
    | s2         | s2         |
    | s1         | s1         |
    +------------+------------+
  • Example 5: NATURAL JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Perform a NATURAL JOIN operation. 
    select * from sale_detail_jt natural join sale_detail;
    -- The preceding statement is equivalent to the following statements. 
    select sale_detail_jt.shop_name as shop_name, sale_detail_jt.customer_id as customer_id, sale_detail_jt.total_price as total_price, sale_detail_jt.sale_date as sale_date, sale_detail_jt.region as region from sale_detail_jt 
    inner join sale_detail 
    on sale_detail_jt.shop_name=sale_detail.shop_name and sale_detail_jt.customer_id=sale_detail.customer_id and sale_detail_jt.total_price=sale_detail.total_price and sale_detail_jt.sale_date=sale_detail.sale_date and sale_detail_jt.region=sale_detail.region;
    Returned result:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 6: implicit JOIN. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Perform an implicit JOIN operation. 
    select * from sale_detail_jt, sale_detail where sale_detail_jt.shop_name = sale_detail.shop_name;
    -- The preceding statement is equivalent to the following statement. 
    select * from sale_detail_jt join sale_detail on sale_detail_jt.shop_name = sale_detail.shop_name;
    Returned result:
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2    |
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
    | s2         | c2          | 100.2       | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      |
    | s1         | c1          | 100.1       | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      |
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
  • Example 7: Multiple JOIN operations. No priority is specified. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. 
    select a.* from sale_detail_jt a full outer join sale_detail b on a.shop_name=b.shop_name
            full outer join sale_detail c on a.shop_name=c.shop_name;
    Returned result:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s5         | c2          | 100.2       | 2013       | china      |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | s1         | c1          | 100.1       | 2013       | china      |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    | s2         | c2          | 100.2       | 2013       | china      |
    | NULL       | NULL        | NULL        | NULL       | NULL       |
    +------------+-------------+-------------+------------+------------+
  • Example 7: Multiple JOIN operations. Use parentheses () to specify the priorities of JOIN operations. Sample statements:
    -- The full table scan feature must be enabled for partition tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Perform multiple JOIN operations. Use parentheses () to specify the priority. 
    select * from shop join (sale_detail_jt join sale_detail on sale_detail_jt.shop_name = sale_detail.shop_name) on shop.shop_name=sale_detail_jt.shop_name;
    Returned result:
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2    | shop_name3 | customer_id3 | total_price3 | sale_date3 | region3    |
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
    | s2         | c2          | 100.2       | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      |
    | s1         | c1          | 100.1       | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      |
    +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
  • Example 8: Use JOIN and WHERE to query the number of records whose region is china and whose shop_name field has the same value in the two tables. All records in the sale_detail table are retained. Sample statements:
    -- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. 
    set odps.sql.allow.fullscan=true;
    -- Execute the following SQL statement. 
    select a.shop_name
            ,a.customer_id
            ,a.total_price
            ,b.total_price
    from    (select * from sale_detail where region = "china") a
    left join  (select * from sale_detail_jt where region = "china") b
    on      a.shop_name = b.shop_name;
    Returned result:
    +------------+-------------+-------------+--------------+
    | shop_name  | customer_id | total_price | total_price2 |
    +------------+-------------+-------------+--------------+
    | s1         | c1          | 100.1       | 100.1        |
    | s2         | c2          | 100.2       | 100.2        |
    | s3         | c3          | 100.3       | NULL         |
    +------------+-------------+-------------+--------------+
    Incorrect usage:
    select a.shop_name
            ,a.customer_id
            ,a.total_price
            ,b.total_price
    from    sale_detail a
    left join  sale_detail_jt b
    on      a.shop_name = b.shop_name
    where   a.region = "china" and b.region = "china";
    Returned result:
    +------------+-------------+-------------+--------------+
    | shop_name  | customer_id | total_price | total_price2 |
    +------------+-------------+-------------+--------------+
    | s1         | c1          | 100.1       | 100.1        |
    | s2         | c2          | 100.2       | 100.2        |
    +------------+-------------+-------------+--------------+

    The returned result is the intersection of the two tables, not all rows in the sale_detail table.