MaxCompute allows you to use JOIN
operations to join tables and return the data that meets join and query conditions. This topic describes the following JOIN operations: LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, INNER JOIN, NATURAL JOIN, implicit JOIN, and multiple JOIN operations.
Overview
MaxCompute supports the following types of JOIN
operations:
LEFT OUTER JOIN
It is also called
LEFT JOIN
. LEFT OUTER JOIN returns all rows in the left table, including the rows that do not match any rows in the right table.NoteIn a
JOIN
operation, the left table is a large table, and the right table is a small table in most cases. If the values in some rows of the right table are duplicate, we recommend that you do not perform multiple consecutiveLEFT JOIN
operations. If you perform multiple consecutiveLEFT JOIN
operations, data bloat may occur, and your jobs are interrupted.RIGHT OUTER JOIN
It is also called
RIGHT JOIN
. RIGHT OUTER JOIN returns all rows in the right table, including the rows that do not match any rows in the left table.FULL OUTER JOIN
It is also called
FULL JOIN
. FULL OUTER JOIN returns all rows in both the left and right tables.INNER JOIN
The
INNER
keyword can be omitted.INNER JOIN
returns data rows if a match exists between the left and right tables.NATURAL JOIN
In a
NATURAL JOIN
operation, the fields that are used to join two tables are determined based on the common fields of the two tables. MaxCompute supportsOUTER NATURAL JOIN
. If you use theUSING
clause, theNATURAL JOIN
operation returns common fields only once.Implicit JOIN operation
You can perform an implicit
JOIN
operation without the need to specify the JOIN keyword.Multiple JOIN operations
MaxCompute supports multiple
JOIN
operations. You can use parentheses () to specify the priorities ofJOIN
operations. AJOIN
operation that is enclosed in parentheses () has a higher priority.
If an SQL statement contains the
WHERE
clause and you use theJOIN
clause before theWHERE
clause, theJOIN
operation is performed first. Then, the results obtained from theJOIN
operation are filtered based on the conditions specified by theWHERE
clause. The final result is the intersection of two tables, not all rows in a table.You can use the odps.task.sql.outerjoin.ppd parameter to control whether to use a non-JOIN condition in the
OUTER JOIN ON
clause as the input data of theJOIN
operation. You can configure this parameter at the project or session level.If you set this parameter to
false
, the non-JOIN condition in theON
clause is considered as the condition in theWHERE
clause for the subquery of the JOIN operation. This is non-standard behavior. We recommend that you specify the non-JOIN condition in theWHERE
clause.If you set this parameter to
false
, the following SQL statements are equivalent. If you set this parameter totrue
, the two SQL statements are not equivalent.
SELECT A.*, B.* FROM A LEFT JOIN B ON A.c1 = B.c1 and A.c2='xxx'; SELECT A.*, B.* FROM (SELECT * FROM A WHERE c2='xxx') A LEFT JOIN B ON A.c1 = B.c1;
Usage notes
In a JOIN
operation, the filter condition key is not null
of the JOIN
operation is automatically added for calculation. The row whose value of the join key is null is filtered out after the JOIN operation.
Limits
When you perform a JOIN
operation, take note of the following limits:
MaxCompute does not support
CROSS JOIN
. A CROSS JOIN operation joins two tables without requiring you to specify conditions in theON
clause.You must use equi-joins and combine conditions by using
AND
. You can use non-equi joins or combine multiple conditions by usingOR
in aMAPJOIN
operation. For more information, see MAPJOIN.
Syntax
<table_reference> join <table_factor> [<join_condition>]
| <table_reference> {left outer|right outer|full outer|inner|natural} join <table_reference> <join_condition>
table_reference: required. The query statement for the left table on which the
JOIN
operation is performed. The value of this parameter is in thetable_name [alias] | table_query [alias] |...
format.table_factor: required. The query statement for the right table or a table on which the
JOIN
operation is performed. The value of this parameter is in thetable_name [alias] | table_subquery [alias] |...
format.join_condition: optional. A
JOIN
condition is a combination of one or more equality expressions. The value of this parameter is in theon equality_expression [and equality_expression]...
format.equality_expression
is an equality expression.
If partition pruning conditions are specified in the WHERE
clause, partition pruning takes effect on both the parent and child tables. If partition pruning conditions are specified in the ON
clause, partition pruning takes effect only on the child table. As a result, a full table scan is run for the parent table. For more information, see Check whether partition pruning is effective.
Sample data
Sample source data is provided for you to better understand the examples in this topic. The following statements show how to create the sale_detail and sale_detail_jt tables and insert data into the tables.
-- Create two partitioned tables named sale_detail and sale_detail_jt.
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
-- Add partitions to the two tables.
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');
-- Insert data into the tables.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);
Query data from the sale_detail and sale_detail_jt tables. Sample statements:
select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s5 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+
-- Create a table for the JOIN operation.
SET odps.sql.allow.fullscan=true;
create table shop as select shop_name, customer_id, total_price from sale_detail;
Examples
Example 1: LEFT OUTER JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a left outer join sale_detail b on a.shop_name=b.shop_name;
The following result is returned:
+------------+------------+ | ashop | bshop | +------------+------------+ | s2 | s2 | | s1 | s1 | | s5 | NULL | +------------+------------+
Example 2: RIGHT OUTER JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a right outer join sale_detail b on a.shop_name=b.shop_name;
The following result is returned:
+------------+------------+ | ashop | bshop | +------------+------------+ | NULL | s3 | | NULL | s6 | | NULL | null | | s2 | s2 | | NULL | s7 | | s1 | s1 | +------------+------------+
Example 3: FULL OUTER JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a full outer join sale_detail b on a.shop_name=b.shop_name;
The following result is returned:
+------------+------------+ | ashop | bshop | +------------+------------+ | NULL | s3 | | NULL | s6 | | s2 | s2 | | NULL | null | | NULL | s7 | | s1 | s1 | | s5 | NULL | +------------+------------+
Example 4: INNER JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. select a.shop_name as ashop, b.shop_name as bshop from sale_detail_jt a inner join sale_detail b on a.shop_name=b.shop_name;
The following result is returned:
+------------+------------+ | ashop | bshop | +------------+------------+ | s2 | s2 | | s1 | s1 | +------------+------------+
Example 5: NATURAL JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Perform a NATURAL JOIN operation. select * from sale_detail_jt natural join sale_detail; -- The preceding statement is equivalent to the following statement: select sale_detail_jt.shop_name as shop_name, sale_detail_jt.customer_id as customer_id, sale_detail_jt.total_price as total_price, sale_detail_jt.sale_date as sale_date, sale_detail_jt.region as region from sale_detail_jt inner join sale_detail on sale_detail_jt.shop_name=sale_detail.shop_name and sale_detail_jt.customer_id=sale_detail.customer_id and sale_detail_jt.total_price=sale_detail.total_price and sale_detail_jt.sale_date=sale_detail.sale_date and sale_detail_jt.region=sale_detail.region;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | +------------+-------------+-------------+------------+------------+
Example 6: implicit JOIN. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Perform an implicit JOIN operation. select * from sale_detail_jt, sale_detail where sale_detail_jt.shop_name = sale_detail.shop_name; -- The preceding statement is equivalent to the following statement: select * from sale_detail_jt join sale_detail on sale_detail_jt.shop_name = sale_detail.shop_name;
The following result is returned:
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2 | +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+ | s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china | +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
Example 7: Multiple JOIN operations. No priority is specified. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Both the sale_detail_jt and sale_detail tables have the shop_name column. You must use aliases to distinguish between the columns in the SELECT statement. select a.* from sale_detail_jt a full outer join sale_detail b on a.shop_name=b.shop_name full outer join sale_detail c on a.shop_name=c.shop_name;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s5 | c2 | 100.2 | 2013 | china | | NULL | NULL | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | NULL | | s1 | c1 | 100.1 | 2013 | china | | NULL | NULL | NULL | NULL | NULL | | s2 | c2 | 100.2 | 2013 | china | | NULL | NULL | NULL | NULL | NULL | +------------+-------------+-------------+------------+------------+
Example 7: Multiple JOIN operations. Use parentheses () to specify the priorities of JOIN operations. Sample statements:
-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Perform multiple JOIN operations. Use parentheses () to specify the priority. select * from shop join (sale_detail_jt join sale_detail on sale_detail_jt.shop_name = sale_detail.shop_name) on shop.shop_name=sale_detail_jt.shop_name;
The following result is returned:
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2 | shop_name3 | customer_id3 | total_price3 | sale_date3 | region3 | +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+ | s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china | +------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
Example 8: Use
JOIN
andWHERE
to query the number of records whose region is china and whose shop_name field has the same value in the two tables. All records in the sale_detail table are retained. Sample statements:-- The full table scan feature must be enabled for partitioned tables. Otherwise, the JOIN operation fails. set odps.sql.allow.fullscan=true; -- Execute the following SQL statement: select a.shop_name ,a.customer_id ,a.total_price ,b.total_price from (select * from sale_detail where region = "china") a left join (select * from sale_detail_jt where region = "china") b on a.shop_name = b.shop_name;
The following result is returned:
+------------+-------------+-------------+--------------+ | shop_name | customer_id | total_price | total_price2 | +------------+-------------+-------------+--------------+ | s1 | c1 | 100.1 | 100.1 | | s2 | c2 | 100.2 | 100.2 | | s3 | c3 | 100.3 | NULL | +------------+-------------+-------------+--------------+
Sample statement of incorrect usage:
select a.shop_name ,a.customer_id ,a.total_price ,b.total_price from sale_detail a left join sale_detail_jt b on a.shop_name = b.shop_name where a.region = "china" and b.region = "china";
The following result is returned:
+------------+-------------+-------------+--------------+ | shop_name | customer_id | total_price | total_price2 | +------------+-------------+-------------+--------------+ | s1 | c1 | 100.1 | 100.1 | | s2 | c2 | 100.2 | 100.2 | +------------+-------------+-------------+--------------+
The returned result is the intersection of the two tables, not all rows in the sale_detail table.