MaxCompute supports JOIN operations to combine rows from two tables based on a related column. Supported types include LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, INNER JOIN, NATURAL JOIN, implicit JOIN, and multiple JOIN operations.
Supported JOIN types
| Join type | Behavior |
|---|---|
LEFT OUTER JOIN (or LEFT JOIN) | Returns all rows from the left table, including rows with no match in the right table. Unmatched columns from the right table are NULL. |
RIGHT OUTER JOIN (or RIGHT JOIN) | Returns all rows from the right table, including rows with no match in the left table. Unmatched columns from the left table are NULL. |
FULL OUTER JOIN (or FULL JOIN) | Returns all rows from both tables. Unmatched columns from either side are NULL. |
INNER JOIN | Returns only rows where a match exists in both tables. The INNER keyword is optional. |
NATURAL JOIN | Automatically joins on all columns with the same name in both tables. MaxCompute supports OUTER NATURAL JOIN. When used with the USING clause, common columns appear only once in the output. |
| Implicit JOIN | Omit the JOIN keyword. List tables in the FROM clause separated by commas, and define the join condition in WHERE. Equivalent to INNER JOIN. |
| Multiple JOINs | Chain multiple JOIN operations in a single statement. Use parentheses () to control evaluation order — operations inside parentheses run first. |
Limitations
CROSS JOINis not supported. A CROSS JOIN joins two tables without conditions in theONclause, producing a Cartesian product where every row in the first table is paired with every row in the second table.JOIN conditions must use equi-joins combined with
AND. Non-equi joins orOR-combined conditions are only supported inMAPJOIN.
Usage notes
NULL join keys are excluded. MaxCompute automatically adds a
key IS NOT NULLfilter for join keys. Rows where the join key is NULL are dropped from the result.WHEREfilters afterJOIN. When a statement includes bothJOINandWHERE, the join runs first and its results are then filtered byWHERE. The final result is an intersection of the two tables, not all rows from either table. See Example 9 for a concrete illustration.Avoid consecutive
LEFT JOINon tables with duplicate rows. In most JOIN operations, the left table is large and the right table is small. If the right table has duplicate row values, multiple consecutiveLEFT JOINoperations may cause data bloat and job interruptions.
Syntax
<table_reference> JOIN <table_factor> [<join_condition>]
| <table_reference> {LEFT OUTER|RIGHT OUTER|FULL OUTER|INNER|NATURAL} JOIN <table_reference> <join_condition>Parameters:
table_reference: The left table. Format:table_name [alias] | table_query [alias] | ...table_factor: The right table or join target. Format:table_name [alias] | table_subquery [alias] | ...join_condition: One or more equality expressions combined withAND. Format:ON equality_expression [AND equality_expression] ...
Partition pruning behavior differs based on where filter conditions are placed:
Conditions in WHERE: partition pruning applies to both the parent and child tables.Conditions in ON: partition pruning applies only to the child table; a full table scan runs on the parent table. For details, see Check whether partition pruning is effective.Controlling OUTER JOIN predicate behavior
The odps.task.sql.outerjoin.ppd parameter controls whether non-JOIN conditions in an OUTER JOIN ON clause are applied before or after the join.
| Value | Behavior |
|---|---|
false | The non-JOIN condition in ON is treated as a WHERE subquery condition (non-standard behavior). Place non-JOIN conditions in WHERE instead. |
true | Standard behavior. Non-JOIN conditions in ON are not moved to WHERE. |
Configure this parameter at the project or session level.
When set to false, the following two statements are equivalent. When set to true, they are not:
SELECT A.*, B.* FROM A LEFT JOIN B ON A.c1 = B.c1 AND A.c2 = 'xxx';
SELECT A.*, B.* FROM (SELECT * FROM A WHERE c2 = 'xxx') A LEFT JOIN B ON A.c1 = B.c1;Sample data
The following examples use two partitioned tables: sale_detail and sale_detail_jt.
-- Create the tables
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
CREATE TABLE IF NOT EXISTS sale_detail_jt
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Add partitions
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china') PARTITION (sale_date='2014', region='shanghai');
ALTER TABLE sale_detail_jt ADD PARTITION (sale_date='2013', region='china');
-- Insert data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
INSERT INTO sale_detail_jt PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);
-- Preview the data (requires full table scan to be enabled for partitioned tables)
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail;sale_detail contains:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt;sale_detail_jt contains:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s5 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+-- Create a helper table used in the multiple-JOIN example
SET odps.sql.allow.fullscan=true;
CREATE TABLE shop AS SELECT shop_name, customer_id, total_price FROM sale_detail;Examples
All examples use the sample tables defined in Sample data. Because both tables have a shop_name column, examples use table aliases (a, b) to distinguish columns in SELECT.
Example 1: LEFT OUTER JOIN
SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
LEFT OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;Result:
+------------+------------+
| ashop | bshop |
+------------+------------+
| s2 | s2 |
| s1 | s1 |
| s5 | NULL |
+------------+------------+All three rows from sale_detail_jt (the left table) appear. s5 has no matching shop_name in sale_detail, so bshop is NULL.
Example 2: RIGHT OUTER JOIN
SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
RIGHT OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;Result:
+------------+------------+
| ashop | bshop |
+------------+------------+
| s1 | s1 |
| s2 | s2 |
| NULL | s3 |
| NULL | null |
| NULL | s6 |
| NULL | s7 |
+------------+------------+All six rows from sale_detail (the right table) appear. The four shops that have no matching entry in sale_detail_jt produce NULL in the ashop column.
Example 3: FULL OUTER JOIN
SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
FULL OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;Result:
+------------+------------+
| ashop | bshop |
+------------+------------+
| NULL | s3 |
| NULL | s6 |
| s2 | s2 |
| NULL | null |
| NULL | s7 |
| s1 | s1 |
| s5 | NULL |
+------------+------------+All rows from both tables appear. s5 exists only in sale_detail_jt (NULL on the right), and s3, null, s6, s7 exist only in sale_detail (NULL on the left).
Example 4: INNER JOIN
SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
INNER JOIN sale_detail b ON a.shop_name = b.shop_name;Result:
+------------+------------+
| ashop | bshop |
+------------+------------+
| s2 | s2 |
| s1 | s1 |
+------------+------------+Only s1 and s2 exist in both tables, so only those two rows are returned.
Example 5: NATURAL JOIN
NATURAL JOIN joins on all columns with the same name in both tables. Here, that means all five columns: shop_name, customer_id, total_price, sale_date, and region.
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt NATURAL JOIN sale_detail;This is equivalent to:
SELECT sale_detail_jt.shop_name AS shop_name,
sale_detail_jt.customer_id AS customer_id,
sale_detail_jt.total_price AS total_price,
sale_detail_jt.sale_date AS sale_date,
sale_detail_jt.region AS region
FROM sale_detail_jt
INNER JOIN sale_detail
ON sale_detail_jt.shop_name = sale_detail.shop_name
AND sale_detail_jt.customer_id = sale_detail.customer_id
AND sale_detail_jt.total_price = sale_detail.total_price
AND sale_detail_jt.sale_date = sale_detail.sale_date
AND sale_detail_jt.region = sale_detail.region;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+Only s1 and s2 match across all five columns. s5 shares customer_id and total_price with s2, but shop_name differs, so it is excluded.
Example 6: Implicit JOIN
Omitting the JOIN keyword and listing tables separated by commas in FROM produces an implicit inner join.
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt, sale_detail
WHERE sale_detail_jt.shop_name = sale_detail.shop_name;This is equivalent to:
SELECT * FROM sale_detail_jt
JOIN sale_detail ON sale_detail_jt.shop_name = sale_detail.shop_name;Result:
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2 |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
| s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china |
| s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+Example 7: Multiple JOINs without explicit priority
When multiple JOINs are chained without parentheses, they are evaluated left to right.
SET odps.sql.allow.fullscan=true;
SELECT a.* FROM sale_detail_jt a
FULL OUTER JOIN sale_detail b ON a.shop_name = b.shop_name
FULL OUTER JOIN sale_detail c ON a.shop_name = c.shop_name;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s5 | c2 | 100.2 | 2013 | china |
| NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL |
| s1 | c1 | 100.1 | 2013 | china |
| NULL | NULL | NULL | NULL | NULL |
| s2 | c2 | 100.2 | 2013 | china |
| NULL | NULL | NULL | NULL | NULL |
+------------+-------------+-------------+------------+------------+Each FULL OUTER JOIN with sale_detail (which has 6 rows) generates additional NULL rows for unmatched combinations from both sides of each join. The result selects only columns from a (sale_detail_jt), so matched rows from sale_detail appear as NULL.
Example 8: Multiple JOINs with explicit priority
Use parentheses to control which JOIN is evaluated first.
SET odps.sql.allow.fullscan=true;
SELECT * FROM shop
JOIN (sale_detail_jt JOIN sale_detail ON sale_detail_jt.shop_name = sale_detail.shop_name)
ON shop.shop_name = sale_detail_jt.shop_name;The inner join (sale_detail_jt JOIN sale_detail ...) runs first, producing only matched rows. The outer join with shop then further filters on matching shop_name.
Result:
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2 | shop_name3 | customer_id3 | total_price3 | sale_date3 | region3 |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
| s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china | s2 | c2 | 100.2 | 2013 | china |
| s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china | s1 | c1 | 100.1 | 2013 | china |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+Example 9: WHERE vs subquery filtering with LEFT JOIN
This example shows a critical behavioral difference when filtering rows in an OUTER JOIN.
Correct approach — filter in a subquery:
SET odps.sql.allow.fullscan=true;
SELECT a.shop_name, a.customer_id, a.total_price, b.total_price
FROM (SELECT * FROM sale_detail WHERE region = 'china') a
LEFT JOIN (SELECT * FROM sale_detail_jt WHERE region = 'china') b
ON a.shop_name = b.shop_name;Result (all 3 rows from the filtered left table are preserved):
+------------+-------------+-------------+--------------+
| shop_name | customer_id | total_price | total_price2 |
+------------+-------------+-------------+--------------+
| s1 | c1 | 100.1 | 100.1 |
| s2 | c2 | 100.2 | 100.2 |
| s3 | c3 | 100.3 | NULL |
+------------+-------------+-------------+--------------+s3 has no match in sale_detail_jt, so total_price2 is NULL — but the row is still returned because it belongs to the left table.
Incorrect approach — filter in WHERE after LEFT JOIN:
SELECT a.shop_name, a.customer_id, a.total_price, b.total_price
FROM sale_detail a
LEFT JOIN sale_detail_jt b ON a.shop_name = b.shop_name
WHERE a.region = 'china' AND b.region = 'china';Result (only 2 rows — the intersection):
+------------+-------------+-------------+--------------+
| shop_name | customer_id | total_price | total_price2 |
+------------+-------------+-------------+--------------+
| s1 | c1 | 100.1 | 100.1 |
| s2 | c2 | 100.2 | 100.2 |
+------------+-------------+-------------+--------------+s3 is excluded because b.region is NULL for that row, and the WHERE condition b.region = 'china' filters it out. The WHERE clause runs after LEFT JOIN and effectively turns the outer join into an inner join — a common source of bugs.
To preserve outer join semantics, filter rows before the join using subqueries. Placing filter conditions in WHERE after an OUTER JOIN silently drops rows that the outer join was meant to preserve.