All Products
Search
Document Center

MaxCompute:JOIN

Last Updated:Mar 26, 2026

MaxCompute supports JOIN operations to combine rows from two tables based on a related column. Supported types include LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, INNER JOIN, NATURAL JOIN, implicit JOIN, and multiple JOIN operations.

Supported JOIN types

Join typeBehavior
LEFT OUTER JOIN (or LEFT JOIN)Returns all rows from the left table, including rows with no match in the right table. Unmatched columns from the right table are NULL.
RIGHT OUTER JOIN (or RIGHT JOIN)Returns all rows from the right table, including rows with no match in the left table. Unmatched columns from the left table are NULL.
FULL OUTER JOIN (or FULL JOIN)Returns all rows from both tables. Unmatched columns from either side are NULL.
INNER JOINReturns only rows where a match exists in both tables. The INNER keyword is optional.
NATURAL JOINAutomatically joins on all columns with the same name in both tables. MaxCompute supports OUTER NATURAL JOIN. When used with the USING clause, common columns appear only once in the output.
Implicit JOINOmit the JOIN keyword. List tables in the FROM clause separated by commas, and define the join condition in WHERE. Equivalent to INNER JOIN.
Multiple JOINsChain multiple JOIN operations in a single statement. Use parentheses () to control evaluation order — operations inside parentheses run first.

Limitations

  • CROSS JOIN is not supported. A CROSS JOIN joins two tables without conditions in the ON clause, producing a Cartesian product where every row in the first table is paired with every row in the second table.

  • JOIN conditions must use equi-joins combined with AND. Non-equi joins or OR-combined conditions are only supported in MAPJOIN.

Usage notes

  • NULL join keys are excluded. MaxCompute automatically adds a key IS NOT NULL filter for join keys. Rows where the join key is NULL are dropped from the result.

  • WHERE filters after JOIN. When a statement includes both JOIN and WHERE, the join runs first and its results are then filtered by WHERE. The final result is an intersection of the two tables, not all rows from either table. See Example 9 for a concrete illustration.

  • Avoid consecutive LEFT JOIN on tables with duplicate rows. In most JOIN operations, the left table is large and the right table is small. If the right table has duplicate row values, multiple consecutive LEFT JOIN operations may cause data bloat and job interruptions.

Syntax

<table_reference> JOIN <table_factor> [<join_condition>]
| <table_reference> {LEFT OUTER|RIGHT OUTER|FULL OUTER|INNER|NATURAL} JOIN <table_reference> <join_condition>

Parameters:

  • table_reference: The left table. Format: table_name [alias] | table_query [alias] | ...

  • table_factor: The right table or join target. Format: table_name [alias] | table_subquery [alias] | ...

  • join_condition: One or more equality expressions combined with AND. Format: ON equality_expression [AND equality_expression] ...

Partition pruning behavior differs based on where filter conditions are placed:
Conditions in WHERE: partition pruning applies to both the parent and child tables.
Conditions in ON: partition pruning applies only to the child table; a full table scan runs on the parent table. For details, see Check whether partition pruning is effective.

Controlling OUTER JOIN predicate behavior

The odps.task.sql.outerjoin.ppd parameter controls whether non-JOIN conditions in an OUTER JOIN ON clause are applied before or after the join.

ValueBehavior
falseThe non-JOIN condition in ON is treated as a WHERE subquery condition (non-standard behavior). Place non-JOIN conditions in WHERE instead.
trueStandard behavior. Non-JOIN conditions in ON are not moved to WHERE.

Configure this parameter at the project or session level.

When set to false, the following two statements are equivalent. When set to true, they are not:

SELECT A.*, B.* FROM A LEFT JOIN B ON A.c1 = B.c1 AND A.c2 = 'xxx';

SELECT A.*, B.* FROM (SELECT * FROM A WHERE c2 = 'xxx') A LEFT JOIN B ON A.c1 = B.c1;

Sample data

The following examples use two partitioned tables: sale_detail and sale_detail_jt.

-- Create the tables
CREATE TABLE IF NOT EXISTS sale_detail
(
  shop_name     STRING,
  customer_id   STRING,
  total_price   DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);

CREATE TABLE IF NOT EXISTS sale_detail_jt
(
  shop_name     STRING,
  customer_id   STRING,
  total_price   DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);

-- Add partitions
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china') PARTITION (sale_date='2014', region='shanghai');
ALTER TABLE sale_detail_jt ADD PARTITION (sale_date='2013', region='china');

-- Insert data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
INSERT INTO sale_detail_jt PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);

-- Preview the data (requires full table scan to be enabled for partitioned tables)
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail;

sale_detail contains:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt;

sale_detail_jt contains:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
-- Create a helper table used in the multiple-JOIN example
SET odps.sql.allow.fullscan=true;
CREATE TABLE shop AS SELECT shop_name, customer_id, total_price FROM sale_detail;

Examples

All examples use the sample tables defined in Sample data. Because both tables have a shop_name column, examples use table aliases (a, b) to distinguish columns in SELECT.

Example 1: LEFT OUTER JOIN

SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
LEFT OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;

Result:

+------------+------------+
| ashop      | bshop      |
+------------+------------+
| s2         | s2         |
| s1         | s1         |
| s5         | NULL       |
+------------+------------+

All three rows from sale_detail_jt (the left table) appear. s5 has no matching shop_name in sale_detail, so bshop is NULL.

Example 2: RIGHT OUTER JOIN

SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
RIGHT OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;

Result:

+------------+------------+
| ashop      | bshop      |
+------------+------------+
| s1         | s1         |
| s2         | s2         |
| NULL       | s3         |
| NULL       | null       |
| NULL       | s6         |
| NULL       | s7         |
+------------+------------+

All six rows from sale_detail (the right table) appear. The four shops that have no matching entry in sale_detail_jt produce NULL in the ashop column.

Example 3: FULL OUTER JOIN

SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
FULL OUTER JOIN sale_detail b ON a.shop_name = b.shop_name;

Result:

+------------+------------+
| ashop      | bshop      |
+------------+------------+
| NULL       | s3         |
| NULL       | s6         |
| s2         | s2         |
| NULL       | null       |
| NULL       | s7         |
| s1         | s1         |
| s5         | NULL       |
+------------+------------+

All rows from both tables appear. s5 exists only in sale_detail_jt (NULL on the right), and s3, null, s6, s7 exist only in sale_detail (NULL on the left).

Example 4: INNER JOIN

SET odps.sql.allow.fullscan=true;
SELECT a.shop_name AS ashop, b.shop_name AS bshop
FROM sale_detail_jt a
INNER JOIN sale_detail b ON a.shop_name = b.shop_name;

Result:

+------------+------------+
| ashop      | bshop      |
+------------+------------+
| s2         | s2         |
| s1         | s1         |
+------------+------------+

Only s1 and s2 exist in both tables, so only those two rows are returned.

Example 5: NATURAL JOIN

NATURAL JOIN joins on all columns with the same name in both tables. Here, that means all five columns: shop_name, customer_id, total_price, sale_date, and region.

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt NATURAL JOIN sale_detail;

This is equivalent to:

SELECT sale_detail_jt.shop_name AS shop_name,
       sale_detail_jt.customer_id AS customer_id,
       sale_detail_jt.total_price AS total_price,
       sale_detail_jt.sale_date AS sale_date,
       sale_detail_jt.region AS region
FROM sale_detail_jt
INNER JOIN sale_detail
ON sale_detail_jt.shop_name = sale_detail.shop_name
   AND sale_detail_jt.customer_id = sale_detail.customer_id
   AND sale_detail_jt.total_price = sale_detail.total_price
   AND sale_detail_jt.sale_date = sale_detail.sale_date
   AND sale_detail_jt.region = sale_detail.region;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Only s1 and s2 match across all five columns. s5 shares customer_id and total_price with s2, but shop_name differs, so it is excluded.

Example 6: Implicit JOIN

Omitting the JOIN keyword and listing tables separated by commas in FROM produces an implicit inner join.

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail_jt, sale_detail
WHERE sale_detail_jt.shop_name = sale_detail.shop_name;

This is equivalent to:

SELECT * FROM sale_detail_jt
JOIN sale_detail ON sale_detail_jt.shop_name = sale_detail.shop_name;

Result:

+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2    |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+
| s2         | c2          | 100.2       | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+

Example 7: Multiple JOINs without explicit priority

When multiple JOINs are chained without parentheses, they are evaluated left to right.

SET odps.sql.allow.fullscan=true;
SELECT a.* FROM sale_detail_jt a
FULL OUTER JOIN sale_detail b ON a.shop_name = b.shop_name
FULL OUTER JOIN sale_detail c ON a.shop_name = c.shop_name;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s5         | c2          | 100.2       | 2013       | china      |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| s1         | c1          | 100.1       | 2013       | china      |
| NULL       | NULL        | NULL        | NULL       | NULL       |
| s2         | c2          | 100.2       | 2013       | china      |
| NULL       | NULL        | NULL        | NULL       | NULL       |
+------------+-------------+-------------+------------+------------+

Each FULL OUTER JOIN with sale_detail (which has 6 rows) generates additional NULL rows for unmatched combinations from both sides of each join. The result selects only columns from a (sale_detail_jt), so matched rows from sale_detail appear as NULL.

Example 8: Multiple JOINs with explicit priority

Use parentheses to control which JOIN is evaluated first.

SET odps.sql.allow.fullscan=true;
SELECT * FROM shop
JOIN (sale_detail_jt JOIN sale_detail ON sale_detail_jt.shop_name = sale_detail.shop_name)
ON shop.shop_name = sale_detail_jt.shop_name;

The inner join (sale_detail_jt JOIN sale_detail ...) runs first, producing only matched rows. The outer join with shop then further filters on matching shop_name.

Result:

+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     | shop_name2 | customer_id2 | total_price2 | sale_date2 | region2    | shop_name3 | customer_id3 | total_price3 | sale_date3 | region3    |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+
| s2         | c2          | 100.2       | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      | s2         | c2           | 100.2        | 2013       | china      |
| s1         | c1          | 100.1       | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      | s1         | c1           | 100.1        | 2013       | china      |
+------------+-------------+-------------+------------+------------+------------+--------------+--------------+------------+------------+------------+--------------+--------------+------------+------------+

Example 9: WHERE vs subquery filtering with LEFT JOIN

This example shows a critical behavioral difference when filtering rows in an OUTER JOIN.

Correct approach — filter in a subquery:

SET odps.sql.allow.fullscan=true;
SELECT a.shop_name, a.customer_id, a.total_price, b.total_price
FROM (SELECT * FROM sale_detail WHERE region = 'china') a
LEFT JOIN (SELECT * FROM sale_detail_jt WHERE region = 'china') b
ON a.shop_name = b.shop_name;

Result (all 3 rows from the filtered left table are preserved):

+------------+-------------+-------------+--------------+
| shop_name  | customer_id | total_price | total_price2 |
+------------+-------------+-------------+--------------+
| s1         | c1          | 100.1       | 100.1        |
| s2         | c2          | 100.2       | 100.2        |
| s3         | c3          | 100.3       | NULL         |
+------------+-------------+-------------+--------------+

s3 has no match in sale_detail_jt, so total_price2 is NULL — but the row is still returned because it belongs to the left table.

Incorrect approach — filter in WHERE after LEFT JOIN:

SELECT a.shop_name, a.customer_id, a.total_price, b.total_price
FROM sale_detail a
LEFT JOIN sale_detail_jt b ON a.shop_name = b.shop_name
WHERE a.region = 'china' AND b.region = 'china';

Result (only 2 rows — the intersection):

+------------+-------------+-------------+--------------+
| shop_name  | customer_id | total_price | total_price2 |
+------------+-------------+-------------+--------------+
| s1         | c1          | 100.1       | 100.1        |
| s2         | c2          | 100.2       | 100.2        |
+------------+-------------+-------------+--------------+

s3 is excluded because b.region is NULL for that row, and the WHERE condition b.region = 'china' filters it out. The WHERE clause runs after LEFT JOIN and effectively turns the outer join into an inner join — a common source of bugs.

To preserve outer join semantics, filter rows before the join using subqueries. Placing filter conditions in WHERE after an OUTER JOIN silently drops rows that the outer join was meant to preserve.