All Products
Search
Document Center

MaxCompute:JOIN operations in MaxCompute SQL

Last Updated:Mar 26, 2026

Placing a filter condition in the wrong clause of a JOIN query can silently produce incorrect results — especially with outer joins. This topic explains how each JOIN type handles filter conditions placed in subqueries, the ON clause, and the outer WHERE clause, so you can write queries that return exactly what you expect.

Supported JOIN types

MaxCompute SQL supports the following JOIN operations.

Operation Description
INNER JOIN Returns rows with matching column values in both tables.
LEFT JOIN Returns all rows from the left table. For rows in the left table with no match in the right table, NULL values appear in right-table columns.
RIGHT JOIN Returns all rows from the right table. For rows in the right table with no match in the left table, NULL values appear in left-table columns.
FULL JOIN Returns all rows from both tables. Where a row has no match in the other table, NULL values fill the unmatched columns.
LEFT SEMI JOIN Returns rows from the left table that have at least one match in the right table. Right-table rows are not included in the result.
LEFT ANTI JOIN Returns rows from the left table that have no match in the right table. Right-table rows are not included in the result. Often used to replace NOT EXISTS.

How filter condition placement affects results

A single SQL statement can combine subquery filters, an ON clause, and an outer WHERE clause:

SELECT *
FROM
  (SELECT * FROM A WHERE <subquery_filter_A>) A
JOIN
  (SELECT * FROM B WHERE <subquery_filter_B>) B
ON <on_condition>
WHERE <where_condition>

MaxCompute evaluates these conditions in this order:

  1. <subquery_filter> — WHERE conditions inside subqueries

  2. <on_condition> — the ON clause

  3. <where_condition> — the WHERE clause after JOIN

Because of this evaluation order, the same logical filter can produce different results depending on where you place it.

Important

For outer JOINs (LEFT JOIN, RIGHT JOIN, FULL JOIN), filtering on the null-supplying table's columns in the outer WHERE clause eliminates rows that carry NULL values — rows where the preserved-row table had no match. This effectively converts the outer join into an INNER JOIN. To preserve outer-join behavior, place those filters in the ON clause or in a subquery.

Outer JOIN terminology used in this topic:

Term Definition Examples
Preserved-row table The table whose rows must all appear in the result Left table in LEFT JOIN; right table in RIGHT JOIN; both tables in FULL JOIN
Null-supplying table The table that contributes NULL values for unmatched rows Right table in LEFT JOIN; left table in RIGHT JOIN; both tables in FULL JOIN

Test tables

The examples in this topic use two tables, A and B.

Table A

CREATE TABLE A AS SELECT * FROM VALUES (1, 20180101),(2, 20180101),(2, 20180102) t (key, ds);
key ds
1 20180101
2 20180101
2 20180102

Table B

CREATE TABLE B AS SELECT * FROM VALUES (1, 20180101),(3, 20180101),(2, 20180102) t (key, ds);
key ds
1 20180101
3 20180101
2 20180102

The Cartesian product of A and B has 9 rows. Enable Cartesian product queries with:

SET odps.sql.allow.cartesian=true;
SELECT * FROM A, B;

Result:

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
| 2    | 20180101 | 1    | 20180101 |
| 2    | 20180102 | 1    | 20180101 |
| 1    | 20180101 | 3    | 20180101 |
| 2    | 20180101 | 3    | 20180101 |
| 2    | 20180102 | 3    | 20180101 |
| 1    | 20180101 | 2    | 20180102 |
| 2    | 20180101 | 2    | 20180102 |
| 2    | 20180102 | 2    | 20180102 |
+------+----------+------+----------+

INNER JOIN

An INNER JOIN returns rows from the Cartesian product that satisfy the join condition.

Result: Filter condition placement does not affect the result.

Case 1 — filter in subquery:

SELECT A.*, B.*
FROM
  (SELECT * FROM A WHERE ds='20180101') A
JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key;

Case 2 — filter in ON clause:

SELECT A.*, B.*
FROM A JOIN B
ON a.key = b.key AND A.ds='20180101' AND B.ds='20180101';

Case 3 — filter in outer WHERE:

SELECT A.*, B.*
FROM A JOIN B
ON a.key = b.key
WHERE A.ds='20180101' AND B.ds='20180101';

All three cases return:

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
+------+----------+------+----------+

For INNER JOIN, the query engine first applies the join condition (producing 3 matching rows from the 9-row Cartesian product), then applies the outer WHERE filter. Because both steps must be satisfied, the final result is the same regardless of placement.

LEFT JOIN

A LEFT JOIN returns all rows from the left table (preserved-row table). Rows in the left table with no match in the right table appear with NULL values in right-table columns.

Result: Filter condition placement affects the result.

Watch out: Filtering on a right-table (null-supplying) column in the outer WHERE clause eliminates all NULL-carrying rows — the rows where the left table had no match. This converts the LEFT JOIN into an effective INNER JOIN.
  • For left-table filters: a subquery filter and an outer WHERE filter produce the same result.

  • For right-table filters: a subquery filter and an ON clause filter produce the same result.

Case 1 — filter in subquery:

SELECT A.*, B.*
FROM
  (SELECT * FROM A WHERE ds='20180101') A
LEFT JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key;

Result (2 rows — left-table rows with ds='20180101', right-table match or NULL):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
| 2    | 20180101 | NULL | NULL     |
+------+----------+------+----------+

Case 2 — filter in ON clause:

SELECT A.*, B.*
FROM A LEFT JOIN B
ON a.key = b.key AND A.ds='20180101' AND B.ds='20180101';

The ON clause filters which rows in B can match, but all rows of A are still preserved. The 9-row Cartesian product has 1 matching pair. The 2 unmatched A rows return NULL for B columns.

Result (3 rows — all rows from A, match or NULL):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
| 2    | 20180101 | NULL | NULL     |
| 2    | 20180102 | NULL | NULL     |
+------+----------+------+----------+

Case 3 — filter in outer WHERE:

SELECT A.*, B.*
FROM A LEFT JOIN B
ON a.key = b.key
WHERE A.ds='20180101' AND B.ds='20180101';

The join produces rows with NULL in B.ds for unmatched A rows. The WHERE filter B.ds='20180101' cannot match NULL, so those rows are eliminated.

Result (1 row — same as INNER JOIN):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
+------+----------+------+----------+

RIGHT JOIN

RIGHT JOIN mirrors LEFT JOIN with the tables reversed. All rows from the right table (preserved-row table) are returned, and left-table columns carry NULL where no match exists.

Result: Filter condition placement affects the result.

  • For right-table filters: a subquery filter and an outer WHERE filter produce the same result.

  • For left-table filters: a subquery filter and an ON clause filter produce the same result.

The same pitfall applies: filtering on a left-table (null-supplying) column in the outer WHERE clause eliminates NULL-carrying rows, effectively converting the RIGHT JOIN into an INNER JOIN.

FULL JOIN

A FULL JOIN returns all rows from both tables. Both tables are simultaneously preserved-row tables and null-supplying tables. Where a row has no match, NULL values fill the unmatched columns.

Result: Filter condition placement affects the result.

Watch out: For FULL JOIN, filters placed in the ON clause or outer WHERE clause do not restrict which rows are returned from each table — they only affect whether matches are found. Only subquery filters reliably restrict input before the join.

Case 1 — filter in subquery:

SELECT A.*, B.*
FROM
  (SELECT * FROM A WHERE ds='20180101') A
FULL JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key;

Result (3 rows):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 2    | 20180101 | NULL | NULL     |
| 1    | 20180101 | 1    | 20180101 |
| NULL | NULL     | 3    | 20180101 |
+------+----------+------+----------+

Case 2 — filter in ON clause:

SELECT A.*, B.*
FROM A FULL JOIN B
ON a.key = b.key AND A.ds='20180101' AND B.ds='20180101';

Only 1 row from the 9-row Cartesian product meets the join condition. The 2 unmatched A rows get NULL for B columns, and the 2 unmatched B rows get NULL for A columns.

Result (5 rows):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| NULL | NULL     | 2    | 20180102 |
| 2    | 20180101 | NULL | NULL     |
| 2    | 20180102 | NULL | NULL     |
| 1    | 20180101 | 1    | 20180101 |
| NULL | NULL     | 3    | 20180101 |
+------+----------+------+----------+

Case 3 — filter in outer WHERE:

SELECT A.*, B.*
FROM A FULL JOIN B
ON a.key = b.key
WHERE A.ds='20180101' AND B.ds='20180101';

The join produces 4 rows (3 matching + 1 unmatched B row with NULL A columns). The WHERE filter eliminates rows where either column is NULL.

Result (1 row):

+------+----------+------+----------+
| key  | ds       | key2 | ds2      |
+------+----------+------+----------+
| 1    | 20180101 | 1    | 20180101 |
+------+----------+------+----------+

LEFT SEMI JOIN

A LEFT SEMI JOIN returns rows from the left table that have at least one match in the right table. Right-table rows are not included in the result, so right-table columns cannot be referenced in the outer WHERE clause.

Result: Filter condition placement does not affect the result.

Case 1 — filter in subquery:

SELECT A.*
FROM
  (SELECT * FROM A WHERE ds='20180101') A
LEFT SEMI JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key;

Case 2 — filter in ON clause:

SELECT A.*
FROM A LEFT SEMI JOIN B
ON a.key = b.key AND A.ds='20180101' AND B.ds='20180101';

Case 3 — filter in outer WHERE (right-table filter must be in subquery):

SELECT A.*
FROM A LEFT SEMI JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key
WHERE A.ds='20180101';

All three cases return:

+------+----------+
| key  | ds       |
+------+----------+
| 1    | 20180101 |
+------+----------+

LEFT ANTI JOIN

A LEFT ANTI JOIN returns rows from the left table that have no match in the right table. Right-table rows are not included in the result, so right-table columns cannot be referenced in the outer WHERE clause.

Result: Filter condition placement affects the result.

  • For left-table filters: a subquery filter and an outer WHERE filter produce the same result.

  • For right-table filters: a subquery filter and an ON clause filter produce the same result.

Case 1 — filter in subquery:

SELECT A.*
FROM
  (SELECT * FROM A WHERE ds='20180101') A
LEFT ANTI JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key;

Result (1 row):

+------+----------+
| key  | ds       |
+------+----------+
| 2    | 20180101 |
+------+----------+

Case 2 — filter in ON clause:

SELECT A.*
FROM A LEFT ANTI JOIN B
ON a.key = b.key AND A.ds='20180101' AND B.ds='20180101';

The ON clause restricts which B rows can serve as a match. With the narrower join condition, more A rows have no match and are returned.

Result (2 rows):

+------+----------+
| key  | ds       |
+------+----------+
| 2    | 20180101 |
| 2    | 20180102 |
+------+----------+

Case 3 — filter in outer WHERE (right-table filter must be in subquery):

SELECT A.*
FROM A LEFT ANTI JOIN
  (SELECT * FROM B WHERE ds='20180101') B
ON a.key = b.key
WHERE A.ds='20180101';

The join returns 2 rows, then the outer WHERE filters to 1.

Result (1 row):

+------+----------+
| key  | ds       |
+------+----------+
| 2    | 20180101 |
+------+----------+

Filter placement reference

Behavior by JOIN type

JOIN type Left-table filter Right-table filter
INNER JOIN Any position gives the same result Any position gives the same result
LEFT SEMI JOIN Any position gives the same result Any position gives the same result
LEFT JOIN Subquery = outer WHERE Subquery = ON clause
LEFT ANTI JOIN Subquery = outer WHERE Subquery = ON clause
RIGHT JOIN Subquery = ON clause Subquery = outer WHERE
FULL JOIN Subquery only Subquery only

Recommendations

Use subquery filters for outer JOINs. Subquery filters restrict input data before the join runs, so join semantics are never affected. This is the safest approach across all JOIN types.

If you use the ON clause for null-supplying table filters in a LEFT JOIN or RIGHT JOIN, outer join semantics are preserved. Placing those filters in the outer WHERE clause eliminates NULL-carrying rows and effectively converts the outer join into an INNER JOIN.

For FULL JOIN, subquery filters are the only option that reliably restricts input — filters in the ON clause or outer WHERE clause change which rows match but do not exclude unmatched rows from either table.

What's next

  • JOIN — syntax reference for standard JOIN operations in MaxCompute SQL

  • SEMI JOIN — syntax reference for LEFT SEMI JOIN and LEFT ANTI JOIN

  • MAPJOIN HINT — improve performance when joining a large table with a small table

  • DISTRIBUTED MAPJOIN — improve performance when joining a large table with a medium-sized table

  • SKEWJOIN HINT — handle hot key values that cause long tail issues in JOIN operations