MaxCompute supports two types of SEMI JOIN: LEFT SEMI JOIN and LEFT ANTI JOIN. Unlike a regular join, the right table is never included in the result set — it only filters rows from the left table. This makes SEMI JOIN efficient for existence checks and exclusion queries.
Overview
| Join type | Returns |
|---|---|
LEFT SEMI JOIN | Rows from the left table that have a matching row in the right table |
LEFT ANTI JOIN | Rows from the left table that have no matching row in the right table |
Both join types support MAPJOIN hints to improve query performance.
Syntax
LEFT SEMI JOIN
relation LEFT SEMI JOIN relation [join_criteria]Returns rows from the left table where the join condition is satisfied. Equivalent to IN subquery.
LEFT ANTI JOIN
relation LEFT ANTI JOIN relation [join_criteria]Returns rows from the left table where the join condition is not satisfied. Functionally similar to NOT IN subquery.
Sample data
The following examples use two tables: sale_detail and sale_detail_sj. Run the statements below to create the tables and insert data.
-- Create a partitioned table named sale_detail
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
CREATE TABLE IF NOT EXISTS sale_detail_sj
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Add partitions
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');
ALTER TABLE sale_detail_sj ADD PARTITION (sale_date='2013', region='china');
-- Insert data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s3','c3',100.3);
INSERT INTO sale_detail_sj PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s5','c2',100.2), ('s2','c2',100.2);Verify the inserted data:
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+SELECT * FROM sale_detail_sj;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s5 | c2 | 100.2 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
+------------+-------------+-------------+------------+------------+Examples
Example 1: LEFT SEMI JOIN
Return all rows from sale_detail where total_price also appears in sale_detail_sj:
SELECT * FROM sale_detail a LEFT SEMI JOIN sale_detail_sj b ON a.total_price = b.total_price;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s2 | c2 | 100.2 | 2013 | china |
| s1 | c1 | 100.1 | 2013 | china |
+------------+-------------+-------------+------------+------------+Row s3 (total_price = 100.3) is excluded because 100.3 does not appear in sale_detail_sj.
Example 2: LEFT ANTI JOIN
Return all rows from sale_detail where total_price does not appear in sale_detail_sj:
SELECT * FROM sale_detail a LEFT ANTI JOIN sale_detail_sj b ON a.total_price = b.total_price;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+Only row s3 is returned because 100.3 is the only total_price value not present in sale_detail_sj.
What's next
MAPJOIN hints — Improve LEFT SEMI JOIN and LEFT ANTI JOIN performance
IN subquery — Alternative to LEFT SEMI JOIN
NOT IN subquery — Alternative to LEFT ANTI JOIN