SEMI JOIN filters rows from the left table based on whether a match exists in the right table. The right table is used only as a filter — its columns never appear in the result set.
MaxCompute supports two SEMI JOIN variants:
| Variant | Behavior |
|---|---|
LEFT SEMI JOIN | Returns rows from the left table that have a match in the right table. |
LEFT ANTI JOIN | Returns rows from the left table that have no match in the right table. |
The two variants are mirror opposites: LEFT SEMI JOIN keeps matching rows; LEFT ANTI JOIN keeps non-matching rows.
LEFT SEMI JOIN is similar to IN SUBQUERY in MaxCompute — use whichever fits your query style. For details, see IN SUBQUERY.
LEFT ANTI JOIN is similar to NOT IN SUBQUERY, but there are behavioral differences. For details, see NOT IN SUBQUERY.
To improve join performance, both variants support MAPJOIN hints. For details, see MAPJOIN hints.
Limitations
The right table can only be referenced in the ON clause. It cannot be referenced in the WHERE or SELECT clause.
Examples
The following examples use two tables: sale_detail and sale_detail_sj. Run the setup script before executing the examples.
-- Create the sale_detail table
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Create the sale_detail_sj table
CREATE TABLE IF NOT EXISTS sale_detail_sj
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Add partitions
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');
ALTER TABLE sale_detail_sj ADD PARTITION (sale_date='2013', region='china');
-- Insert data into sale_detail: three rows with prices 100.1, 100.2, and 100.3
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s3','c3',100.3);
-- Insert data into sale_detail_sj: prices 100.1 and 100.2 exist; 100.3 does not
INSERT INTO sale_detail_sj PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s5','c2',100.2), ('s2','c2',100.2);Example 1: LEFT SEMI JOIN
Return rows from sale_detail whose total_price appears in sale_detail_sj.
SELECT * FROM sale_detail a LEFT SEMI JOIN sale_detail_sj b ON a.total_price = b.total_price;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s2 | c2 | 100.2 | 2013 | china |
| s1 | c1 | 100.1 | 2013 | china |
+------------+-------------+-------------+------------+------------+Prices 100.1 and 100.2 exist in sale_detail_sj, so s1 and s2 are returned. Price 100.3 has no match, so s3 is excluded. Only columns from sale_detail appear in the result — the right table contributes nothing to the output.
Example 2: LEFT ANTI JOIN
Return rows from sale_detail whose total_price does not appear in sale_detail_sj.
SELECT * FROM sale_detail a LEFT ANTI JOIN sale_detail_sj b ON a.total_price = b.total_price;Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+Price 100.3 has no match in sale_detail_sj, so only s3 is returned. This result is the complement of Example 1: together, both results cover all three rows in sale_detail.