MaxCompute supports four set operators for combining query result sets: INTERSECT, UNION, EXCEPT, and MINUS. Each operator has two variants that differ in how they handle duplicate rows.
Overview
| Operator | Returns |
|---|---|
INTERSECT | Rows present in both datasets |
UNION | All rows from both datasets combined |
EXCEPT / MINUS | Rows in the left dataset that are not in the right dataset |
EXCEPT and MINUS are synonyms — they produce identical results.
Duplicate handling: set semantics vs. bag semantics
All set operators work in one of two modes:
Set semantics (default): removes duplicate rows from the result, equivalent to applying
DISTINCT. UseINTERSECT,UNION, orEXCEPTwithout theALLkeyword.Bag semantics: preserves duplicate rows. Use
INTERSECT ALL,UNION ALL, orEXCEPT ALL.
In short: with ALL, duplicates are kept; without ALL, duplicates are removed.
Limits
Set operators can combine a maximum of 256 datasets in a single statement. Exceeding this limit returns an error.
Both datasets must have the same number of columns.
Usage notes
Results are not guaranteed to be in any specific order unless you add an
ORDER BYclause.If the corresponding columns have incompatible data types, MaxCompute performs implicit type conversion before executing the operation. See Data types for conversion rules.
Implicit type conversion between
STRINGand other types is disabled for all set operations. Cast columns explicitly when combiningSTRINGwith non-string types.
INTERSECT
Returns rows that appear in both datasets.
Syntax
-- Keep duplicate rows in the result.
<select_statement1> INTERSECT ALL <select_statement2>;
-- Remove duplicate rows from the result. INTERSECT and INTERSECT DISTINCT are equivalent.
<select_statement1> INTERSECT [DISTINCT] <select_statement2>;Parameters
| Parameter | Required | Description |
|---|---|---|
select_statement1, select_statement2 | Yes | SELECT clauses to combine. See SELECT syntax. |
DISTINCT | No | Removes duplicate rows from the intersection. Omitting ALL has the same effect. |
Examples
Example 1: Return the intersection, keeping duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT ALL
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
+------------+------------+Example 2: Return the intersection, removing duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT DISTINCT
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
-- Equivalent to:
SELECT DISTINCT * FROM
(SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT ALL
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b)) t;Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 3 | 4 |
+------------+------------+UNION
Returns all rows from both datasets combined.
Syntax
-- Keep duplicate rows in the result.
<select_statement1> UNION ALL <select_statement2>;
-- Remove duplicate rows from the result.
<select_statement1> UNION [DISTINCT] <select_statement2>;Parameters
| Parameter | Required | Description |
|---|---|---|
select_statement1, select_statement2 | Yes | SELECT clauses to combine. See SELECT syntax. |
DISTINCT | No | Removes duplicate rows from the union result. |
Usage notes
When chaining multiple
UNION ALLoperations, use parentheses to specify evaluation order.The behavior of
CLUSTER BY,DISTRIBUTE BY,SORT BY,ORDER BY, andLIMITclauses followingUNIONdepends on theodps.sql.type.system.odps2setting:SET odps.sql.type.system.odps2=true;— the clause applies to the results of allUNIONoperations.SET odps.sql.type.system.odps2=false;— the clause applies only to the lastselect_statementin theUNION.
Examples
Example 1: Return the union, keeping duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION ALL
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
| 1 | 2 |
| 1 | 4 |
+------------+------------+Example 2: Return the union, removing duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION DISTINCT
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
-- Equivalent to:
SELECT DISTINCT * FROM (
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION ALL
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b));Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 4 |
| 3 | 4 |
+------------+------------+Example 3: Use parentheses to control the evaluation order of UNION ALL operations.
SELECT * FROM VALUES (1, 2), (1, 2), (5, 6) t(a, b)
UNION ALL
(SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION ALL
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b));Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 5 | 6 |
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
| 1 | 2 |
| 1 | 4 |
+------------+------------+Example 4: Use UNION ALL followed by ORDER BY and LIMIT with odps.sql.type.system.odps2=true. The ORDER BY and LIMIT apply to the combined result set.
SET odps.sql.type.system.odps2=true;
SELECT explode(ARRAY(3, 1)) AS (a) UNION ALL SELECT explode(ARRAY(0, 4, 2)) AS (a) ORDER BY a limit 3;Result:
+------------+
| a |
+------------+
| 0 |
| 1 |
| 2 |
+------------+Example 5: Use UNION ALL followed by ORDER BY and LIMIT with odps.sql.type.system.odps2=false. The ORDER BY and LIMIT apply only to the last SELECT statement, not to the combined result. All rows from both queries are returned.
SET odps.sql.type.system.odps2=false;
SELECT explode(ARRAY(3, 1)) AS (a) UNION ALL SELECT explode(ARRAY(0, 4, 2)) AS (a) ORDER BY a limit 3;Result:
+------------+
| a |
+------------+
| 3 |
| 1 |
| 0 |
| 2 |
| 4 |
+------------+EXCEPT and MINUS
Returns rows from the left dataset that do not appear in the right dataset. EXCEPT and MINUS are synonyms.
Syntax
-- Keep duplicate rows in the result.
<select_statement1> EXCEPT ALL <select_statement2>;
<select_statement1> MINUS ALL <select_statement2>;
-- Remove duplicate rows from the result.
<select_statement1> EXCEPT [DISTINCT] <select_statement2>;
<select_statement1> MINUS [DISTINCT] <select_statement2>;Parameters
| Parameter | Required | Description |
|---|---|---|
select_statement1, select_statement2 | Yes | SELECT clauses to combine. See SELECT syntax. |
DISTINCT | No | Removes duplicate rows from the result. |
Examples
Example 1: Return rows in the left dataset that are not in the right dataset, keeping duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT ALL
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
-- Equivalent to:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
MINUS ALL
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
| 7 | 8 |
+------------+------------+Example 2: Return rows in the left dataset that are not in the right dataset, removing duplicate rows.
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT DISTINCT
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
-- Equivalent to:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
MINUS DISTINCT
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
-- Both are equivalent to:
SELECT DISTINCT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) except all select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);Result:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 7 | 8 |
+------------+------------+