This topic describes how to perform the following operations on datasets of query results in MaxCompute: INTERSECT, INTERSECT ALL, INTERSECT DISTINCT, UNION, UNION ALL, UNION DISTINCT, EXCEPT, EXCEPT ALL, EXCEPT DISTINCT, MINUS, MINUS ALL, and MINUS DISTINCT.
Description
MaxCompute supports the following operations on datasets:
INTERSECT: returns the intersection of two datasets. The intersection includes the values that are contained in both datasets.
UNION: returns the union of two datasets. The union is a dataset that is obtained by combining the two datasets.
EXCEPT and MINUS: returns distinct values from one of two datasets. These values are not contained in the other dataset.
Limits
The following limits are imposed on INTERSECT, UNION, EXCEPT, and MINUS:
MaxCompute allows you to perform INTERSECT, UNION, EXCEPT, or MINUS operations on a maximum of 256 datasets at the same time. If the number of datasets exceeds 256, an error is returned.
The number of columns in the left and right tables must be the same.
Usage notes
When you perform INTERSECT, UNION, EXCEPT, or MINUS operations on two datasets, take note of the following items:
The results of INTERSECT, UNION, EXCEPT, or MINUS operations may not be sorted in a specific order.
If data types of the datasets are not consistent, MaxCompute implicitly converts the data types before you perform INTERSECT, UNION, EXCEPT, or MINUS operations. For more information about implicit conversions, see Data types. To prevent compatibility issues, MaxCompute disables implicit conversions between data of the STRING type and data of other types for INTERSECT, UNION, EXCEPT, or MINUS operations.
INTERSECT
Syntax
-- Obtain an intersection that contains duplicate values. <select_statement1> INTERSECT ALL <select_statement2>; -- Obtain an intersection that does not contain duplicate values. The usage of INTERSECT is equivalent to that of INTERSECT DISTINCT. <select_statement1> INTERSECT [DISTINCT] <select_statement2>;Parameters
select_statement1 and select_statement2: required. These parameters specify the
SELECTclauses. For more information about the syntax of the clauses, see SELECT syntax.distinct: optional. This parameter is used to remove duplicate values from the intersection of two datasets.
Examples
Example 1: Obtain the intersection of two datasets. The intersection contains duplicate values. Sample statement:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) INTERSECT ALL SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | +------------+------------+Example 2: Obtain the intersection of two datasets. The intersection does not contain duplicate values. Sample statements:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) intersect distinct SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b); -- The preceding statement is equivalent to the following statement: SELECT DISTINCT * FROM (SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) INTERSECT ALL SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b)) t;The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 3 | 4 | +------------+------------+
UNION
Syntax
-- Obtain a union that contains duplicate values. <select_statement1> UNION ALL <select_statement2>; -- Obtain a union that does not contain duplicate values. <select_statement1> UNION [DISTINCT] <select_statement2>;Usage notes
If multiple
UNION ALLoperations exist, use parentheses () to specify the priorities of theUNION ALLoperations.If
UNIONis followed by aCLUSTER BY,DISTRIBUTE BY,SORT BY,ORDER BY, orLIMITclause andSET odps.sql.type.system.odps2=false;is configured, the clause works only on the lastselect_statementofUNION. IfSET odps.sql.type.system.odps2=false;is configured, the clause works on the results of allUNIONoperations.
Parameters
select_statement1 and select_statement2: required. These parameters specify the
SELECTclauses. For more information about the syntax of the clauses, see SELECT syntax.distinct: optional. This parameter is used to remove duplicate values from the union of two datasets.
Examples
Example 1: Obtain the union of two datasets. The union contains duplicate values. Sample statement:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION ALL SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | | 1 | 2 | | 1 | 4 | +------------+------------+Example 2: Obtain the union of two datasets. The union does not contain duplicate values. Sample statements:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION DISTINCT SELECT * FROM VALUES (1, 2), (1, 4) t(a, b); -- The preceding statement is equivalent to the following statement: SELECT DISTINCT * FROM ( SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION ALL SELECT * FROM VALUES (1, 2), (1, 4) t(a, b));The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 4 | | 3 | 4 | +------------+------------+Example 3: Use parentheses () to specify the priorities of
UNION ALLoperations. Sample statement:SELECT * FROM VALUES (1, 2), (1, 2), (5, 6) t(a, b) UNION ALL (SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) UNION ALL SELECT * FROM VALUES (1, 2), (1, 4) t(a, b));The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 5 | 6 | | 1 | 2 | | 1 | 2 | | 3 | 4 | | 1 | 2 | | 1 | 4 | +------------+------------+Example 4: Use
UNIONthat is followed by aCLUSTER BY,DISTRIBUTE BY,SORT BY,ORDER BY, orLIMITclause in a SELECT statement and configureSET odps.sql.type.system.odps2=true;. Sample statements:SET odps.sql.type.system.odps2=true; SELECT explode(ARRAY(3, 1)) AS (a) UNION ALL SELECT explode(ARRAY(0, 4, 2)) AS (a) ORDER BY a limit 3;The following result is returned:
+------------+ | a | +------------+ | 0 | | 1 | | 2 | +------------+Example 5: Use
UNIONthat is followed by aCLUSTER BY,DISTRIBUTE BY,SORT BY,ORDER BY, orLIMITclause in a SELECT statement and configureSET odps.sql.type.system.odps2=false;. Sample statements:SET odps.sql.type.system.odps2=false; SELECT explode(ARRAY(3, 1)) AS (a) UNION ALL SELECT explode(ARRAY(0, 4, 2)) AS (a) ORDER BY a limit 3;The following result is returned:
+------------+ | a | +------------+ | 3 | | 1 | | 0 | | 2 | | 4 | +------------+
EXCEPT and MINUS
Syntax
-- Obtain the supplementary set of two datasets. The supplementary set contains duplicate values. <select_statement1> EXCEPT ALL <select_statement2>; <select_statement1> minus ALL <select_statement2>; -- Obtain the supplementary set of two datasets. The supplementary set does not contain duplicate values. <select_statement1> EXCEPT [DISTINCT] <select_statement2>; <select_statement1> minus [DISTINCT] <select_statement2>;NoteThe usage of
EXCEPTis equivalent to that ofMINUS.Parameters
select_statement1 and select_statement2: required. These parameters specify the
SELECTclauses. For more information about the syntax of the clauses, see SELECT syntax.distinct: optional. This parameter is used to remove duplicate values from the supplementary set of two datasets.
Examples
Example 1: Obtain the supplementary set of two datasets. The supplementary set contains duplicate values. Sample statements:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) EXCEPT ALL SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statement is equivalent to the following statement: SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) MINUS ALL SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | | 7 | 8 | +------------+------------+Example 2: Obtain the supplementary set of two datasets. The supplementary set does not contain duplicate values. Sample statements:
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) EXCEPT DISTINCT SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statement is equivalent to the following statements: SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) MINUS DISTINCT SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statements are equivalent to the following statement: SELECT DISTINCT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) except all select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);The following result is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 7 | 8 | +------------+------------+