MaxCompute supports subquery operations, and some of these subqueries are internally optimized by being converted into JOINs during execution. To further enhance the execution efficiency of a subquery, you can use the SUBQUERY_MAPJOIN HINT within the subquery statement to explicitly specify the MAPJOIN algorithm. This topic describes how to use the SUBQUERY_MAPJOIN HINT.
Limits
Only supports the SCALAR, IN, NOT IN, EXISTS, and NOT EXISTS subqueries, and does not support the basic subqueries. For more information on subqueries, see Subqueries.
You do not need to manually specify a small table when using the SUBQUERY_MAPJOIN HINT. The system automatically uses the result of the subquery execution as the small table for the MAPJOIN.
ImportantIf the result of the subquery is excessively large, it could trigger an Out of Memory error during MAPJOIN operations. It is essential to verify that the subquery result is a small table prior to employing this HINT. For information on MAPJOIN limitations, see Limits.
In certain scenarios, a subquery is not be converted into a JOIN operation for execution. When the SUBQUERY_MAPJOIN HINT is utilized in these scenarios, the system will issue a warning. For detailed instructions, see Example 5: Subqueries not converted to JOIN output a warning while returning query results.
Usage method
To execute the MAPJOIN algorithm within a subquery statement, you must use and place the HINT /*+ subquery_mapjoin */
immediately after the left parenthesis of the subquery. The following illustrates the usage:
Consider the table definitions for t1 and t2:
CREATE TABLE t1(a BIGINT, b BIGINT);
CREATE TABLE t2(a BIGINT, b BIGINT);
SCALAR SUBQUERY
SELECT a, (/*+ subquery_mapjoin */ SELECT b FROM t2 WHERE a = t1.a) FROM t1;
IN and NOT IN SUBQUERY
SELECT * FROM t1 WHERE a IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b); SELECT * FROM t1 WHERE a NOT IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
EXISTS and NOT EXISTS SUBQUERY
SELECT * FROM t1 WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b); SELECT * FROM t1 WHERE NOT EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
Incorrect usage
-- The following code is incorrect because the SUBQUERY_MAPJOIN HINT is not written immediately after the left parenthesis corresponding to the subquery SELECT * FROM t1 WHERE a IN (SELECT /*+ subquery_mapjoin */ a FROM t2 WHERE b = t1.b);
Sample data
To facilitate understanding of the examples provided, sample source data is available.
Create the
sale_detail
andshop_detail
tables and populate them with data. Sample command:-- Create a partitioned table sale_detail CREATE TABLE if NOT EXISTS sale_detail ( shop_name STRING, customer_id STRING, total_price DOUBLE ) PARTITIONED BY (sale_date STRING, region STRING); -- Add partitions to the sale_detail table ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china') PARTITION (sale_date='2014', region='shanghai'); -- Append data to the sale_detail table INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3); INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5); -- Create a shop_detail table and insert data from the 2013 partition of the sale_detail table SET odps.sql.allow.fullscan=true; CREATE TABLE shop_detail AS SELECT shop_name,customer_id,total_price FROM sale_detail WHERE sale_date='2013'AND region='china';
Query data from the
sale_detail
table. Sample command:SET odps.sql.allow.fullscan=true; SELECT * FROM sale_detail;
The query returns the following result:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+
Query data from the
shop_detail
table. Sample command:SELECT * FROM shop_detail;
The query returns the following result:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | s1 | c1 | 100.1 | | s2 | c2 | 100.2 | | s3 | c3 | 100.3 | +------------+-------------+-------------+
Usage examples
The following examples utilize Sample data to demonstrate the application of the SUBQUERY_MAPJOIN HINT.
Example 1: SCALAR SUBQUERY
SET odps.sql.allow.fullscan=true;
SELECT * FROM shop_detail WHERE (/*+ subquery_mapjoin */ SELECT COUNT(*) FROM sale_detail WHERE sale_detail.shop_name = shop_detail.shop_name) >= 1;
The query returns the following result:
+------------+-------------+-------------+
| shop_name | customer_id | total_price |
+------------+-------------+-------------+
| s1 | c1 | 100.1 |
| s2 | c2 | 100.2 |
| s3 | c3 | 100.3 |
+------------+-------------+-------------+
Example 2: IN SUBQUERY
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE total_price IN (/*+ subquery_mapjoin */ SELECT total_price FROM shop_detail WHERE customer_id = shop_detail.customer_id);
The query returns the following result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
Example 3: EXISTS SUBQUERY
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM shop_detail WHERE customer_id = sale_detail.customer_id);
The query returns the following result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
Example 4: Basic subqueries do not support specifying the use of SUBQUERY_MAPJOIN HINT
The following query will fail because basic subqueries do not support the subquery_mapjoin HINT.
SET odps.sql.allow.fullscan=true;
SELECT * FROM (/*+ subquery_mapjoin */ SELECT shop_name FROM sale_detail) a;
The query returns the following result:
-- Error message
FAILED: ODPS-0130161:[1,16] Parse exception - invalid subquery_mapjoin hint, should only be used for scalar/in/exists subquery
Example 5: Subqueries not converted to JOIN output a warning while returning query results
-- Add a column to the shop_detail table and add data
ALTER TABLE shop_detail ADD columns if not exists(sale_date STRING);
INSERT OVERWRITE TABLE shop_detail VALUES ('s1','c1',100.1,'2013'),('s2','c2',100.2,'2013'),('s3','c3',100.3,'2013');
-- This flag allows the system to output warnings. If the default configuration of the project already meets the requirements, there is no need to set it
SET odps.compiler.warning.disable=false;
/** The following query involves partitioned tables. To support partition pruning, the system does not convert the subquery to join.
When using the SUBQUERY_MAPJOIN HINT, the system will output a warning **/
SELECT * FROM sale_detail WHERE sale_date IN (/*+ subquery_mapjoin */ SELECT sale_date FROM shop_detail);
The query returns the following result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
The query issues the following warning:
WARNING:[1,47] subquery_mapjoin hint does not work because the subquery is not converted to join