Use the SUBQUERY_MAPJOIN hint to force MAPJOIN on subqueries that MaxCompute converts to JOINs at execution time, improving query performance when the subquery result is small.
Limitations
-
Supported subquery types: scalar, IN, NOT IN, EXISTS, and NOT EXISTS subqueries. Basic subqueries (derived tables such as
SELECT * FROM (subquery) alias) are not supported. -
The system automatically uses the subquery result as the small table for MAPJOIN. No manual table designation is needed.
-
If the subquery result is large, MAPJOIN may trigger an Out of Memory (OOM) error. Verify that the subquery result is a small table before using this hint. For MAPJOIN size limits, see Limits.
-
In some scenarios, MaxCompute does not convert a subquery to a JOIN — for example, when partition pruning applies. In these cases, adding the hint causes the system to issue a warning, but the query still returns results. See Example 5: Warning when a subquery is not converted to a JOIN.
Syntax
Place /*+ subquery_mapjoin */ immediately after the opening parenthesis of the subquery. The hint must be the first token inside the parenthesis.
Given two tables:
CREATE TABLE t1(a BIGINT, b BIGINT);
CREATE TABLE t2(a BIGINT, b BIGINT);
The following shows correct placement for each supported subquery type.
Scalar subquery
SELECT a,
(/*+ subquery_mapjoin */ SELECT b FROM t2 WHERE a = t1.a)
FROM t1;
IN and NOT IN subqueries
SELECT * FROM t1 WHERE a IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
SELECT * FROM t1 WHERE a NOT IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
EXISTS and NOT EXISTS subqueries
SELECT * FROM t1 WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
SELECT * FROM t1 WHERE NOT EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
Incorrect placement
-- Wrong: hint is inside the SELECT clause, not immediately after the opening parenthesis
SELECT * FROM t1 WHERE a IN (SELECT /*+ subquery_mapjoin */ a FROM t2 WHERE b = t1.b);
Sample data
The examples below use two tables: sale_detail (a partitioned table) and shop_detail. Run the following to create them and load sample data.
-- Create a partitioned table
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Add partitions
ALTER TABLE sale_detail
ADD PARTITION (sale_date='2013', region='china')
PARTITION (sale_date='2014', region='shanghai');
-- Load data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s3','c3',100.3);
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai')
VALUES ('null','c5',null), ('s6','c6',100.4), ('s7','c7',100.5);
-- Create shop_detail from the 2013 partition
SET odps.sql.allow.fullscan=true;
CREATE TABLE shop_detail AS
SELECT shop_name, customer_id, total_price
FROM sale_detail
WHERE sale_date='2013' AND region='china';
sale_detail contents
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
shop_detail contents
+------------+-------------+-------------+
| shop_name | customer_id | total_price |
+------------+-------------+-------------+
| s1 | c1 | 100.1 |
| s2 | c2 | 100.2 |
| s3 | c3 | 100.3 |
+------------+-------------+-------------+
Examples
Example 1: Scalar subquery
Return all rows from shop_detail that have at least one matching row in sale_detail.
SET odps.sql.allow.fullscan=true;
SELECT * FROM shop_detail
WHERE (/*+ subquery_mapjoin */ SELECT COUNT(*) FROM sale_detail
WHERE sale_detail.shop_name = shop_detail.shop_name) >= 1;
Result:
+------------+-------------+-------------+
| shop_name | customer_id | total_price |
+------------+-------------+-------------+
| s1 | c1 | 100.1 |
| s2 | c2 | 100.2 |
| s3 | c3 | 100.3 |
+------------+-------------+-------------+
Example 2: IN subquery
Return rows from sale_detail whose total_price appears in shop_detail.
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail
WHERE total_price IN (/*+ subquery_mapjoin */ SELECT total_price FROM shop_detail
WHERE customer_id = shop_detail.customer_id);
Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
Example 3: EXISTS subquery
Return rows from sale_detail that have a matching customer_id in shop_detail.
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail
WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM shop_detail
WHERE customer_id = sale_detail.customer_id);
Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
Example 4: Basic subqueries are not supported
The hint does not work with basic subqueries (derived tables). The following query fails.
SET odps.sql.allow.fullscan=true;
SELECT * FROM (/*+ subquery_mapjoin */ SELECT shop_name FROM sale_detail) a;
Error:
FAILED: ODPS-0130161:[1,16] Parse exception - invalid subquery_mapjoin hint, should only be used for scalar/in/exists subquery
Example 5: Warning when a subquery is not converted to a JOIN
When a subquery filters on a partition column, MaxCompute keeps it as a subquery rather than converting it to a JOIN, so partition pruning can apply. The hint has no effect in this case, and the system issues a warning. The query still returns results.
First, add a sale_date column to shop_detail and reload the data:
ALTER TABLE shop_detail ADD COLUMNS IF NOT EXISTS (sale_date STRING);
INSERT OVERWRITE TABLE shop_detail
VALUES ('s1','c1',100.1,'2013'), ('s2','c2',100.2,'2013'), ('s3','c3',100.3,'2013');
Then run the query with warnings enabled:
-- Enable warning output (skip this if your project already outputs warnings)
SET odps.compiler.warning.disable=false;
-- sale_date is a partition column of sale_detail.
-- MaxCompute keeps this subquery as-is (no JOIN conversion) to support partition pruning.
-- The hint is ignored and the system issues a warning.
SELECT * FROM sale_detail
WHERE sale_date IN (/*+ subquery_mapjoin */ SELECT sale_date FROM shop_detail);
Result:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
+------------+-------------+-------------+------------+------------+
Warning:
WARNING:[1,47] subquery_mapjoin hint does not work because the subquery is not converted to join
What's next
-
Subqueries — overview of subquery types supported in MaxCompute SQL
-
MAPJOIN limits — small table size constraints for MAPJOIN