All Products
Search
Document Center

MaxCompute:SUBQUERY_MAPJOIN HINT

Last Updated:Nov 22, 2024

MaxCompute supports subquery operations, and some of these subqueries are internally optimized by being converted into JOINs during execution. To further enhance the execution efficiency of a subquery, you can use the SUBQUERY_MAPJOIN HINT within the subquery statement to explicitly specify the MAPJOIN algorithm. This topic describes how to use the SUBQUERY_MAPJOIN HINT.

Limits

  • Only supports the SCALAR, IN, NOT IN, EXISTS, and NOT EXISTS subqueries, and does not support the basic subqueries. For more information on subqueries, see Subqueries.

  • You do not need to manually specify a small table when using the SUBQUERY_MAPJOIN HINT. The system automatically uses the result of the subquery execution as the small table for the MAPJOIN.

    Important

    If the result of the subquery is excessively large, it could trigger an Out of Memory error during MAPJOIN operations. It is essential to verify that the subquery result is a small table prior to employing this HINT. For information on MAPJOIN limitations, see Limits.

  • In certain scenarios, a subquery is not be converted into a JOIN operation for execution. When the SUBQUERY_MAPJOIN HINT is utilized in these scenarios, the system will issue a warning. For detailed instructions, see Example 5: Subqueries not converted to JOIN output a warning while returning query results.

Usage method

To execute the MAPJOIN algorithm within a subquery statement, you must use and place the HINT /*+ subquery_mapjoin */ immediately after the left parenthesis of the subquery. The following illustrates the usage:

Consider the table definitions for t1 and t2:

CREATE TABLE t1(a BIGINT, b BIGINT);
CREATE TABLE t2(a BIGINT, b BIGINT);
  • SCALAR SUBQUERY

    SELECT a,
           (/*+ subquery_mapjoin */ SELECT b FROM t2 WHERE a = t1.a)
    FROM t1;
  • IN and NOT IN SUBQUERY

    SELECT * FROM t1 WHERE a IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
    SELECT * FROM t1 WHERE a NOT IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
  • EXISTS and NOT EXISTS SUBQUERY

    SELECT * FROM t1 WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
    SELECT * FROM t1 WHERE NOT EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
  • Incorrect usage

    -- The following code is incorrect because the SUBQUERY_MAPJOIN HINT is not written immediately after the left parenthesis corresponding to the subquery
    SELECT * FROM t1 WHERE a IN (SELECT /*+ subquery_mapjoin */ a FROM t2 WHERE b = t1.b);

Sample data

To facilitate understanding of the examples provided, sample source data is available.

  1. Create the sale_detail and shop_detail tables and populate them with data. Sample command:

    -- Create a partitioned table sale_detail
    CREATE TABLE if NOT EXISTS sale_detail
    (
    shop_name     STRING,
    customer_id   STRING,
    total_price   DOUBLE
    )
    PARTITIONED BY (sale_date STRING, region STRING);
    
    -- Add partitions to the sale_detail table
    ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china') PARTITION (sale_date='2014', region='shanghai');
    
    -- Append data to the sale_detail table
    INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
    INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
    
    -- Create a shop_detail table and insert data from the 2013 partition of the sale_detail table
    SET odps.sql.allow.fullscan=true;
    CREATE TABLE shop_detail AS SELECT shop_name,customer_id,total_price FROM sale_detail WHERE sale_date='2013'AND region='china'; 
  2. Query data from the sale_detail table. Sample command:

    SET odps.sql.allow.fullscan=true;
    SELECT * FROM sale_detail; 

    The query returns the following result:

    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
  3. Query data from the shop_detail table. Sample command:

    SELECT * FROM shop_detail; 

    The query returns the following result:

    +------------+-------------+-------------+
    | shop_name  | customer_id | total_price |
    +------------+-------------+-------------+
    | s1         | c1          | 100.1       |
    | s2         | c2          | 100.2       |
    | s3         | c3          | 100.3       |
    +------------+-------------+-------------+

Usage examples

The following examples utilize Sample data to demonstrate the application of the SUBQUERY_MAPJOIN HINT.

Example 1: SCALAR SUBQUERY

SET odps.sql.allow.fullscan=true;
SELECT * FROM shop_detail WHERE (/*+ subquery_mapjoin */ SELECT COUNT(*) FROM sale_detail WHERE sale_detail.shop_name = shop_detail.shop_name) >= 1;

The query returns the following result:

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s1         | c1          | 100.1       |
| s2         | c2          | 100.2       |
| s3         | c3          | 100.3       |
+------------+-------------+-------------+

Example 2: IN SUBQUERY

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE total_price IN (/*+ subquery_mapjoin */ SELECT total_price FROM shop_detail WHERE customer_id = shop_detail.customer_id);

The query returns the following result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Example 3: EXISTS SUBQUERY

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM shop_detail WHERE customer_id = sale_detail.customer_id);

The query returns the following result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Example 4: Basic subqueries do not support specifying the use of SUBQUERY_MAPJOIN HINT

The following query will fail because basic subqueries do not support the subquery_mapjoin HINT.

SET odps.sql.allow.fullscan=true;
SELECT * FROM (/*+ subquery_mapjoin */ SELECT shop_name FROM sale_detail) a;

The query returns the following result:

-- Error message
FAILED: ODPS-0130161:[1,16] Parse exception - invalid subquery_mapjoin hint, should only be used for scalar/in/exists subquery

Example 5: Subqueries not converted to JOIN output a warning while returning query results

-- Add a column to the shop_detail table and add data
ALTER TABLE shop_detail ADD columns if not exists(sale_date STRING);
INSERT OVERWRITE TABLE shop_detail VALUES ('s1','c1',100.1,'2013'),('s2','c2',100.2,'2013'),('s3','c3',100.3,'2013');

-- This flag allows the system to output warnings. If the default configuration of the project already meets the requirements, there is no need to set it
SET odps.compiler.warning.disable=false;

/** The following query involves partitioned tables. To support partition pruning, the system does not convert the subquery to join.
When using the SUBQUERY_MAPJOIN HINT, the system will output a warning **/
SELECT * FROM sale_detail WHERE sale_date IN (/*+ subquery_mapjoin */ SELECT sale_date FROM shop_detail);

The query returns the following result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

The query issues the following warning:

WARNING:[1,47]  subquery_mapjoin hint does not work because the subquery is not converted to join