All Products
Search
Document Center

MaxCompute:MAPJOIN hints

Last Updated:Mar 26, 2026

Add a MAPJOIN hint to a SELECT statement to force a join to execute in the map stage, bypassing the shuffle and reduce stages. This reduces data transmission overhead and improves query performance when joining a large table with one or more small tables. Use MAPJOIN when the optimizer does not automatically apply map-stage join to your query.

How it works

A standard JOIN in MaxCompute runs through three stages: map, shuffle, and reduce. The actual join logic executes in the reduce stage, which requires shuffling data across nodes.

MAPJOIN changes this: it loads the entire contents of the specified small table into memory during the map stage. Each mapper then performs the join locally against the in-memory data, eliminating the shuffle and reduce stages entirely.

The hint names the small table to load into memory. For every mapper reading rows from the large table, the small table is read completely from memory:

SELECT /*+ mapjoin(a) */ a.shop_name, a.total_price, b.total_price
FROM sale_detail_sj a JOIN sale_detail b
ON a.total_price < b.total_price OR a.total_price + b.total_price < 500;

In this example, a (aliased from sale_detail_sj) is the small table. The hint specifies a, so a is what gets loaded into memory on each mapper.

Limits

Memory and table count

ConstraintLimitNotes
Total memory for all small tables512 MBMeasured after data is loaded into memory and decompressed, not the compressed size stored in MaxCompute
Maximum number of small tables128Specifying more than 128 returns a syntax error

Supported JOIN types

JOIN typeSupportedRequirement
INNER JOINYesEither the left or right table can be the large table
LEFT OUTER JOINYesThe left table must be the large table
RIGHT OUTER JOINYesThe right table must be the large table
FULL OUTER JOINNo

Usage notes

Add /*+ mapjoin(<table_name>) */ immediately after SELECT. Keep the following in mind:

  • Reference aliases, not original table names. When the small table or subquery has an alias, use the alias inside the hint.

  • Subqueries are supported as small tables. Use a subquery in place of a table reference, and reference its alias in the hint.

  • Separate multiple small tables with commas: /*+ mapjoin(a,b,c) */

  • Non-equi joins and OR conditions are supported. Standard MaxCompute SQL does not allow non-equi joins or OR logic in the ON condition, but MAPJOIN does.

  • Cartesian products are supported using ON 1 = 1 (for example, SELECT /*+ mapjoin(a) */ a.id FROM shop a JOIN table_name b ON 1=1), but this can significantly increase output data volume.

Note

Subquery types such as SCALAR, IN, NOT IN, EXISTS, and NOT EXISTS may be converted to JOIN operations at execution time. If the subquery result qualifies as a small table, add a MAPJOIN hint to the subquery statement to explicitly apply the map-stage join algorithm.

Sample data

The examples in this topic use the sale_detail and sale_detail_sj tables. Run the following statements to create the tables and insert sample data.

-- Create a partitioned table named sale_detail.
CREATE TABLE IF NOT EXISTS sale_detail
(
  shop_name     STRING,
  customer_id   STRING,
  total_price   DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);

CREATE TABLE IF NOT EXISTS sale_detail_sj
(
  shop_name     STRING,
  customer_id   STRING,
  total_price   DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);

-- Add partitions.
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');
ALTER TABLE sale_detail_sj ADD PARTITION (sale_date='2013', region='china');

-- Insert sample data.
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);

INSERT INTO sale_detail_sj PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2),('s2','c2',100.2);

Example

Join sale_detail_sj (small table, aliased as a) with sale_detail (large table, aliased as b) using a non-equi condition. Return rows where either the total_price from a is less than the total_price from b, or the sum of both prices is less than 500.

-- Allow a full scan on the partitioned table.
SET odps.sql.allow.fullscan=true;

-- Use MAPJOIN with a non-equi join condition.
SELECT /*+ mapjoin(a) */
    a.shop_name,
    a.total_price,
    b.total_price
FROM sale_detail_sj a JOIN sale_detail b
ON a.total_price < b.total_price OR a.total_price + b.total_price < 500;

The query returns the following result:

+-----------+-------------+--------------+
| shop_name | total_price | total_price2 |
+-----------+-------------+--------------+
| s1        | 100.1       | 100.1        |
| s2        | 100.2       | 100.1        |
| s5        | 100.2       | 100.1        |
| s2        | 100.2       | 100.1        |
| s1        | 100.1       | 100.2        |
| s2        | 100.2       | 100.2        |
| s5        | 100.2       | 100.2        |
| s2        | 100.2       | 100.2        |
| s1        | 100.1       | 100.3        |
| s2        | 100.2       | 100.3        |
| s5        | 100.2       | 100.3        |
| s2        | 100.2       | 100.3        |
+-----------+-------------+--------------+