When you join a large table with one or more small tables, a standard JOIN redistributes data across the cluster through a shuffle phase, adding network overhead that slows queries. The MAPJOIN hint bypasses the shuffle by loading small tables into memory on each worker during the map stage, so the large table never moves across the network. Use MAPJOIN to reduce network I/O and speed up these queries.
How it works
A standard JOIN runs in three stages: map, shuffle, and reduce. Tables are joined in the reduce stage, which requires redistributing data across the cluster.
MAPJOIN skips the shuffle. During the map stage, MaxCompute loads all data from the specified small tables into memory on each worker that processes the large table. The large table is read locally — the workers bring the small tables to the data, not the other way around.
Syntax
Add the /*+ mapjoin(<table_alias>) */ hint immediately after SELECT to activate MAPJOIN:
SELECT /*+ mapjoin(<small_table_alias>) */
...
FROM <large_table> ...Note the following:
Reference the alias of a table or subquery in the hint, not the original table name.
To specify multiple small tables, separate their aliases with commas:
/*+ mapjoin(a,b,c) */.Subqueries can serve as small tables.
MAPJOIN supports non-equi joins and
ORconditions in theONclause. Standard MaxCompute SQL does not support these in regular joins.To produce a Cartesian product, use
ON 1 = 1instead of a join condition — for example:SELECT /*+ mapjoin(a) */ a.id FROM shop a JOIN table_name b ON 1=1;. This can significantly increase the data volume.
Subqueries such as SCALAR, IN, NOT IN, EXISTS, and NOT EXISTS can be rewritten as JOIN operations during execution. If the result of a subquery is small, add a MAPJOIN hint to the subquery statement to explicitly use the MAPJOIN algorithm.
Limits
| Limit | Detail |
|---|---|
| Memory | The total in-memory size of all small tables after decompression must not exceed 512 MB. MaxCompute compresses data before storage, so the decompressed in-memory size is significantly larger than the stored file size. The 512 MB limit applies to the decompressed size, not the compressed file size. |
| Number of small tables | Specify at most 128 small tables in a single MAPJOIN hint. Specifying more than 128 returns a syntax error. |
| LEFT OUTER JOIN | The left table must be the large table. |
| RIGHT OUTER JOIN | The right table must be the large table. |
| INNER JOIN | Either the left or right table can be the large table. |
| FULL OUTER JOIN | MAPJOIN cannot be used. |
Sample data
The examples in this topic use two tables: sale_detail and sale_detail_sj.
-- Create a partitioned table named sale_detail.
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
CREATE TABLE IF NOT EXISTS sale_detail_sj
(
shop_name STRING,
customer_id STRING,
total_price DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);
-- Add partitions.
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');
ALTER TABLE sale_detail_sj ADD PARTITION (sale_date='2013', region='china');
-- Insert data.
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s3','c3',100.3);
INSERT INTO sale_detail_sj PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1), ('s2','c2',100.2), ('s5','c2',100.2), ('s2','c2',100.2);Examples
Basic usage
Use MAPJOIN to join sale_detail_sj (small table, alias a) with sale_detail (large table, alias b) using a standard equi-join:
SET odps.sql.allow.fullscan=true;
SELECT /*+ mapjoin(a) */
a.shop_name,
a.customer_id,
a.total_price
FROM sale_detail_sj a
JOIN sale_detail b
ON a.shop_name = b.shop_name;Non-equi join with OR condition
MAPJOIN supports non-equi joins and OR conditions, which are not available in standard MaxCompute SQL joins. The following example retrieves rows where sale_detail_sj.total_price is less than sale_detail.total_price, or where the sum of both prices is less than 500:
SET odps.sql.allow.fullscan=true;
SELECT /*+ mapjoin(a) */
a.shop_name,
a.total_price,
b.total_price
FROM sale_detail_sj a
JOIN sale_detail b
ON a.total_price < b.total_price OR a.total_price + b.total_price < 500;The query returns:
+-----------+-------------+--------------+
| shop_name | total_price | total_price2 |
+-----------+-------------+--------------+
| s1 | 100.1 | 100.1 |
| s2 | 100.2 | 100.1 |
| s5 | 100.2 | 100.1 |
| s2 | 100.2 | 100.1 |
| s1 | 100.1 | 100.2 |
| s2 | 100.2 | 100.2 |
| s5 | 100.2 | 100.2 |
| s2 | 100.2 | 100.2 |
| s1 | 100.1 | 100.3 |
| s2 | 100.2 | 100.3 |
| s5 | 100.2 | 100.3 |
| s2 | 100.2 | 100.3 |
+-----------+-------------+--------------+