This topic describes how to use a
MAPJOIN statement to join a large table with one or more small tables. It is faster than common
MAPJOINis as follows: When the data volume is small, SQL loads all your specified small tables into the memory of the program by performing the
JOINoperation to join your tables faster.
MAPJOIN, note the following:
- The left table of a
LEFT OUTER JOINclause must be a large table.
- The right table of a
RIGHT OUTER JOINclause must be a large table.
- Both the left and right tables of an
INNER JOINclause can be large tables.
MAPJOINcannot be used in a
FULL OUTER JOINclause.
MAPJOINsupports small tables as subqueries.
MAPJOINis used and a small table or subquery must be referenced, the alias must be referenced.
JOINconditions or multiple conditions connected by using
ORstatements. You can choose not to use
ONstatements. You can also use mapjoin on 1 = 1 to express a Cartesian product, for example, select /* + mapjoin(a) */ a.id from shop a join table_name b on 1=1, which may cause data expansion.
- Currently, MaxCompute allows up to 256 small tables to be specified in a
MAPJOINstatement. If you specify more than 256 small tables, a syntax error is returned.
MAPJOINis used, the total memory occupied by all small tables cannot exceed 640 MB. Note that MaxCompute uses compressed storage, so the data size is sharply expanded after small tables are loaded into the memory. The limit of 640 MB refers to the size after small tables are loaded into the memory.
select /* + mapjoin(a) */ a.shop_name, b.customer_id, b.total_price from shop a join sale_detail b on a.shop_name = b.shop_name;
MaxCompute SQL does not support complex
JOIN conditions, such as non-equivalent expressions and the
OR logic, in the
ON conditions of common JOIN operations. However,
MAPJOIN supports such operations.
select /*+ mapjoin(a) */ a.total_price, b.total_price from shop a join sale_detail b on a.total_price < b.total_price or a.total_price + b.total_price < 500;