This topic describes how to use a MAPJOIN statement to join a large table with one or more small tables. A MAPJOIN operation is faster than common JOIN operations.
MAPJOINaccelerates the execution process by using SQL to load all the specified small tables into the program memory through the
- The left table of a
LEFT OUTER JOINclause must be a large table.
- The right table of a
RIGHT OUTER JOINclause must be a large table.
- Both the left and right tables of an
INNER JOINclause can be large tables.
MAPJOINcannot be used in a
FULL OUTER JOINclause.
MAPJOINsupports small tables in subqueries.
- If you need to reference a small table or a subquery when you perform
MAPJOIN, you must reference the alias of the table or subquery.
MAPJOIN, you can use non-equi joins or combine multiple conditions by using OR. You can calculate the Cartesian product by leaving the
ONcondition unspecified or by using
mapjoin on 1 = 1, for example,
select /* + mapjoin(a) */ a.id from shop a join table_name b on 1=1. However, this calculation method may lead to data expansion.
- MaxCompute allows you to specify a maximum of 128 small tables for the
MAPJOINoperation. If the number of small tables exceeds 128, a syntax error is returned. In MAPJOIN HINT, multiple small tables are separated with commas (,), for example,
MAPJOIN, the total memory occupied by small tables cannot exceed 640 MB. MaxCompute compresses your data before storage. As a result, the data volume of small tables is sharply increased after they are loaded into the memory. 640 MB indicates the maximum data volume after small tables are loaded into the memory.
select /* + mapjoin(a) */ a.shop_name, b.customer_id, b.total_price from shop a join sale_detail b on a.shop_name = b.shop_name;
ORlogic in the
ONcondition for a common
JOINoperation. However, you can do this in
MAPJOIN. An example is as follows:
select /*+ mapjoin(a) */ a.total_price, b.total_price from shop a join sale_detail b on a.total_price < b.total_price or a.total_price + b.total_price < 500;