This topic describes how to use a MAPJOIN statement to join a large table with one or more small tables. A MAPJOIN operation is faster than common JOIN operations.

When the volume of data is small, MAPJOIN accelerates the execution process by using SQL to load all the specified small tables into the program memory through the JOIN operation.
Note Note the following points when you use a MAPJOIN statement:
  • The left table of a LEFT OUTER JOIN clause must be a large table.
  • The right table of a RIGHT OUTER JOIN clause must be a large table.
  • Both the left and right tables of an INNER JOIN clause can be large tables.
  • MAPJOIN cannot be used in a FULL OUTER JOIN clause.
  • MAPJOIN supports small tables in subqueries.
  • If you need to reference a small table or a subquery when you perform MAPJOIN, you must reference the alias of the table or subquery.
  • In MAPJOIN, you can use non-equi joins or combine multiple conditions by using OR. You can calculate the Cartesian product by leaving the ON condition unspecified or by using mapjoin on 1 = 1, for example, select /* + mapjoin(a) */ from shop a join table_name b on 1=1. However, this calculation method may lead to data expansion.
  • MaxCompute allows you to specify a maximum of 128 small tables for the MAPJOIN operation. If the number of small tables exceeds 128, a syntax error is returned. In MAPJOIN HINT, multiple small tables are separated with commas (,), for example, /*+mapjoin(a,b,c)*/.
  • In MAPJOIN, the total memory occupied by small tables cannot exceed 640 MB. MaxCompute compresses your data before storage. As a result, the data volume of small tables is sharply increased after they are loaded into the memory. 640 MB indicates the maximum data volume after small tables are loaded into the memory.
select /* + mapjoin(a) */
from shop a join sale_detail b
on a.shop_name = b.shop_name;
In MaxCompute SQL, you cannot use non-equi joins or the OR logic in the ON condition for a common JOIN operation. However, you can do this in MAPJOIN. An example is as follows:
select /*+ mapjoin(a) */
from shop a join sale_detail b
on a.total_price < b.total_price or a.total_price + b.total_price < 500;