When you join a small table (1--100 GB) against a large table (over 10 TB), standard joins shuffle and sort the entire large table. DISTRIBUTED MAPJOIN avoids this overhead by splitting the small table into shards and distributing them across compute nodes. Each node processes only its assigned shards, which reduces memory pressure and speeds up query execution.
When to use DISTRIBUTED MAPJOIN
DISTRIBUTED MAPJOIN is designed for joins where the table sizes differ significantly:
The large table exceeds 10 TB.
The small table is between 1 GB and 100 GB.
Data in the small table is evenly distributed (no significant skew).
If an SQL task takes more than 20 minutes to complete, consider using DISTRIBUTED MAPJOIN for optimization.
If the small table has long-tail data (skewed distribution), a single shard may receive disproportionate data. This can cause an out-of-memory (OOM) error or a remote procedure call (RPC) timeout.
How it works
DISTRIBUTED MAPJOIN splits the small table into a configurable number of shards and distributes them across compute nodes. Each node processes its assigned shards against the corresponding portion of the large table. This avoids loading the entire small table onto every node.
Task parallelism is determined by the following formula:
Parallelism = shard_count x replica_count
Syntax
Add a hint to your SELECT statement:
/*+distmapjoin(<table_name>(shard_count=<n>,replica_count=<m>))*/
Parameters
| Parameter | Description | Default | Recommended range |
|---|---|---|---|
| table_name | Name of the small table to join. | -- | -- |
shard_count | Number of shards for the small table. Shards are distributed across compute nodes for processing. Set to an odd number. | Not specified by default. Manual specification is recommended. | Size each shard to hold 200--500 MB of data. |
replica_count | Number of replicas per shard. Additional replicas reduce access pressure on individual nodes and prevent full task failure if a single node fails. | 1 | 2 or 3 for environments with frequent node restarts. |
Manually specify shard_count based on the small table size. Divide the table size by the target shard size (200--500 MB) to estimate an appropriate value. Setting shard_count too high degrades performance and stability. Setting it too low may cause OOM errors.
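As a rough illustration of the sizing rule and the parallelism formula above, the following Python sketch estimates a shard_count and the resulting parallelism. The 350 MB target is an assumed midpoint of the 200--500 MB range, not a MaxCompute default:

```python
def estimate_shard_count(table_size_mb, target_shard_mb=350):
    """Estimate shard_count: divide the small-table size by the target
    shard size (200--500 MB; 350 MB is an assumed midpoint here), then
    round up to the nearest odd number, as the docs recommend."""
    shards = -(-table_size_mb // target_shard_mb)  # ceiling division
    return shards if shards % 2 == 1 else shards + 1

def parallelism(shard_count, replica_count=1):
    """Task parallelism = shard_count x replica_count."""
    return shard_count * replica_count

# Example: a 7 GB (7168 MB) small table
shards = estimate_shard_count(7168)
print(shards, parallelism(shards, replica_count=2))  # 21 shards, parallelism 42
```

With replica_count=2, the same 21 shards yield a parallelism of 42 tasks, which is why high replica counts are reserved for unstable environments rather than used by default.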
Syntax variants
-- Specify shard_count only (replica_count defaults to 1)
/*+distmapjoin(a(shard_count=5))*/
-- Specify both shard_count and replica_count
/*+distmapjoin(a(shard_count=5,replica_count=2))*/
-- Join multiple small tables with DISTRIBUTED MAPJOIN
/*+distmapjoin(a(shard_count=5,replica_count=2),b(shard_count=5,replica_count=2))*/
-- Combine DISTRIBUTED MAPJOIN with MAPJOIN
/*+distmapjoin(a(shard_count=5,replica_count=2)),mapjoin(b)*/
Use DISTRIBUTED MAPJOIN in a query
Before you begin, make sure that you have:
A MaxCompute project with sufficient quota resources
Tables that meet the size requirements (large table > 10 TB, small table between 1 GB and 100 GB)
DISTRIBUTED MAPJOIN is resource-intensive. Avoid running tasks in a small quota group. Change the quota group on the Quotas page. For more information, see Manage quotas in the new MaxCompute console.
The following examples insert data into a partitioned table named tmall_dump_lasttable. The large table is search_ods.dump_lasttable and the small table is a filtered subset of tbcdm.dim_tb_itm.
Standard JOIN:
INSERT OVERWRITE TABLE tmall_dump_lasttable PARTITION (ds='20211130')
SELECT t1.*
FROM
(
SELECT nid, doc, type
FROM search_ods.dump_lasttable
WHERE ds = '20211203'
) t1
JOIN
(
SELECT DISTINCT item_id
FROM tbcdm.dim_tb_itm
WHERE ds = '20211130'
AND bc_type = 'B'
AND is_online = 'Y'
) t2
ON t1.nid = t2.item_id;
With DISTRIBUTED MAPJOIN:
Add the distmapjoin hint to the SELECT statement. In this example, the small table t2 is split into 35 shards:
INSERT OVERWRITE TABLE tmall_dump_lasttable PARTITION (ds='20211130')
SELECT /*+ distmapjoin(t2(shard_count=35)) */ t1.*
FROM
(
SELECT nid, doc, type
FROM search_ods.dump_lasttable
WHERE ds = '20211203'
) t1
JOIN
(
SELECT DISTINCT item_id
FROM tbcdm.dim_tb_itm
WHERE ds = '20211130'
AND bc_type = 'B'
AND is_online = 'Y'
) t2
ON t1.nid = t2.item_id;
Best practices
Choose the right shard_count
Estimate the size of the small table after filtering.
Divide by the target shard size (200--500 MB). For example, a 7 GB small table needs 14--35 shards.
Set shard_count to an odd number.
Increase replica_count for stability
If nodes restart frequently due to high parallelism or an unstable environment, set replica_count to 2 or 3. Additional replicas prevent full task failure when a single node goes down.
Avoid data skew in the small table
Verify that the small table data is evenly distributed before using DISTRIBUTED MAPJOIN. Skewed data concentrates traffic on specific shards, leading to OOM errors or RPC timeouts.
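To see why skew concentrates traffic on specific shards, the following Python sketch (hypothetical key counts, not MaxCompute code) hashes join keys into shards the way a distributed map join conceptually would, and flags shards that receive far more rows than average. The 3x threshold is an illustrative choice, not a MaxCompute setting:

```python
from collections import Counter

def shard_load(key_counts, shard_count):
    """Map each join key to a shard by hash and sum the rows per shard.
    key_counts: {join_key: row_count} for the small table."""
    load = Counter()
    for key, rows in key_counts.items():
        load[hash(key) % shard_count] += rows
    return load

def is_skewed(load, shard_count, threshold=3.0):
    """Flag skew when the hottest shard holds more than `threshold`
    times the average shard load (illustrative heuristic)."""
    avg = sum(load.values()) / shard_count
    return max(load.values()) > threshold * avg

# Hypothetical long-tail distribution: one hot key dominates
counts = {f"item_{i}": 100 for i in range(1000)}
counts["hot_item"] = 1_000_000
load = shard_load(counts, shard_count=5)
print(is_skewed(load, 5))  # True: the hot key's shard dwarfs the rest
```

No shard_count fixes this: whichever shard receives the hot key holds most of the data, so the practical remedy is to filter or pre-aggregate the small table before the join.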
Troubleshooting
| Symptom | Likely cause | Solution |
|---|---|---|
| OOM error | shard_count too low; individual shards exceed available memory. | Increase shard_count so each shard holds 200--500 MB. |
| OOM error | Small table has skewed data (long tails). | Check data distribution. Filter or pre-aggregate the small table to reduce skew. |
| RPC timeout | A single shard receives disproportionate data due to skew. | Increase shard_count and verify data distribution. |
| Degraded performance or instability | shard_count too high. | Reduce shard_count. Each shard should hold at least 200 MB. |
| Node restarts | High parallelism without sufficient replicas. | Set replica_count to 2 or 3. |
| Task runs in a small quota group | Insufficient resources for the workload. | Move the task to a larger quota group on the Quotas page. |