When you join a small table (1--100 GB) against a large table (over 10 TB), standard joins shuffle and sort the entire large table. DISTRIBUTED MAPJOIN avoids this overhead by splitting the small table into shards and distributing them across compute nodes. Each node processes only its assigned shards, which reduces memory pressure and speeds up query execution.
When to use DISTRIBUTED MAPJOIN
DISTRIBUTED MAPJOIN is designed for joins where the table sizes differ significantly:
The large table exceeds 10 TB.
The small table is between 1 GB and 100 GB.
Data in the small table is evenly distributed (no significant skew).
If an SQL task takes more than 20 minutes to complete, consider using DISTRIBUTED MAPJOIN for optimization.
If the small table has long-tail data (skewed distribution), a single shard may receive disproportionate data. This can cause an out-of-memory (OOM) error or a remote procedure call (RPC) timeout.
How it works
DISTRIBUTED MAPJOIN splits the small table into a configurable number of shards and distributes them across compute nodes. Each node processes its assigned shards against the corresponding portion of the large table. This avoids loading the entire small table onto every node.
Task parallelism is determined by the following formula:
Parallelism = shard_count x replica_count
Syntax
Add a hint to your SELECT statement:
/*+distmapjoin(<table_name>(shard_count=<n>,replica_count=<m>))*/
Parameters
| Parameter | Description | Default | Recommended range |
|---|---|---|---|
| table_name | Name of the small table to join. | -- | -- |
shard_count | Number of shards for the small table. Shards are distributed across compute nodes for processing. Set to an odd number. | Not specified by default. Manual specification is recommended. | Size each shard to hold 200--500 MB of data. |
replica_count | Number of replicas per shard. Additional replicas reduce access pressure on individual nodes and prevent full task failure if a single node fails. | 1 | 2 or 3 for environments with frequent node restarts. |
Manually specify shard_count based on the small table size. Divide the table size by the target shard size (200--500 MB) to estimate an appropriate value. Setting shard_count too high degrades performance and stability. Setting it too low may cause OOM errors.
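As a rough illustration of the sizing rule and the parallelism formula above, the following Python sketch estimates a shard_count and the resulting parallelism. The 350 MB target is an assumed midpoint of the 200--500 MB range, not a MaxCompute default:

```python
def estimate_shard_count(table_size_mb, target_shard_mb=350):
    """Estimate shard_count: divide the small-table size by the target
    shard size (200--500 MB; 350 MB is an assumed midpoint here), then
    round up to the nearest odd number, as the docs recommend."""
    shards = -(-table_size_mb // target_shard_mb)  # ceiling division
    return shards if shards % 2 == 1 else shards + 1

def parallelism(shard_count, replica_count=1):
    """Task parallelism = shard_count x replica_count."""
    return shard_count * replica_count

# Example: a 7 GB (7168 MB) small table
shards = estimate_shard_count(7168)
print(shards, parallelism(shards, replica_count=2))  # 21 shards, parallelism 42
```

With replica_count=2, the same 21 shards yield a parallelism of 42 tasks, which is why high replica counts are reserved for unstable environments rather than used by default.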
Syntax variants
-- Specify shard_count only (replica_count defaults to 1)
/*+distmapjoin(a(shard_count=5))*/
-- Specify both shard_count and replica_count
/*+distmapjoin(a(shard_count=5,replica_count=2))*/
-- Join multiple small tables with DISTRIBUTED MAPJOIN
/*+distmapjoin(a(shard_count=5,replica_count=2),b(shard_count=5,replica_count=2))*/
-- Combine DISTRIBUTED MAPJOIN with MAPJOIN
/*+distmapjoin(a(shard_count=5,replica_count=2)),mapjoin(b)*/
Use DISTRIBUTED MAPJOIN in a query
Before you begin, make sure that you have:
A MaxCompute project with sufficient quota resources
Tables that meet the size requirements (large table > 10 TB, small table between 1 GB and 100 GB)
DISTRIBUTED MAPJOIN is resource-intensive. Avoid running tasks in a small quota group. Change the quota group on the Quotas page. For more information, see Manage quotas in the new MaxCompute console.
The following examples insert data into a partitioned table named tmall_dump_lasttable. The large table is search_ods.dump_lasttable and the small table is a filtered subset of tbcdm.dim_tb_itm.
Standard JOIN:
INSERT OVERWRITE TABLE tmall_dump_lasttable PARTITION (ds='20211130')
SELECT t1.*
FROM
(
SELECT nid, doc, type
FROM search_ods.dump_lasttable
WHERE ds = '20211203'
) t1
JOIN
(
SELECT DISTINCT item_id
FROM tbcdm.dim_tb_itm
WHERE ds = '20211130'
AND bc_type = 'B'
AND is_online = 'Y'
) t2
ON t1.nid = t2.item_id;
With DISTRIBUTED MAPJOIN:
Add the distmapjoin hint to the SELECT statement. In this example, the small table t2 is split into 35 shards:
INSERT OVERWRITE TABLE tmall_dump_lasttable PARTITION (ds='20211130')
SELECT /*+ distmapjoin(t2(shard_count=35)) */ t1.*
FROM
(
SELECT nid, doc, type
FROM search_ods.dump_lasttable
WHERE ds = '20211203'
) t1
JOIN
(
SELECT DISTINCT item_id
FROM tbcdm.dim_tb_itm
WHERE ds = '20211130'
AND bc_type = 'B'
AND is_online = 'Y'
) t2
ON t1.nid = t2.item_id;
Best practices
Choose the right shard_count
Estimate the size of the small table after filtering.
Divide by the target shard size (200--500 MB). For example, a 7 GB small table needs 14--35 shards.
Set shard_count to an odd number.
Increase replica_count for stability
If nodes restart frequently due to high parallelism or an unstable environment, set replica_count to 2 or 3. Additional replicas prevent full task failure when a single node goes down.
Avoid data skew in the small table
Verify that the small table data is evenly distributed before using DISTRIBUTED MAPJOIN. Skewed data concentrates traffic on specific shards, leading to OOM errors or RPC timeouts.
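To see why skew concentrates traffic on specific shards, the following Python sketch (hypothetical key counts, not MaxCompute code) hashes join keys into shards the way a distributed map join conceptually would, and flags shards that receive far more rows than average. The 3x threshold is an illustrative choice, not a MaxCompute setting:

```python
from collections import Counter

def shard_load(key_counts, shard_count):
    """Map each join key to a shard by hash and sum the rows per shard.
    key_counts: {join_key: row_count} for the small table."""
    load = Counter()
    for key, rows in key_counts.items():
        load[hash(key) % shard_count] += rows
    return load

def is_skewed(load, shard_count, threshold=3.0):
    """Flag skew when the hottest shard holds more than `threshold`
    times the average shard load (illustrative heuristic)."""
    avg = sum(load.values()) / shard_count
    return max(load.values()) > threshold * avg

# Hypothetical long-tail distribution: one hot key dominates
counts = {f"item_{i}": 100 for i in range(1000)}
counts["hot_item"] = 1_000_000
load = shard_load(counts, shard_count=5)
print(is_skewed(load, 5))  # True: the hot key's shard dwarfs the rest
```

No shard_count fixes this: whichever shard receives the hot key holds most of the data, so the practical remedy is to filter or pre-aggregate the small table before the join.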
Troubleshooting
| Symptom | Likely cause | Solution |
|---|---|---|
| OOM error | shard_count too low; individual shards exceed available memory. | Increase shard_count so each shard holds 200--500 MB. |
| OOM error | Small table has skewed data (long tails). | Check data distribution. Filter or pre-aggregate the small table to reduce skew. |
| RPC timeout | A single shard receives disproportionate data due to skew. | Increase shard_count and verify data distribution. |
| Degraded performance or instability | shard_count too high. | Reduce shard_count. Each shard should hold at least 200 MB. |
| Node restarts | High parallelism without sufficient replicas. | Set replica_count to 2 or 3. |
| Task runs in a small quota group | Insufficient resources for the workload. | Move the task to a larger quota group on the Quotas page. |