When the right table in a LEFT JOIN is significantly larger than the left table, the hash join algorithm that AnalyticDB for MySQL uses by default can cause high memory consumption and slow query execution. Rewriting the query as a RIGHT JOIN—or enabling an optimizer hint—puts the smaller table on the build side and reduces memory pressure.
How it works
AnalyticDB for MySQL uses hash join as the default algorithm for joining tables. In a hash join, the right table always builds the hash table in memory. The left table is then streamed through to probe that hash table.
Unlike inner joins, outer joins (LEFT JOIN and RIGHT JOIN) cannot exchange the order of the left and right tables without changing query semantics. This means that a LEFT JOIN with a large right table forces a large hash table to be built, which can lead to:
- Slow query execution
- High memory consumption
- In extreme cases, degraded cluster performance or an `Out of Memory Pool size pre cal` error
Rewriting the query as a RIGHT JOIN flips which table is on the build side. When the new right table is small, the hash table fits comfortably in memory and performance improves significantly.
The following table shows how the two forms are semantically equivalent:
| Form | SQL pattern | Build side (hash table) |
|---|---|---|
| LEFT JOIN | small_table LEFT JOIN large_table ON ... | large_table (inefficient) |
| RIGHT JOIN | large_table RIGHT JOIN small_table ON ... | small_table (efficient) |
Apply this optimization when the left table is small and the right table is large.
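The equivalence in the table can be checked with a toy model (plain Python, illustrative only, not AnalyticDB internals): a hash LEFT JOIN of (small, large) returns the same rows as a hash RIGHT JOIN of (large, small); only the build side, and therefore the hash-table size, changes.

```python
def hash_left_join(left, right, lk, rk):
    # Build a hash table over the RIGHT input -> memory grows with len(right).
    build = {}
    for r in right:
        build.setdefault(r[rk], []).append(r)
    # Probe with the left input; unmatched LEFT rows survive with None (NULL).
    return [(l, r) for l in left for r in build.get(l[lk], [None])]

def hash_right_join(left, right, lk, rk):
    # Same build/probe shape, but unmatched RIGHT (build) rows must survive.
    build = {}
    for r in right:
        build.setdefault(r[rk], []).append(r)
    out, matched = [], set()
    for l in left:
        for r in build.get(l[lk], []):
            matched.add(id(r))
            out.append((l, r))
    for rows in build.values():  # emit build rows no probe row matched
        for r in rows:
            if id(r) not in matched:
                out.append((None, r))
    return out

nation = [{"n_nationkey": 1}, {"n_nationkey": 3}]                         # small
customer = [{"c_nationkey": 1, "c": "a"}, {"c_nationkey": 1, "c": "b"}]   # "large"

a = hash_left_join(nation, customer, "n_nationkey", "c_nationkey")
b = hash_right_join(customer, nation, "c_nationkey", "n_nationkey")
# a and b contain the same rows; b builds its hash table over the
# 2-row nation side instead of the customer side.
```

In the toy model, as in the engine, the RIGHT JOIN form hashes the small table, which is exactly the rewrite this topic recommends.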
Diagnose the problem
Run EXPLAIN ANALYZE on the query to check the execution plan metrics for the join operator:
```sql
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM nation t1
LEFT JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey;
```

In the execution plan output, locate the `LEFT Join` operator and check these two metrics:
- PeakMemory: A high percentage (for example, 93.68%) indicates that the LEFT JOIN is the performance bottleneck.
- Left (probe) Input avg. / Right (build) Input avg.: If the build-side input average is orders of magnitude larger than the probe-side input average, the right table is too large to build the hash table efficiently.
For more information about reading execution plans, see Use the EXPLAIN and EXPLAIN ANALYZE commands to analyze execution plans.
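For long plans, one informal trick is to filter the EXPLAIN ANALYZE text for high PeakMemory percentages. A small Python helper (hypothetical, not part of any AnalyticDB tooling) might look like this:

```python
import re

def peak_memory_hotspots(plan_text, threshold=10.0):
    """Return (percent, line) pairs, largest first, for operators whose
    PeakMemory share exceeds `threshold`. Matches the
    'PeakMemory: 515MB (93.68%)' pattern in EXPLAIN ANALYZE output."""
    hits = []
    for line in plan_text.splitlines():
        m = re.search(r"PeakMemory:\s*\S+\s*\((\d+(?:\.\d+)?)%\)", line)
        if m and float(m.group(1)) >= threshold:
            hits.append((float(m.group(1)), line.strip()))
    return sorted(hits, reverse=True)
```

Run over a plan like the one shown later in this topic, this would surface the join operator's line at 93.68% ahead of everything else.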
Fix the problem
Use one of the following methods to move the smaller table to the right (build) side.
Method 1: Rewrite the SQL statement
Swap the table positions and change LEFT JOIN to RIGHT JOIN directly in the SQL:
```sql
-- Before: nation (25 rows) is on the left, customer (15,000,000 rows) is on the right
SELECT COUNT(*)
FROM nation t1
LEFT JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey;

-- After: customer is now on the left (probe), nation is on the right (build)
SELECT COUNT(*)
FROM customer t2
RIGHT JOIN nation t1 ON t1.n_nationkey = t2.c_nationkey;
```

Method 2: Add an optimizer hint
Add a hint before the SQL statement to let the optimizer decide whether to swap the join sides automatically, based on estimated table sizes.
For clusters running V3.1.8 or later (the outer join swap feature is enabled by default):
If the feature has been disabled, re-enable it with the following hint:
```sql
/*+O_CBO_RULE_SWAP_OUTER_JOIN=true*/
SELECT COUNT(*)
FROM nation t1
LEFT JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey;
```

For clusters running a version earlier than V3.1.8 (the outer join swap feature is disabled by default):
```sql
/*+LEFT_TO_RIGHT_ENABLED=true*/
SELECT COUNT(*)
FROM nation t1
LEFT JOIN customer t2 ON t1.n_nationkey = t2.c_nationkey;
```

To check the minor version of your cluster, see How do I query the version of an AnalyticDB for MySQL cluster?. To upgrade the minor version, contact technical support.
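Conceptually, the hint enables a rule that keeps the smaller estimated input on the build (right) side. The sketch below is a caricature of that idea in Python, not AnalyticDB's actual optimizer code; the function name and signature are invented for illustration.

```python
def maybe_swap_outer_join(join_type, left_est_rows, right_est_rows):
    """Caricature of an outer-join swap rule (hypothetical helper):
    if the estimated right (build) input is larger than the left (probe)
    input, swap the operands and flip the join type so semantics are
    preserved while the smaller input lands on the build side."""
    flip = {"LEFT JOIN": "RIGHT JOIN", "RIGHT JOIN": "LEFT JOIN"}
    if join_type in flip and right_est_rows > left_est_rows:
        return flip[join_type], True   # True => also swap the two operands
    return join_type, False
```

With the example tables, `maybe_swap_outer_join("LEFT JOIN", 25, 15_000_000)` would report a swap to RIGHT JOIN, which is the same rewrite Method 1 performs by hand.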
Example
This example uses two tables from the TPC-H benchmark:
- nation: 25 rows (small table)
- customer: 15,000,000 rows (large table)
Before optimization — LEFT JOIN execution plan (Fragment 2)
```
Fragment 2 [HASH]
Output: 48 rows (432B), PeakMemory: 516MB, WallTime: 6.52us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
Output layout: [count_0_2]
Output partitioning: SINGLE []
Aggregate(PARTIAL)
│ Outputs: [count_0_2:bigint]
│ Estimates: {rows: ? (?)}
│ Output: 96 rows (864B), PeakMemory: 96B (0.00%), WallTime: 88.21ms (0.88%)
│ count_2 := count(*)
└─ LEFT Join[(`n_nationkey` = `c_nationkey`)][$hashvalue, $hashvalue_0_4]
│ Outputs: []
│ Estimates: {rows: 15000000 (0B)}
│ Output: 30000000 rows (200.27MB), PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%)
│ Left (probe) Input avg.: 0.52 rows, Input std.dev.: 379.96%
│ Right (build) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
│ Distribution: PARTITIONED
├─ RemoteSource[3]
│ Outputs: [n_nationkey:integer, $hashvalue:bigint]
│ Estimates:
│ Output: 25 rows (350B), PeakMemory: 64KB (0.01%), WallTime: 63.63us (0.00%)
│ Input avg.: 0.52 rows, Input std.dev.: 379.96%
└─ LocalExchange[HASH][$hashvalue_0_4] ("c_nationkey")
│ Outputs: [c_nationkey:integer, $hashvalue_0_4:bigint]
│ Estimates: {rows: 15000000 (57.22MB)}
│ Output: 30000000 rows (400.54MB), PeakMemory: 10MB (1.84%), WallTime: 1.81s (17.93%)
└─ RemoteSource[4]
Outputs: [c_nationkey:integer, $hashvalue_0_5:bigint]
Estimates:
Output: 15000000 rows (200.27MB), PeakMemory: 3MB (0.67%), WallTime: 191.32ms (1.90%)
Input avg.: 312500.00 rows, Input std.dev.: 380.00%
```

The `LEFT Join` operator shows `PeakMemory: 515MB (93.68%)`, confirming it is the performance bottleneck. The build-side input average (312,500 rows per task) is far larger than the probe-side average (0.52 rows per task).
After optimization — RIGHT JOIN execution plan (Fragment 2)
After applying either method above, run EXPLAIN ANALYZE again:
```
Fragment 2 [HASH]
Output: 96 rows (864B), PeakMemory: 12MB, WallTime: 4.27us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
Output layout: [count_0_2]
Output partitioning: SINGLE []
Aggregate(PARTIAL)
│ Outputs: [count_0_2:bigint]
│ Estimates: {rows: ? (?)}
│ Output: 192 rows (1.69kB), PeakMemory: 456B (0.00%), WallTime: 5.31ms (0.08%)
│ count_2 := count(*)
└─ RIGHT Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_4]
│ Outputs: []
│ Estimates: {rows: 15000000 (0B)}
│ Output: 15000025 rows (350B), PeakMemory: 889KB (3.31%), WallTime: 3.15s (48.66%)
│ Left (probe) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
│ Right (build) Input avg.: 0.52 rows, Input std.dev.: 379.96%
│ Distribution: PARTITIONED
├─ RemoteSource[3]
│ Outputs: [c_nationkey:integer, $hashvalue:bigint]
│ Estimates:
│ Output: 15000000 rows (200.27MB), PeakMemory: 3MB (15.07%), WallTime: 634.81ms (9.81%)
│ Input avg.: 312500.00 rows, Input std.dev.: 380.00%
└─ LocalExchange[HASH][$hashvalue_0_4] ("n_nationkey")
│ Outputs: [n_nationkey:integer, $hashvalue_0_4:bigint]
│ Estimates: {rows: 25 (100B)}
│ Output: 50 rows (700B), PeakMemory: 461KB (1.71%), WallTime: 942.37us (0.01%)
└─ RemoteSource[4]
Outputs: [n_nationkey:integer, $hashvalue_0_5:bigint]
Estimates:
Output: 25 rows (350B), PeakMemory: 64KB (0.24%), WallTime: 76.34us (0.00%)
Input avg.: 0.52 rows, Input std.dev.: 379.96%
```

The execution plan now shows a `RIGHT Join` operator. PeakMemory for the join drops from 515 MB to 889 KB, a reduction of over 99%. The small nation table (25 rows) is now on the build side, and the large customer table streams through as the probe side.
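As a quick sanity check on the headline number, the drop in join PeakMemory from 515 MB to 889 KB works out to a reduction of about 99.8%:

```python
before_kb = 515 * 1024   # join PeakMemory before the rewrite: 515 MB, in KB
after_kb = 889           # join PeakMemory after the rewrite: 889 KB
reduction = 1 - after_kb / before_kb
print(f"{reduction:.2%}")  # prints 99.83%
```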