Left join is used to join tables. Hash join builds right tables, and left join does not reorder the left and right tables. This way, large amounts of data in the right table causes problems such as slow execution and excessive memory resource consumption. This topic uses examples to describe the scenarios in which right join can replace left join.

Background information

By default, AnalyticDB MySQL uses hash join to join tables. Hash join is implemented by using the right table to build a hash table, which is a resource-consuming computing process. Outer join that includes left join and right join is different from inner join. The order of left and right tables cannot be semantically exchanged. In a scenario where the right table contains large amounts of data, slow execution and high memory resource consumption may occur. In extreme scenarios such as large amount of data in the right table, the performance of the cluster may be affected, or an error Out of Memory Pool size pre cal may be reported during execution. In such cases, you can use the optimization methods provided in this topic to reduce resource consumption.

Scenarios

You can change left join to right join by modifying SQL statements or adding hints. The left table in the original left join is changed to the right table to build a hash table. In this case, if the right table is too large, the performance is affected. Therefore, we recommended that you optimize the table in the scenario where the left join is smaller and the right table is larger.

The concepts of small and large are relative, and are related to the associated columns and cluster resources. You can view the relevant parameters of an execution plan by using Explain Analyze, and determine whether right join can be used by focusing on the changes of parameters such as PeakMemory and WallTime. For more information, see Explain & Explain Analyze.

Usage notes

You can use one of the following methods to change left join to right join:
  • Modify SQL statements. For example, change the a left join b on a.col1 = b.col2 statement to the b right join a on a.col1 = b.col2 statement.
  • Specify the optimizer by adding a hint to convert left join to right join based on the resource consumption. In this method, the optimizer determines whether to convert left join to right join based on the estimated sizes of the left and right tables. The syntax is to add the /*+LEFT_TO_RIGHT_ENABLED=true*/ hint at the top of the SQL statement.

Examples

In the following example, nation is a small table that has 25 rows, and customer is a large table that has 15,000,000 rows. You can use Explain Analyze to view the execution plan of an SQL statement that includes left join.

explain analyze
select
  count(*)
from
  nation t1
  left join customer t2 on t1.n_nationkey = t2.c_nationkey
The following plan shows the join calculation of stage2. The left join operator contains the following information:
  • PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%): The peak memory usage is 93.68%. You can determine that left join is the performance bottleneck of the SQL statement.
  • Left (probe) Input avg.: 0.52 rows; Right (build) Input avg.: 312500.00 rows: The right table is a large table, and the left table is a small table.
In this scenario, you can convert left join to right join to optimize this SQL statement.
Fragment 2 [HASH]
    Output: 48 rows (432B), PeakMemory: 516MB, WallTime: 6.52us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 96 rows (864B), PeakMemory: 96B (0.00%), WallTime: 88.21ms (0.88%)
    │   count_2 := count(*)
    └─ LEFT Join[(`n_nationkey` = `c_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 30000000 rows (200.27MB), PeakMemory: 515MB (93.68%), WallTime: 4.34s (43.05%)
       │   Left (probe) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Right (build) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [n_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 25 rows (350B), PeakMemory: 64KB (0.01%), WallTime: 63.63us (0.00%)
       │      Input avg.: 0.52 rows, Input std.dev.: 379.96%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("c_nationkey")
          │   Outputs: [c_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 15000000 (57.22MB)}
          │   Output: 30000000 rows (400.54MB), PeakMemory: 10MB (1.84%), WallTime: 1.81s (17.93%)
          └─ RemoteSource[4]
                 Outputs: [c_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 15000000 rows (200.27MB), PeakMemory: 3MB (0.67%), WallTime: 191.32ms (1.90%)
                 Input avg.: 312500.00 rows, Input std.dev.: 380.00%
Modify SQL statements to convert left join to right join:
select
  count(*)
from
  customer t2
  right join nation t1 on t1.n_nationkey = t2.c_nationkey
Add a hint to convert left join to right join:
/*+LEFT_TO_RIGHT_ENABLED=true,CASCADES_OPTIMIZER_ENABLED=false*/
select
  count(*)
from
  nation t1
  left join customer t2 on t1.n_nationkey = t2.c_nationkey

After one preceding SQL statement is executed by using Explain Analyze, left join is changed to right join in the execution plan. And the hint can be determined to take effect. After PeakMemory: 889KB (3.31%) is adjusted, the PeakMemory is reduced from 515 MB to 889 KB, which is no longer a computing hotspot.

Fragment 2 [HASH]
    Output: 96 rows (864B), PeakMemory: 12MB, WallTime: 4.27us, Input: 15000025 rows (200.27MB); per task: avg.: 2500004.17 std.dev.: 2410891.74
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: ? (?)}
    │   Output: 192 rows (1.69kB), PeakMemory: 456B (0.00%), WallTime: 5.31ms (0.08%)
    │   count_2 := count(*)
    └─ RIGHT Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_4]
       │   Outputs: []
       │   Estimates: {rows: 15000000 (0B)}
       │   Output: 15000025 rows (350B), PeakMemory: 889KB (3.31%), WallTime: 3.15s (48.66%)
       │   Left (probe) Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       │   Right (build) Input avg.: 0.52 rows, Input std.dev.: 379.96%
       │   Distribution: PARTITIONED
       ├─ RemoteSource[3]
       │      Outputs: [c_nationkey:integer, $hashvalue:bigint]
       │      Estimates: 
       │      Output: 15000000 rows (200.27MB), PeakMemory: 3MB (15.07%), WallTime: 634.81ms (9.81%)
       │      Input avg.: 312500.00 rows, Input std.dev.: 380.00%
       └─ LocalExchange[HASH][$hashvalue_0_4] ("n_nationkey")
          │   Outputs: [n_nationkey:integer, $hashvalue_0_4:bigint]
          │   Estimates: {rows: 25 (100B)}
          │   Output: 50 rows (700B), PeakMemory: 461KB (1.71%), WallTime: 942.37us (0.01%)
          └─ RemoteSource[4]
                 Outputs: [n_nationkey:integer, $hashvalue_0_5:bigint]
                 Estimates: 
                 Output: 25 rows (350B), PeakMemory: 64KB (0.24%), WallTime: 76.34us (0.00%)
                 Input avg.: 0.52 rows, Input std.dev.: 379.96%