All Products
Search
Document Center

PolarDB:Join condition pushdown

Last Updated:Jan 30, 2024

PolarDB for MySQL supports the join condition pushdown feature. After the outer join conditions of a derived table are pushed down, execution plans of the derived table can use indexes more efficiently, which can greatly improve the performance of complex queries.

Prerequisites

The cluster is of PolarDB for MySQL 8.0 and the revision version is 8.0.2.2.10 or later.

Background information

Derived tables (or inline views) are widely used in complex analytical queries. It can simplify SQL statement construction and describe query semantics in a straightforward manner. In the native MySQL, if a derived table cannot be expanded in outer queries (including operations such as GROUP BY and aggregate functions), it must be executed in materialized mode. If an SQL statement involves enormous data (for example, a large amount of table data needs to be scanned), the execution efficiency is very low. After the outer join conditions of the derived table are pushed down, execution plans of the derived table can use indexes more efficiently, which can greatly improve the performance of complex queries.

Scenarios

  • In complex queries, derived tables and outer tables that are joined in a nested loop manner. Because joined columns are at the inner layer of the derived table, indexes can be used to accelerate the materialization of the inner table. Accurate statistics are required to ensure that a large amount of data can be filtered out at the inner layer after join conditions are pushed down.

  • When join conditions are pushed down in inner queries and if indexes can be effectively used and a large amount of data is filtered out, a more efficient execution plan is generated. The join condition pushdown feature depends on the cost calculation capability of the optimizer to intelligently determine whether to push down outer join conditions.

Usage

You can use the loose_join_predicate_pushdown_opt_mode parameter to enable the join condition pushdown feature. For more information, see Specify cluster and node parameters.

Parameter

Level

Description

loose_join_predicate_pushdown_opt_mode

Global

Specifies whether to enable the join condition pushdown feature. Default value: REPLICA_ON. Valid values:

  • ON: enables the join condition pushdown feature.

  • REPLICA_ON: enables the join condition pushdown feature only for read-only nodes.

  • OFF: disables the join condition pushdown feature.

Examples

Original query

In the original query, because the os derived table needs to be fully materialized and no filter conditions can provide a high selection rate, it takes a long time to materialize the os derived table. This query takes about 65 seconds.

SELECT *
FROM (
  SELECT *
  FROM sample_table.tb_order
  WHERE create_date >= DATE_SUB(CAST('2022-12-05 15:12:05' AS datetime), INTERVAL 5 MINUTE)
    AND product_type IN (2, 4)
) o
  LEFT JOIN (
    SELECT *
    FROM sample_table.tb_order_detailed
    WHERE update_time >= DATE_SUB('2022-12-05 15:12:05', INTERVAL 50 MINUTE)
  ) od
  ON o.order_id = od.order_id
  LEFT JOIN (
    SELECT t.*, row_number() OVER (PARTITION BY detail_id ORDER BY update_date DESC) AS rn
    FROM sample_table.tb_order_sku t
    WHERE update_date >= DATE_SUB('2022-12-05 15:12:05', INTERVAL 50 MINUTE)
      AND coalesce(product_type, '0') <> '5'
  ) os
  ON od.id = os.detail_id;

Optimized query

After the join condition pushdown feature is enabled, the join condition od.id = os.detail_id between the o and os tables is pushed down to the inner layer of the os table. In this case, the os table can more efficiently use the detail_id index to filter out a large amount of data to improve execution efficiency. This query takes about 0.5 seconds.

SELECT *
FROM (
  SELECT *
  FROM db_order.tb_order
  WHERE create_date >= DATE_SUB(CAST('2022-12-05 15:12:05' AS datetime), INTERVAL 5 MINUTE)
    AND product_type IN (2, 4)
) o
  LEFT JOIN (
    SELECT *
    FROM db_order.tb_order_detailed
    WHERE update_time >= DATE_SUB('2022-12-05 15:12:05', INTERVAL 50 MINUTE)
  ) od
  ON o.order_id = od.order_id
  LEFT JOIN LATERAL((
    SELECT t.*, row_number() OVER (PARTITION BY detail_id ORDER BY update_date DESC) AS rn
    FROM db_order.tb_order_sku t
    WHERE update_date >= DATE_SUB('2022-12-05 15:12:05', INTERVAL 50 MINUTE)
      AND coalesce(product_type, '0') <> '5'
      AND od.id = detail_id
  )) os;