Push down join conditions to materialized derived tables - PolarDB

PolarDB can push down join conditions to materialized derived tables. This feature enables the transfer of condition expressions that adhere to pushdown rules from the JOIN statement's ON clause directly to the materialized derived tables. By advancing the data filtering process, it ensures that predicate conditions are applied to the data source as early as possible, thereby reducing the data volume for subsequent processing and enhancing query performance.

Note

The condition pushdown feature to materialized derived tables discussed in this topic differs from the standard join condition pushdown (JPP) feature, with distinct applicable scenarios. The pushdown technique here relies on the equivalence of WHERE predicates, allowing the derivation of single-table conditions dependent solely on materialized derived tables and their subsequent pushdown. This heuristic rule typically boosts SQL execution performance. In contrast, JPP focuses on ON predicates involving multi-table conditions, which are transformed into related subqueries based on cost estimation during pushdown, without a guaranteed performance improvement. For more information, see join condition pushdown.

Prerequisites

This feature is supported in clusters that use the following database engines. For more information about how to query the database engine version, see Query Version Number.

MySQL 8.0.1 with revision version 8.0.1.1.44 or later.
MySQL 8.0.2 with revision version 8.0.2.2.25 or later.

Limits

Pushdown is not allowed if the materialized derived table has a LIMIT clause. For example:

SELECT * FROM t1 LEFT JOIN (SELECT c, MAX(d) FROM t2 GROUP BY c LIMIT 2) dt ON t1.a < dt.a AND t1.a = 1;

Pushdown is unsupported if the columns of the outer WHERE condition expression or the columns mapped to the corresponding columns of the materialized derived table meet any of the following conditions:
- The column references a subquery or is non-deterministic. Example:
```
SELECT * FROM t1 LEFT JOIN (SELECT c, MAX(d) FROM t2 GROUP BY c) dt ON t1.a < dt.a AND t1.a = RAND();
```
- The column is part of a stored procedure or function. Example:
```
CREATE FUNCTION f1() RETURNS INT
BEGIN
...
END;

SELECT * FROM t1 LEFT JOIN (SELECT c, MAX(d) FROM t2 GROUP BY c) dt ON t1.a < dt.a AND t1.a = f1();
```

Usage

Preparations

Before utilizing the condition pushdown feature, configure the loose_join_cond_push_into_derived_mode parameter according to your business needs. For detailed instructions, see Set Cluster and Node Parameters.

The following table describes the parameter.

Parameter name

Level

Description

loose_join_cond_push_into_derived_mode

Global

The control switch for the condition pushdown feature from join conditions to Derived Tables. Valid values:

REPLICA_ON (default): enables the condition pushdown feature only on read-only nodes.
ON: enables the condition pushdown feature.
OFF: disables the condition pushdown feature.

Examples

Note

If all columns in the condition expression (or their equivalent columns) originate from the materialized derived table, the condition can be pushed down to it.

Execute the following code to create sample tables:

CREATE TABLE t1 (a INT, b INT, c INT, d INT);
CREATE TABLE t2 (e INT, f INT, g INT);

Enable the feature by setting loose_join_cond_push_into_derived_mode to ON and execute the following code:

EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN (SELECT * FROM t2) dt ON dt.x > t1.a WHERE t1.a = 1;

Sample result:

-> Left hash join (no condition)
    -> Filter: (t1.a = 1)  (cost=0.55 rows=1)
        -> Table scan on t1  (cost=0.55 rows=3)
    -> Hash
        -> Table scan on dt
            -> Materialize
                -> Filter: (t2.x > 1)  (cost=0.45 rows=1)
                    -> Table scan on t2  (cost=0.45 rows=2)

Note

In the query plan, the equivalence t1.a = 1 from the WHERE clause is successfully passed to the JOIN predicate, resulting in dt.x > 1 being effectively pushed down to the derived table dt. Consequently, the data volume in dt is reduced, leading to improved query performance.

Set loose_join_cond_push_into_derived_mode to OFF and execute the following code:

EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN (SELECT * FROM t2) dt ON dt.x > t1.a WHERE t1.a=1;

Sample result:

-> Left hash join (no condition)
    -> Filter: (t1.a = 1)  (cost=0.55 rows=1)
        -> Table scan on t1  (cost=0.55 rows=3)
    -> Hash
        -> Filter: (dt.x > 1)
            -> Table scan on dt
                -> Materialize
                    -> Table scan on t2  (cost=0.45 rows=2)

Note

With loose_join_cond_push_into_derived_mode set to OFF, the condition dt.x > 1 is not pushed down to the Derived Tables, and the filtering is performed only after the derived table is materialized.