Optimize Join Performance with Transitive Predicate Generation - PolarDB

Without transitive predicate generation, the optimizer sees only the conditions you wrote—it cannot infer that a filter on one table also restricts another. A query joining a large derived table may compute thousands of rows before the outer WHERE clause discards most of them. Transitive predicate generation fixes this: it propagates conditions across equality and inequality chains before execution, deriving new single-table filters the optimizer can apply earlier. The result is tighter index selection, smarter join order, and less data flowing through each stage of the plan.

Prerequisites

This feature requires the MySQL 8.0.2 database engine at revision version 8.0.2.2.23 or later. To check your current version, see Query the engine version.

How it works

The optimizer's goal is to generate as many single-table conditions as possible—not to enumerate all variable relationships through permutations. When it finds a chain like v1.a > t1.a and t1.a > 1, it derives v1.a > 1 as a new single-table condition and applies it before the join runs. The more single-table conditions the optimizer has, the more aggressively it can prune data at the scan level.

This capability enhances the following types of condition pushdown:

HAVING pushdown to WHERE — converts HAVING conditions applicable before aggregation into WHERE conditions, reducing the data volume for aggregation. See Condition pushdown (HAVING to WHERE).
WHERE condition pushdown to derived table — pushes WHERE conditions from the outer query into a derived table for earlier filtering. See Condition pushdown (WHERE condition pushdown to derived table).
WHERE clause pushdown to IN subquery — pushes WHERE conditions from the outer query into an IN subquery to improve subquery execution efficiency. See Condition pushdown (WHERE clause pushdown to IN subquery).
Join condition pushdown to materialized derived table — applies join conditions directly to materialized derived tables, reducing unnecessary data scans. See Condition pushdown (join condition pushdown to materialized derived table).

Enable transitive predicate generation

Set the loose_predicate_deduce_mode parameter at the cluster level. For configuration steps, see Configure cluster and node parameters.

Parameter	Level	Valid values
`loose_predicate_deduce_mode`	Global	`OFF` (default), `ON`, `REPLICA_ON`

OFF (default): disables transitive predicate generation.
ON: enables transitive predicate generation on all nodes.
REPLICA_ON: enables transitive predicate generation on read-only nodes only.

Note

MIN/MAX derivation does not require a separate parameter. Enabling the HAVING-to-WHERE condition pushdown feature also activates MIN/MAX derivation. See Condition pushdown (HAVING to WHERE).

Supported scenarios

Simple transitive predicate generation

Applies to both non-equivalence and equivalence relations:

-- Original
SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1;
-- Derived: v1.a > 1 is added as a new single-table condition
SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1 AND v1.a > 1;

-- Original
SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1;
-- Derived: propagates through the chain to add v1.b < 1 and t1.c < 1
SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1 AND v1.b < 1 AND t1.c < 1;

Commutative expressions

Expressions that satisfy the commutative law are also eligible:

-- Original
SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2;
-- t1.c2 + t1.c1 and t1.c1 + t1.c2 are recognized as equivalent; v2.c1 < 2 is derived
SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2 AND v2.c1 < 2;

Complex transitive predicate generation based on equivalence relations

Equality chains allow more complex predicates to propagate:

-- Original
SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%";
-- The LIKE condition propagates through the equality to v3.c2
SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%" AND v3.c2 LIKE "%00%";

-- Original
SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7);
-- The IN list propagates through the equality to v2.c1
SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7) AND v2.c1 IN (1,5,7);

MIN/MAX derivation from HAVING clauses

MIN() and MAX() aggregate conditions in a HAVING clause are converted into WHERE conditions on the base table:

-- Original
SELECT * FROM t1 GROUP BY a, b, c HAVING MAX(b) > 20;
-- Derived: filters rows at scan time instead of after aggregation
SELECT * FROM t1 WHERE b > 20 GROUP BY a, b, c;

-- Original
SELECT * FROM t1 GROUP BY a, b, c HAVING MIN(b) < 20;
-- Derived
SELECT * FROM t1 WHERE b < 20 GROUP BY a, b, c;

Limitations

Transitive predicate generation is not applied when any of the following conditions are true:

Type mismatch across the derivation chain

Derivation is skipped when all of the following apply:

The same intermediate variable appears in two comparisons.
The two compared columns have different types (for example, INT and VARCHAR).
Forcing derivation would violate the original query semantics.

In the example below, t1.c2 (INT) is compared with both v2.c1 (INT) and v2.c2 (VARCHAR). Because the comparison methods differ, derivation is not applied:

CREATE TABLE t1(c1 INT, c2 INT);
CREATE TABLE t2(c1 INT, c2 VARCHAR(64));
CREATE VIEW v2 AS SELECT * FROM t2;

-- Derivation is NOT applied because t1.c2 is compared against different types
SELECT * FROM v2, t1 WHERE v2.c1 > t1.c2 AND t1.c2 > v2.c2;

Multiple aggregate functions alongside MIN/MAX

MIN/MAX derivation is skipped when all of the following apply:

The query contains a MIN() or MAX() in the HAVING clause.
At least one other aggregate function (such as AVG) is also present.
Adding a WHERE condition would alter the result of the other aggregate function.

In the example below, adding WHERE b < 20 would exclude rows from the AVG(c) calculation, changing its result:

-- Derivation is NOT applied because AVG(c) is present alongside MIN(b)
SELECT a, MIN(b), AVG(c) FROM t1 GROUP BY a HAVING MIN(b) < 20;

Verify with EXPLAIN

Use EXPLAIN FORMAT = TREE to confirm that derived conditions appear in the execution plan.

Example 1: Transitive predicate generation combined with condition pushdown

This example shows how a non-equivalence and an equivalence relation combine to derive a single-table condition, which is then pushed into a derived table.

View definition:

CREATE VIEW v1 AS
SELECT c1, c2, MAX(c3) AS max_c3, AVG(c4) AS avg_c4
FROM t2
GROUP BY c1, c2
HAVING max_c3 > 10;

Query:

EXPLAIN FORMAT = TREE SELECT * FROM v1, t1 WHERE v1.c1 >= t1.c2 AND t1.c2 = v1.c2;

Derivation steps:

Transitive predicate generation — the conditions v1.c1 >= t1.c2 and t1.c2 = v1.c2 together imply v1.c1 > v1.c2, a new single-table condition on v1. After this step, the effective query becomes:
```
SELECT * FROM v1, t1 WHERE v1.c1 >= t1.c2 AND t1.c2 = v1.c2 AND v1.c1 > v1.c2;
```
Condition pushdown — the derived condition v1.c1 > v1.c2 is pushed into the derived table v1 and translated to the base-table column t2.c1 >= t2.c2.
Early filtering — the filter runs before aggregation, reducing the rows that GROUP BY and HAVING need to process.

Expected execution plan (the filter on t2 appears inside the aggregation step):

-> Inner hash join (t1.c2 = v1.c2)
    -> Table scan on t1  (cost=0.18 rows=10)
    -> Hash
        -> Table scan on v1
            -> Materialize
                -> Filter: (max_c3 > 10)
                    -> Table scan on <temporary>
                        -> Aggregate using temporary table
                            -> Filter: (t2.c1 >= t2.c2)  (cost=0.75 rows=2)
                                -> Table scan on t2  (cost=0.75 rows=5)

Example 2: Chained MIN/MAX derivation across nested derived tables

This example shows MIN/MAX derivation applied at two levels.

Original query:

SELECT a, b, ee
FROM (
    SELECT b, a, MIN(c) AS ee
    FROM t1
    GROUP BY a, b
) dt
GROUP BY a, b
HAVING MIN(ee) < 20;