All Products
Search
Document Center

PolarDB:Transitive predicate generation (enhanced condition pushdown)

Last Updated:Jun 16, 2025

PolarDB supports transitive predicate generation. This feature allows for an in-depth analysis of predicate conditions within SQL queries, leading to the extraction and derivation of new predicates. The process encompasses both equivalence and non-equivalence relations. As a result, PolarDB can produce more precise predicate conditions, enhancing the query optimizer's ability to make critical decisions, such as index selection and join order. By deriving new predicates, the optimizer can explore a wider range of execution plans, thus optimizing query performance.

Prerequisites

This feature is supported in clusters that use the database engine MySQL 8.0.2 with revision version 8.0.2.2.23 or later. For more information about how to query the database engine version, see Query the engine version.

Scenarios

Predicate pushdown is enhanced by transitive predicate generation in PolarDB. This includes, but is not limited to, the following implementations:

Note

The primary objective of transitive predicate generation is to generate as many single-table conditions as possible, rather than to derive relationships between variables through permutations and combinations. This approach allows the system to identify and create new, valid filter conditions for use in subsequent optimization stages, thus refining the dataset and enhancing query performance and efficiency. The process not only bolsters PolarDB predicate pushdown capabilities but also equips the query optimizer to more intelligently handle complex queries, ultimately boosting system performance.

Transitive predicate generation can be applied in the following scenarios:

  • Simple transitive predicate generation for both non-equivalence and equivalence relations. Example:

    -- Original SQL
    SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1
    -- Derived SQL
    SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1 AND v1.a > 1
    
    -- Original SQL
    SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1;
    -- Derived SQL
    SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1 AND v1.b < 1 AND t1.c < 1;
  • Expressions adhering to the commutative law are also eligible for transitive predicate generation. Example:

    -- Original SQL
    SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2;
    -- Derived SQL
    SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2 AND v2.c1 < 2;
  • Complex transitive predicate generation based on equivalence relations. Example:

    -- Original SQL
    SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%";
    -- Derived SQL
    SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%" AND v3.c2 LIKE "%00%";
    
    -- Original SQL
    SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7);
    -- Derived SQL
    SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7) AND  v2.c1 IN (1,5,7);
  • Derivation of MIN/MAX in HAVING clauses. Example:

    -- Original SQL
    SELECT * FROM t1 GROUP BY a,b,c HAVING MAX(b) > 20;
    -- Derived SQL
    SELECT * FROM t1 WHERE b > 20 GROUP BY a,b,c;
    
    -- Original SQL
    SELECT * FROM t1 GROUP BY a,b,c HAVING MIN(b) < 20;
    -- Derived SQL
    SELECT * FROM t1 WHERE b < 20 GROUP BY a,b,c;

Limits

  • Transitive predicate generation must be uniform when the same type of comparison conditions are present. Example:

    CREATE TABLE t1(c1 INT, c2 INT);
    CREATE TABLE t2(c1 INT, c2 VARCHAR(64));
    CREATE view v2 AS SELECT * FROM t2;
    SELECT * FROM v2,t1 WHERE v2.c1 > t1.c2 AND t1.c2 > v2.c2;
    Note

    In the sample SQL, v2.c2 is of VARCHAR type. Although it seems possible to transmit non-equivalence relations through the intermediate variable t1.c2, because v2.c1 is of INT type and v2.c2 is of VARCHAR type, t1.c2 as an INT type is compared with v2.c1 and v2.c2 using different value methods. Therefore, forced derivation does not conform to the original SQL semantics.

  • The derivation of MIN/MAX functions is limited to queries with a single MIN/MAX aggregate function. If other similar aggregate functions are present, derived predicate conditions in the WHERE clause may alter the results of these functions, leading to discrepancies between the derived and original SQL. Example:

    SELECT a, MIN(b), AVG(c) FROM t1 GROUP BY a HAVING MIN(b) < 20;
    Note

    In the sample SQL, two aggregate functions are used: AVG(c) and MIN(b). Applying the rule to include b < 20 in the WHERE clause will filter out rows not satisfying b < 20 during the scan of table t1. This filtration does not impact the calculation of AVG(c). However, for a given group, it can influence the resulting value of AVG(c), leading to a discrepancy with the original query semantics. Consequently, scenarios that affect the outcomes of other aggregate functions do not support the derivation of MIN/MAX.

Usage

Preparations

To utilize transitive predicate generation, configure the loose_predicate_deduce_mode parameter to enable predicate derivation. For configuration steps, see Configure cluster and node parameters.

Parameter name

Level

Description

loose_predicate_deduce_mode

Global

Control switch for transitive predicate generation feature. Valid values:

  • REPLICA_ON: enables the transitive predicate generation feature only on read-only nodes.

  • ON: enables the transitive predicate generation feature.

  • OFF (default): disables the transitive predicate generation feature.

Note

The MIN/MAX derivation does not require a separate parameter. Enabling the condition pushdown feature for the HAVING clause to WHERE clause also activates the MIN/MAX derivation. For more information, see condition pushdown (HAVING to WHERE).

Example 1

-- Create example table
CREATE VIEW v1
AS
SELECT c1, c2, MAX(c3) AS max_c3
    , AVG(c4) AS avg_c4
FROM t2
GROUP BY c1, c2
HAVING max_c3 > 10;

EXPLAIN FORMAT = TREE SELECT * FROM v1, t1 WHERE v1.c1 >= t1.c2 AND t1.c2 = v1.c2;

EXPLAIN
-> Inner hash join (t1.c2 = v1.c2)
    -> Table scan on t1  (cost=0.18 rows=10)
    -> Hash
        -> Table scan on v1
            -> Materialize
                -> Filter: (max_c3 > 10)
                    -> Table scan on <temporary>
                        -> Aggregate using temporary table
                            -> Filter: (t2.c1 >= t2.c2)  (cost=0.75 rows=2)
                                -> Table scan on t2  (cost=0.75 rows=5)
Note

Prior to condition pushdown (WHERE condition pushdown to derived table), incorporate the step of transitive predicate generation. The expression v1.c1 >= t1.c2 and t1.c2 = v1.c2 allows for the derivation of the single-table condition v1.c1 > v1.c2 by transmitting non-equivalence and equivalence relations. Subsequently, the condition v1.c1 > v1.c2 is pushed down to the derived table through condition pushdown, transforming into the corresponding field t2.c1 >= t2.c2 within the derived table, thus facilitating early data filtering.

Example 2

SELECT a, b, ee
FROM (
    SELECT b, a, MIN(c) AS ee
    FROM t1
    GROUP BY a, b
) dt
GROUP BY a, b
HAVING MIN(ee) < 20;

-- After derivation transformation ========>
SELECT a, b, ee
FROM (
    SELECT b, a, MIN(c) AS ee
    FROM t1
  WHERE c < 20
    GROUP BY a, b
) dt
GROUP BY a, b;


-- The following execution plan also shows the result after applying MIN/MAX derivation
EXPLAIN FORMAT = TREE SELECT a, b, ee
FROM (
    SELECT b, a, MIN(c) AS ee
    FROM t1
    GROUP BY a, b
) dt
GROUP BY a, b
HAVING MIN(ee) < 20;

EXPLAIN
-> Table scan on <temporary>
    -> Aggregate using temporary table
        -> Table scan on dt
            -> Materialize
                -> Table scan on <temporary>
                    -> Aggregate using temporary table
                        -> Filter: (t1.c < 20)  (cost=*** rows=***)
                            -> Table scan on t1  (cost=*** rows=***)
Note

In the outer query, the condition MIN(ee) < 20 is initially determined using MIN/MAX condition to establish dt.ee < 20. This condition is then pushed down to the derived table dt. Within this table, the HAVING condition MIN(c) < 20 leads to the derivation of t1.c < 20 via MIN/MAX. Through this sequence of transformations, the filter condition is brought closer to the data source, which speeds up the query process.