PolarDB supports transitive predicate generation. This feature allows for an in-depth analysis of predicate conditions within SQL queries, leading to the extraction and derivation of new predicates. The process encompasses both equivalence and non-equivalence relations. As a result, PolarDB can produce more precise predicate conditions, enhancing the query optimizer's ability to make critical decisions, such as index selection and join order. By deriving new predicates, the optimizer can explore a wider range of execution plans, thus optimizing query performance.
Prerequisites
This feature is supported in clusters that use the database engine MySQL 8.0.2 with revision version 8.0.2.2.23 or later. For more information about how to query the database engine version, see Query the engine version.
Scenarios
Predicate pushdown is enhanced by transitive predicate generation in PolarDB. This includes, but is not limited to, the following implementations:
HAVINGpushdown toWHEREcondition pushdown: This convertsHAVINGconditions suitable for pre-aggregation application intoWHEREconditions, reducing the data volume for aggregation. For more details, see condition pushdown (HAVING to WHERE).WHEREcondition pushdown to derived table: TheWHEREcondition from the outer query is pushed down to the derived table for earlier data filtering. For more details, see condition pushdown (WHERE condition pushdown to derived table).WHEREclause pushdown toINsubquery: TheWHEREcondition from the outer query is pushed down to theINsubquery to enhance subquery execution efficiency. For more details, see condition pushdown (WHERE clause pushdown to IN subquery).Join condition pushdown to materialized derived table: The join condition is applied directly to the materialized derived table, minimizing unnecessary data scans. For more details, see condition pushdown (join condition pushdown to materialized derived table).
The primary objective of transitive predicate generation is to generate as many single-table conditions as possible, rather than to derive relationships between variables through permutations and combinations. This approach allows the system to identify and create new, valid filter conditions for use in subsequent optimization stages, thus refining the dataset and enhancing query performance and efficiency. The process not only bolsters PolarDB predicate pushdown capabilities but also equips the query optimizer to more intelligently handle complex queries, ultimately boosting system performance.
Transitive predicate generation can be applied in the following scenarios:
Simple transitive predicate generation for both non-equivalence and equivalence relations. Example:
-- Original SQL SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1 -- Derived SQL SELECT * FROM t1, v1 WHERE v1.a > t1.a AND t1.a > 1 AND v1.a > 1 -- Original SQL SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1; -- Derived SQL SELECT * FROM t1, v1 WHERE v1.b < t1.c AND t1.c < t2.c AND t2.c = 1 AND v1.b < 1 AND t1.c < 1;Expressions adhering to the commutative law are also eligible for transitive predicate generation. Example:
-- Original SQL SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2; -- Derived SQL SELECT * FROM t1, v2 WHERE v2.c1 < t1.c2 + t1.c1 AND t1.c1 + t1.c2 < 2 AND v2.c1 < 2;Complex transitive predicate generation based on equivalence relations. Example:
-- Original SQL SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%"; -- Derived SQL SELECT * FROM t3, v3 WHERE v3.c2 = t3.c2 AND t3.c2 LIKE "%00%" AND v3.c2 LIKE "%00%"; -- Original SQL SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7); -- Derived SQL SELECT * FROM t1, v2 WHERE v2.c1 = t1.c2 AND t1.c2 IN (1,5,7) AND v2.c1 IN (1,5,7);Derivation of MIN/MAX in HAVING clauses. Example:
-- Original SQL SELECT * FROM t1 GROUP BY a,b,c HAVING MAX(b) > 20; -- Derived SQL SELECT * FROM t1 WHERE b > 20 GROUP BY a,b,c; -- Original SQL SELECT * FROM t1 GROUP BY a,b,c HAVING MIN(b) < 20; -- Derived SQL SELECT * FROM t1 WHERE b < 20 GROUP BY a,b,c;
Limits
Transitive predicate generation must be uniform when the same type of comparison conditions are present. Example:
CREATE TABLE t1(c1 INT, c2 INT); CREATE TABLE t2(c1 INT, c2 VARCHAR(64)); CREATE view v2 AS SELECT * FROM t2; SELECT * FROM v2,t1 WHERE v2.c1 > t1.c2 AND t1.c2 > v2.c2;NoteIn the sample SQL,
v2.c2is ofVARCHARtype. Although it seems possible to transmit non-equivalence relations through the intermediate variablet1.c2, becausev2.c1is ofINTtype andv2.c2is ofVARCHARtype,t1.c2as anINTtype is compared withv2.c1andv2.c2using different value methods. Therefore, forced derivation does not conform to the original SQL semantics.The derivation of
MIN/MAXfunctions is limited to queries with a singleMIN/MAXaggregate function. If other similar aggregate functions are present, derived predicate conditions in theWHEREclause may alter the results of these functions, leading to discrepancies between the derived and original SQL. Example:SELECT a, MIN(b), AVG(c) FROM t1 GROUP BY a HAVING MIN(b) < 20;NoteIn the sample SQL, two aggregate functions are used:
AVG(c)andMIN(b). Applying the rule to includeb < 20in theWHEREclause will filter out rows not satisfyingb < 20during the scan of tablet1. This filtration does not impact the calculation ofAVG(c). However, for a given group, it can influence the resulting value ofAVG(c), leading to a discrepancy with the original query semantics. Consequently, scenarios that affect the outcomes of other aggregate functions do not support the derivation ofMIN/MAX.
Usage
Preparations
To utilize transitive predicate generation, configure the loose_predicate_deduce_mode parameter to enable predicate derivation. For configuration steps, see Configure cluster and node parameters.
Parameter name | Level | Description |
loose_predicate_deduce_mode | Global | Control switch for transitive predicate generation feature. Valid values:
|
The MIN/MAX derivation does not require a separate parameter. Enabling the condition pushdown feature for the HAVING clause to WHERE clause also activates the MIN/MAX derivation. For more information, see condition pushdown (HAVING to WHERE).
Example 1
-- Create example table
CREATE VIEW v1
AS
SELECT c1, c2, MAX(c3) AS max_c3
, AVG(c4) AS avg_c4
FROM t2
GROUP BY c1, c2
HAVING max_c3 > 10;
EXPLAIN FORMAT = TREE SELECT * FROM v1, t1 WHERE v1.c1 >= t1.c2 AND t1.c2 = v1.c2;
EXPLAIN
-> Inner hash join (t1.c2 = v1.c2)
-> Table scan on t1 (cost=0.18 rows=10)
-> Hash
-> Table scan on v1
-> Materialize
-> Filter: (max_c3 > 10)
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.c1 >= t2.c2) (cost=0.75 rows=2)
-> Table scan on t2 (cost=0.75 rows=5)Prior to condition pushdown (WHERE condition pushdown to derived table), incorporate the step of transitive predicate generation. The expression v1.c1 >= t1.c2 and t1.c2 = v1.c2 allows for the derivation of the single-table condition v1.c1 > v1.c2 by transmitting non-equivalence and equivalence relations. Subsequently, the condition v1.c1 > v1.c2 is pushed down to the derived table through condition pushdown, transforming into the corresponding field t2.c1 >= t2.c2 within the derived table, thus facilitating early data filtering.
Example 2
SELECT a, b, ee
FROM (
SELECT b, a, MIN(c) AS ee
FROM t1
GROUP BY a, b
) dt
GROUP BY a, b
HAVING MIN(ee) < 20;
-- After derivation transformation ========>
SELECT a, b, ee
FROM (
SELECT b, a, MIN(c) AS ee
FROM t1
WHERE c < 20
GROUP BY a, b
) dt
GROUP BY a, b;
-- The following execution plan also shows the result after applying MIN/MAX derivation
EXPLAIN FORMAT = TREE SELECT a, b, ee
FROM (
SELECT b, a, MIN(c) AS ee
FROM t1
GROUP BY a, b
) dt
GROUP BY a, b
HAVING MIN(ee) < 20;
EXPLAIN
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Table scan on dt
-> Materialize
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t1.c < 20) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)In the outer query, the condition MIN(ee) < 20 is initially determined using MIN/MAX condition to establish dt.ee < 20. This condition is then pushed down to the derived table dt. Within this table, the HAVING condition MIN(c) < 20 leads to the derivation of t1.c < 20 via MIN/MAX. Through this sequence of transformations, the filter condition is brought closer to the data source, which speeds up the query process.