×
Community Blog About Database Kernel | PolarDB Query Optimization: Predicate Pushdown

About Database Kernel | PolarDB Query Optimization: Predicate Pushdown

This article describes the definition of predicate pushdown and why databases should perform predicate pushdown.

By Zhuo Ren (Xiaozhuo)

The database query optimizer is like the brain of the entire system. The efficiency of executing an SQL statement can vary significantly based on different optimization decisions. Therefore, the optimizer is one of the core components and competitive advantages of a database system. Alibaba Cloud PolarDB for MySQL is a cloud-native database that can handles various user scenarios and loads, helping enterprises go online and increase their data value. To enhance its capabilities, we have focused on improving the query transformation functionality of PolarDB for MySQL.

This series of articles will focus on the query transformation capability of PolarDB for MySQL and introduce some of our accumulated work in this direction.

This is the fifth article in the PolarDB optimizer query transformation series. The first four articles describe the following topics:

1. Join elimination
2. Window functions
3. Join condition pushdown
4. IN-List transformation

Background

Predicate pushdown is an optimization technique that aims to push down the predicates (conditions) in a query to the data source for processing as much as possible. By doing so, it reduces the amount of data that needs to be read and processed, resulting in improved query efficiency. The main benefit of predicate pushdown is that it enables the early filtering of ineligible data during the query process, thus reducing data transmission and processing. As a result, query performance is significantly improved.

Why do databases perform predicate pushdown?

Databases perform predicate pushdown to address the challenges posed by large amounts of data and frequent query operations. Without predicate pushdown, query operations would be time-consuming and inefficient. Therefore, predicate condition pushdown is an important rule in the query transformation process of the database optimizer. It helps reduce the cost of subsequent query calculations and significantly improves query performance.

  • The above part is referenced from ChatGPT.

Pushing Predicate Conditions to Derived Tables

To achieve these goals, MySQL 8.0.22 and later versions support pushing predicate conditions to derived tables. When a derived table cannot be merged into an external query (such as when the derived table uses aggregation), pushing the external WHERE condition to the derived table can reduce the number of rows that need to be processed, thus accelerating the execution of the query. Here's an example:

SELECT i, j     
FROM (
        SELECT i           
        FROM t1     
        GROUP BY i         
) dt, t2     
WHERE i > 2         
AND j < 3;

====After transformation ====>

SELECT i, j
FROM (
        SELECT i
        FROM t1
        WHERE i > 2
        GROUP BY i
) dt
WHERE j < 3;

The implementation principle of the MySQL community version is that, during the prepare phase, after all transformations are completed, the conditions of the WHERE Clause in the current block are recursively evaluated from outer to inner. It is determined whether these conditions can be pushed down or partially pushed down to any materialized derived table. The specific code logic is as follows:

-> SELECT_LEX::prepare      
   -> push_conditions_to_derived_tables()         
      -> Loop each materialized table to process WHERE condition
            -> make_cond_for_derived() // Loop each materialized table to process WHERE condition
               ->extract_cond_for_table() // Extract conditions that are only relevant to the derived tables
               - push_past_window_functions()// Generate the condition that is pushed to the derived table HAVING Clause
               - push_past_group_by()// Generate the condition that is pushed to the derived table WHERE Clause
               - make_remainder_cond() // Generate the condition that is pushed to the derived table WHERE Clause

       -> The generated pushdown conditions can be pushed from top to down to the derived tables nested inside the derived tables
            ->push_conditions_to_derived_tables() //recursion

MySQL 8.0.29 and later versions introduce the optimization of conditional pushdown for derived tables that are used in UNION queries. While the restriction on materialized tables being limited to UNION is lifted, the following limitations are added:
• Any materialized derived table in the UNION must be a recursive common table expression.
• Predicate conditions containing nondeterministic expressions cannot be pushed down to derived tables.

Conditional Pushdown Based on Predicates of PolarDB Versions

Enhanced Pushdown of Predicate Conditions to Derived Tables

Based on the complex query scenarios of users, we have identified the need for enhanced pushdown capabilities in databases to accelerate user queries. In response, PolarDB has made significant improvements to the original implementation of predicate condition pushdown based on MySQL 8.0.2. These enhancements provide better and more powerful pushdown capabilities, including:

Pushdown of predicate conditions that include equivalent condition passing

The MySQL community version does not consider the impact of conditional equivalence passing when evaluating predicate condition pushdown. In practice, if a column satisfies the criteria for predicate condition pushdown, its equivalent column should also meet the pushdown criteria to further optimize the reduction of intermediate data and subsequent processing costs. To account for the impact of equivalent conditions, PolarDB retains the WHERE condition at the current level and preserves the conditions that have already been pushed down. In comparison to the cost of filters, maximizing the possibilities of additional pushdown brings greater performance benefits. For example, consider the following query scenario:

SELECT *
       FROM t1, (
                  SELECT x
                  FROM t2
                  GROUP BY x
               ) d_tab, t2
       WHERE t1.a = d_tab.x
               AND t1.a > 6;

       ==== After transformation ====>

      SELECT *
        FROM t1, (
                   SELECT x
                   FROM t2
                   WHERE x > 6
                   GROUP BY x
                ) d_tab
        WHERE t1.a = d_tab.x
                AND t1.a > 6;

For the condition t1.a > 6, the column t1.a does not rely on the derived table d_tab. However, due to the equivalent condition t1.a = d_tab.x, we can deduce that the condition t1.a > 6 can be pushed down to the derived table. Additionally, based on the mapping relationship, the condition t1.a > 6 is converted to the condition x > 6 and is used to filter the data in the materialized table d_tab. This reduces the amount of data and the computational cost of subsequent operations. In scenarios with large amounts of data and effective filtering conditions, the overall query performance is significantly improved.

Pushdown of predicate conditions to derived tables in UNION queries

The MySQL community version initially did not support pushdown of conditions to derived tables in UNION queries due to implementation restrictions. However, this limitation has been lifted in the PolarDB version. If a derived table is part of a UNION query, the conditions that can be pushed down are pushed down to the relevant query block based on the conditions of each sub-query block in the UNION.

SELECT f1
    FROM (
      SELECT (
          SELECT f1
          FROM t1
          LIMIT 1
        ) AS f1
      FROM t1
      UNION
      SELECT f2
      FROM t2
    ) dt
WHERE f1 = 1;

==== After transformation ====>

    SELECT f1
    FROM (
      SELECT (
          SELECT f1
          FROM t1
          LIMIT 1
        ) AS f1
      FROM t1
      UNION
      SELECT f2
      FROM t2
      WHERE f2 = 1
    ) dt
    WHERE f1 = 1

In the given SQL statement, we need to determine whether the WHERE condition f1 = 1 can be pushed down if the derived table is a UNION of two SELECT statements. In the case of SELECT#1 with a LIMIT, pushing down the condition would impact the number of rows in the result, so it cannot be pushed down to SELECT#1. However, SELECT#2 meets the requirements for pushdown. Therefore, the condition f1 = 1 can be pushed down to the WHERE clause of SELECT#2 and mapped as f2 = 1.

MySQL 8.0.29 and later versions introduce an optimization called "optimize when the condition is pushed down to UNION". If a sub-SELECT statement in the UNION does not support pushdown, the condition cannot be pushed down to any sub-SELECT statements in the UNION. In contrast, PolarDB supports partial pushdown to the relevant part of the UNION, ensuring semantic correctness while allowing for greater condition pushdown.

The pushed conditions can be further cascaded based on equivalence relationships

PolarDB also allows you to push conditions from the HAVING clause to the WHERE clause of the current query block. This enables data filtering before performing the GROUP BY operation, reducing subsequent computational costs and greatly improving query performance.

During the process of pushing predicate conditions down to derived tables, we only include the conditions that can be pushed down to the HAVING clause of the derived table. We then evaluate whether the conditions pushed down to the HAVING clause can be further pushed down to the WHERE clause. To maximize data filtering at an earlier stage, PolarDB checks if the conditions in all HAVING clauses of each query block can be pushed down to the WHERE clause. Additionally, we consider the passing of equivalent conditions and connect the logic of conditions pushed to derived tables, allowing conditions to be cascaded to inner query blocks as much as possible. Here's an example:

SELECT t1.a, MAX(t1.b)
    FROM t1
    GROUP BY t1.a
    HAVING t1.a > 2
    AND MAX(c) > 12;

==== After transformation ====>

    SELECT t1.a, MAX(t1.b)
    FROM t1
    WHERE t1.a > 2
    GROUP BY t1.a
    HAVING MAX(c) > 12;

Therefore, before considering pushing the WHERE condition to derived tables, PolarDB checks whether the HAVING condition can be pushed down to the WHERE condition. It also checks, from the outermost to the innermost query block, whether the condition can be pushed down to the innermost layer. Here's an example:

SELECT *
   FROM (
     SELECT f1, f2
     FROM t1
   ) dt
   GROUP BY f1
   HAVING f1 < 3
   AND f2 > 11
   AND MAX(f3) > 12;

   ==== After transformation ====>

   SELECT *
   FROM (
     SELECT f1, f2
     FROM t1
     WHERE f1 < 3
   ) dt
   WHERE f1 < 3
   GROUP BY f1
   HAVING f2 > 11
   AND MAX(f3) > 12;

Summary

The cloud-native PolarDB database has established a comprehensive transformation logic for pushing down predicate conditions. This logic also considers equivalence conditions during the pushdown checks. Moreover, in order to maximize the utilization of the relationship between conditions, the original query block retains the pushed-down conditions even after the predicate conditions are pushed down to the new query block. This allows for the utilization of more filter conditions in subsequent optimizations.

0 1 0
Share on

ApsaraDB

421 posts | 90 followers

You may also like

Comments

ApsaraDB

421 posts | 90 followers

Related Products