Query Performance Optimization – Runtime Filter

By Bairui

JOIN is a common operation in relational database queries. By joining several tables, users can obtain information more efficiently on the basis of complying with the database design paradigm. In analysis queries, the Join operation between large tables or between large and small tables is usually implemented through a Hash Join algorithm, which is usually one of the bottlenecks of query performance. Therefore, the most important task of the computing engine is to find a way to optimize the query performance of the Join operation.

Runtime Filter Introduction

Basic Principles

Runtime Filter is a widely used optimization technology in databases mentioned in [4]. Its basic principle is to significantly reduce data transmission and computing in the join operation by filtering out input data that does not match the join in advance for the probe side, thus reducing the overall execution time. The following example shows the original execution plan, where sales is a fact table and items is a latitude table:

SELECT * FROM sales JOIN items ON sales.item_id = items.id WHERE items.price > 100

As shown in the left half of the above figure, when performing the join, it is not only necessary to transfer the full sales data to operators of the join but also each row of sales data requires join operations, including hash values and comparison operations. If here items.price > 100 has a high selectivity (for example, 50%), so most of the data in the sales table will definitely not be joined. If the data is filtered out in advance, the data transmission and computing overhead can be reduced.

The right half of the preceding figure is the added execution plan after the runtime filter. As shown in the figure, a new RuntimeFilterBuilder operator is added to the process of pulling data from the build side of the join. This operator is used to collect build-side information, form a runtime filter, and send the collected information to the scan nodes of the probe side. By doing so, the probe nodes can reduce the amount of input data in scan operation, achieving performance improvement.

Effect of Runtime Filters on Join Reorder

In most current systems, the operators required by a runtime filter are inserted into a physical execution plan after CBO phase of the optimizer by using a rule-based optimization method. However, as pointed out in [3], if the effect of runtime filter on the execution plan is taken into account in the CBO phase, the execution plan can be further optimized. The following shows an example:

In this example, figure (a) shows an original query and the join operation is performed on the three tables (k, mk, and t). Figure (b) shows the physical execution plan obtained from the CBO phase regardless of the runtime filter. The preceding figure (c) shows how to add the runtime filter to the physical execution plan based on the rules in (b). Figure (d) shows the physical execution plan obtained by placing the runtime filter in the CBO phase. It can be seen that the optimal physical execution plan obtained in figure (d) has a smaller final cost than the plan obtained in figure (c).

However, if the runtime filter is directly added to CBO, an exponential increase in the search space of the optimizer will be caused, since the CBO phase of existing optimizers is mostly based on dynamic programming algorithms. If a runtime filter is put into CBO, the optimal solution for the child plan depends on the combination of the filter pushed by the parent nodes in the query plan and ways that the runtime filter is applied to the table. Such a combination will cause an explosion of the search space. For star queries and snowflake queries, which are queries that join by associating latitude and fact tables by primary and foreign keys, certain join operations are equivalent after adding runtime filters. Therefore, the linear growth of the search space of the optimizer in the CBO phase is ensured.

Runtime Filter in PolarDB-X

As an HTAP database, PolarDB-X not only meets the requirements of high-performance online transaction processing (OLTP) scenarios, but also supports high-performance analysis of massive data. To meet customers' requirements for big data analysis, Runtime Filters in our exclusive MPP engine have been implemented. Its basic principles are the same as above. However, some special optimizations for distributed database scenarios have been made.

Selection of the Runtime Filter Type

In PolarDB-X, the bloom filter[1] is selected to filter data, which has several advantages as follow:

No dependency on types: This feature reduces the complexity of handling multiple types.
Low space complexity: This feature improves transmission efficiency and memory overhead.
Low time complexity: The time complexity includes both the overhead of generating the bloom filter and the time overhead of checking for its presence. The lower time complexity ensures that too much overhead is not introduced

Certainly, other systems also contain some other kinds of filters. For example, in Spark SQL, if the data is filtered by partitioned columns and the data volume on the build side is small, the full amount of input data will be used for dynamic partition pruning. If the queried data is indexed, such as Parquet or ORC, it generates a simple filter such as min/max to filter data. However, most of these filters are scenario-specific and not general enough.

Cost Estimation for Runtime Filter Generation

The generation, transmission, and checking of a Runtime Filter introduce additional overhead. If the filter id abused without restraint, performance degradation will occur. Due to the complexity of cost estimation and implementation, most open-source systems only support implementing Runtime Filter in broadcast join, for example, Trino (formerly Presto). The advantage of this approach is that it is easy to implement with relatively small changes to the existing system, but many optimization opportunities will be lost.

In PolarDB-X, Runtime Filter generation rules are effectively combined with the optimizer statistics to determine whether a Runtime Filter needs to be generated based on data from multiple dimensions:

The size of the data on the probe side. If the amount of data on the probe side is too small, the performance improvement cannot make up for the additional overhead of bloom filters even if a lot of data is filtered. If this is the case, the generation of bloom filters will be dropped.
The size of the bloom filter. The size of a bloom filter is determined by the number of inputs and FPP, namely error rate, and is proportional to the number of inputs. A too-large bloom filter will not only increase the amount of data transmitted over the network, but also increase memory usage. Therefore, the size of the bloom filter will be limited to a certain range.
Filter percentage. When the filter ratio of the generated bloom filter is too small, pushing it down to the probe side of the join will not have any effect, and the computing of the accurate filter ratio is a more complicated process. Here an approximate formula is used to estimate the filterability: 1 – buildNdv * (1+fpp)/probeNdv. A runtime filter is generated only when the filter ratio is greater than a certain threshold.

Execution of the Runtime Filter

The MPP engine of PolarDB-X is a distributed computing engine that is created for interactive analysis. Unlike Spark and Flink, it uses the push-based execution model. The advantage of this model is that the intermediate data is not written to disk, greatly reducing the latency of waiting during computing, yet also increasing the complexity of developing this feature of the Runtime Filter. Different from most open-source computing engines, the Runtime Filter in PolarDB-X supports broadcast join as well as other distributed join algorithms. Here one of the above SQL statements will be used as an example:

SELECT * FROM sales JOIN items ON sales.item_id = items.id WHERE items.price > 100

The physical execution logic after the runtime filter is enabled is as follows:

As shown in the figure, the build side sends the generated bloom filters to the coordinator. The coordinator waits for each partition's bloom filter to be sent before performing a merge operation and sends the merged bloom filter to the FilterExec operator to achieve the filtering effect. The bloom filter merged by the coordinator is as large as the bloom filter of a single partition, but is transmitted only once for each probe side, greatly reducing data transmission. Meanwhile, FilterExec does not block the process of waiting for bloom filters but receives bloom filters asynchronously, thus minimizing the impact of bloom filter generation on latency.

To further reduce data transmission, the bloom filter will be pushed down to the DN layer by implementing UDFs and data will be filtered on the DN side, greatly reducing the network overhead. As shown in the following figure, PolarDB-X pushes the bloom filter down to the DN side, reducing the amount of data pulled from DN and the overhead of network transmission and data parsing.

Effect Evaluation

The comparison results of the Runtime Filter on the TPC-H 100G dataset are as shown below:

It can be seen that the performance on time-consuming large queries such as Q9 and Q21 has been improved by two to three times. For other medium-sized queries, the performance is also doubled, with the overall performance improved by about 20%.

Community

Query Performance Optimization – Runtime Filter

Runtime Filter Introduction

Basic Principles

Effect of Runtime Filters on Join Reorder

Runtime Filter in PolarDB-X

Selection of the Runtime Filter Type

Cost Estimation for Runtime Filter Generation

Execution of the Runtime Filter

Effect Evaluation

References

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for PostgreSQL

PolarDB for Xscale

PolarDB for MySQL

ApsaraDB for HBase