By Renjie Lou (Renjie)
RuntimeFilter is a method that passes dynamically generated filter conditions through equivalence relationships to reduce the amount of data involved in Join computing. It is widely used in the industry.
The process for RuntimeFilter optimization to take effect is listed below:
The filtering principle of RuntimeFilter Objects belongs to True Negative, which does not guarantee that all irrelevant data can be filtered out in advance. Common RuntimeFilter Objects in the industry include MinMax Filter, InSet Filter, and Bloom Filter. Their construction costs, memory usage, and time consumption for a single search are much smaller than those of HashTables used for HashBasedJoin computing. This is the fundamental reason why RuntimeFilter can prefilter coarse data before joining.
This article describes the challenges of the entire RuntimeFilter technology from the following aspects: creating RuntimeFilter delivery rules, delivering RuntimeFilter Objects, and generating and using RuntimeFilter Objects.
RuntimeFilter is a filter condition generated by the producer and passed to the consumer for consumption during execution. In a broad definition, the computing node on each data link can be used as the production end of a RuntimeFilter and passed to the consumer using equivalent conditions. The lower left figure shows a fully connected Bushy Tree Inner Join Graph (with the same Join conditions). Each subquery produces a RuntimeFilter. In the lower right figure, the black line indicates the data flow, and the execution of Join depends on the data flow of Probe and Build. The orange line indicates the information flow. For example, 5 and 6 can mutually generate and consume RuntimeFilters.
In actual implementation, we often do not generate such full connectivity dependencies. On the one hand, this will generate circular dependencies and may not all take effect. On the other hand, both the production and consumption of RuntimeFilter have overhead, and we should choose the one with lower overhead and greater benefits. This includes the following three aspects:
The relationship between the producer and consumer cannot form a circular reference.
We introduced RuntimeFilter Object as a single logical entity before, but common distributed MPP database execution engines have single-server and multi-server parallelism. In the previous section, a logical RuntimeFilter Object in the execution plan diagram may be composed of M*N memory objects on N machine nodes. This complicates the propagation of RuntimeFilter Objects between the producer and consumer. This propagation breaks the original bottom-up data propagation model of the SQL execution engine.
RuntimeFilter Service is the most common implementation of RuntimeFilter Object delivery in the industry, including Impala and Doris. This implementation idea is to set up a dedicated service on the coordinator node (frontend node), which is responsible for receiving partial RuntimeFilter Objects generated by all producers, merging RuntimeFilter Objects, and broadcasting to consumers.
There are many choices to implement RuntimeFilter Objects. They are used to represent the result set on the production side. The following table describes several common implementations and applicable scenarios:
|Degree||Speed||Memory Usage/Number of Rows in Producer Result Set|
The preceding RuntimeFilter Objects are used in different scenarios. The system needs to estimate the result set of the producer to determine which type to build. However, in a real SQL operating environment, the distribution of data on the consumer side, fluctuating system loads, and machine hardware levels all have an impact on the optimal selection strategy.
For the consumer, it depends on the RuntimeFilter Objects generated by the producer to a certain extent, so consumers have the logic of waiting for the producer to complete. This is not a mandatory waiting logic. The worst situation is that consumers give up using RuntimeFilter to accelerate. This mainly depends on the waiting strategy of the system on the consumer side. The system can suspend Task delivery on the consumer data flow during the waiting period, or the system can immediately issue the Task, but the internal logic of the Task blocks itself. Finally, in terms of the final response time of the entire SQL, there are many trade-off scenarios for consumers to wait for producers. Waiting for producers is not necessarily the best choice.
AnalyticDB for MySQL SIP framework is used for collecting and transmitting information during SQL execution. The information transmitted can be data characteristics, statistics, or a temporary memory table. An abstract SIP framework does not need to be aware of the specific content or category of information it collects and transmits but only needs to have the ability to collect and send corresponding runtime information, eliminate Sessions, and produce optimal procedures for information transmission.
Runtime information is defined by type and granularity, independent of the subscriber of the information, and is a description of the information itself.
The information has the following two characteristics to reduce the cost of repeated production:
As shown in the following figure, Agg generates a HashSet during running. Agg can be used as a SIP message to the left table for SemiJoin to reduce the work of TableScan. It can derive a Histogram and use it as the input for the Radix HashJoin algorithm to adjust the slice size of Hash Builder.
There are three requirements for our implementation. First, define the derivation relationship between different information types. Second, define that the relationship between the publishing subscribers is a one-to-many relationship. Third, manage the information lifecycle and clean up the data after all subscribers have consumed it.
The entire framework consists of the publisher, subscriber, and channel. The information publisher is an operator or optimizer that collects information during execution and publishes it to the channel. The channel is a centralized module with a manager that manages and matches publish-subscribe relationships. After receiving information from the publisher, it determines which consumer or consumers to send to. The channel also has a service module that receives, processes, and distributes information. The subscriber can be an operator, optimizer, and scheduler, which is customized by rules based on application scenarios.
Generating information has a certain overhead, and we hope to use some characteristics of existing operator algorithms to collect runtime information at the minimum cost without interrupting the operator channel calculation. Depending on the characteristics of the operator algorithm, different types of information are generated in different ways.
Depending on the application scenario, there are various forms of the subscriber. Currently, our applications include operators, optimizers, and schedulers. There may be multiple subscribers in different modules of the optimizer, each subscribing to different information from different publishers. When executing, the subscriber has a weak dependency on the publisher's information flow. The subscriber has a blocking effect on the entire channel. Under normal circumstances, the subscriber waits for receiving information and making decisions before continuing channel execution. However, the information we transmit is used for tuning. Therefore, if no information is received during timeout under abnormal circumstances, in order not to affect the overall response time of the query, a short-circuit mechanism is required to cancel the blocking and continue execution.
The channel is divided into two modules:
AnalyticDB for MySQL is a new-generation cloud-native data warehouse that supports high concurrency and low latency queries. It is highly compatible with the MySQL protocol and SQL:92, SQL:99, and SQL:2003 standards. It allows multi-dimensional and real-time analysis and business exploration of large amounts of data. AnalyticDB for MySQL integrates database and big data technologies. It supports high-throughput real-time data addition, deletion, and modification, low-latency real-time analysis, and complex ETL workloads. In addition, it is compatible with upstream and downstream eco-link tools and is used to build enterprise-level report systems, data warehouses, and data service engines.
In most industry implementations, RuntimeFilter takes effect in the sub-graph dimension of a Join rather than the entire query dimension. For example, in Figure 2, RTF5 only takes effect in the sub-graph dimension of the Join in Node 2. The consumer is Node 5. In the global dimension, if the conditions of Join-0 and Join-2 are the same, RTF5 can at least be consumed by Node1 because traditional RuntimeFilters are applied to star models or snowflake schema and often join in different dimensions. Global is different from subqueries, and there are fewer optimizable scenarios. In practice, there are more complex query scenarios, and there are still many scenarios that can be optimized through global delivery. For example, in Q24 of TPC-DS, the result set of the filtered item table can be passed to filter the two large tables - store_returns and store_sales. Based on the 99 queries of TPC-DS, we investigated the application scenario of global equivalence relation. We found that compared with supporting join subquery dimension, the number of RuntimeFilter valid is extended by 23% (620 vs. 798) if supporting the global dimension.
This idea of making RuntimeFilter more applicable through global equivalence is similar to the global idea in Data/Join-Induced Predicates (DIP). DIP uses the global equivalence relationship between data distribution features (data statistics/data layout) and Join columns to determine partition cutting during the planning phase to reduce the amount of computing data. The global equivalence relationship here is not limited to determining partition cutting by data distribution characteristics but applies to generalized RuntimeFilter.
The RumtimeFilter Objects of AnalyticDB for MySQL are published and subscribed through the SIP framework. As shown in Figure 5, the SIP framework reduces the overhead of distributed delivery of the entire RuntimeFilter object by sharing, merging, and short-circuiting information and significantly reduces the impact of RuntimeFilter Object propagation on QPS and RT in high-concurrency online query scenarios.
We tested the optimization effect of RuntimeFilter on the Benchmark dataset TPC-DS 1TB. The following table calculates the total execution time, the amount of scanned data, and the percentage improved when RuntimeFilter is turned on and off. Figure 6 and Figure 7 are bar charts comparing the execution time and the amount of scanned data in the query dimension.
|RuntimeFilter On||RuntimeFilter Off||Percentage|
AnalyticDB for MySQL RuntimeFilter provides the following benefits:
ApsaraDB - January 9, 2023
Alibaba Clouder - December 21, 2020
ApsaraDB - July 3, 2019
ApsaraDB - July 2, 2020
ApsaraDB - July 23, 2021
ApsaraDB - October 21, 2020
AnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.Learn More
ApsaraDB: Faster, Stronger, More SecureLearn More
An online MPP warehousing service based on the Greenplum Database open source programLearn More
An on-demand database hosting service for MySQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
More Posts by ApsaraDB