Alibaba Cloud AnalyticDB for MySQL: Create Ultimate RuntimeFilter Capability

This article describes the challenges of the RuntimeFilter technology from creating RuntimeFilter delivery rules, delivering RuntimeFilter Objects, and using RuntimeFilter Objects.

By Renjie Lou (Renjie)

1. RuntimeFilter

1.1 What Is RuntimeFilter?

RuntimeFilter is a method that passes dynamically generated filter conditions through equivalence relationships to reduce the amount of data involved in Join computing. It is widely used in the industry.

Figure 1

The process for RuntimeFilter optimization to take effect is listed below:

Generate a dynamic filter condition delivery rule based on the Join condition of a query. The rule determines the producer (the subquery that generates the filter condition) and the consumer (the subquery that applies the filter condition).
Execute the producer first to dynamically build a RuntimeFilter Object based on the real-time result set
Pass this RuntimeFilter Object to the consumer
The consumer uses the RuntimeFilter Object for filtering. This reduces the amount of data involved in subsequent computing in advance. Typically, consumers can use a RuntimeFilter Object to filter out some data in advance before the data is shuffled in a distributed manner. This reduces the amount of data transmitted over the network. In extreme cases, RuntimeFilter Objects may be pushed down to the storage side to hit index cutting, directly reducing the amount of disk-scanned data in TableScan.

The filtering principle of RuntimeFilter Objects belongs to True Negative, which does not guarantee that all irrelevant data can be filtered out in advance. Common RuntimeFilter Objects in the industry include MinMax Filter, InSet Filter, and Bloom Filter. Their construction costs, memory usage, and time consumption for a single search are much smaller than those of HashTables used for HashBasedJoin computing. This is the fundamental reason why RuntimeFilter can prefilter coarse data before joining.

1.2 Difficulties with RuntimeFilter

This article describes the challenges of the entire RuntimeFilter technology from the following aspects: creating RuntimeFilter delivery rules, delivering RuntimeFilter Objects, and generating and using RuntimeFilter Objects.

1.2.1 How to Select the Producer and Consumer of RuntimeFilter Objects

RuntimeFilter is a filter condition generated by the producer and passed to the consumer for consumption during execution. In a broad definition, the computing node on each data link can be used as the production end of a RuntimeFilter and passed to the consumer using equivalent conditions. The lower left figure shows a fully connected Bushy Tree Inner Join Graph (with the same Join conditions). Each subquery produces a RuntimeFilter. In the lower right figure, the black line indicates the data flow, and the execution of Join depends on the data flow of Probe and Build. The orange line indicates the information flow. For example, 5 and 6 can mutually generate and consume RuntimeFilters.

Figure 2

In actual implementation, we often do not generate such full connectivity dependencies. On the one hand, this will generate circular dependencies and may not all take effect. On the other hand, both the production and consumption of RuntimeFilter have overhead, and we should choose the one with lower overhead and greater benefits. This includes the following three aspects:

Select valuable producers to quickly generate RuntimeFilters with good filtering effects
Select the most root consumers to maximize the filtering effect of this RuntimeFilter at the SQL execution level

The relationship between the producer and consumer cannot form a circular reference.

1.2.2 How RuntimeFilter Objects Are Passed in the Distributed Execution Framework

We introduced RuntimeFilter Object as a single logical entity before, but common distributed MPP database execution engines have single-server and multi-server parallelism. In the previous section, a logical RuntimeFilter Object in the execution plan diagram may be composed of M*N memory objects on N machine nodes. This complicates the propagation of RuntimeFilter Objects between the producer and consumer. This propagation breaks the original bottom-up data propagation model of the SQL execution engine.

RuntimeFilter Service is the most common implementation of RuntimeFilter Object delivery in the industry, including Impala and Doris. This implementation idea is to set up a dedicated service on the coordinator node (frontend node), which is responsible for receiving partial RuntimeFilter Objects generated by all producers, merging RuntimeFilter Objects, and broadcasting to consumers.

1.2.3 Efficient Generation and Use of RuntimeFilter Objects

There are many choices to implement RuntimeFilter Objects. They are used to represent the result set on the production side. The following table describes several common implementations and applicable scenarios:

	Degree	Speed	Memory Usage/Number of Rows in Producer Result Set
InSet Filter	Precise	Fast	Large
BloomFilter	Blur	Moderate	Moderate
MinMax Filter	Rough	Fast	Small

The preceding RuntimeFilter Objects are used in different scenarios. The system needs to estimate the result set of the producer to determine which type to build. However, in a real SQL operating environment, the distribution of data on the consumer side, fluctuating system loads, and machine hardware levels all have an impact on the optimal selection strategy.

For the consumer, it depends on the RuntimeFilter Objects generated by the producer to a certain extent, so consumers have the logic of waiting for the producer to complete. This is not a mandatory waiting logic. The worst situation is that consumers give up using RuntimeFilter to accelerate. This mainly depends on the waiting strategy of the system on the consumer side. The system can suspend Task delivery on the consumer data flow during the waiting period, or the system can immediately issue the Task, but the internal logic of the Task blocks itself. Finally, in terms of the final response time of the entire SQL, there are many trade-off scenarios for consumers to wait for producers. Waiting for producers is not necessarily the best choice.

2. Sideways Information Passing Framework

AnalyticDB for MySQL SIP framework is used for collecting and transmitting information during SQL execution. The information transmitted can be data characteristics, statistics, or a temporary memory table. An abstract SIP framework does not need to be aware of the specific content or category of information it collects and transmits but only needs to have the ability to collect and send corresponding runtime information, eliminate Sessions, and produce optimal procedures for information transmission.

2.1 The Concept of Runtime Information Transmitted by SIP

Runtime information is defined by type and granularity, independent of the subscriber of the information, and is a description of the information itself.

Type: The type of information (such as RowCount, NDV, Histogram, HashSet, and MagicSet)
Granularity: In a distributed computing engine, data is divided into different shards by a certain algorithm for parallel execution. During execution, each operator can obtain a part of statistical information. If the information is transmitted to operators with the same distribution attributes, it can directly use the shard statistics for some optimization. If it is transmitted to operators with different distribution attributes (or global optimizers, schedulers, and other modules), it is necessary to aggregate all the information to generate global information. The words shard and global describe the granularity of information.

The information has the following two characteristics to reduce the cost of repeated production:

Derivable: It supports complex to simple derivation based on the information of the same result set. For example, HashSet can be used to derive Histogram and NDV.
Reusable: The same information can be subscribed to by different subscribers. The data is only generated once and can be consumed multiple times.

As shown in the following figure, Agg generates a HashSet during running. Agg can be used as a SIP message to the left table for SemiJoin to reduce the work of TableScan. It can derive a Histogram and use it as the input for the Radix HashJoin algorithm to adjust the slice size of Hash Builder.

Figure 3

There are three requirements for our implementation. First, define the derivation relationship between different information types. Second, define that the relationship between the publishing subscribers is a one-to-many relationship. Third, manage the information lifecycle and clean up the data after all subscribers have consumed it.

2.2 Composition Concepts of SIP Framework

The entire framework consists of the publisher, subscriber, and channel. The information publisher is an operator or optimizer that collects information during execution and publishes it to the channel. The channel is a centralized module with a manager that manages and matches publish-subscribe relationships. After receiving information from the publisher, it determines which consumer or consumers to send to. The channel also has a service module that receives, processes, and distributes information. The subscriber can be an operator, optimizer, and scheduler, which is customized by rules based on application scenarios.

Figure 4

2.2.1 Publisher

Generating information has a certain overhead, and we hope to use some characteristics of existing operator algorithms to collect runtime information at the minimum cost without interrupting the operator channel calculation. Depending on the characteristics of the operator algorithm, different types of information are generated in different ways.

All operators can generate basic statistics. As components of the execution channel, operators can generate some basic statistics (such as RowCount) with negligible overhead when applying memory and computing based on their semantics.
Some operators can generate Derived statistics (such as Hash Agg and Hash Builder Operator). Hash Table needs to be generated during computing. We can directly use this Hash Table to construct Hash Sets or derive Histograms and NDV.
Derived statistics can be generated by inserting operators but not all operators can build Hash Table. We define an InfoCollectOperator that can be used to collect statistics when existing operators cannot be reused to collect statistics.

2.2.2 Subscriber

Depending on the application scenario, there are various forms of the subscriber. Currently, our applications include operators, optimizers, and schedulers. There may be multiple subscribers in different modules of the optimizer, each subscribing to different information from different publishers. When executing, the subscriber has a weak dependency on the publisher's information flow. The subscriber has a blocking effect on the entire channel. Under normal circumstances, the subscriber waits for receiving information and making decisions before continuing channel execution. However, the information we transmit is used for tuning. Therefore, if no information is received during timeout under abnormal circumstances, in order not to affect the overall response time of the query, a short-circuit mechanism is required to cancel the blocking and continue execution.

2.2.3 Channel

The channel is divided into two modules:

Channel Manager: It is responsible for building logical bridges and information management, deciding and publishing the correspondence of the subscriber before execution, and guiding where to send and when to clean up the received runtime information during execution.
Channel Service: It is responsible for building physical bridges for Data Transfer and transferring information between different nodes (according to the manager).

3. Implementation of AnalyticDB for MySQL RuntimeFilter

AnalyticDB for MySQL is a new-generation cloud-native data warehouse that supports high concurrency and low latency queries. It is highly compatible with the MySQL protocol and SQL:92, SQL:99, and SQL:2003 standards. It allows multi-dimensional and real-time analysis and business exploration of large amounts of data. AnalyticDB for MySQL integrates database and big data technologies. It supports high-throughput real-time data addition, deletion, and modification, low-latency real-time analysis, and complex ETL workloads. In addition, it is compatible with upstream and downstream eco-link tools and is used to build enterprise-level report systems, data warehouses, and data service engines.

3.1 Producer-Consumer Relationship Discovery Based on Global Equivalence Relation

In most industry implementations, RuntimeFilter takes effect in the sub-graph dimension of a Join rather than the entire query dimension. For example, in Figure 2, RTF5 only takes effect in the sub-graph dimension of the Join in Node 2. The consumer is Node 5. In the global dimension, if the conditions of Join-0 and Join-2 are the same, RTF5 can at least be consumed by Node1 because traditional RuntimeFilters are applied to star models or snowflake schema and often join in different dimensions. Global is different from subqueries, and there are fewer optimizable scenarios. In practice, there are more complex query scenarios, and there are still many scenarios that can be optimized through global delivery. For example, in Q24 of TPC-DS, the result set of the filtered item table can be passed to filter the two large tables - store_returns and store_sales. Based on the 99 queries of TPC-DS, we investigated the application scenario of global equivalence relation. We found that compared with supporting join subquery dimension, the number of RuntimeFilter valid is extended by 23% (620 vs. 798) if supporting the global dimension.

This idea of making RuntimeFilter more applicable through global equivalence is similar to the global idea in Data/Join-Induced Predicates (DIP). DIP uses the global equivalence relationship between data distribution features (data statistics/data layout) and Join columns to determine partition cutting during the planning phase to reduce the amount of computing data. The global equivalence relationship here is not limited to determining partition cutting by data distribution characteristics but applies to generalized RuntimeFilter.

3.2 Subscribing and Publishing of SIP-Based RuntimeFilter Objects

The RumtimeFilter Objects of AnalyticDB for MySQL are published and subscribed through the SIP framework. As shown in Figure 5, the SIP framework reduces the overhead of distributed delivery of the entire RuntimeFilter object by sharing, merging, and short-circuiting information and significantly reduces the impact of RuntimeFilter Object propagation on QPS and RT in high-concurrency online query scenarios.

Sharing: SIP allows a message to be shared by multiple subscribers. This message only needs to be generated once and transmitted to the coordinator node once. If subscribers are on the same compute node, this message only needs to be transmitted once.
Merging: If multiple messages are generated at the same time, we only send one network request when transmitting the messages among nodes through information merging, reducing network connections.
Short-Circuiting: When the information granularity is sharded, which means the global information is not required, we directly pass it to the corresponding subscriber within the node to short-circuit the network overhead between the nodes.

Figure 5

3.3 Test Results of TPC-DS

We tested the optimization effect of RuntimeFilter on the Benchmark dataset TPC-DS 1TB. The following table calculates the total execution time, the amount of scanned data, and the percentage improved when RuntimeFilter is turned on and off. Figure 6 and Figure 7 are bar charts comparing the execution time and the amount of scanned data in the query dimension.

	RuntimeFilter On	RuntimeFilter Off	Percentage
Execution Time	990.25s	430.25s	56.55%
Scanned Data	10.51TB	2.60TB	75.30%

Figure 6

Figure 7

4. Summary

AnalyticDB for MySQL RuntimeFilter provides the following benefits:

It can deduce based on global equivalence relations, which increases the number of application scenarios where RuntimeFilter helps to decrease execution time.
Based on the SIP distributed runtime information publishing and subscription framework, it maximizes the broadcasting efficiency of RuntimeFilter Objects in highly concurrent online query scenarios.
Based on the task-level DAG scheduling engine, the dependencies between the producer and consumer are integrated into the task scheduling sequence to properly control the timing of tasks on the consumer side to avoid self-blocking and idling of tasks.

Community

Alibaba Cloud AnalyticDB for MySQL: Create Ultimate RuntimeFilter Capability

1. RuntimeFilter

1.1 What Is RuntimeFilter?

1.2 Difficulties with RuntimeFilter

1.2.1 How to Select the Producer and Consumer of RuntimeFilter Objects

1.2.2 How RuntimeFilter Objects Are Passed in the Distributed Execution Framework

1.2.3 Efficient Generation and Use of RuntimeFilter Objects

2. Sideways Information Passing Framework

2.1 The Concept of Runtime Information Transmitted by SIP

2.2 Composition Concepts of SIP Framework

2.2.1 Publisher

2.2.2 Subscriber

2.2.3 Channel

3. Implementation of AnalyticDB for MySQL RuntimeFilter

3.1 Producer-Consumer Relationship Discovery Based on Global Equivalence Relation

3.2 Subscribing and Publishing of SIP-Based RuntimeFilter Objects

3.3 Test Results of TPC-DS

4. Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

AnalyticDB for MySQL

PolarDB for MySQL

Database Overview

AnalyticDB for PostgreSQL