By Jiang Chen
Oracle released a paper entitled Cost-Based Query Transformation in Oracle [1] in 2006. It explained how Oracle conducted cost-based query transformation. This article tries to fill the gaps between the paper and the real implementation.
Oracle Rule Iterator
First, the diagram was summarized by Riyaj (a Senior Oracle DBA) based on years of tracing experience. However, since he is not a developer of relevant features, we should selectively learn from it. First, let's take a look at the various abbreviations in the diagram:
CSE: Common Subexpression Elimination
eg. where (c1=3) or (c1=3 and c2='A') => where (c1=3)
CNT: count(col) to count(*) transformation
OBYE: order by elimination
JE: Join Elimination
SU: subquery unnesting
FPD: filter push-down
CVM: complex view merging
JPPD: join predicate push-down
SJC: Set Join Conversion
PM: predicate move-around
GBP: groupby placement
kkoqbc: cbo optimizer or something alike
If you are unfamiliar with these transformation rules, you can refer to the rule examples at the end of this article for further understanding.
As you can see from the figure above, Oracle divides the transformation rules into two categories: heuristic-based rules and cost-based rules. The heuristic-based transformation rules can be done once. Cost-based rules will be iterated for multiple rounds, and heuristic-based rules will be applied again if it is necessary. For example, after JPPD is done, the heuristic-based rules (such as JE and CNT) are applied again.
Note that the position of the rules is important. In the diagram above, you can see the SU is followed by the CVM rule. Let's expand on this point. After SU is performed, there is a solution that will generate a derived table. Then, perform CVM. The prerequisite for applying CVM is that a derived table is generated. If you want to build a Cost-Based Query Transformation (CBQT) framework, you can refer to the rule placement order above.
The rule placement as a whole is in sequential order, but the interleaving and juxtaposing relationship between individual rules should be considered.
Sequential Order
Interleaving
cost(QB)=40;Cost(SU(QB)=50; Cost(CVM(SU(QB)))=30 => do SU
Juxtaposing
Identify Objects
Each SQL statement may have many objects. For example, for the SU rule, a subquery is an object that traverses the entire statement to obtain all objects in the entire rule.
Apply Rule
Try to apply rules to each object. Each rule has its own features. Choose pre-order or post-order.
Bottom-Up (Pre-Order): For example, in SU, subqueries are resolved from the innermost layer to the top layer. The same goes for CVM.
Top-Down (Post-Order): For example, in FPD, operators are pushed down to the innermost layer.
cbo
In addition to the common join order and calculation cost in the cbo process, after carefully observing Riyaj's slides, we can find that each query block in the Oracle CBQT process has a signature. If a query block has not changed, the previously calculated cost can be reused without recalculating the cost every time.
Search Epace Algorithm
I won't go into detail here. You can read the original Oracle paper for more details about two pass, linear, and exhaustive.
CBQT only helps you find the most accurate and optimal transformation rule. The framework itself does not reduce SQL latency, but specific transformation rules can. The value of the CBQT framework is that when you have many rules, it evaluates whether to apply the rule through cost estimation. Since our energy is limited, we can give priority to the following transformations with higher ROI.
Various Rule Benefits
Among them, the SU transformation has the largest gain, with an average runtime improvement of 3.8 times, which guides us to spend more on eliminating subqueries. MySQL SU has relatively fewer transformations. There are only common derived table transformations and transformations of spj subquery to semi join. PolarDB for MySQL will significantly enhance this part. More technical disclosures will be available in the future. Stay tuned.
In addition, some transformation rules require a deep understanding of relational algebra over time, which is a boon for senior database practitioners. We can look at the SJC transformation rule, which is quite complicated. Oracle has made many complex SQL transformations. For example, in GBVM, Oracle supports view merging with groupby. In contrast, MySQL does not support that. It only supports simple spj view merging, which is still a long way from Oracle.
The cloud-native database PolarDB has implemented CBQT, but more transformation rules are still needed. This is a long-term improvement project. We are constantly adding more transformation rules. We believe we will be able to reduce the latency exponentially for complex queries and give users a smooth experience in the near future. In addition, the product has significantly improved query performance in parallel computing ePQ (multi-core, multi-machine). Official performance data is attached. The performance of PolarDB ePQ is up to 150 times higher than the native MySQL in a single execution.
More on PolarDB Parallel Query: https://www.alibabacloud.com/blog/cloud-forward-episode-3-cloud-native-database---polardb-%7C-parallel-query_599223
SU: subquery unnesting
FPD: filter push-down
CVM: complex view merging
JPPD: join predicate push-down
SJC: Set Join Conversion
PM: Predicate move-around (predicate pull-up and push-down)
GBP: Groupby placement. GB operators are pulled up and pushed down. Please see the GBVM example in which operators are pulled up to the outer block along with the view merging for more information about pull-up.
[1] riyaj_cost_based_query_transformation.pdf:https://orainternals.files.wordpress.com/2008/10/riyaj_cost_based_query_transformation.pdf
[2] Cost-based query transformation in Oracle – VLDB Sept 06 ACM 1-59593-385-9/06/0
The Intelligent Filter Condition Pushdown Principle of AnalyticDB for MySQL
Hands-On Lab | Migrate User-Created MySQL to ApsaraDB RDS for MySQL with DTS
ApsaraDB - March 26, 2024
ApsaraDB - August 7, 2023
ApsaraDB - October 20, 2023
ApsaraDB - December 25, 2023
ApsaraDB - March 3, 2020
ApsaraDB - October 24, 2023
Alibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreAlibaba Cloud enables digital retail transformation to fuel growth and realize an omnichannel customer experience throughout the consumer journey.
Learn MoreMore Posts by ApsaraDB