Paper Interpretation | PolarDB Cost-Based Query Transformation

By Jiang Chen

Introduction

Oracle released a paper entitled Cost-Based Query Transformation in Oracle [1] in 2006. It explained how Oracle conducted cost-based query transformation. This article tries to fill the gaps between the paper and the real implementation.

A Diagram of Oracle Transformation Rule

Oracle Rule Iterator

First, the diagram was summarized by Riyaj (a Senior Oracle DBA) based on years of tracing experience. However, since he is not a developer of relevant features, we should selectively learn from it. First, let's take a look at the various abbreviations in the diagram:

CSE: Common Subexpression Elimination

eg. where (c1=3) or (c1=3 and c2='A') => where (c1=3)

CNT: count(col) to count(*) transformation

OBYE: order by elimination

JE: Join Elimination

SU: subquery unnesting

FPD: filter push-down

CVM: complex view merging

JPPD: join predicate push-down

SJC: Set Join Conversion

PM: predicate move-around

GBP: groupby placement

kkoqbc: cbo optimizer or something alike

If you are unfamiliar with these transformation rules, you can refer to the rule examples at the end of this article for further understanding.

As you can see from the figure above, Oracle divides the transformation rules into two categories: heuristic-based rules and cost-based rules. The heuristic-based transformation rules can be done once. Cost-based rules will be iterated for multiple rounds, and heuristic-based rules will be applied again if it is necessary. For example, after JPPD is done, the heuristic-based rules (such as JE and CNT) are applied again.

Note that the position of the rules is important. In the diagram above, you can see the SU is followed by the CVM rule. Let's expand on this point. After SU is performed, there is a solution that will generate a derived table. Then, perform CVM. The prerequisite for applying CVM is that a derived table is generated. If you want to build a Cost-Based Query Transformation (CBQT) framework, you can refer to the rule placement order above.

Transformation Interaction

The rule placement as a whole is in sequential order, but the interleaving and juxtaposing relationship between individual rules should be considered.

Sequential Order

Rules are applied one by one until the exit conditions are met (such as SQL statements become immutable or timeout). However, for a single rule, each query block has to apply this rule, and the next rule cannot be applied until the rule has been applied to all SQL statements.
If the cost of applying the current rule is lower than applying the original SQL statements, apply the current rule. For example, if Cost(QB) = 60 and Cost(SU(QB)) = 30, apply the SU rule.

Interleaving

Rules are interleaved. After a query block applies a rule, it can try to apply the next rule. It does not require the SQL statement to apply the whole rule.
Let’s take the example of [CVM + SU rule combination] to decide whether to apply SU. If a query block only applies SU, the cost is higher than the original one, but the cost of applying CVM is smaller, and the effect is better, and SU can be applied in this case. If only the first rule is applied in sequential order, the SU rule will be missed.

cost(QB)=40;Cost(SU(QB)=50; Cost(CVM(SU(QB)))=30 => do SU

Juxtaposing

GBVM (a kind of CVM) and JPPD rules are mutually exclusive. GBVM will eliminate the view and add more groupby columns + join condition on the outer layer. JPPD pushes the outer view-related join condition down to the interior of the view. They are mutually exclusive, so only one should be chosen.
Cost(QB) = 60; Cost(GBVM(QB)) = 45; Cost(JPPD(QB)) = 35 implies No GBVM, although the GBVM rule is placed first.

Transformation Detail

Identify Objects

Each SQL statement may have many objects. For example, for the SU rule, a subquery is an object that traverses the entire statement to obtain all objects in the entire rule.

Apply Rule

Try to apply rules to each object. Each rule has its own features. Choose pre-order or post-order.

Bottom-Up (Pre-Order): For example, in SU, subqueries are resolved from the innermost layer to the top layer. The same goes for CVM.

Top-Down (Post-Order): For example, in FPD, operators are pushed down to the innermost layer.

cbo

In addition to the common join order and calculation cost in the cbo process, after carefully observing Riyaj's slides, we can find that each query block in the Oracle CBQT process has a signature. If a query block has not changed, the previously calculated cost can be reused without recalculating the cost every time.

Search Epace Algorithm

I won't go into detail here. You can read the original Oracle paper for more details about two pass, linear, and exhaustive.

Summary

CBQT only helps you find the most accurate and optimal transformation rule. The framework itself does not reduce SQL latency, but specific transformation rules can. The value of the CBQT framework is that when you have many rules, it evaluates whether to apply the rule through cost estimation. Since our energy is limited, we can give priority to the following transformations with higher ROI.

Various Rule Benefits

Among them, the SU transformation has the largest gain, with an average runtime improvement of 3.8 times, which guides us to spend more on eliminating subqueries. MySQL SU has relatively fewer transformations. There are only common derived table transformations and transformations of spj subquery to semi join. PolarDB for MySQL will significantly enhance this part. More technical disclosures will be available in the future. Stay tuned.

In addition, some transformation rules require a deep understanding of relational algebra over time, which is a boon for senior database practitioners. We can look at the SJC transformation rule, which is quite complicated. Oracle has made many complex SQL transformations. For example, in GBVM, Oracle supports view merging with groupby. In contrast, MySQL does not support that. It only supports simple spj view merging, which is still a long way from Oracle.

An Introduction to the Related Work of PolarDB for MySQL

The cloud-native database PolarDB has implemented CBQT, but more transformation rules are still needed. This is a long-term improvement project. We are constantly adding more transformation rules. We believe we will be able to reduce the latency exponentially for complex queries and give users a smooth experience in the near future. In addition, the product has significantly improved query performance in parallel computing ePQ (multi-core, multi-machine). Official performance data is attached. The performance of PolarDB ePQ is up to 150 times higher than the native MySQL in a single execution.

More on PolarDB Parallel Query: https://www.alibabacloud.com/blog/cloud-forward-episode-3-cloud-native-database---polardb-%7C-parallel-query_599223

Examples

SU: subquery unnesting

FPD: filter push-down

CVM: complex view merging

JPPD: join predicate push-down

SJC: Set Join Conversion

PM: Predicate move-around (predicate pull-up and push-down)

GBP: Groupby placement. GB operators are pulled up and pushed down. Please see the GBVM example in which operators are pulled up to the outer block along with the view merging for more information about pull-up.

References

[1] riyaj_cost_based_query_transformation.pdf：https://orainternals.files.wordpress.com/2008/10/riyaj_cost_based_query_transformation.pdf

[2] Cost-based query transformation in Oracle – VLDB Sept 06 ACM 1-59593-385-9/06/0

Community

Paper Interpretation | PolarDB Cost-Based Query Transformation

Introduction

A Diagram of Oracle Transformation Rule

Transformation Interaction

Transformation Detail

Summary

An Introduction to the Related Work of PolarDB for MySQL

Examples

References

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for MySQL

PolarDB for PostgreSQL

PolarDB for Xscale

Database Overview