All Products
Search
Document Center

PolarDB:Introduction to the query optimizer

Last Updated:Mar 28, 2026

The query optimizer converts a logical plan into a physical plan in two main phases: SQL rewrite and plan enumeration.

Execution process after an SQL statement is received

When PolarDB-X receives an SQL statement:

  1. The syntax parser parses the SQL text into an abstract syntax tree (AST).

  2. The AST is converted into a relational algebra-based logical plan.

  3. The optimizer rewrites and optimizes the logical plan to produce a physical plan.

  4. The executor runs the physical plan and returns the query result to the client.

This topic covers the three building blocks of the optimizer:

  • Relational algebra operators

  • SQL rewrite (the Rule-Based Optimizer (RBO) phase)

  • Query plan enumeration (the Cost-Based Optimizer (CBO) phase)

Relational algebra operators

PolarDB-X represents SQL queries as trees of relational algebra operators. Each operator maps to one or more physical implementations.

Logical operatorWhat it representsPhysical operators
ProjectSELECT column list, including function computations
FilterWHERE conditions
JOINJOIN clausesHashJoin, BKAJoin, Nested-Loop Join, SortMergeJoin
AggGROUP BY and aggregate functionsHashAgg, SortAgg
SortORDER BY and LIMITTopN, MemSort
LogicalViewSQL sent to the MySQL storage layer; may wrap one or more logical operators
GatherCollects results from multiple data streams; typically appears above a LogicalView node. When parallel execution is enabled, the optimizer pulls Gather up during the parallel optimization step.

Example: reading an EXPLAIN output

The following query is adapted from TPC-H Query 3:

SELECT l_orderkey, sum(l_extendedprice *(1 - l_discount)) AS revenue
FROM CUSTOMER, ORDERS, LINEITEM
WHERE c_mktsegment = 'AUTOMOBILE'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < '1995-03-13'
  and l_shipdate > '1995-03-13'
GROUP BY l_orderkey;

Run EXPLAIN to view the execution plan:

HashAgg(group="l_orderkey", revenue="SUM(*)")
  HashJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="ORDERS_[0-7],LINEITEM_[0-7]", shardCount=8, sql="SELECT `ORDERS`.`o_custkey`, `LINEITEM`.`l_orderkey`, (`LINEITEM`.`l_extendedprice` * (? - `LINEITEM`.`l_discount`)) AS `x` FROM `ORDERS` AS `ORDERS` INNER JOIN `LINEITEM` AS `LINEITEM` ON (((`ORDERS`.`o_orderkey` = `LINEITEM`.`l_orderkey`) AND (`ORDERS`.`o_orderdate` < ?)) AND (`LINEITEM`.`l_shipdate` > ?))")
    Gather(concurrent=true)
      LogicalView(tables="CUSTOMER_[0-7]", shardCount=8, sql="SELECT `c_custkey` FROM `CUSTOMER` AS `CUSTOMER` WHERE (`c_mktsegment` = ?)")

The plan tree looks like this:

Treemap chart that illustrates the relational algebra operators
The left LogicalView node wraps the JOIN of ORDERS and LINEITEM. The sql attribute in the EXPLAIN output shows the full SQL that PolarDB-X pushes down to the storage layer, including that JOIN.

EXPLAIN output fields

FieldWhat it meansExample
group=Grouping columns for an Agg operatorgroup="l_orderkey"
condition=Join or filter conditioncondition="o_custkey = c_custkey"
tables=Shard table names accessed by a LogicalViewtables="ORDERS_[0-7]"
shardCount=Number of shards accessedshardCount=8
sql=SQL sent to the MySQL storage layerFull SQL string
concurrent=Whether Gather collects streams in parallelconcurrent=true

SQL rewrite (RBO)

The Rule-Based Optimizer (RBO) takes a logical plan as input and produces a rewritten logical plan. It applies a fixed set of heuristic rules. Because the rules are deterministic, this phase is also called the RBO phase.

SQL rewrite (RBO)

Subquery unnesting

Subquery unnesting converts correlated subqueries into SemiJoin or equivalent operators. This makes it possible to push the subquery to the MySQL storage layer or to choose an efficient execution algorithm at the PolarDB-X layer.

In the following example, an IN subquery is rewritten as a SemiHashJoin:

> explain  select id from t1 where id in (select id from t2 where t2.name = 'hello');
SemiHashJoin(condition="id = id", type="semi")
  Gather(concurrent=true)
    LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")
  Gather(concurrent=true)
    LogicalView(tables="t2_[0-3]", shardCount=4, sql="SELECT `id` FROM `t2` AS `t2` WHERE (`name` = ?)")

Operator pushdown

Operator pushdown moves work as close to the data as possible. Pushing operators to the MySQL storage layer filters and aggregates data early, reducing the volume of data transferred over the network.

PolarDB-X applies the following pushdown rules:

RuleWhat it does
Predicate pushdown and column pruningPushes Filter and Project operators to the MySQL storage layer to filter out unneeded rows and columns
JOIN clusteringRe-sorts and groups JOIN operations by shard key equality conditions, enabling subsequent JOIN pushdown
JOIN pushdownPushes JOIN operations that meet shard key conditions to the MySQL storage layer
Agg pushdownSplits Agg into LocalAgg (runs on each shard) and FinalAgg (runs at the PolarDB-X layer); pushes LocalAgg to MySQL
Sort pushdownSplits Sort into LocalSort (runs on each shard) and MergeSort (runs at the PolarDB-X layer); pushes LocalSort to MySQL

For a detailed walkthrough of pushdown behavior, see SQL rewrite and pushdown.

Query plan enumeration (CBO)

The CBO takes the rewritten logical plan from the RBO phase and selects the lowest-cost physical execution plan from many alternatives. It selects the query plan with the minimum cost from multiple feasible query plans using the predefined cost model. Unlike the RBO, the CBO does not apply rules unconditionally — different rule orderings can produce better or worse plans, so the CBO compares costs and picks the best option. Therefore, Plan Enumerator is also called CBO.

How the CBO works

The CBO — also called the Plan Enumerator — runs in three steps:

  1. Build the search space. The search engine applies transform rules to the logical plan to generate a set of candidate physical plans.

  2. Estimate costs. The cost model evaluates each candidate plan in the search space and selects the one with the lowest estimated cost.

  3. Estimate cardinality. Cost estimation depends on cardinality estimation: for each operator, the optimizer estimates the number of input rows and the selectivity based on table and column statistics. These estimates feed into the cost model.

CBO components

ComponentRole
StatisticsPer-table and per-column data distributions collected by PolarDB-X
Cardinality estimationEstimates the number of rows and selectivity for each operator, based on statistics
Transform rulesRules that rewrite the logical plan into alternative physical plan shapes (for example, reordering joins)
Cost modelAssigns a cost to each candidate plan based on estimated row counts and operator characteristics
Plan space search engineEnumerates candidate plans using transform rules and selects the minimum-cost plan

What's next

  • SQL rewrite and pushdown: A deeper look at how PolarDB-X uses sharding information to push operators to MySQL and optimize data flow.