The query optimizer converts a logical plan into a physical plan in two main phases: SQL rewrite and plan enumeration.

When PolarDB-X receives an SQL statement:
The syntax parser parses the SQL text into an abstract syntax tree (AST).
The AST is converted into a relational algebra-based logical plan.
The optimizer rewrites and optimizes the logical plan to produce a physical plan.
The executor runs the physical plan and returns the query result to the client.
This topic covers the three building blocks of the optimizer:
Relational algebra operators
SQL rewrite (the Rule-Based Optimizer (RBO) phase)
Query plan enumeration (the Cost-Based Optimizer (CBO) phase)
Relational algebra operators
PolarDB-X represents SQL queries as trees of relational algebra operators. Each operator maps to one or more physical implementations.
| Logical operator | What it represents | Physical operators |
|---|---|---|
| Project | SELECT column list, including function computations | — |
| Filter | WHERE conditions | — |
| JOIN | JOIN clauses | HashJoin, BKAJoin, Nested-Loop Join, SortMergeJoin |
| Agg | GROUP BY and aggregate functions | HashAgg, SortAgg |
| Sort | ORDER BY and LIMIT | TopN, MemSort |
| LogicalView | SQL sent to the MySQL storage layer; may wrap one or more logical operators | — |
| Gather | Collects results from multiple data streams; typically appears above a LogicalView node. When parallel execution is enabled, the optimizer pulls Gather up during the parallel optimization step. | — |
Example: reading an EXPLAIN output
The following query is adapted from TPC-H Query 3:
SELECT l_orderkey, sum(l_extendedprice *(1 - l_discount)) AS revenue
FROM CUSTOMER, ORDERS, LINEITEM
WHERE c_mktsegment = 'AUTOMOBILE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < '1995-03-13'
and l_shipdate > '1995-03-13'
GROUP BY l_orderkey;Run EXPLAIN to view the execution plan:
HashAgg(group="l_orderkey", revenue="SUM(*)")
HashJoin(condition="o_custkey = c_custkey", type="inner")
Gather(concurrent=true)
LogicalView(tables="ORDERS_[0-7],LINEITEM_[0-7]", shardCount=8, sql="SELECT `ORDERS`.`o_custkey`, `LINEITEM`.`l_orderkey`, (`LINEITEM`.`l_extendedprice` * (? - `LINEITEM`.`l_discount`)) AS `x` FROM `ORDERS` AS `ORDERS` INNER JOIN `LINEITEM` AS `LINEITEM` ON (((`ORDERS`.`o_orderkey` = `LINEITEM`.`l_orderkey`) AND (`ORDERS`.`o_orderdate` < ?)) AND (`LINEITEM`.`l_shipdate` > ?))")
Gather(concurrent=true)
LogicalView(tables="CUSTOMER_[0-7]", shardCount=8, sql="SELECT `c_custkey` FROM `CUSTOMER` AS `CUSTOMER` WHERE (`c_mktsegment` = ?)")The plan tree looks like this:

The left LogicalView node wraps the JOIN of ORDERS and LINEITEM. The sql attribute in the EXPLAIN output shows the full SQL that PolarDB-X pushes down to the storage layer, including that JOIN.EXPLAIN output fields
| Field | What it means | Example |
|---|---|---|
group= | Grouping columns for an Agg operator | group="l_orderkey" |
condition= | Join or filter condition | condition="o_custkey = c_custkey" |
tables= | Shard table names accessed by a LogicalView | tables="ORDERS_[0-7]" |
shardCount= | Number of shards accessed | shardCount=8 |
sql= | SQL sent to the MySQL storage layer | Full SQL string |
concurrent= | Whether Gather collects streams in parallel | concurrent=true |
SQL rewrite (RBO)
The Rule-Based Optimizer (RBO) takes a logical plan as input and produces a rewritten logical plan. It applies a fixed set of heuristic rules. Because the rules are deterministic, this phase is also called the RBO phase.

Subquery unnesting
Subquery unnesting converts correlated subqueries into SemiJoin or equivalent operators. This makes it possible to push the subquery to the MySQL storage layer or to choose an efficient execution algorithm at the PolarDB-X layer.
In the following example, an IN subquery is rewritten as a SemiHashJoin:
> explain select id from t1 where id in (select id from t2 where t2.name = 'hello');
SemiHashJoin(condition="id = id", type="semi")
Gather(concurrent=true)
LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")
Gather(concurrent=true)
LogicalView(tables="t2_[0-3]", shardCount=4, sql="SELECT `id` FROM `t2` AS `t2` WHERE (`name` = ?)")Operator pushdown
Operator pushdown moves work as close to the data as possible. Pushing operators to the MySQL storage layer filters and aggregates data early, reducing the volume of data transferred over the network.
PolarDB-X applies the following pushdown rules:
| Rule | What it does |
|---|---|
| Predicate pushdown and column pruning | Pushes Filter and Project operators to the MySQL storage layer to filter out unneeded rows and columns |
| JOIN clustering | Re-sorts and groups JOIN operations by shard key equality conditions, enabling subsequent JOIN pushdown |
| JOIN pushdown | Pushes JOIN operations that meet shard key conditions to the MySQL storage layer |
| Agg pushdown | Splits Agg into LocalAgg (runs on each shard) and FinalAgg (runs at the PolarDB-X layer); pushes LocalAgg to MySQL |
| Sort pushdown | Splits Sort into LocalSort (runs on each shard) and MergeSort (runs at the PolarDB-X layer); pushes LocalSort to MySQL |
For a detailed walkthrough of pushdown behavior, see SQL rewrite and pushdown.
Query plan enumeration (CBO)
The CBO takes the rewritten logical plan from the RBO phase and selects the lowest-cost physical execution plan from many alternatives. It selects the query plan with the minimum cost from multiple feasible query plans using the predefined cost model. Unlike the RBO, the CBO does not apply rules unconditionally — different rule orderings can produce better or worse plans, so the CBO compares costs and picks the best option. Therefore, Plan Enumerator is also called CBO.
How the CBO works
The CBO — also called the Plan Enumerator — runs in three steps:
Build the search space. The search engine applies transform rules to the logical plan to generate a set of candidate physical plans.
Estimate costs. The cost model evaluates each candidate plan in the search space and selects the one with the lowest estimated cost.
Estimate cardinality. Cost estimation depends on cardinality estimation: for each operator, the optimizer estimates the number of input rows and the selectivity based on table and column statistics. These estimates feed into the cost model.
CBO components
| Component | Role |
|---|---|
| Statistics | Per-table and per-column data distributions collected by PolarDB-X |
| Cardinality estimation | Estimates the number of rows and selectivity for each operator, based on statistics |
| Transform rules | Rules that rewrite the logical plan into alternative physical plan shapes (for example, reordering joins) |
| Cost model | Assigns a cost to each candidate plan based on estimated row counts and operator characteristics |
| Plan space search engine | Enumerates candidate plans using transform rules and selects the minimum-cost plan |
What's next
SQL rewrite and pushdown: A deeper look at how PolarDB-X uses sharding information to push operators to MySQL and optimize data flow.