Pushdown is an important optimization on SQL rewrite. The sharding information about is used to optimize execution plans. This way, operators can be pushed down as much as possible to achieve the purposes, such as filtering data in advance, reducing the amount of network transmission, and implementing parallel computing.
Background information
As many computations as possible PolarDB-X 1.0 can be pushed down to the storage layer MySQL for execution based on the basic principles of optimizing SQL statements of PolarDB-X 1.0.
The computations that can be pushed down include:
- JOIN
- Filter conditions, such as the conditions in
WHERE
orHAVING
- Computing, such as
COUNT
andGROUP BY
- Sorting, such as
ORDER BY
- Removing duplicates, such as
DISTINCT
- Function computations, such as the
NOW()
function - Subqueries
explain optimizer + sql
.
Project and Filter pushdown
In the following process of generating an execution plan for an SQL statement, the
Filter and Project operators are successively pushed down to the LogicalView
operator.
Filter and Project pushdown can achieve effects, such as filtering data in advance and reducing the amount of network transmission.
> explain optimizer select c_custkey,c_name from customer where c_custkey = 1;
c_custkey
: the shard key of customer
. c_name
: the name of customer
.

Limit and Sort pushdown
In the following process of generating an execution plan for an SQL statement, the
Sort and Limit operators are successively pushed down to the LogicalView
operator. Sort and Limit pushdown can achieve effects, such as filtering data in
advance, reducing the amount of network transmission, implementing parallel execution,
and reducing PolarDB-X 1.0 the memory usage of PolarDB-X 1.0.
> explain optimizer select * from customer order by c_custkey limit 10

Agg pushdown
In the following process of generating an execution plan for an SQL statement, the
Agg operator is pushed down to the LogicalView
operator.
Agg pushdown can achieve effects, such as filtering data in advance, reducing the amount of network transmission, implementing parallel executions, and reducing PolarDB-X 1.0 the memory usage of PolarDB-X 1.0.
> explain optimizer select count(*) from customer group by c_nationkey;
If the shard key is c_nationkey
:

If the shard key is not c_nationkey
:

JOIN pushdown
The following conditions must be met for JOIN pushdown:
- The sharding methods of tables t1 and t2 are consistent, including database sharding keys, table sharding keys, sharding functions, and the number of shardings.
- The JOIN condition includes the equivalence relation between the shard keys of tables t1 and t2.
In addition, JOIN between a table and a broadcast table can always be pushed down.
> explain optimizer select * from t1, t2 where t1.id = t2.id;
In the following process of generating an execution plan for an SQL statement, the
JOIN operator is pushed down to the LogicalView
operator. JOIN pushdown can bring compute closer to storage and accelerate parallel
execution.

JoinClustering
When a JOIN operation is performed on multiple tables,PolarDB-X 1.0 uses the optimization technology of join clustering to re-sort JOIN operations and place the JOIN operations that can be pushed down in adjacent positions. This way, the JOIN operations can be pushed down as expected. The following example is provided.
Assume that the original JOIN order is t2, t1, and l2. After the JOIN operations are
re-sorted, the JOIN operation on t2 and l2 can still be pushed down to LogicalView
.
> explain select t2.id from t2 join t1 on t2.id = t1.id join l2 on t1.id = l2.id;
Project(id="id")
HashJoin(condition="id = id AND id = id0", type="inner")
Gather(concurrent=true)
LogicalView(tables="t2_[0-3],l2_[0-3]", shardCount=4, sql="SELECT `t2`.`id`, `l2`.`id` AS `id0` FROM `t2` AS `t2` INNER JOIN `l2` AS `l2` ON (`t2`.`id` = `l2`.`id`) WHERE (`t2`.`id` = `l2`.`id`)")
Gather(concurrent=true)
LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")
Subquery pushdown
In the following process of generating an execution plan for an SQL statement, the
subquery is pushed down to the LogicalView
operator.
Subquery pushdown can bring compute closer to storage and accelerate parallel execution.
- A subquery is first converted to
Semi Join
orAnti Join
. - After that, if the conditions that are described in the preceding section for JOIN
pushdown are met,
Semi Join
orAnti Join
is pushed down toLogicalView
. Semi Join
orAnti Join
that is pushed down is restored to the subquery.
explain optimizer select * from t1 where id in (select id from t2);
