Pushdown is an important optimization on SQL rewrite.The sharding information about DRDS is used to optimize execution plans. This way, operators can be pushed down as much as possible to achieve the purposes, such as filtering data in advance, reducing the amount of network transmission, and implementing parallel computing.

Background information

As many computations as possiblecan be pushed down to the storage layer MySQL for execution based on the basic principles of optimizing SQL statements of DRDS.

The computations that can be pushed down include:

  • JOIN
  • Filter conditions, such as the conditions in WHERE or HAVING
  • Computing, such as COUNT and GROUP BY
  • Sorting, such as ORDER BY
  • Removing duplicates, such as DISTINCT
  • Function computations, such as the NOW() function
  • Subqueries
Note You can view the detailed process of SQL rewrite by using explain optimizer + sql.

Project and Filter pushdown

In the following process of generating an execution plan for an SQL statement, the Filter and Project operators are successively pushed down to the LogicalView operator.

Filter and Project pushdown can achieve effects, such as filtering data in advance and reducing the amount of network transmission.

> explain optimizer select c_custkey,c_name from customer where c_custkey = 1;

c_custkey: the shard key of customer. c_name: the name of customer.

Project and Filter pushdown

Limit and Sort pushdown

In the following process of generating an execution plan for an SQL statement, the Sort and Limit operators are successively pushed down to the LogicalView operator. Sort and Limit pushdown can achieve effects, such as filtering data in advance, reducing the amount of network transmission, implementing parallel execution, and reducingthe memory usage of DRDS.

> explain optimizer select * from customer order by c_custkey limit 10
Limit and Sort pushdown

Agg pushdown

In the following process of generating an execution plan for an SQL statement, the Agg operator is pushed down to the LogicalView operator.

Agg pushdown can achieve effects, such as filtering data in advance, reducing the amount of network transmission, implementing parallel executions, and reducingthe memory usage of DRDS.

> explain optimizer select count(*) from customer group by c_nationkey;

If the shard key is c_nationkey:

Agg pushdown for which the shard key is c_nationkey

If the shard key is not c_nationkey:

Agg pushdown for which the shard key is not c_nationkey

JOIN pushdown

The following conditions must be met for JOIN pushdown:

  • The sharding methods of tables t1 and t2 are consistent, including database sharding keys, table sharding keys, sharding functions, and the number of shardings.
  • The JOIN condition includes the equivalence relation between the shard keys of tables t1 and t2.

In addition, JOIN between a table and a broadcast table can always be pushed down.

> explain optimizer select * from t1, t2 where t1.id = t2.id;

In the following process of generating an execution plan for an SQL statement, the JOIN operator is pushed down to the LogicalView operator. JOIN pushdown can bring compute closer to storage and accelerate parallel execution.

JOIN pushdown

JoinClustering

When a JOIN operation is performed on multiple tables,DRDS uses the optimization technology of join clustering to re-sort JOIN operations and place the JOIN operations that can be pushed down in adjacent positions. This way, the JOIN operations can be pushed down as expected. The following example is provided.

Assume that the original JOIN order is t2, t1, and l2. After the JOIN operations are re-sorted, the JOIN operation on t2 and l2 can still be pushed down to LogicalView.

> explain select t2.id from t2 join t1 on t2.id = t1.id join l2 on t1.id = l2.id;

Project(id="id")
  HashJoin(condition="id = id AND id = id0", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="t2_[0-3],l2_[0-3]", shardCount=4, sql="SELECT `t2`.`id`, `l2`.`id` AS `id0` FROM `t2` AS `t2` INNER JOIN `l2` AS `l2` ON (`t2`.`id` = `l2`.`id`) WHERE (`t2`.`id` = `l2`.`id`)")
    Gather(concurrent=true)
      LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")

Subquery pushdown

In the following process of generating an execution plan for an SQL statement, the subquery is pushed down to the LogicalView operator.

Subquery pushdown can bring compute closer to storage and accelerate parallel execution.

  • A subquery is first converted to Semi Join or Anti Join.
  • After that, if the conditions that are described in the preceding section for JOIN pushdown are met, Semi Join or Anti Join is pushed down to LogicalView.
  • Semi Join or Anti Join that is pushed down is restored to the subquery.
explain optimizer select * from t1 where id in (select id from t2);
Subquery pushdown