All Products
Search
Document Center

Query rewriting and push-down

Last Updated: Jan 08, 2021

Pushdown is an important optimization on query statement rewriting. Pushdown can optimize execution plans based on the sharding information of a Distributed Relational Database Service (DRDS) instance so that operators can be pushed down as much as possible to filter data in advance, reduce the loads of network transfer, and implement parallel computing.

Therefore, the basic principle for optimizing SQL statements in DRDS is pushing down as much computing as possible to Relational Database Service (RDS) for MySQL instances.

Computations that can be pushed down include:

  • Joins
  • Filter conditions, such as WHERE or HAVING conditions
  • Aggregate computing, such as COUNT and GROUP BY
  • Sorting, such as ORDER BY
  • Deduplication, such as DISTINCT
  • Function computing, such as the NOW() function
  • Subqueries

You can run explain optimizer + sql to view the specific process of query statement rewriting.

Push down Project and Filter

The following is an example of how to generate an execution plan for an SQL statement. Filter and Project are successively pushed down to the LogicalView operator.

Filter and Project pushdown can filter data in advance to reduce the loads of network transfer.

 
  1. > explain optimizer select c_custkey,c_name from customer where c_custkey = 1;
  • c_custkey: the shard key of customer.
  • c_name: the name of customer.

x

Push down Limit and Sort

The following is an example of how to generate an execution plan for an SQL statement. Sort and Limit are successively pushed down to the LogicalView operator.

Sort and Limit pushdown can filter data in advance, reduce the loads of network transfer, implement parallel execution, and reduce the memory usage of a DRDS instance.

 
  1. > explain optimizer select * from customer order by c_custkey limit 10


x

Push down Agg

The following is an example of how to generate an execution plan for an SQL statement. Agg is pushed down to the LogicalView operator.

Agg pushdown can filter data in advance, reduce the loads of network transfer, implement parallel execution, and reduce the memory usage of a DRDS instance.

 
  1. > explain optimizer select count(*) from customer group by c_nationkey;

If the shard key is c_nationkey, the query statement rewriting process is as follows:
x

If the shard key is not c_nationkey, the query statement rewriting process is as follows:
x

Push down Join

The DRDS instance must meet the following conditions before pushing down Join:

  • The sharding methods of tables t1 and t2 are the same, including the database shard key, table shard key, sharding function, the number of database shards, and the number of table shards.
  • The Join condition includes the equivalence relationship between shard keys of tables t1 and t2.

In addition, Join between any table and broadcast tables can always be pushed down.

 
  1. > explain optimizer select * from t1, t2 where t1.id = t2.id;

The following is an example of how to generate an execution plan for an SQL statement. Join is pushed down to the LogicalView operator.Join pushdown can achieve calculations closer to storage, so that parallel execution can be accelerated.
x

Join Clustering

When multiple tables need to be joined, DRDS instances use an optimization technology called Join Clustering to re-sort Joins, placing Joins that can be pushed down in adjacent positions so that the Joins can be pushed down. The following shows an example:

Assume that the original join order is t2, t1, and l2. After re-sorting, t2-l2 join can still be pushed down to LogicalView.

 
  1. > explain select t2.id from t2 join t1 on t2.id = t1.id join l2 on t1.id = l2.id;
  2. Project(id="id")
  3. HashJoin(condition="id = id AND id = id0", type="inner")
  4. Gather(concurrent=true)
  5. LogicalView(tables="t2_[0-3],l2_[0-3]", shardCount=4, sql="SELECT `t2`.`id`, `l2`.`id` AS `id0` FROM `t2` AS `t2` INNER JOIN `l2` AS `l2` ON (`t2`.`id` = `l2`.`id`) WHERE (`t2`.`id` = `l2`.`id`)")
  6. Gather(concurrent=true)
  7. LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")

Push down a subquery

The following is an example of how to generate an execution plan for an SQL statement. Subqueries are pushed down to the LogicalView operator.

Subquery pushdown can achieve calculations closer to storage, so that parallel execution can be accelerated.

  • A subquery is first converted to the Semi Join or Anti Join statement.
  • After that, if the preceding pushdown conditions for Join are met, the Semi Join or Anti Join statement is pushed down to LogicalView.
  • The Semi Join or Anti Join statement that is pushed down is restored to the subquery.
 
  1. explain optimizer select * from t1 where id in (select id from t2);


1