All Products
Search
Document Center

PolarDB:Push down and rewrite queries

Last Updated:Dec 11, 2023

PolarDB-X can push down physical SQL statements to the storage layer to be performed. This is an optimization method of query rewrite. The automatic split feature that is provided by PolarDB-X is used to optimize execution plans to push down operations to the storage layer. If specific operations that are included in a query are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, and the operations are performed in a parallel manner.

Background

PolarDB-X pushes down specific operations to the storage layer based on the basic principles of query optimization. The following operations can be pushed down to the storage layer:

  • Join operations.

  • Filter operations that are specified by using operators such as WHERE and HAVING.

  • Calculate operations that are specified by using operators such as COUNT and GROUP BY.

  • Sort operations that are specified by using operators such as ORDER BY.

  • Deduplicate operations that are specified by using operators such as DISTINCT.

  • Operations that are performed by using functions such as NOW().

  • Subqueries.

You can use the EXPLAIN OPTIMIZER <SQL statement> to query the process of query rewrite for an SQL statement.

Push down project operations and filter operations

The following example shows the process in which an execution plan for an SQL statement is generated. The filter operation and the project operation are pushed down to the LogicalView operator. When the filter operation and the project operation are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer and the data transmission traffic is reduced.

EXPLAIN OPTIMIZER select c_custkey,c_name from customer where c_custkey = 1;

c_custkey is a partition key of the table. 456789

Push down limit operations and sort operations

The following example shows the process in which an execution plan for an SQL statement is generated. The sort operation and the limit operation are pushed down to the LogicalView operator. When the sort operation and the limit operation are pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, the operations are performed in a parallel manner, and the memory usage of the PolarDB-X database is reduced.

EXPLAIN OPTIMIZER select * from customer order by c_name limit 10

Push down limit operations and sort operations

Push down aggregate operations

The following example shows the process in which an execution plan for an SQL statement is generated. The aggregate operation is pushed down to the LogicalView operator.

When the aggregate operation is pushed down to the storage layer to be performed, the query is optimized. This way, the queried data is filtered at the storage layer, the data transmission traffic is reduced, the operations that are included in the query are performed in a parallel manner, and the memory usage of the PolarDB-X database is reduced.

EXPLAIN OPTIMIZER select count(*) from customer group by c_nationkey;

The following figure shows how an execution plan is generated for the SQL statement when c_nationkey is a partition key of the table.XXX.pngThe following figure shows how an execution plan is generated for the SQL statement when c_nationkey is not a partition key of the table.456789

JoinClustering

When an SQL statement specifies that the queried data must be obtained by joining multiple tables, PolarDB-X uses the JoinClustering operator to re-sort the join operations and places the JOIN clauses that can be pushed down to the storage layer in adjacent positions. This way, the join operations can be pushed down as expected. The following example shows how to join three tables.

In this example, the original join order is that the system joins the t2 table and the t1 table, and then joins the t1 table and the l2 table. After the JoinClustering operator re-sorts the join operations, the join operation that is performed on the t2 and l2 tables is pushed down to the LogicalView operator.

EXPLAIN select t2.id from t2 join t1 on t2.id = t1.id join l2 on t1.id = l2.id;

Project(id="id")
  HashJoin(condition="id = id AND id = id0", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="t2_[0-3],l2_[0-3]", shardCount=4, sql="SELECT `t2`.`id`, `l2`.`id` AS `id0` FROM `t2` AS `t2` INNER JOIN `l2` AS `l2` ON (`t2`.`id` = `l2`.`id`) WHERE (`t2`.`id` = `l2`.`id`)")
    Gather(concurrent=true)
      LogicalView(tables="t1", shardCount=2, sql="SELECT `id` FROM `t1` AS `t1`")

Push down join operations

If you want to push down join operations, the following conditions must be met:

  • The sharding strategies of the two tables that are joined must be consistent. A sharding strategy specifies the database sharding keys, the table sharding keys, the sharding function, and the number of shards.

  • In the join condition, the shard key values that are specified for the two tables must be consistent.

Join operations that are used to join a broadcast table and a table of another type are pushed down to the storage layer to be performed.

EXPLAIN OPTIMIZER select * from t1, t2 where t1.id = t2.id;

The following example shows the process in which an execution plan for an SQL statement is generated. The join operation is pushed down to the LogicalView operator. When the join operation is pushed down to the storage layer to be performed, the query can be performed in a parallel manner and the query efficiency is accelerated. 456789

Push down subqueries

The following example shows the process in which an execution plan for an SQL statement is generated. The subquery statement that is included in the SQL statement is pushed down to the LogicalView operator. When the subquery statement is pushed down to the storage layer to be executed, the query can be performed in a parallel manner and the query efficiency is accelerated.

  1. The system converts the subquery to a SemiJoin or AntiJoin operation.

  2. If the conditions that are described in the "Push down join operations" section are met, the SemiJoin or AntiJoin operation is pushed down to the LogicalView operator.

  3. In the storage layer, the SemiJoin or AntiJoin is converted back to a subquery.

EXPLAIN OPTIMIZER select * from t1 where id in (select id from t2);

456789