Whenyou use DRDS, slow SQL queries whose performance is not as expected may occur. The process of SQL tuning is to analyze information, such as SQL execution plans and the runtime of each phase, to identify causes of slow SQL queries and solve problems.

DRDS architecture

DRDS can be regarded as middleware and a database product that supports an architecture where computing is decoupled from storage. When a query SQL statement (called a logical SQL statement) is sent to DRDS nodes, DRDS divides it into two parts. One part can be pushed down and the other part cannot be pushed down. The part that can be pushed down is also called a physical SQL statement.

In principle, DRDS follows the following rules:

  • Pushes down the SQL statement of a user to MySQL for execution as much as possible.
  • Selects the optimal method to run the operators that cannot be pushed down.

Pushdown and execution plans

The EXPLAIN command prints SQL execution plans. This command is easy to use. You need only to add EXPLAIN in front of an SQL statement. The following examples show the execution methods of DRDS.

Example 1:

> explain select c_custkey, c_name, c_address from customer where c_custkey = 42;

LogicalView(tables="customer_2", sql="SELECT `c_custkey`, `c_name`, `c_address` FROM `customer` AS `customer` WHERE (`c_custkey` = ?)")

For queries that have primary key conditions, DRDS needs only to directly deliver SQL statements to the shard that corresponds to the primary key. Therefore, the execution plan contains only one LogicalView operator. The delivered physical SQL statement is basically the same as the logical SQL statement.

The LogicalView operator represents the query that is pushed down to MySQL for execution. For more information, see Execution plans and basic operators.

Example 2:

> explain select c_nationkey, count(*) from customer group by c_nationkey;

HashAgg(group="c_nationkey", count(*)="SUM(count(*))")
  Gather(concurrent=true)
    LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_nationkey`, COUNT(*) AS `count(*)` FROM `customer` AS `customer` GROUP BY `c_nationkey`")

The preceding query collects statistics on the number of customers in each country or region. The query can run in two phases in consideration of sharding.

  1. Run COUNT(*) on each table sharding to collect statistics. This step can be pushed down to MySQL for execution.
  2. Summarize the results. Run SUM() for the results of COUNT(*) to obtain the final results. This step must be completed by the DRDS node.

You can also find out the two phases from the execution plan. In the execution plan:

  • The LogicalView operator indicates that statistics are collected by COUNT(*) after data is grouped by nation for the SQL statements that are delivered to each shard.
  • The Gather operator is used to gather the results of each shard.
  • The HashAgg operator is a method to implement aggregation. This operator runs SUM() for the results of COUNT(*) by using c_nationkey as the grouping key.

For more examples, see Execution plans and basic operators.