All Products
Search
Document Center

EXPLAIN and execution plans

Last Updated: Oct 09, 2020

Similar to most database systems, Distributed Relational Database Service (DRDS) uses an optimizer to generate an execution plan when it processes an SQL statement. This execution plan has a tree structure of relational operators, which reflects how DRDS executes the SQL statement. The difference is that DRDS does not store data but pushes computations down to each ApsaraDB RDS for MySQL database for execution while the network I/O overheads is considered in a distributed environment. In this way, the efficiency of SQL statement execution is improved. You can execute the EXPLAIN statement to view an SQL execution plan.
This topic describes the meanings of the operators used in a DRDS execution plan so that you can understand the SQL execution process by using the execution plan. This helps you optimize SQL statements. The examples in this topic are based on the following table structure:

  1. CREATE TABLE `sbtest1` (
  2. `id` INT(10) UNSIGNED NOT NULL,
  3. `k` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  4. `c` CHAR(120) NOT NULL DEFAULT '',
  5. `pad` CHAR(60) NOT NULL DEFAULT '',
  6. KEY `xid` (`id`),
  7. KEY `k_1` (`k`)
  8. ) dbpartition BY HASH (`id`) tbpartition BY HASH (`id`) tbpartitions 4


The following example helps you understand the tree structure of an execution plan in DRDS:

  1. mysql> explain select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 order by cnt limit 5, 10;
  2. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | TmpSort(sort="cnt ASC", offset=?2, fetch=?3) |
  6. | Filter(condition="cnt > ?1") |
  7. | Aggregate(group="k", cnt="COUNT()") |
  8. | BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner") |
  9. | MergeSort(sort="k ASC") |
  10. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `k`") |
  11. | UnionAll(concurrent=true) |
  12. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE ((`k` > ?) AND (`k` IN ('?')))") |
  13. | HitCache:false |
  14. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  15. 9 rows in set (0.01 sec)


As shown in the preceding example, the overall results of a DRDS EXPLAIN statement are divided into two parts: the execution plan and other information.

  • Execution plan
    The execution plan represents the parent-child relationships between operators in indent form. In this example, the Filter is a child operator of TmpSort and a parent operator of Aggregate. From the perspective of execution, each operator pulls data from its child operators, processes the pulled data, and then exports the processed data to its parent operator. To better understand the preceding execution plan, we convert the preceding execution plan into a tree structure:

execution-plan

  • Other information
    In addition to the execution plan, other information is included in the EXPLAIN results. In this example, only HitCache is included. DRDS enables the PlanCache function by default. HitCache indicates whether the current SQL statement hits PlanCache.
    After PlanCache is enabled, DRDS parameterizes the SQL statement by replacing most constants with a question mark (?), and constructing a parameter list. For example, in the execution plan, LogicalView’s SQL has a question mark (?), and certain operators may have some characters like ?2. The 2 here indicates the subscript of operators in the parameter list. This will be further elaborated with specific examples later.

EXPLAIN syntax


The EXPLAIN statement is used to view the execution plan of an SQL statement. The following sample code shows the syntax:

  1. EXPLAIN explainable_stmt
  2. explainable_stmt: {
  3. SELECT statement
  4. | DELETE statement
  5. | INSERT statement
  6. | REPLACE statement
  7. | UPDATE statement
  8. }

Introduction to operators

This section describes the meanings of operators in a DRDS execution plan in detail.

LogicalView

LogicalView pulls data from the underlying data source. From the perspective of database, naming with TableScan is more conventional. However, given that DRDS itself does not store data but instead obtains data from the underlying data source by using SQL statements, this operator is more like a ‘view’ as it records the pushed down SQL statement and data source information. The SQL statements in this ‘view’ are pushed down by an optimizer. This may include multiple operations such as projection, filtering, aggregation, sorting, joining, and subqueries.

The following example describes the output and meanings of LogicalView in the EXPLAIN statement:

  1. mysql> explain select * From sbtest1 where id > 1000;
  2. +-----------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-----------------------------------------------------------------------------------------------------------------------+
  5. | UnionAll(concurrent=true) |
  6. | LogicalView (tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
  7. | HitCache:false |
  8. +-----------------------------------------------------------------------------------------------------------------------+
  9. 3 rows in set (0.00 sec)

LogicalView consists of three parts of information:

  • _tables: __the name of the underlying data source table. The value uses a period (.) as a separator, which is preceded by the number of the database shard and followed by the name and number of a table shard. Consecutive numbers will be shortened, for example, [000-127], indicating all table shards with numbers ranging from 000 to 127.
  • _shardCount: __the total number of table shards that you want to access. In this example, 128 table shards with numbers ranging from 000 to 127 will be queried.
  • sql :the SQL template sent to the underlying data source. The value in the example is for reference only. DRDS replaces the table name with the physical table name during execution and replaces the constant 10 with a question mark (?). This is because DRDS enables PlanCache by default and parameterizes SQL statements.

UnionAll

UnionAll corresponds to UNION ALL. Generally, this operator has multiple inputs and a UNION operation is performed on the inputs. In the preceding example, UnionAll on LogicalView means that UNION is performed on the data in all table shards.

The concurrent in UnionAll indicates whether to run its child operators in parallel. Default value: true.

UnionDistinct

Similar to UnionAll, UnionDistinct corresponds to UNION DISTINCT. For example:

  1. mysql> explain select * From sbtest1 where id > 1000 union distinct select * From sbtest1 where id < 200;
  2. +-------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-------------------------------------------------------------------------------------------------------------------------+
  5. | UnionDistinct(concurrent=true) |
  6. | UnionAll(concurrent=true) |
  7. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
  8. | UnionAll(concurrent=true) |
  9. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` < ?)") |
  10. | HitCache:false |
  11. +-------------------------------------------------------------------------------------------------------------------------+
  12. 6 rows in set (0.02 sec)

MergeSort

MergeSort is the merge sort operator. Generally, this operator has multiple child operators. DRDS implements merging sorting for ordered data and memory sorting for unordered data. For example:

  1. mysql> explain select *from sbtest1 where id > 1000 order by id limit 5,10;
  2. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | MergeSort(sort="id ASC", offset=?1, fetch=?2) |
  6. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `id` LIMIT (? + ?)") |
  7. | HitCache:false |
  8. +---------------------------------------------------------------------------------------------------------------------------------------------------+
  9. 3 rows in set (0.00 sec)

The MergeSort operator consists of three parts of information:

  • sort: the sort field and sort order. Specifically, id ASC specifies that data is sorted in ascending order based on the id field, and DESC specifies that data is sorted in descending order.
  • offset :the offset to obtain the result set. Similarly, due to the parameterization of SQL statements, the offst in the example is expressed as?1, where the question mark (?) is a dynamic parameter, and the number that follows corresponds to the subscript of the parameter list. In this example, the parameter corresponding to the SQL statement is [1000, 5, 10], and therefore, the actual value of ?1 is 5.
  • fetch: the maximum number of returned data rows. Similar to offset, this parameter is also parameterized. The actual value is 10.

Aggregate

Aggregate is an aggregate operator, which consists of two parts: the Group By field and the aggregate function. For example:

  1. mysql> explain select k, count(*) from sbtest1 where id > 1000 group by k;
  2. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | Aggregate(group="k", count(*)="SUM(count(*))") |
  6. | MergeSort(sort="k ASC") |
  7. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, COUNT(*) AS `count(*)` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
  8. | HitCache:true |
  9. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  10. 4 rows in set (0.00 sec)

Aggregate consists of two parts of information:

  • group: the GROUP BY field, which is k in this example.
  • Aggregate function:: The equal sign (=) follows the output column name corresponding to the aggregate function and is followed by the corresponding calculation method. In count(*)="SUM(count(*))" of the example, the first count(*) corresponds to the output column name. The following SUM(count(*)) means that the final results of the count(*) column is obtained by performing a SUM operation on the input data of the count(*).

This indicates that DRDS divides aggregate operations into two parts. First, the aggregate operations are pushed down to the underlying data sources for local aggregation. Then, the global aggregation of the locally aggregated results is performed at the DRDS layer. The final aggregation of DRDS is based on sorting. Therefore, a child operator Sort is added in the optimizer, and the Sort operator is further converted to MergeSort by pushdown.

Another example of the AVG aggregate function is as follows:

  1. mysql> explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k;
  2. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN|
  4. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")|
  6. | Aggregate(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")|
  7. | MergeSort(sort="k ASC")|
  8. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`")|
  9. | HitCache:false|
  10. +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  11. 5 rows in set (0.01 sec)

DRDS converts the AVG aggregate function to SUM or COUNT, and then converts it to local aggregation and global aggregation respectively based on the push rules of SUM and COUNT. You can try to understand the execution plans of other aggregate functions.

Note: DRDS converts the DISTINCT operation to the GROUP operation as follows:

  1. mysql> explain select distinct k from sbtest1 where id > 1000;
  2. +-----------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-----------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | Aggregate(group="k") |
  6. | MergeSort(sort="k ASC") |
  7. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
  8. | HitCache:false |
  9. +-----------------------------------------------------------------------------------------------------------------------------------------------------+
  10. 4 rows in set (0.02 sec)

TmpSort

TmpSort sorts data in memory. The difference from MergeSort is that MergeSort can have multiple child operators, and the data returned by each child operator has been sorted. TmpSort has only one child operator.

The query plan information for TmpSort is consistent with that for MergeSort. For more information, see MergeSort.

Project

The Project operator indicates a projection operation to select some columns from the input data for output or to convert some columns (by using a function or expression computation) for output. The Project operator can also contain constants. In the preceding AVG example, the top-level is a Project, and its output is k and sum_pushed_sum/sum_pushed_count. The latter corresponds to a column named avg_id.

  1. mysql> explain select 'Hello, DRDS', 1 / 2, CURTIME();
  2. +-------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-------------------------------------------------------------------------------------+
  5. Project(Hello, DRDS="_UTF-16'Hello, DRDS'", 1 / 2="1 / 2", CURTIME()="CURTIME()")
  6. | |
  7. | HitCache:false |
  8. +-------------------------------------------------------------------------------------+
  9. 3 rows in set (0.00 sec)

The Project plan includes the name of each column and the corresponding columns, values, functions, and expressions.

Filter

The Filter operator performs a filtering operation that contains some filter conditions. This operator performs filtering on the input data. The data that meets the filter conditions is output and the remaining data is discarded. The following example includes most of the operators described previously and therefore is rather complex.

  1. mysql> explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k having avg_id > 1300;
  2. +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | Filter(condition="avg_id > ?1") |
  6. | Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count") |
  7. | Aggregate(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)") |
  8. | MergeSort(sort="k ASC") |
  9. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
  10. | HitCache:false |
  11. +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  12. 6 rows in set (0.01 sec)

Based on the SQL of the preceding AVG example, having avg_id > 1300 is added. A Filter operator is added at the top of the execution plan to filter all data that satisfies avg_id > 1300.

You may ask why the condition in WHERE has no corresponding Filter operator? At a stage of the DRDS optimizer, the Filter operator of the WHERE condition does exist, but it is finally pushed down to LogiacalView. Therefore, you can find id > 1000 in LogicalView’s SQL.

NlJoin

NlJoin is the NestLoop Join operator, which allows you to use the NestLoop method to join two tables. DRDS implements two JOIN policies: NlJoin and BKAJoin. The latter refers to Batched Key Access Join. When you query data by using key-value pairs, a batch of data is retrieved from the left table. An IN condition is concatenated into the SQL statement for accessing the right table to obtain a batch of data from the right table at a time.

  1. mysql> explain select a.* from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000;
  2. +----------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +----------------------------------------------------------------------------------------------------------------------------+
  5. | Project(id="id", k="k", c="c", pad="pad") |
  6. | NlJoin(id="id", k="k", c="c", pad="pad", k0="k0", condition="id = k", type="inner") |
  7. | UnionAll(concurrent=true) |
  8. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
  9. | UnionAll(concurrent=true) |
  10. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k` FROM `sbtest1` WHERE (`k` > ?)") |
  11. | HitCache:false |
  12. +----------------------------------------------------------------------------------------------------------------------------+
  13. 7 rows in set (0.03 sec)

The NlJOIN plan includes three parts:

  • Output column info: the output column name. In this example, the JOIN statement returns five columns. id="id", k="k", c="c", pad="pad", k0="k0".
  • _condition: __the join condition. In this example, the join condition is id = k.
  • type: the connection type. In this example, the type is INNER JOIN. Therefore, the connection type is inner.

BKAJoin

BKAJoin: JOIN is performed by using batch key-value queries. That is, a batch of data is retrieved from the left table. An IN condition is concatenated into the SQL statement for accessing the right table to obtain a batch of data from the right table at a time.

  1. mysql> explain select a.* from sbtest1 a, sbtest1 b where a.id = b.k order by a.id;
  2. +-------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-------------------------------------------------------------------------------------------------------------------------------+
  5. | Project(id="id", k="k", c="c", pad="pad") |
  6. | BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner") |
  7. | MergeSort(sort="id ASC") |
  8. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` ORDER BY `id`") |
  9. | UnionAll(concurrent=true) |
  10. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`k` IN ('?'))") |
  11. | HitCache:false |
  12. +-------------------------------------------------------------------------------------------------------------------------------+
  13. 7 rows in set (0.01 sec)

The plan content of BKAJoin is the same as that of NlJoin. The two operators have different names and are designed to inform the executor of the method used to perform the JOIN operation. In addition, 'k' IN ('?') in LogicalView on the right table in the preceding execution plan is an IN query template created by the optimizer for querying data in the right table.

LogicalModifyView

As mentioned above, the LogicalView operator obtains data from the underlying data source. Correspondingly, the LogicalModifyView operator modifies the underlying data source and also includes an SQL statement. This SQL statement may be an INSERT, UPDATE, or DELETE statement.

  1. mysql> explain update sbtest1 set c='Hello, DRDS' where id > 1000;
  2. +--------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +--------------------------------------------------------------------------------------------------------------------------------+
  5. | LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="UPDATE `sbtest1` SET `c` = ? WHERE (`id` > ?)") |
  6. | HitCache:false |
  7. +--------------------------------------------------------------------------------------------------------------------------------+
  8. 2 rows in set (0.03 sec)
  9. mysql> explain delete from sbtest1 where id > 1000;
  10. +-------------------------------------------------------------------------------------------------------------------------+
  11. | LOGICAL PLAN |
  12. +-------------------------------------------------------------------------------------------------------------------------+
  13. | LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="DELETE FROM `sbtest1` WHERE (`id` > ?)") |
  14. | HitCache:false |
  15. +-------------------------------------------------------------------------------------------------------------------------+
  16. 2 rows in set (0.03 sec)

The query plan of the LogicalModifyView operator is similar to that of the LogicalView operator, including the delivered physical table shards, the number of table shards, and an SQL template. Similarly, PlanCache is enabled, thus the SQL statement is parameterized and constants in the SQL template are replaced with question marks (?).

PhyTableOperation

PhyTableOperation: performs an operation on a physical table shard. This operator is used only in INSERT INTO… VALUES ….

  1. mysql> explain insert into sbtest1 values(1, 1, '1', '1'),(2, 2, '2', '2');
  2. +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_001]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_001`,1,1,1,1") |
  6. | PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_002]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_002`,2,2,2,2") |
  7. | |
  8. | HitCache:false |
  9. +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  10. 4 rows in set (0.00 sec)

In this example, the INSERT statement is executed to insert two rows of data, with each row of data corresponding to one PhyTableOperation operator. The PhyTableOperation operator consists of the following parts of information:

  • tables: the name of a physical table. Only one physical table name is specified.
  • sql: the SQL template. Table names and constants in the SQL template are all parameterized and replaced with question marks (?) and the corresponding parameters are listed in the params parameter.
  • __params: the actual parameters corresponding to the question marks (?) in the SQL template, including table names and constants.

Other information

HitCache

DRDS enables PlanCache by default. HitCache is used to inform you about whether the query hits PlanCache. In the following example, HitCache is set to false for the first run and true for the second run.

  1. mysql> explain select * From sbtest1 where id > 1000;
  2. +-----------------------------------------------------------------------------------------------------------------------+
  3. | LOGICAL PLAN |
  4. +-----------------------------------------------------------------------------------------------------------------------+
  5. | UnionAll(concurrent=true) |
  6. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
  7. | HitCache:false |
  8. +-----------------------------------------------------------------------------------------------------------------------+
  9. 3 rows in set (0.01 sec)
  10. mysql> explain select * From sbtest1 where id > 1000;
  11. +-----------------------------------------------------------------------------------------------------------------------+
  12. | LOGICAL PLAN |
  13. +-----------------------------------------------------------------------------------------------------------------------+
  14. | UnionAll(concurrent=true) |
  15. | LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
  16. | HitCache:true |
  17. +-----------------------------------------------------------------------------------------------------------------------+
  18. 3 rows in set (0.00 sec)