This topic describes the operators in the execution plans of PolarDB-X 1.0, so that you can view the SQL execution process by querying the execution plan and optimize the SQL queries as needed.

Execution plans

When an SQL query is performed in PolarDB-X 1.0, the optimizer generates an execution plan that is represented as a tree structure. This is similar to most database systems. The tree structure consisting of relational operators shows how SQL statements are executed in PolarDB-X 1.0. Different from other database systems, PolarDB-X 1.0 does not store data but focuses more on the network I/O in the distributed environment. PolarDB-X V1.0 improves the execution efficiency of SQL queries by pushing computing operations down to shards of the database, such as an ApsaraDB RDS for MySQL or PolarDB for MySQL database. You can run the EXPLAIN statement to view the execution plan of an SQL query.

All the examples in this topic are based on the following schema:

CREATE TABLE `sbtest1` (
  `id`  INT(10) UNSIGNED NOT NULL,
  `k`   INT(10) UNSIGNED NOT NULL DEFAULT '0',
  `c`   CHAR(120)        NOT NULL DEFAULT '',
  `pad` CHAR(60)         NOT NULL DEFAULT '',
  KEY `xid` (`id`),
  KEY `k_1` (`k`)
) dbpartition BY HASH (`id`) tbpartition BY HASH (`id`) tbpartitions 4
            

The following example shows the tree structure of an execution plan generated in PolarDB-X 1.0.

explain select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 order by cnt limit 5, 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| TmpSort(sort="cnt ASC", offset=?2, fetch=?3)                                                                                                      |
|   Filter(condition="cnt > ?1")                                                                                                                    |
|     Aggregate(group="k", cnt="COUNT()")                                                                                                           |
|       BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner")                       |
|         Gather(sort="k ASC")                                                                                                                   |
|           LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `k`")        |
|         Gather(concurrent=true)                                                                                                                 |
|           LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE ((`k` > ?) AND (`k` IN ('?')))") |
| HitCache:false                                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.01 sec)
            

The return result of the EXPLAIN statement includes two parts: the execution plan and additional information.

  • Execution plan: Parent-child hierarchies among operators are represented by using indentation. In the example, Filter is the child operator of TmpSort and the parent operator of Aggregate. In the execution process of an SQL statement, each operator obtains data from the operation result of its child operator, processes the data, and then outputs the processed data to its parent operator. To facilitate understanding, the preceding execution plan is converted into a graphical tree structure:456789
  • Additional information: In addition to the execution plan, the return result of the EXPLAIN statement also contains additional information. Only HitCache is included in the additional information. By default, PolarDB-X 1.0 automatically enables the plan caching feature. HitCache indicates whether the SQL query hits the plan cache. After the plan caching feature is enabled, PolarDB-X 1.0 parameterizes the SQL statement by replacing the constants in the SQL statement with placeholders (?) and creates a list of parameters. In the execution plan, the SQL statement of the LogicalView operator contains placeholders (?). Some operators contain the following content: ?2, where 2 indicates the subscript used to mark the parameter in the parameter list. The following examples provide further explanations.

EXPLAIN syntax

The EXPLAIN statement is used to view the execution plan of an SQL statement. You can use the following syntax:

EXPLAIN
{LOGICALVIEW | LOGIC | SIMPLE | DETAIL | EXECUTE | PHYSICAL | OPTIMIZER | SHARDING
 | COST | ANALYZE | BASELINE | JSON_PLAN | ADVISOR} 
 {SELECT statement | DELETE statement | INSERT statement | REPLACE statement| UPDATE statement}         

Operators

LogicalView

The LogicalView operator obtains data from underlying data nodes. The LogicalView operator is similar to the TableScan operator that is commonly used in databases. However, PolarDB-X 1.0 does not store data but obtains data from the underlying data nodes by using SQL statements. In this case, the LogicalView operator works similarly to a view because it records the pushdown SQL statement and the information about the data nodes. The view contains an SQL statement that is pushed down by the optimizer and may involve multiple types of operations, such as projection, filtering, aggregation, sorting, joins, and subqueries.

The following example shows the output information and meaning of LogicalView in the EXPLAIN statement:

explain select * From sbtest1 where id > 1000;
+-----------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------+
| UnionAll(concurrent=true)                                                                                             |
|   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
| HitCache:false                                                                                                        |
+-----------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
            

The LogicalView operator consists of the following three parts:

  • tables: specifies the names of table shards that correspond to the underlying data nodes. The value is split by a period (.). Information before the period (.) specifies the IDs of the database shards, and information after the period (.) specifies the table name and the table shard IDs. Consecutive IDs are abbreviated. For example, [000-127] indicates all table shards whose IDs range from 000 to 127.
  • shardCount: the total number of table shards to be accessed. In this example, 128 table shards whose IDs range from 000 to 127 are accessed.
  • sql: the template of the SQL statement that is pushed down to the underlying data nodes. The SQL statement shown in this example is not the actual pushdown statement. PolarDB-X 1.0 replaces the table name with the name of the physical table during SQL execution. In addition, the constant 10 in the SQL statement is replaced with a question mark (?). This is because the plan caching feature is enabled in PolarDB-X 1.0 by default and the SQL statement is parameterized.
UnionAll

UnionAll corresponds to the UNION ALL operator in SQL. The UnionAll operator is usually used to combine data from multiple inputs. In the preceding example, the UnionAll operator which works as the parent operator of LogicalView indicates that data in all table shards is combined.

The concurrent parameter in the UnionAll operator specifies whether to concurrently run sub-operators. The default value is true.

UnionDistinct

Similar to UnionAll, UnionDistinct corresponds to the UNION DISTINCT operator in SQL. Example:

explain select * From sbtest1 where id > 1000 union distinct select * From sbtest1 where id < 200;
+-------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                            |
+-------------------------------------------------------------------------------------------------------------------------+
| UnionDistinct(concurrent=true)                                                                                          |
|   UnionAll(concurrent=true)                                                                                             |
|     LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
|   UnionAll(concurrent=true)                                                                                             |
|     LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` < ?)") |
| HitCache:false                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.02 sec)
            
MergeSort

MergeSort is a merge sort operator that usually works with multiple sub-operators. PolarDB-X 1.0 supports the following two types of sorting: merge sort on ordered data and in-memory sort on unordered data. Example:

explain select *from sbtest1 where id > 1000 order by id limit 5,10;
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| MergeSort(sort="id ASC", offset=?1, fetch=?2)                                                                                                     |
|   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `id` LIMIT (? + ?)") |
| HitCache:false                                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
            

The MergeSort operator consists of the following three parts:

  • sort: the sort field and the sort order. id ASC indicates that data is sorted by the id field in ascending order, and DESC indicates that data is sorted in descending order.
  • offset: the offset when the result set is obtained. The SQL statement in this example is also parameterized. offset is set to ?1, where ? indicates a dynamic parameter and 1 represents the subscript used to mark the parameter in the parameter list. In this example, the SQL statement involves the following parameter values: 1000, 5, and 10. Therefore, the actual value of ?1 is 5.
  • fetch: specifies the maximum number of data rows that can be returned. Similar to the offset parameter, the value of this parameter is also parameterized. The actual value is 10.
Aggregate

The Aggregate operator performs aggregation operations on data and usually contains the GROUP BY clause and aggregate functions. Example:

explain select k, count(*) from sbtest1 where id > 1000 group by k;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Aggregate(group="k", count(*)="SUM(count(*))")                                                                                                                              |
|   MergeSort(sort="k ASC")                                                                                                                                                   |
|     LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, COUNT(*) AS `count(*)` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
| HitCache:true                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
            

The Aggregate operator consists of the following two parts:

  • group: indicates the GROUP BY clause. In this example, the value is k.
  • Aggregate function: The output column name that corresponds to the aggregate function is specified before the equal sign (=). The aggregate function is specified after the equal sign. In count(*)="SUM(count(*))" of this example, the first count(*) corresponds to the output column name, and SUM(count(*)) returns the sum of the input data in the count(*) column to obtain the final result set displayed in the count(*) column.

PolarDB-X 1.0 divides the aggregation operation into two stages. First, the aggregation operation is pushed down to the underlying data node for local aggregation. Then, a global aggregation is performed on the result set of local aggregation in PolarDB-X 1.0. In addition, the global aggregation in PolarDB-X 1.0 is performed based on the data sorting. Therefore, a sort sub-operator is added by the optimizer and further converted to MergeSort after pushdown.

The following sample code provides an example of the AVG aggregate function:

explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")|
|   Aggregate(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")|
|     MergeSort(sort="k ASC")|
|       LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`")|
| HitCache:false|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
            

PolarDB-X 1.0 converts the AVG aggregate function to a SUM or COUNT function. The SUM or COUNT function is then used to perform a local aggregate or a global aggregate based on the corresponding pushdown rules. You can learn about the execution plans of other aggregate functions as needed.

Note PolarDB-X 1.0 converts the DISTINCT operator to the GROUP BY operator. The following sample code provides an example:
explain select distinct k from sbtest1 where id > 1000;
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                        |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Aggregate(group="k")                                                                                                                                |
|   MergeSort(sort="k ASC")                                                                                                                           |
|     LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
| HitCache:false                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.02 sec)
            
TmpSort

The TmpSort operator sorts data in memory. The MergeSort operator can have multiple sub-operators, and the data returned by each sub-operator is sorted. This is different from the TmpSort operator that has only one sub-operator.

The TmpSort operator and the MergeSort operator share the same information in the execution plan. You can refer to the description of the MergeSort operator.

Project

The Project operator is used to select columns from input data or compute data in specified columns by using specified functions or expressions and then returns the result. The Project operator can also contain constants. In the preceding example of the AVG aggregate function, the Project operator is at the top and returns k and sum_pushed_sum / sum_pushed_count. The output column name of sum_pushed_sum / sum_pushed_count is avg_id.

explain select 'Hello, DRDS', 1 / 2, CURTIME();
+-------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                        |
+-------------------------------------------------------------------------------------+
| Project(Hello, DRDS="_UTF-16'Hello, DRDS'", 1 / 2="1 / 2", CURTIME()="CURTIME()") |
|                                                                                     |
| HitCache:false                                                                      |
+-------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
            

The execution plan of the Project operator can contain the name of each specified column, the values that are specified in the SQL statement, and the function or expression that is used to compute the specified values.

Filter

The Filter operator is used to filter data. The execution plan for the Filter operator contains conditions that are used to filter data. This operator is used to filter input data. This operator returns the data that meets the specified conditions and discards the data that does not meet the specified conditions. In the following example, specific operators that are described in the preceding parts are used.

explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k having avg_id > 1300;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Filter(condition="avg_id > ?1")  |
|   Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")  |
|     Aggregate(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")  |
|       MergeSort(sort="k ASC")  |
|         LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`") |
| HitCache:false  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
            

having avg_id > 1300 is added to the SQL statement in the preceding example of the AVG aggregate function. A Filter operator is added to the top layer of the execution plan to filter all data that meets the avg_id > 1300 condition.

You may wonder why the WHERE clause does not contain the Filter operator. When the optimizer is executed in PolarDB-X 1.0, the Filter operator does exist in the WHERE clause, but the Filter operator is pushed down to the LogiacalView operator. Therefore, the id > 1000 condition is included in the value of the sql parameter of LogicalView.

NlJoin

NlJoin corresponds to the Nested Loop Join operator that joins two tables by using a nested loop. PolarDB-X 1.0 provides two types of JOIN operators: NlJoin and BKAJoin (Batched Key Access Join). BKAJoin reads a batch of data from the left table and uses the data in an IN condition. The IN condition is included in the SQL statement for accessing the right table. A batch of data is obtained from the right table at a time.

explain select a.* from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000;
+----------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                               |
+----------------------------------------------------------------------------------------------------------------------------+
| Project(id="id", k="k", c="c", pad="pad")                                                                                  |
|   NlJoin(id="id", k="k", c="c", pad="pad", k0="k0", condition="id = k", type="inner")                                      |
|     UnionAll(concurrent=true)                                                                                              |
|       LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")  |
|     UnionAll(concurrent=true)                                                                                              |
|       LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k` FROM `sbtest1` WHERE (`k` > ?)") |
| HitCache:false                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.03 sec)
            

The execution plan of the NlJOIN operator includes the following three parts:

  • Output column information: the name of the output column. In this example, the NlJOIN operator returns the following five output columns: id="id", k="k", c="c", pad="pad", k0="k0".
  • condition: the join condition. In this example, id = k is used as the join condition.
  • type: the join type. In this example, INNER JOIN is used. Therefore, the parameter value is inner.
BKAJoin

The BKAJoin operator joins tables by accessing the table to be joined with a batch of keys. To be specific, BKAJoin reads a batch of data from the left table and uses the data in an IN condition. The IN condition is included in the SQL statement for accessing the right table. A batch of data is obtained from the right table at a time to perform a join operation.

explain select a.* from sbtest1 a, sbtest1 b where a.id = b.k order by a.id;
+-------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                  |
+-------------------------------------------------------------------------------------------------------------------------------+
| Project(id="id", k="k", c="c", pad="pad")                                                                                     |
|   BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner")       |
|     MergeSort(sort="id ASC")                                                                                                  |
|       LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` ORDER BY `id`")        |
|     UnionAll(concurrent=true)                                                                                                 |
|       LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`k` IN ('?'))") |
| HitCache:false                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.01 sec)
            

The execution plan of the BKAJoin operator is the same as that of the NlJoin operator. The difference between these two operators is that the executor uses different methods to perform the JOIN operation. In the preceding execution plan, the content 'k' IN ('?') in the LogicalView of the right table is the IN statement template that is constructed by the optimizer to query data in the right table.

LogicalModifyView

The LogicalView operator is used to obtain data from data nodes. The LogicalModifyView operator is used to modify data on data nodes. The LogicalModifyView plan contains an SQL statement. The SQL statement can be an INSERT, UPDATE, or DELETE statement.

explain update sbtest1 set c='Hello, DRDS' where id > 1000;
+--------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------+
| LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="UPDATE `sbtest1` SET `c` = ? WHERE (`id` > ?)") |
| HitCache:false                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.03 sec)

explain delete from sbtest1 where id > 1000;
+-------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                            |
+-------------------------------------------------------------------------------------------------------------------------+
| LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="DELETE FROM `sbtest1` WHERE (`id` > ?)") |
| HitCache:false                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.03 sec)
            

The content of a LogicalModifyView plan is similar to that of a LogicalView plan. The plan contains the names of the physical table shards on which the SQL statement is executed, the number of table shards, and the SQL template. If the plan caching feature is enabled, SQL statements are parameterized and constants in the SQL template are replaced with question marks (?).

PhyTableOperation

The PhyTableOperation operator is used to perform an operation on a physical table shard. This operator is available for the INSERT INTO VALUES statement.

explain insert into sbtest1 values(1, 1, '1', '1'),(2, 2, '2', '2');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_001]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_001`,1,1,1,1") |
| PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_002]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_002`,2,2,2,2") |
|                                                                                                                                                                                                              |
| HitCache:false                                                                                                                                                                                               |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
            

In this example, two rows of data are inserted. Each row of data corresponds to a PhyTableOperation operator. The PhyTableOperation operator consists of the following three parts:

  • tables: specifies the names of the physical tables. In the example, only one physical table is specified for each PhyTableOperation operation.
  • sql: specifies the SQL template. In an SQL template, the table name and constants are parameterized and replaced with question marks (?). The params parameter is used to specify the table name and constants.
  • params: specifies the values of parameters that are included in the SQL template, including the table name and constants.

Other Information

HitCache

By default, the plan caching feature is enabled in PolarDB-X 1.0. HitCache indicates whether the SQL query hits the plan cache. In the following example, HitCache returns false for the first execution and returns true for the second execution.

explain select * From sbtest1 where id > 1000;
+-----------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------+
| UnionAll(concurrent=true)                                                                                             |
|   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
| HitCache:false                                                                                                        |
+-----------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

explain select * From sbtest1 where id > 1000;
+-----------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------+
| UnionAll(concurrent=true)                                                                                             |
|   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)") |
| HitCache:true                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)