This topic describes how to query execution plans by running the EXPLAIN
command, and introduces some basic operators, such as LogicalView
, Gather
, and MergeSort
. Other operators, such as Join, Agg, and Sort, are separately described in follow-up
topics.
Background information
Generally, the following two steps are required for SQL tuning:
- Analyze problems. For example, to analyze the problems, you can run the
EXPLAIN
command to view execution plans, and you can also useEXPLAIN ANALYZE
to view the actual execution status. For more information, see Introduction to the query executor. - Use
Hint
to control optimizer behavior and modify execution plans.
Execution plans and the EXPLAIN
command
The following example describes how to obtain the execution plans of queries by running
the EXPLAIN
command.
The following syntax of EXPLAIN
is available:
Hint
must be placed next to EXPLAIN
.
EXPLAIN <SQL Statement>
EXPLAIN <Hint> <SQL Statement>
All the examples in this topic are based on the following schema:
CREATE TABLE `sbtest1` (
`id` INT(10) UNSIGNED NOT NULL,
`k` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`c` CHAR(120) NOT NULL DEFAULT '',
`pad` CHAR(60) NOT NULL DEFAULT '',
KEY `xid` (`id`),
KEY `k_1` (`k`)
) dbpartition BY HASH (`id`) tbpartition BY HASH (`id`) tbpartitions 4
After you run the following EXPLAIN
command,PolarDB-X 1.0 returns the corresponding execution plan.
mysql> explain select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 order by cnt limit 5, 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| MemSort(sort="cnt ASC", offset=? 2, fetch=? 3) |
| Filter(condition="cnt > ? 1") |
| HashAgg(group="k", cnt="COUNT()") |
| BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner") |
| MergeSort(sort="k ASC") |
| LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `k`") |
| Gather(concurrent=true) |
| LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE ((`k` > ?) AND (`k` IN ('?')))") |
| HitCache:false |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.01 sec)
In addition to the execution plan, the result of EXPLAIN
includes some additional information. In the preceding example, only HitCache
that indicates whether Plan Cache is hit is included as additional information. For
more information about detailed principles, see Manage execution plans.
Introduction to operators
PolarDB-X 1.0 supports the following operators.
Meaning | Physical operator |
---|---|
Delivery of queries | LogicalView , LogicalModifyView , and PhyTableOperation |
Join | BKAJoin , NLJoin , HashJoin , SortMergeJoin , HashSemiJoin , SortMergeSemiJoin , and MaterializedSemiJoin |
Sorting | MemSort and TopN |
Aggregation (Group By) | HashAgg and SortAgg |
Data exchange | Gather and MergeSort |
Others | Project , Filter , Limit , Union , and Window |
The following sections describe the meaning and implementation of a part of operators. The rest of operators are described in follow-up topics.
LogicalView
The LogicalView
operator pulls data from the storage layer MySQL that serves as a data source. LogicalView
is similar to TableScan or IndexScan in other databases, but supports more pushdown
operations. LogicalView
contains SQL statements and data source information that are pushed down and is more
like a view. The SQL statements that are pushed down may contain multiple types of
operators, such as Project, Filter, aggregation, sorting, Join, and subqueries. The
following example shows the output information and meaning of LogicalView
in the EXPLAIN
command:
> explain select * From sbtest1 where id > 1000;
Gather(concurrent=true)
LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")
The information of LogicalView
consists of three parts:
tables
: the table name that corresponds to the storage layer MySQL. The value of the tables parameter is split by a period (.). Information before the period (.) specifies the ID that corresponds to the database sharding, and information after the period (.) specifies the table name and its ID. For example,[000-127]
indicates all the tables whose table name IDs range from 000 to 127.shardCount
: the total number of table shardings to be accessed. In this example, 128 table shardings whose IDs range from 000 to 127 are accessed.sql
: the SQL template that is delivered to the storage layer MySQL.PolarDB-X 1.0 During the execution, PolarDB-X 1.0 replaces the table name with the name of the physical table and replaces the parameterized constant question mark (?) with the actual parameter. For more information, see Manage execution plans.
Gather
Gather
merges multiple sets of data into one set of data. In the preceding example, the
Gather
operator merges the data that is retrieved from all the table shardings into one
set of data.
If parallel queries are disabled, the Gather
operator usually appears above LogicalView
. This indicates that the Gather operator collects and merges data of all the table
shardings. If parallel queries are enabled, the Gather
operator may be pulled up to a higher position. In this case, the Gather
operator collects and merges the computing results of all Workers.
The concurrent
parameter in the Gather
operator specifies whether to concurrently run sub-operators. The default value is
true
. This indicates that data is concurrently pulled. If parallel queries are enabled,
the attribute of the Gather
operator that is pulled up changes and is displayed as parallel=true
.
MergeSort
MergeSort
is a merge sort operator that merges and sorts the ordered data streams into one
ordered data stream. Example:
> explain select * from sbtest1 where id > 1000 order by id limit 5,10;
MergeSort(sort="id ASC", offset=? 1, fetch=? 2)
LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `id` LIMIT (? + ?)")
The MergeSort
operator consists of three parts:
sort
: the sorting field and the sorting order.id ASC
indicates sorting by the ID field in ascending order, andDESC
indicates sorting in descending order.offset
: the offset when the result set is obtained. The value in this example is parameterized. The actual value is 5.fetch
: the maximum number of data rows to return. The value of this parameter is parameterized. This is similar to the value of the offset parameter. The actual value is 10.
Project
Project
: the projection operation. That is, some columns are selected from input data for
output, or some columns are converted (calculated by using functions or expressions)
for output. The Project operator can also contain constants.
> explain select 'Hello, DRDS', 1 / 2, CURTIME();
Project(Hello, DRDS="_UTF-16'Hello, DRDS'", 1 / 2="1 / 2", CURTIME()="CURTIME()")
The plan of the Project
operator contains the name of each column and the corresponding columns, values,
functions, or expressions.
Filter
The Filter
operator indicates a filtering operation and contains some filter conditions. This
operator filters input data. If the data meets the conditions, the data is returned
as the output. Otherwise, the data is discarded. The following complex example contains
most of the preceding operators.
> explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k having avg_id > 1300;
Filter(condition="avg_id > ? 1")
Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")
SortAgg(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")
MergeSort(sort="k ASC")
LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`")
The reason that no Filter operation corresponds to the WHERE id > 1000
condition is because the Filter operator is pushed down to LogicalView
. You can view WHERE (id > ?)
in the SQL statement of LogicalView
.
Union All and Union Distinct
As the name implies, Union All corresponds to UNIONALL
and Union Distinct corresponds to UNIONDISTINCT
. These operators typically have two or more inputs. This indicates that the operators
merge data of multiple inputs. Example:
> explain select * From sbtest1 where id > 1000 union distinct select * From sbtest1 where id < 200;
UnionDistinct(concurrent=true)
Gather(concurrent=true)
LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")
Gather(concurrent=true)
LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` < ?)")
LogicalModifyView
The preceding section describes that the LogicalView
operator obtains data from underlying data sources. Whereas, the LogicalModifyView
operator modifies underlying data sources. LogicalModifyView also records an SQL
statement. The SQL statement may be INSERT
, UPDATE
, or DELETE
.
> explain update sbtest1 set c='Hello, DRDS' where id > 1000;
LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="UPDATE `sbtest1` SET `c` = ? WHERE (`id` > ?)"
> explain delete from sbtest1 where id > 1000;
LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="DELETE FROM `sbtest1` WHERE (`id` > ?)")
The content of the query plan of LogicalModifyView
is similar to that of LogicalView
, including the delivered physical table shardings, the number of table shardings,
and the SQL template. Similarly, constants in the SQL template are replaced with question
marks (?
) because the execution plan cache is enabled and SQL statements are parameterized.
PhyTableOperation
The PhyTableOperation
operator directly performs an operation on a physical table sharding.
> explain insert into sbtest1 values(1, 1, '1', '1'),(2, 2, '2', '2');
PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_001]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_001`,1,1,1,1")
PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_002]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_002`,2,2,2,2")
In this example, the INSERT statement inserts two rows of data. Each row of data corresponds
to a PhyTableOperation
operator. The content of the PhyTableOperation
operator consists of three parts:
tables
: the name of a physical table. Only one name of the physical table is specified.sql
: the SQL template. All the table names and constants in the SQL template are parameterized and replaced with question marks?
. The corresponding parameters are provided in the subsequentparams
parameter.params
: the parameters that correspond to the SQL template, including table names and constants.