This topic describes how to query execution plans by running the EXPLAIN command, and introduces some basic operators, such as LogicalView, Gather, and MergeSort. Other operators, such as Join, Agg, and Sort, are separately described in follow-up topics.

Background information

Generally, the following two steps are required for SQL tuning:

  1. Analyze problems. For example, to analyze the problems, you can run the EXPLAIN command to view execution plans, and you can also use EXPLAIN ANALYZE to view the actual execution status. For more information, see Introduction to the query executor.
  2. Use Hint to control optimizer behavior and modify execution plans.

Execution plans and the EXPLAIN command

The following example describes how to obtain the execution plans of queries by running the EXPLAIN command.

The following syntax of EXPLAIN is available:

Note Hint must be placed next to EXPLAIN.
EXPLAIN <SQL Statement>
EXPLAIN <Hint> <SQL Statement>

All the examples in this topic are based on the following schema:

CREATE TABLE `sbtest1` (
  `id`  INT(10) UNSIGNED NOT NULL,
  `k`   INT(10) UNSIGNED NOT NULL DEFAULT '0',
  `c`   CHAR(120)        NOT NULL DEFAULT '',
  `pad` CHAR(60)         NOT NULL DEFAULT '',
  KEY `xid` (`id`),
  KEY `k_1` (`k`)
) dbpartition BY HASH (`id`) tbpartition BY HASH (`id`) tbpartitions 4

After you run the following EXPLAIN command,PolarDB-X 1.0 returns the corresponding execution plan.

mysql> explain select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 order by cnt limit 5, 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL PLAN                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| MemSort(sort="cnt ASC", offset=? 2, fetch=? 3)                                                                                                      |
|   Filter(condition="cnt > ? 1")                                                                                                                    |
|     HashAgg(group="k", cnt="COUNT()")                                                                                                           |
|       BKAJoin(id="id", k="k", c="c", pad="pad", id0="id0", k0="k0", c0="c0", pad0="pad0", condition="id = k", type="inner")                       |
|         MergeSort(sort="k ASC")                                                                                                                   |
|           LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `k`")        |
|         Gather(concurrent=true)                                                                                                                 |
|           LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE ((`k` > ?) AND (`k` IN ('?')))") |
| HitCache:false                                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.01 sec)

In addition to the execution plan, the result of EXPLAIN includes some additional information. In the preceding example, only HitCache that indicates whether Plan Cache is hit is included as additional information. For more information about detailed principles, see Manage execution plans.

Introduction to operators

PolarDB-X 1.0 supports the following operators.

Meaning Physical operator
Delivery of queries LogicalView, LogicalModifyView, and PhyTableOperation
Join BKAJoin, NLJoin, HashJoin, SortMergeJoin, HashSemiJoin, SortMergeSemiJoin, and MaterializedSemiJoin
Sorting MemSort and TopN
Aggregation (Group By) HashAgg and SortAgg
Data exchange Gather and MergeSort
Others Project, Filter, Limit, Union, and Window

The following sections describe the meaning and implementation of a part of operators. The rest of operators are described in follow-up topics.

LogicalView

The LogicalView operator pulls data from the storage layer MySQL that serves as a data source. LogicalView is similar to TableScan or IndexScan in other databases, but supports more pushdown operations. LogicalView contains SQL statements and data source information that are pushed down and is more like a view. The SQL statements that are pushed down may contain multiple types of operators, such as Project, Filter, aggregation, sorting, Join, and subqueries. The following example shows the output information and meaning of LogicalView in the EXPLAIN command:

> explain select * From sbtest1 where id > 1000;

Gather(concurrent=true)
   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")

The information of LogicalView consists of three parts:

  • tables: the table name that corresponds to the storage layer MySQL. The value of the tables parameter is split by a period (.). Information before the period (.) specifies the ID that corresponds to the database sharding, and information after the period (.) specifies the table name and its ID. For example, [000-127] indicates all the tables whose table name IDs range from 000 to 127.
  • shardCount: the total number of table shardings to be accessed. In this example, 128 table shardings whose IDs range from 000 to 127 are accessed.
  • sql: the SQL template that is delivered to the storage layer MySQL.PolarDB-X 1.0 During the execution, PolarDB-X 1.0 replaces the table name with the name of the physical table and replaces the parameterized constant question mark (?) with the actual parameter. For more information, see Manage execution plans.

Gather

Gather merges multiple sets of data into one set of data. In the preceding example, the Gather operator merges the data that is retrieved from all the table shardings into one set of data.

If parallel queries are disabled, the Gather operator usually appears above LogicalView. This indicates that the Gather operator collects and merges data of all the table shardings. If parallel queries are enabled, the Gather operator may be pulled up to a higher position. In this case, the Gather operator collects and merges the computing results of all Workers.

The concurrent parameter in the Gather operator specifies whether to concurrently run sub-operators. The default value is true. This indicates that data is concurrently pulled. If parallel queries are enabled, the attribute of the Gather operator that is pulled up changes and is displayed as parallel=true.

MergeSort

MergeSort is a merge sort operator that merges and sorts the ordered data streams into one ordered data stream. Example:

> explain select * from sbtest1 where id > 1000 order by id limit 5,10;

MergeSort(sort="id ASC", offset=? 1, fetch=? 2)
  LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `id` LIMIT (? + ?)")

The MergeSort operator consists of three parts:

  • sort: the sorting field and the sorting order. id ASC indicates sorting by the ID field in ascending order, and DESC indicates sorting in descending order.
  • offset: the offset when the result set is obtained. The value in this example is parameterized. The actual value is 5.
  • fetch: the maximum number of data rows to return. The value of this parameter is parameterized. This is similar to the value of the offset parameter. The actual value is 10.

Project

Project: the projection operation. That is, some columns are selected from input data for output, or some columns are converted (calculated by using functions or expressions) for output. The Project operator can also contain constants.

> explain select 'Hello, DRDS', 1 / 2, CURTIME();

Project(Hello, DRDS="_UTF-16'Hello, DRDS'", 1 / 2="1 / 2", CURTIME()="CURTIME()")

The plan of the Project operator contains the name of each column and the corresponding columns, values, functions, or expressions.

Filter

The Filter operator indicates a filtering operation and contains some filter conditions. This operator filters input data. If the data meets the conditions, the data is returned as the output. Otherwise, the data is discarded. The following complex example contains most of the preceding operators.

> explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k having avg_id > 1300;

Filter(condition="avg_id > ? 1")
  Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")
    SortAgg(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")
      MergeSort(sort="k ASC")
        LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`")

The reason that no Filter operation corresponds to the WHERE id > 1000 condition is because the Filter operator is pushed down to LogicalView. You can view WHERE (id > ?) in the SQL statement of LogicalView.

Union All and Union Distinct

As the name implies, Union All corresponds to UNIONALL and Union Distinct corresponds to UNIONDISTINCT. These operators typically have two or more inputs. This indicates that the operators merge data of multiple inputs. Example:

> explain select * From sbtest1 where id > 1000 union distinct select * From sbtest1 where id < 200;

UnionDistinct(concurrent=true)
  Gather(concurrent=true)
    LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")
  Gather(concurrent=true)
    LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` < ?)")

LogicalModifyView

The preceding section describes that the LogicalView operator obtains data from underlying data sources. Whereas, the LogicalModifyView operator modifies underlying data sources. LogicalModifyView also records an SQL statement. The SQL statement may be INSERT, UPDATE, or DELETE.

> explain update sbtest1 set c='Hello, DRDS' where id > 1000;

LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="UPDATE `sbtest1` SET `c` = ? WHERE (`id` > ?)"
> explain delete from sbtest1 where id > 1000;

LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="DELETE FROM `sbtest1` WHERE (`id` > ?)")

The content of the query plan of LogicalModifyView is similar to that of LogicalView, including the delivered physical table shardings, the number of table shardings, and the SQL template. Similarly, constants in the SQL template are replaced with question marks (?) because the execution plan cache is enabled and SQL statements are parameterized.

PhyTableOperation

The PhyTableOperation operator directly performs an operation on a physical table sharding.

Note Typically, this operator is used only in INSERT statements. However, when SQL statements are routed to a shard, this operator also appears in SELECT statements.
> explain insert into sbtest1 values(1, 1, '1', '1'),(2, 2, '2', '2');

PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_001]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_001`,1,1,1,1")
PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_002]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_002`,2,2,2,2")

In this example, the INSERT statement inserts two rows of data. Each row of data corresponds to a PhyTableOperation operator. The content of the PhyTableOperation operator consists of three parts:

  • tables: the name of a physical table. Only one name of the physical table is specified.
  • sql: the SQL template. All the table names and constants in the SQL template are parameterized and replaced with question marks ?. The corresponding parameters are provided in the subsequent params parameter.
  • params: the parameters that correspond to the SQL template, including table names and constants.