This topic describes how operations are performed by using the operators.

Operators

Description Operator
Operators that are used to push down operations to data nodes. LogicalView, LogicalModifyView, PhyTableOperation, and IndexScan
Operators that are used to join tables. BKAJoin, NLJoin, HashJoin, SortMergeJoin, HashSemiJoin, SortMergeSemiJoin, and MaterializedSemiJoin
Operators that are used to sort data rows. MemSort, TopN, and MergeSort
Aggregate operators that are used in GROUP BY clauses. HashAgg and SortAgg
Operators that are used to redistribute or aggregate data. Exchange and Gather
Filter Filter
The operator that is used to select columns. Project
Operators that are used to merge data sets. UnionAll and UnionDistinct
The operator that is used to specify the maximum number of output rows. You can use the limit operator or the Offset and Fetch arguments to specify the maximum number of output rows. Limit
The Window function. OverWindow

Operators that are used to push down operations to data nodes to be executed

LogicalView

The LogicalView operator obtains data from data nodes. The LogicalView operator is similar to the TableScan and IndexScan operators that are supported by other database services. The LogicalView operator can be used to push down more operations to the storage layer than the TableScan and IndexScan operators. A LogicalView plan contains SQL statements that are pushed down to the storage layer to be executed and the information about the data sources. A LogicalView plan provides a view of the operations that are pushed down to the storage layer to be performed. A statement that is pushed down to the storage layer to be executed can contain operators such as the Project operator, Filter operator, aggregate operators, sort operators, join operators, and subqueries. You can execute the following statement to query the information that is included in the LogicalView plan of an execution plan.

explain select * From sbtest1 where id > 1000;

The following information is returned:

Gather(concurrent=true)
   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")
Arguments of the LogicalView operator:
  • tables: specifies the tables on which the statement is executed. In the example, the value is in the [Serial numbers of database shards].Table name_[Serial numbers of table shards] format. [000-127] indicates table shards whose serial numbers range from 000 to 127.
  • shardCount: specifies the total number of table shards that are scanned. In the example, 128 table shards whose serial numbers range from 000 to 127 are scanned.
  • sql: specifies the SQL template that is pushed down to the storage layer. When PolarDB-X executes an SQL statement, PolarDB-X replaces the table name with the physical table name and replaces the question marks (?) with the actual values. For more information, see Manage execution plans.

LogicalModifyView

The LogicalModifyView operator is used to modify data on data nodes. The LogicalModifyView plan contains an SQL statement. The SQL statement can be an INSERT, UPDATE, or DELETE statement. You can execute the following statement to query the information that is included in the LogicalModifyView plan of an execution plan.
  • Example 1
    explain update sbtest1 set c='Hello, DRDS' where id > 1000;

    The following information is returned:

    LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="UPDATE `sbtest1` SET `c` = ? WHERE (`id` > ?)"
  • Example 2
    explain delete from sbtest1 where id > 1000;

    The following information is returned:

    LogicalModifyView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="DELETE FROM `sbtest1` WHERE (`id` > ?)")
The content of a LogicalModifyView plan is similar to that of a LogicalView plan. The plan contains the names of the physical table shards on which the SQL statement is executed, the number of table shards, and the SQL template. If the execution plan cache feature is enabled, SQL statements are parameterized and constants in the SQL template are replaced with question marks (?).

PhyTableOperation

The PhyTableOperation operator is used to perform an operation on a physical table shard.

Note In most cases, this operator is used to execute INSERT statements. If a SELECT statement is routed to a table shard to be executed, the PhyTableOperation operator is assigned to execute the SELECT statement.
explain insert into sbtest1 values(1, 1, '1', '1'),(2, 2, '2', '2');

The following information is returned:

PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_001]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_001`,1,1,1,1")
PhyTableOperation(tables="SYSBENCH_CORONADB_1526954857179TGMMSYSBENCH_CORONADB_VGOC_0000_RDS.[sbtest1_002]", sql="INSERT INTO ? (`id`, `k`, `c`, `pad`) VALUES(?, ?, ?, ?)", params="`sbtest1_002`,2,2,2,2")
In the example, the INSERT statement is executed to insert two rows of data. A PhyTableOperation operation is performed to insert each row of data. Arguments of the PhyTableOperation operator:
  • tables: specifies the names of the physical tables. In the example, only one physical table is specified for each PhyTableOperation operation.
  • sql: specifies the SQL template. In an SQL template, the table name and constants are parameterized and replaced with question marks (?). The params argument is used to specify the table name and constants.
  • params: specifies the values of arguments that are included in the SQL template, including the table name and constants.

IndexScan

IndexScan is an operator that obtains data from data nodes. The operation of the IndexScan operator is similar to that of the LogicalView operator. The difference between IndexScan and LogicalView is that IndexScan scans index tables whereas LogicalView scans base tables. You can execute the following statement to query the information that is included in the IndexScan plan of an execution plan.

explain select * from sequence_one_base where integer_test=1;

The following information is returned:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 IndexScan(tables="DRDS_POLARX1_QATEST_APP_000000_GROUP.gsi_sequence_one_index_3a0A_01", sql="SELECT `pk`, `integer_test`, `varchar_test`, `char_test`, `blob_test`, `tinyint_test`, `tinyint_1bit_test`, `smallint_test`, `mediumint_test`, `bit_test`, `bigint_test`, `float_test`, `double_test`, `decimal_test`, `date_test`, `time_test`, `datetime_test`, `timestamp_test`, `year_test`, `mediumtext_test` FROM `gsi_dml_sequence_one_index_index1` AS `gsi_dml_sequence_one_index_index1` WHERE (`integer_test` = ?)") 
If the sequence_one_base table does not contain a global secondary index and integer_test is not a partition key, all shards of the sequence_one_base table are scanned. In the example, the sequence_one_base table has a global secondary index named gsi_sequence_one_index and the index is created on the integer_test column. integer_test=1 is used as a condition to specify a shard of the gsi_sequence_one_index index table. In this case, the system scans only the specified shard.

Operators that can be executed on compute nodes or data nodes

UnionAll and UnionDistinct

The UnionAll operator corresponds to the UNIONALL operator and the UnionDistinct operator corresponds to the UNIONDISTINCT operator. A union operator is used to merge two or more sets of input data into one set of data. The following sample code provides an example on the UnionDistinct plan:

explain select * From sbtest1 where id > 1000 union distinct select * From sbtest1 where id < 200;

The following information is returned:

UnionDistinct(concurrent=true)
  Gather(concurrent=true)
    LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?)")
  Gather(concurrent=true)
    LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` < ?)")

Gather

The Gather operator is used to merge multiple sets of data into one set of data. In the preceding example that is used to describe the LogicalView plan, the Gather operator is used to merge the data that is obtained from the specified table shards into one set of data. In most cases, the Gather operator is displayed before the LogicalView operator to indicate that the data obtained from the specified table shards is merged.

Exchange

Exchange is a logical operator that cannot be used to compute data. The Exchange operator is used to redistribute input data and pushes the data to downstream operators. In most cases, the following redistribution methods are used:
  • SINGLETON: merges multiple sets of data. The operation is the same as the operation of the Gather operator.
  • HASH_DISTRIBUTED: repartitions input data based on specified columns. This method is used in execution plans that include join operators and aggregate operators.
  • BROADCAST_DISTRIBUTED: divides a set of data into multiple sets and then broadcasts each set of data to different downstream nodes. This method is used in execution plans for massively parallel processing (MPP).

MergeSort

MergeSort is a merge sort operator that is used to merge and sort multiple ordered data streams into one ordered data stream. The following sample code provides an example on the MergeSort plan.

explain select * from sbtest1 where id > 1000 order by id limit 5,10; 

The following information is returned:

MergeSort(sort="id ASC", offset=?1, fetch=?2)   
   LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT * FROM `sbtest1` WHERE (`id` > ?) ORDER BY `id` LIMIT (? + ?)")
Arguments of the MergeSort operator:
  • sort: specifies the column based on which sorting is performed and the sorting method. In the example, id specifies that sorting is performed based on the id column, and ASC specifies that the data streams are sorted in ascending order. If DESC is specified as the sorting method, the data streams are sorted in descending order.
  • offset: specifies the offset value after sorting is performed. In the example, the value is parameterized. The actual offset value is 5.
  • fetch: specifies the maximum number of data rows that can be returned. In the example, the value is also parameterized. The actual value is 10.

Project

The Project operator is used to select columns from input data or compute data that is in specified columns by using specified functions or expressions and then outputs the result. The Project operator can also contain constants.

explain select 'Hello, DRDS', 1 / 2, CURTIME(); 

The following information is returned:

Project(Hello, DRDS="_UTF-16'Hello, DRDS'", 1 / 2="1 / 2", CURTIME()="CURTIME()")
The plan of the Project operator can contain the name of each specified column, the values that are specified in the SQL statement, and the function or expression that is used to compute the specified values.

Filter

The Filter operator is used to filter data. The plan for the Filter operator contains conditions that are used to filter data. This operator is used to filter input data. This operator outputs the data that meets the specified conditions and discards the data that does not meet the specified conditions. In the following example, specific operators that are described in the preceding parts are used.

explain select k, avg(id) avg_id from sbtest1 where id > 1000 group by k having avg_id > 1300;

The following information is returned:

Filter(condition="avg_id > ?1")
  Project(k="k", avg_id="sum_pushed_sum / sum_pushed_count")
    SortAgg(group="k", sum_pushed_sum="SUM(pushed_sum)", sum_pushed_count="SUM(pushed_count)")
      MergeSort(sort="k ASC")
        LogicalView(tables="[0000-0031].sbtest1_[000-127]", shardCount=128, sql="SELECT `k`, SUM(`id`) AS `pushed_sum`, COUNT(`id`) AS `pushed_count` FROM `sbtest1` WHERE (`id` > ?) GROUP BY `k` ORDER BY `k`")
In the example, no Filter operator is assigned to filter data based on the WHERE id > 1000 condition because the WHERE id > 1000 condition is pushed down to the LogicalView operation. In the sample code, you can see the WHERE (id > ?) condition in the LogicalView plan.