You can analyze query profiles to help improve query performance. This topic describes how to obtain the query profiles of ApsaraDB for SelectDB.
Enable the query profile feature
Configure the enable_profile variable to enable the query profile feature. For more information, see Variable management.
```sql
SET enable_profile=true;
```

After you enable the query profile feature, SelectDB generates a profile for each query that it executes.
A query profile contains the execution details of a query on each node. This helps you analyze query performance bottlenecks.
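Profile collection adds overhead to every query, so you may want to turn the feature off after you finish your analysis. The following minimal sketch assumes that enable_profile takes effect at the session level, as described in Variable management, and uses a placeholder table name:

```sql
-- Enable profile collection for the current session.
SET enable_profile=true;

-- Run the query that you want to analyze. SelectDB generates a profile for it.
-- my_table is a placeholder table name.
SELECT count(*) FROM my_table;

-- Disable profile collection after the analysis to avoid the overhead.
SET enable_profile=false;
```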
View query profiles
Execute the following statement to view all query profiles that are stored:
The following statement is supported only for SelectDB instances whose kernel versions are earlier than SelectDB Core V4.0.0. If your instance runs SelectDB Core V4.0.0 or later, we recommend that you preview or download query profiles in the SelectDB console. You can also send HTTP requests to obtain the query profiles. For more information, see Use the query audit feature and Export query profiles.
SHOW QUERY PROFILE "/"\G;In the returned result set, each row of data corresponds to a query profile. You can view the details of a query profile based on the profile ID.
In the responses to HTTP requests for query profiles, fields such as Query ID and query_id also indicate the profile ID.
SHOW QUERY PROFILE "/"\G
*************************** 1. row ***************************
Profile ID: c257c52f93e149ee-ace8ac14e8c9fef9
Task Type: QUERY
Start Time: 2021-04-08 11:30:50
End Time: 2021-04-08 11:30:50
Total: 9ms
Task State: EOF
User: root
Default Db: default_cluster:db1
Sql Statement: select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1 order by tbl1.k1Use query profiles
Perform the following steps to troubleshoot the performance bottlenecks of an SQL query statement:
View the execution plan tree.
This step analyzes the execution plan as a whole and shows the execution time of each fragment. In the returned result, each node is labeled with the fragment to which it belongs, and the sender node of each fragment is labeled with the execution time of that fragment. This time is the longest execution time among all execution units in the fragment. The execution plan tree helps you find the most time-consuming fragment. The following sample shows an execution plan tree:
SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9"\G *************************** 1. row *************************** Fragments: ┌──────────────────────┐ │[-1: DataBufferSender]│ │Fragment: 0 │ │MaxActiveTime: 6.626 ms│ └──────────────────────┘ │ ┌──────────────────┐ │[9: EXCHANGE_NODE]│ │Fragment: 0 │ └──────────────────┘ │ ┌──────────────────────┐ │[9: DataStreamSender] │ │Fragment: 1 │ │MaxActiveTime: 5.449 ms│ └──────────────────────┘ │ ┌──────────────┐ │[4: SORT_NODE]│ │Fragment: 1 │ └──────────────┘ ┌┘ ┌─────────────────────┐ │[8: AGGREGATION_NODE]│ │Fragment: 1 │ └─────────────────────┘ └┐ ┌──────────────────┐ │[7: EXCHANGE_NODE]│ │Fragment: 1 │ └──────────────────┘ │ ┌──────────────────────┐ │[7: DataStreamSender] │ │Fragment: 2 │ │MaxActiveTime: 3.505 ms│ └──────────────────────┘ ┌┘ ┌─────────────────────┐ │[3: AGGREGATION_NODE]│ │Fragment: 2 │ └─────────────────────┘ │ ┌───────────────────┐ │[2: HASH_JOIN_NODE]│ │Fragment: 2 │ └───────────────────┘ ┌────────────┴────────────┐ ┌──────────────────┐ ┌──────────────────┐ │[5: EXCHANGE_NODE]│ │[6: EXCHANGE_NODE]│ │Fragment: 2 │ │Fragment: 2 │ └──────────────────┘ └──────────────────┘ │ │ ┌─────────────────────┐ ┌────────────────────────┐ │[5: DataStreamSender]│ │[6: DataStreamSender] │ │Fragment: 4 │ │Fragment: 3 │ │MaxActiveTime: 1.87 ms│ │MaxActiveTime: 636.767 us│ └─────────────────────┘ └────────────────────────┘ │ ┌┘ ┌───────────────────┐ ┌───────────────────┐ │[0: OLAP_SCAN_NODE]│ │[1: OLAP_SCAN_NODE]│ │Fragment: 4 │ │Fragment: 3 │ └──────────────────┘ └──────────────────┘ │ │ ┌─────────────┐ ┌─────────────┐ │[OlapScanner]│ │[OlapScanner]│ │Fragment: 4 │ │Fragment: 3 │ └─────────────┘ └─────────────┘ │ │ ┌─────────────────┐ ┌─────────────────┐ │[SegmentIterator]│ │[SegmentIterator]│ │Fragment: 4 │ │Fragment: 3 │ └─────────────────┘ └─────────────────┘ 1 row in set (0.02 sec)View the execution units in a specific fragment.
In the preceding example, Fragment 1 is the most time-consuming fragment. Therefore, you can examine Fragment 1 in more detail. The following sample code shows how to query all execution units in Fragment 1, together with the node on which each unit runs and its execution time:
SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9/1"; +-----------------------------------+-------------------+------------+ | Instances | Host | ActiveTime | +-----------------------------------+-------------------+------------+ | c257c52f93e149ee-ace8ac14e8c9ff03 | 10.200.00.01:9060 | 5.449ms | | c257c52f93e149ee-ace8ac14e8c9ff05 | 10.200.00.02:9060 | 5.367ms | | c257c52f93e149ee-ace8ac14e8c9ff04 | 10.200.00.03:9060 | 5.358ms | +-----------------------------------+-------------------+------------+View a specific execution unit.
You can view the profile of each operator on a specific execution unit. The following sample code shows how to query the profile of each operator on the execution unit c257c52f93e149ee-ace8ac14e8c9ff03 in Fragment 1:
SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9/1/c257c52f93e149ee-ace8ac14e8c9ff03"\G *************************** 1. row *************************** Instance: ┌───────────────────────────────────────┐ │[9: DataStreamSender] │ │(Active: 37.222 us, non-child: 0.40) │ │ - Counters: │ │ - BytesSent: 0.00 │ │ - IgnoreRows: 0 │ │ - OverallThroughput: 0.0 /sec │ │ - PeakMemoryUsage: 8.00 KB │ │ - SerializeBatchTime: 0ns │ │ - UncompressedRowBatchSize: 0.00 │ └───────────────────────────────────────┘ └┐ │ ┌──────────────────────────────────┐ │[4: SORT_NODE] │ │(Active: 5.421 ms, non-child: 0.71)│ │ - Counters: │ │ - PeakMemoryUsage: 12.00 KB │ │ - RowsReturned: 0 │ │ - RowsReturnedRate: 0 │ └──────────────────────────────────┘ ┌┘ │ ┌───────────────────────────────────┐ │[8: AGGREGATION_NODE] │ │(Active: 5.355 ms, non-child: 10.68)│ │ - Counters: │ │ - BuildTime: 3.701us │ │ - GetResultsTime: 0 ns │ │ - HTResize: 0 │ │ - HTResizeTime: 1.211 us │ │ - HashBuckets: 0 │ │ - HashCollisions: 0 │ │ - HashFailedProbe: 0 │ │ - HashFilledBuckets: 0 │ │ - HashProbe: 0 │ │ - HashTravelLength: 0 │ │ - LargestPartitionPercent: 0 │ │ - MaxPartitionLevel: 0 │ │ - NumRepartitions: 0 │ │ - PartitionsCreated: 16 │ │ - PeakMemoryUsage: 34.02 MB │ │ - RowsProcessed: 0 │ │ - RowsRepartitioned: 0 │ │ - RowsReturned: 0 │ │ - RowsReturnedRate: 0 │ │ - SpilledPartitions: 0 │ └───────────────────────────────────┘ └┐ │ ┌──────────────────────────────────────────┐ │[7: EXCHANGE_NODE] │ │(Active: 4.360 ms, non-child: 46.84) │ │ - Counters: │ │ - BytesReceived: 0.00 │ │ - ConvertRowBatchTime: 387 ns │ │ - DataArrivalWaitTime: 4.357 ms │ │ - DeserializeRowBatchTimer: 0 ns │ │ - FirstBatchArrivalWaitTime: 4.356 ms│ │ - PeakMemoryUsage: 0.00 │ │ - RowsReturned: 0 │ │ - RowsReturnedRate: 0 │ │ - SendersBlockedTotalTimer(*): 0 ns │ └──────────────────────────────────────────┘
Export query profiles
Perform the following steps to export query profiles for subsequent analysis:
Only SelectDB instances that run SelectDB Core V3.0.4 or later support this feature.
Enable the query profile feature.
```sql
SET enable_profile=true;
```

Execute a query statement.
The following code shows a sample query statement. Replace it with the SQL statement that you want to analyze.
```sql
-- Send an SQL request. The profile of the query statement is generated.
SELECT count(1) FROM test_table LIMIT 10;
```

Query profile IDs.
SelectDB Core V3.0.x
The following sample code shows how to query the profile IDs of a SelectDB instance that runs SelectDB Core V3.0.x:
SHOW QUERY PROFILE "/"; +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+ | Profile ID | Task Type | Start Time | End Time | Total | Task State | User | Default Db | Sql Statement | +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+ | b9c9ba063d9d4365-97878361371e757a | QUERY | 2024-02-07 17:40:04 | 2024-02-07 17:40:04 | 32ms | EOF | admin | default_cluster:information_schema | select * from user_privileges | | 8800c306137e4072-9bb1ed419f4ac9f2 | QUERY | 2024-02-07 17:40:00 | 2024-02-07 17:40:00 | 3ms | ERR | admin | default_cluster:information_schema | select * from user_priveleges | | e2efdd1a996c4de2-ab939ad49b409990 | QUERY | 2024-02-07 17:39:51 | 2024-02-07 17:39:51 | 13ms | EOF | admin | | SELECT DATABASE() | +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+SelectDB Core V4.0.x
The following sample code shows how to query the profile IDs of an ApsaraDB for SelectDB instance that runs SelectDB Core V4.0.x. In the response, the Query ID field indicates the profile ID.

```bash
curl -u'<userName>:<userPassword>' "http://<selectdbAddress>:<httpPort>/rest/v2/manager/query/query_info?is_all_node=true"

# Response
{
    "msg": "success",
    "code": 0,
    "data": {
        "column_names": [
            "Query ID",
            "FE Node",
            "Query User",
            "Execution Database",
            "Sql",
            "Query Type",
            "Start Time",
            "End Time",
            "Execution Duration",
            "Status"
        ],
        "rows": [
            [
                ...
            ]
        ]
    },
    "count": 0
}
```

Run the curl command to export the profile into a file based on the profile ID:

```bash
curl -u'<userName>:<userPassword>' "http://<selectdbAddress>:<httpPort>/api/profile?query_id=<query_id>"

# You can use a redirection symbol to export the profile into a file.
curl -u'<userName>:<userPassword>' "http://<selectdbAddress>:<httpPort>/api/profile?query_id=<query_id>" > res.txt
```

Parameters
| Parameter | Description |
| --- | --- |
| userName | The username that is used to access the SelectDB instance. |
| userPassword | The password that is used to access the SelectDB instance. |
| selectdbAddress | The endpoint of the SelectDB instance. |
| httpPort | The HTTP port number of the SelectDB instance. Default value: 8080. |
| query_id | The query ID of the profile. |
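If you execute the query and the export from the same session, you may be able to obtain the query ID directly instead of listing all profiles. The following sketch assumes that your kernel supports the Doris-compatible last_query_id() function; verify this for your kernel version:

```sql
-- Run the query whose profile you want to export.
SELECT count(1) FROM test_table LIMIT 10;

-- Return the ID of the last query that this session executed.
-- last_query_id() is an assumption; confirm that your kernel version supports it.
SELECT last_query_id();
```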
Profile parameters
The following tables describe the parameters in the collected statistics.
Fragment
| Parameter | Description |
| --- | --- |
| AverageThreadTokens | The number of threads that are used for fragment execution, excluding the usage of the thread pool. |
| Buffer Pool PeakReservation | The peak memory usage of the buffer pool. |
| MemoryLimit | The memory limit for the query. |
| PeakMemoryUsage | The peak memory usage of an execution unit during the query. |
| RowsProduced | The number of rows whose columns are processed. |
BlockMgr
| Parameter | Description |
| --- | --- |
| BlocksCreated | The number of blocks that are created by BlockManager. |
| BlocksRecycled | The number of blocks that are reused. |
| BytesWritten | The total size of data that is written to the disk. |
| MaxBlockSize | The size of a single block. |
| TotalReadBlockTime | The total time consumed to read blocks. |
DataStreamSender
| Parameter | Description |
| --- | --- |
| BytesSent | The total size of data that is sent. The data size is calculated by using the following formula: Total size of data that is sent = Number of receivers × Size of data that is sent. |
| IgnoreRows | The number of rows that are filtered. |
| LocalBytesSent | The size of data that is sent by the local node and then received by the local node during data exchange. |
| OverallThroughput | The total throughput. The total throughput is calculated by using the following formula: Total throughput = Value of the BytesSent parameter/Execution time. |
| SerializeBatchTime | The time consumed to serialize the data that is sent. |
| UncompressedRowBatchSize | The size of the sent RowBatch data before compression. |
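As a hypothetical illustration of the two formulas above: if a node sends 10 MB of data to each of 4 receivers, BytesSent is 4 × 10 MB = 40 MB. If the execution time is 2 seconds, OverallThroughput is 40 MB/2 s = 20 MB/s.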
ODBC_TABLE_SINK
| Parameter | Description |
| --- | --- |
| NumSentRows | The total number of rows that are written to an external table. |
| TupleConvertTime | The time consumed to serialize the sent data into INSERT statements. |
| ResultSendTime | The time consumed to write data by using the Open Database Connectivity (ODBC) driver. |
EXCHANGE_NODE
| Parameter | Description |
| --- | --- |
| BytesReceived | The size of data that is received over the network. |
| MergeGetNext | If sorting is performed on the child node, the exchange node performs unified merge sorting and outputs ordered results. This parameter indicates the total time consumed for merge sorting, including the time specified by the MergeGetNextBatch parameter. |
| MergeGetNextBatch | The time consumed by the merge node to fetch data. For single-level merge sorting, the data is fetched from a network queue. For multi-level merge sorting, the data is fetched from a child merger. |
| ChildMergeGetNext | If too many senders are sending data, single-thread merging becomes a performance bottleneck. In this case, ApsaraDB for SelectDB starts multiple child merger threads to perform merge sorting in parallel. This parameter records the time consumed by the child mergers for sorting, which is the sum of the time consumed by all threads. |
| ChildMergeGetNextBatch | The time consumed by the child mergers to fetch data. If this value is excessive, the child data sending node may be the bottleneck. |
| DataArrivalWaitTime | The total time spent waiting for the sender to send data. |
| FirstBatchArrivalWaitTime | The time spent waiting for the first batch of data from the sender. |
| DeserializeRowBatchTimer | The time consumed to deserialize network data. |
| SendersBlockedTotalTimer(*) | The total waiting time on the sender side when the memory of the DataStreamRecv queue is fully occupied. |
| ConvertRowBatchTime | The time consumed to convert the received data into RowBatch data. |
| RowsReturned | The number of rows that are received. |
| RowsReturnedRate | The rate at which rows are received. |
SORT_NODE
| Parameter | Description |
| --- | --- |
| InMemorySortTime | The time consumed for in-memory sorting. |
| InitialRunsCreated | The number of times that sorting is initialized. The value is 1 for in-memory sorting. |
| SortDataSize | The total size of data that is sorted. |
| MergeGetNext | The time consumed by MergeSort to obtain the next batch from multiple sorted runs. Timing is performed only when data is written to disks. |
| MergeGetNextBatch | The time consumed by MergeSort to extract the batch of the next sorted run. Timing is performed only when data is written to disks. |
| TotalMergesPerformed | The number of times that external merging is performed. |
AGGREGATION_NODE
| Parameter | Description |
| --- | --- |
| PartitionsCreated | The number of partitions into which the aggregate query is split. |
| GetResultsTime | The time consumed to obtain aggregate results from partitions. |
| HTResizeTime | The time consumed to resize the hash table. |
| HTResize | The number of times that the hash table is resized. |
| HashBuckets | The number of buckets in the hash table. |
| HashBucketsWithDuplicate | The number of buckets with duplicate nodes in the hash table. |
| HashCollisions | The number of hash collisions that occur in the hash table. |
| HashDuplicateNodes | The number of duplicate nodes that belong to the same bucket in the hash table. |
| HashFailedProbe | The number of failed probe operations that are performed on the hash table. |
| HashFilledBuckets | The number of buckets that are filled with data in the hash table. |
| HashProbe | The number of times that the hash table is queried. |
| HashTravelLength | The number of steps that are moved during hash table queries. |
HASH_JOIN_NODE
| Parameter | Description |
| --- | --- |
| ExecOption | The hash table construction method for the right child node. The methods include the synchronous and asynchronous methods. The right child node in a join may be a table or a subquery. This rule also applies to left child nodes. |
| BuildBuckets | The number of buckets in the hash table. |
| BuildRows | The number of rows in the hash table. |
| BuildTime | The time consumed to construct the hash table. |
| LoadFactor | The load factor of the hash table, which indicates the number of non-empty buckets. |
| ProbeRows | The number of rows on which the hash probe operation is performed after the left child nodes are traversed. |
| ProbeTime | The time consumed to traverse the left child nodes for hash probe operations, excluding the time consumed to call the GetNext operation for RowBatch on the left child node side. |
| PushDownComputeTime | The time consumed to compute predicate pushdown conditions. |
| PushDownTime | The total time consumed for predicate pushdown. Join queries on right child nodes that meet the conditions are converted to IN queries on left child nodes. |
CROSS_JOIN_NODE
| Parameter | Description |
| --- | --- |
| ExecOption | The RowBatchList construction method for the right child node. The methods include the synchronous and asynchronous methods. |
| BuildRows | The number of rows in the RowBatchList, which is the number of rows of the right child node. |
| BuildTime | The time consumed for RowBatchList construction. |
| LeftChildRows | The number of rows of the left child node. |
| LeftChildTime | The time consumed to traverse the left child nodes and right child nodes for the generation of the Cartesian product, excluding the time consumed to call the GetNext operation for RowBatch on the left child node side. |
UNION_NODE
| Parameter | Description |
| --- | --- |
| MaterializeExprsEvaluateTime | The time consumed to calculate the type conversion expression and materialize the result if the field types on both sides of UNION are inconsistent. |
ANALYTIC_EVAL_NODE
| Parameter | Description |
| --- | --- |
| EvaluationTime | The total time consumed for analytic (window) function computation. |
| GetNewBlockTime | The time consumed to apply for a new block during initialization. The block is used to cache the rows window or the entire partition for analytic function computation. |
| PinTime | The time consumed to apply for a new block or to read a block that was written to disk back into memory. |
| UnpinTime | The time consumed to write data from a block to disk when the block is not required or when the current operator is under high memory pressure. |
OLAP_SCAN_NODE
OLAP_SCAN_NODE is responsible for specific data scan tasks. It generates one or more OlapScanner threads, each of which scans a portion of the data.
Some or all of the predicate conditions in the query are pushed to OLAP_SCAN_NODE. Some of these predicate conditions are pushed down to the storage engine for data filtering by using the indexes of the storage engine. The other predicate conditions are retained in OLAP_SCAN_NODE and are used to filter the data returned from the storage engine.
The profile of OLAP_SCAN_NODE is generally used to analyze the efficiency of data scanning. The profile is divided into the following three layers based on the call relationships: OLAP_SCAN_NODE, OlapScanner, and SegmentIterator.
The following sample code shows a typical OLAP_SCAN_NODE node. The meanings of some metrics may vary with the storage format (V1 or V2).
```
OLAP_SCAN_NODE (id=0):(Active: 1.2ms, % non-child: 0.00%)
    - BytesRead: 265.00 B      # The size of data that is read from the data file. For example, if 10 32-bit integers are read, the data size is 10 × 4 bytes = 40 bytes. This value reflects only the fully expanded size of the data in memory and does not represent the actual I/O size.
    - NumDiskAccess: 1         # The number of disks that are involved in this ScanNode.
    - NumScanners: 20          # The number of scanners that are generated by this ScanNode.
    - PeakMemoryUsage: 0.00    # The peak memory usage during the query. A value of 0.00 indicates that no memory was used.
    - RowsRead: 7              # The number of rows that are returned from the storage engine to the scanner, excluding the rows that are filtered by the scanner.
    - RowsReturned: 7          # The number of rows that are returned from this ScanNode to the parent node.
    - RowsReturnedRate: 6.979K /sec      # RowsReturned/ActiveTime
    - TabletCount: 20          # The number of tablets that are involved in this ScanNode.
    - TotalReadThroughput: 74.70 KB/sec  # BytesRead divided by the total execution time on this node from the initiation to the completion of the query. For I/O-bound queries, this value is close to the total throughput of the disks.
    - ScannerBatchWaitTime: 426.886us    # The time that the transfer thread waits for the scanner threads to return RowBatch data.
    - ScannerWorkerWaitTime: 17.745us    # The time that the scanner threads wait for available worker threads in the thread pool.
    OlapScanner:
        - BlockConvertTime: 8.941us      # The time consumed to convert a vectorized block to a row-structured RowBlock. A vectorized block is VectorizedRowBatch in V1 and RowBlockV2 in V2.
        - BlockFetchTime: 468.974us      # The time for the rowset reader to obtain blocks.
        - ReaderInitTime: 5.475ms        # The time for the OlapScanner to initialize readers. In V1, this includes the time to create the MergeHeap. In V2, this includes the time to generate iterators at all levels and to read the first group of blocks.
        - RowsDelFiltered: 0             # The number of rows that are filtered out based on the DELETE information in the tablet, plus the rows that are marked for deletion in the Unique Key model.
        - RowsPushedCondFiltered: 0      # The number of rows that are filtered out based on predicates that are pushed down, such as conditions passed from BuildTable to ProbeTable in a join operation. This value is not accurate, because filtering is skipped when its effect is poor.
        - ScanTime: 39.24us              # The time for data to be returned from this ScanNode to the parent node.
        - ShowHintsTime_V1: 0ns          # Not applicable in V2. In V1, some data is read to split the ScanRange.
        SegmentIterator:
            - BitmapIndexFilterTimer: 779ns  # The time consumed to filter data by using bitmap indexes.
            - BlockLoadTime: 415.925us       # The time for the segment reader (V1) or segment iterator (V2) to obtain blocks.
            - BlockSeekCount: 12             # The number of block seek operations during segment reading.
            - BlockSeekTime: 222.556us       # The time consumed for block seek operations during segment reading.
            - BlocksLoad: 6                  # The number of blocks that are read.
            - CachedPagesNum: 30             # V2 only. The number of pages that hit the page cache when the page cache is enabled.
            - CompressedBytesRead: 0.00      # In V1, the size of data that is read from the file before decompression. In V2, the size before decompression of the read pages that did not hit the page cache.
            - DecompressorTimer: 0ns         # The time consumed for data decompression.
            - IOTimer: 0ns                   # The actual I/O time of reading data from the operating system.
            - IndexLoadTime_V1: 0ns          # V1 only. The time consumed to read index streams.
            - NumSegmentFiltered: 0          # The number of segments that are completely filtered out based on column statistics and query conditions.
            - NumSegmentTotal: 6             # The total number of segments that are involved in the query.
            - RawRowsRead: 7                 # The number of raw data rows that are read from the storage engine. For more information, see the following section.
            - RowsBitmapIndexFiltered: 0     # V2 only. The number of rows that are filtered out by using bitmap indexes.
            - RowsBloomFilterFiltered: 0     # V2 only. The number of rows that are filtered out by using BloomFilter indexes.
            - RowsKeyRangeFiltered: 0        # V2 only. The number of rows that are filtered out by using SortkeyIndex indexes.
            - RowsStatsFiltered: 0           # In V2, the number of rows that are filtered out by using ZoneMap indexes, including rows filtered by delete conditions. In V1, this also includes the rows that are filtered out by using BloomFilter indexes.
            - RowsConditionsFiltered: 0      # V2 only. The number of rows that are filtered out by using various column indexes.
            - RowsVectorPredFiltered: 0      # The number of rows that are filtered out by vectorized condition filtering operations.
            - TotalPagesNum: 30              # V2 only. The total number of pages that are read.
            - UncompressedBytesRead: 0.00    # In V1, the size of the data file that is read after decompression, or the file size if decompression is not required. In V2, the size after decompression of the pages that did not hit the page cache, or the page size if decompression is not required.
            - VectorPredEvalTime: 0ns        # The time consumed for filtering by vectorized conditions.
            - ShortPredEvalTime: 0ns         # The time consumed for filtering by short-circuit predicates.
            - PredColumnReadTime: 0ns        # The time consumed to read predicate columns.
            - LazyReadTime: 0ns              # The time consumed to read non-predicate columns.
            - OutputColumnTime: 0ns          # The time consumed for column materialization.
```
Predicate condition pushdown and index usage can be inferred from the metrics related to the number of data rows in the profile. The following section describes the profile information during the reading of segments in the V2 format. These metrics have slightly different meanings for segments in the V1 format.
Initialization phase
During the reading of a segment in the V2 format, the following filtering operations are performed in sequence:

- If the query has key_ranges (a query range that consists of prefix keys), the data is first filtered by using SortkeyIndex indexes. The number of rows that are filtered out is recorded in RowsKeyRangeFiltered.
- If the query conditions contain columns with bitmap indexes, accurate filtering is performed on these columns by using the bitmap indexes. The number of rows that are filtered out is recorded in RowsBitmapIndexFiltered.
- Data is filtered based on the equivalence conditions (eq, in, and is) in the query conditions by using BloomFilter indexes. The number of rows that are filtered out is recorded in RowsBloomFilterFiltered. RowsBloomFilterFiltered is the difference between the total number of rows in the segment (not the number of rows that remain after bitmap index filtering) and the number of rows that remain after BloomFilter filtering. Therefore, the data that is filtered by using BloomFilter indexes may overlap with the data that is filtered by using bitmap indexes.
- Data is filtered by using ZoneMap indexes based on the query and delete conditions. The number of rows that are filtered out is recorded in RowsStatsFiltered.
- RowsConditionsFiltered specifies the number of rows that are filtered out by using various indexes, including the values of RowsBloomFilterFiltered and RowsStatsFiltered.
Next phase
- The number of rows that are filtered out by using the delete conditions in this phase is recorded in RowsDelFiltered. The actual number of rows that are filtered out by the delete conditions is therefore split across RowsStatsFiltered and RowsDelFiltered.
- RawRowsRead specifies the number of rows that are read after the preceding filtering operations are performed.
- RowsRead specifies the number of rows that are returned to the scanner. The value of RowsRead is generally less than the value of RawRowsRead because the data that is returned from the storage engine to the scanner may be aggregated. If the difference between RawRowsRead and RowsRead is larger than expected, a large number of rows are aggregated, which may be time-consuming.
- RowsReturned specifies the number of rows that ScanNode finally returns to the parent node. The value of RowsReturned is generally less than the value of RowsRead because some predicate conditions are not pushed down to the storage engine but are applied on the scanner. If the difference between RowsRead and RowsReturned is larger than expected, a large number of rows are filtered on the scanner, which indicates that predicate conditions with high selectivity were not pushed down to the storage engine. Filtering on the scanner is less efficient than filtering in the storage engine.
Based on the preceding metrics, you can roughly analyze the number of rows that are processed by the storage engine and the number of rows in the final results after the filtering. This set of metrics (Rows***Filtered) also allows you to analyze whether the query conditions are pushed down to the storage engine and the filtering effect of different indexes.
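As a hypothetical illustration of how to read these metrics together: if RowsConditionsFiltered is 900,000, RawRowsRead is 100,000, RowsRead is 20,000, and RowsReturned is 1,000, the indexes filter most of the data effectively, the 5:1 gap between RawRowsRead and RowsRead suggests heavy aggregation in the storage engine, and the 20:1 gap between RowsRead and RowsReturned suggests that highly selective predicates are being evaluated on the scanner instead of being pushed down to the storage engine. All numbers are invented for illustration.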
Other metrics
You can also perform analysis based on the following metrics:
- Metrics under OlapScanner, such as IOTimer and BlockFetchTime, are accumulated across all scanner threads, so their values may be relatively large. Because scanner threads read data asynchronously, these accumulated values reflect only the accumulated working time of the scanners and do not directly indicate the time consumed by ScanNode. The proportion of the time consumed by ScanNode in the entire execution plan is recorded in Active. In some cases, the value of IOTimer is tens of seconds, whereas the value of Active is only a few seconds. The following causes are possible:
  - IOTimer is the cumulative time of multiple scanners, and a large number of scanners are running.
  - The parent node is time-consuming. For example, if the parent node consumes 100 seconds whereas ScanNode consumes only 10 seconds, the value of Active may be only a few milliseconds. While the parent node is processing data, ScanNode has already asynchronously scanned and prepared the data, so the parent node can obtain the prepared data immediately. In this case, the active time is short.
- NumScanners indicates the number of tasks that the scanner submits to the thread pool, which is scheduled by the thread pool in RuntimeState. The doris_scanner_thread_pool_thread_num and doris_scanner_thread_pool_queue_size parameters specify the size and queue length of the thread pool, respectively. An excessive or insufficient number of threads affects query efficiency. You can also divide some metrics by the number of threads to roughly estimate the time consumed by each thread.
- TabletCount indicates the number of tablets to be scanned. An excessive number of tablets indicates that a large number of random read and data merge operations are required.
- UncompressedBytesRead indirectly reflects the size of the data that is read. If the value is large, a large number of I/O operations may be performed.
- You can check the page cache hit ratio based on CachedPagesNum and TotalPagesNum. A higher hit ratio indicates that the I/O and decompression operations consume less time.
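As a hypothetical illustration of the per-thread estimate described above: if IOTimer accumulates to 40 seconds across NumScanners = 20 scanner tasks, each scanner spends on average about 2 seconds on I/O, which is consistent with an Active time of a few seconds rather than 40 seconds. The numbers are invented for illustration.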
Buffer pool
| Parameter | Description |
| --- | --- |
| AllocTime | The time consumed for memory allocation. |
| CumulativeAllocationBytes | The cumulative size of allocated memory. |
| CumulativeAllocations | The cumulative number of memory allocations. |
| PeakReservation | The peak size of reserved memory. |
| PeakUnpinnedBytes | The peak size of unpinned memory data. |
| PeakUsedReservation | The peak usage of reserved memory. |
| ReservationLimit | The reservation limit of the buffer pool. |