
ApsaraDB for SelectDB:Query profile

Last Updated: Mar 30, 2026

Query profiles capture per-node execution details for every query. Use them to pinpoint performance bottlenecks and optimize query performance.

Enable query profiles

Set the enable_profile variable to start collecting profiles:

SET enable_profile=true;
Note With enable_profile set to true, SelectDB generates a profile for every query that runs, and each profile records execution details across all nodes. Because profile collection adds overhead, set enable_profile back to false when you finish troubleshooting.

View query profiles

Important

The SHOW QUERY PROFILE statement is supported only for SelectDB Core versions earlier than V4.0.0. For SelectDB Core V4.0.0 and later, preview or download query profiles from the SelectDB console, or use the HTTP API. For details, see Use the query audit feature and Export query profiles.

Run the following statement to list all stored profiles:

SHOW QUERY PROFILE "/"\G

Each row in the result corresponds to one profile:

SHOW QUERY PROFILE "/"\G
*************************** 1. row ***************************
   Profile ID: c257c52f93e149ee-ace8ac14e8c9fef9
    Task Type: QUERY
   Start Time: 2021-04-08 11:30:50
     End Time: 2021-04-08 11:30:50
        Total: 9ms
   Task State: EOF
         User: root
   Default Db: default_cluster:db1
Sql Statement: select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1 order by tbl1.k1
Note In HTTP API responses, the Query ID and query_id fields refer to the profile ID.

Analyze query profiles

Use the following three-step workflow to locate the performance bottleneck in a query. Each step drills one level deeper into the execution hierarchy.

Step 1: Identify the slowest fragment

Goal: Find which fragment is the primary bottleneck.

The execution plan tree shows the fragment structure and the execution time of each fragment. The MaxActiveTime value on each sender node is the longest execution time across all execution units in that fragment.

SHOW QUERY PROFILE "/<profile_id>"\G

Example output:

SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9"\G
*************************** 1. row ***************************
Fragments:
             ┌──────────────────────┐
             │[-1: DataBufferSender]│
             │Fragment: 0           │
             │MaxActiveTime: 6.626 ms│
             └──────────────────────┘
                         │
               ┌──────────────────┐
               │[9: EXCHANGE_NODE]│
               │Fragment: 0       │
               └──────────────────┘
                         │
             ┌──────────────────────┐
             │[9: DataStreamSender] │
             │Fragment: 1           │
             │MaxActiveTime: 5.449 ms│
             └──────────────────────┘
                         │
                 ┌──────────────┐
                 │[4: SORT_NODE]│
                 │Fragment: 1   │
                 └──────────────┘
                        ┌┘
             ┌─────────────────────┐
             │[8: AGGREGATION_NODE]│
             │Fragment: 1          │
             └─────────────────────┘
                        └┐
               ┌──────────────────┐
               │[7: EXCHANGE_NODE]│
               │Fragment: 1       │
               └──────────────────┘
                         │
             ┌──────────────────────┐
             │[7: DataStreamSender] │
             │Fragment: 2           │
             │MaxActiveTime: 3.505 ms│
             └──────────────────────┘
                        ┌┘
             ┌─────────────────────┐
             │[3: AGGREGATION_NODE]│
             │Fragment: 2          │
             └─────────────────────┘
                        │
              ┌───────────────────┐
              │[2: HASH_JOIN_NODE]│
              │Fragment: 2        │
              └───────────────────┘
           ┌────────────┴────────────┐
 ┌──────────────────┐      ┌──────────────────┐
 │[5: EXCHANGE_NODE]│      │[6: EXCHANGE_NODE]│
 │Fragment: 2       │      │Fragment: 2       │
 └──────────────────┘      └──────────────────┘
           │                         │
┌─────────────────────┐ ┌────────────────────────┐
│[5: DataStreamSender]│ │[6: DataStreamSender] │
│Fragment: 4          │ │Fragment: 3             │
│MaxActiveTime: 1.87 ms│ │MaxActiveTime: 636.767 us│
└─────────────────────┘ └────────────────────────┘
           │                        ┌┘
 ┌───────────────────┐    ┌───────────────────┐
 │[0: OLAP_SCAN_NODE]│    │[1: OLAP_SCAN_NODE]│
 │Fragment: 4        │    │Fragment: 3        │
 └───────────────────┘    └───────────────────┘
           │                        │
    ┌─────────────┐          ┌─────────────┐
    │[OlapScanner]│          │[OlapScanner]│
    │Fragment: 4  │          │Fragment: 3  │
    └─────────────┘          └─────────────┘
           │                        │
  ┌─────────────────┐      ┌─────────────────┐
  │[SegmentIterator]│      │[SegmentIterator]│
  │Fragment: 4      │      │Fragment: 3      │
  └─────────────────┘      └─────────────────┘

1 row in set (0.02 sec)

In this example, Fragment 0 shows the largest MaxActiveTime (6.626 ms), but as the root fragment its time largely consists of waiting for child fragments and delivering results to the client. Among the remaining fragments, Fragment 1 has the highest MaxActiveTime (5.449 ms), making it the primary bottleneck. Proceed to Step 2 with Fragment 1.

Step 2: Identify the slowest execution unit

Goal: Find which node instance within the bottleneck fragment is taking the most time.

After identifying the slowest fragment, list all execution units (instances) within it:

SHOW QUERY PROFILE "/<profile_id>/<fragment_id>";

Example output for Fragment 1:

SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9/1";
+-----------------------------------+-------------------+------------+
| Instances                         | Host              | ActiveTime |
+-----------------------------------+-------------------+------------+
| c257c52f93e149ee-ace8ac14e8c9ff03 | 10.200.00.01:9060 | 5.449ms    |
| c257c52f93e149ee-ace8ac14e8c9ff05 | 10.200.00.02:9060 | 5.367ms    |
| c257c52f93e149ee-ace8ac14e8c9ff04 | 10.200.00.03:9060 | 5.358ms    |
+-----------------------------------+-------------------+------------+

The instance with the highest ActiveTime is your target for Step 3.

Step 3: Inspect per-operator metrics

Goal: Determine which operator is consuming the most time and resources on the target execution unit.

SHOW QUERY PROFILE "/<profile_id>/<fragment_id>/<instance_id>"\G

Example output for instance c257c52f93e149ee-ace8ac14e8c9ff03 in Fragment 1:

SHOW QUERY PROFILE "/c257c52f93e149ee-ace8ac14e8c9fef9/1/c257c52f93e149ee-ace8ac14e8c9ff03"\G
*************************** 1. row ***************************
Instance:
 ┌───────────────────────────────────────┐
 │[9: DataStreamSender]                  │
 │(Active: 37.222 us, non-child: 0.40)    │
 │  - Counters:                          │
 │      - BytesSent: 0.00                │
 │      - IgnoreRows: 0                  │
 │      - OverallThroughput: 0.0 /sec    │
 │      - PeakMemoryUsage: 8.00 KB       │
 │      - SerializeBatchTime: 0ns        │
 │      - UncompressedRowBatchSize: 0.00 │
 └───────────────────────────────────────┘
                     └┐
                      │
    ┌──────────────────────────────────┐
    │[4: SORT_NODE]                    │
    │(Active: 5.421 ms, non-child: 0.71)│
    │  - Counters:                     │
    │      - PeakMemoryUsage: 12.00 KB │
     │      - RowsReturned: 0           │
    │      - RowsReturnedRate: 0       │
    └──────────────────────────────────┘
                     ┌┘
                     │
   ┌───────────────────────────────────┐
   │[8: AGGREGATION_NODE]              │
   │(Active: 5.355 ms, non-child: 10.68)│
   │  - Counters:                      │
   │      - BuildTime: 3.701us         │
   │      - GetResultsTime: 0 ns        │
   │      - HTResize: 0                │
   │      - HTResizeTime: 1.211 us      │
   │      - HashBuckets: 0             │
   │      - HashCollisions: 0          │
   │      - HashFailedProbe: 0         │
   │      - HashFilledBuckets: 0       │
   │      - HashProbe: 0               │
   │      - HashTravelLength: 0        │
   │      - LargestPartitionPercent: 0 │
   │      - MaxPartitionLevel: 0       │
   │      - NumRepartitions: 0         │
   │      - PartitionsCreated: 16      │
   │      - PeakMemoryUsage: 34.02 MB  │
   │      - RowsProcessed: 0           │
   │      - RowsRepartitioned: 0       │
   │      - RowsReturned: 0            │
   │      - RowsReturnedRate: 0        │
   │      - SpilledPartitions: 0       │
   └───────────────────────────────────┘
                     └┐
                      │
┌──────────────────────────────────────────┐
│[7: EXCHANGE_NODE]                        │
│(Active: 4.360 ms, non-child: 46.84)       │
│  - Counters:                             │
│      - BytesReceived: 0.00               │
│      - ConvertRowBatchTime: 387 ns        │
│      - DataArrivalWaitTime: 4.357 ms      │
│      - DeserializeRowBatchTimer: 0 ns     │
│      - FirstBatchArrivalWaitTime: 4.356 ms│
│      - PeakMemoryUsage: 0.00             │
│      - RowsReturned: 0                   │
│      - RowsReturnedRate: 0               │
│      - SendersBlockedTotalTimer(*): 0 ns  │
└──────────────────────────────────────────┘

A high Active percentage combined with a high non-child percentage on a node indicates that the node itself is the bottleneck (not a child operator). Focus your optimization efforts there.
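
To triage a long profile without eyeballing every box, the Active values can be extracted and ranked with a short script. This is an illustrative sketch, not a SelectDB tool: the regular expressions assume the text layout shown in the examples above and may need adjusting for your version.

```python
import re

# Parse "(Active: 5.355 ms, non-child: 10.68)" lines from an exported
# profile and pair each with the preceding "[id: NAME]" operator label.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

def parse_active_times(profile_text):
    """Return (operator, active_seconds) pairs, slowest first."""
    pairs = []
    last_op = None
    for line in profile_text.splitlines():
        op = re.search(r"\[(-?\d+: \w+)\]", line)
        if op:
            last_op = op.group(1)
        m = re.search(r"Active: ([\d.]+)\s*(ns|us|ms|s)", line)
        if m and last_op:
            pairs.append((last_op, float(m.group(1)) * UNITS[m.group(2)]))
            last_op = None
    return sorted(pairs, key=lambda p: p[1], reverse=True)

sample = """
[9: DataStreamSender]
(Active: 37.222 us, non-child: 0.40)
[4: SORT_NODE]
(Active: 5.421 ms, non-child: 0.71)
[8: AGGREGATION_NODE]
(Active: 5.355 ms, non-child: 10.68)
"""
print(parse_active_times(sample)[0][0])  # slowest operator: 4: SORT_NODE
```

Ranking by Active alone surfaces candidates; confirm with the non-child percentage before concluding that an operator, rather than its children, is the bottleneck.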

Export query profiles

Important

Exporting query profiles requires SelectDB Core V3.0.4 or later.

Export a profile to a file for offline analysis or sharing.

  1. Enable query profiles.

    SET enable_profile=true;
  2. Run the query you want to profile.

    -- Run your target query. SelectDB generates a profile for it.
    SELECT count(1) FROM test_table LIMIT 10;
  3. Get the profile ID.

    SelectDB 4.0.x versions

    Send an HTTP request to get query information. The Query ID field in the response is the profile ID.

    curl -u'<userName>:<userPassword>' \
      "http://<selectdbAddress>:<httpPort>/rest/v2/manager/query/query_info?is_all_node=true"

    Example response:

    {
      "msg": "success",
      "code": 0,
      "data": {
        "column_names": [
          "Query ID",
          "FE Node",
          "Query User",
          "Execution Database",
          "Sql",
          "Query Type",
          "Start Time",
          "End Time",
          "Execution Duration",
          "Status"
        ],
        "rows": [
          [
            ...
          ]
        ]
      },
      "count": 0
    }
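
To pick the profile ID out of this response programmatically, a short script can read the Query ID column. This is a sketch based on the sample response above; column names may differ across versions.

```python
import json

def extract_profile_ids(response_text):
    """Return the values of the "Query ID" column from a query_info response."""
    payload = json.loads(response_text)
    cols = payload["data"]["column_names"]
    idx = cols.index("Query ID")
    return [row[idx] for row in payload["data"]["rows"]]

# Abbreviated sample shaped like the response above.
sample = json.dumps({
    "msg": "success", "code": 0,
    "data": {
        "column_names": ["Query ID", "FE Node", "Query User"],
        "rows": [["c257c52f93e149ee-ace8ac14e8c9fef9", "fe1", "root"]],
    },
    "count": 0,
})
print(extract_profile_ids(sample))
```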

    SelectDB 3.0.x versions

    For SelectDB 3.0.x, run the following statement. The Profile ID column in the output is the profile ID:

    SHOW QUERY PROFILE "/";

    Example output:

    +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+
    | Profile ID                        | Task Type | Start Time          | End Time            | Total | Task State | User  | Default Db                         | Sql Statement                 |
    +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+
    | b9c9ba063d9d4365-97878361371e757a | QUERY     | 2024-02-07 17:40:04 | 2024-02-07 17:40:04 | 32ms  | EOF        | admin | default_cluster:information_schema | select * from user_privileges |
    | 8800c306137e4072-9bb1ed419f4ac9f2 | QUERY     | 2024-02-07 17:40:00 | 2024-02-07 17:40:00 | 3ms   | ERR        | admin | default_cluster:information_schema | select * from user_priveleges |
    | e2efdd1a996c4de2-ab939ad49b409990 | QUERY     | 2024-02-07 17:39:51 | 2024-02-07 17:39:51 | 13ms  | EOF        | admin |                                    | SELECT DATABASE()             |
    +-----------------------------------+-----------+---------------------+---------------------+-------+------------+-------+------------------------------------+-------------------------------+
  4. Export the profile using its ID.

    curl -u'<userName>:<userPassword>' \
      "http://<selectdbAddress>:<httpPort>/api/profile?query_id=<query_id>"

    To save the output to a file, redirect the response:

    curl -u'<userName>:<userPassword>' \
      "http://<selectdbAddress>:<httpPort>/api/profile?query_id=<query_id>" > res.txt

    Replace the following placeholders with your actual values:

    Placeholder Description Default
    <userName> Username for accessing the SelectDB instance
    <userPassword> Password for accessing the SelectDB instance
    <selectdbAddress> Endpoint of the SelectDB instance
    <httpPort> HTTP port number 8080
    <query_id> Profile ID from step 3

  5. (Optional) Format the exported file for readability.

    sed -r 's/\\n/\n/g' res.txt | sed -r 's/\\t/\t/g' > new.txt
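
If sed is not available, the same unescaping can be done in Python. This sketch assumes the exported file contains literal \n and \t escape sequences, as the sed command above does.

```python
def unescape_profile(text):
    """Turn literal "\\n" and "\\t" sequences into real newlines and tabs."""
    return text.replace("\\n", "\n").replace("\\t", "\t")

# Example: a fragment of an exported profile with escaped whitespace.
raw = "Fragment 0:\\n\\t- MaxActiveTime: 6.626ms"
print(unescape_profile(raw))
```

To process a file, read res.txt, pass its contents through unescape_profile, and write the result to new.txt.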

Profile parameter reference

The following tables describe all parameters collected in a query profile.

Fragment

Parameter Description
AverageThreadTokens Number of threads used for fragment execution, excluding thread pool usage.
Buffer Pool PeakReservation Peak memory usage of the buffer pool.
MemoryLimit Memory limit for the query.
PeakMemoryUsage Peak memory usage of an execution unit during the query.
RowsProduced Number of rows whose columns are processed.

BlockMgr

Parameter Description
BlocksCreated Number of blocks created by BlockManager.
BlocksRecycled Number of blocks that are reused.
BytesWritten Total size of data written to disk.
MaxBlockSize Size of a single block.
TotalReadBlockTime Total time consumed to read blocks.

DataStreamSender

Parameter Description
BytesSent Total size of data sent. Calculated as: number of receivers x size of data sent.
IgnoreRows Number of rows filtered out.
LocalBytesSent Size of data sent by the local node and received by the local node during data exchange.
OverallThroughput Total throughput. Calculated as: BytesSent / execution time.
SerializeBatchTime Time consumed to serialize the data being sent.
UncompressedRowBatchSize Size of the sent RowBatch data before compression.

ODBC_TABLE_SINK

Parameter Description
NumSentRows Total number of rows written to an external table.
TupleConvertTime Time consumed to serialize the sent data into INSERT statements.
ResultSendTime Time consumed to write data using the Open Database Connectivity (ODBC) driver.

EXCHANGE_NODE

Parameter Description
BytesReceived Size of data received over the network.
MergeGetNext If sorting is performed on the child node, the exchange node performs unified merging and sorting. This is the total time for that merging and sorting, including MergeGetNextBatch.
MergeGetNextBatch Time consumed by the merge node to fetch data. For single-level merging and sorting, data is fetched from a network queue. For multi-level merging and sorting, data is fetched from a child merger.
ChildMergeGetNext When many senders are sending data, single-thread merging becomes a bottleneck. ApsaraDB for SelectDB starts multiple threads on the child merge node to merge and sort in parallel. This is the cumulative time across all threads.
ChildMergeGetNextBatch Time consumed for the child merge node to fetch data. High values may indicate bottlenecks in the child data sending node.
DataArrivalWaitTime Total time waiting for the sender to send data.
FirstBatchArrivalWaitTime Time to wait for the first batch of data from the sender.
DeserializeRowBatchTimer Time consumed to deserialize network data.
SendersBlockedTotalTimer(*) Total waiting time on the sender side when the DataStreamRecv queue memory is fully occupied.
ConvertRowBatchTime Time consumed to convert received data into RowBatch.
RowsReturned Number of rows received.
RowsReturnedRate Rate at which rows are received.

SORT_NODE

Parameter Description
InMemorySortTime Time consumed for in-memory sorting.
InitialRunsCreated Number of times sorting is initialized. The value is 1 for in-memory sorting.
SortDataSize Total size of data sorted.
MergeGetNext Time consumed by MergeSort to get the next batch from multiple sorted runs. Measured only when data is written to disk.
MergeGetNextBatch Time consumed by MergeSort to extract the batch of the next sorted run. Measured only when data is written to disk.
TotalMergesPerformed Number of times external merging is performed.

AGGREGATION_NODE

Parameter Description
PartitionsCreated Number of partitions the aggregate query is split into.
GetResultsTime Time consumed to get aggregate results from partitions.
HTResizeTime Time consumed to resize the hash table.
HTResize Number of times the hash table is resized.
HashBuckets Number of buckets in the hash table.
HashBucketsWithDuplicate Number of buckets with duplicate nodes in the hash table.
HashCollisions Number of hash collisions in the hash table.
HashDuplicateNodes Number of duplicate nodes belonging to the same bucket in the hash table.
HashFailedProbe Number of failed probe operations on the hash table.
HashFilledBuckets Number of buckets filled with data in the hash table.
HashProbe Number of times the hash table is queried.
HashTravelLength Number of steps moved during a hash table query.

HASH_JOIN_NODE

Parameter Description
ExecOption Hash table construction method for the right child node: synchronous or asynchronous.
BuildBuckets Number of buckets in the hash table.
BuildRows Number of rows in the hash table.
BuildTime Time consumed to build the hash table.
LoadFactor Load factor of the hash table: the proportion of buckets that are non-empty.
ProbeRows Number of rows on which the hash probe operation is performed after traversing the left child nodes.
ProbeTime Time consumed to traverse the left child nodes for hash probe, excluding time to call GetNext on RowBatch from the left child node.
PushDownComputeTime Time consumed to compute predicate pushdown conditions.
PushDownTime Total time consumed for predicate pushdown. Join queries on right child nodes that meet the conditions are converted to IN queries on left child nodes.

CROSS_JOIN_NODE

Parameter Description
ExecOption RowBatchList construction method for the right child node: synchronous or asynchronous.
BuildRows Number of RowBatchList rows (rows in the right child node).
BuildTime Time consumed to build the RowBatchList.
LeftChildRows Number of rows in the left child node.
LeftChildTime Time consumed to traverse left and right child nodes to generate the Cartesian product, excluding time to call GetNext on RowBatch from the left child node.

UNION_NODE

Parameter Description
MaterializeExprsEvaluateTime Time consumed to evaluate type conversion expressions and materialize results when field types differ on both sides of a UNION.

ANALYTIC_EVAL_NODE

Parameter Description
EvaluationTime Total time consumed for analytic function (window function) computation.
GetNewBlockTime Time consumed to allocate a new block during initialization. The block caches the Rows window or an entire partition for analytic function computation.
PinTime Time consumed to allocate a new block or re-read a block that was written to disk back into memory.
UnpinTime Time consumed to write a block's data to disk when the block is no longer needed or when the current operator is under high memory pressure.

OLAP_SCAN_NODE

OLAP_SCAN_NODE handles data scan tasks. It creates one or more OLAP scanners, each responsible for scanning a portion of the data. The profile is organized in three layers that reflect the call hierarchy: OLAP_SCAN_NODE -> OlapScanner -> SegmentIterator.

Some or all predicate conditions are pushed to OLAP_SCAN_NODE. Conditions that can use storage-engine indexes are pushed further to the storage engine for filtering; the remainder are evaluated in OLAP_SCAN_NODE itself.

Note Some metric values differ depending on the storage format (V1 or V2).

The following example shows a typical OLAP_SCAN_NODE profile:

OLAP_SCAN_NODE (id=0):(Active: 1.2ms, % non-child: 0.00%)
  - BytesRead: 265.00 B                 # Size of data read from data files. For example, 10 x 32-bit integers = 40 bytes. This reflects data fully loaded into memory, not raw I/O size.
  - NumDiskAccess: 1                    # Number of disks accessed by the ScanNode.
  - NumScanners: 20                     # Number of scanners created by the ScanNode.
  - PeakMemoryUsage: 0.00               # Peak memory usage during the query.
  - RowsRead: 7                         # Rows returned from the storage engine to the scanner, excluding rows filtered by the scanner.
  - RowsReturned: 7                     # Rows returned from the ScanNode to the parent node.
  - RowsReturnedRate: 6.979K /sec       # RowsReturned / ActiveTime.
  - TabletCount: 20                     # Number of tablets accessed by the ScanNode.
  - TotalReadThroughput: 74.70 KB/sec   # BytesRead divided by total execution time. For I/O-bound queries, this approaches total disk throughput.
  - ScannerBatchWaitTime: 426.886us     # Time for the transfer thread to wait for the scanner thread to return RowBatch data.
  - ScannerWorkerWaitTime: 17.745us     # Time for the scanner thread to wait for available worker threads in the thread pool.
  OlapScanner:
    - BlockConvertTime: 8.941us         # Time to convert a vectorized block to a row-structured RowBlock. The vectorized block is VectorizedRowBatch (V1) or RowBlockV2 (V2).
    - BlockFetchTime: 468.974us         # Time for the rowset reader to get blocks.
    - ReaderInitTime: 5.475ms           # Time for the OLAP scanner to initialize readers. Includes MergeHeap creation time (V1) and the time to generate all iterator levels and read the first block group (V2).
    - RowsDelFiltered: 0                # Rows filtered based on DELETE information in the tablet and rows marked for deletion in the Unique Key model.
    - RowsPushedCondFiltered: 0         # Rows filtered by predicates pushed down from a join operation (e.g., from BuildTable to ProbeTable). This value is approximate; filtering may be skipped if the effect is poor.
    - ScanTime: 39.24us                 # Time for data to travel from ScanNode to the parent node.
    - ShowHintsTime_V1: 0ns             # V1 only: time to read some data for ScanRange splitting.
    SegmentIterator:
      - BitmapIndexFilterTimer: 779ns   # Time to filter data using bitmap indexes.
      - BlockLoadTime: 415.925us        # Time for the segment reader (V1) or segment iterator (V2) to get blocks.
      - BlockSeekCount: 12              # Number of block seeks during segment reading.
      - BlockSeekTime: 222.556us        # Time consumed for block seeks during segment reading.
      - BlocksLoad: 6                   # Number of blocks read.
      - CachedPagesNum: 30              # V2 only: number of pages that hit the page cache.
      - CompressedBytesRead: 0.00       # V1: size of data before decompression read from file. V2: size of pages before compression that did not hit the page cache.
      - DecompressorTimer: 0ns          # Time consumed for data decompression.
      - IOTimer: 0ns                    # Actual I/O time for reading data from the operating system.
      - IndexLoadTime_V1: 0ns           # V1 only: time to read index streams.
      - NumSegmentFiltered: 0           # Number of segments completely filtered out based on column statistics and query conditions.
      - NumSegmentTotal: 6              # Total number of segments involved in the query.
      - RawRowsRead: 7                  # Source data rows read from the storage engine after all preceding filtering.
      - RowsBitmapIndexFiltered: 0      # V2 only: rows filtered using bitmap indexes.
      - RowsBloomFilterFiltered: 0      # V2 only: rows filtered using BloomFilter indexes.
      - RowsKeyRangeFiltered: 0         # V2 only: rows filtered using SortkeyIndex indexes.
      - RowsStatsFiltered: 0            # V2: rows filtered using ZoneMap indexes, including delete conditions. V1: includes rows filtered using BloomFilter indexes.
      - RowsConditionsFiltered: 0       # V2 only: rows filtered by all column indexes combined.
      - RowsVectorPredFiltered: 0       # Rows filtered by vectorized condition operations.
      - TotalPagesNum: 30               # V2 only: total number of pages read.
      - UncompressedBytesRead: 0.00     # V1: size of data after decompression (or file size if not compressed). V2: size of pages that did not hit the page cache after decompression (or page size if not compressed).
      - VectorPredEvalTime: 0ns         # Time to perform filtering by vectorized condition.
      - ShortPredEvalTime: 0ns          # Time to perform filtering by short-circuit predicate.
      - PredColumnReadTime: 0ns         # Time consumed for predicate column reading.
      - LazyReadTime: 0ns               # Time consumed to read non-predicate columns.
      - OutputColumnTime: 0ns           # Time consumed for column materialization.

How filtering metrics relate to each other

The row-count metrics in an OLAP_SCAN_NODE profile reveal how effectively predicates and indexes reduce the data volume at each stage. The following describes the V2-format filtering sequence. V1-format metrics have slightly different meanings.

Initialization phase

Filtering happens in this order:

  1. SortkeyIndex — If the query has key ranges (prefix key ranges), data is filtered first. Rows eliminated are recorded in RowsKeyRangeFiltered.

  2. Bitmap indexes — Accurate column filtering is applied for conditions that match bitmap indexes. Rows eliminated are recorded in RowsBitmapIndexFiltered.

  3. BloomFilter indexes — Equality conditions (=, IN, IS) are filtered using BloomFilter indexes. Rows eliminated are recorded in RowsBloomFilterFiltered.

    Note RowsBloomFilterFiltered is calculated as total segment rows minus remaining rows after BloomFilter filtering. It may overlap with rows already eliminated by bitmap indexes.
  4. ZoneMap indexes — Data is filtered based on query and delete conditions. Rows eliminated are recorded in RowsStatsFiltered.

  5. RowsConditionsFiltered is the total rows filtered by all indexes combined, including RowsBloomFilterFiltered and RowsStatsFiltered.
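
The numbered stages above can be pictured as a funnel over a segment's rows. A minimal sketch, with invented numbers and the simplifying assumption that stages do not overlap (the note on RowsBloomFilterFiltered explains that in practice they can):

```python
# Toy illustration of the initialization-phase counters; all numbers invented.
segment_rows = 1_000_000

rows_key_range_filtered = 600_000     # 1. SortkeyIndex (prefix key ranges)
rows_bitmap_index_filtered = 150_000  # 2. bitmap indexes
rows_bloom_filter_filtered = 100_000  # 3. BloomFilter indexes (=, IN, IS)
rows_stats_filtered = 50_000          # 4. ZoneMap indexes (incl. delete conditions)

# RowsConditionsFiltered aggregates the rows removed by all column indexes,
# assuming no overlap between stages.
rows_conditions_filtered = (
    rows_key_range_filtered
    + rows_bitmap_index_filtered
    + rows_bloom_filter_filtered
    + rows_stats_filtered
)
rows_surviving_init = segment_rows - rows_conditions_filtered
print(rows_conditions_filtered, rows_surviving_init)  # 900000 100000
```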

Next phase

After initialization, the remaining data passes through additional stages:

  • RowsDelFiltered: Rows eliminated by delete conditions. These are also counted in RowsStatsFiltered.

  • RawRowsRead: Rows remaining after all preceding filtering.

  • RowsRead: Rows returned from the storage engine to the scanner. This is typically less than RawRowsRead because the storage engine may aggregate rows. A large gap between RawRowsRead and RowsRead indicates heavy aggregation, which can be a performance concern.

  • RowsReturned: Rows finally returned by ScanNode to the parent node. This is typically less than RowsRead because predicates not pushed to the storage engine are applied at the scanner level. A large gap between RowsRead and RowsReturned means many high-selectivity predicates are not reaching the storage engine, where filtering is more efficient.

Together, the Rows***Filtered metrics let you assess whether query predicates are being pushed to the storage engine and how effective each index type is.
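
As a hypothetical diagnostic (the function name is illustrative, not part of SelectDB), the gaps discussed above can be computed directly from the three counters:

```python
def scan_funnel(raw_rows_read, rows_read, rows_returned):
    """Summarize the read-stage row funnel of an OLAP_SCAN_NODE profile.

    A large aggregated_away value suggests heavy storage-engine aggregation;
    a large scanner_filtered value suggests high-selectivity predicates that
    are applied at the scanner instead of being pushed to the storage engine.
    """
    return {
        "aggregated_away": raw_rows_read - rows_read,
        "scanner_filtered": rows_read - rows_returned,
        "selectivity": rows_returned / raw_rows_read if raw_rows_read else 0.0,
    }

# Invented example values.
print(scan_funnel(raw_rows_read=1_000_000, rows_read=400_000, rows_returned=20_000))
```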

Other metrics

  • Metrics under OlapScanner — such as IOTimer and BlockFetchTime — are cumulative across all scanner threads. Because scanner threads read data asynchronously, these accumulated values reflect total scanner working time, not ScanNode wall-clock time. The Active percentage records the actual proportion of time ScanNode contributed to the overall execution plan. It is normal to see IOTimer in the tens of seconds while Active is only a few seconds, for two reasons:

    • IOTimer accumulates across many scanners running in parallel.

    • If the parent node is slow (for example, 100 seconds), ScanNode (10 seconds) finishes early and prepares data in the background. The parent then retrieves prepared data, so ScanNode's Active time is short.

  • NumScanners is the number of tasks the scanner submits to the thread pool, which is managed by RuntimeState. The doris_scanner_thread_pool_thread_num and doris_scanner_thread_pool_queue_size parameters control thread pool size and queue length. If the number of threads is excessive or insufficient, the query efficiency is affected. You can also divide some metrics by the number of threads to roughly estimate the time consumed by each thread.

  • TabletCount is the number of tablets to scan. A high tablet count means more random reads and data merge operations.

  • UncompressedBytesRead reflects the volume of data read. A high value may indicate heavy I/O.

  • Compare CachedPagesNum against TotalPagesNum to assess page cache hit rate. A higher hit rate reduces both I/O and decompression overhead.
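
The hit-rate comparison can be computed directly from the two counters; a short sketch:

```python
def page_cache_hit_rate(cached_pages_num, total_pages_num):
    """Fraction of pages served from the page cache (0.0 when no pages read)."""
    if total_pages_num == 0:
        return 0.0
    return cached_pages_num / total_pages_num

# In the example profile above, CachedPagesNum = 30 and TotalPagesNum = 30,
# i.e. every page was served from the cache.
print(page_cache_hit_rate(30, 30))  # 1.0
```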

Buffer pool

Parameter Description
AllocTime Time consumed for memory allocation.
CumulativeAllocationBytes Cumulative size of memory allocated.
CumulativeAllocations Cumulative number of memory allocations.
PeakReservation Peak reservation size.
PeakUnpinnedBytes Size of memory data that is unpinned.
PeakUsedReservation Memory usage of reservations.
ReservationLimit Reservation limit in the buffer pool.
