Analyze queries by using query profile
A query profile records the execution details of every node involved in a query. Use query profiles in StarRocks Manager to visualize query execution and quickly pinpoint performance bottlenecks.
Enable query profile
Set the enable_profile session variable to true:
SET enable_profile = true;
To enable query profile globally without affecting query performance, your StarRocks kernel version must be 2.5.13 or later (2.x series) or 3.1.5 or later (3.x series). If your version is earlier, upgrade before enabling; otherwise, query performance may be affected.
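The version rule above can be sketched as a small check. This is only an illustration: the function name `meets_profile_minimum` is hypothetical, it assumes a plain `X.Y.Z` version string, and it reports series other than 2.x and 3.x as unverified because the text does not cover them.

```python
def meets_profile_minimum(version: str) -> bool:
    """Return True if the kernel version meets the minimums stated above."""
    # Assumption: the version is a plain "X.Y.Z" string with numeric parts.
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    if major == 2:
        return (major, minor, patch) >= (2, 5, 13)
    if major == 3:
        return (major, minor, patch) >= (3, 1, 5)
    # Assumption: other series are not covered by the text, so this sketch
    # reports them as unverified instead of guessing.
    return False


for candidate in ("2.5.12", "2.5.13", "3.0.9", "3.1.5"):
    print(candidate, meets_profile_minimum(candidate))
```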
Query profile structure
A query profile has five hierarchical levels:
| Level | Description |
|---|---|
| Fragment | An execution tree. A query consists of one or more fragments. |
| Fragment instance | Each fragment can have multiple instances, each running on a different compute node. |
| Pipeline | An execution chain of connected operators. A fragment instance splits into multiple pipelines. |
| Pipeline driver | An instance of a pipeline. A pipeline can have multiple pipeline drivers, each running on a separate CPU core to maximize core utilization. |
| Operator | A computational unit. A pipeline driver consists of multiple operators. |
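To make the nesting concrete, the five levels can be modeled as plain data classes. This is a minimal sketch for illustration only: the class names, the `node` field, and the operator labels are hypothetical and do not reflect how StarRocks stores profiles internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    name: str  # hypothetical label, for example "OLAP_SCAN"


@dataclass
class PipelineDriver:
    operators: List[Operator] = field(default_factory=list)


@dataclass
class Pipeline:
    drivers: List[PipelineDriver] = field(default_factory=list)


@dataclass
class FragmentInstance:
    node: str  # compute node that runs this instance (hypothetical field)
    pipelines: List[Pipeline] = field(default_factory=list)


@dataclass
class Fragment:
    instances: List[FragmentInstance] = field(default_factory=list)


def count_operators(fragments: List[Fragment]) -> int:
    """Walk all five levels and count operator instances."""
    return sum(
        len(driver.operators)
        for fragment in fragments
        for instance in fragment.instances
        for pipeline in instance.pipelines
        for driver in pipeline.drivers
    )


def make_driver() -> PipelineDriver:
    return PipelineDriver([Operator("OLAP_SCAN"), Operator("PROJECT")])


# One fragment, two instances, one pipeline each, two drivers per pipeline.
fragment = Fragment(
    [
        FragmentInstance(node, [Pipeline([make_driver(), make_driver()])])
        for node in ("node-1", "node-2")
    ]
)
total = count_operators([fragment])
print(total)  # 2 instances x 1 pipeline x 2 drivers x 2 operators = 8
```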
Analyze a query profile
StarRocks Manager visualizes query profiles in a tree structure. Tree nodes represent aggregated operators.
- Click an operator to view its details in the tabs on the right side of the page.
- If no operator is selected, the right side shows an overview of the query.
Tip: The system highlights the top three time-consuming operators with progressively darker colors. Start your analysis there — click each highlighted operator to examine its CoreMetrics and identify the bottleneck. Use the Zoom In and Zoom Out buttons, or scroll your mouse wheel, to navigate the profile tree.
Analysis workflow
Use this sequence to move from a high-level overview to a specific bottleneck:
- Check the execution overview. On the Execution Details tab, look at the time breakdown to understand where the query spent most of its time:
  - CPU-bound: High CumulativeCpuTime relative to ExecutionWallTime
  - I/O-bound: High I/O time or large DiskReadBytes/RemoteReadBytes
  - Network-bound: Large Bytes sent over network
  - Scheduling overhead: High CumulativeWaitTime
- Locate the top operators. In the profile tree, the three darkest-highlighted operators are the most expensive. Click each one.
- Inspect operator details. In the right-side tabs:
  - CoreMetrics: the most relevant metrics for this operator type
  - NodeMetrics: all metrics for the operator
  - Pipeline: scheduling metrics (usually not the focus)
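The first step of the workflow can be sketched as a ranking of the time-based overview metrics against ExecutionWallTime. This is a rough heuristic, not something StarRocks Manager computes: the function name and sample values are hypothetical, all times are assumed to be in seconds, and the network signal is left out because it is measured in bytes rather than time.

```python
from typing import Dict, List, Tuple


def rank_time_signals(overview: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank time-based signals by their ratio to ExecutionWallTime."""
    wall = overview["ExecutionWallTime"]
    candidates = {
        "CPU-bound": overview["CumulativeCpuTime"],
        "I/O-bound": overview["I/O"],
        "Scheduling overhead": overview["CumulativeWaitTime"],
    }
    ratios = [(label, value / wall) for label, value in candidates.items()]
    # Largest ratio first: that category is the first place to look.
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Hypothetical overview values, all in seconds.
sample_overview = {
    "ExecutionWallTime": 2.0,
    "CumulativeCpuTime": 12.0,
    "I/O": 3.0,
    "CumulativeWaitTime": 1.0,
}
ranked = rank_time_signals(sample_overview)
for label, ratio in ranked:
    print(f"{label}: {ratio:.1f}x wall time")
```

Because the cumulative metrics sum across concurrent drivers, ratios above 1 are normal. Only the relative order is meaningful here.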
Execution overview metrics
The Execution Details tab displays query-level summary metrics.
Total
| Metric | Description |
|---|---|
| Total | Total time consumed by the query, including planning, execution, and profiling phases. |
| PlannerTotalTime | Time spent planning the query, including SQL parsing, analysis, optimization, and execution plan generation. |
| ExecutionWallTime | Wall-clock time for query execution and backend (BE) processing. |
| CollectProfileTime | Time spent collecting profile metrics from BE nodes. |
Execution time
| Metric | Description |
|---|---|
| ExecutionWallTime | Total wall-clock time for query execution. |
| CumulativeCpuTime | Sum of CPU time across all BE nodes. Because this sums across concurrent processes, it can exceed wall-clock time. |
| CpuUtilization | CumulativeCpuTime divided by ExecutionWallTime. Represents the average number of CPU cores used. |
| CumulativeWaitTime | Cumulative pipeline wait time, including scheduling wait time (ScheduleTime) and blocking time (PendingTime). Examples: empty input queues, full output queues, unmet dependencies. |
| OperatorCumulativeTime | Sum of execution times for all operators, covering I/O, network, and computation. |
| I/O | Cumulative I/O time for all SCAN operators, including local disk reads, remote reads, and cache access. |
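As a worked example of the CpuUtilization definition in the table, the following sketch divides CumulativeCpuTime by ExecutionWallTime. The function name and the sample values are hypothetical, and both inputs are assumed to be in seconds.

```python
def cpu_utilization(cumulative_cpu_time_s: float, execution_wall_time_s: float) -> float:
    """Average number of CPU cores used: CumulativeCpuTime / ExecutionWallTime."""
    if execution_wall_time_s <= 0:
        raise ValueError("ExecutionWallTime must be positive")
    return cumulative_cpu_time_s / execution_wall_time_s


# 12 s of CPU time spread over 2 s of wall-clock time.
cores_used = cpu_utilization(12.0, 2.0)
print(cores_used)  # 6.0, that is, about six cores busy on average
```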
I/O
| Metric | Description |
|---|---|
| RawRowsRead | Total rows scanned by all SCAN nodes. |
| DiskReadBytes | Total compressed data read by all SCAN nodes. |
| LocalDiskReadBytes | Total compressed data read from local cache by all CONNECTOR_SCAN nodes. Applies only to shared-data instances. |
| RemoteReadBytes | Total compressed data read from Object Storage Service (OSS) by all CONNECTOR_SCAN nodes. Applies only to shared-data instances. |
| ResultRows | Total output records generated by all SCAN nodes. |
| ResultBytes | Total data read by all SCAN nodes. |
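Two ratios derived from these metrics can help when reading the I/O section. They are not shown in the UI and are assumptions of this sketch: the share of shared-data reads served from the local cache, and the share of scanned rows that the SCAN nodes output. The function names and sample values are hypothetical.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def scan_ratios(io: dict) -> dict:
    local = io["LocalDiskReadBytes"]
    remote = io["RemoteReadBytes"]
    return {
        # Share of compressed bytes served from the local cache.
        "local_cache_share": safe_ratio(local, local + remote),
        # Share of scanned rows that the SCAN nodes actually output.
        "rows_kept_share": safe_ratio(io["ResultRows"], io["RawRowsRead"]),
    }


# Hypothetical I/O metrics for a shared-data instance.
sample_io = {
    "RawRowsRead": 100_000,
    "ResultRows": 1_000,
    "LocalDiskReadBytes": 750,
    "RemoteReadBytes": 250,
}
ratios = scan_ratios(sample_io)
print(ratios)
```

A low local cache share suggests that most data came from remote storage. A low rows-kept share suggests that the scan read far more rows than it returned.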
Network
| Metric | Description |
|---|---|
| Bytes sent over network | Total bytes transmitted by all Exchange nodes (BytesSent). |
Operator metrics
Query-level metrics
Summary metrics
| Metric | Description |
|---|---|
| Total | Total time consumed by the query, including planning, execution, and profiling phases. |
| QueryCpuCost | Cumulative CPU time used by the query. Sums CPU time across all concurrent processes, so it can exceed actual wall-clock time. |
| QueryMemCost | Total memory consumed by the query. |
| Variables | Variables used in the query. |
Pipeline-level metrics
| Metric | Description |
|---|---|
| ActiveTime | Time during which a driver runs. |
| DriverTotalTime | Total time consumed by a driver. |
| PendingTime | Time a driver waits when input or prerequisites are not met. |
General metrics
| Metric | Description |
|---|---|
| OperatorTotalTime | Total time consumed by the operator. |
| PushRowNum | Cumulative output rows generated by the operator. |
| PullRowNum | Cumulative input rows processed by the operator. |
| PullChunkNum | Cumulative input chunks processed by the operator. |
| PushChunkNum | Cumulative output chunks generated by the operator. |
| PeakMemoryUsage | Maximum memory used by the operator. |
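If operator metrics are available outside the UI, the "top three operators" step can be mirrored by ranking on OperatorTotalTime. This is a minimal sketch with hypothetical operator names and timings. It does not claim to reproduce how StarRocks Manager picks the operators it highlights.

```python
import heapq
from typing import Dict, List


def top_operators(operators: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n operators with the largest OperatorTotalTime."""
    return heapq.nlargest(n, operators, key=lambda op: op["OperatorTotalTime"])


# Hypothetical per-operator metrics, times in milliseconds.
sample_operators = [
    {"name": "OLAP_SCAN", "OperatorTotalTime": 820.0},
    {"name": "HASH_JOIN_PROBE", "OperatorTotalTime": 1450.0},
    {"name": "AGGREGATE", "OperatorTotalTime": 310.0},
    {"name": "EXCHANGE_SINK", "OperatorTotalTime": 95.0},
    {"name": "PROJECT", "OperatorTotalTime": 12.0},
]
slowest = top_operators(sample_operators)
for op in slowest:
    print(op["name"], op["OperatorTotalTime"])
```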
OLAP Scan operator
| Metric | Description |
|---|---|
| Table | Name of the table. |
| ScanTime | Cumulative scan time. Scan operations run in an asynchronous I/O thread pool. |
| TabletCount | Number of tablets. |
| PushdownPredicates | Number of predicates pushed down to storage. |
| BytesRead | Size of data read. |
| CompressedBytesRead | Size of compressed data read. |
| IOTime | Cumulative I/O time. |
| BitmapIndexFilterRows | Rows filtered by a bitmap index. |
| BloomFilterFilterRows | Rows filtered by a Bloom filter. |
| SegmentRuntimeZoneMapFilterRows | Rows filtered by a runtime zone map. |
| SegmentZoneMapFilterRows | Rows filtered by a zone map. |
| ShortKeyFilterRows | Rows filtered by a short key. |
| ZoneMapIndexFilterRows | Rows filtered by a zone map index. |
Connector Scan operator
The following metrics apply only to shared-data instances (instances that use storage-compute separation).
| Metric | Description |
|---|---|
| CompressedBytesReadLocalDisk | Compressed data read from the local cache of a compute node. |
| CompressedBytesReadRemote | Total compressed data read from OSS. |
| IOTimeLocalDisk | I/O time to read data from the local cache. |
| IOTimeRemote | I/O time to read data from OSS. |
Exchange operator — Sink
| Metric | Description |
|---|---|
| PartType | Data distribution mode. Valid values: UNPARTITIONED, RANDOM, HASH_PARTITIONED, BUCKET_SHUFFLE_HASH_PARTITIONED. |
| BytesSent | Size of data sent. |
| OverallThroughput | Throughput rate. |
| NetworkTime | Time to transmit a data packet, excluding post-reception processing time. |
| WaitTime | Wait time caused by a full sender queue. |
| NetworkBandwidth | Network bandwidth. |
Exchange operator — Source
| Metric | Description |
|---|---|
| SenderWaitLockTime | Time spent waiting for a lock. |
| BytesReceived | Size of data received. |
| DecompressChunkTime | Time spent decompressing data. |
| DeserializeChunkTime | Time spent deserializing data. |
| SenderTotalTime | Total time spent sending data. |
Aggregate operator
| Metric | Description |
|---|---|
| GroupingKeys | The GROUP BY columns. |
| AggregateFunctions | The aggregate functions. |
| AggComputeTime | Time spent computing aggregate functions. |
| ExprComputeTime | Time spent computing expressions. |
| HashTableSize | Size of the hash table. |
Join operator — Probe
| Metric | Description |
|---|---|
| DistributionMode | Data distribution mode. |
| JoinType | Join type. |
| OtherJoinConjunctEvaluateTime | Time spent evaluating other join conjuncts. |
| ProbeConjunctEvaluateTime | Time spent evaluating probe conjuncts. |
| SearchHashTableTime | Time spent querying the hash table. |
| WhereConjunctEvaluateTime | Time spent evaluating WHERE conjuncts. |
Join operator — Build
| Metric | Description |
|---|---|
| JoinPredicates | The join predicates. |
| JoinType | Join type. |
| BuildBuckets | Number of buckets in the hash table. |
| BuildHashTableTime | Time spent building the hash table. |
| RuntimeFilterBuildTime | Time spent building runtime filters. |
| RuntimeFilterNum | Number of runtime filters. |
| DistributionMode | Data distribution mode. |
Window Function operator
| Metric | Description |
|---|---|
| ComputeTime | Time spent computing window functions. |
| PartitionKeys | The partition key column. |
| AggregateFunctions | The aggregate functions. |
Sort operator
| Metric | Description |
|---|---|
| SortKeys | The sort key. |
| SortType | Sorting scope: all results, or only the top N results. |
| MergingTime | Time spent merging data. |
| SortingTime | Time spent sorting data. |
Table Function operator
| Metric | Description |
|---|---|
| TableFunctionExecTime | Time spent computing table functions. |
| TableFunctionExecCount | Number of times the table function is executed. |
Project operator
| Metric | Description |
|---|---|
| ExprComputeTime | Time spent computing expressions. |
| CommonSubExprComputeTime | Time spent computing common sub-expressions. |
Local Exchange operator
| Metric | Description |
|---|---|
| Type | Local Exchange type. Valid values: Passthrough, Partition, Broadcast. |
| ShuffleNum | Number of shuffles. Applies only when Type is Partition. |