EMR Serverless StarRocks provides monitoring and alerting features that allow you to view the status and key performance metrics of EMR Serverless StarRocks instances in real time. This helps you identify issues efficiently.
Limits
Only monitoring data from the previous 30 days is available.
Precautions
Some metrics are related to the root account, such as the Query metric. The root account is a dedicated account used to manage StarRocks instances. Users cannot view or use the root account.
Procedure
Go to the homepage of E-MapReduce (EMR) Serverless StarRocks.
Log on to the EMR console.
In the left-side navigation pane, choose .
In the top navigation bar, select a region based on your business requirements.
Click the ID of the instance.
Click the Monitoring And Alerting tab.
On the Monitoring And Alerting tab, configure the Resource Group and Select Time parameters to view specific metrics.
Valid values of the Resource Group parameter:
default_wg: the default resource group used by query tasks.
default_mv_wg: the default resource group used by materialized views.
Metrics
Instance
Overview
Metric
Description
FE Availability
The availability of frontend nodes (FEs).
BE/CN Availability
The availability of backend nodes (BEs) or compute nodes (CNs).
FE Count
The number of FEs.
BE or CN Count
The number of BEs or CNs.
Disk Usage (Avg)
The average disk usage of all BEs in the StarRocks instance.
Storage
The actual storage space used by StarRocks. This metric is available only for compute-storage separation scenarios. The value of the metric is updated with a delay of about one hour.
Compaction Score (Max)
The highest Compaction Score of each FE. This metric is available only for StarRocks shared-nothing instances.
FE Detection
The detection status of FEs. EMR Serverless StarRocks detects the status of FEs by sending HTTP requests. The value On indicates that the detection result is normal, and the value Off indicates that the detection fails.
BE/CN Node Status
The status of BE/CN nodes reported by FE. If the number of Alive nodes is abnormal, you can use the SHOW COMPUTE NODES command to view node details.
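When the number of Alive nodes looks wrong, the rows returned by SHOW COMPUTE NODES can be grouped by liveness to find the affected nodes. The following Python sketch assumes that the rows have already been fetched with a MySQL-compatible client and converted to dictionaries. The column names IP and Alive, the "true"/"false" values, and the sample rows are assumptions for illustration, so check them against the actual command output.

```python
from typing import Dict, List


def summarize_alive(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group node rows into alive and not-alive lists, keyed by node IP."""
    summary: Dict[str, List[str]] = {"alive": [], "not_alive": []}
    for row in rows:
        # Assumption: each row exposes "IP" and "Alive" columns,
        # with Alive reported as the string "true" or "false".
        is_alive = str(row.get("Alive", "")).strip().lower() == "true"
        key = "alive" if is_alive else "not_alive"
        summary[key].append(row.get("IP", "unknown"))
    return summary


# Hypothetical rows standing in for the output of SHOW COMPUTE NODES.
sample_rows = [
    {"IP": "10.0.0.1", "Alive": "true"},
    {"IP": "10.0.0.2", "Alive": "false"},
    {"IP": "10.0.0.3", "Alive": "true"},
]

result = summarize_alive(sample_rows)
print(result)
```

Any node that lands in the not_alive list is a candidate for further inspection.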
Query
Metric
Description
Queries per minute
The number of query tasks per minute.
Number of query faults per minute
The number of query errors per minute.
Query latency p99
The 99th percentile (P99) of query latency.
Slow Query
The number of slow queries per minute.
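To make the p99 figure concrete, the following Python sketch computes a 99th percentile from a list of latency samples using the nearest-rank method. The exact aggregation method used by the console is not documented here, so nearest-rank is an assumption, and the sample latencies are invented for illustration.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies: 200 queries taking 1 ms, 2 ms, ..., 200 ms.
latencies_ms = [float(i) for i in range(1, 201)]
p99 = percentile_nearest_rank(latencies_ms, 99)
print(p99)
```

A P99 value means that 99% of the sampled queries completed at or below that latency, which makes the metric sensitive to a small number of slow queries that an average would hide.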
FE
Metric
Description
FE transaction resolution statistics
The statistics on the transaction status of each FE or all FEs per minute.
FE Disk Usage
The data disk space used by each FE or all FEs. The metric value is updated every hour.
FE CPU
Metric
Description
CPU Util
The CPU utilization of each FE.
FE CPU Load 1min
The average CPU load of each FE in the previous minute.
FE Mem
Metric
Description
JVM Heap Usage
The ratio of used memory to maximum memory in the JVM heap.
JVM Young GC
The number of garbage collections performed in the young generation space of the Java virtual machine (JVM) and the time that they take.
JVM Heap
The usage of JVM heap memory.
JVM Old GC
The number of garbage collections performed in the old generation space of the JVM and the time that they take.
FE Net
Metric
Description
Network Receive Rate
The amount of data that is received per second.
Net Out
The amount of data that is sent per second.
FE Connections
The number of active connections to each FE.
Resource Group
Metric
Description
Query
The number of query tasks that run on the selected resource group per minute.
Query Latency p99
The 99th percentile (P99) of query latency.
Query (Resource Group)
The number of query tasks that run on all the resource groups per minute.
Materialized View
Metric
Description
MV Status
The status of materialized views. Valid values: 0 and 1. The value 0 indicates that the materialized view is active, and the value 1 indicates that the materialized view is inactive.
MV Refresh Duration p99
The 99th percentile (P99) of the time required to refresh materialized views.
MV Jobs (Total)
The total number of refresh tasks.
MV Jobs (Successful)
The number of successful refresh tasks.
Purge job failed
The number of failed refresh tasks.
Purge Job Empty
The number of refresh tasks that are canceled because no new data is available.
MV Jobs (Running)
The number of refresh tasks that are in progress.
Purge job pending
The number of refresh tasks that wait to run.
MV Hit Count
The number of queries that are rewritten on each materialized view, excluding the queries that are directly run on materialized views.
MV Query Count
The number of queries that are rewritten on each materialized view, including the queries that are directly run on materialized views.
Tables
Metric
Description
DataBase Tables
The distribution of tables across databases in the instance.
Table Count
The number of tables in the instance.
Tablet Count
The number of tablets in the instance.
Table Scan Bytes
The total amount of data scanned from non-system tables. Unit: bytes.
Table Load Bytes
The total amount of data imported to non-system tables. Unit: bytes.
Others
Metric
Description
Transfer Progress
The progress of table migration. This metric is applicable only to cluster migration scenarios.
Compute group
Overview
Metric
Description
CPU Util (Avg)
The average CPU utilization of all BEs or CNs.
Mem Util (Avg)
The average memory usage of all BEs or CNs.
Disk Usage (Max)
The highest usage among the data disks of all BEs or CNs.
BE/CN Node Status
The detection status of BEs or CNs. EMR Serverless StarRocks detects the status of BEs or CNs by sending HTTP requests. The value On indicates that the detection result is normal, and the value Off indicates that the detection fails.
Compaction
Metric
Description
Maximum Compaction Score
The highest compaction score of the FEs.
Mem (Compaction)
The memory used by compaction tasks.
Compaction Bytes
The amount of data that is compacted per minute during the base compaction and cumulative compaction process.
Compaction Rowsets
The number of rowsets that are compacted per minute during the base compaction and cumulative compaction process.
BE/CN
Metric
Description
Query Scan Bytes
The amount of data scanned during the queries on each BE.
Query Scan Rows
The number of rows scanned during the queries on each BE.
Request Statistics
The total number of requests on specific nodes, such as the requests to create tables, publish versions, and clone tables.
Engine Requests (Failed)
The number of failed requests on BEs, such as the requests to create tables, publish versions, and clone tables.
Transaction Requests
The statistics of transaction phases per minute.
BE/CN CPU
Metric
Description
CPU Util
The CPU utilization.
BE/CN CPU Load 1min
The average CPU load of specific nodes in the previous minute.
BE/CN Mem
Metric
Description
Memory utilization
The memory utilization of the node. It covers the BE/CN process memory, the memory used by UDFs, the memory reserved for BE/CN, and other items.
Process Mem (BE/CN)
Memory usage of the BE/CN process.
Process memory
The process memory depends on the memory items collected by the kernel. Memory items that are not fully collected and fall outside the collection scope are labeled as "Other". For more memory information, see Memory_management.
Node Mem
The memory of the node, divided into three components: pod available memory (Pod Avail Mem), process memory (Process Mem), and non-process memory (Non Process Mem).
Node mem (BE/CN)
BE/CN node memory includes: total node memory, 81% node memory threshold, node memory usage, and process memory usage. The upper limit of BE/CN available memory is jointly restricted by the 0.9 coefficient in the StarRocks code and the mem_limit configuration parameter (default: 0.9). By default, the actual available memory for BE/CN is 81% of total node memory.
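The 81% threshold follows from multiplying the two factors described for Node mem (BE/CN): 0.9 × 0.9 = 0.81. The following Python sketch reproduces that arithmetic. The function name, its parameters, and the 64 GiB node size are hypothetical and only illustrate the calculation.

```python
def effective_mem_limit_gib(
    total_node_mem_gib: float,
    mem_limit_ratio: float = 0.9,   # mem_limit configuration parameter (default: 0.9)
    code_coefficient: float = 0.9,  # fixed 0.9 coefficient in the StarRocks code
) -> float:
    """Return the memory, in GiB, that BE/CN can actually use on a node."""
    return total_node_mem_gib * code_coefficient * mem_limit_ratio


# Hypothetical node with 64 GiB of total memory.
limit = effective_mem_limit_gib(64)
print(f"{limit:.2f} GiB")  # prints "51.84 GiB"
```

Lowering mem_limit reduces the threshold proportionally. For example, a value of 0.8 yields 72% of total node memory under the same assumptions.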
BE/CN Disk
Metric
Description
Disk usage
The ratio of used disk space to total capacity, including Data, Trash, etc.
Used disk space
The absolute capacity of used disk space.
Disk Usage (Data)
The disk space occupied by data files on specific nodes.
Disk Usage (Data)
The disk usage of data files on specific nodes.
BE/CN Disk IO
Metric
Description
Read Traffic (SUM)
The read traffic of all disks per second on specific nodes.
Disk IO (Write)
The write traffic of all disks per second on specific nodes.
Disk IOPS (Read)
The number of read operations on all disks per second on specific nodes.
Disk IOPS (Write)
The number of write operations on all disks per second on specific nodes.
Disk IO Latency (Read)
The average read latency of all disks.
Disk IO Latency (Write)
The average write latency of all disks.
IO Util (Max)
The percentage of time that an I/O device, such as a disk or a network interface, is busy over a period of time.
BE/CN Net
Metric
Description
Net (In)
The amount of data that is received per second.
Net (Out)
The amount of data that is sent per second.
TCP connection count
The number of TCP connections.
Cache
Note: The metrics described in the following table are available only for compute-storage separation scenarios.
Metric
Description
FSLIB Cache Hit Ratio
The cache hit ratio per minute.
FSLIB Cache Hit/Miss
The number of cache hits and cache misses per minute.
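The two cache metrics are related: a hit ratio is conventionally the number of hits divided by the total number of lookups. The following Python sketch shows that relationship. It assumes the console follows the conventional definition, and the counts are invented for illustration.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = hits + misses
    if total == 0:
        # No cache lookups in the interval, so report 0.0 instead of dividing by zero.
        return 0.0
    return hits / total


# Hypothetical one-minute window: 900 hits and 100 misses.
ratio = cache_hit_ratio(hits=900, misses=100)
print(f"{ratio:.0%}")  # prints "90%"
```

Returning 0.0 for an idle interval is a design choice in this sketch. A monitoring chart might instead show a gap for such intervals.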
Storage
Note: The metrics described in the following table are available only for StarRocks shared-data instances.
Metric
Description
Storage
The amount of fully managed data. Unit: GiB.
Storage IO
The read and write traffic of fully managed data.
Resource Group
Metric
Description
Resource Group Use CPU Cores
The number of CPU cores used by a specific resource group. The value is an estimated average value within two consecutive sampling periods. This metric is available for StarRocks instances of V3.1.4 and later.
Resource Group CPU Usage (v2.x)
The ratio of the CPU time consumed by a specific resource group to the total CPU time.
Resource Group Mem Usage
The memory used by a specific resource group.
Running tasks
The number of query tasks that are running on a specific resource group.
Resource Group Concurrency Overflow
The number of queries that reach the concurrency limit in a specific resource group.
Number of times the large query limit is triggered
The number of times that the large query limit is reached in a specific resource group.
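The Resource Group Use CPU Cores metric is described as an estimated average between consecutive samples. One common way to derive such a value is to divide the CPU time consumed between two samples by the elapsed wall-clock time. The following Python sketch shows this approach. The cumulative-counter model, the CpuSample type, and the sample values are assumptions and are not confirmed to match how the metric is actually computed.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    timestamp_s: float  # wall-clock time of the sample, in seconds
    cpu_seconds: float  # cumulative CPU time used by the resource group, in seconds


def estimated_cores(prev: CpuSample, curr: CpuSample) -> float:
    """Estimate the average number of CPU cores used between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.cpu_seconds - prev.cpu_seconds) / elapsed


# Hypothetical samples taken 15 seconds apart: 30 CPU-seconds were consumed.
cores = estimated_cores(CpuSample(0.0, 100.0), CpuSample(15.0, 130.0))
print(cores)  # prints 2.0
```

Because the result is an average over the interval, short bursts that use more cores than the reported value can be hidden between samples.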
Others
Metric
Description
Page Cache Hit Rate
The proportion of requests that hit the page cache.
Publish Version Latency P99
The amount of time that is consumed to publish a version when data is written to StarRocks.
Storage
Data Storage
Metric
Description
Storage
The amount of fully managed data. Unit: GiB. This metric is available only for StarRocks shared-data instances. The value of the metric is updated with a delay of about one hour.
Storage IO
The read and write traffic of fully managed data. This metric is available only for StarRocks shared-data instances.
Disk Usage
Compute-storage separation
Metric
Description
Disk usage
The disk usage.
Used disk space
The amount of disk space used.
In-memory computing
Metric
Description
Free space percentage
The percentage of the available space of specific nodes.
Disk Usage (Avail)
The available disk space of specific nodes.
Disk Usage (Data)
The disk space occupied by data files on specific nodes.
Disk Usage (Data)
The disk usage of data files on specific nodes.
Disk Usage (Sum)
The usage of the available, cache, and data files on the disk.
Disk Usage (Sum)
Disk IO
Metric
Description
Disk IO (Read)
The read traffic of all disks per second on specific nodes.
Disk IO (Write)
The write traffic of all disks per second on specific nodes.
Disk IOPS (Read)
The number of read operations on all disks per second on specific nodes.
Disk IOPS (Write)
The number of write operations on all disks per second on specific nodes.
Disk IO Latency (Read)
The average read latency of all disks.
Disk IO Latency (Write)
The average write latency of all disks.
IO Util (Max)
The percentage of time that an I/O device, such as a disk or a network interface, is busy over a period of time.