
E-MapReduce: Health report

Last Updated: Mar 18, 2025

This topic describes the content of the health report of an E-MapReduce (EMR) Serverless StarRocks instance. This topic also explains the fields in the health report. Health reports are generated with a delay of one day. A health report consists of the following parts: SQL queries, table analysis, import tasks, and cache analysis.

View the health report of a StarRocks instance

  1. Navigate to the EMR Serverless StarRocks instance list page.

    1. Log in to the E-MapReduce console.

    2. In the left-side navigation pane, select EMR Serverless > StarRocks.

    3. In the top menu bar, select a region that suits your business needs.

  2. On the Instances tab, find the desired StarRocks instance and click the name of the instance.

  3. Click the Health Report tab.

    On the page that appears, you can view information on the SQL Query, Table Analysis, Import Tasks, and Cache Analysis tabs.

SQL Query

On this tab, you can view information in the following sections: Query Latency, DB Lock, SQL Analysis, and Parameterized SQL Analysis. You can configure the Date, SqlCategory, and Catalog parameters to view specific top SQL metrics.

Valid values of the SqlCategory parameter:

  • DML: DML statements that are used to query and modify data. Examples: SELECT, UPDATE, and DELETE statements.

  • DDL: DDL statements that are used to define and modify data structures. Examples: CREATE and ALTER statements.

  • Other: SQL statements other than DML and DDL statements, such as auxiliary statements. Example: SHOW statements.

Query Latency

This section displays the metrics related to query latency. The values of the metrics are obtained based on the audit data of a specific day.

For example, the Query Latency P99 metric specifies the latency within which 99% of queries complete. By characterizing the distribution of system response time, this metric provides insight into query performance and is essential for assessing service quality and user experience.

The Query Latency P99 metric reflects the response speed that most users actually experience. A large P99 value may indicate that you need to add computing resources or optimize the query logic to improve processing efficiency.

You can view the query latency data of a specific day to identify potential performance risks on an instance in advance and prevent your business from being affected.
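Percentiles such as P99 can also be computed directly in SQL. The following is a minimal sketch that assumes a hypothetical audit table named query_audit_log with query_time_ms and dt columns; the actual audit schema of your instance may differ.

    -- P99 latency of one day's queries from a hypothetical audit table.
    SELECT percentile_approx(query_time_ms, 0.99) AS query_latency_p99
    FROM query_audit_log
    WHERE dt = '2025-03-17';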


DB Lock

The DB Lock section monitors and analyzes lock contention that occurs during access to the database. To ensure data consistency and integrity, database locks control concurrent access to specific data rows or tables and prevent multiple transactions from modifying the same data resource at the same time. If a transaction must wait a long time to acquire a lock, the response speed and service quality of the overall system decrease. The data in this section helps you identify and troubleshoot such performance issues.
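If the report indicates lock contention, you can cross-check live sessions from any MySQL-compatible client that is connected to the instance. A minimal sketch:

    -- List current connections and their running statements to spot
    -- sessions that have been waiting for a long time.
    SHOW PROCESSLIST;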

SQL Analysis

This section displays the top SQL statements and related fields based on specific dimensions, such as query time, CPU consumption, and memory consumption. You can troubleshoot potential performance issues based on the fields. For example, you can view the top 10 SQL statements with the longest execution time on a specific day in the Slow Query Top 10 section. You can optimize the slow SQL statements based on the detailed information about query profiles. For more information about profile analysis, see Query profile.
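To collect a query profile for a slow statement, you can enable profile collection at the session level and rerun the statement. The following is a minimal sketch based on the StarRocks session variable enable_profile and the SHOW PROFILELIST statement; verify both against the documentation for your instance version.

    -- Collect profiles for subsequent queries in this session.
    SET enable_profile = true;
    -- After rerunning the slow query, list the collected profiles.
    SHOW PROFILELIST;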

The following table describes the fields.

| Field | Description |
| --- | --- |
| QueryId | The unique ID that is generated when an SQL statement is executed in StarRocks. A new ID is generated each time an SQL statement is executed. |
| User | The user who executed the SQL statement in StarRocks. |
| Query Time (ms) | The time consumed to execute the SQL statement. Unit: milliseconds. |
| CPU Cost (ns) | The CPU time consumed to execute the SQL statement. Unit: nanoseconds. |
| Memory Bytes | The memory consumed to execute the SQL statement. Unit: bytes. |
| Scan Bytes | The amount of data that is accessed during the execution of the SQL statement. Unit: bytes. |
| Scan Rows | The number of data rows that are accessed during the execution of the SQL statement. |
| Return Rows | The number of rows that are returned after the SQL statement is executed. |
| SQL Text | The text of the executed SQL statement. |

Parameterized SQL Analysis

SQL statement parameterization replaces constants in an SQL statement with the ? placeholder, deletes comments, and normalizes whitespace to generate a new statement that preserves the syntax of the original. Because all statements of the same type map to one parameterized statement, parameterization helps you analyze SQL statements of the same type in a comprehensive manner.

In the following example, parameterizing two SQL statements yields a single parameterized statement, which shows that the two statements are of the same type.

  • Original SQL statements

    SELECT * FROM orders WHERE customer_id=10 AND quantity>20
    
    SELECT * FROM orders WHERE customer_id=20 AND quantity>100
  • SQL statement after parameterization

    SELECT * FROM orders WHERE customer_id=? AND quantity>?

The Parameterized SQL Analysis section displays top SQL statements and related fields based on the execution count, execution time, coefficient of variation (CV) of execution time, total CPU consumption, total memory consumption, and execution error count.

Parameterized SQL statements can help you achieve the following purposes:

  • Obtain the overall execution status of SQL statements in StarRocks.

  • Optimize SQL statements that are frequently executed, take a long time to execute, and consume a large amount of CPU and memory resources to achieve greater optimization benefits.

  • Use the CV Of Query Time field to measure the stability of SQL execution time and identify potential performance issues. For example, if the values of the CV Of Query Time field for two SQL statements of the same type differ significantly, the statement with the larger value may be slowed by issues such as data skew or insufficient resources. A sketch that computes this CV from audit data appears after this list.

    The following table describes the key fields.

    | Field | Description |
    | --- | --- |
    | Parameterized SQL ID | The hash value of the parameterized SQL statement. |
    | CV Of Query Time | The ratio of the standard deviation of the execution time to the average execution time of SQL statements of the same type. In most cases, a larger value indicates a greater variance in execution time. |
    | Execution Count | The total number of times that the parameterized SQL statement was executed. |
    | Parameterized SQL Text | The text of the parameterized SQL statement. |

  • You can use the Execution Error Count field to identify the causes of SQL statement execution failures.

    The following table describes the key fields.

    | Field | Description |
    | --- | --- |
    | Parameterized SQL ID | The hash value of the parameterized SQL statement. |
    | Execution Error Count | The number of times that the parameterized SQL statement failed to execute. |
    | Execution Count | The total number of times that the parameterized SQL statement was executed. |
    | Parameterized SQL Text | The text of the parameterized SQL statement. |
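The CV Of Query Time field is the standard deviation of execution time divided by the average execution time. The following SQL sketch computes this value per parameterized SQL statement. It assumes a hypothetical audit table named query_audit_log with parameterized_sql_id, query_time_ms, and dt columns; the actual audit schema of your instance may differ.

    -- Hypothetical audit table with one row per SQL execution.
    SELECT
        parameterized_sql_id,
        stddev_samp(query_time_ms) / avg(query_time_ms) AS cv_of_query_time
    FROM query_audit_log
    WHERE dt = '2025-03-17'
    GROUP BY parameterized_sql_id
    ORDER BY cv_of_query_time DESC
    LIMIT 10;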

Table Analysis

This tab displays top tables and related fields based on the query frequency, types of SQL statements, and data distribution. You can optimize specific tables based on the fields. The following table describes the fields.

| Field | Description |
| --- | --- |
| SQL Exec Count | The total number of SQL statement executions on the table. In most cases, a large value indicates that the table requires fine-grained optimization to improve the performance of the StarRocks instance. |
| Associated Parameterized SQL Count | The number of parameterized SQL statements that are associated with the table. You can analyze the types of SQL statements that are executed on the table to optimize the table design. You can also identify commonalities among different types of SQL statements and check whether materialized views are required to accelerate queries on the table. |
| CV Data Size Of Tablet | The coefficient of variation (CV) of tablet data size in a partition, which reflects the distribution of tablet data in the table. The value is calculated by using the following formula: Standard deviation of tablet data size in a partition/Average tablet data size in the partition. A large value indicates that the partition may have data skew issues. |
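As a worked example of the formula, consider a partition whose three tablets hold 1 GB, 1 GB, and 10 GB of data. The following self-contained sketch computes the CV:

    -- CV = standard deviation / mean of tablet sizes within one partition.
    SELECT stddev_samp(size_gb) / avg(size_gb) AS cv_data_size_of_tablet
    FROM (
        SELECT 1.0 AS size_gb
        UNION ALL SELECT 1.0
        UNION ALL SELECT 10.0
    ) t;
    -- Returns approximately 1.3; a value this large suggests data skew.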

Import Tasks

This tab displays statistical data and analysis results of import tasks from multiple dimensions in the Too many versions and Top Analysis sections.

Note

Only statistical data and analysis results of import tasks of StarRocks shared-nothing instances are provided.

Too many versions

This section provides information about tables to which data is frequently imported based on the analysis of logs that contain the error message "too many versions". When the compaction score exceeds 1,000, the "Too many versions" error is reported. To resolve the issue, you can reduce the number of concurrent import tasks and decrease the import frequency.
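For example, if the frequent imports come from a Routine Load job, you can reduce the import frequency by increasing the batch interval. The following is a hedged sketch: the job name is a placeholder, and you should verify the max_batch_interval property against the Routine Load documentation for your StarRocks version.

    -- A Routine Load job must be paused before it can be altered.
    PAUSE ROUTINE LOAD FOR example_db.orders_routine_load;
    -- Schedule a batch at most every 30 seconds instead of the default 10.
    ALTER ROUTINE LOAD FOR example_db.orders_routine_load
    PROPERTIES ("max_batch_interval" = "30");
    RESUME ROUTINE LOAD FOR example_db.orders_routine_load;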

Top Analysis

Top Analysis of Potential Small Files When Data Load Table

The system analyzes the data files generated by all import tasks of each table to evaluate the severity of potential small file issues and obtain an impact score. This section displays the top 20 tables based on the impact score in descending order. Small files may decrease query performance and reduce compaction efficiency. To reduce the number of small files, we recommend that you perform the following operations:

  • Specify the number of partitions and buckets based on the actual data size of a table. See the DDL sketch after this list.

  • Increase the batch size of import tasks. Larger batches improve data processing efficiency and reduce the number of small files in Object Storage Service (OSS).

  • Adjust the compaction frequency if resources are insufficient. Compaction consolidates data files and improves system performance, but it also consumes system resources. Therefore, we recommend that you adjust the compaction frequency to balance resource usage.
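For the first recommendation, partition and bucket counts are set in the table DDL. The following is a minimal sketch of a StarRocks table that keeps the bucket count small for a modest data volume; all names and values are illustrative only.

    CREATE TABLE orders_demo (
        order_id    BIGINT,
        customer_id BIGINT,
        dt          DATE
    )
    DUPLICATE KEY(order_id)
    PARTITION BY RANGE(dt) (
        PARTITION p20250317 VALUES LESS THAN ("2025-03-18")
    )
    -- Few buckets for a small table: fewer tablets yield fewer, larger files.
    DISTRIBUTED BY HASH(order_id) BUCKETS 8;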

The formula used to calculate the impact score of small files varies based on the table type.

  • Primary key tables: Impact score of small files = Total number of files/Average file size. If the average file size is small and the number of files is large, small files have a greater impact on a table.

  • Non-primary key tables: Impact score of small files = Total number of files/Average time required to write data to a single file. If the average time required to write data to a file is short and the number of files is large, small files have a greater impact on a table.

You can use the preceding formulas to quantify the impact of small files and then optimize the top 20 tables, ranked by impact score in descending order, to improve the overall performance of the StarRocks instance.
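A worked example of both formulas with hypothetical numbers:

    -- Primary key table: 4,000 files with an average file size of 512 KB.
    SELECT 4000 / 512.0 AS pk_impact_score;       -- approximately 7.8
    -- Non-primary key table: 4,000 files, 20 ms average write time per file.
    SELECT 4000 / 20.0 AS non_pk_impact_score;    -- 200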

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Tables | The collection of information about the tables that are involved in an import task. |
| Table Type | The table type. Valid values: primary key tables and non-primary key tables. Non-primary key tables include duplicate key tables, aggregate tables, and update tables. |
| Small File Score | The score that evaluates the potential impact of small files on a table. A higher score indicates a greater impact. |
| Update Bucket Count | The total number of tablets that need to be updated for an import task. |
| Write File Count | The total number of segment files to which data is written. |
| AVG of Write File (KB) | The average amount of data written to a file. The value is calculated by using the following formula: Total data size/Total number of files. |
| AVG of Write Time (ms) | The average time required to write data to a file. The value is calculated by using the following formula: Total write time across all files/Total number of files. |

Top Hot Analysis Of Data Load Table

This section displays the top 20 tables based on the number of import tasks in descending order. These tables run import tasks most frequently and generate the largest number of data import transactions.

Hot Load Node Analysis

This section displays the statistical data of nodes to help you check whether data is evenly distributed. For example, you can compare the values of the Total Write Size (MB) field for brokers to check whether data is evenly written to the brokers.

Cache Analysis

This tab displays cache-related information about StarRocks shared-data instances in the Table Dimension, Parameterized SQL Dimension, and SQL Dimension sections. You can configure the Date parameter to view cache-related information of a specific day in different sections.

Note

The cache analysis feature is not applicable to StarRocks shared-nothing instances.

Usage notes

Increasing the cache hit ratio of StarRocks instances helps improve query performance. This tab provides a cache analysis of StarRocks instances based on the query profiles of the instances. You can query and obtain cache-related metrics in a query profile. This helps you implement an in-depth analysis of cache usage.

In a query profile, a CONNECTOR_SCAN operator is generated each time a table, view, or materialized view is scanned. The query profile provides various metrics for the CONNECTOR_SCAN operator. The following table describes specific cache-related metrics.

| Metric name | Description | Internal table | External table |
| --- | --- | --- | --- |
| Cache hit volume | The number of bytes read from the cache. | CompressedBytesReadLocalDisk | dataCacheReadBytes |
| Cache miss volume | The number of bytes read from the remote storage. | CompressedBytesReadRemote | FSIOBytesRead |
| Local I/O count | The number of I/O operations that are triggered when data is read from the local cache. | IOCountLocalDisk | dataCacheReadCounter |
| Remote I/O count | The number of I/O operations that are triggered when data is read from the remote storage. | IOCountRemote | FSIOCounter |
| Local I/O time | The I/O time consumed to read data from the local cache. | IOTimeLocalDisk | dataCacheReadTimer |
| Remote I/O time | The I/O time consumed to read data from the remote storage. | IOTimeRemote | FSIOTime |

You can calculate the values of the following cache-related metrics based on the preceding metrics:

  • Cache hit ratio = Cache hit volume/(Cache hit volume + Cache miss volume)

  • Local I/O time percentage = Total local I/O time/(Total local I/O time + Total remote I/O time)

  • Local I/O count percentage = Total local I/O count/(Total local I/O count + Total remote I/O count)
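For example, for an internal table whose CONNECTOR_SCAN metrics report CompressedBytesReadLocalDisk = 900 MB and CompressedBytesReadRemote = 100 MB, the cache hit ratio is calculated as follows (hypothetical numbers):

    -- Cache hit ratio = hit volume / (hit volume + miss volume).
    SELECT 900 / (900 + 100.0) AS cache_hit_ratio;   -- 0.9, that is, 90%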

The system extracts the preceding CONNECTOR_SCAN metrics for all SQL statements from the query profiles of a specific day, derives additional cache-related metrics such as the average cache hit volume and the cache hit ratio, and then displays the results in the Table Dimension, Parameterized SQL Dimension, and SQL Dimension sections of the Cache Analysis tab. You can view these sections and analyze cache usage based on your business requirements. The following table describes the sections.

| Section | Scenario |
| --- | --- |
| Table Dimension | Provides cache-related information about tables with high access frequencies, low cache hit ratios, or high cache miss volumes. You can optimize cache performance and improve query performance by using methods such as cache prefetching. |
| Parameterized SQL Dimension | Provides cache-related information about the parameterized SQL statements that generate the largest number of CONNECTOR_SCAN operators or have the longest average query time. You can optimize the parameterized SQL statements or prefetch cached data of specific tables that they query to improve query performance. |
| SQL Dimension | Provides cache-related information about SQL statements with long query times, low cache hit ratios, or high cache miss volumes. You can optimize the SQL statements or prefetch cached data of specific tables that they query to improve query performance. |

Table Dimension

This section displays top tables and related fields based on the access frequency, cache hit ratio, cache hit volume, and cache miss volume. In this section, you can perform the following operations:

  • View the cache hit statistics of the most frequently accessed tables. For example, if a table is accessed 6,411 times but its cache hit ratio is only 24.2%, you can increase the cache hit ratio of the table to improve query performance. To optimize cache performance for tables with high access frequencies and low cache hit ratios, you can prefetch cached data for the tables.

  • View tables with low cache hit ratios and tables with large cache miss volumes. You can prefetch cached data for these tables to optimize cache performance and improve query performance.

During actual analysis, you can determine whether to prefetch the cached data of a table based on multiple fields. For example, if a table appears in the Hot Table Top 20, Low Cache Hit Rate Top 20, and Average Cache Miss Volume Top 20 sections under Table Dimension at the same time, you should prefetch cached data for the table.
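One way to prefetch cached data is the CACHE SELECT statement that recent StarRocks versions (3.3 and later, according to the StarRocks documentation) provide for warming up the data cache on shared-data instances; verify its availability on your instance before relying on it. The table name below is a placeholder.

    -- Warm up the data cache for a frequently queried partition.
    CACHE SELECT * FROM sales_db.orders WHERE dt = '2025-03-17';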

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Scan Count | The number of CONNECTOR_SCAN operators generated for the table. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume). The total cache hit volume is the total amount of data read from the local cache by all CONNECTOR_SCAN operators; the total cache miss volume is the total amount of data read from the remote storage by all CONNECTOR_SCAN operators. |
| Local IO Time Percentage | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Average Cache Hit Volume | The average amount of data read from the cache per execution. Formula: Total cache hit volume/Number of SQL statement executions. |
| Average Cache Miss Volume | The average amount of data read from the remote storage per execution. Formula: Total cache miss volume/Number of SQL statement executions. |

Parameterized SQL Dimension

As described in the Parameterized SQL Analysis section, SQL statement parameterization replaces constants in an SQL statement with the ? placeholder, deletes comments, and normalizes whitespace so that all statements of the same type map to one parameterized statement. This helps you analyze SQL statements of the same type in a comprehensive manner.

This section displays parameterized SQL statements and specific fields based on the scan count, cache hit ratio, cache hit volume, and cache miss volume.

Parameterized SQL statements can help you achieve the following purposes:

  • View the cache hit statistics of top parameterized SQL statements by scan count. For example, if the number of scans of a parameterized SQL statement is 1,086, the average query time is 42.5 seconds, and the cache hit ratio is only 6.5%, you can optimize the parameterized SQL statement or prefetch cached data of specific tables to optimize cache performance and improve query performance.

  • View the cache hit statistics of top parameterized SQL statements by average query time. For example, a parameterized SQL statement with an average query time of 201 seconds but a scan count of only 4 corresponds to a cold query, and you can decide whether to optimize it based on your business requirements. By contrast, a parameterized SQL statement with an average query time of 42.5 seconds and a scan count of 1,086 deserves close attention.

  • View parameterized SQL statements with low cache hit ratios and parameterized SQL statements with high cache miss volumes, and determine whether to optimize them based on specific fields, such as Scan Count and Average Query Time(s).

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Parameterized SQL ID | The hash value of the parameterized SQL statement. |
| Scan Count | The total number of CONNECTOR_SCAN operators generated by all SQL statements that correspond to the parameterized SQL statement. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Time Percentage(%) | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Average Cache Hit Volume | The average amount of data read from the cache per execution. Formula: Total cache hit volume/Number of SQL statement executions. |
| Average Cache Miss Volume | The average amount of data read from the remote storage per execution. Formula: Total cache miss volume/Number of SQL statement executions. |
| Parameterized SQL | The text of the parameterized SQL statement. |

SQL Dimension

This section displays top SQL statements and specific fields based on the query time, cache hit ratio, cache hit volume, and cache miss volume.

You can perform an in-depth analysis of cache performance based on these fields. For example, you can view SQL statements with long query times, low cache hit ratios, or high cache miss volumes, and then optimize those statements or prefetch cached data of specific tables that they query to improve the cache hit ratio.

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Query ID | The unique ID that is generated when an SQL statement is executed in StarRocks. A new ID is generated each time an SQL statement is executed. |
| Query Time (s) | The time consumed to execute the SQL statement. Unit: seconds. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Time Percentage | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Table | The tables, views, or materialized views that are queried by the SQL statement. |
| Username | The user who executed the SQL statement in StarRocks. |
| sumSegmentInit | The total initialization time of the segments of all CONNECTOR_SCAN operators in the SQL query. |
| SQL | The text of the executed SQL statement. |