
E-MapReduce: Health report

Last Updated: Mar 18, 2025

This topic describes the content of the health report of an E-MapReduce (EMR) Serverless StarRocks instance. This topic also explains the fields in the health report. Health reports are generated with a delay of one day. A health report consists of the following parts: SQL queries, table analysis, import tasks, and cache analysis.

View the health report of a StarRocks instance

  1. Navigate to the EMR Serverless StarRocks instance list page.

    1. Log in to the E-MapReduce console.

    2. In the left-side navigation pane, select EMR Serverless > StarRocks.

    3. In the top menu bar, select a region that suits your business needs.

  2. On the Instances tab, find the desired StarRocks instance and click the name of the instance.

  3. Click the Health Report tab.

    On the page that appears, you can view information on the SQL Query, Table Analysis, Import Tasks, and Cache Analysis tabs.

SQL Query

On this tab, you can view information in the following sections: Query Latency, DB Lock, SQL Analysis, and Parameterized SQL Analysis. You can configure the Date, SqlCategory, and Catalog parameters to view specific top SQL metrics.

Valid values of the SqlCategory parameter:

  • DML: DML statements that are used to query and modify data. Examples: SELECT, UPDATE, and DELETE statements.

  • DDL: DDL statements that are used to define and modify data structures. Examples: CREATE and ALTER statements.

  • Other: SQL statements other than DML and DDL statements, such as auxiliary statements. Example: SHOW statements.

Query Latency

This section displays the metrics related to query latency. The values of the metrics are obtained based on the audit data of a specific day.

For example, the Query Latency P99 metric specifies the latency within which 99% of queries complete. By characterizing the distribution of system response time, this metric provides insight into query performance and is essential for assessing service quality and user experience.

The Query Latency P99 metric reflects the response speed that most users actually experience. A large P99 value may indicate that you need to add computing resources or optimize the query logic to improve processing efficiency.

You can view the query latency data of a specific day to identify potential performance risks on an instance in advance and prevent your business from being affected.
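Percentiles such as P99 can also be computed directly in SQL. The following is a minimal sketch that assumes a hypothetical audit table named query_audit_log with query_time_ms and dt columns; the actual audit schema of your instance may differ.

    -- P99 latency of one day's queries from a hypothetical audit table.
    SELECT percentile_approx(query_time_ms, 0.99) AS query_latency_p99
    FROM query_audit_log
    WHERE dt = '2025-03-17';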


DB Lock

The DB Lock section monitors and analyzes lock contention that occurs during access to the database. To ensure data consistency and integrity, database locks control concurrent access to specific data rows or tables and prevent multiple transactions from modifying the same data resource at the same time. If a transaction must wait a long time to acquire a lock, the response speed and service quality of the overall system decrease. The data in this section helps you identify and troubleshoot such performance issues.
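If the report indicates lock contention, you can cross-check live sessions from any MySQL-compatible client that is connected to the instance. A minimal sketch:

    -- List current connections and their running statements to spot
    -- sessions that have been waiting for a long time.
    SHOW PROCESSLIST;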

SQL Analysis

This section displays the top SQL statements and related fields based on specific dimensions, such as query time, CPU consumption, and memory consumption. You can troubleshoot potential performance issues based on the fields. For example, you can view the top 10 SQL statements with the longest execution time on a specific day in the Slow Query Top 10 section. You can optimize the slow SQL statements based on the detailed information about query profiles. For more information about profile analysis, see Query profile.
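To collect a query profile for a slow statement, you can enable profile collection at the session level and rerun the statement. The following is a minimal sketch based on the StarRocks session variable enable_profile and the SHOW PROFILELIST statement; verify both against the documentation for your instance version.

    -- Collect profiles for subsequent queries in this session.
    SET enable_profile = true;
    -- After rerunning the slow query, list the collected profiles.
    SHOW PROFILELIST;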

The following table describes the fields.

| Field | Description |
| --- | --- |
| QueryId | The unique ID that is generated when an SQL statement is executed in StarRocks. A new ID is generated each time an SQL statement is executed. |
| User | The user who executed the SQL statement in StarRocks. |
| Query Time (ms) | The time consumed to execute the SQL statement. Unit: milliseconds. |
| CPU Cost (ns) | The CPU time consumed to execute the SQL statement. Unit: nanoseconds. |
| Memory Bytes | The memory consumed to execute the SQL statement. Unit: bytes. |
| Scan Bytes | The amount of data that is accessed during the execution of the SQL statement. Unit: bytes. |
| Scan Rows | The number of data rows that are accessed during the execution of the SQL statement. |
| Return Rows | The number of rows that are returned after the SQL statement is executed. |
| SQL Text | The text of the executed SQL statement. |

Parameterized SQL Analysis

SQL statement parameterization replaces constants in an SQL statement with the ? placeholder, deletes comments, and normalizes whitespace to generate a new statement that preserves the syntax of the original. Because all statements of the same type map to one parameterized statement, parameterization helps you analyze SQL statements of the same type in a comprehensive manner.

In the following example, parameterizing two SQL statements yields a single parameterized statement, which shows that the two statements are of the same type.

  • Original SQL statements

    SELECT * FROM orders WHERE customer_id=10 AND quantity>20
    
    SELECT * FROM orders WHERE customer_id=20 AND quantity>100
  • SQL statement after parameterization

    SELECT * FROM orders WHERE customer_id=? AND quantity>?

The Parameterized SQL Analysis section displays top SQL statements and related fields based on the execution count, execution time, coefficient of variation (CV) of execution time, total CPU consumption, total memory consumption, and execution error count.

Parameterized SQL statements can help you achieve the following purposes:

  • Obtain the overall execution status of SQL statements in StarRocks.

  • Optimize SQL statements that are frequently executed, take a long time to execute, and consume a large amount of CPU and memory resources to achieve greater optimization benefits.

  • Use the CV Of Query Time field to measure the stability of SQL execution time and identify potential performance issues. For example, if the values of the CV Of Query Time field for two SQL statements of the same type differ significantly, the statement with the larger value may be slowed by issues such as data skew or insufficient resources. A sketch that computes this CV from audit data appears after this list.

    The following table describes the key fields.

    | Field | Description |
    | --- | --- |
    | Parameterized SQL ID | The hash value of the parameterized SQL statement. |
    | CV Of Query Time | The ratio of the standard deviation of the execution time to the average execution time of SQL statements of the same type. In most cases, a larger value indicates a greater variance in execution time. |
    | Execution Count | The total number of times that the parameterized SQL statement was executed. |
    | Parameterized SQL Text | The text of the parameterized SQL statement. |

  • You can use the Execution Error Count field to identify the causes of SQL statement execution failures.

    The following table describes the key fields.

    | Field | Description |
    | --- | --- |
    | Parameterized SQL ID | The hash value of the parameterized SQL statement. |
    | Execution Error Count | The number of times that the parameterized SQL statement failed to execute. |
    | Execution Count | The total number of times that the parameterized SQL statement was executed. |
    | Parameterized SQL Text | The text of the parameterized SQL statement. |
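The CV Of Query Time field is the standard deviation of execution time divided by the average execution time. The following SQL sketch computes this value per parameterized SQL statement. It assumes a hypothetical audit table named query_audit_log with parameterized_sql_id, query_time_ms, and dt columns; the actual audit schema of your instance may differ.

    -- Hypothetical audit table with one row per SQL execution.
    SELECT
        parameterized_sql_id,
        stddev_samp(query_time_ms) / avg(query_time_ms) AS cv_of_query_time
    FROM query_audit_log
    WHERE dt = '2025-03-17'
    GROUP BY parameterized_sql_id
    ORDER BY cv_of_query_time DESC
    LIMIT 10;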

Table Analysis

This tab displays top tables and related fields based on the query frequency, types of SQL statements, and data distribution. You can optimize specific tables based on the fields. The following table describes the fields.

| Field | Description |
| --- | --- |
| SQL Exec Count | The total number of SQL statement executions on the table. In most cases, a large value indicates that the table requires fine-grained optimization to improve the performance of the StarRocks instance. |
| Associated Parameterized SQL Count | The number of parameterized SQL statements that are associated with the table. You can analyze the types of SQL statements that are executed on the table to optimize the table design. You can also identify commonalities among different types of SQL statements and check whether materialized views are required to accelerate queries on the table. |
| CV Data Size Of Tablet | The coefficient of variation (CV) of tablet data size in a partition, which reflects the distribution of tablet data in the table. The value is calculated by using the following formula: Standard deviation of tablet data size in a partition/Average tablet data size in the partition. A large value indicates that the partition may have data skew issues. |
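As a worked example of the formula, consider a partition whose three tablets hold 1 GB, 1 GB, and 10 GB of data. The following self-contained sketch computes the CV:

    -- CV = standard deviation / mean of tablet sizes within one partition.
    SELECT stddev_samp(size_gb) / avg(size_gb) AS cv_data_size_of_tablet
    FROM (
        SELECT 1.0 AS size_gb
        UNION ALL SELECT 1.0
        UNION ALL SELECT 10.0
    ) t;
    -- Returns approximately 1.3; a value this large suggests data skew.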

Import Tasks

This tab displays statistical data and analysis results of import tasks from multiple dimensions in the Too many versions and Top Analysis sections.

Note

Only statistical data and analysis results of import tasks of StarRocks shared-nothing instances are provided.

Too many versions

This section provides information about tables to which data is frequently imported based on the analysis of logs that contain the error message "too many versions". When the compaction score exceeds 1,000, the "Too many versions" error is reported. To resolve the issue, you can reduce the number of concurrent import tasks and decrease the import frequency.
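For example, if the frequent imports come from a Routine Load job, you can reduce the import frequency by increasing the batch interval. The following is a hedged sketch: the job name is a placeholder, and you should verify the max_batch_interval property against the Routine Load documentation for your StarRocks version.

    -- A Routine Load job must be paused before it can be altered.
    PAUSE ROUTINE LOAD FOR example_db.orders_routine_load;
    -- Schedule a batch at most every 30 seconds instead of the default 10.
    ALTER ROUTINE LOAD FOR example_db.orders_routine_load
    PROPERTIES ("max_batch_interval" = "30");
    RESUME ROUTINE LOAD FOR example_db.orders_routine_load;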

Top Analysis

Top Analysis of Potential Small Files When Data Load Table

The system analyzes the data files generated by all import tasks of each table to evaluate the severity of potential small file issues and obtain an impact score. This section displays the top 20 tables based on the impact score in descending order. Small files may decrease query performance and reduce compaction efficiency. To reduce the number of small files, we recommend that you perform the following operations:

  • Specify the number of partitions and buckets based on the actual data size of a table. See the DDL sketch after this list.

  • Increase the batch size of import tasks. Larger batches improve data processing efficiency and reduce the number of small files in Object Storage Service (OSS).

  • Adjust the compaction frequency if resources are insufficient. Compaction consolidates data files and improves system performance, but it also consumes system resources. Therefore, we recommend that you adjust the compaction frequency to balance resource usage.
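For the first recommendation, partition and bucket counts are set in the table DDL. The following is a minimal sketch of a StarRocks table that keeps the bucket count small for a modest data volume; all names and values are illustrative only.

    CREATE TABLE orders_demo (
        order_id    BIGINT,
        customer_id BIGINT,
        dt          DATE
    )
    DUPLICATE KEY(order_id)
    PARTITION BY RANGE(dt) (
        PARTITION p20250317 VALUES LESS THAN ("2025-03-18")
    )
    -- Few buckets for a small table: fewer tablets yield fewer, larger files.
    DISTRIBUTED BY HASH(order_id) BUCKETS 8;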

The formula used to calculate the impact score of small files varies based on the table type.

  • Primary key tables: Impact score of small files = Total number of files/Average file size. If the average file size is small and the number of files is large, small files have a greater impact on a table.

  • Non-primary key tables: Impact score of small files = Total number of files/Average time required to write data to a single file. If the average time required to write data to a file is short and the number of files is large, small files have a greater impact on a table.

You can use the preceding formulas to quantify the impact of small files and then optimize the top 20 tables, ranked by impact score in descending order, to improve the overall performance of the StarRocks instance.
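A worked example of both formulas with hypothetical numbers:

    -- Primary key table: 4,000 files with an average file size of 512 KB.
    SELECT 4000 / 512.0 AS pk_impact_score;       -- approximately 7.8
    -- Non-primary key table: 4,000 files, 20 ms average write time per file.
    SELECT 4000 / 20.0 AS non_pk_impact_score;    -- 200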

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Tables | The collection of information about the tables that are involved in an import task. |
| Table Type | The table type. Valid values: primary key tables and non-primary key tables. Non-primary key tables include duplicate key tables, aggregate tables, and update tables. |
| Small File Score | The score that evaluates the potential impact of small files on a table. A higher score indicates a greater impact. |
| Update Bucket Count | The total number of tablets that need to be updated for an import task. |
| Write File Count | The total number of segment files to which data is written. |
| AVG of Write File (KB) | The average amount of data written to a file. The value is calculated by using the following formula: Total data size/Total number of files. |
| AVG of Write Time (ms) | The average time required to write data to a file. The value is calculated by using the following formula: Total write time across all files/Total number of files. |

Top Hot Analysis Of Data Load Table

This section displays the top 20 tables based on the number of import tasks in descending order. These tables run import tasks most frequently and generate the largest number of data import transactions.

Hot Load Node Analysis

This section displays the statistical data of nodes to help you check whether data is evenly distributed. For example, you can compare the values of the Total Write Size (MB) field for brokers to check whether data is evenly written to the brokers.

Cache Analysis

This tab displays cache-related information about StarRocks shared-data instances in the Table Dimension, Parameterized SQL Dimension, and SQL Dimension sections. You can configure the Date parameter to view cache-related information of a specific day in different sections.

Note

The cache analysis feature is not applicable to StarRocks shared-nothing instances.

Usage notes

Increasing the cache hit ratio of StarRocks instances helps improve query performance. This tab provides a cache analysis of StarRocks instances based on the query profiles of the instances. You can query and obtain cache-related metrics in a query profile. This helps you implement an in-depth analysis of cache usage.

In a query profile, a CONNECTOR_SCAN operator is generated each time a table, view, or materialized view is scanned. The query profile provides various metrics for the CONNECTOR_SCAN operator. The following table describes specific cache-related metrics.

| Metric name | Description | Internal table | External table |
| --- | --- | --- | --- |
| Cache hit volume | The number of bytes read from the cache. | CompressedBytesReadLocalDisk | dataCacheReadBytes |
| Cache miss volume | The number of bytes read from the remote storage. | CompressedBytesReadRemote | FSIOBytesRead |
| Local I/O count | The number of I/O operations that are triggered when data is read from the local cache. | IOCountLocalDisk | dataCacheReadCounter |
| Remote I/O count | The number of I/O operations that are triggered when data is read from the remote storage. | IOCountRemote | FSIOCounter |
| Local I/O time | The I/O time consumed to read data from the local cache. | IOTimeLocalDisk | dataCacheReadTimer |
| Remote I/O time | The I/O time consumed to read data from the remote storage. | IOTimeRemote | FSIOTime |

You can calculate the values of the following cache-related metrics based on the preceding metrics:

  • Cache hit ratio = Cache hit volume/(Cache hit volume + Cache miss volume)

  • Local I/O time percentage = Total local I/O time/(Total local I/O time + Total remote I/O time)

  • Local I/O count percentage = Total local I/O count/(Total local I/O count + Total remote I/O count)
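For example, for an internal table whose CONNECTOR_SCAN metrics report CompressedBytesReadLocalDisk = 900 MB and CompressedBytesReadRemote = 100 MB, the cache hit ratio is calculated as follows (hypothetical numbers):

    -- Cache hit ratio = hit volume / (hit volume + miss volume).
    SELECT 900 / (900 + 100.0) AS cache_hit_ratio;   -- 0.9, that is, 90%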

The system extracts the preceding CONNECTOR_SCAN metrics for all SQL statements from the query profiles of a specific day, derives additional cache-related metrics such as the average cache hit volume and the cache hit ratio, and then displays the results in the Table Dimension, Parameterized SQL Dimension, and SQL Dimension sections of the Cache Analysis tab. You can view these sections and analyze cache usage based on your business requirements. The following table describes the sections.

| Section | Scenario |
| --- | --- |
| Table Dimension | Provides cache-related information about tables with high access frequencies, low cache hit ratios, or high cache miss volumes. You can optimize cache performance and improve query performance by using methods such as cache prefetching. |
| Parameterized SQL Dimension | Provides cache-related information about the parameterized SQL statements that generate the largest number of CONNECTOR_SCAN operators or have the longest average query time. You can optimize the parameterized SQL statements or prefetch cached data of specific tables that they query to improve query performance. |
| SQL Dimension | Provides cache-related information about SQL statements with long query times, low cache hit ratios, or high cache miss volumes. You can optimize the SQL statements or prefetch cached data of specific tables that they query to improve query performance. |

Table Dimension

This section displays top tables and related fields based on the access frequency, cache hit ratio, cache hit volume, and cache miss volume. In this section, you can perform the following operations:

  • View the cache hit statistics of the most frequently accessed tables. For example, if a table is accessed 6,411 times but its cache hit ratio is only 24.2%, you can increase the cache hit ratio of the table to improve query performance. To optimize cache performance for tables with high access frequencies and low cache hit ratios, you can prefetch cached data for the tables.

  • View tables with low cache hit ratios and tables with large cache miss volumes. You can prefetch cached data for these tables to optimize cache performance and improve query performance.

During actual analysis, you can determine whether to prefetch the cached data of a table based on multiple fields. For example, if a table appears in the Hot Table Top 20, Low Cache Hit Rate Top 20, and Average Cache Miss Volume Top 20 sections under Table Dimension at the same time, you should prefetch cached data for the table.
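One way to prefetch cached data is the CACHE SELECT statement that recent StarRocks versions (3.3 and later, according to the StarRocks documentation) provide for warming up the data cache on shared-data instances; verify its availability on your instance before relying on it. The table name below is a placeholder.

    -- Warm up the data cache for a frequently queried partition.
    CACHE SELECT * FROM sales_db.orders WHERE dt = '2025-03-17';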

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Scan Count | The number of CONNECTOR_SCAN operators generated for the table. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume). The total cache hit volume is the total amount of data read from the local cache by all CONNECTOR_SCAN operators; the total cache miss volume is the total amount of data read from the remote storage by all CONNECTOR_SCAN operators. |
| Local IO Time Percentage | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Average Cache Hit Volume | The average amount of data read from the cache per execution. Formula: Total cache hit volume/Number of SQL statement executions. |
| Average Cache Miss Volume | The average amount of data read from the remote storage per execution. Formula: Total cache miss volume/Number of SQL statement executions. |

Parameterized SQL Dimension

As described in the Parameterized SQL Analysis section, SQL statement parameterization replaces constants in an SQL statement with the ? placeholder, deletes comments, and normalizes whitespace so that all statements of the same type map to one parameterized statement. This helps you analyze SQL statements of the same type in a comprehensive manner.

This section displays parameterized SQL statements and specific fields based on the scan count, cache hit ratio, cache hit volume, and cache miss volume.

Parameterized SQL statements can help you achieve the following purposes:

  • View the cache hit statistics of top parameterized SQL statements by scan count. For example, if the number of scans of a parameterized SQL statement is 1,086, the average query time is 42.5 seconds, and the cache hit ratio is only 6.5%, you can optimize the parameterized SQL statement or prefetch cached data of specific tables to optimize cache performance and improve query performance.

  • View the cache hit statistics of top parameterized SQL statements by average query time. For example, a parameterized SQL statement with an average query time of 201 seconds but a scan count of only 4 corresponds to a cold query, and you can decide whether to optimize it based on your business requirements. By contrast, a parameterized SQL statement with an average query time of 42.5 seconds and a scan count of 1,086 deserves close attention.

  • View parameterized SQL statements with low cache hit ratios and parameterized SQL statements with high cache miss volumes, and determine whether to optimize them based on specific fields, such as Scan Count and Average Query Time(s).

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Parameterized SQL ID | The hash value of the parameterized SQL statement. |
| Scan Count | The total number of CONNECTOR_SCAN operators generated by all SQL statements that correspond to the parameterized SQL statement. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Time Percentage(%) | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Average Cache Hit Volume | The average amount of data read from the cache per execution. Formula: Total cache hit volume/Number of SQL statement executions. |
| Average Cache Miss Volume | The average amount of data read from the remote storage per execution. Formula: Total cache miss volume/Number of SQL statement executions. |
| Parameterized SQL | The text of the parameterized SQL statement. |

SQL Dimension

This section displays top SQL statements and specific fields based on the query time, cache hit ratio, cache hit volume, and cache miss volume.

You can perform an in-depth analysis of cache performance based on these fields. For example, you can view SQL statements with long query times, low cache hit ratios, or high cache miss volumes, and then optimize those statements or prefetch cached data of specific tables that they query to improve the cache hit ratio.

The following table describes the key fields.

| Field | Description |
| --- | --- |
| Query ID | The unique ID that is generated when an SQL statement is executed in StarRocks. A new ID is generated each time an SQL statement is executed. |
| Query Time (s) | The time consumed to execute the SQL statement. Unit: seconds. |
| Cache Hit Ratio | The ratio of the data read from the local cache to the total data read. Formula: Total cache hit volume/(Total cache hit volume + Total cache miss volume), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Time Percentage | The ratio of the I/O time consumed to read data from the local cache to the total I/O time. Formula: Total local I/O time/(Total local I/O time + Total remote I/O time), where both totals are summed over all CONNECTOR_SCAN operators. |
| Local IO Count Percentage | The ratio of the I/O operations triggered to read data from the local cache to the total I/O operations. Formula: Total local I/O count/(Total local I/O count + Total remote I/O count), where both totals are summed over all CONNECTOR_SCAN operators. |
| Table | The tables, views, or materialized views that are queried by the SQL statement. |
| Username | The user who executed the SQL statement in StarRocks. |
| sumSegmentInit | The total initialization time of the segments of all CONNECTOR_SCAN operators in the SQL query. |
| SQL | The text of the executed SQL statement. |