Hologres: Hologres Monitoring Metrics in the Console

Last Updated: Feb 05, 2026

This topic explains the meaning of each Hologres monitoring metric. Understanding these metrics helps you select the most suitable ones for your business needs, monitor resource usage and SQL execution in real time, promptly detect system errors, and handle instance failures.

Important Notes

  • Notes about QE and FixedQE:

    • QE is a collective term for Hologres proprietary vector compute engines, such as HQE and SQE, under the XQE engine family. In slow query logs, queries with Engine Type={XQE} fall under the QE category in monitoring metrics.

    • FixedQE refers to queries that use the Fixed Plan path. In slow query logs, queries with Engine Type={FixedQE} (SDK in versions earlier than V2.2) fall under the FixedQE category in monitoring metrics.

  • Notes about Command Type:

    • The Command Type matches the SQL statement type. For example, INSERT xxx or INSERT xxx ON CONFLICT DO UPDATE/NOTHING are both classified as INSERT.

    • UNKNOWN: A classification for SQL statements that the DPI engine cannot recognize due to SQL syntax errors.

    • UTILITY: Administrative, definition, and control commands other than INSERT, UPDATE, DELETE, and SELECT.

      These include the following:

      • Data Definition Language (DDL): CREATE, ALTER, DROP, TRUNCATE, and COMMENT.

      • Transaction Control Language (TCL): BEGIN, COMMIT, ROLLBACK, and SAVEPOINT.

      • Administration and maintenance: ANALYZE, VACUUM, EXPLAIN, SET, SHOW, COPY, and REFRESH.

      • Execution and procedural control: PREPARE, EXECUTE, DEALLOCATE, CALL, and DECLARE CURSOR.

      • Others: LOCK TABLE, LISTEN, and NOTIFY.

  • In Cloud Monitor, each metric has a unique ID that makes specific metrics easier to find. Metric IDs use a different prefix for each instance type: standard_ for general-purpose instances, follower_ for follower instances, warehouse_ for Virtual Warehouse Instances, and shared_ for Lakehouse Acceleration (Shared Cluster). The metrics supported by each instance type are listed in the Monitoring Metrics Overview section.

  • If a metric shows no data, it may be because the current instance version does not support it or there has been no activity for an extended period.

  • Monitoring data is retained for up to 30 days.
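The prefix convention for Cloud Monitor metric IDs noted above can be applied mechanically. The following is a minimal sketch, assuming that a metric ID is simply the prefix followed by a base name. The dictionary keys and the base name cpu_usage are hypothetical labels chosen for illustration, not real Cloud Monitor identifiers.

```python
# Prefixes per instance type, taken from the note above.
# The dictionary keys are illustrative labels, not official type names.
METRIC_ID_PREFIXES = {
    "general-purpose": "standard_",
    "follower": "follower_",
    "virtual-warehouse": "warehouse_",
    "shared-cluster": "shared_",
}


def metric_id(instance_type: str, base_name: str) -> str:
    """Build a metric ID by prepending the instance-type prefix.

    Assumes the ID is prefix + base name. Look up the real IDs in
    Cloud Monitor, because `base_name` values here are hypothetical.
    """
    if instance_type not in METRIC_ID_PREFIXES:
        raise ValueError(f"unknown instance type: {instance_type!r}")
    return METRIC_ID_PREFIXES[instance_type] + base_name


print(metric_id("follower", "cpu_usage"))
```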

Access Control

The monitoring page in the Hologres console retrieves data from Cloud Monitor. If you use a Resource Access Management (RAM) user to view monitoring information, you must grant the appropriate permissions based on your business needs. These permissions include the following:

  • AliyunCloudMonitorFullAccess: Full management permissions for Cloud Monitor.

  • AliyunCloudMonitorReadOnlyAccess: Read-only access permissions for Cloud Monitor.

For more information about RAM user authorization, see Grant permissions to RAM users.

Monitoring Metrics Overview

The following monitoring metrics are available in Hologres:

Categorization

Metric

Description

Supported Instance Types

Notes

CPU

Instance CPU Usage (%)

The CPU usage of the instance.

General-purpose instance, follower instance, and compute group instance

None

Worker Node CPU Usage (%)

The CPU usage of each Worker node in the instance.

Cluster CPU Usage (%)

The CPU usage of each Cluster in the compute group.

Compute group instance

Supported only in Hologres V4.0 and later.

Memory

Instance Memory Usage (%)

The total memory usage of the instance.

General-purpose instance, follower instance, and compute group instance

None

Worker Node Memory Usage (%)

The memory usage of each Worker node in the instance.

Detailed Compute Group Memory Usage (%)

Memory usage is broken down by system, meta, cache, query, and background categories.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.0 and later.

QE Query Memory Usage (bytes)

The amount of memory used by queries that are executed by the QE engine.

Supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

QE Query Memory Usage (%)

The percentage of memory used by queries that are executed by the QE engine.

Cluster Memory Usage (%)

The memory usage of each Cluster in the compute group.

Compute group instance

Supported only in Hologres V4.0 and later.

Query QPS and RPS

Query QPS (count/s)

The total queries per second (QPS) across the instance.

General-purpose instance, follower instance, compute group instance, and shared cluster instance

Query QPS ≥ QE QPS + FixedQE QPS.

Note

The total QPS includes all queries, such as UNKNOWN, UTILITY, and Engine Type={PG}. Therefore, the total QPS is greater than or equal to the sum of QE QPS and FixedQE QPS.

QE Query QPS (count/s)

The QPS of queries that are executed by the QE engine.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.2 and later.

FixedQE Query QPS (count/s)

The QPS of queries that are executed by the FixedQE (formerly SDK) engine.

DML RPS (count/s)

The total rows per second (RPS) for DML queries in the instance.

General-purpose instance and compute group instance

DML RPS = QE RPS + FixedQE RPS

QE DML RPS (count/s)

The RPS of DML operations that are executed by the QE engine.

Supported only in Hologres V2.2 and later.

FixedQE DML RPS (count/s)

The RPS of DML operations that are executed by the FixedQE engine.

Query Latency

Query Latency (milliseconds)

The latency of queries in the instance.

General-purpose instance, follower instance, compute group instance, and shared cluster instance

None

QE Query Latency (milliseconds)

The latency of queries that are executed by the QE engine.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.2 and later.

FixedQE Query Latency (milliseconds)

The latency of queries that are executed by the FixedQE engine.

Optimization Phase Duration (milliseconds)

The duration of the Optimization phase for a query.

General-purpose instance, follower instance, compute group instance, and shared cluster instance

Supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

Start Query Phase Duration (milliseconds)

The duration of the Start Query phase for a query.

Get Next Phase Duration (milliseconds)

The duration of the Get Next phase for a query.

Query P99 Latency (milliseconds)

The P99 latency of queries.

None

Longest Running Query Duration in This Instance (milliseconds)

The duration of the longest-running query among those that are currently executing in the instance.

Failed Query QPS

Failed Query QPS (count/s)

The total number of failed queries per second in the instance.

General-purpose instance, follower instance, compute group instance, and shared cluster instance

Failed Query QPS ≥ QE Failed Query QPS + FixedQE Failed Query QPS.

Note

Failed Query QPS counts all failed queries, such as UNKNOWN, UTILITY, and Engine Type={PG}. Therefore, the total failed QPS is greater than or equal to the sum of QE Failed Query QPS and FixedQE Failed Query QPS.

QE Failed Query QPS (count/s)

The number of failed queries per second that are executed by the QE engine.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.2 and later.

FixedQE Failed Query QPS (count/s)

The number of failed queries per second that are executed by the FixedQE engine.

General-purpose instance and compute group instance

Locks

Maximum FE Lock Wait Time (milliseconds)

The wait time for DDL locks on FE nodes.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

FixedQE Backend Lock Wait Time (milliseconds)

The lock wait time for FixedQE, which is typically for HQE locks.

Total Backend Lock Wait Time for Instance (milliseconds)

The delay for HQE locks in the instance, which includes FixedQE HQE lock delays.

Connection

Total Connections (count)

The total number of connections used in the instance.

General-purpose instance, follower instance, compute group instance, and shared cluster instance

None

Connections by Database (count)

The number of connections used by each database in the instance.

General-purpose instance, follower instance, and compute group instance

Connections by FE (count)

The number of connections used by each FE in the instance.

Connection Usage Rate of FE with Highest Usage (%)

The connection usage rate in the instance, which defaults to the FE with the highest usage rate.

Query Queue

Queued Queries Count

The number of query requests that are waiting to be executed but have not yet been processed.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V3.0 and later.

Query Queue Entry QPS (count/s)

The number of query requests submitted to the system queue per second.

Queries Transitioned from Queued to Running QPS (count/s)

The number of query requests that transition from the waiting state to the running state per second.

QPS by State for Queries That Started Running (count/s)

The per-second request count for queries that have started running but have not yet completed, grouped by execution state.

Average Query Queue Wait Time (milliseconds)

The average time between entering the queue and starting processing. This does not include the actual query execution time.

Query Queue Auto-Rate-Limit Max Concurrency (count)

The maximum concurrency for auto-rate-limited query queues.

Compute group instance

Supported only in Hologres V3.1 and later.

I/O

Standard I/O Read Throughput (bytes/s)

The I/O throughput when reading Standard storage data.

General-purpose instance, follower instance, and compute group instance

None

Standard I/O Write Throughput (bytes/s)

The I/O throughput when writing Standard storage data.

General-purpose instance and compute group instance

Low-Frequency I/O Read Throughput (bytes/s)

The I/O throughput when reading IA storage data.

General-purpose instance, follower instance, and compute group instance

Low-Frequency I/O Write Throughput (bytes/s)

The I/O throughput when writing IA storage data.

General-purpose instance and compute group instance

Storage

Standard Storage Used Capacity (bytes)

The used capacity in Standard storage.

General-purpose instance and compute group instance

None

Standard Storage Usage (%)

The usage percentage of Standard storage capacity.

IA Storage Used Capacity (bytes)

The used capacity in IA storage.

IA Storage Usage (%)

The usage percentage of IA storage capacity.

Recycle Bin Storage Usage (bytes)

The storage used by the recycle bin.

General-purpose instance and compute group instance

Supported only in Hologres V3.1 and later.

Frameworks

FE Replay Delay (milliseconds)

The replay delay for each FE.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.2 and later.

Shard Multi-Replica Sync Delay (milliseconds)

The sync delay between Shard replicas after replication is enabled.

None

Primary-Follower Sync Delay (milliseconds)

The delay that occurs when a follower instance reads data from the primary instance. This is visible only for follower instances.

Cross-Instance File Sync Delay (milliseconds)

The file sync delay between disaster recovery instances.

General-purpose instance

Auto Analyze

Tables Missing Statistics per Database (count)

The number of tables that are missing statistics in each database.

General-purpose instance and compute group instance

Supported only in Hologres V2.2 and later.

Serverless Computing

Longest Running Serverless Computing Query Duration (milliseconds)

The duration of the longest-running query in Serverless Computing after it is enabled.

General-purpose instance and compute group instance

Supported only in Hologres V2.1 and later.

Serverless Computing Query Queue Count

The number of queries that are queued in the Serverless Computing resource pool.

Supported only in Hologres V2.2 and later.

Serverless Computing Resource Quota Usage (%)

The ratio of the actual Serverless Computing resources used to the maximum allocatable resources.

Binary Logging

Binlog Consumption Rate (count/s)

The number of Binlog entries consumed per second.

General-purpose instance, follower instance, and compute group instance

Supported only in Hologres V2.2 and later.

Binlog Consumption Rate (bytes/s)

The number of bytes consumed from Binlog per second.

WAL Sender Count per FE (count)

The number of WAL senders used per FE.

WAL Sender Usage Rate of FE with Highest Usage (%)

The WAL sender usage rate of the FE with the highest usage rate.

Computing Resource

Elastic Core Count for Compute Groups

The number of cores that are elastically added by time-based scaling in the compute group.

Compute group instance

Supported only in Hologres V2.2.21 and later.

Compute Group Auto-Elastic Core Count (count)

The number of cores that are elastically added by auto-scaling in the compute group.

Supported only in Hologres V4.0 and later.

Gateway

Gateway CPU Usage (%)

The CPU usage of each Gateway in the instance.

Compute group instance

Supported only in Hologres V2.0 and later.

Gateway Memory Usage (%)

The memory usage of each Gateway in the instance.

Supported only in Hologres V2.0 and later.

Gateway New Connection Requests per Second (count/s)

The maximum number of new connections that the system can accept and successfully establish per second.

Supported only in Hologres V2.1.12 and later.

Gateway Inbound Traffic Rate (B/s)

The volume of data that enters the system through the Gateway per second.

Supported only in Hologres V2.1 and later.

Gateway Outbound Traffic Rate (B/s)

The volume of data sent from the Gateway to external systems per second.

Supported only in Hologres V2.1 and later.

Dynamic Table

Instance-Level Dynamic Table Refresh Failure QPS (count/s)

The refresh failure QPS across all Dynamic Tables in the instance. You can use this metric to assess the overall health of the refresh process.

Supported only in Hologres V4.0.8 and later.

Dynamic Table Data Latency (seconds)

The latency of each Dynamic Table relative to the latest upstream base table data or expected timestamp, in seconds. You can use this metric to assess data freshness.

Dynamic Table Current Refresh Duration (milliseconds)

The current duration of the ongoing refresh task for each Dynamic Table, in milliseconds. You can use this metric to detect whether refresh cycles are lengthening.

Dynamic Table Refresh Failure QPM (count/minute)

The number of refresh failures per minute for each Dynamic Table. You can use this metric to evaluate the refresh stability of each table.

CPU

The following metrics relate to CPU usage.

Instance CPU Usage (%)

Instance CPU usage reflects the overall CPU load on the instance.

  • Even without active queries, background processes or asynchronous compaction tasks may consume CPU resources. A small amount of CPU usage during idle periods is normal.

  • Hologres efficiently leverages multi-core parallel computing. A single query can often push CPU usage to 100%, which indicates full utilization of compute resources.

  • If CPU usage remains near 100% for extended periods, such as three hours at 100% or twelve hours above 90%, the instance is under a heavy load. The CPU is likely the bottleneck in the system. You should investigate your workload and queries by considering the following questions:

    • Are large offline data imports (INSERT) occurring with growing data volumes?

    • Are high-QPS queries or writes consuming all CPU resources?

    • Is the load caused by a mix of the preceding workloads, or by workloads other than these?

  • If full CPU usage is required for your business needs, you can scale up the instance to handle more complex queries or larger datasets.

Note

For more information, see FAQ for monitoring metrics.
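The sustained-load rule of thumb above (three hours at 100%, or twelve hours above 90%) can be checked against a series of CPU samples. The following is a minimal sketch, assuming one sample per minute and a plain list of percentages as input. The function names are hypothetical, and in practice reported values may hover just below 100%, so the first threshold may need to be relaxed.

```python
from typing import Callable, Sequence


def longest_run(samples: Sequence[float], matches: Callable[[float], bool]) -> int:
    """Return the length of the longest run of consecutive matching samples."""
    longest = current = 0
    for value in samples:
        current = current + 1 if matches(value) else 0
        longest = max(longest, current)
    return longest


def cpu_is_sustained_high(samples_per_minute: Sequence[float]) -> bool:
    """Apply the two example thresholds: 3 h at 100% or 12 h above 90%."""
    minutes_at_full = longest_run(samples_per_minute, lambda v: v >= 100.0)
    minutes_above_90 = longest_run(samples_per_minute, lambda v: v > 90.0)
    return minutes_at_full >= 3 * 60 or minutes_above_90 >= 12 * 60
```

A result of True only indicates that the workload is worth investigating. It does not identify which of the causes listed above applies.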

Worker Node CPU Usage (%)

Worker node CPU usage reflects the CPU load on each Worker node. Hologres provides a varying number of Worker nodes depending on the instance type. For more information, see Instance management.

  • This metric is supported only in Hologres V1.1 and later.

  • If all Worker nodes show sustained CPU usage near 100%, the instance is heavily loaded. You can optimize resource usage or scale up the instance based on your workload.

  • If only some Worker nodes show high CPU usage while others have low usage, a resource skew exists. For common causes and troubleshooting steps, see FAQ for monitoring metrics.
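The distinction between a uniformly heavy load and a resource skew can be expressed as a small classifier over per-Worker usage. The following is a minimal sketch. The 90% and 50% cut-offs and the Worker names are assumptions chosen for illustration, not values documented by Hologres.

```python
from typing import Mapping


def classify_worker_load(
    usage_by_worker: Mapping[str, float],
    high: float = 90.0,  # assumed "near 100%" cut-off
    low: float = 50.0,   # assumed "low usage" cut-off
) -> str:
    """Classify per-Worker CPU usage as 'heavy', 'skewed', or 'ok'."""
    if not usage_by_worker:
        raise ValueError("no Worker samples given")
    hot = [w for w, u in usage_by_worker.items() if u >= high]
    cold = [w for w, u in usage_by_worker.items() if u <= low]
    if len(hot) == len(usage_by_worker):
        return "heavy"   # every Worker is busy: optimize or scale up
    if hot and cold:
        return "skewed"  # some Workers busy, others idle: check for skew
    return "ok"


print(classify_worker_load({"worker-0": 98.0, "worker-1": 20.0}))
```

The same check applies to per-Worker memory usage if the cut-offs are adjusted, for example to the 80% level mentioned for memory.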

Cluster CPU Usage (%)

The CPU usage of each Cluster in the compute group.

Memory

The following metrics relate to memory usage.

Instance Memory Usage (%)

Instance memory usage reflects the overall memory consumption.

  • Hologres reserves memory. Even without active queries, metadata, indexes, and data caches are loaded into memory to accelerate retrieval and computation. Therefore, non-zero memory usage during idle periods is normal. Typically, 30% to 40% usage is expected when the instance is idle.

  • If memory usage steadily climbs toward 80%, memory may become a bottleneck and affect stability or performance.

  • You can use memory distribution metrics along with QPS and other indicators to identify high-memory consumers and perform optimizations. For more information, see Troubleshooting guide for out-of-memory issues.

Worker Node Memory Usage (%)

Worker node memory usage reflects the memory load on each Worker node. Hologres provides a varying number of Worker nodes depending on the instance type. For more information, see Instance management.

  • This metric is supported only in Hologres V1.1 and later.

  • If all Worker nodes show sustained memory usage near 80%, the instance is heavily loaded. You can optimize resource usage or scale up the instance based on your workload.

  • If only some Worker nodes show high memory usage while others have low usage, a resource skew exists. For common causes and troubleshooting steps, see FAQ for monitoring metrics.

Detailed Compute Group Memory Usage (%)

Hologres divides memory into the following categories: system (System), metadata (Meta), cache (Cache), query (Query), and background process (Background). Starting in V2.0.15, memory distribution metrics can help you analyze usage patterns and optimize effectively. The key categories include the following:

  • System: The memory used by system components such as Holohub, Gateway, and Frontend (FE). The FE includes the FE Master and FE Query, so System memory fluctuates with query activity.

  • Cache: The memory used for caching. It includes the following:

    • SQL-related caches, such as the result cache and block cache. These caches change dynamically with query execution. Higher cache hit rates improve query performance. For example, smaller values in the Physical read bytes field of EXPLAIN ANALYZE indicate better cache hit rates. Caches have size limits.

    • Meta cache: Schema metadata and file metadata. To accelerate query execution, Hologres preloads relevant metadata into the cache, which reduces cold access and improves performance.

    • The cache size is fixed, typically at around 30% of the total instance memory. Some cache usage persists even when the instance is idle, which is mainly for Meta.

  • Meta: The memory used for metadata and files. Hologres uses a lazy open mode where frequently accessed metadata stays in memory, but infrequently accessed metadata does not. This mode reduces memory pressure. You should keep Meta usage under 30% of the total memory. High Meta usage suggests many files or partitioned tables. You can use Table statistics overview and analysis to manage tables.

  • Query: The memory consumed during SQL execution. The usage scales with query complexity and concurrency. This includes the memory used by Fixed Plan, HQE, and SQE.

    • Query memory uses elastic allocation. The minimum memory per Worker is 20 GB, and the maximum depends on the available free memory. Higher memory usage in other categories reduces the elastic memory available for Query.

    • High Query memory usage or out-of-memory (OOM) events suggest complex queries or high concurrency. You can optimize queries or scale up the instance. For more information, see Optimize query performance.

  • Background: The memory used by background tasks such as compaction and flush. Background memory usage is typically low, under 5%. It temporarily increases during index changes, bulk writes, or updates, and then drops as tasks are completed.

  • Memtable: The memory used for in-memory tables. Memtables store data after real-time writes, updates, or deletes. Memtable usage is typically under 5%.
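The per-category guidelines above (Meta under 30%, Background and Memtable each typically under 5%) can be compared with a memory breakdown. The following is a minimal sketch, assuming the breakdown is available as a mapping from a lowercase category name to a percentage of total memory. The key names are hypothetical.

```python
from typing import List, Mapping

# Guideline ceilings in percent of total memory, taken from the list above.
GUIDELINE_MAX_PERCENT = {
    "meta": 30.0,
    "background": 5.0,
    "memtable": 5.0,
}


def categories_over_guideline(breakdown: Mapping[str, float]) -> List[str]:
    """Return the categories whose usage reaches or exceeds the guideline."""
    return sorted(
        name
        for name, ceiling in GUIDELINE_MAX_PERCENT.items()
        if breakdown.get(name, 0.0) >= ceiling
    )


sample = {"system": 10.0, "meta": 35.0, "cache": 30.0,
          "query": 15.0, "background": 2.0, "memtable": 1.0}
print(categories_over_guideline(sample))
```

A flagged Background value is not necessarily a problem, because that category rises temporarily during index changes, bulk writes, or updates.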

QE Query Memory Usage (bytes)

The memory used by queries that are executed by HQE, SQE, or other XQE engines.

  • This metric is supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

  • In memory breakdowns, Query memory usage exceeds QE Query memory usage.

QE Query memory usage helps you assess query complexity. Higher usage indicates more complex queries that require more memory.

QE Query Memory Usage (%)

QE Query memory usage helps you assess the instance load. High usage may cause OOM errors. You can optimize queries or scale up the instance.

This metric is supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

Cluster Memory Usage (%)

The memory usage of each Cluster in the compute group.

Query QPS and RPS

Query QPS (count/s)

Query QPS is the average number of SQL statements executed per second across the instance. It includes SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements. Query QPS ≥ QE Query QPS + FixedQE Query QPS.
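Because the total covers more statement types than the two engine-level metrics, the difference between them approximates the rate of everything else, such as UTILITY, UNKNOWN, and Engine Type={PG} queries. The following is a minimal sketch of that subtraction. The function name is hypothetical, and clamping at zero is an assumption to guard against rounding in reported values.

```python
def other_qps(total_qps: float, qe_qps: float, fixed_qe_qps: float) -> float:
    """Estimate the QPS of queries not executed by QE or FixedQE.

    Relies on the relation Query QPS >= QE QPS + FixedQE QPS.
    The result is clamped at zero in case rounded inputs slightly
    violate the relation (an assumption, not documented behavior).
    """
    return max(total_qps - qe_qps - fixed_qe_qps, 0.0)


print(other_qps(120.0, 80.0, 30.0))
```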

QE Query QPS (count/s)

The number of queries executed per second by the QE engine. This includes SELECT, INSERT, UPDATE, and DELETE statements.

This metric is supported only in Hologres V2.2 and later.

FixedQE Query QPS (count/s)

The number of queries executed per second by the FixedQE engine (Fixed Plan path, formerly SDK). This includes SELECT, INSERT, UPDATE, and DELETE statements.

This metric is supported only in Hologres V2.2 and later.

DML RPS (count/s)

DML RPS is the average number of data records imported or updated per second. It includes INSERT, UPDATE, and DELETE statements. Therefore, DML RPS = QE DML RPS + FixedQE DML RPS.

QE DML RPS (count/s)

The number of data records imported or updated per second by the QE engine. This includes INSERT, UPDATE, and DELETE statements.

  • This metric is supported only in Hologres V2.2 and later.

  • Common QE scenarios include the following:

    • Batch import or update from MaxCompute or OSS external tables.

    • Batch write or update using COPY.

    • Batch import between Hologres tables.

FixedQE DML RPS (count/s)

The number of data records imported or updated per second by INSERT, UPDATE, and DELETE statements that are executed by the FixedQE engine (formerly SDK).

Query Latency

Query Latency (milliseconds)

The average latency of all queries in the instance. This includes SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements. Query Latency ≥ MAX(QE Query Latency, FixedQE Query Latency).

QE Query Latency (milliseconds)

The average latency of queries that are executed by the QE engine. This includes SELECT, INSERT, UPDATE, and DELETE statements.

  • This metric is supported only in Hologres V2.2 and later.

  • To troubleshoot increased QE Query latency, you can check the Optimization duration, Start Query duration, Get Next duration, and QE QPS.

FixedQE Query Latency (milliseconds)

The average latency of queries that are executed by the FixedQE engine. This includes SELECT, INSERT, UPDATE, and DELETE statements.

  • This metric is supported only in Hologres V2.2 and later.

  • High FixedQE Query latency may result from the following reasons:

    • Occasional spikes: These may indicate HQE locks. You can check whether the FixedQE backend lock wait time has increased. If it has, you can use Query Insight to identify the locking queries.

    • Persistent high latency: This may result from a suboptimal table design or interference from complex queries. See Common issues and diagnostics for Blink and Flink.

Optimization Phase Duration (milliseconds)

The time spent in the Optimization phase for a query. During this phase, the optimizer parses the SQL statement and generates a physical plan for the execution engine.

  • This metric is supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

  • Long Optimization durations suggest complex queries. If queries differ only in their parameters, you can use Prepared Statements to reduce optimization overhead. For more information, see JDBC.

Start Query Phase Duration (milliseconds)

The time spent in the Start Query phase, which is the initialization before the actual query execution. This includes locking and schema version alignment.

  • This metric is supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

  • Long Start Query durations often result from lock waits or high CPU usage. You can use execution plans for deeper analysis.

Get Next Phase Duration (milliseconds)

The time from the end of the Start Query phase until all results are returned. This includes computation and result delivery.

  • This metric is supported only in Hologres V2.0.44 and later, and V2.1.22 and later.

  • Long Get Next durations often reflect complex computations. You can correlate this with QE memory usage and QE QPS. If no anomalies exist, the client may simply be waiting to receive the results.
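When QE Query latency rises, comparing the three phase durations shows where to look first. The following is a minimal sketch that picks the longest phase and returns the matching hint from the descriptions above. The dictionary keys and the function name are hypothetical, and the hints are condensed from this section rather than taken from any Hologres API.

```python
from typing import Mapping, Tuple

# Troubleshooting hints condensed from the phase descriptions above.
PHASE_HINTS = {
    "optimization": "complex SQL; consider prepared statements for "
                    "queries that differ only in parameters",
    "start_query": "lock waits or high CPU usage; inspect the execution plan",
    "get_next": "heavy computation or a slow-reading client; compare with "
                "QE memory usage and QE QPS",
}


def dominant_phase(durations_ms: Mapping[str, float]) -> Tuple[str, str]:
    """Return the phase with the longest duration and its hint.

    Missing phases count as 0 ms. Ties resolve to the earlier phase.
    """
    phase = max(PHASE_HINTS, key=lambda name: durations_ms.get(name, 0.0))
    return phase, PHASE_HINTS[phase]


phase, hint = dominant_phase(
    {"optimization": 5.0, "start_query": 400.0, "get_next": 120.0}
)
print(phase, "->", hint)
```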

Query P99 Latency (milliseconds)

The P99 latency of all queries in the instance. This includes SELECT, INSERT, UPDATE, UTILITY, and system queries.

Longest Running Query Duration in This Instance (milliseconds)

The duration of the longest-running query in the instance. This metric reports the longest-running query at the current moment. It includes SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements.

  • This metric is supported only in Hologres V1.1 and later.

  • Hologres is a distributed system. The number of Worker nodes varies by instance type. Queries are randomly distributed across Workers. This metric reports the longest-running query across all Workers. For example, if Workers run queries for 10 minutes, 5 minutes, and 30 seconds, the reported duration is 10 minutes.

  • You can combine this metric with active queries or slow query logs to assess query duration, diagnose long-running queries, and resolve deadlocks or hangs.

Note

Metrics are reported every minute. Therefore, the "current running duration" starts slightly after the query begins. This metric aids in anomaly detection by helping you quickly locate long-running queries, but it does not provide precise timing.

Failed Query QPS

Failed Query QPS (count/s)

Failed Query QPS is the average number of SQL statements that fail per second in the instance. It includes SELECT, INSERT, UPDATE, DELETE, UTILITY, and UNKNOWN statements. Failed Query QPS ≥ QE Failed Query QPS + FixedQE Failed Query QPS.

You can use the failed query type and frequency to find failing queries in the slow query logs. You can then analyze the root causes to improve availability.

QE Failed Query QPS (count/s)

The number of queries that fail per second when using the QE engine. This includes SELECT, INSERT, UPDATE, and DELETE statements.

This metric is supported only in Hologres V2.2 and later.

FixedQE Failed Query QPS (count/s)

The number of queries that fail per second when using the FixedQE engine. This includes SELECT, INSERT, UPDATE, and DELETE statements.

This metric is supported only in Hologres V2.2 and later.

Locks

Maximum FE Lock Wait Time (milliseconds)

Hologres is a distributed system. Multiple FE nodes parse, dispatch, and route SQL statements. When multiple connections are routed to the same FE and perform DDL operations on the same table, such as CREATE or DROP, FE locks occur. This metric shows how long each FE waits for DDL locks.

  • This metric is supported only in Hologres V2.2 and later.

  • DDL operations always incur a lock wait time. If the FE lock wait time exceeds five minutes and the FE replay delay also spikes, a DDL operation may be stuck. You can use Manage queries to find and terminate long-running queries.
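The stuck-DDL heuristic above combines two signals: an FE lock wait longer than five minutes and a simultaneous spike in FE replay delay. The following is a minimal sketch. Because "spike" is not quantified in this topic, the comparison against a baseline multiplied by spike_factor is an assumption, as are the function and parameter names.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000


def ddl_may_be_stuck(
    fe_lock_wait_ms: float,
    replay_delay_ms: float,
    replay_baseline_ms: float,
    spike_factor: float = 3.0,  # assumed definition of a "spike"
) -> bool:
    """Return True when both stuck-DDL signals are present."""
    lock_wait_too_long = fe_lock_wait_ms > FIVE_MINUTES_MS
    replay_spiked = replay_delay_ms > spike_factor * replay_baseline_ms
    return lock_wait_too_long and replay_spiked


print(ddl_may_be_stuck(400_000, 9_000, 1_000))
```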

FixedQE Backend Lock Wait Time (milliseconds)

INSERT, DELETE, or UPDATE queries that use HQE take table locks. Queries that use FixedPlan take row locks. The FixedQE backend lock wait time increases when FixedPlan queries wait for row locks while HQE queries hold table locks on the same table.

  • This metric is supported only in Hologres V2.2 and later.

  • If the FixedQE lock wait time is high, you can use slow query logs to find slow FixedQE queries. Then, you can use Query Insight to identify the locking HQE queries.

Total Backend Lock Wait Time for Instance (milliseconds)

The total lock wait time for INSERT, DELETE, or UPDATE queries in the instance. This includes FixedQE and HQE lock wait times.

  • This metric is supported only in Hologres V2.2 and later.

  • If the lock wait time is high, you can use slow query logs to find slow INSERT, DELETE, or UPDATE queries. Then, you can use Query Insight to identify the locking HQE queries.

Connection

Total Connections (count)

Hologres sets default connection limits based on the instance type. For more information, see Instance management. Total connections represent all active connections, including those in active, idle, and idle-in-transaction states. You can use Manage queries to view the current usage. You should kill idle connections if the number of available connections is low.

Connections by Database (count)

The number of connections aggregated by database. You can use this to assess the connection usage for each database. Note the following:

  • The default connection limit per database is 128. For more information, see Instance management.

  • If the number of connections approaches the limit, you should review the idle connections versus business connections. For more information, see Connection management. You can clean up idle connections or scale up to add capacity.

  • If the connection load skews across Workers, you can use Connection management to clean up idle connections and balance the load.

Connections by FE (count)

The number of connections aggregated by FE. You can use this to assess the connection usage for each FE. Note the following:

  • The default connection limit per FE node is 128. For more information, see Instance management.

  • If the number of connections approaches the limit, you should review the idle connections versus business connections. For more information, see Connection management. You can clean up idle connections or scale up to add capacity.

  • If the connection load skews across FE nodes, you can use Connection management to clean up idle connections and balance the load.

Connection Usage Rate of FE with Highest Usage (%)

This metric reports the highest connection usage rate among all FE (Frontend) nodes: Max(frontend_connection_used_rate). This helps you spot when connections are approaching the limit on any FE node and prevent connection failures. FE nodes use round-robin load balancing where new connections are distributed evenly across FEs. You can use Manage queries to view the current usage. You should kill idle connections if the number of available connections is low.
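The Max(frontend_connection_used_rate) calculation can be reproduced from per-FE connection counts. The following is a minimal sketch, assuming the default limit of 128 connections per FE node mentioned earlier. The FE names and the function name are hypothetical, and the real limit depends on the instance type.

```python
from typing import Mapping, Tuple


def max_fe_connection_usage(
    connections_by_fe: Mapping[str, int],
    limit_per_fe: int = 128,  # default per-FE limit cited in this topic
) -> Tuple[str, float]:
    """Return the busiest FE and its connection usage rate in percent."""
    if not connections_by_fe:
        raise ValueError("no FE samples given")
    fe, used = max(connections_by_fe.items(), key=lambda item: item[1])
    return fe, 100.0 * used / limit_per_fe


print(max_fe_connection_usage({"fe-0": 32, "fe-1": 96, "fe-2": 64}))
```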

Query Queue

Queued Queries Count (count)

The number of query requests that are waiting to be executed but have not yet been processed.

This metric is supported only in Hologres V3.0 and later.

Query Queue Entry QPS (count/s)

The number of query requests submitted to the system queue per second. You can use this to gauge the system load and query frequency.

This metric is supported only in Hologres V3.0 and later.

Queries Transitioned from Queued to Running QPS (count/s)

The number of query requests that transition from the waiting state to the running state per second.

This metric is supported only in Hologres V3.0 and later.

QPS by State for Queries That Started Running (count/s)

The QPS for queries in the query queue, grouped by state. The states include the following:

  • kReadyToRun (qualified to run)

  • kQueueTimeout (failed due to queue timeout)

  • kCanceled (failed due to cancellation)

  • kExceedConcurrencyLimit (failed due to concurrency limit)

This metric is supported only in Hologres V3.0 and later.

Average Query Queue Wait Time (milliseconds)

The average time, in milliseconds, that a query waits between entering the queue and starting execution. The query execution time itself is not included.

This metric is supported only in Hologres V3.0 and later.
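To make the definition of the average wait time concrete, here is a minimal Python sketch that averages the wait of each query from its enqueue time to its start time. The function name and the sample timestamps are hypothetical. Hologres computes the real metric internally.

```python
def average_queue_wait_ms(queries):
    """Return the mean queue wait in milliseconds.

    queries is an iterable of (enqueue_ms, start_ms) pairs. Execution time
    after start_ms is deliberately not part of the calculation.
    """
    waits = [start_ms - enqueue_ms for enqueue_ms, start_ms in queries]
    if not waits:
        return 0.0
    return sum(waits) / len(waits)


# Hypothetical sample: three queries that waited 40 ms, 20 ms, and 60 ms.
sample = [(0, 40), (10, 30), (20, 80)]
print(average_queue_wait_ms(sample))  # 40.0
```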

Query Queue Auto-Rate-Limit Max Concurrency (count)

The maximum concurrency for auto-rate-limited query queues.

This metric is supported only in Hologres V3.1 and later.

I/O

I/O throughput measures the read and write volume of the instance. It reflects disk I/O activity and helps you assess the system load and diagnose issues. Note: 1 GiB = 1024 MiB = 1024 × 1024 KiB.

Note
  • Standard storage (hot storage): The I/O throughput is not fixed. It mainly depends on the CPU load.

  • IA storage (cold storage): The maximum I/O throughput is 80 MB/s × (number of cores / 16).
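The IA throughput ceiling above is simple arithmetic. The following Python sketch evaluates it for a given core count. The function name is hypothetical, and the formula is taken directly from the note.

```python
def ia_max_io_throughput_mb_per_s(cores):
    """Maximum IA (cold) storage I/O throughput in MB/s for an instance.

    Implements 80 MB/s * (number of cores / 16) from the note above.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 80.0 * cores / 16


for cores in (16, 32, 64):
    print(cores, ia_max_io_throughput_mb_per_s(cores))
# 16 80.0
# 32 160.0
# 64 320.0
```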

Standard I/O Read Throughput (bytes/s)

The I/O throughput when queries read Standard storage data.

Standard I/O Write Throughput (bytes/s)

The I/O throughput when queries write Standard storage data.

Low-frequency I/O Read Throughput (bytes/s)

The I/O throughput when queries read IA storage data.

Low-frequency I/O Write Throughput (bytes/s)

The I/O throughput when queries write IA storage data.

Storage

The logical disk space used by instance data, which is the sum of all database storage, including the recycle bin. Note: 1 GiB = 1024 MiB = 1024 × 1024 KiB. Hologres storage usage grows continuously with no hard cap.

For subscription instances, storage that exceeds the purchased amount is automatically billed on a pay-as-you-go basis. This does not impact system stability or usability.

If your usage exceeds the purchased storage capacity, you should promptly upgrade the storage or delete unused data to avoid unnecessary storage costs. The savings can be used to fund additional compute resources.

You can use the pg_relation_size function to view table and database storage sizes and details. You can also use Table Info for fine-grained table management.

Standard Storage Used Capacity (bytes)

The capacity used in Standard storage. You should scale up the storage if the usage exceeds the purchased capacity.

Standard Storage Usage (%)

The usage percentage of Standard storage capacity. You should scale up the storage if the usage exceeds the purchased capacity.

IA Storage Used Capacity (bytes)

The capacity used in IA storage. You should scale up the storage if the usage exceeds the purchased capacity.

IA Storage Usage (%)

The usage percentage of IA storage capacity. You should scale up the storage if the usage exceeds the purchased capacity.
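The relationship between the capacity metrics (bytes) and the usage metrics (%) can be sketched as follows. This Python example assumes that the usage percentage is the used capacity divided by the purchased capacity. The function name and the sample sizes are hypothetical.

```python
GIB = 1024 ** 3  # 1 GiB = 1024 MiB = 1024 * 1024 KiB = 1024**3 bytes


def storage_usage(used_bytes, purchased_bytes):
    """Return (usage_percent, overage_gib).

    overage_gib is the amount above the purchased capacity, which for
    subscription instances is billed on a pay-as-you-go basis. It is 0
    when usage is within the purchased capacity.
    """
    if purchased_bytes <= 0:
        raise ValueError("purchased_bytes must be positive")
    usage_percent = 100.0 * used_bytes / purchased_bytes
    overage_gib = max(0, used_bytes - purchased_bytes) / GIB
    return usage_percent, overage_gib


# Hypothetical sample: 600 GiB used against 500 GiB purchased.
print(storage_usage(600 * GIB, 500 * GIB))  # (120.0, 100.0)
```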

Recycle Bin Storage Usage (bytes)

Hologres supports a table recycle bin starting in V3.1. Tables that are dropped using the DROP command remain in the recycle bin for a retention period. This lets you recover accidentally dropped tables. Tables in the recycle bin still consume instance storage. You should monitor the recycle bin usage for each database. If frequent table drops cause high recycle bin usage, you can configure tables to skip the recycle bin upon deletion.

Framework

FE Replay Delay (milliseconds)

Hologres is a distributed system. Multiple Frontend (FE) nodes handle SQL parsing, dispatch, and routing. For DDL operations, Hologres first executes the operation on one FE and then replays it on the others. Note the following:

  • FE replay takes time. Delays at the millisecond or second level are normal.

  • If an FE's replay delay exceeds several minutes, too many DDL operations may overwhelm the replay process. If the delay continues to increase, a query may be stuck. You can use hg_stat_activity to find and kill long-running queries.

  • This metric is supported only in Hologres V2.2 and later.

Shard Multi-Replica Sync Delay (milliseconds)

The sync delay between Shard replicas after Replication is enabled.

  • The typical Shard replica delay is in milliseconds.

  • Heavy data writes, updates, or frequent DDL operations may increase the sync delay.

Primary-Follower Sync Delay (milliseconds)

The delay that occurs when a follower instance reads data from the primary instance, in milliseconds. Note the following:

  • This metric appears only for follower instances, not primary instances.

  • Data appears only after a follower instance is bound to a primary instance (0 ms initially). The sync delay fluctuates when the primary instance receives writes.

  • The normal sync delay is in milliseconds. Occasional jitter, for example, from primary DDL operations, is safe to ignore. A persistent high delay of more than a few seconds may indicate a high instance load or a resource shortage. You should check the CPU and memory usage and scale up the instance if needed.

  • The sync delay may spike to several minutes during restarts or upgrades and then recover automatically.

Cross-Instance File Sync Delay (milliseconds)

The file sync delay between disaster recovery instances. This metric appears only on follower instances (read-only followers).

Auto Analyze

Tables Missing Statistics per Database (count)

The number of tables that are missing statistics in each database.

  • This metric is supported only in Hologres V2.2 and later.

  • For Hologres V2.0 and later, Auto Analyze runs by default. After a table is created or after bulk writes or updates, the statistics may lag. You should first observe the statistics for a short period.

  • If a database consistently lacks statistics for hours or days, Auto Analyze may not have been triggered. You can use the HG_STATS_MISSING view to list the affected tables and then manually run the ANALYZE command to update the statistics. For more information, see ANALYZE and AUTO ANALYZE.

Serverless Computing

Longest Running Serverless Computing Query Duration (milliseconds)

Hologres supports Serverless Computing. You can run specific queries in a dedicated Serverless Computing resource pool to isolate them from the main instance and ensure fast execution.

  • This metric is supported only in Hologres V2.1 and later.

  • This metric shows the longest-running query in Serverless Computing. You can use hg_stat_activity to inspect the status of Serverless queries.

Serverless Computing Query Queue Count (count)

The number of queries that are queued in the Serverless Computing resource pool.

This metric is supported only in Hologres V2.2 and later.

Serverless Computing Resource Quota Usage (%)

The ratio of the actual Serverless Computing resources used to the maximum allocatable resources over a given time.

This metric is supported only in Hologres V2.2 and later.

Binary Logging

Binlog Consumption Rate (count/s or bytes/s)

Hologres supports subscribing to Hologres Binlog. Binlog enables real-time data tiering and accelerates data forwarding.

Binlog Consumption Rate (count/s)

The number of Binlog entries consumed per second. This metric is supported only in Hologres V2.2 and later.

Binlog Consumption Rate (bytes/s)

The number of bytes consumed from Binlog per second. Larger fields or higher data volumes increase the byte count. This metric is supported only in Hologres V2.2 and later.

WAL Sender Count and Usage Rate

When Binlog is consumed using JDBC, each shard of each table occupies one WAL sender connection. WAL sender connections behave like regular connections but are counted independently of them. The number of WAL senders has a default limit.

WAL Sender Count per FE (count)

The number of WAL senders used per FE node.

WAL Sender Usage Rate of FE with Highest Usage (%)

The utilization rate of the frontend (FE) that uses the most WAL senders.

You can use both metrics to assess WAL sender usage. If the usage reaches the limit, see Consume Hologres Binlog via JDBC for troubleshooting.
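Because each shard of each consumed table occupies one WAL sender, you can estimate the demand before you start a JDBC consumer. The Python sketch below sums the shard counts of the tables to be consumed. The function name, table names, and shard counts are hypothetical, and the sketch assumes that a single consumer job reads every shard of each table once.

```python
def wal_senders_needed(shard_counts_by_table):
    """Estimate the WAL senders occupied by one JDBC Binlog consumer job.

    shard_counts_by_table maps a table name to its shard count. One WAL
    sender is counted per shard of each consumed table.
    """
    for table, shards in shard_counts_by_table.items():
        if shards < 0:
            raise ValueError(f"negative shard count for {table}")
    return sum(shard_counts_by_table.values())


# Hypothetical sample: two tables with 20 and 40 shards.
print(wal_senders_needed({"orders": 20, "events": 40}))  # 60
```

Compare the estimate with the WAL sender limit of your instance before you add consumers.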

Computing Resource

Elastic Core Count (count) for Compute Groups

Hologres compute group instances support time-based elasticity. For more information, see Time-based elasticity (Beta). This metric shows the number of cores that are added using time-based elasticity.

Compute Group Auto-Elastic Core Count (count)

Hologres compute group instances support auto-elasticity. For more information, see Multi-cluster and auto-elasticity (Beta). This metric shows the number of cores that are added using auto-elasticity.

Gateway

Gateway CPU Usage (%)

The CPU usage of each Gateway in the instance.

  • This metric is supported only in Hologres V2.0 and later.

  • Gateways use round-robin traffic forwarding. CPU usage occurs even without new connections.

  • Starting in Hologres V2.2.22, Gateways launch more worker threads by default to improve the handling of new connections, which increases CPU usage.

Gateway Memory Usage (%)

The memory usage of each Gateway in the instance.

This metric is supported only in Hologres V2.0 and later.

Gateway New Connection Requests per Second (count/s)

The maximum number of new connections that the system can accept and successfully establish per second.

  • This metric is supported only in Hologres V2.1.12 and later.

  • A single Gateway handles up to approximately 100 new connections per second.

  • If the number of new connection requests approaches 100 × Gateway count, the Gateways become the bottleneck for handling new connections. You can configure a connection pool or scale up the number of Gateways.
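The bottleneck check described above can be sketched in Python as follows. The limit of about 100 new connections per second per Gateway comes from the list above. The function name and the 75% warning ratio are hypothetical choices for illustration.

```python
NEW_CONNECTIONS_PER_GATEWAY = 100  # approximate per-Gateway limit stated above


def new_connection_headroom(requests_per_s, gateway_count, warn_ratio=0.75):
    """Return (capacity, utilization, near_limit) for new connection requests.

    capacity is 100 * gateway_count. warn_ratio is an assumed threshold,
    not a documented Hologres value.
    """
    if gateway_count <= 0:
        raise ValueError("gateway_count must be positive")
    capacity = NEW_CONNECTIONS_PER_GATEWAY * gateway_count
    utilization = requests_per_s / capacity
    return capacity, utilization, utilization >= warn_ratio


# Hypothetical sample: 150 new connection requests per second on 2 Gateways.
capacity, utilization, near_limit = new_connection_headroom(150, gateway_count=2)
print(capacity, f"{utilization:.0%}", near_limit)  # 200 75% True
```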

Gateway Inbound Traffic Rate (B/s)

The volume of data that enters the system through the Gateway per second.

  • This metric is supported only in Hologres V2.1 and later.

  • If the inbound traffic approaches 200 MiB/s × Gateway count, the Gateway network capacity becomes the bottleneck. You can scale up the number of Gateways.

Gateway Outbound Traffic Rate (B/s)

The volume of data sent from the Gateway to external systems per second.

  • This metric is supported only in Hologres V2.1 and later.

  • If the outbound traffic approaches 200 MiB/s × Gateway count, the Gateway network capacity becomes the bottleneck. You can scale up the number of Gateways.
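Because the traffic metrics are reported in B/s while the threshold is stated in MiB/s, a unit conversion is needed before you compare them. The Python sketch below does this for inbound or outbound traffic. The function name and the sample value are hypothetical, and the 200 MiB/s per-Gateway figure is taken from the lists above.

```python
MIB = 1024 * 1024               # bytes per MiB
TRAFFIC_MIB_PER_GATEWAY = 200   # per-Gateway network threshold stated above


def gateway_traffic_utilization(traffic_bytes_per_s, gateway_count):
    """Return traffic as a fraction of 200 MiB/s * gateway_count."""
    if gateway_count <= 0:
        raise ValueError("gateway_count must be positive")
    capacity_bytes_per_s = TRAFFIC_MIB_PER_GATEWAY * MIB * gateway_count
    return traffic_bytes_per_s / capacity_bytes_per_s


# Hypothetical sample: 300 MiB/s of inbound traffic across 2 Gateways.
inbound = 300 * MIB
print(f"{gateway_traffic_utilization(inbound, gateway_count=2):.0%}")  # 75%
```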

Dynamic Table Monitoring and Alerting

Starting in Hologres V4.0.8, Dynamic Tables offer monitoring metrics to help you better manage refresh tasks. For more information, see Monitoring and alerting.

Common Monitoring Metric Issues

The FAQ for monitoring metrics topic lists common issues. It helps you diagnose problems faster, identify root causes, and apply fixes, which boosts your self-service capabilities.

Monitoring Metric Alerting

You can set alerts for monitoring metrics in Cloud Monitor to detect anomalies early and minimize the impact on your business. For more information, see Cloud Monitor.