You can manage the resources of Hologres instances based on the metrics displayed on the Monitoring Information tab of the instance details page in the Hologres console. Hologres allows you to view the resource usage of instances, view the execution status of SQL statements, and resolve errors based on the metrics. This topic describes the metrics of Hologres.

Hologres provides the following metrics: Take note of the following items:
  • The Connections metric is not supported in Hologres V0.8.
  • If an instance of a version earlier than V0.9.27 contains more than 1,000 tables, the Real-time Import (RPS) metric may be lost or the metric value may be excessively low for the large amount of data. In this case, data accuracy is affected.
Note You can call API operations provided by CloudMonitor to query the metrics of Hologres instances. For more information, see Request methods.

CPU Utilization (%)

The CPU Utilization (%) metric indicates the CPU utilization of the instance.

In Hologres, computing resources are reserved by backend processes. Even if no queries are in progress, the CPU utilization is not zero.

Hologres supports multi-core parallel computing. In many cases, the CPU utilization can rise to 100% during a single query. This indicates that the computing resources are being fully utilized.

If the instance is heavily loaded, the CPU utilization remains close to 100%. In this case, the instance performance may be affected. You must analyze specific business scenarios to identify the cause. You can perform the following operations:
  • Check whether a large amount of offline data is imported by using INSERT operations, and whether the amount of data is still increasing.
  • Check whether data is queried or written with high queries per second (QPS) performance.
  • Check whether hybrid loads exist in the preceding or other scenarios.

If your instance cannot run normally due to insufficient resources, you can upgrade the specifications of the instance. This allows you to handle complex queries on larger amounts of data.

Memory Usage (%)

The Memory Usage (%) metric indicates the memory usage of the instance.

In Hologres, memory resources are reserved by backend processes. Even if no queries are in progress, specific metadata or index data is still loaded to the memory to improve the computing speed. If no queries are in progress, a memory usage of 30% to 40% is normal.

If the memory usage continues to rise or even approaches 100%, system performance, instance stability, and instance performance may degrade. The following section describes the possible causes, impacts, and solutions of this issue:
  • Causes
    • The number of tables and data volume grow and overwhelm the computing capacity of the instance. Memory usage increases proportionally with the volume of metadata and the number of indexes. As the numbers of tables and indexes and the data volume increase, memory usage grows.
    • Indexes are not properly set. For example, bitmap indexes are created or dictionary encoding is enabled for excessive fields.
  • Impacts
    • Instance stability is affected. For example, if a large amount of metadata occupies the memory space available for queries, errors such as SERVER_INTERNAL_ERROR, ERPC_ERROR_CONNECTION_CLOSED, and Total memory used by all existing queries exceeded memory limitation may occur during queries.
    • Instance performance is affected. If a large amount of metadata occupies the cache space available for queries, cache hits decrease and query latency increases.
  • Solutions
    • Delete data that is no longer needed to query to release the occupied memory.
    • Set indexes properly. You can delete unnecessary bitmap indexes or disable dictionary encoding in specific business scenarios.
    • Upgrade the specifications of the instance to increase its computing and storage resources. We recommend that you upgrade an instance based on specific scenarios.
      • In scenarios where disk data can be read at a specific latency and no strict requirement is imposed on the response time (RT), we recommend that you select an appropriate instance type based on the volume of your data. One compute unit (CU) that includes 1 CPU core and 4 GB of memory can store 50 GB to 100 GB of data. Every 32 CUs can store up to 3 TB of data.
      • In serving scenarios that require a short RT, we recommend that you cache all hotspot data in the memory. By default, the cache accounts for 30% of total memory. In such scenarios, 1.3 GB of memory out of 1 CU is used for data cache, and table metadata also occupies part of the data cache. For example, in a scenario that requires a short RT, 100 GB of hotspot data needs to be cached in the memory. After the data is read and decompressed, it occupies more than 100 GB of memory. In this case, at least 320 GB of memory is required, which corresponds to at least 96 CUs.

Storage Usage (bytes)

The Storage Usage (bytes) metric indicates the size of the logical disk that is used to store the data of all databases in the instance.

If you are using the subscription billing method and the quota of the storage resources is used up, the billing method is changed to pay-as-you-go. In this case, you can upgrade the specifications of the instance to increase the storage resources of the instance. You can also configure a lifecycle for the data of the instance based on your needs and delete redundant data in real time. Otherwise, you may be charged for extra storage. For information about how to configure a lifecycle, see CREATE TABLE. You can query the storage size of a table or database in Hologres by executing SQL statements. For more information, see Query the storage sizes of tables and databases.

Connections

The Connections metric indicates the total number of SQL connections in a Hologres instance, including Java Database Connectivity (JDBC) connections to PostgreSQL databases in the active and idle states.

The number of SQL connections in an instance depends on the instance type of the instance. For more information, see Instance types.

If one of the following conditions is met, the number of SQL connections reaches the upper limit. In this case, you must check whether a connection leak has occurred and release connections. For more information, see Manage connections.
  • The number of SQL connections reaches or even exceeds the value of the max_connections parameter. In this case, you can execute the show max_connections; statement to view the maximum number of connections allowed for the instance.
  • The FATAL: sorry, too many clients already connection limit exceeded for superusers error occurs.
  • The FATAL: remaining connection slots are reserved for non-replication superuser connections error occurs.

QPS

The QPS metric indicates the average number of SQL statements that are executed per second, including the SELECT, INSERT, UPDATE, and DELETE statements. This metric is updated every 20 seconds. If only a single SQL statement is executed within 20 seconds, the QPS of SQL statements is 0.05 (1/20 = 0.05).

Query Latency (ms)

The Query Latency (ms) metric indicates the average RT of SQL statements, including the SELECT, INSERT, UPDATE, and DELETE statements. You can query slow query logs to analyze the query latency. For more information, see Query and analyze slow query logs.

Real-time Import (RPS)

The real-time Import (RPS) metric indicates the number of entries that are imported or updated per second by using SQL statements or SDKs.

The insert RPS indicates the number of entries that are imported per second by using foreign tables, COPY statements, or Hologres tables.

The update RPS indicates the number of entries that are imported per second by executing the UPDATE or DELETE statement.

The SDK RPS indicates the number of entries that are imported or updated per second by using SDKs. Data can be imported by using SDKs in the following ways:

I/O Throughput (Bytes)

The I/O Throughput (Bytes) metric indicates the I/O throughput of the instance. This metric describes the amount of data read from or written to an instance and reflects how busy the read or write operations are. You can understand the workloads on the instance based on the I/O Throughput (Bytes) metric and troubleshoot relevant issues in real time.