This topic provides answers to frequently asked questions about the metrics of Hologres.

How do I view and terminate connections when the Connections metric value is large?

The Connections metric indicates the total number of SQL connections to a Hologres instance, including Java Database Connectivity (JDBC) connections to PostgreSQL databases in the active and idle states. The maximum number of connections allowed depends on the instance type. If the number of connections is large, exceeds the maximum allowed for the instance, or one of the following errors is reported, check whether a connection leak has occurred:
  • FATAL: sorry, too many clients already connection limit exceeded for superusers
  • FATAL: remaining connection slots are reserved for non-replication superuser connections
You can go to the Connectivity page in the HoloWeb console to terminate unexpected or idle connections. For more information, see Manage connections in the HoloWeb console.
Note You can view all connections in an instance by using the superuser account.
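Connections can also be inspected and cleaned up directly with SQL. The following sketch uses the standard PostgreSQL system view and admin function, both of which Hologres supports; the pid value is illustrative:

```sql
-- List idle connections (run as a superuser to see all connections).
SELECT pid, usename, application_name, state, query_start
FROM pg_stat_activity
WHERE state = 'idle';

-- Terminate a leaked or idle connection by its process ID (pid is illustrative).
SELECT pg_terminate_backend(12345);
```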

How do I view slow queries when the Query Latency metric value is large?

You can go to the Historical Slow Query page on the Diagnostics and Optimization tab of the HoloWeb console to view slow queries. For more information, see Query and analyze slow query logs.
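If you prefer SQL over the console, slow query logs can also be queried from the hologres.hg_query_log system view that the linked topic describes. This is a sketch; the columns available depend on your Hologres version:

```sql
-- Top 20 slowest queries in the past day, assuming the hg_query_log view.
SELECT query_id, usename, duration, query_start, query
FROM hologres.hg_query_log
WHERE query_start > now() - interval '1 day'
ORDER BY duration DESC
LIMIT 20;
```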

You can use one of the following methods to decrease the query latency:
  • Write data when few queries are in progress, or reduce the number of concurrent write operations, to improve query efficiency. If you write data from external tables, you can adjust the following parameters to decrease concurrency:
    -- Set the maximum degree of parallelism for reading a MaxCompute table. Default value: 128. To prevent one query from affecting other queries or even causing system unavailability, we recommend that you set this parameter to a small value. 
    set hg_experimental_foreign_table_executor_max_dop = 32;
    
    -- Set the number of rows to be read from a MaxCompute table at a time. Default value: 8192. 
    set hg_experimental_query_batch_size = 1024;
    
    -- Read data from MaxCompute ORC tables directly.
    set hg_experimental_enable_access_odps_orc_via_holo = on;
    
    -- Set the size of each shard of a MaxCompute table. Default value: 64. Unit: MB. The shard size affects the concurrency. If a table is large, increase this value to prevent excessive shards from degrading query performance. 
    set hg_experimental_foreign_table_split_size = 512;
  • Optimize the queries to improve efficiency.
  • Scale the Hologres instance on which the queries are performed. For more information, see Instances.

How do I view active queries when the metric values of the instance are large?

You can go to the Active Query page on the Diagnostics and Optimization tab of the HoloWeb console to view active queries. For more information, see View the details of a query task.

How do I cancel a query that remains active for an extended period of time and set the timeout period?

You can go to the Active Query page on the Diagnostics and Optimization tab of the HoloWeb console to view active queries and cancel a query by clicking Kill in the Operation column. You can set the timeout period for a query by executing SQL statements on the SQL Editor tab. For more information, see Manage queries.
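As a sketch of the SQL route, a long-running query can be canceled by its process ID, and a session-level timeout can be set with the standard statement_timeout parameter (unit: milliseconds); the pid below is illustrative:

```sql
-- Cancel a long-running query by its process ID (pid is illustrative).
SELECT pg_cancel_backend(12345);

-- Abort any statement in this session that runs longer than 10 minutes.
SET statement_timeout = 600000;
```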

How do I troubleshoot high memory usage?

The Memory Usage metric indicates the memory usage of the instance. In Hologres, memory resources are reserved by backend processes. Even when no queries are in progress, the metadata, indexes, and cached data of tables are loaded into memory to accelerate computing. Therefore, the memory usage is not zero. When no queries are in progress, a memory usage of 30% to 40% is normal.

In some cases, memory usage may continue to rise or even approach 100%. This can occur due to the following possible causes:
  • The number of tables and the data volume grow beyond the computing capacity of the instance. Memory usage increases in proportion to the volume of metadata and the number of indexes, so it grows as tables, indexes, and data accumulate.
  • The index set is not suitable. For example, many bitmap indexes are created or dictionary encoding is enabled for a table that contains a large number of TEXT columns. In this case, you can modify the bitmap index or dictionary encoding property of the table. For more information, see ALTER TABLE.
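As a sketch of such a modification, bitmap indexes and dictionary encoding can be removed by clearing the corresponding table properties; the table name is illustrative, and the exact procedure is described in the ALTER TABLE topic referenced above:

```sql
-- Remove all bitmap indexes from the table (table name is illustrative).
CALL set_table_property('my_table', 'bitmap_columns', '');

-- Disable dictionary encoding for all columns of the table.
CALL set_table_property('my_table', 'dictionary_encoding_columns', '');
```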

However, if the memory usage continues to increase and remains close to 100%, memory resources may have become a bottleneck of the instance and the stability or performance of the instance may be affected. For example, when a large amount of metadata occupies the memory space available for queries, errors such as SERVER_INTERNAL_ERROR, ERPC_ERROR_CONNECTION_CLOSED, and Total memory used by all existing queries exceeded memory limitation may occur during queries. When a large amount of metadata occupies the cache space available for queries, cache hits decrease and query latency increases.

When the memory usage remains close to 100%, we recommend that you perform the following operations:
  • Delete the data that is no longer queried to release the occupied memory.
  • Set indexes appropriately. You can delete unnecessary bitmap indexes or disable dictionary encoding in specific business scenarios.
  • Upgrade the specifications of the instance to increase its computing and storage resources. We recommend that you upgrade an instance based on specific scenarios.
    • In scenarios where reading data from disk at some latency is acceptable and response time (RT) requirements are not strict, we recommend that you select an appropriate instance type based on the volume of your data. One compute unit (CU), which includes 1 CPU core and 4 GB of memory, can support 50 GB to 100 GB of data storage.
    • In serving scenarios that require a short RT, we recommend that you cache all hotspot data in the memory. By default, the cache accounts for 30% of the total memory. In such scenarios, 1.3 GB of memory out of 1 CU is used for data cache, and table metadata also occupies part of the data cache. For example, in a scenario that requires a short RT, 100 GB of hotspot data needs to be cached in the memory. After the data is read and decompressed, it takes up more than 100 GB of memory. In this case, at least 320 GB of memory is required, which corresponds to at least 96 CUs.
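The sizing arithmetic above can be sketched as follows. The constants (1 CU = 4 GB of memory, roughly 1.3 GB of which serves as data cache) come from the text; the result is only a lower bound, because decompressed data and table metadata also occupy the cache, which is why the text rounds up to 96 CUs:

```python
# Illustrative lower-bound sizing for caching hotspot data in memory.
hotspot_gb = 100        # hotspot data that must stay in the data cache
mem_per_cu_gb = 4.0     # 1 CU = 1 CPU core + 4 GB of memory
cache_per_cu_gb = 1.3   # ~30% of each CU's memory is usable as data cache

min_cus = hotspot_gb / cache_per_cu_gb   # ~77 CUs for the raw data alone
min_mem_gb = min_cus * mem_per_cu_gb     # ~308 GB of total memory

print(round(min_cus), round(min_mem_gb))
```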

Why is the CPU utilization as high as 100% even when only a single query is in progress?

The CPU Utilization metric indicates the CPU utilization of the instance. Hologres supports multi-core parallel computing, so even a single query can drive CPU utilization to 100%. This indicates that computing resources are being fully utilized. High CPU utilization by itself is not an issue; high CPU utilization accompanied by slow queries and writes is, and requires comprehensive analysis.

What do I do if the CPU utilization remains close to 100%?

If the CPU utilization remains close to 100%, the instance is heavily loaded and CPU resources have become a bottleneck. You must analyze your business scenarios and queries to identify the cause of the high CPU utilization. You can perform the following operations:
  • Check whether a large amount of offline data is imported by using INSERT operations, and whether the amount of data is still increasing.

    You can reduce the amount of data to be imported or import the data during off-peak hours.

  • Check whether data is queried or written with high queries per second (QPS) to exhaust the CPU resources.

    You can reduce the amount of data to be written and allow queries to complete first.

  • Check whether active queries are processing large amounts of data.

    You can go to the Active Query page of the HoloWeb console to check for queries that remain active and cancel queries that do not meet your expectations. For more information, see Manage queries.

  • Scale the instance to accommodate more complex queries or larger data volumes. For more information, see Instances.
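The checks above can also start from the standard PostgreSQL activity view, which Hologres supports, ordering active queries by how long they have been running; this is a sketch:

```sql
-- List active queries, longest-running first.
SELECT pid, usename, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;
```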