
PolarDB: Troubleshoot high CPU utilization

Last Updated:May 12, 2025

When you use PolarDB for PostgreSQL clusters, CPU utilization may increase or even reach 100%. This topic describes the common causes of high CPU utilization, the corresponding troubleshooting methods, and solutions.

Causes and troubleshooting methods

Increasing workloads

The most common cause of soaring CPU utilization is an increasing workload, which consumes more of the database's computing resources.

Troubleshooting methods

You can check whether the current database has more active connections than usual by using the following methods:

  • If a monitoring system is provided for the database, view the changes in the number of active connections in the chart.

  • If no monitoring system is provided for the database, connect to the database and execute the following statement to obtain the number of active connections:

    SELECT COUNT(*) FROM pg_stat_activity WHERE state <> 'idle';
    Note

    • pg_stat_activity is a built-in system view of PolarDB for PostgreSQL. Each row returned by the view represents a running server process.

    • state indicates the current state of the process. Valid values:

      • active: The process is executing a query.

      • idle: The process is idle and waiting for new commands from the client.

      • idle in transaction: The process is in a transaction but is not executing a query.

      • idle in transaction (aborted): The process is in a transaction, but a statement in the transaction has caused an error.

      • fastpath function call: The process is executing a fast-path function call.

      • disabled: State tracking (track_activities) is disabled for this process.

    The preceding statement returns the number of processes that are not in the idle state, which is the number of active connections that may be consuming CPU resources. If the database has significantly more active connections than usual, the high CPU utilization is likely caused by increasing workloads.
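To see how the active connections break down, you can also group the non-idle processes by state. The following statement is a minimal sketch based on the same pg_stat_activity view:

```sql
-- Count non-idle server processes by state.
SELECT state, COUNT(*) AS connection_count
FROM pg_stat_activity
WHERE state <> 'idle'
GROUP BY state
ORDER BY connection_count DESC;
```

A large number of processes in the active state suggests genuine workload growth, whereas many processes in the idle in transaction state may instead point to transactions held open by the application.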

Slow queries

If CPU utilization increases but the number of active connections stays within the normal range, the cause may be slow queries. A slow query can consume a large amount of CPU for a long period of time and drive up CPU utilization.

PolarDB for PostgreSQL provides the slow query log feature. SQL statements whose execution time exceeds the value of the log_min_duration_statement parameter are recorded in the slow query logs. However, when CPU utilization is close to 100%, the entire system slows down and almost all SQL statements execute slowly. In this case, the slow query log contains a large number of entries, which can complicate troubleshooting.
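The log_min_duration_statement threshold can be adjusted so that only genuinely slow statements are logged. The following sketch assumes you have the privileges to change server parameters; on a managed cluster, the parameter may need to be changed in the console instead, and the 1000 ms value is only an example:

```sql
-- Log statements that run longer than 1 second (the unit is milliseconds).
-- 1000 is an example threshold; tune it for your workload.
ALTER SYSTEM SET log_min_duration_statement = 1000;
-- Reload the configuration so the change takes effect without a restart.
SELECT pg_reload_conf();
-- Verify the current value.
SHOW log_min_duration_statement;
```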

Troubleshooting methods

Discover slow queries with long execution durations

The pg_stat_statements extension records statistics for all SQL statements executed on the database server, covering both the planning and execution phases.

Because the extension uses shared memory, its name must be included in the shared_preload_libraries parameter.
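You can check whether the extension is already preloaded before you use it. This is a minimal sketch; note that a change to shared_preload_libraries takes effect only after a server restart:

```sql
-- Check whether pg_stat_statements is listed among the preloaded libraries.
SHOW shared_preload_libraries;
```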

Note

If the pg_stat_statements extension has not been created in the current database, execute the following statement to create it. This also registers the functions and views that the extension provides:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

  1. The pg_stat_statements extension and the database continuously accumulate statistics. To troubleshoot high CPU utilization, first clear all existing statistics in both the database and the extension, and then collect new statistics from this point on.

    -- Clear all existing statistics in the current database.
    SELECT pg_stat_reset();
    -- Clear all existing statistics collected by the pg_stat_statements extension.
    SELECT pg_stat_statements_reset();
  2. Wait one or two minutes for the database and extension to collect sufficient statistics.

  3. After sufficient statistics are collected, execute the following statements to query the top five SQL statements by total planning time and by total execution time:

    SELECT * FROM pg_stat_statements ORDER BY total_plan_time DESC LIMIT 5;
    SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;
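Because SELECT * returns a large number of columns, a narrower projection is often easier to read. The following sketch assumes PostgreSQL 13 or later, where the columns are named total_exec_time and mean_exec_time:

```sql
-- Show only the columns that are most useful for CPU troubleshooting.
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```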

Discover slow queries with high buffer reads

If a table has no index and point queries are frequently performed on it, full table scans are required. All records must be read into memory and filtered against the query conditions, which significantly increases CPU utilization.

Execute the following statement to query the top five SQL statements in terms of buffer reads based on the statistics of the pg_stat_statements extension:

SELECT * FROM pg_stat_statements
ORDER BY shared_blks_hit + shared_blks_read DESC
LIMIT 5;

You can also identify tables with frequent full scans based on the statistics in the built-in pg_stat_user_tables system view of PolarDB for PostgreSQL.

Execute the following statement to query, among tables that contain more than 100,000 live tuples, the top five tables by the number of tuples read by sequential (full) scans:

SELECT * FROM pg_stat_user_tables
WHERE n_live_tup > 100000 AND seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 5;

Discover long-running queries that do not end

You can use the built-in pg_stat_activity view to find SQL statements that have been running for a long time without finishing. Such statements may cause high CPU utilization.

Execute the following statement to query the top five longest-running SQL statements that have not finished:

SELECT
    *,
    extract(epoch FROM (NOW() - xact_start)) AS xact_stay,
    extract(epoch FROM (NOW() - query_start)) AS query_stay
FROM pg_stat_activity
WHERE state NOT LIKE 'idle%'
ORDER BY query_stay DESC
LIMIT 5;

Combined with the tables with frequent full scans identified earlier, execute the following statement to find the slow queries on a specific table that have lasted more than 10 seconds. Replace Table name with the name of the table:

SELECT * FROM pg_stat_activity
WHERE
    state NOT LIKE 'idle%' AND
    query ILIKE '%<Table name>%' AND
    NOW() - query_start > interval '10s';

Solutions

  • If only a few unexpected SQL statements are consuming high CPU resources, you can send a signal to the corresponding backend processes to interrupt those statements so that CPU utilization can return to normal. Execute one of the following statements, which take the process ID (the pid column of the pg_stat_activity view) as a parameter. pg_cancel_backend cancels only the current query, whereas pg_terminate_backend terminates the entire backend process and closes its connection.

    SELECT pg_cancel_backend(pid);
    SELECT pg_terminate_backend(pid);
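As a sketch, pg_cancel_backend can also be combined with pg_stat_activity to cancel every non-idle query that exceeds a duration threshold. The 10-minute threshold below is only an example; choose a value appropriate for your workload:

```sql
-- Cancel all non-idle queries that have been running for more than 10 minutes.
SELECT pid, query, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state NOT LIKE 'idle%'
  AND pid <> pg_backend_pid()  -- exclude the current session
  AND NOW() - query_start > interval '10 min';
```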
  • If the time-consuming SQL statements are necessary, you must optimize them. You can sample the tables involved in these SQL statements and update their statistics so that the optimizer can generate better execution plans.

    Note

    Sampling consumes CPU resources. We recommend that you sample tables during off-peak hours.

    ANALYZE <Table name>;
  • For tables that frequently require full scans, you can create indexes on the commonly used filter columns. Index scans can then replace full table scans, which avoids loading and filtering every record in memory and reduces CPU utilization.
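For example, assuming a hypothetical orders table whose point queries filter on a customer_id column, an index can be created as follows. The table and column names are illustrative, and CREATE INDEX CONCURRENTLY builds the index without blocking concurrent writes, at the cost of a longer build:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON orders (customer_id);
```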