
ApsaraDB RDS: Performance Insight

Last Updated: Mar 28, 2026

Performance Insight is a load monitoring and diagnostics tool for ApsaraDB RDS for PostgreSQL. It visualizes database load through average active sessions (AAS), surfaces the SQL statements and wait events driving that load, and lets you drill down across multiple dimensions to resolve performance issues quickly.

Prerequisites

Before you begin, ensure that you have:

  • An RDS instance running PostgreSQL 13 or later

  • An instance in the High-availability Edition or Cluster Edition

  • A minor engine version of 20240530 or later
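
The minor engine version is shown in the ApsaraDB RDS console, but the PostgreSQL major version can also be confirmed from any SQL client:

    -- Check the PostgreSQL major version (must be 13 or later)
    SELECT version();
    SHOW server_version;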

Enable Performance Insight

  1. Log on to the ApsaraDB RDS console. On the Instances page, select the region where your instance resides, and click the instance ID.

  2. In the left navigation pane, choose Monitoring And Alerts, then click the Performance Insight tab.

  3. Click Enable Performance Insight. In the dialog box, click OK.

    To disable the feature, click Disable Performance Insight on the Performance Insight tab.


  4. Wait a few moments for data to appear on the Performance Insight page.

Important

Performance Insight is in public preview and is free of charge. Data is retained for 7 days. The official release will let you extend the retention period as needed. You will be notified in advance of any charges.

Analyze performance with Performance Insight

After enabling the feature, select a time range and click View to analyze performance. Follow this workflow to diagnose performance issues:

  1. Check the AAS chart for load spikes. The average active sessions (AAS) chart shows total instance load over the selected time range. When AAS rises above the number of CPU cores, sessions are contending for resources, and sustained peaks correspond to performance bottlenecks.

  2. Identify the dominant wait events. The AAS chart breaks down load by wait event type. The dimension with the highest AAS contribution points to the bottleneck category — for example, lock waits or I/O waits.

  3. Find the top SQL statements. Switch to the SQL tab in the multi-dimensional load table. The top entries show which queries consume the most resources or wait longest.

  4. Trace the source. Use the Hosts and Databases tabs to identify the client IP address and database generating the load.
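
AAS is built from periodic samples of active sessions. As a rough manual approximation of the wait-event breakdown — a sketch, not the exact query Performance Insight runs — you can take a point-in-time sample from pg_stat_activity:

    -- Snapshot of active sessions grouped by wait event type
    SELECT
        coalesce(wait_event_type, 'CPU') AS dimension,  -- a NULL wait_event_type means the session is running on CPU
        count(*) AS active_sessions
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid()                       -- exclude this monitoring session
    GROUP BY 1
    ORDER BY active_sessions DESC;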


Key performance metric charts

At the top of the page, trend charts track the following metrics over the selected time range:

| Metric | Description |
| --- | --- |
| CPU/memory utilization | Percentage of CPU and memory resources in use |
| Session connections | Number of active connections to the instance |
| Transactions Per Second (TPS) | Rate of committed transactions |
| Input/output operations per second (IOPS) | Number of disk read and write operations per second |

Multi-dimensional load breakdown

Below the AAS chart, switch between dimension tabs to identify the source of high load:

| Tab | What it shows |
| --- | --- |
| SQL | SQL statements consuming the most resources |
| User | Database users generating the highest load |
| Databases | Databases with the most active sessions |
| Waits | Wait event types contributing most to load |
| Hosts | Client host names or IP addresses |
| Applications | Application names connected to the database |
| Session Type | Types of current sessions |
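
Each tab is a grouping dimension applied to the sampled sessions. For example, the Hosts and Databases views correspond roughly to grouping a pg_stat_activity sample by client address and database name (an illustrative sketch only):

    -- Active sessions per client host and database
    SELECT client_addr, datname, count(*) AS active_sessions
    FROM pg_stat_activity
    WHERE state = 'active'
    GROUP BY client_addr, datname
    ORDER BY active_sessions DESC;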

Troubleshoot lock contention from a slow SQL spike

Symptom

Monitoring data shows a sudden spike in slow SQL statements and a significant increase in application response time.

Step 1: Check performance metric trends

On the key performance metric charts, look for a sharp rise in active sessions. In this example, the session count jumped from a normal level of about a dozen to over 1,100 at 00:34 — well beyond the concurrent processing capacity of the 16-core PostgreSQL instance.


Step 2: Identify the wait events

The AAS chart shows the wait event distribution. In this case, two events dominate:

  • Lock/transactionid: A transaction ID lock wait, typically caused by a long-running transaction or a deadlock.

  • Lock/tuple: A row-level lock wait, indicating severe concurrent write conflicts.

The number of active sessions exceeds the theoretical processing limit of the 16-core CPU, confirming severe lock contention.
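
While the incident is ongoing, the same distribution can be confirmed from a SQL client by counting lock-waiting sessions directly (a sketch):

    -- Count sessions currently waiting on each lock wait event
    SELECT wait_event, count(*) AS waiting_sessions
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock'
    GROUP BY wait_event
    ORDER BY waiting_sessions DESC;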


Step 3: Find the responsible SQL statements

Switch to the SQL tab. The top two queries show 220 and 119 sessions waiting for lock resources — these statements are the root of the lock wait chain.


Step 4: Trace the source

Check the Hosts and Databases tabs to identify where the load originates:

  • Source client: 140.205.XXX.XXX

  • Target database: perf_test


Root cause

Fault type: Lock contention avalanche

The client at 140.205.XXX.XXX initiated high-concurrency Data Manipulation Language (DML) operations on the perf_test database — likely involving hot spot data updates or large-scale transaction processing. Without connection limits or lock wait timeout controls, new connections flooded in as each one entered the lock wait queue, creating a runaway lock contention cycle.
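
The head of such a lock wait chain can be located with pg_blocking_pids(), which returns the process IDs blocking a given session. The sketch below finds sessions that block others but are not themselves blocked — these are usually the long-running transactions holding the contended locks:

    -- Find the sessions at the head of the lock wait chain:
    -- they block other sessions but are not blocked themselves
    SELECT pid, usename, state,
           now() - xact_start AS transaction_age,   -- how long the transaction has been open
           left(query, 50) AS query_preview
    FROM pg_stat_activity
    WHERE pid IN (
            SELECT unnest(pg_blocking_pids(pid))
            FROM pg_stat_activity
            WHERE wait_event_type = 'Lock')
      AND cardinality(pg_blocking_pids(pid)) = 0;   -- not waiting on anyone else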

Solutions

Immediate measures:

  1. Limit connections from the problematic client:

    ALTER ROLE target_user CONNECTION LIMIT 10;   -- target_user: the database username
  2. Terminate long-waiting sessions:

    -- Review the sessions before terminating them
    SELECT
        pid,                                    -- Process ID (session identifier)
        usename,                                -- Database username
        state,                                  -- Session state (active or idle)
        wait_event,                             -- Specific wait event type
        now() - query_start AS query_duration,  -- Duration of the current query
        left(query, 50) AS query_preview        -- Preview of the SQL statement (first 50 characters)
    FROM pg_stat_activity
    WHERE datname = 'perf_test'                 -- Target database
      AND client_addr = '140.205.XXX.XXX'       -- Source client IP address
      AND state = 'active'
      AND wait_event_type = 'Lock'
      AND pid <> pg_backend_pid()               -- Exclude the current session
      AND now() - query_start > interval '5 minutes';
    
    -- After confirming the list, terminate the sessions
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'perf_test'
      AND client_addr = '140.205.XXX.XXX'
      AND state = 'active'
      AND wait_event_type = 'Lock'
      AND pid <> pg_backend_pid()
      AND now() - query_start > interval '5 minutes';
    
    -- Verify that the target sessions are terminated
    SELECT pid, usename, state, query
    FROM pg_stat_activity
    WHERE datname = 'perf_test'
      AND client_addr = '140.205.XXX.XXX';

Long-term optimizations:

  1. Deploy a connection pool: Use a connection pooler such as PgBouncer to cap the maximum number of concurrent connections.
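
    As an illustration, a minimal PgBouncer configuration fragment might look like the following; the endpoint placeholder and pool sizes are examples, not recommendations for every workload:

    ; pgbouncer.ini (fragment)
    [databases]
    perf_test = host=<rds-endpoint> port=5432 dbname=perf_test

    [pgbouncer]
    pool_mode = transaction        ; share server connections across clients per transaction
    default_pool_size = 20         ; server connections per user/database pair
    max_client_conn = 500          ; cap on client connections PgBouncer accepts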

  2. Configure timeout parameters:

    -- Set the lock wait timeout
    ALTER DATABASE perf_test SET lock_timeout = '30s';
    -- Set the statement execution timeout
    ALTER DATABASE perf_test SET statement_timeout = '60s';
  3. Optimize application logic:

    • Reduce transaction granularity to avoid long-running transactions.

    • Use optimistic locking or distributed lock mechanisms for hot spot data.

    • Implement read/write splitting to route read-only queries to read-only instances.

  4. Set monitoring thresholds: Configure alerts for active connection counts and lock wait times to catch problems before they escalate.
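
    A simple threshold check of this kind — flagging when lock-waiting sessions exceed a limit — can be run periodically from a SQL client; the threshold of 50 below is an arbitrary example:

    -- Alert-style check: do lock-waiting sessions exceed the threshold?
    SELECT count(*) > 50 AS lock_wait_alarm
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock';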