pg_profile (Resource Monitoring) - PolarDB - Alibaba Cloud Documentation Center

pg_profile is a third-party extension supported by PolarDB for PostgreSQL. It collects statistics on resource-intensive activities in a target database to help you analyze and optimize the database. The extension is based on PostgreSQL statistical views and several community extensions. It is written entirely in PL/pgSQL and requires no external libraries or software. Use it with pg_cron to schedule periodic sampling and diagnose historical workloads — for example, to answer questions like "what was causing the slowdown two hours ago?"

Supported versions

PostgreSQL version	Minimum minor engine version
PostgreSQL 16	2.0.16.9.9.0
PostgreSQL 15	2.0.15.14.6.0
PostgreSQL 14	2.0.14.10.18.0

View the minor engine version in the console or by running SHOW polardb_version;. If your cluster does not meet the version requirement, upgrade the minor engine version.

How it works

pg_profile uses two dedicated databases within the same cluster:

Server database: The database being monitored. It has the statistics extensions installed and accepts sampling connections.
Profile database: The database where pg_profile is installed. It connects to the server database over dblink, pulls the latest statistics, and stores them locally.

Because of the dblink dependency, both databases must reside in the same PolarDB for PostgreSQL cluster.

Once you have two or more samples, call get_report() to generate an HTML report that summarizes resource-intensive SQL statements and objects between any two sample points.

Prerequisites

Before you begin, make sure you have:

A PolarDB for PostgreSQL cluster running a supported version
A privileged account for the cluster
pg_stat_statements added to the shared_preload_libraries parameter (added by default)

To check or modify the shared_preload_libraries parameter, see Configure cluster parameters. Modifying this parameter restarts the cluster. Plan your operations accordingly.

Optional: Install pg_stat_kcache to collect more information. Without it, pg_profile still works but reports fewer metrics.

Usage notes

Each sampling operation takes several seconds. Take samples no more than once or twice per hour to avoid excessive overhead.
Sample data consumes disk space. pg_profile has an automatic cleanup mechanism, but monitor your data retention to avoid unexpected disk usage.
At least two samples are required before you can generate a report.

Set up pg_profile

Step 1: Create the server database

Connect to your cluster and create the server database:

CREATE DATABASE server;

Connect to the server database and install the statistics extensions:

\c server
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pg_stat_kcache;

pg_stat_kcache is optional. Omit it if you do not need the additional metrics it provides in your reports.

Step 2: Create the profile database

Create the profile database:

CREATE DATABASE profile;

Connect to the profile database and install pg_profile:

\c profile
CREATE EXTENSION IF NOT EXISTS pg_profile CASCADE;

Step 3: Create a sampling connection

Run the following from the profile database to register the server database as a monitoring target:

SELECT create_server(
   'server',
   'dbname=server channel_name=localhost user=<privileged_account_of_the_server_database> password=<password_for_the_account>'
);

Parameter	Example	Description
`server` (first argument)	`server`	A user-defined name for this connection.
`dbname`	`server`	The name of the server database.
`channel_name`	`localhost`	The connection channel. Only `localhost` is supported.
`user`	`test_user`	A privileged account for the cluster where the server database resides.
`password`	`T123456!`	The password for the account.

The expected output is similar to:

                                    show_servers
-------------------------------------------------------------------------------------
 (server,"dbname=server channel_name=localhost user=test_user password=****",t,)
(1 row)

To update the connection information later, use set_server_connstr():

SELECT set_server_connstr(
   'server',
   'dbname=server channel_name=localhost user=<privileged_account_of_the_profile_database> password=<password_for_the_account>'
);

Do not change the first argument (server). It identifies the existing connection by name.

Step 4: Take an initial sample

Run the following from the profile database to take the first sample:

SELECT take_sample();

The output has three columns: the server name, the result (OK or an error message), and the elapsed time:

      take_sample
-------------------------
 (server,OK,00:00:02.81)
(1 row)

Step 5: Generate a workload and take a second sample

To capture a meaningful workload difference between samples, run a stress test on the server database. The following example uses pgbench:

pgbench -s 100 -i server -U test_user -h pc-****.pg.rds.aliyuncs.com -p 5432

Parameter	Example	Description
`-s`	`100`	Scale factor. `100` creates 10,000,000 rows in the `pgbench_accounts` table.
`-i`	`server`	The database to initialize.
`-U`	`test_user`	A privileged account for the cluster where the server database resides.
`-h`	`pc-****.pg.rds.aliyuncs.com`	The endpoint of the cluster where the server database resides.
`-p`	`5432`	The port. Use the port shown in the console.

After the test completes, take a second sample from the profile database:

SELECT take_sample();

View all collected samples:

SELECT * FROM show_samples('server');

The expected output is similar to:

 sample |      sample_time       | sizes_collected | dbstats_reset | bgwrstats_reset | archstats_reset
--------+------------------------+-----------------+---------------+-----------------+-----------------
      1 | 2024-02-23 07:29:53+00 | t               |               |                 |
      2 | 2024-02-23 08:11:39+00 | t               |               |                 |

Step 6: Generate a report

Export the data between two samples to an HTML report using one of the following methods.

Method 1: Use the psql interactive command line to write the output to a local file.

\o report_1_2.html
SELECT get_report('server',1,2);

Method 2: Run psql from the command line to write the output directly to a file.

psql -Aqtc "SELECT get_report('server',1,2)" \
   -o report_server_1_2.html -d profile \
   -h <endpoint_of_the_profile_database_cluster> \
   -p <port_of_the_database_cluster> \
   -U <privileged_account_for_the_profile_database_cluster>

Replace the placeholders with the actual values for your profile database cluster.

What's next

Schedule periodic sampling: Use pg_cron to automatically take samples on a schedule. The recommended frequency is one to two times per hour.