Use the pg_squeeze extension to shrink bloated tables and indexes - ApsaraDB RDS

ApsaraDB RDS for PostgreSQL supports the pg_squeeze extension. The extension reorganizes and cleans up data in tables without affecting online read and write operations on your ApsaraDB RDS for PostgreSQL instance. This reduces the disk space that is occupied by tables and indexes and improves space usage.

Prerequisites

The following parameters are configured. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
- The wal_level parameter is set to logical.
- pg_squeeze is added to the value of the shared_preload_libraries parameter. This way, the pg_squeeze extension can be preloaded.
The disk space is sufficient. Additional disk space is required when the pg_squeeze extension shrinks bloated tables and indexes.
When you shrink a full table, the available disk space is at least twice the total size of the table and its indexes. For example, if the total size of the table and its indexes that you want to shrink is 1 GB, an additional 2 GB of disk space is required.
You RDS instance runs a minor engine version of 20240229 or later. For more information about how to view and update the minor engine version of an RDS instance, see Update the minor engine version.

Background information

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to allow multiple transactions to concurrently access the same data without conflicts. MVCC does not directly overwrite or delete the original data when the data is updated or deleted. MVCC marks the original data as a dead tuple. Dead tuples are invisible to new transactions, but continue to occupy disk space. As a result, table become bloated.

You can clean up dead tuples by using methods such as autovacuum. However, this method does not fully eliminate table bloat because the space previously occupied by the dead tuples is not immediately reclaimed. This also leads to table bloat when the amount of unused space is significant. PostgreSQL provides a VACUUM FULL statement to reclaim unused space. However, the VACUUM FULL statement requires Access Exclusive Lock, the most restrictive lock in PostgreSQL, on each table. As a result, tables become inaccessible during the vacuuming process.

Feature description

The pg_squeeze extension rebuilds a table by creating a new copy of the original table, which eliminates the dead tuples. This reclaims the unused disk space from the table and provides the following benefits:

Tables can be cleaned up online, and read and write operations are not blocked.
You can directly execute SQL statements without the need to use a client.

Scenarios

Tables whose data is frequently updated or deleted have a higher table bloat rate.

You can use the following SQL statements to estimate the table bloat rate:

CREATE EXTENSION pgstattuple; --create extension
select *, 1.0 - tuple_len::numeric / table_len as bloat from pgstattuple('your_relation');

Note

The table bloat rate is the ratio of unused space to total space in a table, which is calculated by using the following formula: 1 - Live tuple space/Total space of the table.
The execution of the preceding statements triggers a full table scan.

Usage notes

The table that you want to clean up must have a unique key.
A large amount of I/O resources are consumed during the cleanup process. We recommend that you clean up the table during off-peak hours.

Create or delete the extension

Execute the following statement to create the extension:
```
CREATE EXTENSION pg_squeeze;
```
Note
If you want to clean up tables in multiple databases, you must create the extension in each database.
Execute the following statement to delete the extension:
```
DROP EXTENSION pg_squeeze;
```

Examples

Before you clean up a table, you must create the extension in the destination database.

Temporarily clean up a table

In the following example, the bar table in the public schema is cleaned up.

SELECT squeeze.squeeze_table('public', 'bar');

Automatically detect and clean up a table

Create an automatic cleanup task.
In the following example, the extension is configured to clean up the foo table in the public schema at 22:30 every Wednesday and Friday.
Note
- For more information about the syntax that is used to create an automatic cleanup task, see pg_squeeze.
- Only tables in the current database can be cleaned up.
```
INSERT INTO squeeze.tables (tabschema, tabname, schedule)
VALUES ('public', 'foo', ('{30}', '{22}', NULL, NULL, '{3, 5}'));
```
Enable the automatic cleanup task.
- Manually enable an automatic cleanup task.
  Execute the following SQL statement to manually enable the automatic cleanup task. Only tables in the current database can be cleaned up.
```
SELECT squeeze.start_worker(); -- Enable
```
  Disable the automatic cleanup task.
```
SELECT squeeze.stop_worker();   -- Disable
```
- Automatically enable the automatic cleanup task.
  Configure parameters in the ApsaraDB RDS console and restart the database to automatically enable the cleanup task. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
  In the following example, the database1 and database2 databases are cleaned up by using the rds_superuser account. rds_superuser must be a privileged account.
  Note
  Before the cleanup task is performed, the pg_squeeze extension is separately created in the database1 and database2 databases. For more information, see Create and delete an extension.
```
squeeze.worker_autostart = 'database1 database2'
squeeze.worker_role = rds_superuser
```

References

For more information, see pg_squeeze.