AnalyticDB for PostgreSQL uses multi-version concurrency control (MVCC), which means deleted or updated rows remain physically on disk even after they are no longer visible to new transactions. Over time, this dead data — known as dirty data — accumulates and wastes disk storage. The auto-vacuum feature handles most of this automatically. In most cases, you do not need to configure regular vacuum tasks. If your workload involves heavy UPDATE or DELETE operations, or you want a predictable cleanup schedule, configure regular vacuum tasks manually.
Background
Every row-level UPDATE or DELETE marks the old row version for deletion but leaves it on disk. The same applies to transaction logs and temporary files. If left unchecked, these accumulate and can cause disk exhaustion and degraded query performance.
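To estimate how much dirty data a table has accumulated before deciding to vacuum, you can query the cumulative statistics views (a sketch; `pg_stat_user_tables` and its `n_dead_tup` column come from the standard PostgreSQL statistics collector that this product builds on):

```sql
-- estimated live vs. dead (dirty) row versions per user table
SELECT schemaname, relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large and growing `n_dead_tup` relative to `n_live_tup` is the usual sign that a table is due for cleanup.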
Vacuum tasks cover four areas:
Delete temporary files — Remove transaction logs and backup files no longer in use.
Reclaim space from deleted rows — Physically remove rows that are marked for deletion.
Optimize table storage — Use partitioned tables and indexes to reduce disk usage and improve access efficiency.
Manage backups — Run periodic backups to prevent data loss, and remove outdated backup files to free storage.
Choose a vacuum approach
Two approaches are available, depending on whether you can tolerate table locks.
| | Without table locks | Within a maintenance window |
|---|---|---|
| SQL | VACUUM | VACUUM FULL + REINDEX |
| Table availability | Tables remain readable and writable | Tables are locked during execution |
| Resource impact | Temporarily increased CPU utilization and I/O | Increased CPU utilization and I/O; each table is rewritten in full |
| Recommended frequency | Daily to weekly (see below) | At least weekly; daily if most data changes daily |
Use standard VACUUM as your default. It runs without locks and is safe during production hours. Reserve VACUUM FULL for maintenance windows when you need to reclaim the maximum disk space — it acquires an exclusive lock and rewrites the entire table.
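The difference in concrete form (the table name `sales` is hypothetical):

```sql
-- safe during production: no exclusive lock; freed space is reused, not returned to the OS
VACUUM sales;

-- maintenance window only: exclusive lock; the table is rewritten and space is returned to the OS
VACUUM FULL sales;
REINDEX TABLE sales;
```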
Clean up dirty data without locking tables
When you perform UPDATE or DELETE operations on a table, only the affected portion of the table is locked rather than the entire table. This approach improves concurrency performance and reduces response time, though it causes dirty data to accumulate over time.
Run VACUUM as the database owner on each database. Tables remain readable and writable throughout, though CPU utilization and I/O will increase temporarily.
Recommended frequency:
Heavy real-time writes (large numbers of INSERT VALUES, UPDATE, or DELETE statements) — once a day, or at minimum once a week.
Batch updates once a day — once a week, or at minimum once a month.
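Expressed as crontab schedules (illustrative; `/path/to/vacuum_all.sh` is a placeholder for whichever cleanup script you schedule):

```
# daily at 02:00 (heavy real-time writes)
0 2 * * * /path/to/vacuum_all.sh
# weekly on Sunday at 03:00 (daily batch updates)
0 3 * * 0 /path/to/vacuum_all.sh
```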
Execution methods:
pg_cron (recommended) — Use the `pg_cron` extension to schedule `VACUUM` as a database-native cron job. See pg_cron.

Shell script via crontab — Run the following script as a scheduled `crontab` task. It queries `pg_stat_database` to get all databases, skips the system databases (`template0`, `template1`, `postgres`, and `gpdb`), and runs `VACUUM` on each.
```shell
#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
# do not echo the command, just get a list of databases
dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
for db in $dblist ; do
    # skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo processing $db
    # vacuum all tables (catalog tables and user tables)
    psql -d $db -e -a -c "VACUUM;"
done
```

Replace `PGHOST`, `PGPORT`, `PGUSER`, and `PGPASSWORD` with your instance connection details.
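With pg_cron, the equivalent schedule is created inside the database via `cron.schedule` (a sketch; the job name and time are illustrative):

```sql
-- schedule a nightly VACUUM at 02:00
SELECT cron.schedule('nightly-vacuum', '0 2 * * *', 'VACUUM');

-- remove the job when it is no longer needed
SELECT cron.unschedule('nightly-vacuum');
```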
Clean up dirty data within a maintenance window
Run VACUUM FULL and REINDEX as the database owner, who must have full permissions on all objects. Tables are locked during execution and cannot be read or written.
Avoid accessing the database while VACUUM FULL is running.
Steps (run on each database):
Rebuild system catalog indexes:

```sql
REINDEX SYSTEM <database_name>;
```

Reclaim space on each data table:

```sql
VACUUM FULL <table_name>;
```

For column-oriented tables, also run `REINDEX TABLE <table_name>` after `VACUUM FULL`.

If you create and drop system tables or indexes at high frequency, periodically run `VACUUM FULL` on `pg_class`, `pg_attribute`, and `pg_index`.
Recommended frequency: At least once a week. If most of your data changes every day, run these steps daily.
Execution methods:
pg_cron (recommended) — Use the `pg_cron` extension to schedule `VACUUM FULL` and `REINDEX` as a database-native cron job. See pg_cron.

Shell script via crontab — Run the following script as a scheduled `crontab` task. It runs a standard `VACUUM`, rebuilds system catalog indexes with `REINDEX SYSTEM`, queries all non-partition user tables (excluding partition children via a `pg_partitions` join), and then runs `VACUUM FULL` and `REINDEX TABLE` on each.
```shell
#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
# do not echo the command, just get a list of databases
dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
for db in $dblist ; do
    # skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo processing db "$db"
    # do a normal vacuum first
    psql -d $db -e -a -c "VACUUM;"
    # reindex system tables first
    psql -d $db -e -a -c "REINDEX SYSTEM $db;"
    # use a temp file to store the table list, which could be very large
    cp /dev/null tables.txt
    # query only the normal user tables, excluding partition children of parent tables
    psql -d $db -c "copy (select '\"'||tables.schemaname||'\".' || '\"'||tables.tablename||'\"' from (select nspname as schemaname, relname as tablename from pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_roles where pg_class.relnamespace = pg_namespace.oid and pg_namespace.nspowner = pg_roles.oid and pg_class.relkind='r' and (pg_namespace.nspname = 'public' or pg_roles.rolsuper = 'false' ) ) as tables(schemaname, tablename) left join pg_catalog.pg_partitions on pg_partitions.partitionschemaname=tables.schemaname and pg_partitions.partitiontablename=tables.tablename where pg_partitions.partitiontablename is null) to stdout;" > tables.txt
    while read line; do
        # some table names may contain the $ sign, so escape it
        line=`echo $line |sed 's/\\\$/\\\\\\\$/g'`
        echo processing table "$line"
        # VACUUM FULL this table, which locks the table
        psql -d $db -e -a -c "VACUUM FULL $line;"
        # reindex the table to reclaim index space
        psql -d $db -e -a -c "REINDEX TABLE $line;"
    done <tables.txt
done
```

Replace `PGHOST`, `PGPORT`, `PGUSER`, and `PGPASSWORD` with your instance connection details.
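Both scripts rely on the same skip list for system databases. The filtering logic can be exercised on its own, without a database connection (the sample names `sales` and `analytics` are made up):

```shell
#!/bin/bash
# stand-in for the output of the pg_stat_database query
dblist="template0 template1 postgres gpdb sales analytics"
targets=""
for db in $dblist ; do
    # same skip list as the vacuum scripts above
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    targets="$targets $db"
done
echo "databases to vacuum:$targets"
```

Only `sales` and `analytics` survive the filter; every system database is skipped before any `VACUUM` would be issued.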