
AnalyticDB:Configure regular vacuum tasks

Last Updated:Mar 28, 2026

AnalyticDB for PostgreSQL uses multi-version concurrency control (MVCC): deleted or updated rows remain physically on disk even after they are no longer visible to any transaction. Over time, this dead data (referred to here as dirty data) accumulates and wastes disk storage. The auto-vacuum feature handles most of this cleanup automatically, so in most cases you do not need to configure regular vacuum tasks. If your workload involves heavy UPDATE or DELETE operations, or you want a predictable cleanup schedule, configure regular vacuum tasks manually as described in this topic.

Background

Every row-level UPDATE or DELETE marks the old row version for deletion but leaves it on disk. Transaction logs and temporary files accumulate in a similar way. Left unchecked, this buildup can exhaust disk space and degrade query performance.
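The effect is easy to observe on a throwaway table. The sketch below uses the standard `pg_stat_user_tables` statistics view; because the statistics collector updates asynchronously, the dead-tuple count may lag by a few seconds, and behavior on AnalyticDB's Greenplum-based kernel may differ from stock PostgreSQL.

```sql
-- A single UPDATE leaves one dead row version behind until VACUUM runs.
CREATE TABLE bloat_demo (id int, v int);
INSERT INTO bloat_demo VALUES (1, 10);
UPDATE bloat_demo SET v = 20 WHERE id = 1;  -- old version (1, 10) is now dead

-- The dead version shows up in the statistics view (possibly after a delay).
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'bloat_demo';
```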

Vacuum tasks cover four areas:

  1. Delete temporary files — Remove transaction logs and backup files no longer in use.

  2. Reclaim space from deleted rows — Physically remove rows that are marked for deletion.

  3. Optimize table storage — Use partitioned tables and indexes to reduce disk usage and improve access efficiency.

  4. Manage backups — Run periodic backups to prevent data loss, and remove outdated backup files to free storage.

Choose a vacuum approach

Two approaches are available, depending on whether you can tolerate table locks.

|  | Without table locks | Within a maintenance window |
| --- | --- | --- |
| SQL | VACUUM | VACUUM FULL + REINDEX |
| Table availability | Tables remain readable and writable | Tables are locked during execution |
| Resource impact | Increased CPU utilization and I/O | Increased CPU utilization and I/O |
| Recommended frequency | Daily to weekly (see below) | At least weekly; daily if most data changes daily |

Use standard VACUUM as your default. It runs without locks and is safe during production hours. Reserve VACUUM FULL for maintenance windows when you need to reclaim the maximum disk space — it acquires an exclusive lock and rewrites the entire table.
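To decide which approach a given table needs, you can estimate its bloat from the statistics collector. The query below is a sketch against the standard `pg_stat_all_tables` view; column availability and accuracy can differ on AnalyticDB's Greenplum-based kernel, so treat the numbers as a rough signal rather than an exact measurement.

```sql
-- List user tables with the most dead rows. A persistently high dead-to-live
-- ratio marks a table as a candidate for VACUUM, or for VACUUM FULL in a
-- maintenance window if plain VACUUM does not bring the ratio down.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n_dead_tup DESC
LIMIT 20;
```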

Clean up dirty data without locking tables

When you perform UPDATE or DELETE operations on a table, only the affected portion of the table is locked rather than the entire table. This approach improves concurrency performance and reduces response time, though it causes dirty data to accumulate over time.

Run VACUUM as the database owner on each database. Tables remain readable and writable throughout, though CPU utilization and I/O will increase temporarily.

Recommended frequency:

  • Heavy real-time writes (large numbers of INSERT VALUES, UPDATE, or DELETE statements) — once a day, or at minimum once a week.

  • Batch updates once a day — once a week, or at minimum once a month.
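To verify that whichever schedule you choose is actually being met, you can check the last vacuum timestamps recorded by the statistics collector. This is a sketch against the standard `pg_stat_all_tables` view; as above, the Greenplum-based kernel may report these columns differently.

```sql
-- Show when each user table was last vacuumed, manually or by auto-vacuum.
-- Tables with NULL in both columns have never been vacuumed and sort first.
SELECT schemaname,
       relname,
       last_vacuum,
       last_autovacuum
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY coalesce(last_vacuum, last_autovacuum) NULLS FIRST;
```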

Execution methods:

  • pg_cron (recommended) — Use the pg_cron extension to schedule VACUUM as a database-native cron job. See pg_cron.

  • Shell script via crontab — Run the following script as a scheduled crontab task. It queries pg_stat_database to get all databases, skips system databases (template0, template1, postgres, gpdb), and runs VACUUM on each.

#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
# Get the list of databases without echoing the command
dblist=$(psql -d postgres -c "copy (select datname from pg_stat_database) to stdout")
for db in $dblist ; do
    # Skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo "processing $db"
    # VACUUM all tables (catalog tables and user tables)
    psql -d "$db" -e -a -c "VACUUM;"
done

Replace PGHOST, PGPORT, PGUSER, and PGPASSWORD with your instance connection details.
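If the pg_cron extension is enabled on your instance, the crontab script above can be replaced by a database-native job. The statements below are a sketch using pg_cron's standard `cron.schedule` API; whether named jobs are supported, and which database a job runs in, depends on the pg_cron version your instance ships, so check the extension's documentation for your release.

```sql
-- Schedule a plain VACUUM of the job's target database every day at 03:00.
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');

-- Inspect the scheduled jobs.
SELECT jobid, schedule, command FROM cron.job;

-- Remove the job when it is no longer needed.
SELECT cron.unschedule('nightly-vacuum');
```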

Clean up dirty data within a maintenance window

Run VACUUM FULL and REINDEX as the database owner, who must have full permissions on all objects. Tables are locked during execution and cannot be read or written.

Important

Avoid accessing the database while VACUUM FULL is running.

Steps (run on each database):

  1. Rebuild system catalog indexes:

    REINDEX SYSTEM <database_name>;
  2. Reclaim space on each data table:

    VACUUM FULL <table_name>;

    For column-oriented tables, also run REINDEX TABLE <table_name> after VACUUM FULL.

  3. If you create and drop system tables or indexes at high frequency, run VACUUM FULL periodically on pg_class, pg_attribute, and pg_index.

Recommended frequency: At least once a week. If most of your data changes every day, run these steps daily.

Execution methods:

  • pg_cron (recommended) — Use the pg_cron extension to schedule VACUUM FULL and REINDEX as a database-native cron job. See pg_cron.

  • Shell script via crontab — Run the following script as a scheduled crontab task. It runs a standard VACUUM, rebuilds system catalog indexes with REINDEX SYSTEM, queries all non-partition user tables (excluding partition children via a pg_partitions join), and then runs VACUUM FULL and REINDEX TABLE on each.

#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
# Get the list of databases without echoing the command
dblist=$(psql -d postgres -c "copy (select datname from pg_stat_database) to stdout")
for db in $dblist ; do
    # Skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo "processing db $db"
    # Run a normal VACUUM first
    psql -d "$db" -e -a -c "VACUUM;"
    # Rebuild the system catalog indexes
    psql -d "$db" -e -a -c "REINDEX SYSTEM $db;"
    # Use a temporary file to store the table list, which can be very large
    cp /dev/null tables.txt
    # Query only the normal user tables, excluding partitions of parent tables
    psql -d "$db" -c "copy (select '\"'||tables.schemaname||'\".' || '\"'||tables.tablename||'\"' from (select nspname as schemaname, relname as tablename from pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_roles where pg_class.relnamespace = pg_namespace.oid and pg_namespace.nspowner = pg_roles.oid and pg_class.relkind='r' and (pg_namespace.nspname = 'public' or pg_roles.rolsuper = 'false' ) ) as tables(schemaname, tablename) left join pg_catalog.pg_partitions on pg_partitions.partitionschemaname=tables.schemaname and pg_partitions.partitiontablename=tables.tablename where pg_partitions.partitiontablename is null) to stdout;" > tables.txt
    while read line; do
        # Some table names may contain the $ sign, so escape it
        line=$(echo "$line" | sed 's/\$/\\$/g')
        echo "processing table $line"
        # VACUUM FULL the table; this locks the table
        psql -d "$db" -e -a -c "VACUUM FULL $line;"
        # Reindex the table to reclaim index space
        psql -d "$db" -e -a -c "REINDEX TABLE $line;"
    done <tables.txt
done

Replace PGHOST, PGPORT, PGUSER, and PGPASSWORD with your instance connection details.
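One subtle step in the script above is the sed transformation that backslash-escapes any dollar sign in a table name before the name is passed to psql. The snippet below isolates that step with a hypothetical table name (`sales$2024` is an assumption for illustration) so you can verify what the transformation produces.

```shell
#!/bin/sh
# Hypothetical table name containing a dollar sign.
line='public."sales$2024"'

# Escape each "$" with a backslash, exactly as the maintenance script does
# before interpolating the name into the psql command line.
escaped=$(echo "$line" | sed 's/\$/\\$/g')

echo "$escaped"    # prints: public."sales\$2024"
```

The backslash is then consumed when the name is expanded inside the double-quoted `psql -c "VACUUM FULL $line;"` argument, so psql receives the original identifier intact.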
