When you perform update operations such as INSERT VALUES, UPDATE, DELETE, and ALTER TABLE ADD COLUMN, junk data is retained in the system tables and in the updated data tables. Junk data can reduce system performance and consume a large amount of disk space. We recommend that you clear such data on a regular basis. This topic describes how to clear junk data in AnalyticDB for PostgreSQL.

Auto-vacuum

Auto-vacuum automatically executes VACUUM statements. It checks for tables that have accumulated a large number of insert, update, or delete operations and, when necessary, executes a VACUUM statement on those tables to clear junk data and keep queries fast. By default, auto-vacuum executes a VACUUM statement on a table after more than half of the rows in the table have been deleted.

If an AnalyticDB for PostgreSQL instance has multiple coordinator nodes, auto-vacuum checks for change operations only on the primary coordinator node. Auto-vacuum cannot be triggered if change operations are performed on secondary coordinator nodes.

Note Auto-vacuum is available only for AnalyticDB for PostgreSQL 20210527 and later minor versions. For more information about how to upgrade the minor version, see Update the minor kernel version of an AnalyticDB for PostgreSQL instance.
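To see whether auto-vacuum is keeping up, you can query the statistics collector for per-table dead-row counts and the time of the last auto-vacuum run. The following sketch uses the standard PostgreSQL pg_stat_all_tables view; the database name mydb is a placeholder, and the psql command is only printed so that you can review it before running it. In an MPP instance, the counts visible on the coordinator may not fully reflect segment-level junk data, so treat them as a rough signal.

```shell
#!/bin/bash
# Sketch: list tables with dead rows and the last auto-vacuum time.
# pg_stat_all_tables is a standard PostgreSQL statistics view.
SQL="SELECT schemaname, relname, n_dead_tup, last_autovacuum
     FROM pg_stat_all_tables
     WHERE n_dead_tup > 0
     ORDER BY n_dead_tup DESC
     LIMIT 20;"

# Print the command for review; remove the echo to execute it.
# mydb is a placeholder database name.
echo psql -d mydb -c "$SQL"
```

Tables that show a large n_dead_tup value long after last_autovacuum are candidates for a manual VACUUM as described in the next section.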

Clear junk data without locking tables

You can clear junk data without locking the corresponding tables by using the following method:

  • Log on to each database as the database owner and execute the VACUUM statement.
  • Frequency:
    • When you perform real-time updates, such as INSERT VALUES, UPDATE, and DELETE, we recommend that you execute the VACUUM statement once a day or at least once a week.
    • If you perform batch updates once a day, we recommend that you execute the VACUUM statement once a week or at least once a month.
  • The system does not lock tables on which the VACUUM statement is being executed. These tables can be read and written during the execution. However, this operation increases CPU utilization and I/O usage, and may affect query performance.
  • You can run the following Linux shell script file as a scheduled crontab task:
#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
#do not echo the command, just get the list of databases
dblist=$(psql -d postgres -c "copy (select datname from pg_stat_database) to stdout")
for db in $dblist ; do
    #skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo processing "$db"
    #vacuum all tables (catalog tables and user tables)
    psql -d "$db" -e -a -c "VACUUM;"
done
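To run the script unattended, you can register it as a crontab task on a maintenance host. A hypothetical entry, assuming the script is saved as /path/to/vacuum.sh and marked executable, that runs it every day at 01:00 and appends the output to a log file:

```shell
# Hypothetical crontab entry (add it with crontab -e):
# the script path and log path are placeholders for your environment.
0 1 * * * /path/to/vacuum.sh >> /var/log/vacuum_adbpg.log 2>&1
```

Schedule the task in a low-traffic window, because VACUUM increases CPU utilization and I/O usage while it runs.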

Clear junk data during a maintenance window

You can execute the VACUUM FULL statement to clear all junk data from all tables during a maintenance window by using the following method:

  • Log on to each database as the database owner, who must have full permissions on the database, and perform the following steps:
    1. Execute the REINDEX SYSTEM <database name> statement.
    2. Execute the VACUUM FULL <table name> and REINDEX TABLE <table name> statements on each data table.
    3. If tables and indexes are created and deleted at a high frequency, system tables such as pg_class, pg_attribute, and pg_index also accumulate junk data. We recommend that you execute the VACUUM FULL <table name> statement on these system tables on a regular basis. Note: We recommend that you do not access the database while the VACUUM FULL <table name> statement is being executed.
  • The VACUUM FULL statement must be executed at least once a week. If the majority of your data is updated every day, execute the VACUUM FULL statement once a day.
  • The system locks tables on which the VACUUM FULL or REINDEX statement is being executed. These tables cannot be read or written during the execution. This operation increases CPU utilization and I/O usage.
  • You can run the following Linux shell script file as a scheduled crontab task:
#!/bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
#do not echo the command, just get the list of databases
dblist=$(psql -d postgres -c "copy (select datname from pg_stat_database) to stdout")
for db in $dblist ; do
    #skip the system databases
    if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
        continue
    fi
    echo processing db "$db"
    #do a normal vacuum first
    psql -d "$db" -e -a -c "VACUUM;"
    #reindex the system tables first
    psql -d "$db" -e -a -c "REINDEX SYSTEM $db;"
    #use a temporary file to store the table list, which can be very large
    cp /dev/null tables.txt
    #query only the normal user tables, excluding the partitions of parent tables
    psql -d "$db" -c "copy (select '\"'||tables.schemaname||'\".' || '\"'||tables.tablename||'\"' from (select nspname as schemaname, relname as tablename from pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_roles where pg_class.relnamespace = pg_namespace.oid and pg_namespace.nspowner = pg_roles.oid and pg_class.relkind='r' and (pg_namespace.nspname = 'public' or pg_roles.rolsuper = 'false' ) ) as tables(schemaname, tablename) left join pg_catalog.pg_partitions on pg_partitions.partitionschemaname=tables.schemaname and pg_partitions.partitiontablename=tables.tablename where pg_partitions.partitiontablename is null) to stdout;" > tables.txt
    while read -r line; do
        #a $ sign inside an expanded variable is not re-expanded by the shell,
        #so table names that contain $ can be passed to psql as-is
        echo processing table "$line"
        #vacuum full this table, which will lock the table
        psql -d "$db" -e -a -c "VACUUM FULL $line;"
        #reindex the table to reclaim index space
        psql -d "$db" -e -a -c "REINDEX TABLE $line;"
    done <tables.txt
done
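Both scripts skip the system databases with a chain of [[ ]] comparisons. If you need to exclude additional databases from maintenance, a case statement keeps the filter in one place. A minimal standalone sketch, in which reporting_tmp is a hypothetical extra database to skip:

```shell
#!/bin/bash
# Sketch: the same skip logic as in the scripts above, as a reusable filter.
# reporting_tmp is a hypothetical database added to the exclusion list.
skip_db() {
    case "$1" in
        template0|template1|postgres|gpdb|reporting_tmp) return 0 ;;
        *) return 1 ;;
    esac
}

for db in template0 mydb postgres reporting_tmp sales ; do
    skip_db "$db" && continue
    echo "processing $db"
done
# Output:
#   processing mydb
#   processing sales
```

To use it, replace the if/continue block in either script with a call to skip_db "$db".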