When you perform system update operations such as INSERT VALUES, UPDATE, DELETE, and ALTER TABLE ADD COLUMN, junk data is retained in the system table and updated data table. The junk data diminishes the system performance and consumes a large amount of disk space. Therefore, we recommend that you clear such data on a regular basis. This topic describes how to clear junk data for AnalyticDB for PostgreSQL.
Clear junk data without locking tables
You can clear some junk data without locking their corresponding tables by using the following method:
-
Log on to each database as the database owner, and execute the
VACUUM
statement. -
Frequency:
- When you perform real-time update operations, such as INSERT VALUES, UPDATE, and DELETE, we recommend that you execute the VACUUM statement once a day or at least once a week.
- If you only perform batch updates once a day, we recommend that you execute the VACUUM statement once a week or at least once a month.
-
The system does not lock tables on which the VACUUM statement is being executed. You can read data from the tables or write data into them when the VACUUM statement is executed. However, this operation increases CPU utilization and I/O usage, and may affect query performance.
-
You can run the following Linux Shell script file as a scheduled crontab task:
#! /bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
#do not echo command, just get a list of db
dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
for db in $dblist ; do
#skip the system databases
if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
continue
fi
echo processing $db
#vacuum all tables (catalog tables/user tables)
psql -d $db -e -a -c "VACUUM;"
done
Clear junk data during a maintenance window
You can execute the VACUUM FULL statement to clear all junk data in all tables during a maintenance window by using the following method:
-
Log on to each database as the database owner. The database owner must have full permissions.
- Execute the
REINDEX SYSTEM <database name>
statement. - Execute the
VACUUM FULL <table name>
andREINDEX TABLE <table name>
statements on each data table. - If you create and delete system tables and indexes at a high frequency, we recommend that you execute the VACUUM FULL <table name> statement to maintain the tables on a regular basis. The system tables include pg_class, pg_attribute, and pg_index. Note: We recommend that you do not access the database when you execute the VACUUM FULL <table name> statement.
- Execute the
-
The VACUUM FULL statement must be executed at least once a week. If the majority of your data is updated every day, execute the VACUUM FULL statement once a day.
-
The system locks tables on which the VACUUM FULL or REINDEX statement is being executed and these tables cannot be read or written during the execution. This operation increases CPU utilization and I/O usage.
-
You can run the following Linux Shell script file as a scheduled crontab task:
#! /bin/bash
export PGHOST=myinst.gpdb.rds.tbsite.net
export PGPORT=3432
export PGUSER=myuser
export PGPASSWORD=mypass
#do not echo command, just get a list of db
dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
for db in $dblist ; do
#skip system databases
if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
continue
fi
echo processing db "$db"
#do a normal vacuum
psql -d $db -e -a -c "VACUUM;"
#reindex system tables firstly
psql -d $db -e -a -c "REINDEX SYSTEM $db;"
#use a temp file to store the table list, which could be vary large
cp /dev/null tables.txt
#query out only the normal user tables, excluding partitions of parent tables
psql -d $db -c "copy (select '\"'||tables.schemaname||'\".' || '\"'||tables.tablename||'\"' from (select nspname as schemaname, relname as tablename from pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_roles where pg_class.relnamespace = pg_namespace.oid and pg_namespace.nspowner = pg_roles.oid and pg_class.relkind='r' and (pg_namespace.nspname = 'public' or pg_roles.rolsuper = 'false' ) ) as tables(schemaname, tablename) left join pg_catalog.pg_partitions on pg_partitions.partitionschemaname=tables.schemaname and pg_partitions.partitiontablename=tables.tablename where pg_partitions.partitiontablename is null) to stdout;" > tables.txt
while read line; do
#some table name may contain the $ sign, so escape it
line=`echo $line |sed 's/\\\$/\\\\\\\$/g'`
echo processing table "$line"
#vacuum full this table, which will lock the table
psql -d $db -e -a -c "VACUUM FULL $line;"
#reindex the table to reclaim index space
psql -d $db -e -a -c "REINDEX TABLE $line;"
done <tables.txt
done