edit-icon download-icon

Scheduled maintenance tasks

Last Updated: Jan 02, 2018

When an update operation (including INSERT VALUES, UPDATE, DELETE, and ALTER TABLE ADD COLUMN) is performed on a system, junk data that may no longer be used is left in the system table and the updated data table. This junk data reduces the system performance and takes up a large amount of disk space. We recommend that you clear such data on a regular basis by following the methods provided in this document.

Clear junk data without locking the table

You can clear some junk data without locking the table. The method is as follows.

  • Command: connect to every database, log on to the database as the database owner, and run the VACUUM command.

  • Frequency: at least once a day.

    • If data is updated in real time (that is, INSERT VALUES, UPDATE, and DELETE operations are performed continuously), we recommend that you run the VACUUM command once every two hours.
    • If data update is performed by bulk once a day, you can run the command once after the bulk update every day.
  • Impact to the system: no table is locked, and tables can be read and written to normally. But it may increase the CPU and I/O usage, and impact query performance.

  • Example: you can use the following Linux Shell script file and run it as a scheduled crontab task.

  1. #!/bin/bash
  2. export PGHOST=myinst.gpdb.rds.tbsite.net
  3. export PGPORT=3432
  4. export PGUSER=myuser
  5. export PGPASSWORD=mypass
  6. #do not echo command, just get a list of db
  7. dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
  8. for db in $dblist ; do
  9. #skip the system databases
  10. if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
  11. continue
  12. fi
  13. echo processing $db
  14. #vacuum all tables (catalog tables/user tables)
  15. psql -d $db -e -a -c "VACUUM;"
  16. done

Clear junk data during maintenance windows

You can clear all junk data during maintenance windows of services when services are suspended. The method is as follows.

  • Command: connect to every database, and log on to the database as the database owner (you must have the owner permission for all the operation objects).

    1. Run the REINDEX SYSTEM <database name> command.
    2. Run the VACUUM FULL <table name>, REINDEX TABLE <table name> command on every data table (non-system table).
  • Frequency: at least once per week. If almost all the data is updated every day, you can run the command once a day.

  • Impact to the system: the command locks tables for VACUUM FULL or REINDEX and these tables become not readable or writable. This may increase the CPU and I/O usage.

  • Example: you can use the following Linux Shell script file and run it as a scheduled crontab task.

  1. #!/bin/bash
  2. export PGHOST=myinst.gpdb.rds.tbsite.net
  3. export PGPORT=3432
  4. export PGUSER=myuser
  5. export PGPASSWORD=mypass
  6. #do not echo command, just get a list of db
  7. dblist=`psql -d postgres -c "copy (select datname from pg_stat_database) to stdout"`
  8. for db in $dblist ; do
  9. #skip system databases
  10. if [[ $db == template0 ]] || [[ $db == template1 ]] || [[ $db == postgres ]] || [[ $db == gpdb ]] ; then
  11. continue
  12. fi
  13. echo processing db "$db"
  14. #do a normal vacuum
  15. psql -d $db -e -a -c "VACUUM;"
  16. #reindex system tables firstly
  17. psql -d $db -e -a -c "REINDEX SYSTEM $db;"
  18. #use a temp file to store the table list, which could be vary large
  19. cp /dev/null tables.txt
  20. #query out only the normal user tables, excluding partitions of parent tables
  21. psql -d $db -c "copy (select '\"'||tables.schemaname||'\".' || '\"'||tables.tablename||'\"' from (select nspname as schemaname, relname as tablename from pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_roles where pg_class.relnamespace = pg_namespace.oid and pg_namespace.nspowner = pg_roles.oid and pg_class.relkind='r' and (pg_namespace.nspname = 'public' or pg_roles.rolsuper = 'false' ) ) as tables(schemaname, tablename) left join pg_catalog.pg_partitions on pg_partitions.partitionschemaname=tables.schemaname and pg_partitions.partitiontablename=tables.tablename where pg_partitions.partitiontablename is null) to stdout;" > tables.txt
  22. while read line; do
  23. #some table name may contain the $ sign, so escape it
  24. line=`echo $line |sed 's/\\\$/\\\\\\\$/g'`
  25. echo processing table "$line"
  26. #vacuum full this table, which will lock the table
  27. psql -d $db -e -a -c "VACUUM FULL $line;"
  28. #reindex the table to reclaim index space
  29. psql -d $db -e -a -c "REINDEX TABLE $line;"
  30. done <tables.txt
  31. done
Thank you! We've received your feedback.