Background information

After data in a table is deleted or updated, the physical storage layer does not delete the data but marks it as invisible. In this situation, a number of holes are left in data pages. When the system reads data, these holes are loaded with the data pages. This slows down data scans. Therefore, you must reclaim space regularly.

How to reclaim space

Use a VACUUM statement to rearrange the data in a table and reclaim disk space for better data read performance. The statement syntax is as follows:
VACUUM [FULL] [FREEZE] [VERBOSE] [table];

A VACUUM statement organizes data on data pages, whereas a VACUUM FULL statement moves data across data pages. A VACUUM operation runs faster. A VACUUM FULL operation reclaims more space but requests exclusive locks. We recommend that you perform a VACUUM operation on system tables once a week.

Precautions

VACUUM:

  • VACUUM statements cannot reclaim all space of a table.
  • For tables whose data is frequently updated in real time, execute a VACUUM statement once a day.
  • If data is updated in batches once a day, you can execute the statement once after the batch update is complete.
  • The system does not lock tables on which VACUUM statements are being executed. You can read data from the tables or write data into them while the statements are executed. However, VACUUM statements increase CPU utilization and I/O usage, which may affect query performance.

VACUUM FULL:

  • VACUUM FULL statements reclaim space by rearranging data in tables and can reclaim all space occupied by holes. For tables with most data updates, we recommend that you execute VACUUM FULL statements as soon as possible.
  • VACUUM FULL statements must be executed at least once a week. If the majority of your data is updated every day, execute a VACUUM FULL statement once a day.
  • The system locks tables on which VACUUM FULL statements are being executed. You can neither read data from the tables nor write data into them while the statements are executed. VACUUM FULL statements also increase CPU utilization and I/O usage. We recommend that you execute VACUUM FULL statements in the maintenance window.

Query the tables for which VACUUM statements are required

AnalyticDB for PostgreSQL provides a gp_bloat_diag view to measure the ratio of actual pages to expected pages. After you use an ANALYZE statement to collect statistics on tables, execute the following statement to check this view:

gpadmin=# SELECT * FROM gp_toolkit.gp_bloat_diag;
bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |                bdidiag                
----------+------------+------------+-------------+-------------+---------------------------------------
    21488 | public     | t1         |          97 |           1 | significant amount of bloat suspected
(1 row)

The results only include tables with moderate or significant data bloat. If the ratio of actual pages to expected pages is greater than 4 but less than 10, moderate data bloat is reported. If the ratio is greater than 10, significant data bloat is reported. For these tables, we recommend that you use VACUUM FULL statements to reclaim space.

When to use VACUUM FREEZE

Each transaction executed by AnalyticDB for PostgreSQL has a unique transaction ID (XID). XIDs monotonically increase, with an upper limit of two billion.

If the XID of a database approaches the value of the xid_stop_limit-xid_warn_limit parameter (500000000 by default), AnalyticDB for PostgreSQL returns a warning to the SQL statement that executes the transaction that corresponds to the XID to ensure that the XID does not exceed the limit.
WARNING: database "database_name" must be vacuumed within number_of_transactions transactions

In this situation, you can manually execute a VACUUM FREEZE statement on the current database to reduce the XID.

If you ignore this warning and XIDs continue to increase to a value greater than the value of the xid_stop_limit parameter (1000000000 by default), AnalyticDB for PostgreSQL rejects new transactions and returns the following message:
FATAL: database is not accepting commands to avoid wraparound data loss in database "database_name"

In this situation, you must submit a ticket to contact Alibaba Cloud technical support and resolve the issue.