After you execute UPDATE or DELETE statements in a database, data that is deleted is labeled as invisible but remains as holes on data pages. As a result, a large amount of time is required to scan data because the holes are also scanned. You can reclaim table storage in a regular manner to improve data reading efficiency.
AnalyticDB for PostgreSQL provides the auto-vacuum feature to automatically perform dirty data cleanup operations on tables in the background.
To clean up table data at the earliest opportunity after you perform a large number of update or delete operations in AnalyticDB for PostgreSQL, you can also manually execute the
VACUUMstatement on a database or a table.
Storage reclamation methods
You can execute the following VACUUM statement to rearrange table data and reclaim table storage to improve data reading performance:
VACUUM [FULL] [FREEZE] [VERBOSE] [table];VACUUM: performs the VACUUM operation on all tables.VACUUM [table]: performs the VACUUM operation on a specific table to clean up dirty data and release the storage of dirty data in the column-oriented table or the holes at the end of the row-oriented table. This operation does not block read and write operations on the table.VACUUM FULL [table]: performs the VACUUM FULL operation on a specific table to release all storage of dirty data in the table. This operation applies for an exclusive lock to block read and write operations on the table.VACUUM FREEZE [table]: performs the VACUUM FREEZE operation on a specific table to reduce the XID age of the table. This operation does not block read and write operations on the table.VACUUM VERBOSE [table]: displays the logs of dirty data cleanup during the VACUUM operation.
Usage notes
By default, the auto-vacuum feature is enabled for AnalyticDB for PostgreSQL instances. The system automatically cleans up dirty data and reduces the XID age.
After you perform a large number of update or delete operations on a table, you can execute the
VACUUM FULLstatement to release all storage of dirty data in the table.ImportantThe
VACUUM FULLstatement applies for an exclusive lock to block read and write operations on the table.You can execute the
VACUUMstatement only by using the initial account or a privileged account that has the RDS_SUPERUSER permission. If you do not have the required permissions, theVACUUMstatement does not take effect.
Query tables that require the VACUUM operation
AnalyticDB for PostgreSQL provides the intelligent data bloat diagnostics feature. This feature helps you find bloated tables and query tables that require the VACUUM operation.
AnalyticDB for PostgreSQL uses a view named gp_bloat_diag to measure the ratio of actual pages to expected pages. You can execute the ANALYZE TABLE statement to collect statistics and check the gp_bloat_diag view.
SELECT * FROM gp_toolkit.gp_bloat_diag;The following result includes tables that have moderate or significant data bloat. You can execute the VACUUM FULL statement to reclaim the storage of bloated tables.
bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------------------------------------
21488 | public | t1 | 97 | 1 | significant amount of bloat suspected
(1 row)If the ratio of actual pages to expected pages (bdirelpages / bdiexppages) is greater than 4 and less than 10, the table is considered to have moderate data bloat. If the ratio is greater than 10, the table is considered to have significant data bloat.
Readable during storage reclamation
After you perform a large number of update or delete operations on a table, you can execute the VACUUM FULL statement to release all storage of dirty data in the table. However, the VACUUM FULL statement applies for an exclusive lock to block read and write operations on the table. This affects online business. To resolve this issue, the following section provides a cleanup method to ensure that tables remain readable during storage reclamation.
How it works
The essence of this method is to achieve storage reclamation by using data redistribution of tables. The system creates a temporary table with the same schema as the original table in the background, writes all data from the original table to the temporary table, swaps the metadata of the two tables, and then deletes the temporary table. Because data is completely rewritten, no holes exist in the new data. This way, storage is effectively reclaimed.
Scenarios
The auto-vacuum feature may fail to reclaim table storage in a timely manner due to the lock yielding mechanism. In this case, active maintenance is required.
If you execute the
VACUUM FULLstatement on a large table that has severe data bloat, an exclusive lock exists for an extended period of time and blocks read and write operations on the table. This affects online business.
Usage notes
This method generates a data replica during storage reclamation, which may result in an increase in disk usage within a short time.
This method rebuilds indexes in the background, which also blocks read and write operations during this period. Therefore, this method is not suitable for tables with large indexes, such as vector indexes.
Procedure
Create the following function in a database:
CREATE OR REPLACE FUNCTION reorganize_table(table_name regclass) RETURNS void AS $$ DECLARE distribution_policy text; alter_sql text; BEGIN SELECT pg_get_table_distributedby(table_name) INTO distribution_policy; alter_sql := format('ALTER TABLE ONLY %s SET WITH (REORGANIZE=true) %s READABLE', table_name, distribution_policy); EXECUTE alter_sql; END; $$ LANGUAGE plpgsql;Invoke the function to reclaim dirty data.
SELECT reorganize_table('schema_name.table_name');