All Products
Search
Document Center

PolarDB:Challenges and optimization solutions for VACUUM operations

Last Updated:Mar 07, 2025

VACUUM is a key database maintenance command in PolarDB for PostgreSQL. You can use VACUUM to reclaim storage space, prevent transaction ID wraparound, update statistics, and improve concurrency performance. Proper use of VACUUM can ensure the efficient operation and long-term stability of databases. However, you must exercise caution when you use the VACUUM command because the command can significantly affect database performance. Improper use of the command may lead to negative impacts. This topic describes the challenges in performing VACUUM operations and the corresponding optimization solutions in PolarDB for PostgreSQL.

Large table vacuuming

Challenges

In most cases, large database tables occupy multiple disk blocks. A large number of I/O operations are generated when scanning and vacuuming operations are performed on the tables. The performance of VACUUM may be limited and slowed down by various factors. If garbage data is not cleared at the earliest opportunity due to the slow speed of VACUUM, the bloated table results in an increased number of data pages that must be scanned. This prolongs the vacuuming duration.

The traditional methods used to address slow VACUUM on large tables include data distribution adjustment and parameter tuning. However, these methods provide limited improvements in vacuuming efficiency.

Optimization solutions

  • VACUUM in PolarDB for PostgreSQL is optimized by using an asynchronous prefetching method. During the vacuuming process, blocks that are sequentially read are grouped and sent in advance to concurrent I/O worker processes for prefetching. This method accelerates the vacuuming process by more than two-fold.

  • The system optimizes the dead tuple caching structure to enable faster querying and processing of dead tuples in the vacuuming process.

image

Transaction ID wraparound

Challenges

PostgreSQL uses 32-bit transaction IDs (XIDs) to implement multi-version concurrency control (MVCC). However, the 32-bit XID has a limited value range, with a maximum value of 4,294,967,295. PostgreSQL uses a transaction ID wraparound protection mechanism to prevent data unavailability caused by the exhaustion of XIDs. If the difference between the oldest active transaction and the newest transaction exceeds 2.1 billion (approximately half of the 32-bit XID space), the system enters the wraparound protection state. In this state, the database shuts down and rejects new write transactions. To resolve the issue, you must perform a VACUUM operation in single-user mode.

Optimization solutions

To effectively address the transaction ID wraparound issue, PolarDB for PostgreSQL provides custom parameters, including pre-trigger for wraparound, wraparound warning threshold, and superuser bypass parameter. PolarDB for PostgreSQL reserves transaction IDs to minimize the risk of downtime caused by the exhaustion of XIDs.

  • Pre-trigger for wraparound: PolarDB for PostgreSQL reserves a specific number of transaction IDs as a buffer. When the available transaction IDs are about to be exhausted, the system pre-triggers the wraparound process When the transaction ID usage reaches the reserved limit, the database stops allocating transaction IDs.

  • Wraparound warning threshold: the warning threshold for transaction ID usage. When transaction ID usage exceeds the threshold, the SQL operation that consumes transaction IDs triggers a WARNING-level alert.

  • Superuser bypass parameter: the number of additional transaction IDs that the superusers can use when transaction ID wraparound is triggered. In this case, superusers can continue using the transaction IDs but receive reminders to execute VACUUM FREEZE on the affected databases to reclaim transaction IDs.

Lock contention caused by VACUUM TRUNCATE

Challenges

VACUUM TRUNCATE is a mechanism in PostgreSQL that releases unused pages at the end of a table to the operating system. Although this mechanism can effectively reduce the disk space occupied by tables, it introduces significant challenges related to lock contention and blocking.

To truncate the unused pages at the end of a table, PostgreSQL applies an exclusive lock on the entire table. This results in the following issues:

  • Blocked read and write operations: During the locked period, all read (SELECT) and write operations are blocked.

  • Blocked DDL operations: DDL operations, such as adding indexes and altering the table schema, are also blocked.

In scenarios in which large tables are frequently written, the exclusive lock imposed by VACUUM TRUNCATE can cause significant disruptions. For mission-critical systems, even brief periods of blocking can lead to service outages or degraded user experience. The conventional optimization method is to adjust the TRUNCATE trigger parameters.

Optimization solutions

To address the lock contention issues caused by VACUUM TRUNCATE, PolarDB for PostgreSQL provides the following optimization solutions:

  • Optimized I/O read efficiency: PolarDB for PostgreSQL uses asynchronous parallel I/O to significantly reduce the time consumed by I/O operations during the TRUNCATE phase and improve the overall efficiency of the VACUUM TRUNCATE process.

  • Customizable TRUNCATE duration control: PolarDB for PostgreSQL allows you to specify custom durations for the TRUNCATE phase. You can configure the locking duration to ensure that the TRUNCATE operation does not conflict with critical business operations.

  • Configurable maintenance windows: PolarDB for PostgreSQL allows you to specify a maintenance window. You can schedule VACUUM TRUNCATE operations during off-peak hours. The system automatically performs VACUUM operations, including VACUUM TRUNCATE, during the maintenance window. This prevents negative impacts during peak business hours and improves system stability and reliability.