Bulk import operations on standard PostgreSQL are bottlenecked by per-row WAL writes, per-page storage extensions, and synchronous commit latency. PolarDB for PostgreSQL addresses these bottlenecks with a built-in bulk import infrastructure that bypasses buffer management, batches page writes, and makes transaction commit asynchronous — significantly reducing the time required to build large indexes, materialized views, and tables.
Supported versions
This feature requires one of the following minor engine versions:
| PostgreSQL version | Minimum minor engine version |
|---|---|
| PostgreSQL 18 | 2.0.18.0.1.0 |
| PostgreSQL 17 | 2.0.17.2.1.0 |
| PostgreSQL 16 | 2.0.16.6.2.0 |
| PostgreSQL 15 | 2.0.15.12.4.0 |
| PostgreSQL 14 | 2.0.14.13.28.0 |
To check your cluster's minor engine version, run `SHOW polardb_version;` or view it in the PolarDB console.
If your cluster runs an earlier version:
- PostgreSQL 15 and later: Upgrade the minor engine version.
- PostgreSQL 14 and earlier: Use `CREATE MATERIALIZED VIEW` or `REFRESH MATERIALIZED VIEW` for bulk writes.
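The minor engine version can be checked directly from a SQL client and compared against the table above:

```sql
SHOW polardb_version;
```

If the reported version is older than the minimum listed for your PostgreSQL major version, the bulk import infrastructure is not available.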
Supported statements
The following statements use the bulk import infrastructure automatically:
| Statement | Notes |
|---|---|
| CREATE INDEX | |
| REINDEX | |
| VACUUM FULL | |
| ALTER TABLE SET TABLESPACE | |
| CREATE MATERIALIZED VIEW | |
| REFRESH MATERIALIZED VIEW | |
| CREATE TABLE AS / SELECT INTO | Depends on `wal_level`; see note below |
`CREATE TABLE AS` and `SELECT INTO` use the bulk import infrastructure only when `wal_level` is set below `logical`. When `wal_level` is `logical`, these statements use the bulk insert interface of the Multi-Insert Table AM infrastructure instead.
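Because the code path depends on the WAL level, it can help to check it before running a large `CREATE TABLE AS` or `SELECT INTO`:

```sql
SHOW wal_level;
```

If the result is `logical`, these statements fall back to the Multi-Insert Table AM bulk insert interface rather than the bulk import infrastructure.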
How it works
When data is imported into new objects inside an uncommitted transaction, those objects are invisible to other processes. PolarDB exploits this property to apply four targeted optimizations across the import pipeline.
Bypassing buffer management
Because imported data stays invisible until the transaction commits, pages can be constructed directly in the process's private memory — without allocating space from the shared memory Buffer Pool. This eliminates lock contention on the Buffer Pool and speeds up page construction.
Bulk extending pages
Standard imports extend storage one page at a time, which accumulates significant I/O latency on cloud storage. PolarDB instead pre-aggregates multiple pages in private memory and flushes them to storage in a single bulk extension, spreading the latency cost across many pages at once.
Bulk recording WAL records
Standard heap table and index imports write one Write-Ahead Logging (WAL) record per row. PolarDB waits until a full page is constructed in private memory, then records the entire page as a single XLOG_FPI (Full Page Image) WAL record. Because XLOG_FPI records capture a complete page snapshot rather than incremental row-level changes, this approach is more compact and reduces WAL metadata overhead.
Asynchronous commit
When a transaction commits, all pages constructed in private memory must become durable. Rather than waiting for each page to reach storage before returning from commit, PolarDB copies the pages to the Buffer Pool and marks them as dirty immediately after bulk page construction completes. The commit returns quickly, and a background dirty-page flushing process writes the pages to storage asynchronously, in parallel with ongoing page construction.
Data consistency after a crash is guaranteed because XLOG_FPI WAL records are already written before the commit returns — recovery replays those records to restore all committed pages.
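As a concrete illustration, the statements below all go through the bulk import pipeline described above; the table and index names are hypothetical:

```sql
-- CREATE TABLE AS uses the bulk import path when wal_level is below logical
-- (see the note under "Supported statements").
CREATE TABLE orders_archive AS
SELECT * FROM orders WHERE order_date < DATE '2024-01-01';

-- Index builds also qualify: pages are assembled in private memory,
-- extended in bulk, logged as XLOG_FPI records, and flushed asynchronously.
CREATE INDEX orders_archive_date_idx ON orders_archive (order_date);
```

No syntax changes are required: the optimizations apply automatically to the supported statements.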
Configure polar_bulk_write_maxpages
The `polar_bulk_write_maxpages` parameter sets the maximum amount of page data, specified as a memory size, to accumulate in private memory before triggering a bulk flush. When the threshold is reached, the system performs a bulk page extension, writes WAL records in bulk, and copies the pages to the Buffer Pool.
```sql
SHOW polar_bulk_write_maxpages;

 polar_bulk_write_maxpages
---------------------------
 1MB
(1 row)
```

The default value of 1MB is tuned for general workloads and does not need to be changed in most cases.
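If a workload benefits from a different threshold, the parameter can be adjusted. The example below is a sketch only: the accepted unit syntax and whether a session-level SET is permitted are assumptions to verify against your cluster.

```sql
-- Hypothetical tuning example: allow more page data to accumulate before
-- each bulk flush. Verify accepted units and scope on your cluster.
SET polar_bulk_write_maxpages = '2MB';
SHOW polar_bulk_write_maxpages;
```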