
PolarDB: Bulk import infrastructure

Last Updated: Mar 28, 2026

Bulk import operations on standard PostgreSQL are bottlenecked by per-row WAL writes, per-page storage extensions, and synchronous commit latency. PolarDB for PostgreSQL addresses these bottlenecks with a built-in bulk import infrastructure that bypasses buffer management, batches page writes, and makes transaction commit asynchronous — significantly reducing the time required to build large indexes, materialized views, and tables.

Supported versions

This feature requires one of the following minor engine versions:

PostgreSQL version    Minimum minor engine version
PostgreSQL 18         2.0.18.0.1.0
PostgreSQL 17         2.0.17.2.1.0
PostgreSQL 16         2.0.16.6.2.0
PostgreSQL 15         2.0.15.12.4.0
PostgreSQL 14         2.0.14.13.28.0

To check your cluster's minor engine version, execute SHOW polardb_version; in your database, or view the version in the PolarDB console.

If your cluster runs an earlier version, upgrade the minor engine version before using this feature.

Supported statements

The following statements use the bulk import infrastructure automatically:

Statement                         Notes
CREATE INDEX
REINDEX
VACUUM FULL
ALTER TABLE SET TABLESPACE
CREATE MATERIALIZED VIEW
REFRESH MATERIALIZED VIEW
CREATE TABLE AS / SELECT INTO     Depends on wal_level; see the note below

Note: CREATE TABLE AS and SELECT INTO use the bulk import infrastructure only when wal_level is set below logical (that is, minimal or replica). When wal_level is logical, these statements use the bulk insert interface of the Multi-Insert Table AM infrastructure instead.
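The wal_level rule above can be encoded as a small decision function. This is an illustrative sketch, not PolarDB source code; the function name and the ordered-level comparison are assumptions for clarity.

```python
# Hypothetical sketch: which import path CREATE TABLE AS / SELECT INTO takes,
# per the rule above (bulk import below 'logical', Multi-Insert at 'logical').
WAL_LEVELS = ["minimal", "replica", "logical"]  # ordered lowest to highest

def ctas_import_path(wal_level: str) -> str:
    """Return the import path used for CREATE TABLE AS / SELECT INTO."""
    if WAL_LEVELS.index(wal_level) < WAL_LEVELS.index("logical"):
        return "bulk import infrastructure"
    return "Multi-Insert Table AM bulk insert"
```

For example, a cluster running with the default wal_level of replica takes the bulk import path for these statements.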

How it works

When data is imported into new objects inside an uncommitted transaction, those objects are invisible to other processes. PolarDB exploits this property to apply four targeted optimizations across the import pipeline.

Bypassing buffer management

Because imported data stays invisible until the transaction commits, pages can be constructed directly in the process's private memory — without allocating space from the shared memory Buffer Pool. This eliminates lock contention on the Buffer Pool and speeds up page construction.
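The contrast can be sketched in miniature: shared-pool allocation serializes on a lock, while private page construction needs no synchronization. This is a conceptual model only, not PolarDB internals; the class and function names are invented for illustration.

```python
import threading

PAGE_SIZE = 8192  # PostgreSQL's default block size

class SharedBufferPool:
    """Stand-in for the shared Buffer Pool: every allocation takes a lock,
    which becomes a contention point under concurrent imports."""
    def __init__(self):
        self.lock = threading.Lock()
        self.pages = []

    def allocate_page(self):
        with self.lock:               # serialized across all backends
            page = bytearray(PAGE_SIZE)
            self.pages.append(page)
            return page

def build_pages_private(n_pages):
    """Bulk-import path: pages live in process-private memory, so no shared
    lock is needed -- safe because uncommitted data is invisible anyway."""
    return [bytearray(PAGE_SIZE) for _ in range(n_pages)]
```

The private path is safe precisely because of the invisibility property described above: no other process can read the new object before commit, so its pages need not live in shared memory during construction.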

Bulk extending pages

Standard imports extend storage one page at a time, which accumulates significant I/O latency on cloud storage. PolarDB instead pre-aggregates multiple pages in private memory and flushes them to storage in a single bulk extension, spreading the latency cost across many pages at once.
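The benefit is simple amortization arithmetic. The latency figure and batch size below are assumed for illustration, not measured values.

```python
import math

ROUND_TRIP_MS = 0.5   # assumed latency of one storage extension round trip
BATCH = 128           # assumed number of pages flushed per bulk extension

def extend_one_page_at_a_time(n_pages):
    """Standard path: one storage round trip per page."""
    return n_pages * ROUND_TRIP_MS

def extend_in_bulk(n_pages, batch=BATCH):
    """Bulk path: one round trip per batch of pre-aggregated pages."""
    return math.ceil(n_pages / batch) * ROUND_TRIP_MS
```

Under these assumed numbers, extending 1,024 pages costs 512 ms one page at a time but only 4 ms in batches of 128, because the fixed round-trip latency is paid once per batch instead of once per page.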

Bulk recording WAL records

Standard heap table and index imports write one Write-Ahead Logging (WAL) record per row. PolarDB waits until a full page is constructed in private memory, then records the entire page as a single XLOG_FPI (Full Page Image) WAL record. Because XLOG_FPI records capture a complete page snapshot rather than incremental row-level changes, this approach is more compact and reduces WAL metadata overhead.
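A rough model shows why per-page records shrink WAL metadata: fixed per-record overhead is paid once per page instead of once per row. The header size below is an approximation chosen for illustration, not an exact on-disk figure.

```python
WAL_RECORD_HEADER = 24   # approximate fixed overhead per WAL record (assumed)

def per_row_wal_overhead(rows_per_page, n_pages):
    """Standard path: one WAL record per row, so header overhead
    scales with the total row count."""
    return rows_per_page * n_pages * WAL_RECORD_HEADER

def fpi_wal_overhead(n_pages):
    """Bulk path: one XLOG_FPI record per page, so header overhead
    scales only with the page count."""
    return n_pages * WAL_RECORD_HEADER
```

With 100 rows per page, the per-row path pays 100 times the record-header overhead of the full-page-image path, while the page payload itself is written either way.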

Asynchronous commit

When a transaction commits, all pages constructed in private memory must become durable. Rather than waiting for each page to reach storage before returning from commit, PolarDB copies the pages to the Buffer Pool and marks them as dirty immediately after bulk page construction completes. The commit returns quickly, and a background dirty-page flushing process writes the pages to storage asynchronously, in parallel with ongoing page construction.

Data consistency after a crash is guaranteed because XLOG_FPI WAL records are already written before the commit returns — recovery replays those records to restore all committed pages.
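The commit protocol above can be sketched with a queue and a background flusher thread. This is a conceptual simulation, not PolarDB source; all names are invented, and the queue stands in for the Buffer Pool's dirty-page list.

```python
import queue
import threading

durable = []                   # stand-in for pages persisted to storage
dirty_pages = queue.Queue()    # stand-in for the Buffer Pool's dirty pages

def flusher():
    """Background dirty-page flusher: writes pages to storage
    asynchronously, after commit has already returned."""
    while True:
        page = dirty_pages.get()
        durable.append(page)   # simulated storage write
        dirty_pages.task_done()

def commit(private_pages):
    """Commit: copy privately built pages to the buffer pool, mark them
    dirty, and return immediately. The XLOG_FPI WAL records are assumed
    already durable, so crash recovery can replay them if needed."""
    for page in private_pages:
        dirty_pages.put(page)  # copy to buffer pool, mark dirty
    # no wait for storage writes here

threading.Thread(target=flusher, daemon=True).start()
commit(["page-1", "page-2"])
dirty_pages.join()             # demo only: wait for the background flush
```

The key invariant mirrored here is ordering: WAL durability happens before commit returns, while data-page durability may lag behind and is completed by the background flusher.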

Configure polar_bulk_write_maxpages

The polar_bulk_write_maxpages parameter sets the maximum number of pages to accumulate in private memory before triggering a bulk flush. When the threshold is reached, the system performs a bulk page extension, writes WAL records in bulk, and copies the pages to the Buffer Pool.

SHOW polar_bulk_write_maxpages;
 polar_bulk_write_maxpages
---------------------------
 1MB
(1 row)

The default value of 1MB is tuned for general workloads and does not need to be changed in most cases.
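As a back-of-envelope check, the 1MB default can be converted to a page count, assuming PostgreSQL's default 8 KB block size:

```python
PAGE_SIZE = 8192                       # PostgreSQL default block size
threshold_bytes = 1 * 1024 * 1024      # polar_bulk_write_maxpages default: 1MB

# Number of pages accumulated in private memory before a bulk flush
pages_per_flush = threshold_bytes // PAGE_SIZE
print(pages_per_flush)                 # 128 pages per bulk flush
```

In other words, at the default setting each bulk flush extends storage, writes WAL, and copies to the Buffer Pool in units of roughly 128 pages.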
