How to use cross-machine parallel acceleration to create materialized views - PolarDB

This topic describes the cross-node parallel acceleration feature for creating materialized views.

Scope

This feature is available for the following versions of PolarDB for PostgreSQL:

PostgreSQL 14 with minor engine version 2.0.14.6.6.0 or later
PostgreSQL 11 with minor engine version 2.0.11.9.30.0 or later

Note

You can view the minor engine version number in the console or by running the SHOW polardb_version; statement. If the minor engine version does not meet the requirement, upgrade the minor engine version.

Background information

A materialized view is a database object that contains query results. Unlike a regular view, a materialized view stores both its definition and a copy of the data from when it was created. If the data in the materialized view is inconsistent with the data in the view definition, you can refresh the materialized view to maintain consistency. A materialized view essentially pre-computes the query in its definition for later reuse.

The CREATE TABLE AS syntax creates a new table from the results of a query. The schema of the new table is identical to the output columns of the query.
The SELECT INTO syntax creates a new table and writes the query results into it, instead of returning the data to the client. The schema of the new table is identical to the output columns of the query.

How it works

The PostgreSQL kernel uses the same logic to process the creation and refresh of materialized views, and statements that use the CREATE TABLE AS and SELECT INTO syntax. This is because these operations all perform similar steps at the database level. The process consists of the following two main steps:

Data scan: Executes the query defined in the view, CREATE TABLE AS statement, or SELECT INTO statement to scan for matching data.
Data write: Writes the scanned data to a new materialized view or table.

PolarDB for PostgreSQL introduces an optimization for the data scan step: ePQ parallel scans. When a large volume of data needs to be scanned, this optimization significantly improves the performance of Data Definition Language (DDL) statements and reduces their running time:

ePQ parallel scans: The ePQ feature executes queries in view definitions in parallel using the I/O bandwidth and computing resources of multiple compute nodes. This improves the utilization of both computing resources and bandwidth.
Batch writing: Instead of writing each scanned tuple to a table or materialized view individually, tuples are accumulated in memory and then written in a single batch. This reduces the overhead of recording the write-ahead log (WAL) and lowers the frequency of page locking.

Usage

You can set the polar_px_enable_create_table_as parameter to enable or disable ePQ parallel scans to accelerate the query process in CREATE TABLE AS statements. The valid values are:

ON (Default): Enables ePQ parallel scans for CREATE TABLE AS statements.
OFF: Disables ePQ parallel scans for CREATE TABLE AS statements.

SET polar_px_enable_create_table_as = ON;

Note

This parameter takes effect only when the ePQ feature is enabled (the polar_enable_px parameter is set to ON).
Because of limitations in the ePQ feature, this optimization does not support the CREATE TABLE AS ... WITH OIDS syntax. For statements that use this syntax, the system falls back to the built-in PostgreSQL optimizer to generate an execution plan. The query is then run on a single node by the PostgreSQL executor.